6. Complex Data Types
Table of Contents
Arrays
Just as in MATLAB, or Python, or any number of other high-level languages, C provides a data structure called an array. An array in C is a set of ordered items. You can think of this as a vector, a list, etc, there are many names. Typically in C we call these an array.
Defining an Array
As an example, let's say we want to store a list of 5 grades. We can define an array as follows:
int grades[5];
This declaration says we want to set aside space in memory for 5 integer values, and we can refer to that block in memory using the variable name "grades". Note that we have only allocated the space in memory, we have not initialized any values of the array. Whatever values happened to be in memory at the locations set aside when we declared the grades variable, will still be there. We can have a look at what is there by indexing into the array like this:
#include <stdio.h> int main () { int grades[5]; int i; for (i=0; i<5; i++) { printf("grades[%d]=%d\n", i, grades[i]); } return 0; }
grades[0]=0 grades[1]=0 grades[2]=4195344 grades[3]=0 grades[4]=1247893184
Note a couple of things.
- The first index always starts with 0 (this is the same in Python, but in MATLAB for example, indices start at 1).
- The values in the array right after declaring the variable will not be initialized, they will contain whatever values happened to be in memory at those locations before. (You will probably see different values when you run your code).
Indexing Array Elements
We have seen that array indices always start at 0 (not 1 as in
MATLAB). It's also important to know that once an array of a given
size is declared (and memory is allocated), the array size is
fixed. That is, you cannot extend the array and make it larger (or
make it smaller). We will see exceptions to this later, when we talk
about dynamically allocated memory using malloc()
, but for now,
assume arrays are fixed in size once declared.
If we try to access an element of an array beyond its bounds, like for
example accessing the 6th element of the grades
array defined above,
C will not prevent us from doing that.
#include <stdio.h> int main () { int grades[5]; int i; for (i=0; i<5; i++) { printf("grades[%d]=%d\n", i, grades[i]); } printf("grades[5]=%d\n", grades[5]); printf("grades[500]=%d\n", grades[500]); return 0; }
grades[0]=0 grades[1]=0 grades[2]=4195344 grades[3]=0 grades[4]=1247893184 grades[5]=32767 grades[500]=0
In fact, we will not even get a warning from the compiler. This highlights a general difference in approach between C and other high-level languages (especially interpreted languages like Python and MATLAB) — C will do exactly what you tell it to.
To understand why asking for the 6th element of a 5-element array is
not nonsensical, we have to understand some details about how C
represents arrays in memory, and how accessing memory by indexing
arrays works in C. When you declare a variable grades
that is a
5-element array of integers, what C actually does, is two (main)
things:
- A consecutive block of memory is "set aside" (reserved for
use). The amount of memory that is set aside is equal to the number
of elements of the array multiplied by the size of the declared
element type. So if we declare
int grades[5];
, then 20 bytes (5 x 4 bytes) of consecutive memory is set aside. - The variable name
grades
is assigned to the block of memory. In fact what actually happens, is that the variable namegrades
is assigned a pointer to the address in memory corresponding to the first element of the array (the beginning of the block of consecutive memory that was set aside). Section 5.3 of the Kernighan and Ritchie book will take you through this in more detail.
Then when you index into the grades
array, for example like
grades[0]
or grades[3]
, C will look at the appropriate place in
memory, defined by the beginning of the array (the grades
pointer)
plus the appropriate number of "steps" into the memory block, as
defined by the index, and the size of the basic type as defined in the
original array declaration. So if you access the 3rd element of the
grades
array with grades[2]
, what C actually does is jumps 12
bytes (3 x 4) into the block of memory that was set aside, and reads
the value it finds there. So if you ask for the 500th element of the
array, C simply reads the memory location 500 x 4 = 2000 bytes past
the beginning of the memory block, whatever that may be.
As you can imagine, this lack of "protection" represents a prime opportunity to generate programming errors that can be difficult to debug. What's particularly dangerous however, is that this feature of C also applies to assigning values to an array.
Assigning values to array elements
I can assign the integers 1 through 5 to my grades
array like this:
#include <stdio.h> int main () { int grades[5]; int i; for (i=0; i<5; i++) { grades[i] = i+1; } for (i=0; i<5; i++) { printf("grades[%d]=%d\n", i, grades[i]); } return 0; }
grades[0]=1 grades[1]=2 grades[2]=3 grades[3]=4 grades[4]=5
What can be dangerous however, is that I can also ask C to assign values to elements beyond the bounds of my array, and C won't complain:
#include <stdio.h> int main () { int grades[5]; int i; for (i=0; i<5; i++) { grades[i] = i+1; } grades[5] = 999; grades[500] = 12345; for (i=0; i<6; i++) { printf("grades[%d]=%d\n", i, grades[i]); } printf("grades[500]=%d\n", grades[500]); return 0; }
grades[0]=1 grades[1]=2 grades[2]=3 grades[3]=4 grades[4]=5 grades[5]=999 grades[500]=12345
The reason this is dangerous, is that only 20 bytes of memory were
set aside for the array yet we are writing to locations in memory
that are outside of these boundaries. The location in memory that is
24 bytes beyond the start of the grades
array (the location accessed
by the expression grades[5]
), and the location in memory that is
2004 bytes beyond the start of the array (the location accessed by the
expression grades[500]
), have not been set aside for the grades
array, and these locations in memory may be used to store other
important things. They may correspond to some other variable in your
program, or they may be storing some information that's not even part
of your program at all, but is part of the operating system — for
example some part of the OS that is controlling the hard disk, or the
video screen, or the network, or the fans, etc. By writing your data
in memory locations that are not within the bounds of memory that has
been set aside by you, anything can happen.
One example: imagine you are writing a C program to control a robot arm (as we do in experiments in our lab). What do you think would happen if suddenly you over-wrote some random location in memory? Answer: nothing good!
Initializing Arrays
In general, we can initialize values of an array using curly brackets {}
like this:
int grades[5] = {4, 3, 2, 5, 1};
This says allocate the grades
array to hold 5 values, and assign 4
to the first value, 3
to the second value, 2
to the third value,
5
to the fourth value, and 1
to the 5th value.
We can initialize specific values (and not others) like this:
#include <stdio.h> int main () { int grades[5] = {[0]=1, [2]=3, [4]=5}; int i; for (i=0; i<5; i++) { printf("grades[%d]=%d\n", i, grades[i]); } return 0; }
grades[0]=1 grades[1]=0 grades[2]=3 grades[3]=0 grades[4]=5
This says initialize the 0th (the first) element of the array to
contain the integer 1
, set the third element (grades[2]
) to
contain the integer 3
, and set the fifth element (grades[4]
) to
contain the integer 5
. All other elements are set to zero.
Multidimensional Arrays
We have seen single-dimensional arrays. C provides for multi-dimensional arrays as well. Here is how we can define a two-dimensional array of integers, initialize values, and index into it to read off values:
#include <stdio.h> int main () { int grades[2][2] = {1,2,3,4}; int i,j; for (i=0; i<2; i++) { for (j=0; j<2; j++) { printf("grades[%d][%d]=%d\n", i, j, grades[i][j]); } } return 0; }
grades[0][0]=1 grades[0][1]=2 grades[1][0]=3 grades[1][1]=4
Here is how we would define a three-dimensional array:
#include <stdio.h> int main () { int grades[2][2][2] = {1,2,3,4,5,6,7,8}; int i,j,k; for (i=0; i<2; i++) { for (j=0; j<2; j++) { for (k=0; k<2; k++) { printf("grades[%d][%d][%d]=%d\n", i, j, k, grades[i][j][k]); } } } return 0; }
grades[0][0][0]=1 grades[0][0][1]=2 grades[0][1][0]=3 grades[0][1][1]=4 grades[1][0][0]=5 grades[1][0][1]=6 grades[1][1][0]=7 grades[1][1][1]=8
Matrix Calculations
Although you can use multidimensional arrays to represent matrices, if you are going to be doing a lot of matrix calculation in your C code, it will probably be better to make use of one of the pre-existing APIs for matrix algebra, rather than coding up this stuff yourself. Two common choices are:
- The GNU Scientific Library Vectors and Matrices
- LAPACK (and BLAS) libraries
Variable-Length Arrays
In the array examples above, we have hard-coded the size of the array. In modern C (the C99 standard and above) it is possible to declare arrays without knowing until run-time their size.
Here is how to do it:
#include <stdio.h> #include <stdlib.h> int main (int argc, char *argv[]) { if (argc < 2) { printf("please provide an integer argument\n"); return 1; } else { int n = atoi(argv[1]); int grades[n]; int i; for (i=0; i<n; i++) { grades[i] = i; } for (i=0; i<n; i++) { printf("grades[%d]=%d\n", i, grades[i]); } return 0; } }
$ ./go 5 grades[0]=0 grades[1]=1 grades[2]=2 grades[3]=3 grades[4]=4
The obvious benefit of allowing variable-length arrays, is that you don't have to know in advance of your program running, how much memory to allocate for your array variables. The downside of this, is that you have to guard against the possibility that your program will attempt to allocate too much memory.
Command-line arguments
The example above also illustrates how to pass command-line arguments
to your C programs. The main()
function can be defined using two
arguments, int argc
and char *argv[]
. When your program is
executed, the first argument argc
, will contain the number of
command-line arguments passed to your program, plus one. The first
argument is always the name of your program. So in the example above,
we check to see if argc<2
because we want at least one extra
argument passed to our program in addition to the automatic program
name argument.
The second argument argv
is a one-dimensional array of strings. As
we will see later, the (char *)
data type, which is actually a
pointer to a character, is typically used in C to represent strings
(as an array of characters).
In the example above, we use the atoi()
function to convert ascii
to integer, in order to convert the command-line argument (the
second one, hence argv[1]
) which is an ascii string, into an integer
n
. We then declare an array grades[n]
to be of length n
.
Here is some code that does nothing except output to the screen the number of input arguments, and their value:
#include <stdio.h> int main(int argc, char* argv[]) { int i; printf("argc = %d\n", argc); for (i=0; i<argc; i++) { printf("argv[%d] = %s\n", i, argv[i]); } return 0; }
So for example:
plg:Desktop plg$ gcc -o dryrun dryrun.c plg:Desktop plg$ ./dryrun argc = 1 argv[0] = ./dryrun plg:Desktop plg$ plg:Desktop plg$ ./dryrun hello argc = 2 argv[0] = ./dryrun argv[1] = hello plg:Desktop plg$ plg:Desktop plg$ ./dryrun 1 2 3 hi "the dude" 3.14159 argc = 7 argv[0] = ./dryrun argv[1] = 1 argv[2] = 2 argv[3] = 3 argv[4] = hi argv[5] = the dude argv[6] = 3.14159
Remember, all inputs are treated as null-terminated strings, so if you
want to pass integers or floating point values in, you will have to
convert them from strings in your code. Also remember that the first
argument is always the name of the program, so if you expect to have
to pass n
input arguments to your program, then expect argc
to be
n+1
.
Structures
Structures are another way of grouping together different elements of data, into one named variable. Unlike arrays, in which all elements have to be of the same type, in structures, you can store any combination of any data types you want. Structures can even contain other structures.
Here is what a structure definition looks like:
struct Point3D{ int x; int y; int z; };
The struct
keyword tells the compiler that what's coming next is a
structure. Here I have named the data type Point3D
and I have
defined this structure as containing three integer variables, named
x
, y
and z
.
Here is how to use our new structure in a program:
#include <stdio.h> int main () { struct Point3D { int x; int y; int z; }; struct Point3D p1; p1.x = 0; p1.y = 0; p1.z = 0; struct Point3D p2 = {.x=1, .y=2, .z=3}; printf("p1 = (%d,%d,%d) and p2 = (%d,%d,%d)\n", p1.x, p1.y, p1.z, p2.x, p2.y, p2.z); return 0; }
p1 = (0,0,0) and p2 = (1,2,3)
Using typedef to shorten structure declarations
We saw before how to use typedef
to define new names for data
types. We can take advantage of this trick to shorten how we declare
structure variables. Using typedef to define our structure, we can
avoid having to use the struct
keyword every time we declare a new
variable that is our structure type:
#include <stdio.h> int main () { typedef struct { int x; int y; int z; } Point3D; Point3D p1; p1.x = 0; p1.y = 0; p1.z = 0; Point3D p2 = {.x=1, .y=2, .z=3}; printf("p1 = (%d,%d,%d) and p2 = (%d,%d,%d)\n", p1.x, p1.y, p1.z, p2.x, p2.y, p2.z); return 0; }
Structures Containing Arrays
As we have seen, structures can contain any combination of any data types you wish. Here is an example of a structure that contains a couple of integers, and an array of floating-point values (doubles).
#include <stdio.h> int main () { typedef struct { int a; int b; double myVector[5]; } myStruct; myStruct s1; s1.a = 10; s1.b = 20; int i; for (i=0; i<5; i++) { s1.myVector[i] = i; } return 0; }
Arrays of Structures
We have seen how to use arrays to store numeric data types. Arrays can
also be used to store structures. Here we use an array to store 10 of
our Point3D
structure data types:
#include <stdio.h> int main () { typedef struct { int x; int y; int z; } Point3D; Point3D myPoints[10]; int i; for (i=0; i<10; i++) { myPoints[i].x = i; myPoints[i].y = i; myPoints[i].z = i; } return 0; }
Exercises
1 Below is a program that uses a
struct
to encapsulate a two dimensional matrix. The struct contains the matrix values, and the dimensions of the matrix. We assume that the matrix is filled row by row (that is, across columns). Since we haven't covered dynamic allocation of memory yet, for now we assume a matrix can hold a maximum number of values equal to 1024. We will cover dynamic allocation of memory later.Your first task is to write a function that prints a matrix to the screen. A function skeleton is provided (
printmat()
).
// gcc -Wall -o go 6_1_go.c #include <stdio.h> #define MAXDATA 1024 typedef struct { double data[MAXDATA]; int nrows; int ncols; } Matrix; void printmat(Matrix M); Matrix matrixmult(Matrix A, Matrix B); int main(int argc, char *argv[]) { Matrix A = { {1.2, 2.3, 3.4, 4.5, 5.6, 6.7}, 3, 2}; Matrix B = { {5.5, 6.6, 7.7, 1.2, 2.1, 3.3}, 2, 3}; printmat(A); printmat(B); // Matrix C = matrixmult(A, B); // printmat(C); return 0; } // your code goes below... void printmat(Matrix M) { // fill in the code here printf("so far printmat does nothing\n"); } Matrix matrixmult(Matrix A, Matrix B) { // fill in the code here printf("so far matrixmult does nothing\n"); Matrix C; return C; }
2 Your next task is to write a function that performs matrix multiplication. A function skeleton is provided (
matrixmult()
).You can check your solution against Wolfram Alpha.
Here is an example of what the output might look like of your finished program:
$ gcc -o go 6_1_go.c $ ./go [ 1.200 2.300 3.400 4.500 5.600 6.700 ] [ 5.500 6.600 7.700 1.200 2.100 3.300 ] [ 9.360 12.750 16.830 24.100 31.890 41.030 38.840 51.030 65.230 ]
- 3 We saw in this section how to use command-line arguments to your C programs. Modify the code from the Functions section (the Fibonacci function that you wrote here) so that it prints the $n$th Fibonacci number, passed through as a command line argument. Your program should be able to be run like this:
$ ./go 10 fib(10)=55 $ ./go 12 fib(12)=144 ... and so on