# 6. Complex Data Types

## Arrays

Just as in MATLAB, or Python, or any number of other high-level languages, C provides a data structure called an array. An array in C is a set of ordered items. You can think of this as a vector, a list, etc, there are many names. Typically in C we call these an array.

### Defining an Array

As an example, let's say we want to store a list of 5 grades. We can define an array as follows:

```int grades[5];
```

This declaration says we want to set aside space in memory for 5 integer values, and we can refer to that block in memory using the variable name "grades". Note that we have only allocated the space in memory, we have not initialized any values of the array. Whatever values happened to be in memory at the locations set aside when we declared the grades variable, will still be there. We can have a look at what is there by indexing into the array like this:

```#include <stdio.h>

int main ()
{
int i;
for (i=0; i<5; i++) {
}
return 0;
}
```
```grades[0]=0
```

Note a couple of things.

1. The first index always starts with 0 (this is the same in Python, but in MATLAB for example, indices start at 1).
2. The values in the array right after declaring the variable will not be initialized, they will contain whatever values happened to be in memory at those locations before. (You will probably see different values when you run your code).

### Indexing Array Elements

We have seen that array indices always start at 0 (not 1 as in MATLAB). It's also important to know that once an array of a given size is declared (and memory is allocated), the array size is fixed. That is, you cannot extend the array and make it larger (or make it smaller). We will see exceptions to this later, when we talk about dynamically allocated memory using `malloc()`, but for now, assume arrays are fixed in size once declared.

If we try to access an element of an array beyond its bounds, like for example accessing the 6th element of the `grades` array defined above, C will not prevent us from doing that.

```#include <stdio.h>

int main ()
{
int i;
for (i=0; i<5; i++) {
}
return 0;
}
```
```grades[0]=0
```

In fact, we will not even get a warning from the compiler. This highlights a general difference in approach between C and other high-level languages (especially interpreted languages like Python and MATLAB) — C will do exactly what you tell it to.

To understand why asking for the 6th element of a 5-element array is not nonsensical, we have to understand some details about how C represents arrays in memory, and how accessing memory by indexing arrays works in C. When you declare a variable `grades` that is a 5-element array of integers, what C actually does, is two (main) things:

1. A consecutive block of memory is "set aside" (reserved for use). The amount of memory that is set aside is equal to the number of elements of the array multiplied by the size of the declared element type. So if we declare `int grades[5];`, then 20 bytes (5 x 4 bytes) of consecutive memory is set aside.
2. The variable name `grades` is assigned to the block of memory. In fact what actually happens, is that the variable name `grades` is assigned a pointer to the address in memory corresponding to the first element of the array (the beginning of the block of consecutive memory that was set aside). Section 5.3 of the Kernighan and Ritchie book will take you through this in more detail.

Then when you index into the `grades` array, for example like `grades[0]` or `grades[3]`, C will look at the appropriate place in memory, defined by the beginning of the array (the `grades` pointer) plus the appropriate number of "steps" into the memory block, as defined by the index, and the size of the basic type as defined in the original array declaration. So if you access the 3rd element of the `grades` array with `grades[2]`, what C actually does is jumps 12 bytes (3 x 4) into the block of memory that was set aside, and reads the value it finds there. So if you ask for the 500th element of the array, C simply reads the memory location 500 x 4 = 2000 bytes past the beginning of the memory block, whatever that may be.

As you can imagine, this lack of "protection" represents a prime opportunity to generate programming errors that can be difficult to debug. What's particularly dangerous however, is that this feature of C also applies to assigning values to an array.

### Assigning values to array elements

I can assign the integers 1 through 5 to my `grades` array like this:

```#include <stdio.h>

int main ()
{
int i;
for (i=0; i<5; i++) {
}
for (i=0; i<5; i++) {
}
return 0;
}
```
```grades[0]=1
```

What can be dangerous however, is that I can also ask C to assign values to elements beyond the bounds of my array, and C won't complain:

```#include <stdio.h>

int main ()
{
int i;
for (i=0; i<5; i++) {
}
for (i=0; i<6; i++) {
}
return 0;
}
```
```grades[0]=1
```

The reason this is dangerous, is that only 20 bytes of memory were set aside for the array yet we are writing to locations in memory that are outside of these boundaries. The location in memory that is 24 bytes beyond the start of the `grades` array (the location accessed by the expression `grades[5]`), and the location in memory that is 2004 bytes beyond the start of the array (the location accessed by the expression `grades[500]`), have not been set aside for the `grades` array, and these locations in memory may be used to store other important things. They may correspond to some other variable in your program, or they may be storing some information that's not even part of your program at all, but is part of the operating system — for example some part of the OS that is controlling the hard disk, or the video screen, or the network, or the fans, etc. By writing your data in memory locations that are not within the bounds of memory that has been set aside by you, anything can happen.

One example: imagine you are writing a C program to control a robot arm (as we do in experiments in our lab). What do you think would happen if suddenly you over-wrote some random location in memory? Answer: nothing good!

### Initializing Arrays

In general, we can initialize values of an array using curly brackets `{}` like this:

```int grades[5] = {4, 3, 2, 5, 1};
```

This says allocate the `grades` array to hold 5 values, and assign `4` to the first value, `3` to the second value, `2` to the third value, `5` to the fourth value, and `1` to the 5th value.

We can initialize specific values (and not others) like this:

```#include <stdio.h>

int main ()
{
int grades[5] = {[0]=1, [2]=3, [4]=5};
int i;
for (i=0; i<5; i++) {
}
return 0;
}
```
```grades[0]=1
```

This says initialize the 0th (the first) element of the array to contain the integer `1`, set the third element (`grades[2]`) to contain the integer `3`, and set the fifth element (`grades[4]`) to contain the integer `5`. All other elements are set to zero.

## Multidimensional Arrays

We have seen single-dimensional arrays. C provides for multi-dimensional arrays as well. Here is how we can define a two-dimensional array of integers, initialize values, and index into it to read off values:

```#include <stdio.h>

int main ()
{
int i,j;
for (i=0; i<2; i++) {
for (j=0; j<2; j++) {
}
}
return 0;
}
```
```grades[0][0]=1
```

Here is how we would define a three-dimensional array:

```#include <stdio.h>

int main ()
{
int i,j,k;
for (i=0; i<2; i++) {
for (j=0; j<2; j++) {
for (k=0; k<2; k++) {
}
}
}
return 0;
}
```
```grades[0][0][0]=1
```

### Matrix Calculations

Although you can use multidimensional arrays to represent matrices, if you are going to be doing a lot of matrix calculation in your C code, it will probably be better to make use of one of the pre-existing APIs for matrix algebra, rather than coding up this stuff yourself. Two common choices are:

1. The GNU Scientific Library Vectors and Matrices
2. LAPACK (and BLAS) libraries

## Variable-Length Arrays

In the array examples above, we have hard-coded the size of the array. In modern C (the C99 standard and above) it is possible to declare arrays without knowing until run-time their size.

Here is how to do it:

```#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
if (argc < 2) {
return 1;
}
else {
int n = atoi(argv[1]);
int i;
for (i=0; i<n; i++) {
}
for (i=0; i<n; i++) {
}
return 0;
}
}
```
```\$ ./go 5
```

The obvious benefit of allowing variable-length arrays, is that you don't have to know in advance of your program running, how much memory to allocate for your array variables. The downside of this, is that you have to guard against the possibility that your program will attempt to allocate too much memory.

## Command-line arguments

The example above also illustrates how to pass command-line arguments to your C programs. The `main()` function can be defined using two arguments, `int argc` and `char *argv[]`. When your program is executed, the first argument `argc`, will contain the number of command-line arguments passed to your program, plus one. The first argument is always the name of your program. So in the example above, we check to see if `argc<2` because we want at least one extra argument passed to our program in addition to the automatic program name argument.

The second argument `argv` is a one-dimensional array of strings. As we will see later, the `(char *)` data type, which is actually a pointer to a character, is typically used in C to represent strings (as an array of characters).

In the example above, we use the `atoi()` function to convert ascii to integer, in order to convert the command-line argument (the second one, hence `argv[1]`) which is an ascii string, into an integer `n`. We then declare an array `grades[n]` to be of length `n`.

Here is some code that does nothing except output to the screen the number of input arguments, and their value:

```#include <stdio.h>

int main(int argc, char* argv[]) {
int i;
printf("argc = %d\n", argc);
for (i=0; i<argc; i++) {
printf("argv[%d] = %s\n", i, argv[i]);
}
return 0;
}
```

So for example:

```plg:Desktop plg\$ gcc -o dryrun dryrun.c
plg:Desktop plg\$ ./dryrun
argc = 1
argv[0] = ./dryrun
plg:Desktop plg\$
plg:Desktop plg\$ ./dryrun hello
argc = 2
argv[0] = ./dryrun
argv[1] = hello
plg:Desktop plg\$
plg:Desktop plg\$ ./dryrun 1 2 3 hi "the dude" 3.14159
argc = 7
argv[0] = ./dryrun
argv[1] = 1
argv[2] = 2
argv[3] = 3
argv[4] = hi
argv[5] = the dude
argv[6] = 3.14159
```

Remember, all inputs are treated as null-terminated strings, so if you want to pass integers or floating point values in, you will have to convert them from strings in your code. Also remember that the first argument is always the name of the program, so if you expect to have to pass `n` input arguments to your program, then expect `argc` to be `n+1`.

## Structures

Structures are another way of grouping together different elements of data, into one named variable. Unlike arrays, in which all elements have to be of the same type, in structures, you can store any combination of any data types you want. Structures can even contain other structures.

Here is what a structure definition looks like:

```struct Point3D{
int x;
int y;
int z;
};
```

The `struct` keyword tells the compiler that what's coming next is a structure. Here I have named the data type `Point3D` and I have defined this structure as containing three integer variables, named `x`, `y` and `z`.

Here is how to use our new structure in a program:

```#include <stdio.h>

int main ()
{

struct Point3D {
int x;
int y;
int z;
};

struct Point3D p1;
p1.x = 0;
p1.y = 0;
p1.z = 0;

struct Point3D p2 = {.x=1, .y=2, .z=3};

printf("p1 = (%d,%d,%d) and p2 = (%d,%d,%d)\n", p1.x, p1.y, p1.z, p2.x, p2.y, p2.z);

return 0;
}
```
```p1 = (0,0,0) and p2 = (1,2,3)
```

### Using typedef to shorten structure declarations

We saw before how to use `typedef` to define new names for data types. We can take advantage of this trick to shorten how we declare structure variables. Using typedef to define our structure, we can avoid having to use the `struct` keyword every time we declare a new variable that is our structure type:

```#include <stdio.h>

int main ()
{

typedef struct {
int x;
int y;
int z;
}  Point3D;

Point3D p1;
p1.x = 0;
p1.y = 0;
p1.z = 0;

Point3D p2 = {.x=1, .y=2, .z=3};

printf("p1 = (%d,%d,%d) and p2 = (%d,%d,%d)\n", p1.x, p1.y, p1.z, p2.x, p2.y, p2.z);

return 0;
}
```

### Structures Containing Arrays

As we have seen, structures can contain any combination of any data types you wish. Here is an example of a structure that contains a couple of integers, and an array of floating-point values (doubles).

```#include <stdio.h>

int main ()
{

typedef struct {
int a;
int b;
double myVector[5];
}  myStruct;

myStruct s1;

s1.a = 10;
s1.b = 20;
int i;
for (i=0; i<5; i++) {
s1.myVector[i] = i;
}

return 0;
}
```

### Arrays of Structures

We have seen how to use arrays to store numeric data types. Arrays can also be used to store structures. Here we use an array to store 10 of our `Point3D` structure data types:

```#include <stdio.h>

int main ()
{

typedef struct {
int x;
int y;
int z;
}  Point3D;

Point3D myPoints[10];
int i;
for (i=0; i<10; i++) {
myPoints[i].x = i;
myPoints[i].y = i;
myPoints[i].z = i;
}

return 0;
}
```

## Exercises

• 1 Below is a program that uses a `struct` to encapsulate a two dimensional matrix. The struct contains the matrix values, and the dimensions of the matrix. We assume that the matrix is filled row by row (that is, across columns). Since we haven't covered dynamic allocation of memory yet, for now we assume a matrix can hold a maximum number of values equal to 1024. We will cover dynamic allocation of memory later.

Your first task is to write a function that prints a matrix to the screen. A function skeleton is provided (`printmat()`).

```// gcc -Wall -o go 6_1_go.c

#include <stdio.h>

#define MAXDATA 1024

typedef struct {
double data[MAXDATA];
int nrows;
int ncols;
} Matrix;

void printmat(Matrix M);
Matrix matrixmult(Matrix A, Matrix B);

int main(int argc, char *argv[])
{
Matrix A = { {1.2, 2.3,
3.4, 4.5,
5.6, 6.7},
3,
2};
Matrix B = { {5.5, 6.6, 7.7,
1.2, 2.1, 3.3},
2,
3};
printmat(A);
printmat(B);

//  Matrix C = matrixmult(A, B);
//  printmat(C);

return 0;
}

void printmat(Matrix M)
{
// fill in the code here
printf("so far printmat does nothing\n");
}

Matrix matrixmult(Matrix A, Matrix B)
{
// fill in the code here
printf("so far matrixmult does nothing\n");
Matrix C;
return C;
}
```
• 2 Your next task is to write a function that performs matrix multiplication. A function skeleton is provided (`matrixmult()`).

You can check your solution against Wolfram Alpha.

Here is an example of what the output might look like of your finished program:

```\$ gcc -o go 6_1_go.c
\$ ./go
[
1.200  2.300
3.400  4.500
5.600  6.700
]

[
5.500  6.600  7.700
1.200  2.100  3.300
]

[
9.360 12.750 16.830
24.100 31.890 41.030
38.840 51.030 65.230
]
```
• 3 We saw in this section how to use command-line arguments to your C programs. Modify the code from the Functions section (the Fibonacci function that you wrote here) so that it prints the \$n\$th Fibonacci number, passed through as a command line argument. Your program should be able to be run like this:
```\$ ./go 10
fib(10)=55

\$ ./go 12
fib(12)=144

... and so on
```

### Solutions

Paul Gribble | Summer 2012