6. Complex Data Types

Arrays
Multidimensional Arrays
- Matrix Calculations
Variable-Length Arrays
Command-line arguments
Structures
Exercises
- Solutions

Arrays

Just as in MATLAB, Python, and many other high-level languages, C provides a data structure called an array. An array in C is an ordered collection of items that all have the same type. Other languages call this a vector or a list; in C we call it an array.

Defining an Array

As an example, let's say we want to store a list of 5 grades. We can define an array as follows:

int grades[5];

This declaration says we want to set aside space in memory for 5 integer values, and we can refer to that block in memory using the variable name "grades". Note that we have only allocated the space in memory; we have not initialized any values of the array. Whatever values happened to be in memory at the locations set aside when we declared the grades variable will still be there. We can have a look at what is there by indexing into the array like this:

#include <stdio.h>

int main ()
{
  int grades[5];
  int i;
  for (i=0; i<5; i++) {
    printf("grades[%d]=%d\n", i, grades[i]);
  }
  return 0;
}

grades[0]=0
grades[1]=0
grades[2]=4195344
grades[3]=0
grades[4]=1247893184

Note a couple of things.

The first index always starts with 0 (this is the same in Python, but in MATLAB for example, indices start at 1).
The values in the array right after declaring the variable will not be initialized. They will contain whatever values happened to be in memory at those locations before. (You will probably see different values when you run your code.)

Indexing Array Elements

We have seen that array indices always start at 0 (not 1 as in MATLAB). It's also important to know that once an array of a given size is declared (and memory is allocated), the array size is fixed. That is, you cannot extend the array and make it larger (or make it smaller). We will see exceptions to this later, when we talk about dynamically allocated memory using malloc(), but for now, assume arrays are fixed in size once declared.

If we try to access an element beyond an array's bounds, for example the 6th element of the grades array defined above, C will not prevent us from doing that.

#include <stdio.h>

int main ()
{
  int grades[5];
  int i;
  for (i=0; i<5; i++) {
    printf("grades[%d]=%d\n", i, grades[i]);
  }
  printf("grades[5]=%d\n", grades[5]);
  printf("grades[500]=%d\n", grades[500]);
  return 0;
}

grades[0]=0
grades[1]=0
grades[2]=4195344
grades[3]=0
grades[4]=1247893184
grades[5]=32767
grades[500]=0

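Depending on the compiler and the warning flags in use, we may not even get a warning. This highlights a general difference in approach between C and other high-level languages (especially interpreted languages like Python and MATLAB): C will do exactly what you tell it to. Strictly speaking, reading outside the bounds of an array is undefined behavior, so the output above is only what happened to come out on one particular run.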

To understand why asking for the 6th element of a 5-element array is not nonsensical, we have to understand some details about how C represents arrays in memory, and how accessing memory by indexing arrays works in C. When you declare a variable grades that is a 5-element array of integers, what C actually does, is two (main) things:

A consecutive block of memory is "set aside" (reserved for use). The amount of memory that is set aside is equal to the number of elements of the array multiplied by the size of the declared element type. So if we declare int grades[5]; on a platform where an int occupies 4 bytes (the common case today), then 20 bytes (5 x 4 bytes) of consecutive memory is set aside.
The variable name grades is assigned to the block of memory. In most expressions, the name grades is then treated as a pointer to the first element of the array (the beginning of the block of consecutive memory that was set aside). Section 5.3 of the Kernighan and Ritchie book will take you through this in more detail.

Then when you index into the grades array, for example like grades[0] or grades[3], C will look at the appropriate place in memory, defined by the beginning of the array (the grades pointer) plus the appropriate number of "steps" into the memory block, as defined by the index, and the size of the basic type as defined in the original array declaration. So if you access the 3rd element of the grades array with grades[2], what C actually does is jump 8 bytes (2 x 4) into the block of memory that was set aside, and read the value it finds there. So if you ask for grades[500], C simply reads the memory location 500 x 4 = 2000 bytes past the beginning of the memory block, whatever that may be.

As you can imagine, this lack of "protection" represents a prime opportunity to generate programming errors that can be difficult to debug. What's particularly dangerous however, is that this feature of C also applies to assigning values to an array.

Assigning values to array elements

I can assign the integers 1 through 5 to my grades array like this:

#include <stdio.h>

int main ()
{
  int grades[5];
  int i;
  for (i=0; i<5; i++) {
    grades[i] = i+1;
  }
  for (i=0; i<5; i++) {
    printf("grades[%d]=%d\n", i, grades[i]);
  }
  return 0;
}

grades[0]=1
grades[1]=2
grades[2]=3
grades[3]=4
grades[4]=5

What can be dangerous however, is that I can also ask C to assign values to elements beyond the bounds of my array, and C won't complain:

#include <stdio.h>

int main ()
{
  int grades[5];
  int i;
  for (i=0; i<5; i++) {
    grades[i] = i+1;
  }
  grades[5] = 999;
  grades[500] = 12345;
  for (i=0; i<6; i++) {
    printf("grades[%d]=%d\n", i, grades[i]);
  }
  printf("grades[500]=%d\n", grades[500]);
  return 0;
}

grades[0]=1
grades[1]=2
grades[2]=3
grades[3]=4
grades[4]=5
grades[5]=999
grades[500]=12345

The reason this is dangerous is that only 20 bytes of memory were set aside for the array, yet we are writing to locations in memory that are outside of these boundaries. The location in memory that is 20 bytes beyond the start of the grades array (the location accessed by the expression grades[5]), and the location in memory that is 2000 bytes beyond the start of the array (the location accessed by the expression grades[500]), have not been set aside for the grades array, and these locations in memory may be used to store other important things. They may correspond to some other variable in your program, or, on systems without memory protection, they may be storing some information that's not even part of your program at all, but is part of the operating system: for example some part of the OS that is controlling the hard disk, or the video screen, or the network, or the fans, etc. On most modern operating systems, touching memory that does not belong to your program will instead crash it with a segmentation fault. Either way, by writing your data in memory locations that are not within the bounds of memory that has been set aside by you, anything can happen.

One example: imagine you are writing a C program to control a robot arm (as we do in experiments in our lab). What do you think would happen if suddenly you over-wrote some random location in memory? Answer: nothing good!

Initializing Arrays

In general, we can initialize values of an array using curly brackets {} like this:

int grades[5] = {4, 3, 2, 5, 1};

This says allocate the grades array to hold 5 values, and assign 4 to the first value, 3 to the second value, 2 to the third value, 5 to the fourth value, and 1 to the 5th value.

We can initialize specific values (and not others) like this:

#include <stdio.h>

int main ()
{
  int grades[5] = {[0]=1, [2]=3, [4]=5};
  int i;
  for (i=0; i<5; i++) {
    printf("grades[%d]=%d\n", i, grades[i]);
  }
  return 0;
}

grades[0]=1
grades[1]=0
grades[2]=3
grades[3]=0
grades[4]=5

This says initialize the 0th (the first) element of the array to contain the integer 1, set the third element (grades[2]) to contain the integer 3, and set the fifth element (grades[4]) to contain the integer 5. All other elements are set to zero.

Multidimensional Arrays

We have seen single-dimensional arrays. C provides for multi-dimensional arrays as well. Here is how we can define a two-dimensional array of integers, initialize values, and index into it to read off values:

#include <stdio.h>

int main ()
{
  int grades[2][2] = {1,2,3,4};
  int i,j;
  for (i=0; i<2; i++) {
    for (j=0; j<2; j++) {
      printf("grades[%d][%d]=%d\n", i, j, grades[i][j]);
    }
  }
  return 0;
}

grades[0][0]=1
grades[0][1]=2
grades[1][0]=3
grades[1][1]=4

Here is how we would define a three-dimensional array:

#include <stdio.h>

int main ()
{
  int grades[2][2][2] = {1,2,3,4,5,6,7,8};
  int i,j,k;
  for (i=0; i<2; i++) {
    for (j=0; j<2; j++) {
      for (k=0; k<2; k++) {
        printf("grades[%d][%d][%d]=%d\n", i, j, k, grades[i][j][k]);
      }
    }
  }
  return 0;
}

grades[0][0][0]=1
grades[0][0][1]=2
grades[0][1][0]=3
grades[0][1][1]=4
grades[1][0][0]=5
grades[1][0][1]=6
grades[1][1][0]=7
grades[1][1][1]=8

Matrix Calculations

Although you can use multidimensional arrays to represent matrices, if you are going to be doing a lot of matrix calculation in your C code, it will probably be better to make use of one of the pre-existing APIs for matrix algebra, rather than coding up this stuff yourself. Two common choices are:

The GNU Scientific Library Vectors and Matrices
LAPACK (and BLAS) libraries

Variable-Length Arrays

In the array examples above, we have hard-coded the size of the array. Since the C99 standard, it is possible to declare an array whose size is not known until run-time. (The C11 standard made this feature optional for compilers, although gcc and clang both support it.)

Here is how to do it:

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
  if (argc < 2) {
    printf("please provide an integer argument\n");
    return 1;
  }
  else {
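    int n = atoi(argv[1]);
    /* a variable-length array must have a positive size */
    if (n <= 0) {
      printf("please provide a positive integer\n");
      return 1;
    }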
    int grades[n];
    int i;
    for (i=0; i<n; i++) {
      grades[i] = i;
    }
    for (i=0; i<n; i++) {
      printf("grades[%d]=%d\n", i, grades[i]);
    }
    return 0;
  }
}

$ ./go 5
grades[0]=0
grades[1]=1
grades[2]=2
grades[3]=3
grades[4]=4

The obvious benefit of variable-length arrays is that you don't have to know, before your program runs, how much memory to allocate for your array variables. The downside is that you have to guard against the possibility that your program will attempt to allocate too much memory, for example when the size comes from user input and turns out to be enormous, zero, or negative.

Command-line arguments

The example above also illustrates how to pass command-line arguments to your C programs. The main() function can be defined with two parameters, int argc and char *argv[]. When your program is executed, the first parameter, argc, will contain the number of command-line arguments passed to your program, plus one. The extra one is argv[0], which is always the name of your program. So in the example above, we check to see if argc<2 because we want at least one extra argument passed to our program in addition to the automatic program name argument.

The second argument argv is a one-dimensional array of strings. As we will see later, the (char *) data type, which is actually a pointer to a character, is typically used in C to represent strings (as an array of characters).

In the example above, we use the atoi() function ("ASCII to integer") to convert the command-line argument (the second one, hence argv[1]), which is an ASCII string, into an integer n. We then declare an array grades[n] to be of length n. Note that atoi() cannot report errors: if the string does not start with a number, it simply returns 0.

Here is some code that does nothing except output to the screen the number of input arguments, and their value:

#include <stdio.h>

int main(int argc, char* argv[]) {
  int i;
  printf("argc = %d\n", argc);
  for (i=0; i<argc; i++) {
    printf("argv[%d] = %s\n", i, argv[i]);
  }
  return 0;
}

So for example:

plg:Desktop plg$ gcc -o dryrun dryrun.c
plg:Desktop plg$ ./dryrun
argc = 1
argv[0] = ./dryrun
plg:Desktop plg$ 
plg:Desktop plg$ ./dryrun hello
argc = 2
argv[0] = ./dryrun
argv[1] = hello
plg:Desktop plg$ 
plg:Desktop plg$ ./dryrun 1 2 3 hi "the dude" 3.14159
argc = 7
argv[0] = ./dryrun
argv[1] = 1
argv[2] = 2
argv[3] = 3
argv[4] = hi
argv[5] = the dude
argv[6] = 3.14159

Remember, all inputs are treated as null-terminated strings, so if you want to pass integers or floating point values in, you will have to convert them from strings in your code. Also remember that the first argument is always the name of the program, so if you expect to have to pass n input arguments to your program, then expect argc to be n+1.

Structures

Structures are another way of grouping together different elements of data, into one named variable. Unlike arrays, in which all elements have to be of the same type, in structures, you can store any combination of any data types you want. Structures can even contain other structures.

Here is what a structure definition looks like:

struct Point3D{
  int x;
  int y;
  int z;
};

The struct keyword tells the compiler that what's coming next is a structure. Here I have named the data type Point3D and I have defined this structure as containing three integer variables, named x, y and z.

Here is how to use our new structure in a program:

#include <stdio.h>

int main ()
{

  struct Point3D {
    int x;
    int y;
    int z;
  };

  struct Point3D p1;
  p1.x = 0;
  p1.y = 0;
  p1.z = 0;

  struct Point3D p2 = {.x=1, .y=2, .z=3};

  printf("p1 = (%d,%d,%d) and p2 = (%d,%d,%d)\n", p1.x, p1.y, p1.z, p2.x, p2.y, p2.z);

  return 0;
}

p1 = (0,0,0) and p2 = (1,2,3)

Using typedef to shorten structure declarations

We saw before how to use typedef to define new names for data types. We can take advantage of this trick to shorten how we declare structure variables. Using typedef to define our structure, we can avoid having to use the struct keyword every time we declare a new variable that is our structure type:

#include <stdio.h>

int main ()
{

  typedef struct {
    int x;
    int y;
    int z;
  } Point3D;

  Point3D p1;
  p1.x = 0;
  p1.y = 0;
  p1.z = 0;

  Point3D p2 = {.x=1, .y=2, .z=3};

  printf("p1 = (%d,%d,%d) and p2 = (%d,%d,%d)\n", p1.x, p1.y, p1.z, p2.x, p2.y, p2.z);

  return 0;
}

Structures Containing Arrays

As we have seen, structures can contain any combination of any data types you wish. Here is an example of a structure that contains a couple of integers, and an array of floating-point values (doubles).

#include <stdio.h>

int main ()
{

  typedef struct {
    int a;
    int b;
    double myVector[5];
  } myStruct;

  myStruct s1;

  s1.a = 10;
  s1.b = 20;
  int i;
  for (i=0; i<5; i++) {
    s1.myVector[i] = i;
  }

  return 0;
}

Arrays of Structures

We have seen how to use arrays to store numeric data types. Arrays can also be used to store structures. Here we use an array to store 10 of our Point3D structure data types:

#include <stdio.h>

int main ()
{

  typedef struct {
    int x;
    int y;
    int z;
  } Point3D;

  Point3D myPoints[10];
  int i;
  for (i=0; i<10; i++) {
    myPoints[i].x = i;
    myPoints[i].y = i;
    myPoints[i].z = i;
  }

  return 0;
}

Exercises

1 Below is a program that uses a struct to encapsulate a two dimensional matrix. The struct contains the matrix values, and the dimensions of the matrix. We assume that the matrix is filled row by row (that is, across columns). Since we haven't covered dynamic allocation of memory yet, for now we assume a matrix can hold a maximum number of values equal to 1024. We will cover dynamic allocation of memory later.

Your first task is to write a function that prints a matrix to the screen. A function skeleton is provided (printmat()).

// gcc -Wall -o go 6_1_go.c

#include <stdio.h>

#define MAXDATA 1024

typedef struct {
  double data[MAXDATA];
  int nrows;
  int ncols;
} Matrix;

void printmat(Matrix M);
Matrix matrixmult(Matrix A, Matrix B);

int main(int argc, char *argv[])
{
  Matrix A = { {1.2, 2.3,
                3.4, 4.5,
                5.6, 6.7},
               3,
               2};
  Matrix B = { {5.5, 6.6, 7.7,
                1.2, 2.1, 3.3},
               2,
               3}; 
  printmat(A);
  printmat(B);

  //  Matrix C = matrixmult(A, B);
  //  printmat(C);

  return 0;
}

// your code goes below...

void printmat(Matrix M)
{
  // fill in the code here
  printf("so far printmat does nothing\n");
}

Matrix matrixmult(Matrix A, Matrix B)
{
  // fill in the code here
  printf("so far matrixmult does nothing\n");
  Matrix C;
  return C;
}

2 Your next task is to write a function that performs matrix multiplication. A function skeleton is provided (matrixmult()).

You can check your solution against Wolfram Alpha.

Here is an example of what the output might look like of your finished program:

$ gcc -o go 6_1_go.c
$ ./go
[
 1.200  2.300 
 3.400  4.500 
 5.600  6.700 
]

[
 5.500  6.600  7.700 
 1.200  2.100  3.300 
]

[
 9.360 12.750 16.830 
24.100 31.890 41.030 
38.840 51.030 65.230 
]

3 We saw in this section how to pass command-line arguments to your C programs. Modify the code from the Functions section (the Fibonacci function that you wrote there) so that it prints the nth Fibonacci number, where n is passed in as a command-line argument. Your program should be able to be run like this:

$ ./go 10
fib(10)=55

$ ./go 12
fib(12)=144

... and so on