12. Compiling, linking, Makefile, header files
Splitting your program into multiple files
The programs we have seen so far have all been stored in a single source file. As your programs become larger, and as you start to deal with other people's code (e.g. other C libraries), you will have to deal with code that resides in multiple files. Indeed, you may build up your own library of C functions and data structures that you can re-use in your own scientific programming and data analysis. Here we will see how to place C functions and data structures in their own file(s) and how to incorporate them into a new program.
We saw in the section on functions that one way of including custom-written functions in your C code is simply to place them in your main source file, above the definition of the main() function. A better way to re-use functions that you commonly incorporate into your C programs is to place them in their own file, and to put an #include statement above main() that pulls that file in. When compiled, it is just like copying and pasting the code above main(), but for the purpose of editing and writing your code, this allows you to keep things in separate files. It also means that if you ever decide to change one of those re-usable functions (for example if you find and fix an error), you only have to change it in one place, and you don't have to go searching through all of your programs to change each one.
Header files
A common convention in C programs is to write a header file (with .h suffix) for each source file (.c suffix) that you link to your main source code. The logic is that the .c source file contains all of the code and the header file contains the function prototypes, that is, just a declaration of which functions can be found in the source file.
This is also done for libraries that are provided by others, sometimes only as compiled binary "blobs" (i.e. you can't look at the source code). Pairing them with plain-text header files allows you to see what functions are defined, what arguments they take, and what they return.
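To make the split concrete, here is a minimal single-file sketch of the idea. The names vec_mean and mean_of_three are hypothetical and are not part of any code in this section. The comments mark which part would normally live in a header (say stats.h), which in the calling program, and which in the matching source file (stats.c). The point is that the caller compiles with only the prototype in view.

```c
/* --- would live in a hypothetical header, stats.h --- */
double vec_mean(const double vec[], int n);   /* prototype only, no body */

/* --- would live in the calling program --- */
double mean_of_three(double a, double b, double c) {
  double v[3] = {a, b, c};
  return vec_mean(v, 3);   /* allowed: the prototype above has been seen */
}

/* --- would live in the matching source file, stats.c --- */
double vec_mean(const double vec[], int n) {
  double sum = 0.0;
  int i;
  if (n <= 0) return 0.0;  /* assumption: treat the mean of nothing as 0 */
  for (i = 0; i < n; i++) {
    sum += vec[i];
  }
  return sum / n;
}
```

If the prototype line were removed, the call inside mean_of_three would refer to a function the compiler has not yet seen, which is exactly the situation a header file prevents.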
An example
Here is a program that computes the preferred direction of a neuron recorded in primary motor cortex of rhesus macaques, during a whole-arm reaching task (e.g. from Gribble & Scott, 2002). The monkey moved his hand from a central start target to one of 8 peripheral targets around the circumference of a circle. Movements to each of the 8 targets were repeated 5 times for a total of 40 movements. The order of target directions was randomized.
The data for each neuron are represented by two sets of values: first, an array of 40 values that indicate, for each movement, the direction of the movement (specifically, the direction, in radians, of the velocity vector at peak hand velocity), and second, another array of 40 values that indicate, for each movement, the mean number of spikes per second averaged over a window starting 150 ms before movement onset and ending at peak hand velocity.
For each neuron, the goal is to compute a preferred direction, that is, the direction of movement for which the neuron fires most enthusiastically (most spikes per second). The details of how these computations are done are not so important to consider here, but imagine for the moment that we have already written (or someone has provided us with) a C function called compute_PD() that performs this calculation, and that the code is stored in a file called neuron.c, with a header file neuron.h.
Here is our main program which is called go.c:
// gcc -std=c99 -o go go.c neuron.c -lm

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "neuron.h"

int main(int argc, char *argv[]) {

  int ncells = 100;                   // # cells to process
  char fnum[4], fname[128];           // filename strings
  double celldirs[40], cellspks[40];  // data for each cell
  double PD[ncells], plate_out[9];    // store cell PDs

  // loop to process each cell
  int i;
  for (i=1; i<=ncells; i++) {

    // construct the numeric part of the filename
    if (i<10) sprintf(fnum, "00%d", i);
    else if (i<100) sprintf(fnum, "0%d", i);
    else sprintf(fnum, "%d", i);

    // read in a dirs data file
    sprintf(fname, "data/cell_dirs_%s.txt", fnum);
    readcell(fname, celldirs, 40, 0);

    // read in a spks data file
    sprintf(fname, "data/cell_spks_%s.txt", fnum);
    readcell(fname, cellspks, 40, 0);

    // compute PD
    PD[i-1] = compute_PD(celldirs, cellspks, 40);
  }

  // print vector of PDs to screen and write to a file
  show_double_vec(PD, ncells);
  write_double_vec(PD, ncells, "PDs.txt");

  return 0;
}
What we can see is that the last #include statement (line 6) includes the neuron.h header file (shown below). This is essentially a way of telling the compiler that these functions (declared in neuron.h) will be described fully later, and that for now it should "trust" that they are defined, and in this particular way (inputs and outputs).
On line 1 (commented out) I show the gcc command to compile this program, and you can see that we simply add neuron.c to the list of files to send to the compiler. This is where the compiler actually integrates the code in neuron.c into our program. We also need the -lm flag, which instructs the linker to link in the standard C math library (the code uses math functions, which is why the #include <math.h> directive appears in neuron.h).
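As an aside, the three-way if/else in go.c that pads the cell number with zeros can also be written with a single format string, since the %03d conversion pads a number with leading zeros to at least three digits. The helper cell_filename below is a hypothetical name used for illustration and is not part of neuron.c. It uses snprintf rather than sprintf so that the size of the buffer is respected.

```c
#include <stdio.h>

/* Build "data/cell_<kind>_NNN.txt" into buf, where NNN is i padded with
   leading zeros to at least three digits. Returns what snprintf returns:
   the length of the full name, not counting the terminating '\0'. */
int cell_filename(char *buf, size_t len, const char *kind, int i) {
  return snprintf(buf, len, "data/cell_%s_%03d.txt", kind, i);
}
```

Inside the loop, a call such as cell_filename(fname, sizeof fname, "dirs", i) would then take the place of the separate fnum and fname steps.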
Here is the header file neuron.h:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.1415926535897932384626433832795
#endif

void show_int_vec(int vec[], int n);
void show_double_vec(double vec[], int n);
void write_double_vec(double vec[], int n, char fname[]);
void showmat(double mat[], int nrows, int ncols);
void readcell(char fname[], double data[], int n, int msg);
void oneplate(double r1, double r2, double a1, double a2, double output[6]);
void platemethod(double a[], double r[], int n, double output[9]);
int mycomp(const void *a, const void *b);
double getangle(double x, double y);
int mycomp_struct(const void *a, const void *b);
double compute_PDr(double celldirs[], double cellspks[], int ntrials, int ndirs);
double compute_PD(double celldirs[], double cellspks[], int ntrials);
What you can see is that this is simply a listing of the functions that are defined in neuron.c: we list only the function prototypes, not the "meat" of the functions themselves. When we write the #include "neuron.h" statement at the top of go.c (line 6), these function prototypes are loaded in, so that the code in the main program "knows about" these functions. Note that there are some functions in here that we do not use in the go.c code above.
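One detail that the listing of neuron.h above leaves out is protection against the header being included twice. A common C convention, sketched here on a hypothetical header point.h (not part of the neuron code), is to wrap the whole header in an #ifndef / #define / #endif "include guard", the same mechanism neuron.h already uses for M_PI. To keep the example in one file, the header text is simply pasted twice, which is what two #include lines would amount to. The second copy is skipped because POINT_H is already defined.

```c
/* ---- first copy of the hypothetical point.h ---- */
#ifndef POINT_H
#define POINT_H
typedef struct { double x, y; } point;
double point_norm2(point p);   /* squared distance from the origin */
#endif

/* ---- second copy: skipped, because POINT_H is now defined ---- */
#ifndef POINT_H
#define POINT_H
typedef struct { double x, y; } point;
double point_norm2(point p);
#endif

/* ---- would live in point.c ---- */
double point_norm2(point p) {
  return p.x * p.x + p.y * p.y;
}
```

Without the guard lines, the second typedef would be rejected by the compiler as a conflicting redefinition of point.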
An important point to remember is that as long as we can look at the header file, we have all the information about what functions are defined in the source file, and how they are used (what their input arguments are and what values, if any, they return). If we want to look "inside" those functions, we can look at the source file. The header file can then be thought of as an interface between the source code and the programmer.
To summarize then, in order to include external code in your C program (code that is located in a separate file), you need to make sure two things happen:
- The external C source code is passed to the compiler at compile time
- Your own C program has access to the function prototypes associated with the external code
Another example
Another small example: let's say you want to write a program that takes at the command-line one integer input, and determines whether that integer is a prime number or not. Now let's say that you don't want to write your own code for determining primality, so you ask your friend, who you know has written such a function already. She sends you a pair of files (primes.h and primes.c). Her header file (primes.h) looks like this:
int isPrime(int n); // returns 0 if n is not prime, 1 if n is prime
So we know a couple of things from this header file. It declares a function prototype for isPrime(). We can see this function takes a single integer as an input argument, and returns an integer value: 1 if the input value is a prime number, and 0 if it is not. Now we know all we need to know in order to use this function (without even looking at the function's source code, which resides in primes.c).
Here is what the primes.c file looks like:
int isPrime(int n) {
  // returns 0 if not prime, 1 if prime
  if (n<2) return 0;         // first prime number is 2
  if (n==2) return 1;        // ensure 2 is identified as a prime
  if ((n % 2)==0) return 0;  // all even numbers above 2 are not prime
  int i;
  // test divisibility up to and including sqrt(n); the bound must be
  // <= so that squares of primes (9, 25, 49, ...) are caught
  for (i=3; i*i <= n; i++) {
    if ((n % i) == 0) {
      return 0;
    }
  }
  return 1;
}
Here is what our program go.c looks like:
/* go.c
   Takes one input argument from the command line, an integer,
   and prints 1 if the number is prime, and 0 if it is not.

   Compile with: gcc -o go go.c primes.c
*/

#include <stdio.h>
#include <stdlib.h>
#include "primes.h"

int main(int argc, char *argv[]) {
  if (argc < 2) {
    printf("error: must provide a single integer value to test\n");
    return 1;
  }
  else {
    int n = atoi(argv[1]);
    int prime = isPrime(n);
    printf("isPrime(%d) = %d\n", n, prime);
    return 0;
  }
}
Here is the result of running the program on a small selection of input values:
plg@wildebeest:~/Desktop$ gcc -o go go.c primes.c
plg@wildebeest:~/Desktop$ ./go 1
isPrime(1) = 0
plg@wildebeest:~/Desktop$ ./go 2
isPrime(2) = 1
plg@wildebeest:~/Desktop$ ./go 3
isPrime(3) = 1
plg@wildebeest:~/Desktop$ ./go 63
isPrime(63) = 0
plg@wildebeest:~/Desktop$ ./go 67
isPrime(67) = 1
plg@wildebeest:~/Desktop$ ./go 12347
isPrime(12347) = 1
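The inputs tried above do not include the square of an odd prime, such as 9, 25 or 49, and those are the cases where the loop bound of a trial-division test matters: the divisor equal to sqrt(n) itself has to be tested. The sketch below is an independent variant under a hypothetical name, is_prime_trial, and is not your friend's primes.c. It tests only odd divisors, and writes the bound as i <= n / i, which for positive integers is equivalent to i*i <= n but cannot overflow.

```c
/* Returns 1 if n is prime, 0 otherwise (trial division by odd numbers). */
int is_prime_trial(int n) {
  int i;
  if (n < 2) return 0;            /* 0, 1 and negatives are not prime */
  if (n % 2 == 0) return n == 2;  /* 2 is the only even prime */
  for (i = 3; i <= n / i; i += 2) {
    if (n % i == 0) return 0;     /* found a divisor, so n is composite */
  }
  return 1;                       /* no divisor up to sqrt(n) */
}
```

Whenever a primality function arrives from somewhere else, values such as 9, 25 and 49 make a quick and useful sanity check.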
Search path
The compiler will look in several places for header files that you include with the #include directive, depending on how you use it. If you use #include with angle brackets (e.g. #include <stdio.h>) then the compiler will look in a series of "default" system-wide locations (see the Search Path section of the GCC documentation for details). If you use double quotes (e.g. #include "neuron.h") then the compiler will look first in the directory containing the current file, and then in the same locations as for the angle-bracket form. It's possible to add other directories to the search path by using the -Idir compiler option, where dir is the other directory. You might have to do this if you use an external C library that is not part of the standard C library, and whose header files do not reside in the usual "system" default locations.
External variables
Just as we can include external code in our C programs, we can declare in our C program that a variable exists and has been defined elsewhere (e.g. in some other source file). This is done using the extern keyword.
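As a rough sketch of how this looks in practice (the names call_count and record_call are hypothetical): the extern line only declares the variable, and exactly one place defines it. Everything is shown in a single file here so that it compiles as is. In a multi-file program the extern declaration would typically sit in a header, and the definition in one source file.

```c
/* Declaration: says call_count exists somewhere; reserves no storage.
   In a multi-file program this line would typically live in a header. */
extern int call_count;

/* Code that uses the variable needs only the declaration above. */
void record_call(void) {
  call_count++;
}

/* Definition: reserves the storage and sets the initial value. In a
   multi-file program this would appear in exactly one .c file. */
int call_count = 0;
```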
The GNU make utility and Makefiles
There is a UNIX tool called make that is commonly used to compile C programs that are made up of several files, and that (sometimes) involve several compilation steps. There is a lot of power in the make tool, but what I want to introduce here is a simple use of it, which lets you avoid having to remember a long, complicated compile command (such as the gcc command on the first line of the sample session for the prime number program above). The make utility uses a special plain-text file that you write, which resides in the same directory as your program and is, by default, called Makefile. You can think of a Makefile as a recipe for making your program (i.e. compiling and linking).
A simple Makefile for our prime number program above might look like this:
go: go.c primes.c
	gcc -o go go.c primes.c
The first word and colon (go:) on line 1 give the name of a rule, or recipe, called go; make calls this name the target. The list of files after the colon (go.c primes.c) represents all of the things that go depends upon. On the next line, there is a TAB (not spaces) followed by a compile command. This is the step required to "make" the "go" recipe (there could be more lines if more steps were needed). Here we have simply put our compile command.
Now all we have to do from the command-line is type make, and make will "make" the recipe for "go". (Running make with no arguments executes the first rule (recipe) in the Makefile.) The make program knows that the "go" rule needs to be executed if any of the files that it depends upon (those that follow the colon on line 1) has changed since go was last built.
Here is what it looks like when we run make using Makefile1.txt:
plg@wildebeest:~/Desktop/CBootCamp/code/primes$ cp Makefile1.txt Makefile
plg@wildebeest:~/Desktop/CBootCamp/code/primes$ make
gcc -o go go.c primes.c
You can see the command (line 3) that ends up being executed by make.
Introducing more generalization to the Makefile
There are a number of features of a Makefile we can use to make the whole idea more useful. We can introduce "macros" (which work like variables) to generalize the name of the C compiler to use, the flags to pass to the compiler, the location of any library files, and so on. Here is what a more generalized Makefile might look like for our primes example from above:
CC = gcc
CFLAGS = -Wall
DEPS = primes.h
OBJ = go.o primes.o

%.o: %.c $(DEPS)
	$(CC) $(CFLAGS) -c -o $@ $<

go: $(OBJ)
	$(CC) $(CFLAGS) -o $@ $^
You can see we have moved all the specific details (filenames, compiler flags, etc) into the macros at the top, and what remains below, in the rules themselves, is expressed only in terms of those macros. In the rules, $@ stands for the target being made, $< for the first dependency, and $^ for all of the dependencies. There's nothing wrong with using a simple Makefile (as in Makefile1.txt); it is your choice how fancy to get. The one limitation of Makefile1.txt is that the header file primes.h doesn't appear anywhere, which means that if that file changes, make will not think it has to recompile anything (because nothing in the rule "go" depends on primes.h). In Makefile2.txt, we make the .o files depend (on line 6) on $(DEPS), which is defined above on line 3 and includes the header file primes.h.
Note that we have a term in the CFLAGS macro that looks like this: -Wall. This is a flag that tells the compiler to turn on a large set of commonly useful warnings (despite the name, not every warning the compiler offers). There are many things that the compiler can warn you about, like variables that are never used, uninitialized variables, etc. Consult the documentation for full details.
Here is what it looks like when we run make using Makefile2.txt:
plg@wildebeest:~/Desktop/CBootCamp/code/primes$ cp Makefile2.txt Makefile
plg@wildebeest:~/Desktop/CBootCamp/code/primes$ make
gcc -c -o go.o go.c
gcc -c -o primes.o primes.c
gcc -o go go.o primes.o
You can see that in this case, three commands end up being run (lines 3-5).
Now the neat thing is, if we type make again (without changing anything) we get this:
plg@wildebeest:~/Desktop/CBootCamp/code/primes$ make
make: `go' is up to date.
We are told that the "go" rule is "up to date". The make program checks which files have changed since the target was last built, and executes only those steps in the rule(s), if any, that need to be done (it figures this out based on the dependencies that you set up in the rules).
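The decision that make takes for each rule can be sketched in a few lines of C. This is a simplification for illustration, not make's actual source: needs_rebuild is a hypothetical function, and modification times are passed in as plain numbers rather than read from the file system. A target is rebuilt if it does not exist yet, or if any file it depends on was modified more recently than the target.

```c
/* Returns 1 if the target should be rebuilt, 0 if it is up to date.
   target_exists: 0 if the target file is missing, nonzero otherwise
   target_mtime:  modification time of the target (ignored if missing)
   dep_mtimes:    modification times of the n files it depends on */
int needs_rebuild(int target_exists, long target_mtime,
                  const long dep_mtimes[], int n) {
  int i;
  if (!target_exists) return 1;          /* nothing has been built yet */
  for (i = 0; i < n; i++) {
    if (dep_mtimes[i] > target_mtime) {  /* this dependency is newer */
      return 1;
    }
  }
  return 0;                              /* no dependency is newer */
}
```

This is why editing primes.c triggers a recompile on the next make, while running make twice in a row does nothing the second time.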
In the long run, using Makefiles is a good idea, because:
- it's faster to recompile things (less typing, and it only recompiles based on what's changed and leaves the rest)
- it organizes all the "steps" in a (potentially complex) compilation into one place (the Makefile), which makes it easier for other people to compile your code
See the links below for more details and more examples of how GNU make can be used to your advantage.