3. Basic Types, Operators and Expressions
Variables
Like in other high-level programming languages, in C, we can assign symbolic names, known as variables, for storing information in memory. Then we can refer to those pieces of information in memory by using the symbolic variable name, instead of having to use the raw address in memory. Variables can be used to store floating-point numbers, characters, and even pointers to other locations in memory.
Variable names
There are some restrictions on the names of variables in C. Names are made up of letters and digits, but the first character must be a letter (not a digit). The underscore "_" counts as a letter. Also remember that C is case-sensitive: uppercase and lowercase letters are distinct, and so Age is distinct from age.
Here is a list of reserved keywords in C that cannot be used as variable names:
_Bool | default | if | sizeof | while |
_Complex | do | inline | static | |
_Imaginary | double | int | struct | |
auto | else | long | switch | |
break | enum | register | typedef | |
case | extern | restrict | union | |
char | float | return | unsigned | |
const | for | short | void | |
continue | goto | signed | volatile | |
Data Types and Sizes
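C has four basic data types. The table below lists each type, its meaning, and its size on my machine (a MacBook Pro 15-inch, Mid 2010). Apart from char, which is always 1 byte, sizes can differ between platforms: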
Type | Meaning | Size (bytes) | Size (bits) |
---|---|---|---|
char | a single byte, capable of holding one character | 1 byte | 8 bits |
int | an integer | 4 bytes | 32 bits |
float | single-precision floating point number | 4 bytes | 32 bits |
double | double-precision floating point number | 8 bytes | 64 bits |
Binary representation
There are also qualifiers short, long, signed and unsigned, that can be applied to these basic types.
Qualifier | Size (bytes) | Size (bits) |
---|---|---|
short int | 2 bytes | 16 bits |
long int | 8 bytes | 64 bits |
long double | 16 bytes | 128 bits |
We have been talking about variable types and how many bytes they take up in memory. One byte is made up of 8 bits, and one bit can take on two possible values: 0 or 1. An unsigned 8-bit variable can take on values between 0 and \(2^{8}-1 = 255\). A signed 8-bit variable can take on values between -128 and +127 (on the two's-complement machines in common use).
So when a variable is signed, it can take on negative values: half of its total range lies below zero, and the other half covers zero and the positive values.
A signed int (4 bytes or 32 bits) can take on values between -2,147,483,648 and +2,147,483,647. If we want to be able to represent integers larger than +2,147,483,647 then we can either use more bits (e.g. by using a long int), or force all 32 bits of our int to be used on the non-negative side of zero. An unsigned int (4 bytes or 32 bits) can take on values between 0 and 4,294,967,295.
How many bytes on your machine?
Here is a small C program that will print out the size of some basic C types on your machine. Enter it into your source code editor, and save it to a file called my_types.c:

    #include <stdio.h>

    int main(int argc, char *argv[]) {
      // sizeof yields a value of type size_t, which is printed with %zu
      printf("a char is %zu bytes\n", sizeof(char));
      printf("an int is %zu bytes\n", sizeof(int));
      printf("a float is %zu bytes\n", sizeof(float));
      printf("a double is %zu bytes\n", sizeof(double));
      printf("a short int is %zu bytes\n", sizeof(short int));
      printf("a long int is %zu bytes\n", sizeof(long int));
      printf("a long double is %zu bytes\n", sizeof(long double));
      return 0;
    }

Then compile and run using the following commands:

    gcc -o my_types my_types.c
    ./my_types

and you should see something like the following output:

    a char is 1 bytes
    an int is 4 bytes
    a float is 4 bytes
    a double is 8 bytes
    a short int is 2 bytes
    a long int is 8 bytes
    a long double is 16 bytes
Constants
Constants are values that do not change after they have been defined.
Numeric Constants
An example of an int constant is the number 1234. Examples of floating-point constants (by default typed as a double) are 123.4 and 1e-2. We can write numbers in octal or hexadecimal instead of decimal: octal by using a leading zero (0) and hexadecimal by using a leading zero-x (0x). Decimal 31 can be written as 037 in octal and 0x1f or 0X1F in hexadecimal. Here are some examples of defining numeric constants:
    int year = 1984;       // integer constant 1984
    int octalYear = 03700; // 1984 in octal
    int hexYear = 0x7c0;   // 1984 in hexadecimal
Here is some code to show how to print integers in various representations. Type it into your source code editor, and save it as numerics.c.

    #include <stdio.h>

    int main() {
      printf("1984 in decimal is %d\n", 1984);
      printf("1984 in octal is 0%o\n", 1984);
      printf("1984 in hexadecimal is 0x%x\n", 1984);
      printf("0123 is octal for %d\n", 0123);
      printf("0x12f is hexadecimal for %d\n", 0x12f);
      return 0;
    }

    gcc -o numerics numerics.c
    ./numerics
    1984 in decimal is 1984
    1984 in octal is 03700
    1984 in hexadecimal is 0x7c0
    0123 is octal for 83
    0x12f is hexadecimal for 303
Character Constants
A character constant is written between single quotes, for example, 'x'. Characters in C are represented using integer values, from the ASCII character set. ASCII codes range between 0 and 127 (a char has room for values up to 255, but codes above 127 are not part of ASCII). The upper-case alphabet starts at 65 (A) and ends at 90 (Z); the lowercase alphabet starts at 97 (a) and ends at 122 (z). Other symbols such as (, !, tab, carriage return, etc, are also represented in the ASCII table. See ASCII (wikipedia) and AsciiTable for the mapping between characters and integer ASCII codes.
An important character constant to know about is the constant '\0' which represents the character with value zero, usually called the null character (NUL). We will see later when we talk about string handling in C that '\0' is used to terminate variable-length strings.
String Constants
String constants can be specified using a sequence of zero or more characters enclosed within double quotes, e.g. "C is fun". A string constant is technically an array of characters that is terminated by a null character '\0' at the end. This means that the storage required to represent a string of length n is actually n+1. Thus we can store strings of arbitrary length in memory as long as they are terminated by a null character (so we know when they stop). We will talk about arrays later.
Enumeration Constants
An enumeration is a list of constant integer values, each of which you can give a label of your choosing. Enumerations provide a convenient way to associate constant values with names. For example you could store the months of the year like this:
enum months { JAN=1, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC };
Now you have defined a new enumerated data type called enum months. A variable of type enum months is meant to hold one of the values defined above. You can use the symbolic names (e.g. JAN) in place of their integer counterparts, for example like this:

    enum months the_month;
    ...
    if (the_month == JAN) {
      printf("it's January\n");
    }
Why not just use strings to represent months? One reason is that in C strings are slightly clunky to work with, especially compared to interpreted languages like Python, R, etc. Comparing two strings in C is not as easy as typing if (the_month == "JAN") … it requires a call to a function in string.h called strcmp().
Another reason is that because enum data types are represented as integers, you can do integer operations (comparisons, arithmetic, etc) on them… so for example you could do something clever like:
    if ((the_month > APR) && (the_month < SEP)) {
      printf("it's summer!\n");
    }
Declarations
Unlike in languages like Python, R, Octave/Matlab, etc, which are dynamically typed languages, the C language is a statically typed language. From a practical point of view, this means in C, we have to declare, up front, the type of every variable we use. In languages like Python we can do crazy stuff like this:
    a = 123.456
    b = 50.2
    c = 100.0
    d = [a, b, c]
    print(a, b, c, d)

    123.456 50.2 100.0 [123.456, 50.2, 100.0]
The Python interpreter will figure out what type to assign to a, b, c and d based on evaluating the right-hand side of each assignment. In C, we have to explicitly declare the type of each variable like this:
    #include <stdio.h>

    int main() {
      double a = 123.456;
      double b = 50.2;
      double c = 100.0;
      double d[] = {a, b, c};
      printf("a=%.3f, b=%.3f, c=%.3f, d=[%.3f, %.3f, %.3f]\n",
             a, b, c, d[0], d[1], d[2]);
      return 0;
    }

    a=123.456, b=50.200, c=100.000, d=[123.456, 50.200, 100.000]
We haven't talked about arrays yet but we will later in the tutorial.
Expressions
Like in any other programming language, in C, there are a number of arithmetic, relational and logical operators we can use to write expressions that combine values of the basic types.
Arithmetic Operators
The following binary arithmetic operators can be used in C: +, -, *, / and the modulus operator %. When writing arithmetic expressions we must always be aware of operator precedence, which is the order in which operators are applied when evaluating an expression.
For example 4+5*6 evaluates to 34, because the * operator has precedence over the + operator, and so the expression is evaluated as 4 + (5*6), not (4+5)*6. My own strategy to deal with this is to always use brackets to explicitly denote desired precedence in arithmetic expressions. So instead of writing:
double q = a*x*x+b*x+c;
which is a perfectly accurate way of coding the quadratic expression:
\begin{equation} ax^{2} + bx + c \end{equation}
I would rather code it like this:
double q = (a*x*x) + (b*x) + c;
Another illustration of operator precedence: What are the values of the result1, result2 and result3 variables in the following code?
    #include <stdio.h>

    int main() {
      int a=100, b=2, c=25, d=4;
      int result1, result2, result3;
      result1 = a * b + c * d;
      result2 = (a * b) + (c * d);
      result3 = a * (b + c) * d;
      printf("result1=%d, result2=%d, result3=%d\n", result1, result2, result3);
      return 0;
    }
Always using brackets will avoid cases where operator precedence messes up your calculations. These errors are very hard to debug.
Wikipedia provides a chart showing operator precedence.
Relational and Logical Operators
The relational operators are >, >=, < and <=, which all have equal precedence. There are also two equality operators: == and !=.
A very common gotcha in C programming is to erroneously use the assignment operator = when you mean to use the equality operator ==, for example:

    if (grade = 49) grade = grade + 1;  // INCORRECT !!!
    if (grade == 49) grade = grade + 1; // CORRECT

In line 1 above, the expression grade=49 doesn't test for the equality of the variable grade and the constant 49, it assigns the value 49 to the variable grade. What we really want is in line 2 where we use the equality operator == to test if grade==49. This bug is a tough one to spot when it happens.
There are two binary logical operators, && (logical AND) and || (logical OR). The unary operator ! (logical NOT) negates its operand.
By default in C, the results of relational and logical operators are evaluated to integer values: 0 for FALSE and 1 for TRUE.
Increment and Decrement Operators
You may come across two unusual-looking operators that may be used as a shorthand for incrementing and decrementing variables. The ++ and -- operators add 1 and subtract 1, respectively, from their operands. For example in the following code snippet, we increment the int variable a and we decrement the int variable b:

    #include <stdio.h>

    int main(int argc, char *argv[]) {
      int a = 0;
      int b = 0;
      printf("a=%d, b=%d\n", a, b);
      a++;
      b--;
      printf("a=%d, b=%d\n", a, b);
      return 0;
    }

    a=0, b=0
    a=1, b=-1
A note of caution: you can also use these two operators in a different way, by putting the operator before the operand, e.g. ++a and --b. When the operator is used before the operand this is called a prefix operator, and when it is used after the operand it is called a postfix operator. When using ++ and -- as a prefix operator, the increment (or decrement) happens before its value is used. As postfix operators, the increment (or decrement) happens after its value has been used. Here is a concrete example:
    #include <stdio.h>

    int main(int argc, char *argv[]) {

      int n, x;

      n = 3;
      x = 0;
      printf("n=%d, x=%d\n", n, x);
      x = n++;
      printf("n=%d, x=%d\n\n", n, x);

      n = 3;
      x = 0;
      printf("n=%d, x=%d\n", n, x);
      x = ++n;
      printf("n=%d, x=%d\n", n, x);

      return 0;
    }

    n=3, x=0
    n=4, x=3

    n=3, x=0
    n=4, x=4
In the first half of the program (x = n++; lines 7 to 11), x is set to 3 (the value of n), and then n is incremented by 1. In the second half (x = ++n; lines 13 to 17), n is incremented first and becomes 4, and then x is set to the resulting value (also 4).
If you think this is all a bit unnecessarily confusing, then you agree with me. I typically don't use these operators because of the risk of misusing them, and so when I want to increment or decrement a value by 1, I just write it out explicitly:

    int x = 0;
    x = x + 1;
Type Conversions
There are two kinds of type conversion we need to talk about: automatic or implicit type conversion and explicit type conversion.
Implicit Type Conversion
The operators we have looked at can deal with different types. For example we can apply the addition operator + to an int as well as a double. It is important to understand how operators deal with different types that appear in the same expression. There are rules in C that govern how operators convert different types, to evaluate the results of expressions.
For example, when a floating-point number is assigned to an integer variable in C, the fractional part of the number gets truncated. On the other hand, when an integer value is assigned to a floating-point variable, the fractional part is taken to be .0.
This sort of implicit or automatic conversion can produce nasty bugs that are difficult to find, especially for example when performing multiplication or division using mixed types, e.g. integer and floating-point values. Here is some example code illustrating some of these effects:
    #include <stdio.h>

    int main() {
      int a = 2;
      double b = 3.5;
      double c = a * b;
      double d = a / b;
      int e = a * b;
      int f = a / b;
      printf("a=%d, b=%.3f, c=%.3f, d=%.3f, e=%d, f=%d\n", a, b, c, d, e, f);
      return 0;
    }
Explicit Type Conversion
Type Casting
There is a mechanism in C to perform type casting, that is to force an expression to be converted to a particular type of our choosing. We surround the desired type in brackets and place that just before the expression to be coerced. Look at the following example code:
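    #include <stdio.h>

    int main() {
      int a = 2;
      int b = 3;
      // both operands are ints, so this is integer division
      printf("a / b = %d\n", a/b);
      // the cast converts a to double first, so the division is floating-point
      printf("a / b = %.3f\n", (double) a/b);
      return 0;
    }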
String Conversion Library Functions
There are some built-in library functions in C to perform some basic conversions between strings and numeric types. Two useful functions to know about convert ASCII strings to numeric types: atoi() (ascii to integer) and atof() (ascii to floating-point). We need to #include the header stdlib.h in order to use these functions.
To convert from numeric types to strings things are a bit more difficult. First we have to allocate space in memory to store the string. Then we use the sprintf() built-in function to "print" the numeric type into our string.
Here is some example code (typeConvert.c) illustrating conversion of strings to numerics, and vice-versa:

    #include <stdio.h>
    #include <stdlib.h>

    int main() {
      char intString[] = "1234";
      char floatString[] = "328.4";
      int myInt = atoi(intString);
      double myDouble = atof(floatString);
      printf("intString=%s, floatString=%s\n", intString, floatString);
      printf("myInt=%d, myDouble=%.1f\n\n", myInt, myDouble);

      int a = 2;
      double b = 3.14;
      char myString1[64], myString2[64];
      sprintf(myString1, "%d", a);
      sprintf(myString2, "%.2f", b);
      printf("a=%d, b=%.2f\n", a, b);
      printf("myString1=%s, myString2=%s\n", myString1, myString2);
      return 0;
    }

    intString=1234, floatString=328.4
    myInt=1234, myDouble=328.4

    a=2, b=3.14
    myString1=2, myString2=3.14
Defining your own type names using typedef
In C you can assign an alternate name to a data type, any name you want. The typedef statement allows you to do this.
For example we can use typedef to define a type called "Counter" which is an alternate name for an integer, like this:
typedef int Counter;
Now we can declare variables to be of type "Counter":
    typedef int Counter;
    Counter i, j, k;
Typedef isn't used particularly often in most basic C code, but you may come across it in applications requiring a high degree of portability. New types may be defined for basic variables and typedef may be used in header files to tailor the program to the target machine.
One place you may see typedef used more often is to simplify the declaration of compound types such as the struct type (which we will see later).
Exercises
- 1 Write a program that converts 27° from degrees Fahrenheit (F) to degrees Celsius (C) using the following formula, and write the result to the screen:
\begin{equation} C = (F - 32) \times \frac{5}{9} \end{equation}
- 2 Write a program that computes the (two) roots of the quadratic equation:
\begin{equation} a x^{2} + bx + c = 0 \end{equation}
where \(a=1.2\), \(b=2.3\) and \(c=-3.4\).
You can hard-code values of \(a\), \(b\) and \(c\) and then compute and print the two solutions for \(x\), to 5 decimal places. You can use WolframAlpha to check your arithmetic.