4. Complex Data Types
We have seen numeric data types, characters and strings so far. We can also collect several single values together in complex data types to keep track of collections of things. These can be ordered or unordered.
Ordered types are typically things like lists, arrays, matrices and vectors: collections in which the order of the elements matters, and in which you can access a particular element by its position in the sequence.
Unordered collections have no inherent order. You access a particular element by its name rather than by its position.
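To make the distinction concrete, here is a minimal Python sketch. The variable names and values are made up for illustration. It shows a list accessed by position, a dictionary accessed by name (key), and a tuple, which is ordered like a list but cannot be changed once created.

```python
# Ordered collection: a list, accessed by position (indices start at 0).
reaction_times = [0.52, 0.48, 0.61]
first = reaction_times[0]    # element at position 0
last = reaction_times[-1]    # negative indices count from the end

# Collection of named elements: a dictionary, accessed by key.
subject = {"id": "s01", "age": 24, "handedness": "right"}
age = subject["age"]         # look up a value by its name

# A tuple is ordered like a list but cannot be modified after creation.
position = (12.5, -3.0)
x, y = position              # unpack the two elements by position
```

Note that recent versions of Python (3.7 and later) remember the order in which dictionary keys were inserted, but elements are still looked up by key, not by numeric position.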
1 Ordered (sequential) data types
- Python list and tuple
- NumPy array vs matrix
- MATLAB array and matrix
- R vector, matrix and data frame
- C array
2 Unordered collections of named elements
- Python dictionary
- MATLAB structure
- R list
- C struct
3 Using complex data types
Complex data types such as lists, arrays and matrices are useful in a number of situations:
- manipulating entire chunks of data as a unit, e.g. matrix algebra
- repeated operations on a set of data, where loading the data into memory once is faster than reading each item from a file over and over again
- plotting data
- doing statistics on data
- interactively exploring data
A typical workflow with scientific data is something like the following:
- load data from files into memory, in the form of variables that are complex data types (lists, arrays, matrices, data frames, etc)
- apply pre-processing steps to your data (smoothing, averaging, grouping, etc)
- interactively explore your data by plotting things, rearranging items, taking averages, doing statistics, etc
- perhaps save some intermediate representation of your data so that next time you don't have to load things up from scratch
- decide on a set of canonical analyses to apply to your data for your current project (e.g. writing a paper)
- decide on a set of canonical plots to generate for your project (e.g. your paper)
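The first few steps of this workflow can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a prescribed pipeline: the data, the column name `reaction_time`, the helper functions `load_data` and `smooth`, and the cache file name are all hypothetical.

```python
import csv
import io
import json
import os
import statistics
import tempfile

# Hypothetical raw data; in practice this would be read from a file on disk.
RAW_CSV = """trial,reaction_time
1,0.52
2,0.48
3,0.61
4,0.55
5,0.47
"""

def load_data(text):
    """Parse CSV text into a list of floats, one reaction time per trial."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["reaction_time"]) for row in reader]

def smooth(values, window=3):
    """Pre-process with a simple moving average over `window` samples."""
    return [
        statistics.mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]

# 1. Load the data into memory as a list.
times = load_data(RAW_CSV)

# 2. Apply a pre-processing step.
smoothed = smooth(times)

# 3. Explore: compute a few summary statistics, stored in a dictionary.
summary = {
    "n": len(times),
    "mean": statistics.mean(times),
    "stdev": statistics.stdev(times),
}

# 4. Save an intermediate representation so it need not be rebuilt next time.
cache_path = os.path.join(tempfile.gettempdir(), "reaction_times_cache.json")
with open(cache_path, "w") as f:
    json.dump({"times": times, "smoothed": smoothed}, f)

# A later session can reload the cache instead of re-parsing the raw data.
with open(cache_path) as f:
    cached = json.load(f)
```

The list holds the ordered measurements, while the dictionaries hold named quantities, which is the same ordered versus named distinction described earlier.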
The choice of how to represent your data (i.e. what kind of data type to use) can have consequences for how easy or difficult it is to perform certain operations.
For example, if your data are time series, such as (x,y) hand position over time for some number of trials (movements), then storing each trial as a matrix is probably a sensible choice: n rows by 2 columns, where each row is a time point and the two columns are the x and y positions.
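As a rough sketch of this representation, assuming NumPy is installed, one trial could be stored as an n-by-2 array. The numbers below are made up, and the path-length calculation is just one example of an operation that this layout makes easy.

```python
import numpy as np

# Made-up data: one trial with 5 time points (rows) and 2 columns (x, y).
trial = np.array([
    [0.0, 0.0],
    [1.0, 0.5],
    [2.0, 1.5],
    [3.0, 3.0],
    [4.0, 5.0],
])

n_samples, n_dims = trial.shape      # number of rows and columns
x = trial[:, 0]                      # all rows, first column: x over time
y = trial[:, 1]                      # all rows, second column: y over time
start = trial[0]                     # first row: (x, y) at the first time point
mean_position = trial.mean(axis=0)   # average x and average y over time

# Several trials can be kept in a list, one matrix per trial.
# The second trial here is simply the first one shifted by 1 in x and y.
trials = [trial, trial + 1.0]

# Path length of each trial: sum of distances between successive samples.
path_lengths = [
    np.sum(np.linalg.norm(np.diff(t, axis=0), axis=1))
    for t in trials
]
```

Keeping each trial as its own matrix means trials of different durations can have different numbers of rows, while column slicing such as `trial[:, 0]` works the same way on every trial.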
In class we will live demo the different kinds of complex data types you will find most useful, and we will go through some exercises that are designed to get you comfortable with manipulating data.
4 Readings
- have a look through the NumPy Tutorial on arrays
- look at chapter 9 (lists) and chapter 12 (dictionaries) of HTTLACS
- Mark Daley runs an undergraduate computer science course for non-CS biologists; you can look at his notes for Class #8 (lists), Class #9 (Arrays) and Class #10 (Tuples)
- MATLAB users can look through this tutorial on Matrix Indexing in MATLAB
- R Data Types
- Complex Data Types in C
5 Exercises
- exercises 12, 13, 14 and 15 all make use of complex data types.
6 Assignment
- assignment 2 on sorting is due no later than Friday Oct 17, 23:59:59 EDT. For your reference, here is the US Naval Observatory Master Clock Time.