1. Introduction
Table of Contents
Our main plan for the first class will be to talk about the course content, and to get everyone set up on their laptop with an appropriate programming environment.
In our second class we will start in on the first batch of programming exercises which will illustrate getting values from the user, printing strings, doing arithmetic, and we will be introduced to loops.
Organization of the course
Typically we will start each topic with a short presentation by myself on the main concepts. We will spend the rest of the time (and in some cases the next class) working on programming exercises, hands-on.
Setting up your computing environment
I will be using unix-style tools, sometimes on my Mac and sometimes on a linux machine (or in a linux virtual machine). If you have a Mac and are running a relatively recent version of Mac OSX then you can achieve essentially the same functionality. We will talk about this in class. If you have a Windows laptop then your choices may be more limited.
If you want to use linux, you have essentially four choices about how you can set things up, assuming you are not already set up to use linux. In order of most committed to least committed:
- Wipe your existing operating system and install linux. There are several flavours (see the resources page)
- Repartition your hard drive and set up a dual-boot system. You can find instructions on the web, e.g. for Ubuntu, Windows Dual Boot and Dual Boot Mac OSX. Instructions for other linux flavours can be found easily on the web.
- Boot your laptop from an external drive that has a linux boot partition set up. Again, find instructions on the web for your favourite linux flavour.
- Run a linux virtual appliance in a virtualization application such as VirtualBox. Note that virtual appliances will take up space on your internal hard drive. If you need to, I suggest putting them on an external drive.
If you decide you don't want to use linux, but you do want to use Python + SciPy and all of the other python-based scientific packages, then I suggest installing the Anaconda distribution from Continuum Analytics. It's free and it provides an easy all-in-one Python + SciPy system.
If you have decided that you're using MATLAB, then it should not be a big deal to get it installed on your machine, whether you're running Mac, Windows or even linux.
Programming Languages
There are many programming languages that are used today for scientific computing, and it's likely that you will be exposed to code in many languages over the course of your career as a graduate student in the sciences.
Unless you decide that you would rather code everything you ever need on your own, from scratch, in one language, it's likely that you will be using programs, code snippets and libraries written by others, to accomplish the tasks you need to complete to do your research and data analysis. Life will be better for you if you're somewhat proficient, or at least familiar, with several programming languages.
In this course we will learn some basic programming paradigms and patterns that are not language specific. We will sometimes be looking at example code in different languages. This has the risk of being confusing, as different languages have different syntax, and the differences are sometimes subtle. The benefit however is that you will see that with few exceptions, most languages are essentially the same, at least in broad strokes, and differ only in their syntax, and in the names for various functions. There are some low-level differences between interpreted and compiled languages that we will talk about later, that make a big difference in terms of the speed that your code runs.
The languages we will be seeing in this course are:
- MATLAB: commerical, engineering-oriented numerical programming environment
- GNU Octave: free, open-source clone of Matlab
- Python: free, open-source high level programming language
- SciPy: a Python-based ecosystem for math, science & engineering,
that includes:
- NumPy: library for numerical computing with Python
- matplotlib: 2D plotting library for Python
- iPython: improved interface for interactive computing with Python
- SymPy: Python library for symbolic mathematics
- pandas: data structures and data analysis tools for Python
- R: free, open-source language for statistical computing & graphics
- C: a high level, general purpose compiled language
Other languages you may encounter in your travels as a graduate student and as a scientist are:
- C++: basically an object-oriented (OOP) version of C
- Objective C: another OOP version of C, used primarly by Apple to write Mac OSX and iOS applications
- Java: a general purpose OOP language
- JavaScript: like Java for running inside web browsers
- Ruby: another general purpose OOP language
- Fortran: a general purpose compiled language aimed at numerical and scientific computing
- Mathematica: a commercial environment for mathematical computation and visualization (popular with pure math types, not so much engineers)
- Sage: a free, open-source math system sort of aimed at being a Mathematica replacement
There are also some new languages that are gaining traction:
- julia: a high-level dynamic language aimed at numerical and technical computing. The love child of Python and C
- go: high-level OOP compiled language designed at Google, sort of a Googly version of C
- Swift: a new programming language from Apple for iOS and OS X
There are many other languages out there that are still in use but these are the ones you are likely to encounter in the scientific and numerical computing sphere.
For our purposes, we will be primarily seeing code in Python, MATLAB/Octave and R. Occasionally I will show you a C version of a program as an illustration of how much faster a compiled language can be compared to an interpreted language (Python, Matlab/Octave and R are all interpreted languages).
Am I supposed to learn \(n\) different languages in this course? (where \(n>1\))
No. For exercises and assignments you can use whatever language you like. I will be providing examples in Python, Matlab/Octave, R and occasionally C. My expectation is not that you learn 4 programming languages in this course. My expectation is that you learn to program in one language of your choosing, and that you at least gain exposure to what code looks like in other languages.
In doing so I hope that you'll learn at least two important lessons:
- you will see that mostly, all high level languages are basically the same, but with different syntax and different names for things
- you will become familiar with the range of languages and associated libraries, toolboxes and add-on modules that are available to you as a scientist
So what language should I use?
The answer to this question is to use the language that will be most beneficial for you personally going forward. This could depend on things like:
- what language(s) are in common use in your supervisor's lab right now?
- are there existing toolboxes or libraries for a certain language that you know in advance will be particularly useful to you in your research?
- Do you want to spend money? (MATLAB costs money and is a proprietary, closed source product; Python, R, Octave and C are free and open source) — although at present, Western has a site license for MATLAB, so it is available for you to install free of charge
- what language(s) do you already know, and are you interested in refining what you already know, and/or becoming proficient in another language?
In the absence of other constraints imposed upon you, I suggest using MATLAB for the course.
It could be that there is pressure on you from your supervisor or the other people in your lab to learn a particular language, e.g. if the rest of your lab already uses Python/SciPy, then perhaps it would be best for you to learn that. If this is the case, then it's probably a good idea to follow their advice.
If you are still unsure about which language to use, come and see me and we can make sure it's going to be a suitable plan. Also keep in mind that it's absolutely fine from my point of view for you to switch languages during the course. I won't hold you to a particular choice at the outset.
If you already have experience with the interpreted languages we will be looking at in this course and you would like to challenge yourself and learn some C, then reading through this C Programming Boot Camp might be useful to you. There are also tons of resources online, and books, about programming in C.
A Rough List of Topics
We will talk in our first class about what topics are of interest to the class this year. Here is a rough list of potential topics. We will start by learning some general principles of programming and then we will move on to some of the more useful techniques you might encounter for data analysis. We don't cover statistics per se in this course, that is saved for next term when I teach Introduction to Statistics Using R.
General Programming
- basic data types
- operators, expressions
- control flow (loops, conditionals)
- functions & modularity, variable scope
- complex data types
- input & output
- speeding up your code
- object-oriented programming OOP
Data Analysis Topics
- graphical displays of data
- signals & sampling
- fourier analysis & filtering
- numerical integration
- simulating dynamical systems
- optimization & gradient descent
- curve fitting
- resampling & bootstrapping
Other Topics
- document processing & reproducible research
- Sweave, Pweave, iPython notebook
What should I do now?
During our first meeting we will be talking about the pros and cons of the various ways of getting linux onto your laptop, and whether you will be OK just using Mac OS X or Windows.
A brief note about laptops: I am assuming in this course that you own (or you have access to) a laptop computer. If you don't, then it's time to buy one. I don't feel particularly uncomfortable asking students to buy a laptop in today's market, since prices are low enough nowadays that you can find a modern, suitable laptop for a few hundred dollars — essentially the cost of buying several high-end textbooks. If this is a serious issue for you, let me know and we can talk about what your options are.
So your first task in the course is to get your computer set up and running for the programming language of your choice.
Your second task is to write and run a "Hello, World" program in the language of your choice. Here is some code for you in a variety of languages:
# Python print "Hello, World"
% MATLAB / Octave disp('Hello, World');
# R cat("Hello, World\n")
// C // to compile: gcc -o hello hello.c #include <stdio.h> int main(int argc, char *argv[]) { printf("Hello, World\n"); return 0; }