Scientific Computing (Psychology 9040A)
Goals of the course
The goal of this one-semester graduate seminar is to provide you with skills in scientific computing — tools and techniques that you can use in your own research. We will focus on learning to think about experiments and data in a computational framework, and we will learn to implement specific data processing and analysis algorithms using a high-level programming language.
The course is designed to achieve four primary goals:
- to teach you to program in a high-level language such as Python, Matlab, R or C.
- to introduce you to Unix (either via the Mac, or GNU/Linux), and to some common computing tools that will help you become more efficient with using your computer for research.
- to introduce you to some common computational techniques (not necessarily "statistics" per se) for data processing and analysis — e.g. curve-fitting, simulation, optimization & gradient descent, frequency-domain analysis and filtering, parallel programming, interacting with hardware, organization and storage of data, programming GUIs (graphical user interfaces), etc. This list can be adapted from year to year depending on class demand.
- to get you to think computationally and algorithmically about data processing, analysis, modeling and visualization.
Even if you don't become a full-fledged programmer yourself, you will have learned enough to know what is possible given today's computational resources. This will enable you in the future to design experiments and data analysis approaches that take full advantage of the computational resources available to you — whether programmed yourself, or by programmers and technicians that you hire.
Exercises, Assignments and Grades
Each week we will go over a series of programming exercises that are designed to get you going with the concepts we cover each week. The exercises are not graded. The idea is for you to have a go, and when you have problems, questions, etc, ask each other, and ask me for help. The best way to learn to program, is to spend time programming and debugging.
There will be a set of assignments that will be graded. The course grade will be based on the assignments (90% weighting) and a "course participation" mark (10% weighting).
The assignments will have several components. Some components will be easy, some will be more difficult, and some assignments may contain components that are rather challenging. Do your best, and don't be afraid to go to each other for help. I'm all in favour of students working together on assignments. I just don't want to find out that you are handing in work that is not your own. I'm perfectly willing to give you hints, so don't hesitate to come to me for help (although do make sure that you've given it a good try yourself, first… read this blog post, What have you tried? by Matt Gemmell — it's aimed at professional programmers but I think it applies equally well to us all, in whatever field we work in).
Which programming language(s)?
The issue of which language to use in the course is an interesting one. In the past I have taught the course using only one language to demonstrate concepts. Once we used MATLAB. Another time we used Python. The problem I see with using only one language is that we risk turning the course into a "how to use language X" course rather than a course in learning computing concepts, and using a given language to demonstrate the concepts. On the other hand, there is no doubt that a major component of learning to be a competent programmer is learning the ins and outs and specific features of a given language.
We will talk about this during the first class and we will decide how to proceed this year. My present inclination is to use Python, Matlab, R and perhaps even C to demonstrate concepts in the course, and to allow students to submit assignments in any of these languages. As I said though we will talk about this during our first meeting and decide what is best for our class this year.
If you are a beginner programmer and you don't know which language to choose, I would suggest choosing MATLAB. It's platform-independent, it is very easy to install and setup, and many, many people use it for scientific programming. Also, Western has a campus site license for MATLAB which means it is available to you free of charge.
What you should probably do in any case is talk with your supervisor, and with the other people in your lab, and find out which language(s) are in common use in your lab, and perhaps in your field of study.
If you already have some background in programming, you may want to challenge yourself and use two languages in the course.
Which operating system(s)?
I will be using Mac OS X, which is unix-based. If you have a Mac (and a reasonably recent version of the OS), you can use that. If not then I suggest you use GNU/Linux. I won't say you can't use Microsoft Windows, but please know that I have no expertise with Windows, and I will not be able to help. We will have a discussion and a tutorial session at the beginning of the course about how best to set up your computer. You will have several choices. If you are using MATLAB then Windows is not a problem at all. If you want to use Python or R or C then Windows can sometimes present some challenges.
Why learn to code?
Learning how to program will significantly enhance your ability to conduct scientific research today and in the future. Programming skills will provide you with the ability to go beyond what is available in pre-packaged analysis tools, and code your own custom data processing, analysis and visualization pipelines.
Learning to code is probably one of the most useful general skills you can learn as a scientist today. You will learn how to think computationally and algorithmically about your experiments and your data. You will learn to take full advantage of the enormous computational resources available to you — today's laptops are more powerful than the multi-million dollar supercomputers of the 1980s and 90s.
List of Topics
- Introduction, setting up your computing environment
- Basic Types, Operators & Expressions
- Functions
- Complex Data Types and Structures
- File input and output
- Graphical displays of data
- Document processing & reproducible research
- Optimization & Gradient Descent
- Signals & Sampling Theory
- Fourier Analysis & Filtering
- Resampling & Bootstrapping
- Parallel Computing
A1. Appendix 1: Digital Representation of Data
Next steps
Stay tuned… we are going to have fun and learn a lot!
The course notes will be posted and updated online as we go.
If you have questions about the course, please get in touch with me at: paul [at] gribblelab [dot] org, or just stop by my office (NSC 228) and/or lab (NSC 245G)
Start Date & Location
- Mondays and Wednesdays 2:30—4:00pm in STH 3166
- start date: Monday, September 8, 2014