Introduction

Welcome

Welcome to Introduction to Statistics Using R (Psychology 9041b) for the winter term, 2019!

The goal of this one-semester graduate seminar is to give you a deep understanding of the logic behind statistical analyses of data, to teach you a set of standard statistical techniques, and to give you hands-on experience using the R language for statistical computing and graphical display of data. We will cover an initial set of core topics including sampling distributions, t-tests, ANOVA (and its variants), multiple comparisons and post-hoc tests, and multiple regression. We will also cover maximum-likelihood estimation, bootstrapping, and resampling techniques.

There will be a significant hands-on practical aspect to the course, namely learning to use the R language for statistical computation and graphical display of data. No prior experience with R is required.

If your statistical background is limited, you will benefit greatly from this course, which focuses on the logic of statistical analysis and takes a practical, hands-on approach grounded in real-world research applications.

If you have covered some of these topics in previous courses, you will probably still benefit from covering the same ground in a different way. You will also learn to use R, which will serve you very well in the future.

There is always a balance to be struck between breadth (number of topics) and depth (level of detail), and given that this course is presently offered as a one-semester course, there are limits to what we can cover in sufficient detail.

We will start out by considering the introductory topics in some detail, and then we will go on to the more advanced topics, with less detail, at least in class. You should explore topics that are relevant to your own work in as much detail as you deem necessary, even if we don't have time in class.

Textbook & Readings

There is one mandatory textbook in this course. The Maxwell & Delaney book is large and very extensive, but I'm serious when I say that I want you to do your best to read each section that is assigned. You are taking a graduate course in statistics, and you have a unique opportunity to dedicate a block of time each week to consolidating the knowledge you already have and to learning new things about statistical analyses of data.

We will cover the initial set of core topics using the approach taken in Maxwell & Delaney:

Mandatory Textbook

  • Designing Experiments and Analysing Data: A Model Comparison Perspective (3rd Edition) by Scott E. Maxwell, Harold D. Delaney and Ken Kelley. Routledge (2017). ISBN: 978-1138892286

The textbook is available at the campus bookstore (they are charging $200). You can also buy it from Amazon.ca at a significant discount, in both hardcover ($152) and Kindle ($72) format, or directly from the publisher for an even deeper discount ($100 hardcover, $49 eBook).

There is a website accompanying the textbook with datasets from the book, and sample R code for many of the chapters.

There is a package for R on the CRAN site that contains datasets from the Maxwell & Delaney text:

Recommended Books

The Keppel book is not mandatory but I would encourage you all to acquire this book as well. It is a very good statistics text that emphasizes computational formulae:

  • Design and Analysis: A Researcher's Handbook (4th Ed.) by Geoffrey Keppel. Prentice Hall (2004). ISBN: 0135159415

Resources for more advanced R instruction

Other selected readings will be assigned as appropriate for the topic each week.

Software

R is a sophisticated package for graphical and exploratory data analysis, and is a powerful statistical programming language. R can be downloaded for free for Windows, Macintosh, Linux, and Unix operating systems from http://www.r-project.org. The R manual is also available for free on the web. R code is platform-independent. R has extensive on-line help, and there are lots of other resources on-line for using R.
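
As a minimal sketch of what the built-in help looks like in practice, the commands below use only functions that ship with base R. The choice of t.test as the topic to look up is just an illustration, not something you need to know yet.

```r
# List objects on the search path whose names contain "test".
matches <- apropos("test")
print(head(matches))

# Show the arguments a function accepts without opening its help page.
print(args(t.test))

# Help pages open in a viewer or pager, so only request them in an
# interactive session.
if (interactive()) {
  help("t.test")             # same as typing ?t.test at the prompt
  help.search("regression")  # search installed documentation by keyword
}
```

Typing these at the R prompt (or in the RStudio console) is a quick way to find out what a function does before you use it.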

It's likely that none of you will have used R before, although some of you may have used other statistical software packages (like SPSS), spreadsheet programs (like Excel), or even numerical programming languages (like MATLAB). A non-trivial part of the work in this course will be learning how to use R. There is a learning curve involved in starting to program in any language. Once you get past that initial stage, your productivity will increase quickly.

RStudio

RStudio is a free and open-source integrated development environment (IDE) for R. You can download it from https://www.rstudio.com. I recommend you use it, although if you prefer you can use the plain R client.

Here are links to some cheatsheets for quick reference to R and RStudio features:

Laptops

Bring a laptop to class, so that you can try things out in R as you see them.

Grades & Course Requirements

  • 70% weekly assignments
  • 15% take-home midterm exam
  • 15% take-home final exam

The requirement for this course is simple: work diligently. This includes attending class, doing the readings carefully before the seminar meets, reading beyond the syllabus, working regularly on the problem sets, practicing using statistical software, and coming to see me for extra help, as needed. You will spend time learning and using R to implement the concepts discussed in class. Because you each have different backgrounds and interests, the amount of time necessary to master the material will vary. That being said, I am confident that with concerted effort each one of you can learn the material.

One important thing to realize: We will not have enough time in the classes to go over the details of every concept covered in the course. In class I will highlight the major ideas and provide a conceptual roadmap for you to navigate through the material. You will be responsible for reading the material in the textbooks on your own and asking questions if you need more guidance. Just because I didn't say it out loud in class doesn't mean you're not responsible for it. Pay attention to the readings assigned for each week and do them in advance of class, not afterwards.

There is no doubt that there is a lot of reading assigned in this course. Remember, you are a full-time graduate student, and your full-time job is to learn. Do the readings; they are a requirement of the course.

Feel free to work together on the assignments. Just do not hand in a document to me that is identical to someone else's. Also note that you won't learn nearly as much if you simply copy someone else's work. You will get the most out of this course if you work through the problem sets on your own. Your goal here should not be to get a high grade. Your goal should be to learn as much as you can.

Some Final Words

Statistics (I have sometimes heard it called Sadistics) has a reputation as a difficult topic, and one that depends on an extraordinary mathematical or arithmetic ability. It's my hope that I can convince you otherwise by the end of this course. Statistics is often taught in a cookbook fashion, whereby you learn a series of recipes that are often tied to particular software packages (e.g. SPSS). Quite often you are evaluated (e.g. in exams or assignments) based on your ability to recall memorized recipes and equations, and/or to produce accurate arithmetic calculations. You might even earn high grades in these courses, not because you understand anything about statistics per se, but because you have learned how to regurgitate statistical recipes and use a calculator.

Statistics, at its core, is not about calculation. Rather statistics (at least the statistics we cover in this course) is a logical framework for interpreting data. The key is to understand the logic of statistical procedures. What I would like to drive home for you is that an understanding of the logic behind statistics will empower you in your future scientific endeavors. You will be able to incorporate new statistical analyses into your own work, even if you haven't formally learned about them before, because you will find that you have insight into the fundamental logic behind them.

It's a cliche, but it's like the difference between being a cook and being a chef. As a cook you know how to follow particular recipes, but you have little ability to flexibly adapt recipes for novel situations, and you have little facility for developing your own new recipes. As a chef you understand a core set of basic principles that allows you to succeed in new environments by being creative (note that being creative does not mean breaking rules, per se, but understanding the ways in which you have flexibility, within the rules).

Being able to carry out statistical analyses of course depends on computation, ultimately, but the implementation should be independent of the statistical logic itself. In this course we will focus largely on the logic of the statistical procedures we cover. There will be a practical aspect as well, as we learn to use R to implement statistical analyses … but as you will see, R is built on an extensive core of statistical smarts so that as a user, you won't have to do much actual calculation yourself.
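
To make that last point concrete, here is a small sketch comparing a one-sample t-test worked out by hand with R's built-in t.test() function. The data values and the null-hypothesis mean are made up purely for illustration; they are not from the textbook or from any assignment.

```r
# Hypothetical sample of six measurements (made up for illustration).
x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
mu0 <- 5  # value of the population mean under the null hypothesis

# By hand: t = (sample mean - mu0) / (standard error of the mean)
n <- length(x)
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)  # two-sided p-value

# Built in: t.test() carries out the same calculation in one call.
result <- t.test(x, mu = mu0)

# The two routes should agree up to floating-point error.
print(all.equal(t_manual, unname(result$statistic)))
print(all.equal(p_manual, result$p.value))
```

The arithmetic is one line either way. What matters is knowing why the statistic is a difference divided by a standard error, and what the resulting p-value does and does not tell you.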

My Role and Your Role

My goal as instructor of this course is to maximize your opportunity to learn both the fundamental logic and reasoning behind statistics, and some practical tools (using R) for carrying out statistical analyses and presenting data graphically. This is not an undergraduate-style lecture course in which you can succeed by sitting back and memorizing what I say. What you ultimately get out of this course depends, in the end, on the effort you put into exploring the theoretical and practical components of the course. I'm not going to tell you everything you need to know. My role is more like that of a tour guide, showing you interesting topics and avenues that you will explore on your own.

I am more than happy to spend some extra time with you going over material and clarifying concepts that may be difficult to understand at first. Often hearing something said in a new and different way will help you gain a better understanding. I don't have formal office hours; send me an email and we can schedule a time to meet. If there is a clear need to go over material as a class, I am happy to arrange extra tutorial sessions.

Contact Info

  • no formal office hours
  • contact me by email: paul [at] gribblelab [dot] org
  • Office: WIRB 4122

Paul Gribble | Winter, 2019