# Exercise 14

## Linear Regression

Linear regression models the relationship between one *dependent
variable* \(Y\) and \(m\) *independent variables* \(X_{j}\), where
\(j=1,\dots,m\). When \(m=1\) this is called *simple linear
regression*. When \(m>1\) this is called *multiple regression*.

The independent variables \(X_{j}\) are sometimes called *predictor
variables*. In the general linear modeling approach often used in fMRI
analysis, for example, the predictor variables \(X_{j}\) might be a
collection of different things, some of which are variables of interest
(e.g. ones that code different aspects of the task that the subject is
performing in the scanner), and some of which are so-called "nuisance
variables" (like physiological noise, head motion parameters, etc.).

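Linear regression is called *linear* because the underlying model
relating \(Y\) to \(X_{j}\) is linear:

\begin{equation} Y = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \dots + \beta_{m}X_{m} + \epsilon \end{equation}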

The \(\beta_{j}\) terms represent weights on each predictor variable. The \(\epsilon\) term represents random "noise" or variance that is not predicted by the model. It is typically assumed that \(\epsilon\) is normally distributed.

We also typically have many individual samples of the dependent variable \(Y\) and associated values for the independent variables \(X\). If we have \(n\) cases, we can rewrite the equation with a subscript \(i\) denoting each sample, where \(i\) varies from \(1\) through \(n\):

\begin{equation} Y_{i} = \beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \dots + \beta_{m}X_{im} + \epsilon_{i} \end{equation}

We can rewrite this as a matrix equation:

\begin{equation} Y = X \beta + \epsilon \end{equation}

Now \(Y\) is a matrix with \(n\) rows and \(1\) column, and \(X\) is a matrix with \(n\) rows and \(m+1\) columns. There is a trick here, which is that the first column of the \(X\) matrix is a column filled with \(1\). This is how we incorporate the constant \(\beta_{0}\), which doesn't multiply an independent variable \(X_{j}\). Essentially we are saying it is multiplied by \(1\).

Finally, \(\beta\) will be a matrix with \(m+1\) rows and \(1\) column: in other words, a constant \(\beta_{0}\) and then one \(\beta_{j}\) for each independent variable \(X_{j}\). The values of \(\beta\) are the weights on the independent variables \(X\) that best predict the dependent variable \(Y\). The matrix multiplication is legal: \((n \times 1) = (n \times [m+1])([m+1] \times 1)\).
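To make the column-of-ones trick concrete, here is a minimal Python sketch that builds an \(X\) matrix as a list of rows and multiplies it by a weight vector. The data values, the variable names (`x1`, `x2`, `beta`, `Y_hat`) and the weights are all made up for illustration; they are not the exercise data and the weights are not estimates of anything.

```python
# Hypothetical toy data: n = 4 samples, m = 2 independent variables.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, 0.1, 0.9, 0.3]

# Design matrix X: n rows, m + 1 columns. The first column is all ones,
# so that beta[0] plays the role of the constant term beta_0.
X = [[1.0, a, b] for a, b in zip(x1, x2)]

# Arbitrary illustrative weights (beta_0, beta_1, beta_2), chosen by hand.
beta = [10.0, 2.0, -1.0]

# Matrix-vector product X * beta: one value per row of X.
Y_hat = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

print(len(X), len(X[0]))  # number of rows and columns of X
print(Y_hat)
```

Each entry of `Y_hat` is \(\beta_{0} \cdot 1 + \beta_{1}X_{i1} + \beta_{2}X_{i2}\) for one row \(i\), which is the matrix equation above without the noise term.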

## Yay linear algebra

An estimate of the weights \(\hat{\beta}_{j}\) can be found by ordinary least squares estimation, using the following matrix equation (take my word for it, for now, or you can find proofs on the internet):

\begin{equation} \hat{\beta} = \left( X^{\top}X \right)^{-1} X^{\top} Y \end{equation}

where \(X^{\top}\) denotes matrix transpose, \(X^{-1}\) denotes matrix inverse, and \(XY\) denotes matrix multiplication. If your matrix algebra is shaky, find a classmate who is on more solid ground (or come and talk to me).

Briefly, in ordinary least squares, the values of \(\hat{\beta}\) are found that minimize the sum of squared differences between the actual dependent variable \(Y\) and the predicted values \(\hat{Y}\) given the linear model.
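As a rough illustration of what the estimator does, here is a sketch in plain Python on a tiny made-up data set (not the exercise data). The helper names `transpose`, `matmul`, `solve` and `ols` are hypothetical, written only for this example. One design choice to note: instead of forming the inverse explicitly, the sketch solves the equivalent system \(\left(X^{\top}X\right)\hat{\beta} = X^{\top}Y\) by Gauss-Jordan elimination, which gives the same \(\hat{\beta}\) whenever \(X^{\top}X\) is invertible.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Return the matrix product A * B (both stored as lists of rows)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, B):
    """Solve A * Z = B for Z by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | B], copied so the inputs are not modified.
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        # Pick the row with the largest entry in column c as the pivot row.
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        # Eliminate column c from every other row.
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols(X, Y):
    """Least squares weights: solve (X^T X) beta = X^T Y."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))


# Made-up, noise-free toy data generated from Y = 1 + 2 * x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # first column is ones
Y = [[1.0], [3.0], [5.0], [7.0]]                      # n rows, 1 column

beta_hat = ols(X, Y)          # (m + 1) rows, 1 column
Y_hat = matmul(X, beta_hat)   # predicted values, n rows, 1 column

# Sum of squared differences between actual and predicted values.
sse = sum((y[0] - yh[0]) ** 2 for y, yh in zip(Y, Y_hat))

print(beta_hat)
print(sse)
```

Because the toy data contain no noise, the estimated weights should come out very close to \(1\) and \(2\), and the sum of squared differences very close to zero. With real, noisy data the sum will not be zero, but no other choice of weights gives a smaller one.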

## with some data

Here are some example data:

| Income ($) | Age (years) | Education (years) |
|---|---|---|
| 50000 | 35 | 4 |
| 35000 | 40 | 2 |
| 80000 | 45 | 6 |
| 25000 | 25 | 0 |
| 90000 | 70 | 4 |
| 75000 | 55 | 6 |
| 65000 | 50 | 4 |
| 95000 | 60 | 6 |
| 70000 | 45 | 8 |
| 110000 | 50 | 8 |

Your task is to estimate a linear model that relates Age and Education to Income. So your dependent variable \(Y\) is Income and your two independent variables \(X_{1}\) and \(X_{2}\) are Age and Education.

## more hints

Our \(X\) matrix will be \(10\) rows and \(3\) columns. The first column will be filled with \(1\). The second column will be the Age data and the third column will be the Education data. Our \(Y\) matrix will be \(10\) rows and \(1\) column and will contain the Income data. Our task now is to estimate the weights \(\beta\) that when multiplied by \(X\) will best predict \(Y\). The \(\beta\) matrix will be \(3\) rows and \(1\) column.

Note that there are built-in functions to estimate linear models, but I don't want you to use them here. I want you to program this from scratch.

Your two tasks are:

- estimate weights \(\beta_{0}\), \(\beta_{1}\) and \(\beta_{2}\), and
- generate predicted values of Income

## linear algebra refresher

The Khan Academy has a series of matrix tutorials that might be useful if you need a refresher.