Homework 10
Due: Apr 14 by 11:59 pm Eastern Standard Time
Submit a single Jupyter notebook called name_10.ipynb to Brightspace/OWL, where name is replaced with your last name, e.g. gribble_10.ipynb.
Homework 10 will be graded out of 9 points. Note that you can submit Homework 10 only once; you cannot resubmit to fix errors. Your grade on Homework 10 will be based on the first submission. For Homework 10, late submissions will receive a grade of zero.
Fitting a Model to Data
A research team is studying the effect of a new sleep-aid compound (Somnulex) on sleep improvement. They administered various doses (in mg) to 15 participants and measured how many additional hours of sleep each participant gained relative to a placebo baseline. The researchers believe the relationship between dose and sleep improvement is nonlinear: at low doses the drug helps, but at high doses the benefit may plateau or even reverse. They want to find the best polynomial model to describe this relationship.
The dataset is provided in the file sleep_dose_response.csv. It contains two columns:
- dose_mg: the dose in milligrams
- sleep_improvement_hrs: the measured improvement in hours of sleep
You can load it with:
```python
import numpy as np
import matplotlib.pyplot as plt

# Skip the header row; the columns are dose_mg, sleep_improvement_hrs
data = np.genfromtxt("sleep_dose_response.csv", delimiter=",", skip_header=1)
x = data[:, 0]  # dose_mg
y = data[:, 1]  # sleep_improvement_hrs
```
Complete each question below in a Jupyter notebook. Include all Python code, clearly labelled plots, and written answers to discussion questions. Use NumPy, Matplotlib, and (optionally) SciPy. Do not use scikit-learn or any other machine-learning library — write the cross-validation loop yourself.
Question 1 — Exploratory plot [1 point]
Load the dataset and create a scatter plot of sleep_improvement_hrs (y-axis) versus dose_mg (x-axis).
Based on visual inspection alone, what general shape does the relationship appear to have? What polynomial degree(s) might be reasonable to try?
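Below is a minimal sketch of the exploratory scatter plot using Matplotlib's object-oriented interface (`plt.subplots`, `ax.scatter`). The CSV is not reproduced here, so `x` and `y` are a hypothetical synthetic stand-in: an inverted-U curve plus noise, not the study's data. In the notebook they would come from the `np.genfromtxt` loading code instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data (NOT the assignment's CSV): 15 doses with an
# inverted-U response plus Gaussian noise, so the plotting code runs on its own.
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 15)                                  # plays the role of dose_mg
y = 0.08 * x - 0.0007 * x**2 + rng.normal(0, 0.3, x.size)   # plays the role of sleep_improvement_hrs

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, color="red")                  # one dot per participant
ax.set_xlabel("dose_mg (mg)")                  # predictor on the x-axis
ax.set_ylabel("sleep_improvement_hrs (hours)") # response on the y-axis
ax.set_title("Sleep improvement vs. Somnulex dose")
ax.grid(True, alpha=0.3)
```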
Question 2 — Polynomial fits [1.5 points]
Fit polynomial models of degree 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 to the full dataset using np.polyfit() and np.polyval(). For each degree:
(a) Compute the Sum of Squared Errors (SSE) between the predicted values and the actual values of sleep_improvement_hrs.
(b) Produce a single figure with 10 subplots (arranged in a 2 × 5 grid), one per polynomial degree. In each subplot, show the data as red dots and the fitted polynomial curve as a blue line. Display the polynomial degree and SSE in each subplot title. Use a finely spaced x-range (e.g. np.linspace) for the curve so it looks smooth, and set consistent axis limits across subplots.
(c) In a brief paragraph, describe what happens to SSE as polynomial degree increases. Is the polynomial with the lowest SSE necessarily the best model? Why or why not?
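Question 2 can be organised as a single loop over degrees that fits, scores, and plots each model. The sketch below uses the same hypothetical synthetic stand-in for `x` and `y`. The variable names (`sse`, `x_fine`) are illustrative. Sharing axes across the 2 × 5 grid (`sharex=True, sharey=True`) is one way to keep the limits consistent between panels.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data (NOT the assignment's CSV).
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 15)
y = 0.08 * x - 0.0007 * x**2 + rng.normal(0, 0.3, x.size)

degrees = np.arange(10)                        # 0, 1, ..., 9
sse = np.zeros(degrees.size)                   # training SSE for each degree
x_fine = np.linspace(x.min(), x.max(), 300)    # dense grid so curves look smooth

fig, axes = plt.subplots(2, 5, figsize=(16, 6), sharex=True, sharey=True)
for d, ax in zip(degrees, axes.ravel()):
    p = np.polyfit(x, y, d)                    # least-squares coefficients, highest power first
    y_hat = np.polyval(p, x)                   # fitted values at the observed doses
    sse[d] = np.sum((y - y_hat) ** 2)          # Sum of Squared Errors on the full dataset
    ax.plot(x, y, "ro", markersize=4)          # data as red dots
    ax.plot(x_fine, np.polyval(p, x_fine), "b-")  # fitted curve as a blue line
    ax.set_title(f"degree {d}, SSE = {sse[d]:.3f}")
    ax.set_ylim(y.min() - 1, y.max() + 1)      # same y-limits in every panel
fig.tight_layout()
```

The degree-0 fit is simply the mean of `y`, so its SSE equals the total sum of squares. That makes it a convenient sanity check.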
Question 3 — Leave-one-out cross-validation [3 points]
Implement leave-one-out cross-validation (LOO-CV) from scratch. For each polynomial degree (0 through 9):
(a) For each of the n = 15 data points, leave that point out, fit the polynomial to the remaining n - 1 points, and compute the squared prediction error on the left-out point.
(b) Compute the mean squared prediction error (MSPE) across all 15 folds for that degree:
$$\text{MSPE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
where $\hat{y}_i$ is the prediction for the left-out point $i$ from the model trained on the other $n - 1$ points.
(c) Store the MSPE for each polynomial degree in an array.
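Below is a from-scratch LOO-CV loop, written without scikit-learn as the assignment requires. `loo_mspe` is a hypothetical helper name. It refits the polynomial n times, each time on a boolean mask that excludes one point, and then averages the squared prediction errors as in the MSPE formula. The data are again a hypothetical synthetic stand-in.

```python
import numpy as np

def loo_mspe(x, y, degree):
    """Hypothetical helper: leave-one-out MSPE for one polynomial degree."""
    n = x.size
    sq_err = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i                # boolean mask: every point except i
        p = np.polyfit(x[train], y[train], degree)
        y_hat_i = np.polyval(p, x[i])            # prediction for the held-out point
        sq_err[i] = (y[i] - y_hat_i) ** 2        # squared prediction error for fold i
    return sq_err.mean()                         # MSPE = (1/n) * sum over all n folds

# Hypothetical stand-in data (NOT the assignment's CSV).
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 15)
y = 0.08 * x - 0.0007 * x**2 + rng.normal(0, 0.3, x.size)

degrees = np.arange(10)
mspe = np.array([loo_mspe(x, y, d) for d in degrees])   # part (c): one MSPE per degree
```

For a degree-0 model, each left-out prediction is the mean of the other n − 1 points. This gives a quick hand check: the LOO residual is n/(n − 1) times the ordinary residual y_i − ȳ.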
Question 4 — Visualising the results [1.5 points]
Create a figure that shows both the training SSE (from Question 2) and the LOO-CV MSPE (from Question 3) as a function of polynomial degree, on the same plot or on a dual-axis plot. Use different colours or markers to distinguish them. Consider using a log-scale on the y-axis if it helps.
(a) At which polynomial degree is the LOO-CV MSPE minimised?
(b) At which polynomial degree is the training SSE minimised?
(c) Are these the same? Explain why or why not.
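The sketch below draws the comparison plot on a log-scaled y-axis and uses `np.argmin` to read off the minimising degrees for parts (a) and (b). To stay self-contained, it recomputes both error curves in compact form on the same hypothetical synthetic stand-in data. In the notebook you would reuse the arrays from Questions 2 and 3 instead. The names `best_cv` and `best_train` are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data (NOT the assignment's CSV).
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 15)
y = 0.08 * x - 0.0007 * x**2 + rng.normal(0, 0.3, x.size)
degrees = np.arange(10)
n = x.size

# Training SSE (Question 2) and LOO-CV MSPE (Question 3), recomputed compactly.
sse = np.array([np.sum((y - np.polyval(np.polyfit(x, y, d), x)) ** 2) for d in degrees])
mspe = np.array([
    np.mean([(y[i] - np.polyval(np.polyfit(np.delete(x, i), np.delete(y, i), d), x[i])) ** 2
             for i in range(n)])
    for d in degrees
])

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(degrees, sse, "o-", color="tab:blue", label="training SSE")
ax.plot(degrees, mspe, "s--", color="tab:orange", label="LOO-CV MSPE")
ax.set_yscale("log")              # the curves can differ by orders of magnitude
ax.set_xlabel("polynomial degree")
ax.set_ylabel("error (log scale)")
ax.set_xticks(degrees)
ax.legend()

best_cv = int(np.argmin(mspe))    # (a) degree minimising LOO-CV MSPE
best_train = int(np.argmin(sse))  # (b) degree minimising training SSE
print(f"LOO-CV minimum at degree {best_cv}; training SSE minimum at degree {best_train}")
```

SSE is a sum over all n points, whereas MSPE is a mean. Plotting SSE/n instead would put both curves on the same per-point scale. That is an optional choice, not something the assignment requires.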
Question 5 — The best model [2.0 points]
Based on your LOO-CV results, state which polynomial degree you would select as the best model. Produce a final plot showing the data (red dots) and the best-fitting polynomial (blue curve).
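Here is a sketch of the final step: pick the degree with the smallest LOO-CV MSPE, then refit that degree on all 15 points. Cross-validation chooses the degree, and the final coefficients use all the data. `best_degree` and `coeffs` are illustrative names, and the data are the same hypothetical synthetic stand-in.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data (NOT the assignment's CSV).
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 15)
y = 0.08 * x - 0.0007 * x**2 + rng.normal(0, 0.3, x.size)
degrees = np.arange(10)

# LOO-CV MSPE per degree (same procedure as Question 3), then pick the minimiser.
mspe = np.array([
    np.mean([(y[i] - np.polyval(np.polyfit(np.delete(x, i), np.delete(y, i), d), x[i])) ** 2
             for i in range(x.size)])
    for d in degrees
])
best_degree = int(degrees[np.argmin(mspe)])

# Refit the chosen degree on ALL points for the final model.
coeffs = np.polyfit(x, y, best_degree)
x_fine = np.linspace(x.min(), x.max(), 300)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, "ro", label="data")                                   # red dots
ax.plot(x_fine, np.polyval(coeffs, x_fine), "b-",
        label=f"degree {best_degree} fit")                          # blue curve
ax.set_xlabel("dose_mg (mg)")
ax.set_ylabel("sleep_improvement_hrs (hours)")
ax.set_title(f"Best model by LOO-CV: degree {best_degree}")
ax.legend()
```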
This is an individual assignment: do not share code or answers. Also remember that, as stated in section 12 of the course syllabus, using generative artificial intelligence (AI) tools/software/apps to complete any part of this homework assignment is not acceptable and constitutes an academic offence.