import statsmodels.stats.weightstats as sm
import numpy as np
import matplotlib.pyplot as plt
Statistics: Parametric tests
Statsmodels & Scipy.stats
The statsmodels
package is a Python module that provides classes and functions for many different kinds of statistical tests. It is a good alternative to the scipy.stats
module, which we have been using so far. The statsmodels
package is more comprehensive and has more options for the tests. It also has a more user-friendly interface. Here is a semi-organized list of various tests and tools Statistics stats.
On the other hand scipy.stats
is more widely used and has a more extensive documentation. So it is a good idea to know how to use both. Here is a list of the tests and tools in scipy.stats
scipy.stats.
Welcome to the world of Python 🤷
Common Parametric Statistical Tests
Let’s go over some of the tests you have probably learned about in your statistics classes in the past.
First we will load packages that we will be using:
One-sample t-test
Let’s generate a sample of size n=10 from a normal distribution with mean 0 and standard deviation 1:
9040)
np.random.seed(= 10
n = 0
popmean = 1
popsd = np.random.normal(popmean, popsd, n)
y y
array([ 1.6906368 , -0.75271818, 0.26192835, 0.03147374, -0.41998303,
0.90818203, 1.81479305, 0.66291534, -0.97879878, -1.10367887])
Now let’s imagine we want to conduct a statistical test of the null hypothesis that the mean of the population from which the same was drawn is equal to 0. We can do this using the ztest()
function:
= sm.ztest(x1=y, x2=None, value=0, alternative='two-sided')
tstat, pvalue
print(f"t({n-1}) = {tstat:.3f}, p = {pvalue:.3f}")
t(9) = 0.636, p = 0.525
The results of our statistical test show that the t statistic is 0.636 on 9 degrees of freedom, and the p-value is 0.525. This p-value is far above any commonly used significance level (e.g. .05), so we fail to reject the null hypothesis that the mean of the population is equal to 0.
More precisely, the p-value indicates that the probability of observing a t statistic as large as 0.636 or larger if we were to take a random sample of size n=10 from a population with a mean of zero, is 0.525. This is a very large probability, so we fail to reject the null hypothesis.
Let’s repeat the “experiment” this time taking a different random sample:
1236)
np.random.seed(= 10
n = 0
popmean = 1
popsd = np.random.normal(popmean, popsd, n)
y y
array([-0.09375848, -1.69935067, 0.40455707, -1.39563456, -1.35254219,
-0.76516263, -1.3732163 , -2.00312241, 0.33949538, -0.07418577])
= sm.ztest(x1=y, x2=None, value=0, alternative='two-sided')
tstat, pvalue
print(f"t({n-1}) = {tstat:.3f}, p = {pvalue:.3f}")
t(9) = -2.869, p = 0.004
This time the t statistic is -2.869 which is much larger (farther from zero) and the p-value is much smaller, 0.004. If this were a real experiment, we would reject the null hypothesis that the mean of the population is equal to 0. We would do this because the probability of obtaining a t statistic as large (different than zero) as -2.869 if we were to take a random sample of size n=10 from a population with a mean of zero, is only 0.004. This is a very small probability, so we would reject the null hypothesis.
Remember in a real experiment we never know if the null hypothesis is true or false. All we can ever do is compute the p-value, the probability of obtaining a test statistic as large or larger than the one we obtained, if the null hypothesis were true. If that p-value is small (e.g. less than .05, or less than .01), then we conclude we’re not that lucky! We reject the null hypothesis.
The role of sample size: simulations
Remember from your introductory statistics class(es) that the sample size plays a huge role in the range of sample means that one is likely to obtain under the null hypothesis. Let’s see this in action using some simulations.
We will take a random sample of size n from a population with a true mean of zero. Then we will compute the sample mean. We will repeat this 10000 times.
We will repeat the 10000 simulatons for different sample sizes n=10, n=50, n=10, n=1000.
= [10, 50, 100, 1000]
samp_sizes = 10000
nsims for n in samp_sizes:
= np.zeros(nsims)
ymeans for i in range(nsims):
= np.random.normal(0, 1, n)
y = np.mean(y)
ymeans[i] =50)
plt.hist(ymeans, bins= (-1.1, 1.1)
plt.xlim
plt.legend(samp_sizes)"n=10, n=50, n=100, n=1000")
plt.title("Sample mean")
plt.xlabel("Frequency")
plt.ylabel( plt.tight_layout()
You can clearly see that as the sample size increases, the range of sample means we get, under the null hypothesis, gets smaller. This is why the p-value is so much smaller for a given t statistic when the sample size is larger. It reflects the fact that is is more unlikely to see a given non-zero sample mean (and non-zero t statistic) as the sample size increases.
For example if we did one experiment with a sample size of n=10 and we obtained a sample mean of 0.60, what is the probability of obtaining this under the null hypothesis that the true mean of the population from which the random sample was taken, is zero? That is the p-value. You can see in the plot below that it is pretty unlikely to happen.
= 10000
nsims = 10
n = np.zeros(nsims)
ymeans for i in range(nsims):
= np.random.normal(0, 1, n)
y = np.mean(y)
ymeans[i] =50)
plt.hist(ymeans, bins= (-1.1, 1.1)
plt.xlim 0.60, color="red")
plt.axvline("n=10, n=50, n=100, n=1000")
plt.title("Sample mean")
plt.xlabel("Frequency")
plt.ylabel( plt.tight_layout()
We can estimate this probability by counting the number of simulated cases in which the sample mean was equal or greater than 0.60, and dividing by the total number of simulated cases. This is the empirical p-value.
= len(ymeans[ymeans >= 0.60]) / len(ymeans)
pvalue_empirical
print(f"p-value = {pvalue_empirical:.3f}")
p-value = 0.029
We can of course compute this p-value using a parametric statistical test, without having to run 10,000 simulations, e.g. a one-sample t-test, as above:
8329) # chosen by me strategically to get a sample mean of 0.60
np.random.seed(= np.random.normal(0, 1, 10) # run the experiment once
y print(f"sample mean = {np.mean(y):.3f}")
sample mean = 0.599
= sm.ztest(x1=y, x2=None, value=0, alternative='two-sided')
tstat, pvalue print(f"t({n-1}) = {tstat:.3f}, p = {pvalue:.3f}")
t(9) = 2.162, p = 0.031
The p-value we get from the parametric test is very close to the empirical p-value we computed above.
t-test for two independent samples
Let’s say we do an experiment and obtain two random samples of size n=10 from two different populations. We want to know if the two populations have the same mean. We can do this using a two-sample t-test.
1234)
np.random.seed(= 10
n = 0
popmean1 = 0
popmean2 = 1
popsd = np.random.normal(popmean1, popsd, n)
y1 = np.random.normal(popmean2, popsd, n)
y2 print(f"The mean of the first sample is {np.mean(y1):.3f}")
print(f"The mean of the second sample is {np.mean(y2):.3f}")
The mean of the first sample is -0.144
The mean of the second sample is 0.121
= sm.ttest_ind(y1, y2, alternative='two-sided')
tstat, pvalue, df print(f"t({df}) = {tstat:.3f}, p = {pvalue:.3f}")
t(18.0) = -0.527, p = 0.605
The p-value is above .05 and so we fail to reject the null hypothesis that the two populations have the same mean. This is what we would expect because we simulated the data from two populations with the same mean.
Let’s now change the mean of the second population to 1.0 and see what happens.
1234)
np.random.seed(= 10
n = 0
popmean1 = 1
popmean2 = 1
popsd = np.random.normal(popmean1, popsd, n)
y1 = np.random.normal(popmean2, popsd, n)
y2 print(f"The mean of the first sample is {np.mean(y1):.3f}")
print(f"The mean of the second sample is {np.mean(y2):.3f}")
The mean of the first sample is -0.144
The mean of the second sample is 1.121
= sm.ttest_ind(y1, y2, alternative='two-sided')
tstat, pvalue, df print(f"t({df}) = {tstat:.3f}, p = {pvalue:.3f}")
t(18.0) = -2.517, p = 0.022
The p-value is below .05 and so we reject the null hypothesis that the two populations have the same mean.
t-test for two paired samples
Apparently there is no paired-samples t-test in statsmodels. Scipy.stats does have one though so let’s use that:
1234)
np.random.seed(= 10
n = 0
popmean1 = 0
popmean2 = 1
popsd = np.random.normal(popmean1, popsd, n)
y1 = np.random.normal(popmean2, popsd, n)
y2 print(f"The mean of the first sample is {np.mean(y1):.3f}")
print(f"The mean of the second sample is {np.mean(y2):.3f}")
The mean of the first sample is -0.144
The mean of the second sample is 0.121
import scipy.stats as stats
= stats.ttest_rel(y1, y2)
tstat, pvalue print(f"t({n-1}) = {tstat:.3f}, p = {pvalue:.3f}")
t(9) = -0.729, p = 0.484
The p-value is above .05 and so we fail to reject the null hypothesis.
Correlation tests
Let’s say we have two random samples of size n=15 from two different populations. We want to know if the two populations are correlated. We can do this using a correlation test.
1234)
np.random.seed(= 15
n = 0
popmean1 = 0
popmean2 = 1
popsd = np.random.normal(popmean1, popsd, n)
y1 = np.random.normal(popmean2, popsd, n)
y2 =(4,4))
plt.figure(figsize'bs')
plt.plot(y1,y2,"y1")
plt.xlabel("y2")
plt.ylabel( plt.show()
= stats.pearsonr(y1, y2)
out print(f"r = {out[0]:.3f}, p = {out[1]:.3f}")
print(f"95% CI on r: {out.confidence_interval(0.95)}")
r = -0.315, p = 0.252
95% CI on r: ConfidenceInterval(low=np.float64(-0.7125755629533768), high=np.float64(0.23472947261268048))
The p-value is above .05 and so we fail to reject the null hypothesis that the true correlation between the two populations is zero. This is what we would expect because we simulated the data from two populations with no correlation.
Role of sample size
Again as above, there is a powerful effect of sample size on the probability of obtaining large correlations under the null hypothesis. Let’s repeat the simulations above but instead of testing sample mean against zero we will test the correlation between two random samples, under the null hypothesis in which the two samples are taken from populations with a correlation of zero.
= [10, 50, 100, 1000]
samp_sizes = 10000
nsims for n in samp_sizes:
= np.zeros(nsims)
rvalues for i in range(nsims):
= np.random.normal(0, 1, n)
y1 = np.random.normal(0, 1, n)
y2 = stats.pearsonr(y1, y2)[0]
rvalues[i] =50)
plt.hist(rvalues, bins= (-1.1, 1.1)
plt.xlim
plt.legend(samp_sizes)"n=10, n=50, n=100, n=1000")
plt.title("Pearson Correlation Coefficient r")
plt.xlabel("Frequency")
plt.ylabel( plt.tight_layout()
What you can see is when the sample size is small (e.g. 10) we can expect to see a very large range of correlation coefficients under the null hypothesis (i.e. when the two samples are taken from populations with a true correlation equal to zero). As the sample size increases, the range of correlation coefficients shrinks.
So if we do an experiment with a small sample size, we would need to obtain a very large (far from zero) correlation coefficient in order to reject the null hypothesis that the two populations are uncorrelated.
One-way Between Subjects ANOVA
Let’s say we do an experiment and obtain three random samples of size n=10 from three different populations. We want to know if the three populations have the same mean. We can do this using a one-way between subjects ANOVA.
1234)
np.random.seed(= 10
n = 0
popmean1 = 0
popmean2 = 0
popmean3 = 1
popsd = np.random.normal(popmean1, popsd, n)
y1 = np.random.normal(popmean2, popsd, n)
y2 = np.random.normal(popmean3, popsd, n)
y3 print(f"The mean of the first sample is {np.mean(y1):.3f}")
print(f"The mean of the second sample is {np.mean(y2):.3f}")
print(f"The mean of the third sample is {np.mean(y3):.3f}")
The mean of the first sample is -0.144
The mean of the second sample is 0.121
The mean of the third sample is 0.047
= stats.f_oneway(y1, y2, y3)
out print(f"F = {out.statistic:.3f}, p = {out.pvalue:.3f}")
F = 0.166, p = 0.848
If we want to get a full ANOVA table in the usual way, we can use the statsmodels library. We need to create a dataframe with the data and then use the ols
function to fit a model to the data. The anova_lm
function then gives us the ANOVA table. This is a bit more complicated than the other tests we have seen so far but it is very flexible and can be used to do many different types of ANOVA. It is reminiscent of the lm
function in R.
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd
= pd.DataFrame({'y': np.concatenate((y1, y2, y3)),
df 'group': np.concatenate((np.repeat('A', n),
'B', n),
np.repeat('C', n)))})
np.repeat(= ols('y ~ group', data=df).fit()
model = sm.stats.anova_lm(model, typ=3)
anova_table print(anova_table)
sum_sq df F PR(>F)
Intercept 0.206449 1.0 0.183477 0.671800
group 0.373295 2.0 0.165878 0.848006
Residual 30.380549 27.0 NaN NaN
Linear Regression
We can use statsmodels.OLS()
to perform a linear regression using ordinary least squares to fit the model coefficients. (OLS is likely what you learned about in your introduction to statisics class(es)).
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.api as sm
1234)
np.random.seed(
= 11
n = np.linspace(0, 10, n)
x = -5 + 2*x + np.random.normal(0, 4, size=n)
y
=(4,4))
plt.figure(figsize'bo')
plt.plot(x,y,"x")
plt.xlabel("y")
plt.ylabel(
plt.grid()
plt.tight_layout() plt.show()
We will fit a model of the following form:
\hat{y_{i}} = \beta_0 + \beta_1 x_{i} + \varepsilon_{i}
where \hat{y_{i}} is the predicted value of y for the ith observation, \beta_0 is the model intercept, \beta_1 is the model slope, and \varepsilon_{i} is the error term.
For the statsmodels.OLS()
function we will need to add a column of ones to the x data to account for the intercept term in the model.
= sm.add_constant(x)
X print(X)
[[ 1. 0.]
[ 1. 1.]
[ 1. 2.]
[ 1. 3.]
[ 1. 4.]
[ 1. 5.]
[ 1. 6.]
[ 1. 7.]
[ 1. 8.]
[ 1. 9.]
[ 1. 10.]]
= sm.OLS(y, X)
model = model.fit()
results print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.654
Model: OLS Adj. R-squared: 0.616
Method: Least Squares F-statistic: 17.03
Date: Fri, 10 Jan 2025 Prob (F-statistic): 0.00257
Time: 10:32:48 Log-Likelihood: -31.530
No. Observations: 11 AIC: 67.06
Df Residuals: 9 BIC: 67.86
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -4.3529 2.652 -1.642 0.135 -10.352 1.646
x1 1.8497 0.448 4.127 0.003 0.836 2.864
==============================================================================
Omnibus: 0.653 Durbin-Watson: 2.798
Prob(Omnibus): 0.721 Jarque-Bera (JB): 0.631
Skew: -0.388 Prob(JB): 0.729
Kurtosis: 2.120 Cond. No. 11.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
/Users/plg/github/9040/.venv/lib/python3.13/site-packages/scipy/stats/_axis_nan_policy.py:418: UserWarning: `kurtosistest` p-value may be inaccurate with fewer than 20 observations; only n=11 observations were given.
return hypotest_fun_in(*args, **kwds)
The model output tells us that the intercept is -4.3529 and the slope is 1.8497. The R-squared for the model is 0.654 which tells us that 65.4 % of the variance in y
is accounted for by x
. The p-value for the slope is 0.003 which tells us that it is unlikely that we could have obtained a slope as large (far from zero) as we did if the samples were taken from populations for which the true slope was zero.
We can plot the model fit over the data.
=(4,4))
plt.figure(figsize'bo')
plt.plot(x,y,'r-')
plt.plot(x, results.predict(X), "x")
plt.xlabel("y")
plt.ylabel(
plt.grid()
plt.tight_layout() plt.show()
Other Resources
See the statsmodels
documentation for more information on the various statistical tests and models available in the library. The statsmodels
package is oriented at statisticians and so is very flexible and provides a lot of statistical information and functionality. Look over the documentation to familiarize yourself with what is available.
The scipy.stats
package may be easier to navigate for you for simple statistical procedures. The scipy.stats
package is oriented at scientists and so is more focused on basic functionality needed for scientific research. Look over the documentation to familiarize yourself with what is available.