Whereas signals in nature (such as sound waves, magnetic fields, hand position, electromyograms (EMG), electroencephalograms (EEG), extra-cellular potentials, etc) vary continuously, often in science we measure these signals by sampling them repeatedly over time, at some sampling frequency. The resulting collection of measurements is a discretized representation of the original continuous signal.
These concepts come up constantly in neuroscience and psychology research. If you record EEG, you will need to understand frequency bands (alpha, beta, gamma) and how to filter out line noise. If you record movement data, you will need to low-pass filter before taking derivatives to get velocity and acceleration. If you record EMG, you will need to understand the frequency content of muscle signals versus noise. This chapter covers the foundational tools for all of these tasks.
Before we get into sampling theory, however, we should first talk about how signals can be represented both in the time domain and in the frequency domain.
Jack Schaedler has a nice page explaining and visualizing many concepts discussed in this chapter:
1.1 Time domain representation of signals

This is how you are probably used to thinking about signals, namely how the magnitude of a signal varies over time. So for example a signal s containing a sinusoid with a period T of 0.5 seconds (a frequency of 2 Hz) and a peak-to-peak magnitude b of 2 volts is represented in the time domain t as:

s(t) = \frac{b}{2} \sin\left(\frac{2 \pi t}{T}\right)
We can visualize the signal by plotting its magnitude as a function of time, as shown in Figure 1.
Time domain representation of a signal.
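A figure like this takes only a few lines of NumPy and Matplotlib to produce. This is a sketch rather than the code used for the figure above; the time range and step size are assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

T = 0.5   # period in seconds (frequency 1/T = 2 Hz)
b = 2     # peak-to-peak magnitude in volts

t = np.arange(0, 2, 0.001)                # two seconds of time, in 1 ms steps
s = (b / 2) * np.sin(2 * np.pi * t / T)   # 2 Hz sinusoid, amplitude 1 V

plt.plot(t, s)
plt.xlabel('Time (sec)')
plt.ylabel('Signal Magnitude (V)')
```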
1.2 Frequency domain representation of signals
We can also represent signals in the frequency domain. This requires some understanding of the Fourier series. The idea of the Fourier series is that all periodic signals can be represented by (decomposed into) the sum of a set of pure sines and cosines that differ in frequency. See the Wikipedia link for lots of details and a helpful animation. For a signal s(t) with period T, the Fourier series is:

s(t) = \frac{a_{0}}{2} + \sum_{n=1}^{\infty} \left[ a_{n} \cos\left(\frac{2 \pi n t}{T}\right) + b_{n} \sin\left(\frac{2 \pi n t}{T}\right) \right]
The coefficients a_{n} and b_{n} define the weighting of the different sines and cosines at different frequencies. In other words these coefficients represent the strength of the different frequency components in the signal.
We can also represent the Fourier series using only cosines:

s(t) = \frac{a_{0}}{2} + \sum_{n=1}^{\infty} r_{n} \cos\left(\frac{2 \pi n t}{T} - \phi_{n}\right), \qquad r_{n} = \sqrt{a_{n}^{2} + b_{n}^{2}}, \qquad \phi_{n} = \arctan\left(\frac{b_{n}}{a_{n}}\right)
Using this formulation we now have magnitude coefficients r_{n} and phase coefficients \phi_{n}. That is, we are representing the original signal s(t) using a sum of sinusoids of different frequencies and phases.
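The equivalence of the two forms is easy to check numerically: a weighted sine plus cosine at a given frequency is a single cosine with magnitude r = sqrt(a² + b²) and phase φ = atan2(b, a). The particular weights and frequency below are arbitrary illustrations:

```python
import numpy as np

a, b = 1.5, 2.0                  # arbitrary cosine and sine weights
f = 3.0                          # arbitrary frequency in Hz
t = np.linspace(0, 1, 1000)

s1 = a * np.cos(2*np.pi*f*t) + b * np.sin(2*np.pi*f*t)

r = np.sqrt(a**2 + b**2)         # magnitude coefficient
phi = np.arctan2(b, a)           # phase coefficient
s2 = r * np.cos(2*np.pi*f*t - phi)

print(np.allclose(s1, s2))       # prints True: the two forms are identical
```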
Here is a web page that lets you play with how sines and cosines can be used to represent different signals: Fourier series visualization.
Here is another that lets you manipulate sines and cosines, play them as sounds, and manipulate the spectrum directly: Fourier series applet.
1.3 Fast Fourier transform (FFT)
Given a sampled signal, there is a very efficient computational algorithm called the Fast Fourier transform (FFT) for computing the magnitude and phase coefficients. We will not go into the details of this algorithm here; most high-level programming languages have a library that includes an FFT implementation.
Here is a video showing a 100-year-old mechanical computer that does both forward and inverse Fourier transforms:
1.4 Sampling and aliasing

Before we talk about the FFT and magnitude and phase coefficients, we need to talk about discrete versus continuous signals, and sampling. In theory we can derive a mathematical description of the Fourier decomposition of a continuous signal, as we have done above, in terms of an infinite number of sinusoids. In practice, however, signals are not continuous but are sampled at some discrete sampling rate.
For example, when we use Optotrak to record the position of the fingertip during pointing experiments, we choose a sampling rate of 200 Hz. This means 200 times per second the measurement instrument samples and records the position of the fingertip. The interval between any two samples is 5 ms. It turns out that the sampling rate used has a specific effect on the number of frequencies used in a discrete Fourier representation of the recorded signals.
The Nyquist-Shannon sampling theorem states that a signal must be sampled at a rate which is at least twice that of its highest frequency component. If a signal contains power at frequencies higher than half the sampling rate, these high frequency components will appear in the sampled data at lower frequencies and will distort the recording. This is known as the problem of aliasing.
Let’s look at a concrete example that illustrates this concept. Assume we have a signal that we want to sample, and we choose a sampling rate of 4 Hz. This means we sample the signal every 250 ms. According to the Nyquist-Shannon theorem, the maximum frequency we can uniquely identify is half that, namely 2 Hz. This is called the Nyquist frequency. Let’s look at a plot and see why this is so.
In Figure 2 we see a solid blue line showing a 2 Hz signal, a magenta dashed line showing a 4 Hz signal, and a green dashed line showing an 8 Hz signal. Now imagine we sample these signals at 4 Hz, at the times indicated by the vertical red lines. Notice that at the sample points (vertical red lines), the 2 Hz, 4 Hz and 8 Hz signals take identical values. This means that on the basis of our 4 Hz samples, we cannot distinguish between frequencies of 2, 4 and 8 Hz. Moreover, if the signal we are sampling has significant power at frequencies above the Nyquist (2 Hz), then that power will contaminate our estimates of the magnitude coefficients at frequencies below the Nyquist; in other words, the high-frequency power will be aliased into the lower-frequency estimates.
Signal aliasing.
Figure 3 shows another example, taken from the Wikipedia article on aliasing. Here we have two sinusoids, one at 0.1 Hz (blue) and another at 0.9 Hz (red). We sample both at a rate of 1 Hz (vertical green lines). You can see that at the sample points, the 0.1 Hz and 0.9 Hz sinusoids take identical values, so both would influence our estimate of the power at 0.1 Hz. Since the sampling rate is 1 Hz, the Nyquist frequency (the maximum frequency we can distinguish) is 0.5 Hz, and so any power in the signal above 0.5 Hz (such as at 0.9 Hz) will be aliased down into the lower frequencies (in this case into the 0.1 Hz band).
Signal aliasing sinusoids.
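We can verify this numerically. At integer sample times (a 1 Hz sampling rate), a 0.9 Hz cosine and a 0.1 Hz cosine take identical values, since 0.9 = 1 − 0.1 (cosines are used here so the aliased copy matches in sign as well):

```python
import numpy as np

fs = 1.0              # sampling rate: 1 Hz
n = np.arange(10)     # 10 sample times: 0, 1, ..., 9 seconds

low = np.cos(2 * np.pi * 0.1 * n / fs)   # 0.1 Hz, below the 0.5 Hz Nyquist
high = np.cos(2 * np.pi * 0.9 * n / fs)  # 0.9 Hz, above the Nyquist

print(np.allclose(low, high))  # prints True: 0.9 Hz is aliased onto 0.1 Hz
```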
So the message here is that before choosing your sampling rate, you should have some knowledge about the highest frequency that you (a) are interested in identifying and (b) believe is a real component of the signal (as opposed to random noise). When you have no a priori knowledge about the expected frequency content, one strategy is to remove high-frequency components before sampling. This is accomplished with analog low-pass filters, often called anti-aliasing filters, applied in hardware before digitization. Once the signal has been sampled, it is too late to perform anti-aliasing.
1.5 Spectrum
Having bypassed completely the computational details of how magnitude and phase coefficients are estimated, we will now talk about how to interpret them.
For a given signal, the collection of magnitude coefficients gives a description of the signal in terms of the strength of the various underlying frequency components. For our immediate purposes these magnitude coefficients will be most important to us and we can for the moment set aside the phase coefficients.
Here is an example of a magnitude spectrum for a pure 10 Hz signal, sampled at 100 Hz.
Magnitude spectrum for a pure 10 Hz signal.
The magnitude values are zero for every frequency except 10 Hz. We haven’t plotted the phase coefficients. The set of magnitude and phase coefficients derived from a Fourier analysis is a complete description of the underlying signal, with one caveat—only frequencies up to the Nyquist are represented. So the idea here is that one can go between the original time-domain representation of the signal and this frequency domain representation of the signal without losing information. As we shall see below in the section on filtering, we can perform operations in the frequency domain and then transform back into the time domain.
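This losslessness is easy to demonstrate: taking the FFT of a signal and then the inverse FFT recovers the original samples to within floating-point precision. A minimal sketch using NumPy's FFT routines:

```python
import numpy as np

fs = 100
t = np.arange(0, 1, 1/fs)
s = np.sin(2 * np.pi * 10 * t)          # pure 10 Hz signal sampled at 100 Hz

coefs = np.fft.rfft(s)                  # to the frequency domain...
s_back = np.fft.irfft(coefs, n=len(s))  # ...and back to the time domain

print(np.allclose(s, s_back))           # prints True: no information was lost
```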
1.5.1 Python code for Spectrum
Here is some Python code to illustrate these concepts. We construct a one-second signal, sampled at 1000 Hz, composed of 6 Hz, 10 Hz and 13 Hz components. We then use the scipy.fft.rfft() function from the SciPy package to compute the Fast Fourier transform, extract the magnitude information, compute the frequency range (up to the Nyquist), and plot the spectrum, shown below.
```python
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

fs = 1000  # sampling rate in Hz
t = np.arange(0, 1, 1/fs)  # 1 second sampled at 1000 Hz
y = np.sin(2*np.pi*t*6) + np.sin(2*np.pi*t*10) + np.sin(2*np.pi*t*13)  # 6 Hz, 10 Hz and 13 Hz components

fig, ax = plt.subplots(2, 1)
ax[0].plot(t, y)
ax[0].set_xlabel('Time (sec)')
ax[0].set_ylabel('Signal Amplitude')

out = sp.fft.rfft(y)  # compute the FFT (rfft is fft on real-valued data and is preferred for speed)
mag = np.abs(out)     # extract the magnitude information
freqs = sp.fft.rfftfreq(len(y), d=1/fs)  # compute the frequency range

ax[1].plot(freqs, mag)  # plot the magnitude spectrum up to the Nyquist
ax[1].set_xlim([0, 50])  # let's zoom in on the first 50 Hz
ax[1].set_xlabel('Frequency Component (Hz)')
ax[1].set_ylabel('Amplitude')
plt.tight_layout()
```
We can see that the spectrum has revealed peaks at 6, 10 and 13 Hz—which we know is correct, since we designed our signal from scratch.
Typically however signals in the real world that we record are not pure sinusoids, but contain random noise. Noise can originate from the actual underlying process that we are interested in measuring, and it can also originate from the instruments we use to measure the signal. For noisy signals, the FFT taken across the whole signal can be noisy as well, and can make it difficult to see peaks. This motivates the use of power spectral density estimation methods.
1.6 Power Spectral Density
One solution is, instead of performing the FFT on the entire signal at once, to split the signal into chunks, take the FFT of each chunk, and then average these spectra to obtain a smoother estimate. This is the idea behind Welch's method of power spectral density (PSD) estimation; in the SciPy package the function scipy.signal.welch() accomplishes this. We won’t go into the mathematical details or the theoretical considerations (relating to stochastic processes), but suffice it to say that in the presence of random noise, the PSD can often give you a better estimate of the power at different frequencies than a “plain” FFT.
1.6.1 Python code for power spectral density
To see why PSD estimation matters, consider a signal that looks more like real neural data. Real EEG, for example, has a characteristic 1/f background (more power at lower frequencies) with broad bumps at specific frequency bands—it does not consist of sharp sinusoidal peaks. A periodogram of such a signal is wildly jagged, and the broad spectral features that neuroscientists care about (alpha, beta, etc.) are completely hidden in the noise. The Welch PSD averages out that noise and reveals the underlying spectral shape.
Here we simulate 30 seconds of EEG-like data with a 1/f background, an alpha band (~8–12 Hz), and a beta band (~20–30 Hz):
```python
np.random.seed(42)
fs = 256       # typical EEG sampling rate
duration = 30  # 30 seconds of data
t_eeg = np.arange(0, duration, 1/fs)
N_eeg = len(t_eeg)

# 1/f (pink noise) background — common in neural signals
white = np.random.randn(N_eeg)
freqs_shape = np.fft.rfftfreq(N_eeg, d=1/fs)
freqs_shape[0] = 1  # avoid division by zero
spectrum_shaped = np.fft.rfft(white) / np.sqrt(freqs_shape)
pink = np.fft.irfft(spectrum_shaped, n=N_eeg)
pink = pink / np.std(pink) * 8  # scale to ~8 µV RMS

# Helper to create a narrowband signal (like a real EEG oscillation)
def make_band(t, fc, bw, fs, amp):
    """Filter white noise into a narrow frequency band."""
    n = np.random.randn(len(t))
    lo = max((fc - bw/2) / (fs/2), 0.002)
    hi = min((fc + bw/2) / (fs/2), 0.998)
    b, a = sp.signal.butter(3, [lo, hi], btype='bandpass')
    out = sp.signal.filtfilt(b, a, n)
    return amp * out / np.std(out)

alpha = make_band(t_eeg, 10, 4, fs, 5)  # alpha: 8–12 Hz, ~5 µV
beta = make_band(t_eeg, 25, 8, fs, 3)   # beta: 21–29 Hz, ~3 µV
eeg = pink + alpha + beta + np.random.randn(N_eeg) * 3

# --- Periodogram vs Welch ---
freqs_p, psd_p = sp.signal.periodogram(eeg, fs=fs)
freqs_w, psd_w = sp.signal.welch(eeg, fs=fs, nperseg=2*fs)  # 2-second segments

fig, axes = plt.subplots(3, 1, figsize=(8, 6))
axes[0].plot(t_eeg[t_eeg < 3], eeg[t_eeg < 3], linewidth=0.5)
axes[0].set_xlabel('Time (sec)')
axes[0].set_ylabel('Amplitude (µV)')
axes[0].set_title('Simulated EEG signal (30 sec recorded, first 3 sec shown)')
axes[1].semilogy(freqs_p[1:], psd_p[1:], linewidth=0.3)
axes[1].set_xlim([0, 60])
axes[1].set_xlabel('Frequency (Hz)')
axes[1].set_ylabel('µV²/Hz')
axes[1].set_title('Periodogram — noisy at every frequency, broad features invisible')
axes[2].semilogy(freqs_w[1:], psd_w[1:], linewidth=2, color='C1')
axes[2].set_xlim([0, 60])
axes[2].set_xlabel('Frequency (Hz)')
axes[2].set_ylabel('µV²/Hz')
axes[2].set_title('Welch PSD — 1/f slope, alpha (~10 Hz) and beta (~25 Hz) bumps clearly visible')

# Match y-axis limits so the panels are directly comparable
ymin = 5e-2
ymax = 2e+1
axes[1].set_ylim([ymin, ymax])
axes[2].set_ylim([ymin, ymax])
plt.tight_layout()
```
The periodogram (middle panel) is a jagged mess—you can see the general downward 1/f trend if you squint, but you cannot identify the alpha or beta bumps at all. Random noise spikes are just as tall as any real spectral feature. The Welch PSD (bottom panel) averages over many overlapping 2-second segments, producing a smooth curve where the 1/f background, the alpha peak around 10 Hz, and the beta peak around 25 Hz are all immediately visible.
Note
If you tried this with pure sinusoids instead of broadband signals, the periodogram would actually look fine—a sine wave concentrates all its power into a single sharp FFT bin, which towers above the noise. The Welch PSD really proves its worth on signals with broad spectral features, which is exactly what real biological data looks like.
1.6.2 Multitaper PSD
It turns out that Welch’s method, while good, can be improved upon. Welch’s method works by splitting the signal into overlapping segments, windowing each segment (typically with a Hann window), computing the FFT of each, and averaging. This averaging reduces variance compared to a raw FFT, but there is a fundamental trade-off: shorter segments give smoother spectra (more segments to average) but poorer frequency resolution (fewer frequency bins per segment). Additionally, any single window shape emphasizes some frequencies and suppresses others—a phenomenon called spectral leakage.
The Thomson multitaper method takes a different approach. Instead of splitting the signal into chunks and using one window per chunk, it keeps the full signal intact and applies multiple different windows (called tapers) to the same data. Each taper produces a slightly different spectral estimate, and these are averaged together.
The tapers used—called Discrete Prolate Spheroidal Sequences (DPSS) or Slepian sequences—are mathematically optimal: they maximize energy concentration within a specified frequency bandwidth. Because each taper is orthogonal to the others, each tapered estimate provides genuinely independent information about the spectrum. The result is a spectral estimate with lower variance and reduced leakage compared to Welch’s method, without sacrificing frequency resolution.
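The orthogonality claim can be checked directly with SciPy's DPSS implementation (the signal length and NW value below are arbitrary choices for illustration):

```python
import numpy as np
from scipy.signal import windows

N = 1000         # signal length in samples
NW = 4           # time-bandwidth product
K = 2 * NW - 1   # number of usable tapers (7)

tapers = windows.dpss(N, NW, Kmax=K)  # shape (K, N); unit-energy tapers

# Every pair of distinct tapers is orthogonal; each taper has unit energy,
# so the matrix of pairwise dot products is the identity
gram = tapers @ tapers.T
print(np.allclose(gram, np.eye(K), atol=1e-8))  # prints True
```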
The main parameter you choose is the time-bandwidth product (commonly called NW). A typical value is NW = 4, which gives 2×NW − 1 = 7 usable tapers. Higher NW means more tapers (lower variance) but broader frequency smoothing (lower resolution). In practice you control this via a bandwidth parameter in Hz rather than setting NW directly.
1.6.3 Python code for multitaper PSD
The MNE-Python package (install with pip install mne) provides a multitaper function that works just like scipy.signal.welch()—one line in, PSD out:
from mne.time_frequency import psd_array_multitaper
Here we compare Welch and multitaper on a signal with two close frequency components (10 Hz and 14 Hz, only 4 Hz apart) buried in noise. With only 2 seconds of data, Welch must use short segments and therefore has limited frequency resolution. The multitaper method uses the full signal with multiple tapers, giving it both good resolution and a smooth estimate:
```python
np.random.seed(42)
fs = 1000
t = np.arange(0, 2, 1/fs)  # 2 seconds of data
y = np.sin(2*np.pi*10*t) + 0.8*np.sin(2*np.pi*14*t)
yn = y + np.random.randn(len(t)) * 2

fig, ax = plt.subplots(2, 1, figsize=(8, 6))
ax[0].plot(t, yn, linewidth=0.5)
ax[0].set_xlabel('Time (sec)')
ax[0].set_ylabel('Amplitude')
ax[0].set_title('Noisy signal — 2 sec of data (10 Hz + 14 Hz, 4 Hz apart)')

# Welch PSD — short segments mean poor frequency resolution
freqs_welch, psd_welch = sp.signal.welch(yn, fs=fs, nperseg=250)

# Multitaper PSD — one line, uses the full 2-second signal
psd_mt, freqs_mt = psd_array_multitaper(yn, sfreq=fs, bandwidth=2.0,
                                        normalization='full', verbose=False)

ax[1].plot(freqs_welch, psd_welch, linewidth=2, label='Welch PSD (nperseg=250, 4 Hz resolution)')
ax[1].plot(freqs_mt, psd_mt, linewidth=2, label='Multitaper PSD (bandwidth=2 Hz)')
ax[1].set_xlim([0, 40])
ax[1].set_xlabel('Frequency (Hz)')
ax[1].set_ylabel('Power / Frequency (V²/Hz)')
ax[1].set_title('Welch vs. Multitaper — can you resolve the two peaks?')
ax[1].legend()
plt.tight_layout()
```
The Welch estimate (blue) sees only a single broad hump around 10–14 Hz—it cannot resolve the two peaks because its frequency resolution (4 Hz) is equal to the gap between them. The multitaper estimate (orange) clearly separates the 10 Hz and 14 Hz components, because it uses the full 2-second signal and therefore has finer frequency resolution (0.5 Hz bins, smoothed over a 2 Hz bandwidth).
The bandwidth parameter (in Hz) controls the frequency smoothing—it plays the same role as the time-bandwidth product NW (specifically, bandwidth = 2 × NW / T, where T is the signal duration in seconds). Setting normalization='full' ensures the output is in the same V²/Hz units as scipy.signal.welch(), so you can overlay them directly.
Tip
Under the hood: what psd_array_multitaper is doing
If you want to understand (or implement) multitaper yourself, here is what the function does step by step using only SciPy. You don’t need to run this—it produces the same result as the one-liner above.
```python
NW = 2          # time-bandwidth product
K = 2 * NW - 1  # number of tapers (3)
N = len(yn)

dpss_tapers, eigvals = sp.signal.windows.dpss(N, NW, Kmax=K, return_ratios=True)

tapered_spectra = np.zeros((K, N // 2 + 1))
for i in range(K):
    tapered_signal = yn * dpss_tapers[i]
    fft_result = sp.fft.rfft(tapered_signal)
    tapered_spectra[i] = np.abs(fft_result) ** 2

# Average with eigenvalue weighting and normalize to PSD units (V²/Hz)
weights = eigvals[:, np.newaxis]
psd_mt = 2 * np.sum(weights * tapered_spectra, axis=0) / (fs * np.sum(weights))
psd_mt[0] /= 2    # no doubling for DC
psd_mt[-1] /= 2   # no doubling for Nyquist
freqs_mt = sp.fft.rfftfreq(N, d=1/fs)
```
Each DPSS taper is a window function that is orthogonal to the others and maximally concentrates energy within the chosen bandwidth. Multiplying the signal by each taper, taking the FFT, and averaging gives a smooth spectral estimate without splitting the data into short segments.
1.7 Decibel scale
The decibel (dB) is a logarithmic unit used to describe a ratio. It is commonly used to measure sound level, but is also widely used in electronics and signal processing. You will often see power spectra displayed in units of decibels.
1.7.1 Amplitude versus Power
Before we define dB, it’s important to understand the distinction between amplitude and power, since this determines which dB formula you use.
Amplitude is what you measure directly with your instrument: a voltage from an EEG electrode, a pressure level from a microphone, or a displacement from a position sensor. It is the “raw” signal value at any point in time.
Power is proportional to the square of amplitude. For an electrical signal, the instantaneous power dissipated by a resistance R is P = V^{2}/R. For sound, intensity (power per unit area) is proportional to the square of pressure. Physically, power captures the energy being delivered per unit time; squaring the amplitude reflects the fact that pushing twice as hard (doubling the amplitude) requires four times the energy (quadrupling the power).
When you compute an FFT, np.abs(fft_result) gives you the amplitude (magnitude) of each frequency component. When you square those magnitudes (or use scipy.signal.welch(), which returns power spectral density), you are working in power units.
1.7.2 The dB formulas
The difference in decibels between two power levels P_{1} and P_{2} is defined to be:

\mathrm{dB} = 10 \times \mathrm{log}_{10}\left(\frac{P_{2}}{P_{1}}\right)
Thus when P_{2} is twice as large as P_{1}, then the difference is about 3 dB. When P_{2} is 10 times as large as P_{1}, the difference is 10 dB. A 100 times difference is 20 dB.
For amplitude (voltage, pressure) ratios, the formula uses a factor of 20 instead of 10:

\mathrm{dB} = 20 \times \mathrm{log}_{10}\left(\frac{A_{2}}{A_{1}}\right)
These two formulas are consistent with each other because power is proportional to amplitude squared: 10 \times \mathrm{log}_{10}(A^{2}_{2}/A^{2}_{1}) = 20 \times \mathrm{log}_{10}(A_{2}/A_{1}). Either way, doubling the amplitude corresponds to a ~6 dB increase, while doubling the power corresponds to a ~3 dB increase.
In practice: when converting a Welch PSD to decibels, use 10 * np.log10(psd) because Welch returns power. If you are working with raw FFT magnitudes (amplitudes) and want dB, use 20 * np.log10(mag). Mixing these up is a common source of confusion.
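A quick numeric check of these rules:

```python
import numpy as np

# Doubling the power is ~3 dB
print(10 * np.log10(2))     # 3.0102999...

# A 10x power ratio is exactly 10 dB; a 100x ratio is 20 dB
print(10 * np.log10(10))    # 10.0
print(10 * np.log10(100))   # 20.0

# Doubling the amplitude quadruples the power: both formulas agree at ~6 dB
print(20 * np.log10(2))     # 6.0205999...
print(10 * np.log10(2**2))  # 6.0205999...
```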
1.7.3 Which convention is used where?
Different subfields tend to use amplitude or power by convention. A few examples relevant to neuroscience and psychology:
EEG/MEG: power is dominant. “Alpha power” means squared amplitude in the 8–12 Hz band, reported in µV²/Hz. This is because the questions are usually about energy in a frequency band. Use 10 * np.log10().
EMG: power spectra for frequency-domain analyses (e.g. median frequency as a fatigue metric), but root-mean-square (RMS) amplitude in volts for time-domain analyses.
Acoustics: sound pressure level (SPL) is an amplitude measure—defined as 20 \times \mathrm{log}_{10}(p/p_{ref}) dB where p_{ref} = 20 µPa. Audiograms use this convention.
Kinematics / motion capture: position, velocity, and acceleration are amplitude measures (mm, mm/s, mm/s²), but clinical tremor analysis often uses power spectral density to quantify energy in the tremor band.
fMRI: the BOLD signal is reported as percent signal change (amplitude). Frequency-domain fMRI measures like ALFF (amplitude of low-frequency fluctuations) also use the amplitude convention despite involving FFTs.
The quick check: if the y-axis units are squared (µV², Pa², etc.), it’s power—use 10 * log10(). If not (µV, Pa, mm/s), it’s amplitude—use 20 * log10().
An advantage of using the dB scale is that it is easier to see small signal components in the presence of large ones. In other words large components don’t visually swamp small ones.
Since the dB scale is a ratio scale, to compute absolute levels one needs a reference—a zero point. In acoustics this reference is usually 20 micropascals—about the limit of sensitivity of the human ear.
For our purposes in the absence of a meaningful reference we can use 1.0 as the reference (i.e. as P_{1} in the above equation).
1.8 Spectrogram
Often there are times when you may want to examine how the power spectrum of a signal (in other words its frequency content) changes over time. In speech acoustics for example, at certain frequencies, bands of energy called formants may be identified, and are associated with certain speech sounds like vowels and vowel transitions. It is thought that the neural systems for human speech recognition are tuned for identification of these formants.
Essentially a spectrogram is a way to visualize a series of power spectra computed from slices of a signal over time. Imagine a series of single power spectra (frequency versus power) repeated over time and stacked next to each other over a time axis.
Here we generate a 100 Hz signal within noise, sampled at 2000 Hz, and we add a transient “chirp” of a 400 Hz signal partway through.
```python
dt = 0.0005
time = np.arange(0.0, 20.0, dt)
s1 = np.sin(2 * np.pi * 100 * time)
s2 = 2 * np.sin(2 * np.pi * 400 * time)
s2[time <= 10] = 0  # zero out before 10 sec
s2[time >= 12] = 0  # zero out after 12 sec (chirp only between 10-12 sec)
nse = 0.01 * np.random.random(size=len(time))  # add some noise into the mix
x = s1 + s2 + nse  # the signal

fig, ax = plt.subplots(2, 1, figsize=(6, 6))
ax[0].plot(time, x)
ax[0].set_xlabel('Time [sec]')
ax[0].set_ylabel('Signal Amplitude')

f, t_spec, Sxx = sp.signal.spectrogram(x, fs=1/dt, nfft=1024)
ax[1].pcolormesh(t_spec, f, Sxx)
ax[1].set_xlabel('Time [sec]')
ax[1].set_ylabel('Frequency [Hz]')
plt.tight_layout()
```
The Matplotlib package also has a function called matplotlib.pyplot.specgram() that will generate a spectrogram.
1.9 Filtering
The Fourier series representation and its computational implementation, the FFT and the PSD, are useful not only for determining what frequency components are present in a signal, but we can also perform operations within frequency space in order to manipulate the strength of different frequency components in the signal. This can be especially effective for eliminating noise sources with known frequency content.
Let’s look at a concrete example, a spectrum of a noisy signal with peaks at 10, 50 and 200 Hz.
In the Figure above we can see that the signal has three components: 10, 50 and 200 Hz. Let’s say we believe that the frequencies we are interested in are all below 100 Hz; frequencies above that are assumed to be noise of one sort or another. We can filter the signal so that all frequencies above 100 Hz are essentially zeroed out (or at least strongly attenuated). One way to do this is simply to take the vector of Fourier coefficients, set the values for all frequencies above 100 Hz to zero, and perform an inverse Fourier transform (the inverse of the FFT) to go back to the time domain. There are other ways to filter a signal as well, which we describe below.
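Here is a sketch of that zero-out-and-invert approach on a synthetic version of such a signal. This is a crude "brick-wall" filter; in practice the dedicated filter designs described below are usually better behaved:

```python
import numpy as np

fs = 1000
t = np.arange(0, 1, 1/fs)
y = (np.sin(2*np.pi*10*t) + np.sin(2*np.pi*50*t)
     + np.sin(2*np.pi*200*t))           # components at 10, 50 and 200 Hz

coefs = np.fft.rfft(y)                  # to the frequency domain
freqs = np.fft.rfftfreq(len(y), d=1/fs)
coefs[freqs > 100] = 0                  # zero out everything above 100 Hz
y_filt = np.fft.irfft(coefs, n=len(y))  # back to the time domain

mag = np.abs(np.fft.rfft(y_filt))
# 200 Hz component is gone; 10 Hz and 50 Hz components are intact
print(mag[200] < 1e-8, mag[10] > 400, mag[50] > 400)  # prints True True True
```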
Here is a short summary of different kinds of filters, and some terminology.
low-pass filters pass low frequencies without change, but attenuate (i.e. reduce) frequencies above the cutoff frequency
high-pass filters pass high frequencies and attenuate low frequencies, below the cutoff frequency
band-pass filters pass frequencies within a pass band frequency range and attenuate all others
band-stop filters (sometimes called band-reject filters or notch filters) attenuate frequencies within the stop band and pass all others
1.9.1 Characterizing filter performance
A useful way of characterizing a filter’s performance is in terms of the ratio of the amplitude of the output to the input (the amplitude ratio AR or gain), and the phase shift (\phi) between the input and output, as functions of frequency. A plot of the amplitude ratio and phase shift against frequency is called a Bode plot.
The pass band of a filter is the range of frequencies over which signals pass essentially unchanged. The stop band is the range of frequencies over which the filter attenuates signals. The cutoff frequency or corner frequency of a filter describes the transition point from the pass band to the stop band. Since this transition cannot occur instantaneously, the cutoff is usually defined as the point at which the filter output is 3 dB below the input level in the pass band. The cutoff frequency is sometimes called the -3 dB point or the half-power point, since -3 dB corresponds to half the signal power. The roll-off refers to the rate at which the filter attenuates the input above the cutoff point. When the roll-off is linear it can be specified as a slope, e.g. in terms of dB/decade or dB/octave (an octave is a doubling in frequency).
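The half-power claim is easy to verify in SciPy: at the cutoff frequency a Butterworth filter's gain is 1/√2 ≈ 0.707, i.e. −3.01 dB, i.e. half power since gain² = 0.5. The sampling rate and cutoff below are arbitrary:

```python
import numpy as np
from scipy import signal

fs = 1000
cutoff = 100  # Hz
b, a = signal.butter(2, cutoff / (fs/2), btype='lowpass')

# Evaluate the frequency response exactly at the cutoff (in rad/sample)
w, h = signal.freqz(b, a, worN=[2 * np.pi * cutoff / fs])
gain = np.abs(h[0])

print(gain)                 # ~0.7071 (1/sqrt(2))
print(20 * np.log10(gain))  # ~-3.01 dB
print(gain**2)              # ~0.5, i.e. half power
```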
Let’s look at some examples of filter characteristics.
Spectrum of three filtered versions of a noisy signal with peaks at 6, 10 and 13 Hz.
In the Figure above the blue trace shows the power spectrum of the unfiltered signal. The red trace shows a low-pass filtered version of the signal with a cutoff frequency of 30 Hz, and the green trace a low-pass filtered version with a cutoff frequency of 130 Hz. Notice also that the roll-off of the 30 Hz low-pass is not as steep as that of the 130 Hz low-pass.
1.9.2 Python code for filtering
Here is a function in Python to do low-pass filtering using a Butterworth filter. Note that we use filtfilt rather than lfilter—filtfilt applies the filter forward and then backward, resulting in zero phase distortion. A single-pass filter (lfilter) would shift the signal in time, which is undesirable when you need to preserve the temporal alignment of events (e.g. stimulus onsets, movement onsets, etc).
```python
def plg_lowpass(y, samprate, cutoff, order=2):
    w = cutoff / (samprate / 2)  # Normalize the cutoff frequency
    b, a = sp.signal.butter(N=order, Wn=w, btype='lowpass')  # get the filter coefficients
    yf = sp.signal.filtfilt(b, a, y)  # perform the filtering
    return yf
```
One for high-pass filtering:
```python
def plg_highpass(y, samprate, cutoff, order=2):
    w = cutoff / (samprate / 2)  # Normalize the cutoff frequency
    b, a = sp.signal.butter(N=order, Wn=w, btype='highpass')  # get the filter coefficients
    yf = sp.signal.filtfilt(b, a, y)  # perform the filtering
    return yf
```
One for band-pass filtering and band-stop filtering:
```python
def plg_bandpass(y, samprate, cutoffs, order=2):
    cutoffs = np.array(cutoffs)
    w = cutoffs / (samprate / 2)  # Normalize the cutoff frequencies
    b, a = sp.signal.butter(N=order, Wn=w, btype='bandpass')  # get the filter coefficients
    yf = sp.signal.filtfilt(b, a, y)  # perform the filtering
    return yf
```
```python
def plg_bandstop(y, samprate, cutoffs, order=2):
    cutoffs = np.array(cutoffs)
    w = cutoffs / (samprate / 2)  # Normalize the cutoff frequencies
    b, a = sp.signal.butter(N=order, Wn=w, btype='bandstop')  # get the filter coefficients
    yf = sp.signal.filtfilt(b, a, y)  # perform the filtering
    return yf
```
You can download a Python file containing all 4 filter functions above here: plg_filters.py
Here we demo the four types of filters by using them to filter broadband noise:
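The demos use variables y, fs and t that are not defined in the snippets themselves. A minimal setup might look like this; the sampling rate and duration are assumptions for illustration:

```python
import numpy as np

fs = 1000                    # assumed sampling rate in Hz
t = np.arange(0, 2, 1/fs)    # 2 seconds of samples
y = np.random.randn(len(t))  # broadband (white) Gaussian noise
```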
```python
yf = plg_lowpass(y, fs, 200)  # low-pass filter at 200 Hz
freqs, psd = sp.signal.welch(yf, fs=fs, nperseg=100)
fig, ax = plt.subplots(2, 1, figsize=(6, 6))
ax[0].plot(t, yf)
ax[0].set_xlabel('Time (s)')
ax[0].set_ylabel('Amplitude')
ax[0].set_title('Low-passed at 200 Hz')
ax[1].plot(freqs, psd)
ax[1].set_xlabel('Frequency (Hz)')
ax[1].set_ylabel('Power Spectral Density')
ax[1].set_title('Power spectrum of low-passed noise')
ax[1].grid()
plt.tight_layout()
```
```python
yf = plg_highpass(y, fs, 200)  # high-pass filter at 200 Hz
freqs, psd = sp.signal.welch(yf, fs=fs, nperseg=100)
fig, ax = plt.subplots(2, 1, figsize=(6, 6))
ax[0].plot(t, yf)
ax[0].set_xlabel('Time (s)')
ax[0].set_ylabel('Amplitude')
ax[0].set_title('High-passed at 200 Hz')
ax[1].plot(freqs, psd)
ax[1].set_xlabel('Frequency (Hz)')
ax[1].set_ylabel('Power Spectral Density')
ax[1].set_title('Power spectrum of high-passed noise')
ax[1].grid()
plt.tight_layout()
```
```python
yf = plg_bandpass(y, fs, [200, 300])  # band-pass filter at 200-300 Hz
freqs, psd = sp.signal.welch(yf, fs=fs, nperseg=100)
fig, ax = plt.subplots(2, 1, figsize=(6, 6))
ax[0].plot(t, yf)
ax[0].set_xlabel('Time (s)')
ax[0].set_ylabel('Amplitude')
ax[0].set_title('Band-passed at 200-300 Hz')
ax[1].plot(freqs, psd)
ax[1].set_xlabel('Frequency (Hz)')
ax[1].set_ylabel('Power Spectral Density')
ax[1].set_title('Power spectrum of band-passed noise')
ax[1].grid()
plt.tight_layout()
```
```python
yf = plg_bandstop(y, fs, [200, 300])  # band-stop filter at 200-300 Hz
freqs, psd = sp.signal.welch(yf, fs=fs, nperseg=100)
fig, ax = plt.subplots(2, 1, figsize=(6, 6))
ax[0].plot(t, yf)
ax[0].set_xlabel('Time (s)')
ax[0].set_ylabel('Amplitude')
ax[0].set_title('Band-stopped at 200-300 Hz')
ax[1].plot(freqs, 10*np.log10(psd))
ax[1].set_xlabel('Frequency (Hz)')
ax[1].set_ylabel('Power Spectral Density (dB)')
ax[1].set_title('Power spectrum of band-stopped noise (dB scale)')
ax[1].grid()
plt.tight_layout()
```
Note: filtfilt can occasionally produce small artifacts at the very beginning and end of a signal, especially for short signals or signals with transients at the edges. For most practical purposes this is not a problem, but it’s worth being aware of.
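If those edge effects matter for your data, filtfilt exposes padtype and padlen arguments that control how the signal is extended before filtering. A small sketch (the test signal and the padlen value here are just illustrative):

```python
import numpy as np
import scipy as sp
import scipy.signal

rng = np.random.default_rng(1)
fs = 1000
t = np.arange(0, 1, 1/fs)
y = np.sin(2*np.pi*5*t) + 0.1*rng.standard_normal(t.size)

b, a = sp.signal.butter(N=2, Wn=20/(fs/2), btype='low')

yf_default = sp.signal.filtfilt(b, a, y)                             # default odd-reflection padding
yf_padded = sp.signal.filtfilt(b, a, y, padtype='even', padlen=200)  # longer, even-reflection pad
```

Comparing the first and last few samples of the two outputs shows how the padding choice changes behaviour near the edges while leaving the middle of the signal essentially identical.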
1.9.3 Common Filters
There are many different designs of filters, each with their own characteristics (gain, phase and delay characteristics). Some common types:
Butterworth Filters have frequency responses which are maximally flat and have a monotonic roll-off. They are well behaved and this makes them very popular choices for simple filtering applications. For example in my work I use them exclusively for filtering physiological signals.
Tschebyschev Filters provide a steeper monotonic roll-off, but at the expense of some ripple (oscillatory noise) in the pass-band.
Cauer Filters provide a sharper roll-off still, but at the expense of ripple in both the pass-band and the stop-band, and reduced stop-band attenuation.
Bessel Filters have a phase-shift which is linear with frequency in the pass-band. This corresponds to a pure delay and so Bessel filters preserve the shape of the signal quite well. The roll-off is monotonic and approaches the same slope as the Butterworth and Tschebyschev filters at high frequencies although it has a more gentle roll-off near the corner frequency.
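One way to build intuition for these trade-offs is to compute the magnitude response of each family at a common design point. The sketch below uses a 4th-order design with a 100 Hz corner at a 1000 Hz sampling rate; the ripple (1 dB) and stop-band attenuation (40 dB) values are arbitrary choices for illustration:

```python
import numpy as np
import scipy as sp
import scipy.signal

fs = 1000
wn = 100 / (fs/2)   # normalized 100 Hz corner frequency

designs = {
    'Butterworth': sp.signal.butter(4, wn),
    'Tschebyschev': sp.signal.cheby1(4, 1, wn),         # 1 dB pass-band ripple
    'Cauer (elliptic)': sp.signal.ellip(4, 1, 40, wn),  # 1 dB ripple, 40 dB stop-band
    'Bessel': sp.signal.bessel(4, wn),
}

responses = {}
for name, (b, a) in designs.items():
    w, h = sp.signal.freqz(b, a, fs=fs)
    responses[name] = (w, 20*np.log10(np.abs(h) + 1e-12))  # gain in dB
# plotting w against the dB gains shows the roll-off and ripple differences
```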
1.9.4 Filter order
In filter design, the order of a filter is one characteristic you might come across. Technically, the order is the highest exponent in the z-domain transfer function of a digital filter. That's helpful, isn't it? (not) Another way of describing filter order is as the degree of the approximating polynomial for the filter. Yet another way of describing it: increasing the filter order steepens the roll-off and brings the filter closer to the ideal response (i.e. a "brick wall" roll-off).
Practically speaking, you will find that a second-order Butterworth filter provides a nice sharp roll-off without too many undesirable side-effects (e.g. large time lag, ripple in the pass-band, etc.).
See this section of the wikipedia page on low-pass filters for another description.
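To see the effect of order concretely, we can compute the attenuation of Butterworth low-pass filters of increasing order at one octave above the cutoff (the 100 Hz cutoff and 1000 Hz sampling rate are illustrative choices):

```python
import numpy as np
import scipy as sp
import scipy.signal

fs = 1000
attenuation_db = {}
for order in (1, 2, 4, 8):
    b, a = sp.signal.butter(N=order, Wn=100/(fs/2), btype='low')
    w, h = sp.signal.freqz(b, a, fs=fs)
    i = int(np.argmin(np.abs(w - 200)))            # one octave above the 100 Hz cutoff
    attenuation_db[order] = 20*np.log10(np.abs(h[i]))
# each doubling of the order roughly doubles the dB attenuation per octave
```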
1.9.5 High-frequency noise and taking derivatives
One of the characteristics of just about any experimental measurement is that the signal that you measure with your instrument will contain a combination of true signal and “noise” (random variations in the signal). A common approach is to take many measurements and average them together. This is what is commonly done in EEG/ERP studies, in EMG studies, with spike-triggered averaging, and many others. The idea is that if the “real” part of the signal is constant over trials, and the “noise” part of the signal is random from trial to trial, then averaging over many trials will average out the noise (which is sometimes positive, sometimes negative, but on balance, zero) and what remains will be the true signal.
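This averaging logic is easy to verify numerically: with simulated trials whose noise has standard deviation 1, averaging N trials shrinks the residual noise by roughly 1/sqrt(N). (The signal below is a made-up sinusoid, not real data.)

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000
t = np.arange(0, 1, 1/fs)
true_signal = np.sin(2*np.pi*5*t)     # the "real" part, constant over trials
n_trials = 100
# each trial = signal + independent Gaussian noise with standard deviation 1
trials = true_signal + rng.standard_normal((n_trials, t.size))

average = trials.mean(axis=0)
single_trial_sd = (trials[0] - true_signal).std()   # about 1
residual_sd = (average - true_signal).std()         # about 1/sqrt(100) = 0.1
```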
You can imagine however that there are downsides to this approach. First of all, it requires that many, many measures be taken so that averages can be computed. Second, there is no guarantee that the underlying “true” signal will in fact remain constant over those many measurements. Third, one cannot easily do analyses on single trials, since we have to wait for the average before we can look at the data.
One solution is to use signal processing techniques such as filtering to separate the noise from the signal. A limitation of this technique however is that when we apply a filter (for example a low-pass filter), we filter out all power in the signal above the cutoff frequency—whether “real” signal or noise. This approach thus assumes that we are fairly certain that the power above our cutoff is of no interest to us.
One salient reason to low-pass filter a signal, and remove high-frequency noise, is for cases in which we are interested in taking the temporal derivative of a signal. For example, let’s say we have recorded the position of the fingertip as a subject reaches from a start position on a tabletop, to a target located in front of them on a computer screen. Using a device like Optotrak we can record the (x,y,z) coordinates of the fingertip at a sampling rate of 200 Hz. Figure 11 shows an example of such a recording.
Sample 3D movement data recorded from Optotrak at 200 Hz.
In the Figure above, the top panel shows position in one coordinate over time. The middle panel shows the result of taking the derivative of the position signal to obtain velocity. I have simply used the NumPy np.diff() function here to obtain a numerical estimate of the derivative, taking the forward difference. Note how much noisier it looks than the position signal. Finally, the bottom panel shows the result of taking the derivative of the velocity signal to obtain acceleration. It is so noisy that one cannot even see the peaks in the acceleration signal; they are completely masked by noise.
What is happening here is that small amounts of noise in the position signal are amplified each time a derivative is taken. One solution is to low-pass filter the position signal. The choice of the cutoff frequency is key: too low and we will distort the signal itself; too high and we will not remove enough of the high-frequency noise. It happens that in this case we are fairly certain there isn't much real signal power above 12 Hz for arm movements. The Figure below shows what it looks like when we low-pass filter the position signal at a 12 Hz cutoff frequency.
Sample 3D movement data recorded from Optotrak at 200 Hz, low-pass filtered using a 12 Hz cutoff frequency.
What you can see in the Figure above is that for the position over time, the filtered version (shown in red) doesn’t differ that much, at least not visibly, from the unfiltered version (in blue). The velocity and acceleration traces however look vastly different. Differentiating the filtered position signal yields a velocity trace (shown in red in the middle panel) that is way less noisy than the original version. Taking the derivative again of this new velocity signal yields an acceleration signal (shown in red in the bottom panel) that is actually usable. The original version (shown in blue) is so noisy it overwhelms the entire panel. Note the scale change on the ordinate.
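The filter-then-differentiate recipe can be sketched on simulated position data (the 1 Hz "movement" and the noise level below are made up for illustration; note that np.diff must be scaled by the sampling rate to give units per second):

```python
import numpy as np
import scipy as sp
import scipy.signal

rng = np.random.default_rng(0)
fs = 200                                   # Optotrak-like sampling rate (Hz)
t = np.arange(0, 2, 1/fs)
# made-up 1 Hz "reach" plus a small amount of measurement noise
pos = np.sin(2*np.pi*1*t) + 0.005*rng.standard_normal(t.size)

vel_raw = np.diff(pos) * fs                # forward difference, scaled to units/s

b, a = sp.signal.butter(N=2, Wn=12/(fs/2), btype='low')
pos_filt = sp.signal.filtfilt(b, a, pos)   # low-pass the position at 12 Hz first...
vel_filt = np.diff(pos_filt) * fs          # ...then differentiate
```

The raw velocity is dominated by amplified noise, while the velocity computed from the filtered position recovers a smooth trace.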
1.10 Quantization
Converting an analog signal to a digital form involves the quantization of the analog signal. In this procedure the range of the input variable is divided into a set of class intervals. Quantization involves the replacement of each value of the input variable by the nearest class interval centre.
Another way of saying this is that when sampling an analog signal and converting it to digital values, one is limited by the precision with which the (analog) signal can be represented digitally. Usually a piece of hardware called an analog-to-digital (A/D) board performs this conversion. The resolution of an A/D board is usually specified in terms of bits. For example, a 12-bit A/D board is capable of specifying 2^{12}=4096 unique values. This means that a continuous signal will be represented using only 4096 possible values. A 16-bit A/D board would be capable of using 2^{16}=65,536 different values. Obviously the higher the better, in terms of the resolution of the underlying digital representation. In practice, however, higher resolutions often come at the expense of lower sampling rates.
As an example, let’s look at a continuous signal and its digital representation using a variety of (low) sample resolutions. The Figure below shows a range of sample resolutions.
A continuous signal quantized at a variety of (low) sample resolutions.
Here we see as the number of possible unique values increases, the digital representation of the underlying continuous signal gets more and more accurate. Also notice that in general, quantization adds noise to the representation of the signal.
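Quantization itself is simple to simulate: divide the input range into 2^bits class intervals and replace each sample by its interval centre. A sketch (the helper function quantize is ours, not a library routine):

```python
import numpy as np

t = np.linspace(0, 1, 1000)
y = np.sin(2*np.pi*2*t)   # "continuous" signal spanning -1 to +1

def quantize(y, bits, vmin=-1.0, vmax=1.0):
    """Replace each sample by the nearest of 2**bits class-interval centres."""
    levels = 2**bits
    step = (vmax - vmin) / levels
    idx = np.clip(np.floor((y - vmin) / step), 0, levels - 1)
    return vmin + (idx + 0.5) * step

y3 = quantize(y, 3)    # only 8 possible values: visibly "steppy"
y12 = quantize(y, 12)  # 4096 values: nearly indistinguishable from y
```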
It is also important to consider the amplitude of the sampled signal relative to the input range of the A/D board. If the signal you are sampling has a very small amplitude compared to that range, then your samples will occupy only a small subset of the possible values dictated by the board's resolution, and the effects of quantization will be greatly magnified.
For example, let’s say you are using an A/D board with 12 bits of resolution and an input range of +/- 5 Volts. This means that you have 2^{12}=4096 possible values with which to characterize a signal that ranges maximally over 10 Volts. If your signal is very small compared to this range, e.g. if it only occupies 25 millivolts, then the A/D board is only capable of using 0.025/10*4096 \approx 10 (ten) unique values to characterize your signal! The resulting digitized characterization of your signal will not be very smooth.
Whenever possible, amplify your signal to occupy the maximum range of the A/D board you’re using. Of course the trick is always to amplify the signal without also amplifying the noise!
1.11 Sources of noise
It is useful to list a number of common sources of noise in physiological signals:
Extraneous Signal Noise arises when a recording device records more than one signal—i.e. signals in addition to the one you as an experimenter are interested in. It's up to you to decide which is signal and which is noise. For example, electrodes placed on the chest will record both ECG and EMG activity from respiratory muscles. A cardiologist might consider the ECG to be signal and the EMG to be noise, while a respiratory physiologist might consider the reverse.
1/f Noise: Devices with a DC response sometimes show a low frequency trend appearing on their output even though the inputs don’t change. EEG systems and EOG systems often show this behaviour. Fourier analyses show that the amplitude of this noise increases as frequency decreases.
Power or 60 Hz Noise is interference from 60 Hz AC electrical power signals. This is one of the most common noise sources that experimental neurophysiologists have to deal with. Often we find, for example, on hot days when the air conditioning in the building is running, we see much more 60 Hz noise in our EMG signals than on other days. Some neurophysiologists like to do their recordings late at night or on weekends when there is minimal activity on the electrical system in their building.
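A common remedy is a narrow band-stop (notch) filter centred on the line frequency, for example using scipy's iirnotch design (the EMG stand-in signal and the Q value below are illustrative):

```python
import numpy as np
import scipy as sp
import scipy.signal

rng = np.random.default_rng(0)
fs = 1000
t = np.arange(0, 2, 1/fs)
emg = rng.standard_normal(t.size)               # stand-in for a broadband EMG signal
contaminated = emg + 2*np.sin(2*np.pi*60*t)     # add strong 60 Hz line noise

b, a = sp.signal.iirnotch(w0=60, Q=30, fs=fs)   # narrow notch centred on 60 Hz
cleaned = sp.signal.filtfilt(b, a, contaminated)
```

Comparing the power spectra of contaminated and cleaned (e.g. with sp.signal.welch) shows the 60 Hz peak removed while the rest of the spectrum is left largely intact.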
Movement Artifacts are transient noise caused by physical movement of electrodes, cables, or the subject. These are especially common in EEG and EMG recordings. A sudden head movement, for example, can produce a large voltage transient that has nothing to do with brain activity.
Thermal Noise arises from the thermal motion of electrons in conductors; it is always present and sets the theoretical minimum noise level for a device. Thermal noise is white: it has a flat frequency content, with equal power across all frequencies. Its amplitude follows a Gaussian probability distribution.