MATLAB is an
extremely powerful tool for dealing with sounds. You can use MATLAB
for sound synthesis as well as for sound analysis. It is possible
to create your own custom sound from scratch, and it is also
possible to edit at will an existing sound. Furthermore, MATLAB can
be used to understand several acoustical characteristics of digital
sounds. This chapter shows how to do all of these.
Generate a Sound
First, let’s generate a white noise (i.e., a
random succession of amplitude values) having a duration of 1 s and
a sample rate of 44,100 Hz and then play it.1
Listing 5.1
![A978-1-4614-2197-9_5_Figa_HTML.gif](A978-1-4614-2197-9_5_Figa_HTML.gif)
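Listing 5.1 is reproduced above only as an image. Based on the line-by-line description that follows, a sketch of the listing (line numbering and variable names assumed) might read:

```matlab
sr = 44100;                   % line 1: sample rate (Hz)
d = 1;                        % line 2: duration (s)
noise = rand(1, round(sr*d)); % line 3: random values in the 0/+1 range
noise = noise*2;              % line 4: expand to the 0/+2 range
noise = noise-1;              % line 5: translate to the -1/+1 range
sound(noise, sr)              % play the noise
```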
Lines 1 and 2 set the sample rate and the sound
duration. Line 3 implements the vector noise; to the rand() function we
have passed the product of the sample rate and the duration of the
sound in seconds. This operation will be repeated several times in this chapter; the product gives the number of samples the digital sound needs at the desired sample rate. To prevent MATLAB warnings such as
“Warning: Size
vector should be a row vector with integer elements”, it may
be convenient to round the result of this multiplication. Lines 4
and 5 expand and translate the amplitude values of the white noise
within the −1/+1 range. We need to do this because MATLAB wants
sounds to range within the −1/+1 limits.
The next thing we may wish to do is to synthesize a pure tone, i.e., a sinusoidal tone. We now synthesize a pure tone of 1,000 Hz and 1 s duration (a must in psychoacoustics).
Listing 5.2
![A978-1-4614-2197-9_5_Figb_HTML.gif](A978-1-4614-2197-9_5_Figb_HTML.gif)
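Listing 5.2 appears above only as an image. A sketch consistent with the surrounding description (names assumed) might be:

```matlab
sr = 44100;               % sample rate (Hz)
f = 1000;                 % frequency of the tone (Hz)
d = 1;                    % duration (s)
t = linspace(0, d, sr*d); % time vector from 0 to d
tone = sin(2*pi*f*t);
sound(tone, sr)
```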
In Listing 5.2, we calculate the sine of an
argument that completes f cycles (i.e.,
2*pi) in
time t.
The time t has been represented digitally with an array starting at 0 and ending at the sound’s overall duration, sampled at the desired rate (44,100 Hz). With these few command lines, we can create
different pure tones by changing f and d.
Now let’s play the tone twice in rapid succession using the sound command.
>> sound(tone, sr)
>> sound(tone, sr)
If your fingers were fast enough, you heard a single unpleasant tone rather than two tones in succession. This is because the sound function works asynchronously; in other words, the second tone starts playing while the first tone is still playing.
There are two ways to avoid this. The first is to use wavplay()2, which lets you decide whether to play synchronously (‘sync’ parameter) or asynchronously (‘async’ parameter). Use the function in the following way to play the tones synchronously (i.e., the second tone is played only after the first tone has ended).
wavplay(tone, sr, ‘sync’);
wavplay(tone, sr);
If you pass the parameter ‘async’ instead of
‘sync’,
you will get the same unpleasant output resulting from the use of
sound().
Another way to avoid the asynchrony problem is to use the sound
functions included in PsychToolbox. These functions will be
described in the chapters dedicated to this toolbox.
The tone we have created can be saved in your
computer as a wav file by means of the wavwrite() function.
wavwrite()
transforms the array into a wav sound file. Note that if you do not
want a clipped sound, the array’s values should range between −1
and +1. wavwrite() quantizes
the values of the given vector according to the number of bits you
are using (16 by default). This means that if you are working at 16 bits, the lowest value that can be written without clipping is −1 + 1/(2^nbits − 1). In the pure tone example, the sin() function returns values in the range −1 to +1, and therefore all −1s are clipped when the file is written. If you multiply the sound array by the largest magnitude that can be quantized without clipping, i.e., 1 − 1/(2^nbits − 1), then the sound amplitude will be compressed into a range that can be written to file without clipping.
>> wavwrite(tone, sr, ‘my_first_tone’)
Warning: Data clipped during write to file:my_first_tone
> In wavwrite>PCM_Quantize at 247
  In wavwrite>write_wavedat at 267
  In wavwrite at 112
MATLAB clips the lower values and gives a warning message. For common usage, the error is insignificant (a single quantization step). If you do not want the warning message, you can rescale the signal as follows:
>> nbits = 16;
>> tone = tone*(1-1/(2^nbits-1));
>> wavwrite(tone, sr, ‘my_first_tone’)
We are now ready for something more complex:
let’s build a harmonic tone that is the sum of two or more pure
tones having a harmonic relation. Let’s create a complex tone with
three harmonics and with a fundamental frequency of 250 Hz:
Listing 5.3
![A978-1-4614-2197-9_5_Figc_HTML.gif](A978-1-4614-2197-9_5_Figc_HTML.gif)
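Listing 5.3 appears above only as an image. A sketch consistent with the analysis that follows (line numbering will differ slightly from the references in the text; duration assumed) could be:

```matlab
sr = 44100;
f0 = 250;                 % fundamental frequency (Hz)
d = 1;                    % duration (s, assumed)
t = linspace(0, d, sr*d);
tone1 = sin(2*pi*f0*t);   % first harmonic
tone2 = sin(2*pi*2*f0*t); % second harmonic
tone3 = sin(2*pi*3*f0*t); % third harmonic
harmonic = tone1+tone2+tone3;
harmonic = harmonic/max(abs(harmonic)); % normalize within -1/+1
sound(harmonic, sr)
```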
In Listing 5.3 we have generated three sine waves
(lines 5, 6, and 7) and then added the simple tones together to
create a complex tone. Line 9 needs a bit of clarification. Since
the complex variable implemented in line 7 results from the sum of
three components, its amplitude exceeds the −1/+1 range. Therefore,
if we play the tone as it is, it will result in a distorted sound
(and MATLAB does not warn you about this distortion!). To avoid any
distortions, we need to normalize the sound, which is what line 8
does. The normalization is done by dividing the harmonic variable
by its maximum absolute value.
Another solution to the distortion problem is to
use the soundsc() function,
which automatically scales the sound’s amplitude within the −1/+1
range no matter how “large” the sound’s amplitude is. So lines 8
and 9 could have been replaced by the following code:
soundsc(harmonic,
sr);
We are now able to produce pure and complex tones
of any desired frequency and duration. By the same token, we can
now produce an inharmonic tone, i.e., a tone whose frequency
components are not in a harmonic relation.
Listing 5.4 generates an inharmonic tone:
Listing 5.4
![A978-1-4614-2197-9_5_Figd_HTML.gif](A978-1-4614-2197-9_5_Figd_HTML.gif)
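Listing 5.4 is shown above only as an image. A sketch of an inharmonic tone along the lines of the text (the component frequencies here are assumed; any non-harmonically related set will do) might be:

```matlab
sr = 44100;
d = 1;                    % duration (s, assumed)
t = linspace(0, d, sr*d);
% component frequencies with no harmonic relation (values assumed)
inharmonic = sin(2*pi*250*t)+sin(2*pi*573*t)+sin(2*pi*891*t);
soundsc(inharmonic, sr)   % soundsc rescales, so no normalization is needed
```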
In this example, we have solved the distortion
problem by using soundsc() instead of
normalizing the sound. Let’s now change the timbre of the complex
tone by changing its waveform so that the sound is, for example, a
sawtooth wave. In sawtooth waves, the amplitude of each successive
harmonic is half of the amplitude of the previous harmonic.
Therefore, we have to multiply the amplitudes of the successive harmonics by factors of 1/2, 1/4, 1/8, and so on. Let us start with the following code:
Listing 5.5
![A978-1-4614-2197-9_5_Fige_HTML.gif](A978-1-4614-2197-9_5_Fige_HTML.gif)
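Listing 5.5 appears above only as an image. A sketch with four harmonics whose amplitudes are successively halved (fundamental frequency assumed) might be:

```matlab
sr = 44100;
f0 = 250;                 % fundamental frequency (Hz, assumed)
d = 1;                    % duration (s, assumed)
t = linspace(0, d, sr*d);
complex = sin(2*pi*f0*t) + (1/2)*sin(2*pi*2*f0*t) + ...
          (1/4)*sin(2*pi*3*f0*t) + (1/8)*sin(2*pi*4*f0*t);
soundsc(complex, sr)
```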
If we plot the first 10 ms of the sound (i.e.,
the first 441 samples, since we are working at a sample rate of
44,100 Hz), we see that the waveform does not look like a sawtooth
wave (Fig. 5.1):
![A978-1-4614-2197-9_5_Fig1_HTML.gif](A978-1-4614-2197-9_5_Fig1_HTML.gif)
Fig. 5.1 A sawtooth wave with a limited number of harmonics
plot(complex(1:441))
This is because of the limited number of
harmonics we have used to generate the wave. However, if we make
the complex tone with, let’s say, 20 harmonics, then the waveform’s
shape will be more sawtooth-like. The following code listing shows
how to do it by means of a for loop:
Listing 5.6
![A978-1-4614-2197-9_5_Figf_HTML.gif](A978-1-4614-2197-9_5_Figf_HTML.gif)
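Listing 5.6 appears above only as an image. A sketch of the 20-harmonic version described in the text (names assumed) could be:

```matlab
sr = 44100;
f0 = 250;                 % fundamental frequency (Hz, assumed)
d = 1;
t = linspace(0, d, sr*d);
complex = zeros(1, length(t));
for i = 1:20
    % the amplitude of each successive harmonic is halved
    complex = complex + (1/2)^(i-1)*sin(2*pi*i*f0*t);
end
soundsc(complex, sr)
```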
On each iteration of the for loop, the amplitude of successive components is halved. If you now plot the
first 441 samples, you can check that the waveform is much smoother
than the previous one. Moreover, if we plot some of the single
components (the first three, for example), we can get an idea of
the summation process we have implemented. Listing 5.7 does the
job.
Listing 5.7
![A978-1-4614-2197-9_5_Figg_HTML.gif](A978-1-4614-2197-9_5_Figg_HTML.gif)
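Listing 5.7 appears above only as an image. A sketch that plots the first 10 ms (441 samples) of the first three components, one per subplot, might be:

```matlab
sr = 44100; f0 = 250; d = 1;      % values assumed, as in the earlier sketches
t = linspace(0, d, sr*d);
for i = 1:3
    subplot(3, 1, i)
    % i-th harmonic with its halved amplitude, first 10 ms only
    plot(t(1:441), (1/2)^(i-1)*sin(2*pi*i*f0*t(1:441)))
end
```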
Multiple Sounds
Sometimes it can be useful to create short
melodies, i.e., sequences of tones. In order to do this, we have to
concatenate the various arrays of the tones (and silences) we want
to put in our little tune. Listing 5.8 plays the notes middle C, D,
and E, followed by a pause of 1 s and then the note C again.
Listing 5.8
![A978-1-4614-2197-9_5_Figh_HTML.gif](A978-1-4614-2197-9_5_Figh_HTML.gif)
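Listing 5.8 appears above only as an image. A sketch of the little melody (note duration assumed; frequencies are the standard equal-tempered values for middle C, D, and E) might be:

```matlab
sr = 44100;
d = .5;                     % duration of each note (s, assumed)
t = linspace(0, d, sr*d);
C = sin(2*pi*261.6*t);      % middle C
D = sin(2*pi*293.7*t);
E = sin(2*pi*329.6*t);
silence = zeros(1, sr*1);   % 1 s pause
melody = [C, D, E, silence, C];
sound(melody, sr)
```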
Let’s now run a more “psychological” example: the pulsation threshold. The pulsation threshold (Houtgast 1972) is a case of
auditory continuity. The stimulus we need to synthesize is a
succession of brief noises and tones, for example a noise followed
by a tone, followed by a noise, and so on. If the intensity of the
tone is low in comparison with that of the noise, the tone is heard
as continuous, i.e., the tone is heard even within the noise (even
though there is no tone within the noise).
Let’s synthesize the tone and the noise:
>> sr = 44100;
>> f = 1000;
>> d = .2;
>> t = linspace(0, d, d*sr);
>> tone = sin(2*pi*f*t);
>> noise = (rand(1, sr*d)*2)-1;
We now have to reduce the amplitude level of the tone. Here, we first reduce the tone’s amplitude to 1/100 of its original amplitude (i.e., attenuate it by 40 dB) and then create the noise–tone sequence:
>> tone = tone*.01;
>> sequence = [noise, tone];
>> sequence = [sequence, sequence]; % sequence of 4 sounds
>> sequence = [sequence, sequence]; % sequence of 8 sounds
>> sequence = [sequence, sequence]; % sequence of 16 sounds
>> sequence = [sequence, sequence]; % sequence of 32 sounds
>> sound(sequence, sr)
You should be able to hear the tone as continuous
within the noise bands (although you should be well aware that
there is no tone within the noise bands!).
Let’s suppose you need to synthesize a number of
tones sounding together in time, but one (or more) of the tones
needs to have just a little onset asynchrony so that it can be
heard as popping out from the other tones sounding together (Darwin
and Ciocca 1992). Here we implement
this example with three tones and add a little onset asynchrony to
the second and third tones. First, we add three tones together so
that they begin simultaneously in time.
>> sr = 44100;
>> f1 = 300;
>> f2 = 550;
>> f3 = 640;
>> d = 5;
>> t = linspace(0, d, d*sr);
>> tone1 = sin(2*pi*f1*t);
>> tone2 = sin(2*pi*f2*t);
>> tone3 = sin(2*pi*f3*t);
>> complex = tone1+tone2+tone3;
>> soundsc(tone1+tone2+tone3, sr)
Now we add the temporal offset to the second and
third tones:
>> offset_dur = .5;
>> offset = zeros(1, sr*offset_dur);
>> tone1 = [tone1, offset, offset];
>> tone2 = [offset, tone2, offset];
>> tone3 = [offset, offset, tone3];
>> soundsc(tone1+tone2+tone3, sr)
If you listen to the sound, you may note that the
individual components are indistinguishable in the first example.
However, in the second example they can be easily distinguished.
This is because of the onset asynchrony of the tones.
Manipulating a Sound’s Level
With MATLAB, it is possible to manipulate
precisely the relative levels of sounds. In fact, the absolute
level of a sound depends on several factors such as the sound card
of your computer, the headphones, the loudspeakers, the amplifiers,
and so on. Digital synthesis works within a dynamic range determined by the number of bits, about 6 dB per bit (16 bits, i.e., about 96 dB, is a very common dynamic range for most PCs). In digital audio, 0 is arbitrarily set as the maximum level,
and all softer levels are represented with negative numbers. In
practice, a sound whose level is 0 dB is louder than a sound whose
level is −10 dB. For example, let us play a tone twice but the
second time 10 dB softer:
Listing 5.9
![A978-1-4614-2197-9_5_Figi_HTML.gif](A978-1-4614-2197-9_5_Figi_HTML.gif)
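Listing 5.9 appears above only as an image. A sketch that plays a tone twice, the second time 10 dB softer (frequency and duration assumed; wavplay with ‘sync’ keeps the two presentations from overlapping), might be:

```matlab
sr = 44100;
f = 1000; d = 1;                       % frequency and duration (assumed)
t = linspace(0, d, sr*d);
tone = sin(2*pi*f*t);
wavplay(tone, sr, 'sync')              % original level
wavplay(tone*10^(-10/20), sr, 'sync')  % the same tone, 10 dB softer
```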
When you are working with a sound’s amplitude,
you should use the sound() function
rather than the soundsc() function,
because soundsc() normalizes
the sound’s amplitude before playing the sound. Therefore, any
manipulation you have done on the sound’s amplitude is ineffective
if you use soundsc.
The same amplitude attenuation can be done with a
noise:
Listing 5.10
![A978-1-4614-2197-9_5_Figj_HTML.gif](A978-1-4614-2197-9_5_Figj_HTML.gif)
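Listing 5.10 appears above only as an image. A sketch of the same attenuation applied to a noise, with the second presentation 20 dB lower, might be:

```matlab
sr = 44100; d = 1;                     % duration assumed
noise = (rand(1, sr*d)*2)-1;
wavplay(noise, sr, 'sync')             % original level
wavplay(noise*10^(-20/20), sr, 'sync') % second noise, 20 dB lower
```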
In Listing 5.10, the level of the second noise is
20 dB lower than that of the first one.
Match the Level of Sound with Different Waveforms
In this section we show how to match the levels
of sounds of differing waveforms. Let’s suppose that we need to
play several environmental sounds during an experiment. Let’s also
suppose that we want these (very different) sounds to be perceived
as similarly loud. These sounds have different waveforms.
Therefore, the sound pressure of each sound will be different.
There are two options for matching the levels of sounds having
different waveforms. The first option is to normalize the amplitude
of the sounds. For example, let’s suppose that we have two sounds
stored in two arrays (s1 and s2) in our workspace.
To normalize the amplitude of the two sounds, we need to increase
their amplitudes so that the peak for both sounds is equal to one.
This can be done in this way:
>> s1 =
s1/max(abs(s1));
>> s2 =
s2/max(abs(s2));
A simpler alternative could be, once again, to use the soundsc() function, which normalizes the sounds’ amplitudes before playing them.
The second option is to match the two sounds
using the root mean square (RMS) amplitude. This process is often
more efficient than normalization, in particular when the sounds
are very different. Listing 5.11 shows how to do this.
Listing 5.11
![A978-1-4614-2197-9_5_Figk_HTML.gif](A978-1-4614-2197-9_5_Figk_HTML.gif)
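Listing 5.11 appears above only as an image. A sketch consistent with the analysis that follows (line numbering assumed; s1 and s2 are the two sounds already in the workspace) might be:

```matlab
rms1 = sqrt(mean(s1.^2)); % line 1: RMS amplitude of the first sound
rms2 = sqrt(mean(s2.^2)); % line 2: RMS amplitude of the second sound
if rms1 > rms2
    s1 = s1*(rms2/rms1);  % attenuate the first sound to match the second
else
    s2 = s2*(rms1/rms2);  % amplify the second sound to match the first
end
```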
Analysis
The root mean square of each sound can be
calculated as in lines 1 and 2. If the first sound is on average
greater in amplitude than the second, we need to attenuate its
amplitude (lines 4–6). In contrast, we may need to amplify the
amplitude of the second (lines 7–9).
The two sounds now have identical mean
amplitudes. In fact, if we recalculate the RMS amplitude, we will
find out that the second sound has the same mean amplitude as the
first.
Stereophonic Sounds for ITD and ILD3
We can use MATLAB to create sounds with simulated
interaural time difference (ITD) or interaural level difference
(ILD). First, let’s create a pure tone of 3,000 Hz and 500 ms
duration (recall that ILD cues are relevant mainly at relatively high frequencies; Moore 2003).
>> sr = 44100;
>> f = 3000; % the tone’s frequency (in Hz)
>> d = .5;
>> t = linspace(0, d, sr*d); % a time vector
>> tone = sin(2*pi*f*t);
All the sounds we have created so far are
monophonic, and monophonic sounds are useful in the majority of
psychological applications. If the monophonic sound is coded within
a single array, the corresponding stereophonic sound will be coded
within two arrays, the first containing the sound for the left
channel and the second containing the sound for the right channel.
Therefore:
>> stereo_tone = [tone’, tone’];
>> sound(stereo_tone, sr);
Note that both arrays are transposed. This is because the sound() function expects as input a stereophonic matrix with two columns, one for the left channel and one for the right channel. This can be tested by
playing one channel at time. In the following example, we play the
sound, but we multiply the amplitude of the left channel by
zero:
>> sound([tone’*0, tone’], sr)
Because the left channel consists entirely of 0s,
it is silent. You can make the right channel silent by passing 0s
to the right column of the matrix in this way:
>> sound([tone’, tone’*0], sr)
Let’s suppose we want an ILD of 10 dB that
simulates a sound source at your left side. What we need to do is
to attenuate the right channel by 10 dB, as follows:
>> stereo_tone = [tone’, tone’*10^(-10/20)];
>> sound(stereo_tone, sr)
You can hear that the tone is louder in the left
channel than in the right channel. Mutatis mutandis, the same operations (with the matrix columns swapped) can be used to create a sound that is perceived as coming from the right.
It is only somewhat more complex to add an interaural time difference (ITD) to our sounds. To get ITD into our sounds we need to control the sound’s phase. Theoretically, the
ITD is a temporal difference, usually expressed in microseconds, in
the arrival of the sound at the two ears. However, because of this
temporal difference, the sound arrives at each ear with a different
phase. This phase difference is called interaural phase difference
(IPD). In digital synthesis, it is easier to create a phase
difference than a temporal difference. The IPD cue is important for
sound localization along the azimuth for frequencies up to 1,500 Hz
(Moore 2003). Let’s first write a
command for controlling a tone’s phase. It can be done as
follows:
>> sr = 44100;
>> f = 250;
>> d = .5;
>> t = linspace(0, d, sr*d);
>> phase = 0; % starting phase of the tone in radians
>> tone = sin(2*pi*f*t+phase);
Now let’s suppose we want to create a sound that
seems to originate from the right channel. This channel leads, and
it is followed after a certain ITD by the sound that arrives at the
left ear. Suppose that we want to simulate an ITD of 0.4 ms, i.e., 4*10^-4 s:
>> ITD = 4*10^-4;
>> IPD = 2*pi*f*ITD;
>> phase_left = 0;
>> phase_right = IPD;
>> tone_left = sin(2*pi*f*t+phase_left);
>> tone_right = sin(2*pi*f*t+phase_right);
>> sound([tone_left’, tone_right’], sr)
As you can hear, the tone seems to be originating
from your right.
We are now ready to do something more complex.
Let’s say that we want to play 13 sounds starting from the left
channel and slowly moving to the right one. Listing 5.12 does
this:
Listing 5.12
![A978-1-4614-2197-9_5_Figl_HTML.gif](A978-1-4614-2197-9_5_Figl_HTML.gif)
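Listing 5.12 appears above only as an image. A sketch of the 13-step left-to-right sweep described in the text (tone frequency, duration, and the ITD step size are all assumptions; the text only says that ITD is changed by multiplying by the loop variable) might be:

```matlab
sr = 44100;
f = 250;                       % tone frequency (Hz, assumed)
d = .2;                        % tone duration (s, assumed)
t = linspace(0, d, sr*d);
for i = 0:12
    ITD = (i-6)*1e-4;          % negative: right lags (sound on the left);
                               % positive: right leads (sound on the right)
    IPD = 2*pi*f*ITD;
    tone_left = sin(2*pi*f*t);
    tone_right = sin(2*pi*f*t+IPD);
    wavplay([tone_left', tone_right'], sr, 'sync')
end
```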
At every iteration of the for loop, we synthesize a stereo tone and play it. The content of the ITD variable is changed at every iteration by multiplying it by the loop variable i, so a different ITD value is used each time i changes its value.
A Sound’s Envelope
You may have noticed that in all the sounds we
have played so far, there were audible clicks both at the beginning
and at the end of the sounds. This is because the amplitude of the
sound at onset and offset started abruptly. To remove these
disturbing clicks, we need to modulate the amplitude of the very
first and very last portions of the sound with an onset and offset
ramp, whereas we need to leave unmodulated the middle portion of
the sound. In other words, we need to smooth the onset and offset a
bit so that the clicks will be inaudible. This smoothing is a very
common operation, and usually ramps of 10 ms are sufficient to
smooth the sound. In the majority of cases, onset and offset are
modulated with raised cosine ramps, i.e., half of a cosine cycle,
and precisely the half ranging from π to 2π:
>> sr = 44100;
>> gatedur = .01; % the duration of the gate in seconds (i.e., 10 ms)
>> gate = cos(linspace(pi, 2*pi, sr*gatedur));
Let’s now generate a tone:
>> f = 250; % frequency of the tone in Hz
>> d = .5; % duration of the tone in seconds
>> time = linspace(0, d, sr*d); % a time vector
>> tone = sin(2*pi*f*time);
If we play it, we can hear the onset and offset
clicks:
>> sound(tone, sr)
If we want to modulate the amplitude of the
tone’s onset and offset, we need to multiply the tone by an
envelope. The envelope of a sound is an imaginary line connecting
all the positive (or negative) peaks of the sound. In digital
synthesis, the envelope’s values must lie within the 0/+1 range.
Therefore, we now need to adjust our modulator (the gate variable
created previously) so that its range is within the 0/+1
limits:
>> gate = gate+1; % this operation translates all the values of the modulator to the 0/+2 range
>> gate = gate/2; % this operation compresses the values within the 0/+1 range
Now the gate is within the correct range. We can
now easily create the offset gate by flipping the array containing
our onset gate as follows:
>> offsetgate = fliplr(gate);
The last thing we need to do is to create the
“sustain” portion of our envelope, in other words, the portion that
will not modulate the sound’s amplitude. Because we are going to
perform a multiplication, if we want the tone’s central part
unchanged after the multiplication, we need to create an array of
ones, the neutral factor for the multiplication. The length of the
sustain portion will be equal to the tone’s length minus the
lengths of the onset and offset gates. In this way, the length of
our envelope (i.e., onset gate, sustain, and offset gate) will be
identical to that of the tone:
>> sustain = ones(1, (length(tone)-2*length(gate)));
>> envelope = [gate, sustain, offsetgate];
>> smoothed_tone = envelope.*tone;
If we now play the tone, we no longer hear any
clicks. Moreover, a graph showing the original tone, the envelope,
and resulting smoothed tone may explain visually what we have done
so far (Fig. 5.2).
![A978-1-4614-2197-9_5_Fig2_HTML.gif](A978-1-4614-2197-9_5_Fig2_HTML.gif)
Fig.
5.2
At the top of the figure you can see the original sound. At the center is shown the envelope that was used to modulate the sound’s onset and offset. At the bottom of the figure is displayed the resulting modulated sound. Note that the onset and offset are not abrupt as in the top graph
>> sound(smoothed_tone, sr)
>> subplot(3, 1, 1); plot(time, tone)
>> subplot(3, 1, 2); plot(time, envelope, ’o’)
>> subplot(3, 1, 3); plot(time, smoothed_tone)
Sound Filtering
In this section we describe how to filter sounds.
One of the purposes of filtering is to generate noises of various
kinds (e.g., lowpass, highpass, bandpass). In this section we will
see how to filter a noise and how to generate any kind of filtered
noise. The first thing we have to do is to create a white
noise4:
>> sr = 44100;
>> d = 1; % the duration of the noise (in sec)
>> noise = (rand(1, sr*d)*2)-1;
>> sound(noise, sr);
In order to filter our noise, we need to look at
our sound in the frequency domain, i.e., we need to apply a fast
Fourier transform (FFT):
>> fnoise = fft(noise);
The magnitude spectrum of our noise looks almost
flat along the frequency axis, as it should for white noises.
>> plot(20*log10(abs(fnoise(1:22050))))
As you can see, the FFT command returns an array
as long as the original noise array. However, the new array
contains complex numbers. This array is our original noise.
However, now we are looking at it from the frequency domain rather
than from the time domain. Therefore, if in the original array each
element represented a given amplitude value in time, in the new
array, each element represents a frequency with its relative
amplitude and phase (note that we are dealing with complex numbers,
i.e., numbers having a real and an imaginary part). In the array returned by the FFT, the frequencies up to half of the sample rate (i.e., the Nyquist frequency) are represented twice: once in the first half of the array and once, mirrored symmetrically, in the second half (see below).
If we want to filter the noise, we need to create a filter array as long as our noise array. The filter array will contain elements ranging from 0 to 1. Later, we will multiply this filter array pointwise with the noise spectrum (a product in the frequency domain that is equivalent to a convolution in the time domain). The frequencies multiplied by elements smaller than 1 will be attenuated, and the attenuation will be greater, the closer the value of the filter is to 0. The frequencies multiplied by 1 will, however, be untouched. Now
let’s suppose we want to create a low-pass filter with a cutoff
frequency of 5 kHz. We have to proceed as at lines 3–7 of the
following script:
Listing 5.13
![A978-1-4614-2197-9_5_Fign_HTML.gif](A978-1-4614-2197-9_5_Fign_HTML.gif)
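Listing 5.13 appears above only as an image. A sketch consistent with the description (line numbering assumed; for a 1-s sound each FFT bin is about 1 Hz wide, so the cutoff maps directly onto bin indices) might be:

```matlab
sr = 44100; d = 1;
noise = (rand(1, sr*d)*2)-1;
fnoise = fft(noise);
cutoff = 5000;                     % cutoff frequency (Hz)
n = round(cutoff*d);               % bins below cutoff (bin width = 1/d Hz)
flt = zeros(1, length(fnoise));    % 'flt' avoids shadowing MATLAB's filter()
flt(1:n) = 1;                      % pass band, positive frequencies
flt(end-n+1:end) = 1;              % mirrored negative frequencies
fnoise = fnoise.*flt;              % pointwise product in the frequency domain
plot(20*log10(abs(fnoise)))        % spectrum after filtering
```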
Listing 5.13 creates the filter. Note that the
filter is symmetric because frequencies are represented twice and
symmetrically within the noise array. Specifically, we have applied5 our filter to the noise (line 10) and plotted the spectrum resulting from the filtering process (line 11). As you can see, a large band of frequency components is now missing (it was removed by the filter). Moreover, you can see the symmetric representation of
frequency. If we want to listen to our filtered noise, there are a
couple of more things to do. The filtered noise is represented in
the frequency domain, but
we need to represent it in the time domain. To do this, we first apply
the inverse of the FFT, and then we extract the real part of the
complex number array:
>> fnoise = ifft(fnoise);
>> fnoise = real(fnoise);
>> fnoise = fnoise/max(abs(fnoise));
>> sound(fnoise, sr)
Note that we have normalized the noise’s
amplitude before playing it. This is because the filtering can
return a sound whose amplitude exceeds the −1/+1 range.
In Listing 5.14 we include a plot to represent
the filtering, step by step. In the example, we band-pass filter a
Gaussian white noise and keep the frequency components from 2,000
to 6,000 Hz:
Listing 5.14
![A978-1-4614-2197-9_5_Figo_HTML.gif](A978-1-4614-2197-9_5_Figo_HTML.gif)
![A978-1-4614-2197-9_5_Figp_HTML.gif](A978-1-4614-2197-9_5_Figp_HTML.gif)
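Listing 5.14 appears above only as images. A sketch of the band-pass example in the text (2,000–6,000 Hz, Gaussian noise via randn; bin indices assume a 1-s sound, and the plotting steps of the original listing are omitted here) might be:

```matlab
sr = 44100; d = 1;
noise = randn(1, sr*d);              % Gaussian white noise
noise = noise/max(abs(noise));
fnoise = fft(noise);
lo = round(2000*d);                  % lower edge of the pass band
hi = round(6000*d);                  % upper edge of the pass band
flt = zeros(1, length(fnoise));
flt(lo:hi) = 1;                      % positive frequencies, 2-6 kHz
flt(end-hi+1:end-lo+1) = 1;          % mirrored negative frequencies
filtered = real(ifft(fnoise.*flt));  % back to the time domain
filtered = filtered/max(abs(filtered));
sound(filtered, sr)
```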
Sound Analysis
The analysis of the sound’s acoustical
characteristics can range from very simple to very complex (see
Giordano and McAdams 2006 for a
list of possible sophisticated analyses). In this section we show
only some simple analyses because these are the most common in
psychology. The first things we may want to know are the digital
characteristics of the sound. These characteristics are returned by
the wavread() function, which takes as argument the wav filename.
Let’s see how this function works, assuming that we have saved a
sound in mywavefile.wav:
>> [s, sr, bits] = wavread(‘mywavefile’);
Here s is a vector variable
containing the sound. The sr and bits variables are the
sound’s sample rate in hertz and the sound’s resolution in bits,
respectively. The sound’s duration can be obtained by dividing the
length of the sound vector by the sample frequency:
>> duration = length(s)/sr;
The root mean square power of a sound can be obtained by squaring the sound’s array element-wise and then calculating the average of the resulting values. Finally, we calculate the square root. Because the returned value is linear, we may want to transform it into decibel units.
>> s2 = s.^2;
>> ms2 = mean(s2);
>> rms = sqrt(ms2);
>> dB_rms = 20*log10(rms);
If we want to know the peak amplitude of the
file, we just need to calculate the maximum absolute value of the
sound vector as follows:
>> peak = max(abs(s));
>> peak_dB = 20*log10(peak);
We may want to look at the sound’s magnitude
spectrum in order to obtain its frequency content. We first
calculate the fast Fourier transform (FFT) of the sound, and then
obtain the absolute value to plot the magnitude spectrum. The
magnitude spectrum repeats itself. Therefore, we have to plot only
the first half of the data:
>> s_fft = fft(s);
>> s_fft = abs(s_fft);
>> s_fft = s_fft(1:length(s_fft)/2);
>> plot(s_fft)
We now may want to see the amplitude of the
various frequency components on a decibel scale rather than on a
linear scale. Moreover, we may want to have a more meaningful
x-axis as well as to add labels to the plot (Fig. 5.3):
![A978-1-4614-2197-9_5_Fig3_HTML.gif](A978-1-4614-2197-9_5_Fig3_HTML.gif)
Fig.
5.3
Magnitude spectrum of “chirp.” The
following spectrum was plotted by loading the variable “chirp” and
then by performing the operations written in the text. Note that
when you load chirp, the sound array automatically gets the
variable name y, whereas the sample rate gets the variable name Fs.
Therefore, the following lines must be written at the beginning of
the script: load chirp; s = y; sr = Fs. Note that the spectrum of the chirp shows peaks at 500, 1,500, and 2,500 Hz, revealing an evident harmonic structure
>> f = linspace(0, sr/2, length(s_fft));
>> plot(f, 20*log10(s_fft))
>> xlabel(‘frequency (Hz)’)
>> ylabel(‘dB’)
Summary
-
Sounds are represented by arrays and should lie within the −1/+1 range.
-
White noises (and noises in general) are generated by means of the rand (or randn) function.
-
Pure tones are generated by means of the sin function.
-
Complex tones are generated by adding two or more simple tone arrays.
-
Silences can be generated with arrays of zeros.
-
Sounds can be played with the sound, soundsc, and wavplay functions. soundsc normalizes the sound’s amplitude automatically. wavplay is the only one of these MATLAB commands that makes it possible to play sounds synchronously.
-
Sound sequences are generated by concatenating sound arrays.
-
A sound’s level can be manipulated by multiplying the sound’s array by the desired factor of attenuation/amplification.
-
Stereophonic sounds use matrices containing two columns (i.e., two sound arrays).
-
A sound’s envelope can be manipulated by multiplying the sound’s array by an envelope array. The envelope array must contain values within the 0–1 range.
-
Sounds can be filtered by means of the fft function.
Exercises
1.
Synthesize a 2-kHz, 2-s duration pure tone.
2.
Synthesize a white noise of 0.5 s duration.
3.
Synthesize a white noise of 0.5 s duration
followed by 0.1 s of silence and then the noise again.
4.
Synthesize a 200-Hz four-harmonic complex tone
(0.5 s duration), followed by the same tone attenuated by 30
dB.
5.
Write the previous tone to a wav file.
6.
Open the file just created and calculate the root
mean square of the first half of the sound and of the second half
of the sound and calculate the level difference between the two
tones in dB.
7.
Synthesize the following stimulus for a forward
masking experiment: a band-pass noise of 200 ms followed by a
sinusoidal signal of 1,000 Hz. The noise frequency content must
range within the 500–2,000 Hz limits.
8.
Synthesize a pure tone (300 Hz, 0.2 s duration)
with 600 μs of ITD.
9.
Synthesize a 500-ms low-pass filtered noise
(cutoff frequency 5,000 Hz) and smooth the onset and the offset of
the noise with 50-ms raised cosine ramps.
A Brick for an Experiment
In the experiment developed by Sekuler et al.
(1997), the visual display of a
disc’s motion is accompanied by a sound. The authors actually used
more than one sound in the various conditions of the experiment.
The sound described in greatest detail is a tone of 440 Hz and 100 ms duration. No other details are provided. For our brick
experiment we can use a sine wave of 440 Hz and of 50 ms duration
(the effect observed by Sekuler et al. is more compelling if the
sound has a short duration), gated on and off with two 5-ms raised
cosine ramps, just to prevent clicks at the onset and offset of the
sound. Let’s create this sound:
% stimuli (creation)
sr = 44100;
d = .05;
f = 440;
% tone synthesis
t = linspace(0, d, sr*d);
tone = sin(2*pi*f*t);
% onset and offset gating
gatedur = .005; % the duration of the gate in seconds (i.e., 5 ms)
ongate = cos(linspace(pi, 2*pi, sr*gatedur));
ongate = ongate+1;
ongate = ongate/2;
offgate = fliplr(ongate);
sustain = ones(1, (length(tone)-2*length(ongate)));
envelope = [ongate, sustain, offgate];
tone = tone.*envelope;
This code needs to be extended for our purposes.
In our experiment the sound needs to be switched on while the discs
are in motion, and in particular, when the discs overlap. In later
chapters, we will see that this is going to happen at frame 70
(i.e., at the x coordinate of 140). We have to append, at the
beginning of the sound, a silence for the duration we have to wait
before switching on the sound. To do that, we need to anticipate
the use of the Screen function. This function will be extensively described later. Screen is the most important of the Psychtoolbox functions and can do several things. One of the things this function does is interrogate the video card to find out the refresh rate currently set on your computer. Here we use it to get the refresh rate of your monitor.
refreshrate = FrameRate(screennumber); % get the frame rate of the monitor
silencepre_dur = (1/refreshrate) * 70;
silencepre = zeros(1, round(sr * silencepre_dur));
SoundToPlay = [silencepre, tone];
Now write everything within a single GenerateSound()
function. This function receives as argument a digit representing
the sound we want to play (1 for no sound, 2 for the sound) that is
stored in the EventTable and a screen number (see later chapters).
The function’s output will be a sound array. Note that the input
number is used at the end of the function to decide whether to
return silence or the tone.
Listing 5.15
![A978-1-4614-2197-9_5_Figq_HTML.gif](A978-1-4614-2197-9_5_Figq_HTML.gif)
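Listing 5.15 appears above only as an image. A sketch that wraps the preceding code into the function described in the text (the argument name soundtype is assumed; the 1/2 coding and the screen-number argument come from the description) might be:

```matlab
function SoundToPlay = GenerateSound(soundtype, screennumber)
% soundtype: 1 = no sound, 2 = sound (argument name assumed)
sr = 44100; d = .05; f = 440;
t = linspace(0, d, sr*d);
tone = sin(2*pi*f*t);
% onset/offset gating with 5-ms raised cosine ramps
gatedur = .005;
ongate = (cos(linspace(pi, 2*pi, sr*gatedur))+1)/2;
offgate = fliplr(ongate);
sustain = ones(1, length(tone)-2*length(ongate));
tone = tone.*[ongate, sustain, offgate];
% prepend silence covering the first 70 frames
refreshrate = FrameRate(screennumber);
silencepre = zeros(1, round(sr*(1/refreshrate)*70));
if soundtype == 1
    SoundToPlay = [silencepre, tone*0]; % silence of the same length
else
    SoundToPlay = [silencepre, tone];
end
```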
References
Moore BCJ (2003) An
introduction to the psychology of hearing, 5th edn. Academic, San
Diego
Sekuler R, Sekuler AB, Lau R
(1997) Sound alters visual motion perception. Nature 385: 308
Suggested Readings
There are a number of MATLAB tools developed by
researchers that can be used in audition. Here is a certainly
incomplete list of the available tools:
Malcolm Slaney has released a toolbox (the Auditory Toolbox) that implements several popular auditory models.
Pérez E, Rodriguez-Esteban R
(2006) Oreja: a MATLAB environment for the design of psychoacoustic
stimuli. Behav Res Methods 38:574–578 (The Oreja software package was designed to study speech intelligibility. It is a tool that allows manipulation of speech signals to facilitate the study of human speech perception.)
Readers interested in MATLAB
tools for audition should take a look at the following journals:
Behavior Research Methods
and the Journal of Neuroscience
Methods. Both journals often publish MATLAB tools for
audition.
Readers interested in more technical and advanced audio processing should read the following book:
McLoughlin I (2009) Applied speech and audio processing: with MATLAB examples. Cambridge University Press, Cambridge
Footnotes
1
Although MATLAB enables the generation of sounds
at several sample rates, 44,100 Hz is the most used. This is, for
example, the sample rate of an audio CD.
4
Note that it is possible to generate white noises
according to various distributions. For example, the rand function returns
random and uniformly distributed numbers. Therefore, our white
noise would be a “uniform white noise.” If we use randn, a function that
generates random numbers according to the normal distribution, we
would have a Gaussian white noise instead. See, for instance,
Wikipedia for further details.
5
Convolution is an operation done in the time domain. Thanks to the properties of the Fourier transform, a convolution in time becomes a simple product in frequency between the Fourier transform of the noise and the filter’s frequency response.