Mauro Borgo, Alessandro Soranzo and Massimo Grassi, MATLAB for Psychologists, 2012. DOI 10.1007/978-1-4614-2197-9_5. © Springer Science+Business Media, LLC 2012

5. A Better Sound

Mauro Borgo (1), Alessandro Soranzo (2) and Massimo Grassi (3)
(1)
Via Marosticana 168, Dueville, VI, Italy
(2)
School of Social Science & Law, University of Teesside, Middlesbrough, UK
(3)
Department of General Psychology, University of Padova, Padova, Italy
 
Abstract
MATLAB is an extremely powerful tool for dealing with sounds. You can use MATLAB for sound synthesis as well as for sound analysis. It is possible to create your own custom sound from scratch, and it is also possible to edit at will an existing sound. Furthermore, MATLAB can be used to understand several acoustical characteristics of digital sounds. This chapter shows how to do all of these.

Generate a Sound

First, let’s generate a white noise (i.e., a random succession of amplitude values) having a duration of 1 s and a sample rate of 44,100 Hz and then play it.1
Listing 5.1
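A minimal sketch of the listing, consistent with the line-by-line description that follows (the exact variable names are assumptions):

```matlab
sr = 44100;                      % 1. sample rate (Hz)
d = 1;                           % 2. duration (s)
noise = rand(1, round(sr*d));    % 3. random values in the 0/+1 range
noise = noise * 2;               % 4. expand to the 0/+2 range
noise = noise - 1;               % 5. translate to the -1/+1 range
sound(noise, sr);                % play the noise
```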
Lines 1 and 2 set the sample rate and the sound duration. Line 3 creates the noise vector; we pass the rand() function the product of the sample rate and the duration of the sound in seconds. This operation will be repeated several times in this chapter; it returns the length of the digital sound's array at the desired sample rate. To prevent MATLAB warnings such as “Warning: Size vector should be a row vector with integer elements”, it may be convenient to round the result of this multiplication. Lines 4 and 5 expand and translate the amplitude values of the white noise into the −1/+1 range. We need to do this because MATLAB expects sounds to range within the −1/+1 limits.
The next thing we may wish to do is to synthesize a pure tone, i.e., a sinusoidal tone. We now synthesize a pure tone of 1,000 Hz and 1 s duration (a must in psychoacoustics).
Listing 5.2
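A sketch of the listing (variable names follow the text; f and d can be changed at will):

```matlab
sr = 44100;                      % sample rate (Hz)
f = 1000;                        % frequency of the tone (Hz)
d = 1;                           % duration (s)
t = linspace(0, d, sr*d);        % a digital representation of time
tone = sin(2*pi*f*t);            % the pure tone
sound(tone, sr);
```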
In Listing 5.2, we calculate the sine of an argument that completes f cycles (each of 2*pi radians) per second. The time t is represented digitally by an array running from 0 to the sound's overall duration, sampled at the given sample rate (44,100 Hz). With these few command lines, we can create different pure tones by changing f and d.
Now let’s play the tone twice using the command sound twice in rapid succession.
>> sound(tone, sr)
>> sound(tone, sr)
If your fingers were fast enough, you heard a single unpleasant tone rather than two tones in succession. This is because the sound function works asynchronously; in other words, the second tone is played while the first tone is still playing.
There are two ways to avoid this. The first is to use wavplay,2 which lets you decide whether you want to play sounds synchronously ('sync' parameter) or asynchronously ('async' parameter). Use the function in the following way to play the tones synchronously (i.e., the second tone is played only after the first tone has ended):
wavplay(tone, sr, 'sync');
wavplay(tone, sr);
If you pass the parameter 'async' instead of 'sync', you will get the same unpleasant output as with sound(). Another way to avoid the asynchrony problem is to use the sound functions included in the Psychtoolbox. These functions will be described in the chapters dedicated to this toolbox.
The tone we have created can be saved to your computer as a wav file by means of the wavwrite() function. wavwrite() transforms the array into a wav sound file. Note that if you do not want a clipped sound, the array's values should range between −1 and +1. wavwrite() quantizes the values of the given vector according to the number of bits you are using (16 by default). This means that if you are working at 16 bits, the lowest value that can be written without clipping is −1 + 1/(2^nbits − 1). In the pure tone example, the sin() function returns values spanning the full −1 to +1 range, and therefore the extreme values are clipped when writing the file. If you multiply the sound array by the magnitude of the minimum quantized value, i.e., 1 − 1/(2^nbits − 1), then the sound's amplitude is compressed into a range that can be written to file without clipping.
>> wavwrite(tone, sr, 'my_first_tone')
Warning: Data clipped during write to file:my_first_tone
> In wavwrite>PCM_Quantize at 247
In wavwrite>write_wavedat at 267
In wavwrite at 112
MATLAB clips the lower values and gives a warning message. For common usage, the error is insignificant (a single quantization step). If you do not want the warning message, you can rescale the signal as follows:
>> nbits=16; tone=tone*(1-1/(2^nbits-1));
>> wavwrite(tone, sr, 'my_first_tone')
We are now ready for something more complex: let’s build a harmonic tone that is the sum of two or more pure tones having a harmonic relation. Let’s create a complex tone with three harmonics and with a fundamental frequency of 250 Hz:
Listing 5.3
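A sketch of the listing (the line numbering approximates the description that follows; variable names are assumptions):

```matlab
sr = 44100;                                % 1. sample rate (Hz)
d = 1;                                     % 2. duration (s)
f = 250;                                   % 3. fundamental frequency (Hz)
t = linspace(0, d, sr*d);                  % 4. time vector
tone1 = sin(2*pi*f*t);                     % 5. first harmonic
tone2 = sin(2*pi*2*f*t);                   % 6. second harmonic
tone3 = sin(2*pi*3*f*t);                   % 7. third harmonic
harmonic = (tone1+tone2+tone3) / ...
    max(abs(tone1+tone2+tone3));           % 8. sum and normalize
sound(harmonic, sr);                       % 9. play
```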
In Listing 5.3 we generate three sine waves (lines 5, 6, and 7) and then add the simple tones together to create a complex tone. The normalization step needs a bit of clarification. Since the harmonic variable results from the sum of three components, its amplitude exceeds the −1/+1 range. Therefore, if we play the tone as it is, the result is a distorted sound (and MATLAB does not warn you about this distortion!). To avoid distortion, we normalize the sound by dividing the harmonic variable by its maximum absolute value.
Another solution to the distortion problem is to use the soundsc() function, which automatically scales the sound’s amplitude within the −1/+1 range no matter how “large” the sound’s amplitude is. So lines 8 and 9 could have been replaced by the following code:
soundsc(harmonic, sr);
We are now able to produce pure and complex tones of any desired frequency and duration. By the same token, we can now produce an inharmonic tone, i.e., a tone whose frequency components are not in a harmonic relation.
Listing 5.4 generates an inharmonic tone:
Listing 5.4
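A sketch of such an inharmonic tone (the non-integer frequency ratios 2.1 and 3.3 are arbitrary choices):

```matlab
sr = 44100;
d = 1;
f = 250;                              % lowest component (Hz)
t = linspace(0, d, sr*d);
tone1 = sin(2*pi*f*t);
tone2 = sin(2*pi*(2.1*f)*t);          % not an integer multiple of f
tone3 = sin(2*pi*(3.3*f)*t);          % not an integer multiple of f
inharmonic = tone1 + tone2 + tone3;
soundsc(inharmonic, sr);              % soundsc avoids distortion
```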
In this example, we have solved the distortion problem by using soundsc() instead of normalizing the sound. Let's now change the timbre of the complex tone by changing its waveform so that the sound is, for example, a sawtooth wave. In a sawtooth wave, the amplitude of the nth harmonic is 1/n times the amplitude of the fundamental. Therefore, we have to scale successive harmonics by factors of 1/2, 1/3, 1/4, and so on. Let us start with the following code:
Listing 5.5
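A sketch with only three harmonics (the complex variable name matches the plotting command used in the text):

```matlab
sr = 44100;
d = 1;
f = 250;                              % fundamental frequency (Hz)
t = linspace(0, d, sr*d);
% the nth harmonic is scaled by 1/n
complex = sin(2*pi*f*t) + sin(2*pi*2*f*t)/2 + sin(2*pi*3*f*t)/3;
soundsc(complex, sr);
```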
If we plot the first 10 ms of the sound (i.e., the first 441 samples, since we are working at a sample rate of 44,100 Hz), we see that the waveform does not look like a sawtooth wave (Fig. 5.1):
Fig. 5.1
A sawtooth wave with a limited number of harmonics
plot(complex(1:441))
This is because of the limited number of harmonics we have used to generate the wave. However, if we build the complex tone with, let's say, 20 harmonics, then the waveform's shape will be more sawtooth-like. The following code listing shows how to do it by means of a for loop:
Listing 5.6
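A sketch of the 20-harmonic version (the ith harmonic is scaled by 1/i):

```matlab
sr = 44100;
d = 1;
f = 250;
t = linspace(0, d, sr*d);
complex = zeros(1, length(t));
for i = 1:20
    complex = complex + sin(2*pi*i*f*t)/i;   % add the ith harmonic
end
soundsc(complex, sr);
```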
At each iteration of the for loop, the amplitude of the ith harmonic is scaled by 1/i. If you now plot the first 441 samples, you can check that the waveform is much closer to a sawtooth than the previous one. Moreover, if we plot some of the single components (the first three, for example), we can get an idea of the summation process we have implemented. Listing 5.7 does the job.
Listing 5.7
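A sketch that plots the first 10 ms of the first three components (it assumes f and t from the previous listing are still in the workspace):

```matlab
for i = 1:3
    subplot(3, 1, i);
    plot(sin(2*pi*i*f*t(1:441))/i);   % first 441 samples of the ith harmonic
end
```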

Multiple Sounds

Sometimes it can be useful to create short melodies, i.e., sequences of tones. In order to do this, we have to concatenate the various arrays of the tones (and silences) we want to put in our little tune. Listing 5.8 plays the notes middle C, D, and E, followed by a pause of 1 s and then the note C again.
Listing 5.8
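A sketch of the melody (the note frequencies are rounded, and the 0.3-s note duration is an assumption):

```matlab
sr = 44100;
d = 0.3;                            % duration of each note (s)
t = linspace(0, d, sr*d);
C = sin(2*pi*261.6*t);              % middle C
D = sin(2*pi*293.7*t);              % D
E = sin(2*pi*329.6*t);              % E
silence = zeros(1, sr*1);           % 1 s of silence
melody = [C, D, E, silence, C];
sound(melody, sr);
```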
Let’s run a more “psychological” example: the pulsation threshold. The pulsation threshold (Houtgast 1972) is a case of auditory continuity. The stimulus we need to synthesize is a succession of brief noises and tones: a noise followed by a tone, followed by a noise, and so on. If the intensity of the tone is low in comparison with that of the noise, the tone is heard as continuous, i.e., the tone is heard even within the noise (even though there is no tone within the noise).
Let’s synthesize the tone and the noise:
>> sr = 44100;
>> f = 1000;
>> d =.2;
>> t = linspace(0, d, d*sr);
>> tone = sin(2*pi*f*t);
>> noise = (rand(1, sr*d)*2)-1;
We now have to reduce the amplitude level of the tone. Here, we first reduce the tone's amplitude to 1/100 of its original value (i.e., attenuate it by 40 dB) and then create the noise–tone sequence:
>> tone = tone *.01;
>> sequence = [noise, tone];
>> sequence = [sequence, sequence]; % sequence of 4 sounds
>> sequence = [sequence, sequence]; % sequence of 8 sounds
>> sequence = [sequence, sequence]; % sequence of 16 sounds
>> sequence = [sequence, sequence]; % sequence of 32 sounds
>> sound(sequence, sr)
You should be able to hear the tone as continuous within the noise bands (although you should be well aware that there is no tone within the noise bands!).
Let’s suppose you need to synthesize a number of tones sounding together in time, but one (or more) of the tones needs to have just a little onset asynchrony so that it can be heard as popping out from the other tones sounding together (Darwin and Ciocca 1992). Here we implement this example with three tones and add a little onset asynchrony to the second and third tones. First, we add three tones together so that they begin simultaneously in time.
>> sr = 44100;
>> f1 = 300;
>> f2 = 550;
>> f3 = 640;
>> d = 5;
>> t = linspace(0, d, d*sr);
>> tone1 = sin(2*pi*f1*t);
>> tone2 = sin(2*pi*f2*t);
>> tone3 = sin(2*pi*f3*t);
>> complex = tone1+tone2+tone3;
>> soundsc(tone1+tone2+tone3, sr)
Now we add the temporal offset to the second and third tones:
>> offset_dur =.5;
>> offset = zeros(1, sr*offset_dur);
>> tone1 = [tone1, offset, offset];
>> tone2 = [offset, tone2, offset];
>> tone3 = [offset, offset, tone3];
>> soundsc(tone1+tone2+tone3, sr)
If you listen to the sound, you may note that the individual components are indistinguishable in the first example. However, in the second example they can be easily distinguished. This is because of the onset asynchrony of the tones.

Manipulating a Sound’s Level

With MATLAB, it is possible to manipulate the relative levels of sounds precisely. The absolute level, in contrast, depends on several factors such as the sound card of your computer, the headphones, the loudspeakers, the amplifiers, and so on. Digital synthesis operates within a dynamic range set by the bit depth (16 bits, i.e., about 96 dB, is a very common dynamic range for most PCs). In digital audio, 0 dB is arbitrarily set as the maximum level, and all softer levels are represented by negative numbers. In practice, a sound whose level is 0 dB is louder than a sound whose level is −10 dB. For example, let us play a tone twice, but the second time 10 dB softer:
Listing 5.9
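A sketch of the listing (the tone's parameters are assumptions; an attenuation of 10 dB corresponds to multiplying the array by 10^(−10/20)):

```matlab
sr = 44100;
f = 1000;
d = 1;
t = linspace(0, d, sr*d);
tone = sin(2*pi*f*t);
sound(tone, sr);                    % original level
sound(tone * 10^(-10/20), sr);      % the same tone, 10 dB softer
```

Remember that sound plays asynchronously; if these lines run in a script, insert a pause between the two calls (or use wavplay with 'sync').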
When you are manipulating a sound's amplitude, use the sound() function rather than soundsc(), because soundsc() normalizes the sound's amplitude before playing it. Any manipulation of the sound's amplitude is therefore undone if you play the sound with soundsc().
The same amplitude attenuation can be done with a noise:
Listing 5.10
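A sketch with a noise attenuated by 20 dB on its second presentation:

```matlab
sr = 44100;
d = 1;
noise = (rand(1, sr*d)*2)-1;
sound(noise, sr);                   % original level
sound(noise * 10^(-20/20), sr);     % the same noise, 20 dB softer
```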
In Listing 5.10, the level of the second noise is 20 dB lower than that of the first one.

Match the Levels of Sounds with Different Waveforms

In this section we show how to match the levels of sounds with different waveforms. Let's suppose that we need to play several environmental sounds during an experiment, and that we want these (very different) sounds to be perceived as similarly loud. Because the sounds have different waveforms, the sound pressure of each will be different. There are two options for matching the levels of sounds having different waveforms. The first option is to normalize the amplitude of the sounds. For example, let's suppose that we have two sounds stored in two arrays (s1 and s2) in our workspace. To normalize their amplitudes, we rescale each sound so that its peak absolute value is equal to one. This can be done in this way:
>> s1 = s1/max(abs(s1));
>> s2 = s2/max(abs(s2));
A simpler alternative is, once again, to use the soundsc() function, which normalizes the sounds' amplitudes before playing them.
The second option is to match the two sounds' root mean square (RMS) amplitude. This is often more effective than peak normalization, in particular when the sounds' waveforms are very different. Listing 5.11 shows how to do this.
Listing 5.11
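A sketch of the RMS matching (it assumes the sounds s1 and s2 are already in the workspace; the line numbering approximates the description that follows):

```matlab
rms1 = sqrt(mean(s1.^2));          % 1. RMS amplitude of the first sound
rms2 = sqrt(mean(s2.^2));          % 2. RMS amplitude of the second sound

if rms1 > rms2
    s1 = s1 * (rms2/rms1);         % attenuate the first sound
else
    s2 = s2 * (rms1/rms2);         % rescale the second sound
end
```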

Analysis

The root mean square of each sound is calculated as in lines 1 and 2. If the first sound is on average greater in amplitude than the second, we attenuate the first sound (lines 4–6); otherwise, we rescale the second sound (lines 7–9).
The two sounds now have identical mean amplitudes. In fact, if we recalculate the RMS amplitude, we will find out that the second sound has the same mean amplitude as the first.

Stereophonic Sounds for ITD and ILD3

We can use MATLAB to create sounds with simulated interaural time difference (ITD) or interaural level difference (ILD). First, let’s create a pure tone of 3,000 Hz and 500 ms duration (recall that ILD cues are relevant at relatively high frequencies in particular, Moore 2003).
>> sr = 44100;
>> f = 3000; % the tone’s frequency (in Hz)
>> d =.5;
>> t = linspace(0, d, sr*d);% a time vector
>> tone = sin(2*pi*f*t);
All the sounds we have created so far are monophonic, and monophonic sounds are useful in the majority of psychological applications. If the monophonic sound is coded within a single array, the corresponding stereophonic sound will be coded within two arrays, the first containing the sound for the left channel and the second containing the sound for the right channel. Therefore:
>> stereo_tone = [tone’, tone’];
>> sound(stereo_tone, sr);
Note that both arrays need to be transposed (from rows to columns). This is because the sound() function expects a stereophonic input as a matrix with two columns, one for the left channel and one for the right channel. This can be tested by playing one channel at a time. In the following example, we play the sound, but we multiply the amplitude of the left channel by zero:
>> sound([tone’*0, tone’], sr)
Because the left channel consists entirely of 0s, it is silent. You can make the right channel silent by passing 0s to the right column of the matrix in this way:
>> sound([tone’, tone’*0], sr)
Let’s suppose we want an ILD of 10 dB that simulates a sound source at your left side. What we need to do is to attenuate the right channel by 10 dB, as follows:
>> stereo_tone = [tone’, tone’*10^(-10/20)];
>> sound(stereo_tone, sr)
You can hear that the tone is louder in the left channel than in the right channel. Mutatis mutandis, the same operations (inverting the matrix columns) can be used to create a sound that is perceived as coming from the right.
It is only somewhat more complex to add an interaural time difference (ITD) to our sounds. To get an ITD into our sounds we need to control the sound's phase. Strictly, the ITD is a temporal difference, usually expressed in microseconds, in the arrival of the sound at the two ears. Because of this temporal difference, the sound arrives at each ear with a different phase. This phase difference is called the interaural phase difference (IPD). In digital synthesis, it is easier to create a phase difference than a temporal difference. The IPD cue is important for sound localization along the azimuth for frequencies up to about 1,500 Hz (Moore 2003). Let's first write a command for controlling a tone's phase. It can be done as follows:
>> sr = 44100;
>> f = 250;
>> d =.5;
>> t = linspace(0, d, sr*d);
>> phase = 0; % starting phase of the tone in radians
>> tone = sin(2*pi*f*t+phase);
Now let's suppose we want to create a sound that seems to originate from the right. This channel leads, and the sound arrives at the left ear after a certain ITD. Suppose that we want to simulate an ITD of 0.4 ms, i.e., 4*10^-4 s:
>> ITD = 4*10^-4;
>> IPD = 2*pi*f*ITD;
>> phase_left = 0;
>> phase_right = IPD;
>> tone_left = sin(2*pi*f*t+phase_left);
>> tone_right = sin(2*pi*f*t+phase_right);
>> sound([tone_left’, tone_right’], sr)
As you can hear, the tone seems to be originating from your right.
We are now ready to do something more complex. Let’s say that we want to play 13 sounds starting from the left channel and slowly moving to the right one. Listing 5.12 does this:
Listing 5.12
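A sketch of the loop (the 13 positions spanning −0.4 to +0.4 ms of ITD and the tone's parameters are assumptions):

```matlab
sr = 44100;
f = 250;
d = 0.2;
t = linspace(0, d, sr*d);
for i = -6:6                           % 13 positions, from left to right
    ITD = i * (4*10^-4)/6;             % ITD moves from -0.4 to +0.4 ms
    IPD = 2*pi*f*ITD;
    tone_left = sin(2*pi*f*t);
    tone_right = sin(2*pi*f*t + IPD);
    sound([tone_left', tone_right'], sr);
    pause(d + 0.1);                    % let each tone finish before the next
end
```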
At every iteration of the for loop, we synthesize a stereo tone and play it. The content of the ITD variable is updated at every iteration by multiplying it by the loop variable i, resulting in a different ITD each time i changes its value.

A Sound’s Envelope

You may have noticed that all the sounds we have played so far had audible clicks at both the beginning and the end. This is because the sound's amplitude started and ended abruptly. To remove these disturbing clicks, we need to modulate the amplitude of the very first and very last portions of the sound with an onset and an offset ramp, leaving the middle portion of the sound unmodulated. In other words, we need to smooth the onset and offset a bit so that the clicks become inaudible. This smoothing is a very common operation, and ramps of 10 ms are usually sufficient. In the majority of cases, onset and offset are modulated with raised cosine ramps, i.e., half of a cosine cycle, precisely the half ranging from π to 2π:
>> sr = 44100;
>> gatedur =.01; % the duration of the gate in seconds (i.e., 10 ms)
>> gate = cos(linspace(pi, 2*pi, sr*gatedur));
Let’s now generate a tone:
>> f = 250; % frequency of the tone in Hz
>> d =.5; % duration of the tone in seconds
>> time = linspace(0, d, sr*d); % a time vector
>> tone = sin(2*pi*f*time);
If we play it, we can hear the onset and offset clicks:
>> sound(tone, sr)
If we want to modulate the amplitude of the tone’s onset and offset, we need to multiply the tone by an envelope. The envelope of a sound is an imaginary line connecting all the positive (or negative) peaks of the sound. In digital synthesis, the envelope’s values must lie within the 0/+1 range. Therefore, we now need to adjust our modulator (the gate variable created previously) so that its range is within the 0/+1 limits:
>> gate = gate+1; % this operation translates all the values of the modulator to the 0/+2 range
>> gate = gate/2; % this operation compresses the values within the 0/+1 range
Now the gate is within the correct range. We can now easily create the offset gate by flipping the array containing our onset gate as follows:
>> offsetgate = fliplr(gate);
The last thing we need to do is to create the “sustain” portion of our envelope, in other words, the portion that will not modulate the sound’s amplitude. Because we are going to perform a multiplication, if we want the tone’s central part unchanged after the multiplication, we need to create an array of ones, the neutral factor for the multiplication. The length of the sustain portion will be equal to the tone’s length minus the lengths of the onset and offset gates. In this way, the length of our envelope (i.e., onset gate, sustain, and offset gate) will be identical to that of the tone:
>> sustain = ones(1, (length(tone)-2*length(gate)));
>> envelope = [gate, sustain, offsetgate];
>> smoothed_tone = envelope.* tone;
If we now play the tone, we no longer hear any clicks. Moreover, a graph showing the original tone, the envelope, and resulting smoothed tone may explain visually what we have done so far (Fig. 5.2).
Fig. 5.2
At the top of the figure you can see the original sound. The center shows the envelope used to modulate the sound's onset and offset. The bottom displays the resulting modulated sound. Note that the onset and offset are no longer abrupt as in the top graph
>> sound(smoothed_tone, sr)
>> subplot(3, 1, 1); plot(time, tone)
>> subplot(3, 1, 2); plot(time, envelope, 'o')
>> subplot(3, 1, 3); plot(time, smoothed_tone)

Sound Filtering

In this section we describe how to filter sounds. One purpose of filtering is to generate noises of various kinds (e.g., low-pass, high-pass, band-pass). We will see how to filter a white noise and thereby generate any kind of filtered noise. The first thing we have to do is to create a white noise4:
>> sr = 44100;
>> d = 1; % the duration of the noise (in sec)
>> noise = (rand(1, sr*d)*2)-1;
>> sound(noise, sr);
In order to filter our noise, we need to look at our sound in the frequency domain, i.e., we need to apply a fast Fourier transform (FFT):
>> fnoise = fft(noise);
The magnitude spectrum of our noise looks almost flat along the frequency axis, as it should for white noises.
>> plot(20*log10(abs(fnoise(1:22050))))
As you can see, the fft command returns an array as long as the original noise array; however, the new array contains complex numbers. This array is still our noise, but viewed in the frequency domain rather than in the time domain. Whereas each element of the original array represented an amplitude value at a given time, each element of the new array represents a frequency, with its amplitude and phase (this is why we are dealing with complex numbers, i.e., numbers having a real and an imaginary part). In the array returned by the fft, every frequency up to half the sample rate (i.e., the Nyquist frequency) is represented twice, symmetrically (see below).
If we want to filter the noise, we need to create a filter array as long as the noise array. The filter array contains elements ranging from 0 to 1. We then multiply this filter array, element by element, with the frequency representation of the noise. The frequencies multiplied by values smaller than 1 are attenuated, the more so the closer the filter value is to 0, whereas the frequencies multiplied by 1 are left untouched. Now let's suppose we want to create a low-pass filter with a cutoff frequency of 5 kHz. We proceed as at lines 3–7 of the following script:
Listing 5.13
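A sketch of the filter construction (with d = 1 s each FFT bin spans 1 Hz, so the bin index approximates the frequency in hertz; the line numbering approximates the description in the text):

```matlab
sr = 44100;                                % sample rate (Hz)
d = 1;                                     % duration (s)
noise = (rand(1, sr*d)*2)-1;               % white noise
fnoise = fft(noise);                       % frequency-domain representation
cutoff = 5000;                             % cutoff frequency (Hz)
filt = zeros(1, length(fnoise));           % start from an all-zero filter
filt(1:cutoff*d) = 1;                      % pass-band, positive frequencies
filt(end-cutoff*d+1:end) = 1;              % mirrored negative frequencies
fnoise = fnoise .* filt;                   % apply the filter
plot(20*log10(abs(fnoise)))                % spectrum of the filtered noise
```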
Listing 5.13 creates the filter and applies it. Note that the filter is symmetric, because frequencies are represented twice and symmetrically within the noise array. At line 10 we multiply5 the filter with the noise's spectrum, and at line 11 we plot the spectrum of the filtered noise. As you can see, a large band of frequency components is now missing (removed by the filter). Moreover, you can see the symmetric representation of frequency. If we want to listen to the filtered noise, there are a couple more things to do. The filtered noise is represented in the frequency domain, but sounds must be played in the time domain. We therefore first apply the inverse of the FFT, and then extract the real part of the complex number array:
>> fnoise = ifft(fnoise);
>> fnoise = real(fnoise);
>> fnoise = fnoise/max(abs(fnoise));
>> sound(fnoise, sr)
Note that we have normalized the noise’s amplitude before playing it. This is because the filtering can return a sound whose amplitude exceeds the −1/+1 range.
In Listing 5.14 we include plots representing the filtering step by step. In the example, we band-pass filter a Gaussian white noise, keeping the frequency components from 2,000 to 6,000 Hz:
Listing 5.14
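A sketch of the band-pass example with step-by-step plots (the panel arrangement is an assumption):

```matlab
sr = 44100;
d = 1;
noise = randn(1, sr*d);                        % Gaussian white noise
noise = noise / max(abs(noise));
fnoise = fft(noise);

lo = 2000;  hi = 6000;                         % pass-band limits (Hz)
filt = zeros(1, length(fnoise));
filt(lo*d:hi*d) = 1;                           % positive frequencies
filt(end-hi*d+1:end-lo*d+1) = 1;               % mirrored negative frequencies

subplot(2, 2, 1); plot(abs(fnoise));   title('original spectrum');
subplot(2, 2, 2); plot(filt);          title('filter');
fnoise = fnoise .* filt;                       % apply the filter
subplot(2, 2, 3); plot(abs(fnoise));   title('filtered spectrum');
filtered = real(ifft(fnoise));                 % back to the time domain
filtered = filtered / max(abs(filtered));      % normalize
subplot(2, 2, 4); plot(filtered(1:441)); title('filtered noise (first 10 ms)');
sound(filtered, sr);
```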

Sound Analysis

The analysis of a sound's acoustical characteristics can range from very simple to very complex (see Giordano and McAdams 2006 for a list of possible sophisticated analyses). In this section we show only some simple analyses, because these are the most common in psychology. The first things we may want to know are the digital characteristics of the sound. These characteristics are returned by the wavread() function, which takes the wav filename as its argument. Let's see how this function works, assuming that we have saved a sound in mywavefile.wav:
>> [s, sr, bits] = wavread('mywavefile');
Here s is a vector variable containing the sound. The sr and bits variables are the sound’s sample rate in hertz and the sound’s resolution in bits, respectively. The sound’s duration can be obtained by dividing the length of the sound vector by the sample frequency:
>> duration = length(s)/sr;
The root mean square (RMS) power of a sound is obtained by squaring the sound's array element by element, averaging the resulting values, and taking the square root. Because the returned value is linear, we may want to transform it into decibel units.
>> s2 = s.^2;
>> ms2 = mean(s2);
>> rms = sqrt(ms2);
>> dB_rms = 20*log10(rms);
If we want to know the peak amplitude of the file, we just need to calculate the maximum absolute value of the sound vector as follows:
>> peak = max(abs(s));
>> peak_dB = 20*log10(peak);
We may want to look at the sound’s magnitude spectrum in order to obtain its frequency content. We first calculate the fast Fourier transform (FFT) of the sound, and then obtain the absolute value to plot the magnitude spectrum. The magnitude spectrum repeats itself. Therefore, we have to plot only the first half of the data:
>> s_fft = fft(s);
>> s_fft = abs(s_fft);
>> s_fft = s_fft(1:length(s_fft)/2);
>> plot(s_fft)
We now may want to see the amplitude of the various frequency components on a decibel scale rather than on a linear scale. Moreover, we may want to have a more meaningful x-axis as well as to add labels to the plot (Fig. 5.3):
Fig. 5.3
Magnitude spectrum of “chirp”. The spectrum was plotted by loading the variable chirp and then performing the operations described in the text. Note that when you load chirp, the sound array automatically gets the variable name y, whereas the sample rate gets the variable name Fs. Therefore, the following lines must be written at the beginning of the script: load chirp; s = y; sr = Fs;. Note that the spectrum of the chirp shows peaks at 500 Hz, 1,500 Hz, and 2,500 Hz, an evident harmonic structure
>> f = linspace(0, sr/2, length(s_fft));
>> plot(f, 20*log10(s_fft))
>> xlabel('frequency (Hz)')
>> ylabel('dB')

Summary

  • Sounds are represented by arrays and should lie within the −1/+1 range.
  • White noises (and noises in general) are generated by means of the rand (or randn) function.
  • Pure tones are generated by means of the sin function.
  • Complex tones are generated by adding two or more simple tone arrays.
  • Silences can be generated with arrays of zeros.
  • Sounds can be played with the sound, soundsc, and wavplay functions. soundsc normalizes the sound's amplitude automatically. wavplay is the only built-in MATLAB command that can play sounds synchronously.
  • Sound sequences are generated by concatenating sound arrays.
  • A sound’s level can be manipulated by multiplying the sound’s array by the desired factor of attenuation/amplification.
  • Stereophonic sounds use matrices containing two columns (i.e., two sound arrays).
  • A sound’s envelope can be manipulated by multiplying the sound’s array by an envelope array. The envelope array must contain values within the 0–1 range.
  • Sounds can be filtered by means of the fft function.

Exercises

1.
Synthesize a 2-kHz, 2-s duration pure tone.
 
2.
Synthesize a white noise of 0.5 s duration.
 
3.
Synthesize a white noise of 0.5 s duration followed by 0.1 s of silence and then the noise again.
 
4.
Synthesize a 200-Hz four-harmonic complex tone (0.5 s duration), followed by the same tone attenuated by 30 dB.
 
5.
Write the previous tone to a wav file.
 
6.
Open the file just created and calculate the root mean square of the first half of the sound and of the second half of the sound and calculate the level difference between the two tones in dB.
 
7.
Synthesize the following stimulus for a forward masking experiment: a band-pass noise of 200 ms followed by a sinusoidal signal of 1,000 Hz. The noise frequency content must range within the 500–2,000 Hz limits.
 
8.
Synthesize a pure tone (300 Hz, 0.2 s duration) with 600 μs of ITD.
 
9.
Synthesize a 500-ms low-pass filtered noise (cutoff frequency 5,000 Hz) and smooth the onset and the offset of the noise with 50-ms raised cosine ramps.
 

A Brick for an Experiment

In the experiment developed by Sekuler et al. (1997), the visual display of a disc's motion is accompanied by a sound. The authors actually used more than one sound in the various conditions of the experiment. The sound described in greatest detail is a tone of 440 Hz and 100 ms duration. No other details are provided. For our brick experiment we will use a sine wave of 440 Hz and 50 ms duration (the effect observed by Sekuler et al. is more compelling if the sound is short), gated on and off with two 5-ms raised cosine ramps to prevent clicks at the onset and offset of the sound. Let's create this sound:
% stimuli (creation)
sr = 44100;
d =.05;
f = 440;
% tone synthesis
t = linspace(0, d, sr*d);
tone = sin(2*pi*f*t);
% ONSET AND OFFSET GATING
gatedur =.005; % the duration of the gate in seconds (i.e., 5-ms)
ongate = cos(linspace(pi, 2*pi, sr*gatedur));
ongate = ongate+1;
ongate = ongate/2;
offgate = fliplr(ongate);
sustain = ones(1, (length(tone)-2*length(ongate)));
envelope = [ongate, sustain, offgate];
tone = tone.* envelope;
This code needs to be extended for our purposes. In our experiment the sound needs to be switched on while the discs are in motion, in particular when the discs overlap. In later chapters, we will see that this happens at frame 70 (i.e., at the x coordinate of 140). We have to prepend to the sound a silence lasting as long as we have to wait before switching the sound on. To do that, we need to anticipate the use of the Screen function, which will be extensively described later. Screen is the most important function of the Psychtoolbox and can do several things; one of them is to interrogate the video card for the refresh rate currently set on your computer. Here we use FrameRate (a Psychtoolbox function based on Screen) to get the refresh rate of your monitor.
refreshrate = FrameRate(screennumber); % get the frame rate of the monitor
silencepre_dur = (1/refreshrate) * 70;
silencepre = zeros(1, round(sr * silencepre_dur));
SoundToPlay = [silencepre, tone];
Now let's put everything within a single GenerateSound() function. This function receives as arguments a digit representing the sound we want to play (1 for no sound, 2 for the sound), which is stored in the EventTable, and a screen number (see later chapters). The function's output is a sound array. Note that the input number is used at the end of the function to decide whether to return silence or the tone.
Listing 5.15
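A sketch of the function, assembled from the code above (the argument names are assumptions; whichsound is 1 for no sound, 2 for the tone):

```matlab
function SoundToPlay = GenerateSound(whichsound, screennumber)
sr = 44100;
d = .05;
f = 440;
% tone synthesis
t = linspace(0, d, sr*d);
tone = sin(2*pi*f*t);
% onset and offset gating (5-ms raised cosine ramps)
gatedur = .005;
ongate = cos(linspace(pi, 2*pi, sr*gatedur));
ongate = (ongate+1)/2;
offgate = fliplr(ongate);
sustain = ones(1, length(tone)-2*length(ongate));
tone = tone .* [ongate, sustain, offgate];
% silence preceding the tone (the sound starts at frame 70)
refreshrate = FrameRate(screennumber);
silencepre = zeros(1, round(sr * (1/refreshrate) * 70));
if whichsound == 1
    SoundToPlay = zeros(1, length(silencepre)+length(tone)); % silence only
else
    SoundToPlay = [silencepre, tone];
end
```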
References
Darwin CJ, Ciocca V (1992) Grouping in pitch perception: effects of onset asynchrony and ear of presentation of a mistuned component. J Acoust Soc Am 91:3381–3390
Giordano BL, McAdams S (2006) Material identification of real impact sounds: effects of size variation in steel, glass, wood, and plexiglas plates. J Acoust Soc Am 119:1171–1181
Houtgast T (1972) Psychophysical evidence for lateral inhibition in hearing. J Acoust Soc Am 51:1885–1894
Moore BCJ (2003) An introduction to the psychology of hearing, 5th edn. Academic, San Diego
Sekuler R, Sekuler AB, Lau R (1997) Sound alters visual motion perception. Nature 385:308
Suggested Readings
There are a number of MATLAB tools developed by researchers that can be used in audition. Here is a certainly incomplete list of the available tools:
Grassi M, Soranzo A (2009) MLP: a MATLAB toolbox for rapid and reliable auditory threshold estimations. Behav Res Methods 41:20–28 (This paper implements several psychoacoustic experiments together with sound generators and modifiers.)
Peeters G, Giordano BL, Susini P, Misdariis N, McAdams S (2011) The Timbre Toolbox: extracting audio descriptors from musical signals. J Acoust Soc Am 130:2902–2916 (This paper shows a toolbox for the analysis of musical signals.)
Malcolm Slaney has released a toolbox (the Auditory Toolbox) that implements several popular auditory models.
Pérez E, Rodriguez-Esteban R (2006) Oreja: a MATLAB environment for the design of psychoacoustic stimuli. Behav Res Methods 38:574–578 (The Oreja software package was designed to study speech intelligibility. It is a tool that allows manipulation of speech signals to facilitate study of human speech perception.)
Readers interested in MATLAB tools for audition should take a look at the following journals: Behavior Research Methods and the Journal of Neuroscience Methods. Both journals often publish MATLAB tools for audition.
Readers Interested in more technical and advanced audio processing should read the following book:
McLoughlin I (2009) Applied speech and audio processing: with MATLAB examples. Cambridge University Press, Cambridge
Footnotes
1
Although MATLAB enables the generation of sounds at several sample rates, 44,100 Hz is the most used. This is, for example, the sample rate of an audio CD.
 
2
Note that wavplay is only for use with windows machines.
 
3
We recommend using headphones to better appreciate the sounds described in this section.
 
4
Note that it is possible to generate white noises according to various distributions. For example, the rand function returns random, uniformly distributed numbers; therefore, our white noise is a “uniform white noise.” If we use randn, a function that generates random numbers according to the normal distribution, we obtain a Gaussian white noise instead. See, for instance, Wikipedia for further details.
 
5
Convolution is an operation performed in time. Thanks to the properties of the Fourier transform, convolution in time becomes a simple product in frequency between the Fourier transform of the noise and the filter's frequency response.