what is science?
4 features of the scientific method
empirical-based on data
deterministic-obeys physical laws
predictive-if you do this..then that will happen
parsimonious-use the simplest explanation possible
SLP's use of technology-why do we use it clinically?
1. overcome listener bias-consistent reliable measurement
2. describe severity objectively
3. track progress over time-demonstrate treatment efficacy
4. ASHA's focus on EBP
5. provide biofeedback to the client
SLP's use of technology-how can acoustics help?
1. need to understand the data
2. examine qualitatively as well as quantitatively
3. the machine makes no judgments
pitch perception vs. frequency measurement
...
frequency
1. how frequently a waveform repeats
2. Hertz (Hz)=cycles per second
simplest sound
pure tone
complex tones
fundamental frequency
low frequency (220 Hz) vs. high frequency (880 Hz) over 10 ms
pitch perception
1. linked to frequency
2. order on a musical scale
3. subjective perception
4. cannot be measured with instruments
5. listener matches perceived pitch to that of a pure tone of known frequency
frequency difference limens (DL)
1. smallest detectable change in frequency
2. DLs increase with stimulus frequency
3. higher frequency sounds must differ more to be heard as different in pitch
4. true for comfortable listening level
5. as intensity decreases, DLs become larger
complex tones
1. pure tones rare in everyday life
2. complex tones have many frequencies
3. periodic sounds: a harmonic series
4. fundamental frequency (f0) often strongest
5. harmonics are integer multiples
6. human voice has f0 and harmonics
7. perceived pitch follow
missing fundamental
1. listen to a harmonic series
2. actual fundamental can be absent
3. timbre or quality is different
4. pitch still perceived as the same
5. brain processes harmonic structure
6. fills in the gap by calculation/interpolation
7. cheap audio equipment - simu
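A minimal sketch of the arithmetic behind the missing fundamental, assuming the harmonics are exact whole-number frequencies. The function name `implied_fundamental` and the 200 Hz series are illustrative only; this is not a model of how the auditory system actually does it.

```python
from functools import reduce
from math import gcd

def implied_fundamental(harmonics_hz):
    """Largest frequency (Hz) of which every component is an integer multiple."""
    return reduce(gcd, harmonics_hz)

full_series = [200, 400, 600, 800]   # fundamental present
missing_f0 = [400, 600, 800]         # fundamental removed

print(implied_fundamental(full_series))  # 200
print(implied_fundamental(missing_f0))   # 200
```

Both series imply the same 200 Hz spacing, which is consistent with the pitch being heard as the same even though the timbre differs.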
pitch perception in music
• A or B note sounds equivalent up or down
• octave: doubling/halving of frequency
• 400 Hz base
• 800 Hz an octave up
• 200 Hz an octave down
• some note pairs blend harmoniously
• some note pairs are dissonant
• harmonic frequencies match or don't
semitones
• 12 semitones in one octave
• each semitone is a nonlinear step
• each step upward is bigger than the last
• about 5.9% higher frequency than the one before it
• no two semitones are physically identical (Hz)
• but semitones sound equal in step size
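The semitone figures above can be checked with a short sketch, assuming equal-tempered tuning (each semitone is a frequency ratio of 2^(1/12)). The 400 Hz base is taken from the octave example; the function name is illustrative.

```python
SEMITONE_RATIO = 2 ** (1 / 12)   # about 1.0595, i.e. roughly 5.9% per step

def semitones_up(f_hz, n):
    """Frequency n equal-tempered semitones above f_hz."""
    return f_hz * SEMITONE_RATIO ** n

base = 400.0
steps = [semitones_up(base, n) for n in range(13)]   # 0..12 semitones

for n in range(1, 13):
    size_hz = steps[n] - steps[n - 1]
    print(n, round(steps[n], 1), round(size_hz, 1))
```

The first step is about 24 Hz and the twelfth about 45 Hz, yet every step is the same ratio, and twelve of them land on 800 Hz, one octave up.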
intensity
• amplitude or 'size' of a sound
• lay term: volume level
• intensity measured in dB
• logarithmic scale
• intensity level (IL)
• sound pressure level (SPL)
• measured with a sound level meter
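A small sketch of how the two dB measures are calculated, assuming the standard reference values (10^-12 W/m^2 for intensity level, 20 micropascals for sound pressure level). Function names are illustrative.

```python
import math

I_REF = 1e-12   # W/m^2, standard reference intensity
P_REF = 20e-6   # Pa, standard reference pressure (20 micropascals)

def intensity_level_db(intensity_w_m2):
    """dB IL: 10 times the log of the intensity ratio."""
    return 10 * math.log10(intensity_w_m2 / I_REF)

def spl_db(pressure_pa):
    """dB SPL: 20 times the log of the pressure ratio."""
    return 20 * math.log10(pressure_pa / P_REF)

print(round(intensity_level_db(1e-6), 1))   # a million times the reference
print(round(spl_db(2 * P_REF), 1))          # double the reference pressure
```

Because the scale is logarithmic, a millionfold increase in intensity is only 60 dB, and doubling the pressure adds about 6 dB.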
linear or logarithmic?
loudness
• a perceptual characteristic
• judged by a listener
• neither correct nor incorrect
• cannot be measured by equipment
• psychophysical scale links loudness to intensity
loudness vs. intensity
effect of frequency on loudness
• hearing more sensitive at some frequencies
• lowest thresholds from 1000 Hz to 5000 Hz
• thresholds much higher
- for lowest frequency sounds
- for highest frequency sounds
• audiometers use HL, not SPL
equal loudness contours
how are dB, loudness linked?
• individual responses vary
• influenced by
- study methodology
- instructions to listeners
• general rule: 6-10 dB increase perceived as twice as loud
• individual perception cannot be 'wrong'
duration and loudness
• below 500 ms, duration affects perception
• especially for durations 15 to 150 ms
• to study this
- vary the tone duration
- vary the intensity
- record the responses
• longer = more audible
• ear integrates the energy
measuring amplitude
• mic signal is always changing
• how to measure its size?
root mean square (RMS)
• squaring makes all values positive
• get the average of squared values
• get the square root of the average
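The three RMS steps above can be sketched as follows. The test signal (one cycle of a sine wave with peak amplitude 1, 1000 samples) is an assumption chosen for illustration.

```python
import math

def rms(samples):
    squared = [s * s for s in samples]       # squaring makes all values positive
    mean_sq = sum(squared) / len(squared)    # average of the squared values
    return math.sqrt(mean_sq)                # square root of the average

n = 1000
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]   # one full cycle, peak 1

print(round(rms(sine), 3))   # 0.707
```

A plain average of this signal would come out near zero because the positive and negative halves cancel, which is why the squaring step is needed.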
acoustic power
• measured in watts
• describes how much energy is radiated per unit time
• intensity
• measured in watts per unit area
• radiates outward from the source in a sphere
• as the radius gets bigger, the energy is spread over a much larger sphere
• the area of a sphere: 4πr^2
inverse square law
• intensity is inversely proportional to the square of the distance from the source
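A sketch of the inverse square law, assuming an idealized point source radiating equally in all directions with no reflections or absorption. The 1 W source power and the distances are hypothetical values.

```python
import math

def intensity_at(power_w, r_m):
    """Intensity (W/m^2): source power spread over a sphere of area 4*pi*r^2."""
    return power_w / (4 * math.pi * r_m ** 2)

def level_change_db(r_near_m, r_far_m):
    """Change in dB when moving from r_near_m to r_far_m."""
    return 10 * math.log10((r_near_m / r_far_m) ** 2)

print(intensity_at(1.0, 2.0) / intensity_at(1.0, 1.0))   # one quarter of the intensity
print(round(level_change_db(1.0, 2.0), 1))               # about -6 dB per doubling
```

Doubling the distance quarters the intensity, a drop of about 6 dB, which is one reason a consistent mouth-to-microphone distance matters when comparing recordings.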
regulating speech effort
• subglottal pressure drives speech
• more driving pressure for loud speech
• larger vocal fold excursions
• more forceful vocal fold collisions
• larger articulator movements
• higher oral pressure
• consonant burst release stronger
measuring vocal range
• reduced loudness in some disorders
• range of loudness changes with pitch
• Voice Range Profile (VRP)
• measures dB range across F0 range
• more popular in Europe
• not as standardized as audiograms
VRP data collection
• 10 points between min and max F0
- linear scale: Hz increments
- log scale: semitones
• min and max intensity at each pitch
• plot data on a chart
• can calculate area
• also examine shape
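A sketch of the two ways to space the 10 target pitches, assuming a hypothetical speaker with an F0 range of 100 to 400 Hz. The function names are illustrative and not taken from any commercial VRP product.

```python
import math

def linear_targets(f_min, f_max, n=10):
    """n target frequencies in equal Hz increments."""
    step = (f_max - f_min) / (n - 1)
    return [f_min + i * step for i in range(n)]

def semitone_targets(f_min, f_max, n=10):
    """n target frequencies in equal semitone (log) increments."""
    span_st = 12 * math.log2(f_max / f_min)   # range expressed in semitones
    return [f_min * 2 ** (i * span_st / (n - 1) / 12) for i in range(n)]

print([round(f) for f in linear_targets(100, 400)])
print([round(f) for f in semitone_targets(100, 400)])
```

The linear version steps by the same number of Hz each time, so the low end of the range is sampled coarsely in musical terms. The semitone version gives steps that sound equal in size.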
voice range profile
VRP in practice
• can be somewhat time-consuming
• useful for voice patients
• not for a typical SLP evaluation
• practice effects with repetition
• motivation plays a role
• max effort can vary with instructions
• avoid risk of vocal damage
• commercial product available
what is digital sound?
• music on mp3 players vs tapes or vinyl records:
- "it's different"
- "it's clearer"
- "it's more convenient"
- "it's better"
• can your ears hear digits?
• the music is stored in numeric form
• it is played back in analog form
what does 'analog' mean?
• the phenomenon of interest is shown by an analogy
• temperature is represented by the mercury in a thermometer
- more height = higher temperature
- less height = lower temperature
• the signal is analogous to the phenomenon
• microphone voltage represen
analog signals
• analog signals are continuous
- in time
- in amplitude
• continuous signals can be examined in any level of detail
• a line drawn on a page
• a microphone signal in a wire
• there are no gaps
• infinite number of points in time
• infinite number of ampl
common analog devices
• clocks, watches with hands
• mercury thermometer
• tape measures
• cassette recorder
• dimmer switch on a light
digital signals
• digital recordings are discrete
- in time
- in amplitude
• numbers represent snapshots over time
• numbers have finite precision
• a limited number of decimal places
• can be fully represented by a table of numbers
• there are 'unknowns' between the poi
common digital devices
• digital clock
• levels of detail...
- am or pm?
- hours?
- minutes?
- seconds?
- 1/10ths or 1/100ths?
• computer
• compact disc
• mp3 player
how does a sound become digital?
• analog-to-digital conversion (ADC)
• digital-to-analog conversion (DAC)
• computer sound cards do this
• why?
- computers only deal with binary data
- binary data: ones or zeroes
- microphones/speakers are analog devices
how can numbers represent signals?
• contrast black and white
• 0=black (no light)
• 1=white (all light)
• what about gray?
• 0.5=medium gray
• each shade has its own number
• 1000 shades of gray require 1000 values
how many shades of gray?
• 2^2 = 4 (2 bit resolution)
• 2^3 = 8 (3 bit resolution)
• 2^16 = 65,536 (16 bit resolution)
• 2^32 = 4,294,967,296 (32 bit resolution)
• number of shades of gray represents level of amplitude precision
quantization
a number reflects signal amplitude
• one number per 'snapshot'
• more decimal places... more detail
• simpler number... cruder record
quantization
using the available range
• weak input signal uses little of the available resolution
• boosting it later will not help
• noise will be boosted with it
strong signal
weak signal
boosted weak signal
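A sketch of why a weak recording cannot be rescued by boosting, assuming a deliberately coarse 8-bit converter with an input range of -1 to 1. The signal levels (0.9 and 0.009 of full scale) and all function names are hypothetical choices for illustration.

```python
import math

def quantize(x, bits):
    """Round x (expected in -1..1) to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    code = max(-levels // 2, min(levels // 2 - 1, round(x / step)))
    return code * step

def snr_db(clean, recorded):
    """Signal-to-noise ratio, treating the recording error as noise."""
    signal = sum(c * c for c in clean)
    noise = sum((c - r) ** 2 for c, r in zip(clean, recorded))
    return 10 * math.log10(signal / noise)

BITS = 8
tone = [math.sin(2 * math.pi * 7 * k / 1000) for k in range(1000)]
strong = [0.9 * t for t in tone]      # uses most of the available range
weak = [0.009 * t for t in tone]      # same sound recorded 100 times weaker

strong_q = [quantize(s, BITS) for s in strong]
weak_q = [quantize(w, BITS) for w in weak]
boosted = [100 * w for w in weak_q]   # weak recording amplified afterwards

print(len(set(strong_q)), len(set(weak_q)))   # distinct levels actually used
snr_strong = snr_db(strong, strong_q)
snr_boosted = snr_db(strong, boosted)
print(round(snr_strong, 1), round(snr_boosted, 1))
```

The weak recording lands on only three of the 256 available levels, and boosting it scales the rounding error up along with the signal, so its signal-to-noise ratio stays far below that of the strong recording.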
sampling rate - snapshots
• numbers represent amplitude values
• how often do you take a sample?
• the more samples the better you can represent the original signal
• sample rate specified in Hz
• how many would be enough?
- 1000 per second?
- 20,000 per second?
selecting a sampling rate
• higher sampling rate gives better fidelity
• higher sampling rate requires bigger files
- more memory usage
- more disk space for storage
- more processing time for computation
• decision: how much is adequate?
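A rough sketch of the storage cost behind that decision, assuming uncompressed audio (compressed formats such as mp3 are smaller). The function name and the example settings are illustrative.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size in bytes of uncompressed audio with the given settings."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds

one_minute_cd = pcm_bytes(44100, 16, 2, 60)     # CD-style stereo
one_minute_low = pcm_bytes(10000, 16, 1, 60)    # 10 kHz mono

print(one_minute_cd)    # 10584000
print(one_minute_low)   # 1200000
```

One minute at CD settings is roughly 10 MB, nearly nine times the size of a 10 kHz mono recording of the same length.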
the Nyquist frequency
• the 'Nyquist' is half the sample rate
• the highest frequency you can reproduce
• sample at a rate at least twice the highest frequency in the signal
• data up to 100 Hz - sample at 200 Hz or more
• data to 5 kHz - sample at 10 kHz or more
• compact discs/mp3 - 44.1 kHz
aliasing
• sampling too slowly will inaccurately record the original signal
• you'll miss what happens between samples
• high original frequencies will be improperly recorded as lower frequencies
• one is the 'alias' of the other
• filtering before digitizing prevents aliasing
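A sketch of where an under-sampled tone ends up, assuming a pure tone and no anti-aliasing filter. The 10 kHz sample rate and 7 kHz tone are hypothetical values; the function names are illustrative.

```python
import math

def nyquist(sample_rate_hz):
    return sample_rate_hz / 2

def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which a pure tone of f_hz appears after sampling."""
    folded = f_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

RATE = 10000
print(nyquist(RATE))                 # 5000.0
print(alias_frequency(3000, RATE))   # 3000 (below Nyquist, recorded correctly)
print(alias_frequency(7000, RATE))   # 3000 (above Nyquist, folds down)

# The sampled values of the two tones are indistinguishable.
seven_khz = [math.cos(2 * math.pi * 7000 * k / RATE) for k in range(20)]
three_khz = [math.cos(2 * math.pi * 3000 * k / RATE) for k in range(20)]
print(all(abs(a - b) < 1e-9 for a, b in zip(seven_khz, three_khz)))   # True
```

Once the samples are taken there is no way to tell the 7 kHz tone from a 3 kHz one, which is why the high frequencies have to be removed before digitizing rather than after.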
aliasing image
anti-aliasing
• filtering before digitizing prevents aliasing
• set a low-pass filter at (or just below) the Nyquist frequency
• frequencies above this are removed
• they cannot contaminate recording
• most modern recording systems do the filtering for you automatically
Acoustic Analysis
get good data!
• quiet recording environment - reduce reverberation
• quality microphone
• good signal strength
- saturation, not clipping
• proper sample rate
- err on the side of detail
sound and movement
• all sounds originate with movement
- vibrating string of an instrument
- oscillation of the vocal folds
- turbulence in air molecules leaving a tire
• movement characteristics determine the nature of the sound
• speech production has many degrees of fre
acoustics and speech
• acoustic analysis is noninvasive
• we draw inferences about movements from sound
- disordered voice... disordered vocal fold movement - change in source
- distorted articulation... abnormal movement of the articulators - change in filter
limitations
• acoustic patterns reflect vocal tract movements with some ambiguity
• motor equivalence: the same sound can be produced several ways
• acoustics cannot reveal all details of movement
filtering
• what does a filter do?
• speech sounds contain many frequencies
• we can be selective
filter types
• high pass
• low pass
• band pass
• band reject
high pass
allows high frequencies through - holds back lower frequencies
low pass
allows low frequencies through - holds back higher frequencies
band pass
allows a band of frequencies through - holds back both higher and lower frequencies
band reject
holds back a band of frequencies - allows both higher and lower frequencies through
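The four filter types can be sketched as simple yes/no rules, assuming idealized filters with perfectly sharp cutoffs (real filters roll off gradually). The cutoff values and the list of component frequencies are hypothetical.

```python
def high_pass(f_hz, cutoff_hz):
    return f_hz >= cutoff_hz

def low_pass(f_hz, cutoff_hz):
    return f_hz <= cutoff_hz

def band_pass(f_hz, low_hz, high_hz):
    return low_hz <= f_hz <= high_hz

def band_reject(f_hz, low_hz, high_hz):
    return not band_pass(f_hz, low_hz, high_hz)

components = [100, 500, 1000, 2000, 4000, 8000]   # frequencies in a complex sound

print([f for f in components if high_pass(f, 1000)])          # [1000, 2000, 4000, 8000]
print([f for f in components if low_pass(f, 1000)])           # [100, 500, 1000]
print([f for f in components if band_pass(f, 500, 2000)])     # [500, 1000, 2000]
print([f for f in components if band_reject(f, 500, 2000)])   # [100, 4000, 8000]
```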
looking for ingredients
• a prism splits white light into colors
• acoustic analysis splits up sounds
• complex sound... many ingredients
• alter proportions... alter quality
Fourier theorem
• all periodic sounds are made of a combination of sine waves
- amplitudes vary
- phase angles vary
- frequencies vary
changing domains
• Fourier transform
• creates a spectrum from the time domain waveform
- analyze a cake to learn its ingredients
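A minimal sketch of the change of domain, assuming a synthetic signal built from two known ingredients: a 100 Hz sine of amplitude 1.0 and a 200 Hz sine of amplitude 0.5. It uses a plain, slow discrete Fourier transform written out by hand for clarity; real analysis software uses the much faster FFT. The sample rate and signal length are arbitrary choices.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Amplitude of each frequency bin from 0 up to (not including) Nyquist.
    The 2/n scaling is correct for sine components, not for the 0 Hz bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(2 * abs(total) / n)
    return mags

RATE = 1000   # samples per second
N = 200       # 0.2 s of signal, giving bins 5 Hz apart
wave = [1.0 * math.sin(2 * math.pi * 100 * t / RATE)
        + 0.5 * math.sin(2 * math.pi * 200 * t / RATE) for t in range(N)]

mags = dft_magnitudes(wave)
peaks = [(k * RATE / N, round(m, 3)) for k, m in enumerate(mags) if m > 0.1]
print(peaks)   # [(100.0, 1.0), (200.0, 0.5)]
```

The waveform itself gives no obvious hint of its two ingredients, but the spectrum recovers both their frequencies and their amplitudes.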
time domain data
• a waveform represents sound directly
• (air pressure) changes over time
frequency domain data
• a line spectrum shows the frequency components of a periodic sound
periodic signal spectrum
• frequency domain description of the signal
• has harmonics that are integer multiples of the fundamental
• has nothing between the lines
- the lines represent the harmonic frequencies
time vs. frequency domain
spectra - aperiodicity
• sine wave: single line on a spectrum
• complex periodic signals: multiple lines
• what would noise look like?
- all frequencies
- equal amplitude
- random phase
noise spectrum connect peaks
real voice signals
• the voice source is not truly periodic
• it is 'nearly periodic'
• spectrum does not have pure lines
• spectrum has peaks
• some spread of energy around the fundamental and harmonics
/a/ vowel spectrum
FFT spectrum
• clearly shows harmonic energy
• each peak is a harmonic
• less clear at showing formants
• more revealing of source
LPC spectrum
• shows spectral envelope
• good at revealing formants
• does not show harmonics
• more revealing of filter
many spectra over time
• a line spectrum is a snapshot in time
• a spectrogram shows speech over time
• spectra are lined up sequentially
• single slices are put together
speech spectrogram
• x-axis is time
• y-axis is frequency
• darkness indicates intensity
spectral slice
sound spectrograms
• display reflects the contribution of many structures and movements
• much detail is present for even simple utterances
• need to be selective, specific in interpreting the display
spectrogram parameters
• y-axis (frequency) limited to Nyquist
- actually, slightly below Nyquist
• sample at a high enough rate
- what details do you want to see?
• display can be adjusted downward
• no way to adjust upward beyond Nyquist frequency
analysis bandwidth
• 'wide band' spectrogram gives clear temporal detail
- frequency resolution is poor
• 'narrow band' spectrogram gives clear frequency detail
- time resolution is poor
why the trade-off?
• analysis evaluates strings of numbers
• more samples needed to find subtle changes in patterns (frequency content)
• quick changes show up over just a few samples
• not enough to give frequency detail
• time resolution and frequency resolution are inversely related
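A sketch of the trade-off in numbers, assuming a 44.1 kHz recording and treating the spacing between frequency bins (sample rate divided by window length) as a rough stand-in for analysis bandwidth. The actual bandwidth also depends on the window shape, and the window lengths here are hypothetical.

```python
def frequency_resolution_hz(sample_rate_hz, window_samples):
    """Spacing between frequency bins for a given analysis window."""
    return sample_rate_hz / window_samples

def window_duration_ms(sample_rate_hz, window_samples):
    """How much time each analysis window covers."""
    return 1000 * window_samples / sample_rate_hz

RATE = 44100
for window in (128, 1024):
    print(window,
          round(frequency_resolution_hz(RATE, window), 1), "Hz,",
          round(window_duration_ms(RATE, window), 1), "ms")
```

The short window covers about 3 ms but can only resolve frequencies roughly 345 Hz apart (wide band). The long window resolves about 43 Hz but smears everything within about 23 ms together (narrow band). Multiplying the two figures always gives the same constant, so gaining on one side means losing on the other.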
300 Hz bandwidth
• vertical striations are glottal pulses
• accurate time measures possible
not clones
• we're all different on the outside
• we're just as different inside the vocal tract
• acoustics affected by structural differences
• differences in function also affect sounds
the voice
allows efficient communication
can be an artistic tool
conveys subtle shades of emotion
physiologic detail
voice problems come from disordered vocal fold activity
therapy aims to improve vocal fold movements
detailed information about the activity of the larynx helps assessment and treatment
laryngeal mirror exam
rigid scope exam
flexible scope exam
it's all a blur...
vibrating vocal folds move quickly
movements are blurred to the eye
technology can help us 'slow down' movements
high speed filming
stroboscopy
high speed filming
standard DVD/VHS video uses 30 frames/sec
ultra-high speed filming uses 4000-6000 frames/sec
not clinically practical
expense
complexity
was used in pioneering work on physiology
stroboscopy
capitalizes on an optical illusion
light flashes illuminate a target
each illumination is a snapshot
paste the snapshots together in succession
voila! a movie
video has 30 frames/sec
vocal folds may oscillate 200 times/sec
timing of the flashes is crucial
stroboscopy
timing is everything
'frozen' image
flash occurs at the same point in each cycle
slow motion image
flashes slightly delayed in successive cycles
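A sketch of the timing idea, assuming an idealized strobe that fires roughly once per vocal fold cycle and a perfectly steady F0 of 200 Hz. Real systems are more involved; the function name and flash rates are hypothetical.

```python
def flash_phases(f0_hz, flash_rate_hz, n_flashes):
    """Point in the vibratory cycle (0 to 1) that each successive flash lights up."""
    return [(f0_hz * n / flash_rate_hz) % 1.0 for n in range(n_flashes)]

F0 = 200

frozen = flash_phases(F0, 200, 5)   # flash rate matches F0
slow = flash_phases(F0, 199, 5)     # flashes arrive slightly late each cycle

print(frozen)
print([round(p, 4) for p in slow])
print(F0 - 199, "apparent cycle(s) per second")
```

With matched rates every flash catches the same point in the cycle, giving the 'frozen' image. With the flash rate 1 Hz below F0, each flash lands a little later in the cycle than the one before, so the snapshots step through one apparent cycle per second.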
strobe limitations
'movie' is a composite of stills from different cycles
strobe movies are made of non-adjacent samples
'true' vocal fold motion is not seen
F0 must be steady
flashes cannot synchronize if F0 erratic
severe dysphonia precludes stroboscopy
electroglottography (EGG)
two electrodes placed on the neck
over thyroid laminae
current passes between the electrodes
more current when folds are together
less current when folds are separated
signal goes up and down for each cycle
EGG signal pathway
what does the signal represent?
signal is not the same as sound pressure
also not the same as glottal width
signal does not represent opening/closing
signal represents vocal fold contact area
absolute signal amplitude is meaningless
many units have automatic gain control
changes in voca
microphone and EGG signals
EGG waveform: pressed voice
isn't that dangerous?
E.G.G. uses tiny currents
current is at a very high frequency
current is totally imperceptible
current is about 10 mA
voltage is about 0.5 V
E.G.G. advantages
noninvasive
safe
easy to use
nobody's perfect
human phonation is not perfectly periodic
it is a 'nearly periodic' signal
glottal cycle periods are not identical
harmonic energy: periodic parts
noise: all the rest
vocal perturbation
random cycle-to-cycle duration differences are called jitter
random cycle-to-cycle amplitude differences are called shimmer
jitter and shimmer co-occur
won't find one without the other
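A sketch of one common way to express jitter and shimmer: the average cycle-to-cycle difference as a percentage of the mean. Analysis programs offer several variants of these measures, so this is not the definition used by any particular product. The five periods and peak amplitudes are made-up values.

```python
def mean_abs_consecutive_diff(values):
    """Average size of the change from one glottal cycle to the next."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)

def perturbation_percent(values):
    """Cycle-to-cycle variation as a percentage of the mean value."""
    mean = sum(values) / len(values)
    return 100 * mean_abs_consecutive_diff(values) / mean

periods_ms = [5.00, 5.04, 4.98, 5.02, 4.96]   # hypothetical cycle durations
peak_amps = [1.00, 0.97, 1.02, 0.99, 1.02]    # hypothetical cycle amplitudes

jitter = perturbation_percent(periods_ms)     # about 1%
shimmer = perturbation_percent(peak_amps)     # about 3.5%
print(round(jitter, 2), round(shimmer, 2))
```

A perfectly periodic signal would score 0 on both measures.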
what is perturbation like?
hoarse voices have higher perturbation
jitter = frequency perturbation
shimmer = amplitude perturbation
normal voices have minimal perturbation
absence of perturbation sounds artificial
perturbation in the clinic
the number does not give a diagnosis
different pathologies do not have distinct "acoustic fingerprints"
your ears decide if there's a problem
jitter and shimmer values may quantify severity
numbers may track progress over time
shimmer=amplitude instability
no shimmer: height is constant
shimmer present: height fluctuates
measuring perturbation
in the olden days...
get a printout and a ruler
measure the average wave height
calculate the differences of each pulse
also measure period and variability
one second of voice = hours of work
but today...
computers do the math in milliseconds
what causes perturbation?
neurologic factors
muscle contraction is inherently unsmooth
individual motor neurons 'take turns' firing
overall contraction is the net effect of many minuscule twitches
vocal fold tension is not perfectly static
causes of perturbation
air flow can be intermittently turbulent
pathological vocal fold tissue changes
left/right asymmetry
mass lesions
- nodules, polyps
vocal fold swelling
tension abnormalities
perturbation for your client
use sustained vowels
avoid onsets and offsets
comfortable intensity level
record live to the computer
tape recorders introduce perturbation
compare like with like
same vowel
same conditions
audibly shaky voices
perturbation is very rapid
sounds rough and hoarse
what about a tremulous voice?
how about vocal vibrato?
rhythmic modulations
tremor = rhythmic change in F0 and amplitude
much slower than random cycle-to-cycle perturbation
pattern of gradual increase/decrease in F0 is spread across many cycles
FM = frequency modulation
AM = amplitude modulation
theoretical modulation
vocal tremor
frequency and intensity changes are seldom independent
most speakers' F0 will go up with intensity
amplitude (AM) and frequency (FM) modulations typically co-occur
physiologic tremor data
quantifying tremor
how rapid is it?
5-7 Hz modulation rate is typical
how extreme is it?
% fluctuation around the mean
how steady is it?
periodicity can be expressed as %
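A sketch of the first two questions (how rapid, how extreme), assuming a clean and regular tremor in an F0 contour that has already been extracted. The synthetic contour (200 Hz mean, 6 Hz modulation, 4% extent, measured 100 times per second for 2 s) is hypothetical, and counting mean-crossings is a crude estimate that would need refinement for real, irregular data. Periodicity is not computed here.

```python
import math

def tremor_extent_percent(f0_track):
    """Half the peak-to-peak F0 swing, as a percentage of the mean F0."""
    mean = sum(f0_track) / len(f0_track)
    return 100 * (max(f0_track) - min(f0_track)) / 2 / mean

def tremor_rate_hz(f0_track, track_rate_hz):
    """Modulation cycles per second, counted as upward crossings of the mean."""
    mean = sum(f0_track) / len(f0_track)
    upward = sum(1 for a, b in zip(f0_track, f0_track[1:]) if a < mean <= b)
    return upward / (len(f0_track) / track_rate_hz)

TRACK_RATE = 100   # F0 measurements per second
f0_track = [200 * (1 + 0.04 * math.sin(2 * math.pi * 6 * t / TRACK_RATE + 0.5))
            for t in range(200)]

print(tremor_rate_hz(f0_track, TRACK_RATE))          # 6.0
print(round(tremor_extent_percent(f0_track), 1))     # close to 4
```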