COMD 5070 exam 1 study guide

what is science?
4 features of the scientific method

empirical-based on data
deterministic-obeys physical laws
predictive-if you do this... then that will happen
parsimonious-use the simplest explanation possible

SLP's use of technology-why do we use it clinically?

1. overcome listener bias-consistent reliable measurement
2. describe severity objectively
3. track progress over time-demonstrate treatment efficacy
4. ASHA's focus on EBP
5. provide biofeedback to the client

SLP's use of technology-how can acoustics help?

1. need to understand the data
2. examine qualitatively as well as quantitatively
3. the machine makes no judgments

pitch perception vs. frequency measurement

...

frequency

1. how frequently a waveform repeats
2. Hertz (Hz)=cycles per second

simplest sound

pure tone

complex tones

fundamental frequency

low frequency 10ms 220 Hz vs high frequency 880 Hz

pitch perception

1. linked to frequency
2. order on a musical scale
3. subjective perception
4. cannot be measured with instruments
5. listener matches perceived pitch to that of a pure tone of known frequency

frequency difference limens (DL)

1. smallest detectable change in frequency
2. DLs increase with stimulus frequency
3. higher frequency sounds must differ more to be heard as different in pitch
4. true for comfortable listening level
5. as intensity decreases, DLs become larger

complex tones

1. pure tones rare in everyday life
2. complex tones have many frequencies
3. periodic sounds: a harmonic series
4. fundamental frequency (f0) often strongest
5. harmonics are integer multiples
6. human voice has f0 and harmonics
7. perceived pitch follow

missing fundamental

1. listen to a harmonic series
2. actual fundamental can be absent
3. timbre or quality is different
4. pitch still perceived as the same
5. brain processes harmonic structure
6. fills in the gap by calculation/interpolation
7. cheap audio equipment - simu
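A rough illustration of the missing fundamental, assuming an idealized harmonic series with whole-number frequencies: the spacing of the remaining harmonics still points to the fundamental even when it is absent. The name `implied_fundamental` is made up for this sketch, and the greatest common divisor is only a stand-in for whatever the auditory system actually does.

```python
from functools import reduce
from math import gcd

def implied_fundamental(harmonics_hz):
    """Largest frequency (Hz) of which every listed harmonic is an integer multiple."""
    return reduce(gcd, harmonics_hz)

# harmonics 2, 3 and 4 of a 200 Hz voice; the 200 Hz component itself is absent
print(implied_fundamental([400, 600, 800]))
```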

pitch perception in music

• A or B note sounds equivalent up or down
• octave: doubling/halving of frequency
• 400 Hz base
• 800 Hz an octave up
• 200 Hz an octave down
• some note pairs blend harmoniously
• some note pairs are dissonant
• harmonic frequencies match or don't

semitones

• 12 semitones in one octave
• each semitone is a nonlinear step
• each step upward is bigger than the last
• about 5.9% higher frequency than the one before it
• no two semitones are physically identical (Hz)
• but semitones sound equal in step size
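The octave and semitone arithmetic can be checked with a short sketch. It assumes equal-tempered tuning, where every semitone is the same frequency ratio (the twelfth root of 2, about 1.0595, which matches the "about 5.9%" figure). The function names are illustrative only.

```python
import math

SEMITONE_RATIO = 2 ** (1 / 12)   # about 1.0595, i.e. about 5.9% higher

def semitone_up(freq_hz, steps=1):
    """Frequency that lies `steps` equal-tempered semitones above freq_hz."""
    return freq_hz * SEMITONE_RATIO ** steps

def semitones_between(f1_hz, f2_hz):
    """Distance from f1_hz up to f2_hz, measured in semitones."""
    return 12 * math.log2(f2_hz / f1_hz)

base = 400.0
first_step = semitone_up(base, 1) - base                    # size in Hz of the first step
second_step = semitone_up(base, 2) - semitone_up(base, 1)   # the next step is bigger in Hz
print(first_step, second_step, semitone_up(base, 12))
```

Twelve equal ratio steps double the frequency, so 400 Hz lands on 800 Hz, while each step covers more Hz than the one before it.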

intensity

• amplitude or 'size' of a sound
• lay term: volume level
• intensity measured in dB
• logarithmic scale
• intensity level (IL)
• sound pressure level (SPL)
• measured with a sound level meter
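A minimal sketch of the logarithmic dB scale. The reference values (20 micropascals for SPL, 10^-12 watts per square meter for IL) are the usual conventions and are not stated in this guide, so treat them as assumptions here.

```python
import math

P_REF_PA = 20e-6     # assumed SPL reference pressure, in pascals
I_REF_W_M2 = 1e-12   # assumed IL reference intensity, in watts per square meter

def spl_db(pressure_pa):
    """Sound pressure level: 20 times the log of the pressure ratio."""
    return 20 * math.log10(pressure_pa / P_REF_PA)

def il_db(intensity_w_m2):
    """Intensity level: 10 times the log of the intensity ratio."""
    return 10 * math.log10(intensity_w_m2 / I_REF_W_M2)

# doubling the pressure adds about 6 dB; doubling the intensity adds about 3 dB
print(spl_db(2 * P_REF_PA), il_db(2 * I_REF_W_M2))
```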

linear or logarithmic?

loudness

• a perceptual characteristic
• judged by a listener
• neither correct nor incorrect
• cannot be measured by equipment
• psychophysical scale links loudness to intensity

loudness vs. intensity

effect of frequency on loudness

• hearing more sensitive at some frequencies
• lowest thresholds from 1000 Hz to 5000 Hz
• thresholds much higher
- for lowest frequency sounds
- for highest frequency sounds
• audiometers use HL, not SPL

equal loudness contours

how are dB, loudness linked?

• individual responses vary
• influenced by
- study methodology
- instructions to listeners
• general rule: 6-10 dB increase perceived as twice as loud
• individual perception cannot be 'wrong'

duration and loudness

• below 500 ms, duration affects perception
• especially for durations 15 to 150 ms
• to study this
- vary the tone duration
- vary the intensity
- record the responses
• longer = more audible
• ear integrates the energy

measuring amplitude

• mic signal is always changing
• how to measure its size?

root mean square (RMS)

• squaring makes all values positive
• get the average of squared values
• get the square root of the average
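The three RMS steps above translate directly into code. This is a small sketch; the sample values are invented for illustration.

```python
import math

def rms(samples):
    """Root mean square: square each value, average the squares, take the square root."""
    squared = [s * s for s in samples]        # squaring makes all values positive
    mean_square = sum(squared) / len(squared)  # average of the squared values
    return math.sqrt(mean_square)              # square root of the average

# one full cycle of a sine wave with peak amplitude 1 (made-up test signal)
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
print(rms(sine))   # close to 0.707, i.e. 1 divided by the square root of 2
```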

acoustic power

• measured in watts
• describes how much energy is radiated

intensity

• measured in watts per unit area
• radiates outward from the source in a sphere
• as the radius gets bigger, the energy is spread over a much larger sphere
• the area of a sphere: 4πr²

inverse square law

• intensity diminishes in proportion to the square of the distance from the source
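A short sketch of the inverse square law, assuming an ideal point source radiating equally in all directions with no reflections. The function names are illustrative.

```python
import math

def intensity_w_m2(power_w, radius_m):
    """Power spread evenly over the surface of a sphere of the given radius."""
    return power_w / (4 * math.pi * radius_m ** 2)

def level_change_db(r_start_m, r_end_m):
    """Change in intensity level (dB) when moving from r_start_m to r_end_m."""
    return 10 * math.log10((r_start_m / r_end_m) ** 2)

# doubling the distance leaves one quarter of the intensity, a drop of about 6 dB
print(intensity_w_m2(1.0, 2.0) / intensity_w_m2(1.0, 1.0), level_change_db(1.0, 2.0))
```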

regulating speech effort

• subglottal pressure drives speech
• more driving pressure for loud speech
• larger vocal fold excursions
• more forceful vocal fold collisions
• larger articulator movements
• higher oral pressure
• consonant burst release stronger

measuring vocal range

• reduced loudness in some disorders
• range of loudness changes with pitch
• Voice Range Profile (VRP)
• measures dB range across F0 range
• more popular in Europe
• not as standardized as audiograms

VRP data collection

• 10 points between min and max F0
- linear scale - Hz increments
- log scale - in semitones
• min and max intensity at each pitch
• plot data on a chart
- can calculate area
- also examine shape
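The difference between linear (Hz) and log (semitone) spacing of the test pitches can be sketched as follows. It assumes the 10 points include the minimum and maximum F0 themselves; the function names and the 100 to 800 Hz range are made up for illustration.

```python
def linear_targets(f_min_hz, f_max_hz, n_points=10):
    """Target pitches separated by equal Hz increments."""
    step = (f_max_hz - f_min_hz) / (n_points - 1)
    return [f_min_hz + i * step for i in range(n_points)]

def log_targets(f_min_hz, f_max_hz, n_points=10):
    """Target pitches separated by equal frequency ratios (equal semitone steps)."""
    ratio = (f_max_hz / f_min_hz) ** (1 / (n_points - 1))
    return [f_min_hz * ratio ** i for i in range(n_points)]

print([round(f) for f in linear_targets(100, 800)])
print([round(f) for f in log_targets(100, 800)])
```

Log spacing places more of the test points in the lower part of the range, which is closer to how pitch steps are heard.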

voice range profile

VRP in practice

• can be somewhat time-consuming
• useful for voice patients
• not for a typical SLP evaluation
• practice effects with repetition
• motivation plays a role
• max effort can vary with instructions
• avoid risk of vocal damage
• commercial product available

what is digital sound?

• music on mp3 players vs tapes or vinyl records:
- "it's different"
- "it's clearer"
- "it's more convenient"
- "it's better"
• can your ears hear digits?
• the music is stored in numeric form
• it is played back in analog form

what does 'analog' mean?

• the phenomenon of interest is shown by an analogy
• temperature is represented by the mercury in a thermometer
- more height = higher temperature
- less height = lower temperature
• the signal is analogous to the phenomenon
• microphone voltage represen

analog signals

• analog signals are continuous
- in time
- in amplitude
• continuous signals can be examined in any level of detail
- a line drawn on a page
- a microphone signal in a wire
• there are no gaps
- infinite number of points in time
- infinite number of ampl

common analog devices

• clocks, watches with hands
• mercury thermometer
• tape measures
• cassette recorder
• dimmer switch on a light

digital signals

• digital recordings are discrete
- in time
- in amplitude
• numbers represent snapshots over time
• numbers have finite precision
- a limited number of decimal places
• can be fully represented by a table of numbers
• there are 'unknowns' between the poi

common digital devices

• digital clock
• levels of detail...
- am or pm?
- hours?
- minutes?
- seconds?
- 1/10ths or 1/100ths?
• computer
• compact disc
• mp3 player

how does a sound become digital?

• analog to digital (ADC) conversion
• digital to analog (DAC) conversion
• computer sound cards do this
• why?
- computers only deal with binary data
- binary data: ones or zeroes
- microphones/speakers are analog devices

how can numbers represent signals?

• contrast black and white
• 0=black (no light)
• 1=white (all light)
• what about gray?
• 0.5=medium gray
• each shade has its own number
• 1000 shades of gray require 1000 values

how many shades of gray?

• 2² = 4 (2 bit resolution)
• 2³ = 8 (3 bit resolution)
• 2¹⁶ = 65,536 (16 bit resolution)
• 2³² = 4,294,967,296 (32 bit resolution)
• number of shades of gray represents level of amplitude precision
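The bit-depth figures above follow from raising 2 to the number of bits. A quick check (the function name is illustrative):

```python
def amplitude_levels(bits):
    """Number of distinct values (shades of gray, amplitude steps) that `bits` can encode."""
    return 2 ** bits

for bits in (2, 3, 16, 32):
    print(bits, "bit resolution:", amplitude_levels(bits), "levels")
```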

quantization

• a number reflects signal amplitude
• one number per 'snapshot'
• more decimal places... more detail
• simpler number... cruder record

quantization

using the available range

• weak input signal uses little of the available resolution
• boosting it later will not help
• noise will be boosted with it
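A sketch of why a weak input wastes resolution, assuming a simple uniform quantizer over the range -1 to +1 and an 8 bit recording. Both are assumptions made for illustration, as are the function names.

```python
import math

def quantize_index(x, bits):
    """Map x in the range -1..+1 to the nearest of 2**bits evenly spaced levels."""
    n_levels = 2 ** bits
    return round((x + 1.0) / 2.0 * (n_levels - 1))

def levels_used(peak_amplitude, bits, n_samples=1000):
    """Count how many distinct levels one cycle of a sine wave actually occupies."""
    samples = [peak_amplitude * math.sin(2 * math.pi * k / n_samples)
               for k in range(n_samples)]
    return len({quantize_index(s, bits) for s in samples})

strong = levels_used(1.0, 8)    # signal fills the available range
weak = levels_used(0.01, 8)     # signal uses 1% of the available range
print(strong, weak)
```

Boosting the weak recording afterwards only multiplies the same handful of stored values, so the missing detail does not come back.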

strong signal

weak signal

boosted weak signal

sampling rate - snapshots

• numbers represent amplitude values
• how often do you take a sample?
• the more samples the better you can represent the original signal
• sample rate specified in Hz
• how many would be enough?
- 1000 per second?
- 20,000 per second?

selecting a sampling rate

• higher sampling rate gives better fidelity
• higher sampling rate requires bigger files
- more memory usage
- more disk space for storage
- more processing time for computation
• decision: how much is adequate?
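The file size cost of a higher sampling rate can be estimated with simple arithmetic. This sketch assumes uncompressed audio (no mp3-style compression), and the function name and default values are illustrative.

```python
def uncompressed_bytes(sample_rate_hz, seconds, bits_per_sample=16, channels=1):
    """Storage needed for uncompressed audio: samples x bytes per sample x channels."""
    return sample_rate_hz * seconds * (bits_per_sample // 8) * channels

# one minute of mono speech at two sampling rates
print(uncompressed_bytes(10_000, 60))   # 1,200,000 bytes
print(uncompressed_bytes(44_100, 60))   # 5,292,000 bytes
```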

the Nyquist frequency

• the 'Nyquist' is half the sample rate
• the highest frequency you can reproduce
• sample at twice the highest frequency in the signal, or faster
- data up to 100 Hz - sample at 200 Hz
- data to 5 kHz - sample at 10 kHz
- compact discs/mp3 - 44.1 kHz

aliasing

• sampling too slowly will inaccurately record the original signal
• you'll miss what happens between samples
• high original frequencies will be improperly recorded as lower frequencies
• one is the 'alias' of the other
• filtering before digitizing prevents aliasing
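A small sketch of aliasing, assuming an ideal sampler with no anti-aliasing filter. A tone above the Nyquist frequency "folds" back to a lower frequency, and the two produce identical sample values. The function name is made up.

```python
import math

def alias_hz(tone_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling at sample_rate_hz."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

fs = 10_000                       # Nyquist frequency is 5000 Hz
print(alias_hz(3000, fs))         # below Nyquist: recorded correctly as 3000
print(alias_hz(7000, fs))         # above Nyquist: recorded as 3000

# the sampled values of the 7000 Hz tone match those of a 3000 Hz tone
high = [math.cos(2 * math.pi * 7000 * n / fs) for n in range(20)]
low = [math.cos(2 * math.pi * 3000 * n / fs) for n in range(20)]
```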

aliasing image

anti-aliasing

• filtering before digitizing prevents aliasing
• set filter to Nyquist frequency
• frequencies above this deleted
• they cannot contaminate recording
• most modern recording systems do the filtering for you automatically

Acoustic Analysis

get good data!
• quiet recording environment - reduce reverberation
• quality microphone
• good signal strength
- saturation, not clipping
• proper sample rate
- err on the side of detail

sound and movement

• all sounds originate with movement
- vibrating string of an instrument
- oscillation of the vocal folds
- turbulence in air molecules leaving a tire
• movement characteristics determine the nature of the sound
• speech production has many degrees of fre

acoustics and speech

• acoustic analysis is noninvasive
• we draw inferences about movements from sound
- disordered voice... disordered vocal fold movement - change in source
- distorted articulation... abnormal movement of the articulators - change in filter

limitations

• acoustic patterns reflect vocal tract movements with some ambiguity
• motor equivalence: the same sound can be produced several ways
• acoustics cannot reveal all details of movement

filtering

• what does a filter do?
• speech sounds contain many frequencies
• we can be selective

filter types

• high pass
• low pass
• band pass
• band reject

high pass

allows high frequencies through - holds back lower frequencies

low pass

allows low frequencies through - holds back higher frequencies

band pass

allows a band of frequencies through - holds back both higher and lower frequencies

band reject

holds back a band of frequencies - allows both higher and lower frequencies through
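The four filter types can be summarized as simple yes/no rules. This sketch assumes ideal "brick wall" filters with a perfectly sharp cutoff; real filters roll off gradually. The function names and the example frequencies are illustrative.

```python
def high_pass(freq_hz, cutoff_hz):
    """True if the frequency gets through a high pass filter."""
    return freq_hz >= cutoff_hz

def low_pass(freq_hz, cutoff_hz):
    """True if the frequency gets through a low pass filter."""
    return freq_hz <= cutoff_hz

def band_pass(freq_hz, low_hz, high_hz):
    """True if the frequency lies inside the pass band."""
    return low_hz <= freq_hz <= high_hz

def band_reject(freq_hz, low_hz, high_hz):
    """True if the frequency lies outside the rejected band."""
    return not band_pass(freq_hz, low_hz, high_hz)

components = [100, 500, 1000, 4000]   # made-up frequency components, in Hz
print([f for f in components if high_pass(f, 1000)])
print([f for f in components if low_pass(f, 500)])
print([f for f in components if band_pass(f, 400, 1200)])
print([f for f in components if band_reject(f, 400, 1200)])
```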

looking for ingredients

• a prism splits white light into colors
• acoustic analysis splits up sounds
• complex sound... many ingredients
• alter proportions... alter quality

Fourier theorem

• all periodic sounds are made of a combination of sine waves
- amplitudes vary
- phase angles vary
- frequencies vary

changing domains

• Fourier transform
• creates a spectrum from the time domain waveform
- analyze a cake to learn its ingredients
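A minimal sketch of moving from the time domain to the frequency domain, using a naive discrete Fourier transform written out by hand. Analysis software normally uses the much faster FFT; the test signal (a 10 Hz fundamental plus two harmonics, sampled at 200 Hz for one second) and the function name are invented for illustration.

```python
import cmath
import math

def line_spectrum(samples):
    """Naive discrete Fourier transform: amplitude of each bin below Nyquist."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        scale = 1 if k == 0 else 2   # non-DC energy is split between +k and -k
        amplitudes.append(scale * abs(total) / n)
    return amplitudes

fs = 200   # one second at 200 Hz, so bin k corresponds to k Hz
wave = [1.0 * math.sin(2 * math.pi * 10 * i / fs)
        + 0.5 * math.sin(2 * math.pi * 20 * i / fs)
        + 0.25 * math.sin(2 * math.pi * 30 * i / fs)
        for i in range(fs)]
spectrum = line_spectrum(wave)
peaks = [k for k, a in enumerate(spectrum) if a > 0.1]
print(peaks)   # the "ingredients": 10, 20 and 30 Hz
```

The waveform looks like one complicated wiggle, but the spectrum recovers the three sine waves and their amplitudes.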

time domain data

• a waveform represents sound directly
• (air pressure) changes over time

frequency domain data

• a line spectrum shows the frequency components of a periodic sound

periodic signal spectrum

• frequency domain description of the signal
• has harmonics that are multiples of the fundamental
• has nothing between the lines
- the lines represent the harmonic frequencies

time vs. frequency domain

spectra - aperiodicity

• sine wave: single line on a spectrum
• complex periodic signals: multiple lines
• what would noise look like?
- all frequencies
- equal amplitude
- random phase

noise spectrum connect peaks

real voice signals

• the voice source is not truly periodic
• it is 'nearly periodic'
• spectrum does not have pure lines
• spectrum has peaks
• some spread of energy around the fundamental and harmonics

/a/ vowel spectrum

FFT spectrum

• clearly shows harmonic energy
• each peak is a harmonic
• less clear at showing formants
• more revealing of source

LPC spectrum

• shows spectral envelope
• good at revealing formants
• does not show harmonics
• more revealing of filter

many spectra over time

• a line spectrum is a snapshot in time
• a spectrogram shows speech over time
• spectra are lined up sequentially
• single slices are put together

speech spectrogram

• x-axis is time
• y-axis is frequency
• darkness indicates intensity

spectral slice

sound spectrograms

• display reflects the contribution of many structures and movements
• much detail is present for even simple utterances
• need to be selective, specific in interpreting the display

spectrogram parameters

• y-axis (frequency) limited to Nyquist
- actually, slightly below Nyquist
• sample at a high enough rate
- what details do you want to see?
• display can be adjusted downward
• no way to adjust upward beyond Nyquist frequency

analysis bandwidth

• 'wide band' spectrogram gives clear temporal detail
- frequency resolution is poor
• 'narrow band' spectrogram gives clear frequency detail
- time resolution is poor

why the trade-off?

• analysis evaluates strings of numbers
• more samples needed to find subtle changes in patterns (frequency content)
• quick changes show up over just a few samples
• not enough to give frequency detail
• time, frequency are inversely related
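The inverse relation between time and frequency detail can be shown with two numbers per analysis window. This sketch treats frequency resolution as the spacing between analysis bins (sample rate divided by window length), which is an approximation: the effective bandwidth also depends on the window shape. The function name and the 10 kHz sample rate are illustrative.

```python
def window_tradeoff(sample_rate_hz, n_samples):
    """Return (window duration in ms, approximate frequency resolution in Hz)."""
    duration_ms = 1000 * n_samples / sample_rate_hz
    resolution_hz = sample_rate_hz / n_samples
    return duration_ms, resolution_hz

print(window_tradeoff(10_000, 50))    # short window: 5 ms, 200 Hz (wide band)
print(window_tradeoff(10_000, 500))   # long window: 50 ms, 20 Hz (narrow band)
```

A window short enough to catch individual glottal pulses is too short to separate neighboring harmonics, and a window long enough to separate harmonics smears events in time.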

300 Hz bandwidth

• vertical striations are glottal pulses
• accurate time measures possible

not clones

• we're all different on the outside
• we're just as different inside the vocal tract
• acoustics affected by structural differences
• differences in function also affect sounds

the voice

allows efficient communication
can be an artistic tool
conveys subtle shades of emotion

physiologic detail

voice problems come from disordered vocal fold activity
therapy aims to improve vocal fold movements
detailed information about the activity of the larynx helps assessment and treatment

laryngeal mirror exam

rigid scope exam

flexible scope exam

it's all a blur...

vibrating vocal folds move quickly
movements are blurred to the eye
technology can help us 'slow down' movements
high speed filming
stroboscopy

high speed filming

standard DVD/VHS video uses 30 frames/sec
ultra-high speed filming uses 4000 - 6000 frames/second
not clinically practical
expense
complexity
was used in pioneering work on physiology

stroboscopy

capitalizes on an optical illusion
light flashes illuminate a target
each illumination is a snapshot
paste the snapshots together in succession
voila! a movie
video has 30 frames/sec
vocal folds may oscillate 200 times/sec
timing of the flashes is crucial

stroboscopy

timing is everything

'frozen' image
flash occurs at the same point in each cycle
slow motion image
flashes slightly delayed in successive cycles
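The 'frozen' and slow motion cases can be sketched by tracking where in the vibratory cycle each flash lands. This assumes a perfectly steady F0 and a flash rate close to F0; the function names and the 200 Hz example are made up, and real stroboscopy systems may time their flashes differently.

```python
def flash_phases(f0_hz, flash_rate_hz, n_flashes):
    """Fraction of the vocal fold cycle (0 to 1) at which each flash occurs."""
    return [(f0_hz * k / flash_rate_hz) % 1.0 for k in range(n_flashes)]

def apparent_rate_hz(f0_hz, flash_rate_hz):
    """Rate of the illusory slow motion cycle when the flash rate is near F0."""
    return abs(f0_hz - flash_rate_hz)

print(flash_phases(200, 200, 5))       # same point every cycle: 'frozen' image
print(flash_phases(200, 199, 5)[:3])   # each flash slightly later in the cycle
print(apparent_rate_hz(200, 199))      # one apparent cycle per second
```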

strobe limitations

'movie' is a composite of stills from different cycles
strobe movies are made of non-adjacent samples
'true' vocal fold motion is not seen
F0 must be steady
flashes cannot synchronize if F0 erratic
severe dysphonia precludes stroboscopy

electroglottography (EGG)

two electrodes placed on the neck
over thyroid laminae
current passes between the electrodes
more current when folds are together
less current when folds are separated
signal goes up and down for each cycle

EGG signal pathway

what does the signal represent?

signal is not the same as sound pressure
also not the same as glottal width
signal does not represent opening/closing
signal represents vocal fold contact area
absolute signal amplitude is meaningless
many units have automatic gain control
changes in voca

microphone and EGG signals

EGG waveform: pressed voice

isn't that dangerous?

E.G.G. uses tiny currents
current is at a very high frequency
current is totally imperceptible
current is about 10 mA
voltage is about 0.5 V

E.G.G. advantages

noninvasive
safe
easy to use

nobody's perfect

human phonation is not perfectly periodic
it is a 'nearly periodic' signal
glottal cycle periods are not identical
harmonic energy: periodic parts
noise: all the rest

vocal perturbation

random cycle-to-cycle duration differences are called jitter
random cycle-to-cycle amplitude differences are called shimmer
jitter and shimmer co-occur
won't find one without the other

what is perturbation like?

hoarse voices have higher perturbation
jitter = frequency perturbation
shimmer = amplitude perturbation
normal voices have minimal perturbation
absence of perturbation sounds artificial

perturbation in the clinic

the number does not give a diagnosis
different pathologies do not have distinct "acoustic fingerprints"
your ears decide if there's a problem
jitter and shimmer values may quantify severity
numbers may track progress over time

shimmer=amplitude instability

no shimmer: height is constant
shimmer present: height fluctuates

measuring perturbation

in the olden days...
get a printout and a ruler
measure the average wave height
calculate the differences of each pulse
also measure period and variability
one second of voice = hours of work
but today...
computers do the math in milliseconds

what causes perturbation?

neurologic factors
muscle contraction is inherently unsmooth
individual motor neurons 'take turns' firing
overall contraction is the net effect of many minuscule twitches
vocal fold tension is not perfectly static

causes of perturbation

air flow can be intermittently turbulent
pathological vocal fold tissue changes
left/right asymmetry
mass lesions
- nodules, polyps
vocal fold swelling
tension abnormalities

perturbation for your client

use sustained vowels
avoid onsets and offsets
comfortable intensity level
record live to the computer
tape recorders introduce perturbation
compare like with like
same vowel
same conditions

audibly shaky voices

perturbation is very rapid
sounds rough and hoarse
what about a tremulous voice?
how about vocal vibrato?

rhythmic modulations

tremor = rhythmic change in F0 and amplitude
much slower than random cycle-to-cycle perturbation
pattern of gradual increase/decrease in F0 is spread across many cycles
FM = frequency modulation
AM = amplitude modulation

theoretical modulation

vocal tremor

frequency and intensity changes are seldom independent
most speakers' F0 will go up with intensity
amplitude (AM) and frequency (FM) modulations typically co-occur

physiologic tremor data

quantifying tremor

how rapid is it?
5-7 Hz modulation rate is typical
how extreme is it?
% fluctuation around the mean
how steady is it?
periodicity can be expressed as %
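One way to put a number on "how extreme is it?", assuming % fluctuation means half the peak-to-peak F0 swing divided by the mean F0. That definition is an assumption for this sketch, not something this guide specifies, and the F0 track is simulated.

```python
import math

def percent_fluctuation(f0_track_hz):
    """Half the peak-to-peak F0 swing, expressed as a percentage of the mean F0."""
    mean_f0 = sum(f0_track_hz) / len(f0_track_hz)
    half_range = (max(f0_track_hz) - min(f0_track_hz)) / 2
    return 100 * half_range / mean_f0

# simulated tremor: 200 Hz voice swinging by 10 Hz at a 5 Hz modulation rate,
# with F0 measured 100 times per second for one second
track = [200 + 10 * math.sin(2 * math.pi * 5 * k / 100) for k in range(100)]
print(round(percent_fluctuation(track), 2))   # about 5% around the mean
```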