10.23641/asha.8847833.v1
Jennifer M. Vojtech
Jennifer M.
Vojtech
Jacob P. Noordzij Jr.
Jacob P.
Noordzij Jr.
Gabriel J. Cler
Gabriel J.
Cler
Cara E. Stepp
Cara E.
Stepp
Fundamental frequency and rate on synthetic speech (Vojtech et al., 2019)
ASHA journals
2019
speech
fundamental frequency
f0
modulating
modulation
effects
fundamental
frequency
rate
intelligibility
intelligible
communication
efficiency
efficient
perception
perceived
naturalness
natural
synthetic
prosody
prosodic
synthesizer
variation
rated
synthesis
functional
Linguistic Processes (incl. Speech Production and Comprehension)
Acoustics and Acoustical Devices; Waves
2019-07-15 21:28:49
Media
https://asha.figshare.com/articles/media/Fundamental_frequency_and_rate_on_synthetic_speech_Vojtech_et_al_2019_/8847833
<div><b>Purpose: </b>This study investigated how modulating fundamental frequency (f0) and speech rate differentially impact the naturalness, intelligibility, and communication efficiency of synthetic speech.</div><div><b>Method:</b> Sixteen sentences of varying prosodic content were developed via a speech synthesizer. The f0 contour and speech rate of these sentences were altered to produce 4 stimulus sets: (a) normal rate with a fixed f0 level, (b) slow rate with a fixed f0 level, (c) normal rate with prosodically natural f0 variation, and (d) normal rate with prosodically unnatural f0 variation. Sixteen listeners provided orthographic transcriptions and judgments of naturalness for these stimuli.</div><div><b>Results:</b> Sentences with f0 variation were rated as more natural than those with a fixed f0 level. Conversely, sentences with a fixed f0 level demonstrated higher intelligibility than those with f0 variation. Speech rate did not affect the intelligibility of stimuli with a fixed f0 level. Communication efficiency was highest for sentences produced at a normal rate and a fixed f0 level.</div><div><b>Conclusions:</b> Sentence-level f0 variation increased naturalness ratings of synthesized speech, whether the variation was prosodically natural or not. However, these f0 variations reduced intelligibility. There is evidence of a trade-off between naturalness and intelligibility of synthesized speech, which may impact future speech synthesis designs.</div><div><br></div><div><b>Supplemental Material S1.</b> Normal-Rate Fixed f0</div><div><br></div><div><b>Supplemental Material S2.</b> Slow-Rate Fixed f0</div><div><br></div><div><b>Supplemental Material S3.</b> Normal-Rate Prosodically Natural f0 Variation</div><div><br></div><div><b>Supplemental Material S4.</b> Normal-Rate Prosodically Unnatural f0 Variation</div><div><br></div><div>Vojtech, J. M., Noordzij, J. P., Jr., Cler, G. J., & Stepp, C. E. (2019). The effects of modulating fundamental frequency and speech rate on the intelligibility, communication efficiency, and perceived naturalness of synthetic speech. <i>American Journal of Speech-Language Pathology, 28,</i> 875–886. https://doi.org/10.1044/2019_AJSLP-MSC18-18-0052</div><div><br></div><div><b>Publisher Note: </b>This article is part of the Special Issue: Selected Papers From the 2018 Conference on Motor Speech: Clinical Science and Innovations.</div><div><br></div>