Materials and metrics for Swedish speech tests (Witte & Köbler, 2019)

journal contribution
posted on 02.07.2019 by Erik Witte, Susanne Köbler
Purpose: Because factors that influence human word perception are important in the construction of speech perception tests used within the speech and hearing sciences, this study had two purposes. The first was to develop algorithms for calculating different types of word metrics that influence the speed and accuracy of word perception. The second was to create a database containing those word metrics for a large set of Swedish words.
Method: Starting from a revised version of a large Swedish phonetic dictionary, data and algorithms were developed to calculate frequency, word length, semantic, neighborhood, phonotactic, and orthographic transparency metrics for each word in the dictionary. Some of the word metric algorithms were Swedish-language reimplementations of previously published algorithms; others were developed in this study.
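As an illustration of what a phonotactic metric such as positional segment probability (PSP, Supplemental Material S2) or position-specific biphone probability (PSBP, S3) involves, the sketch below computes frequency-weighted positional probabilities over a toy lexicon. It is a minimal sketch under stated assumptions, not the AFC-list's actual algorithm. The function name, the tuple-of-phones representation, and the default `log10(freq + 1)` weighting are all hypothetical choices. The standardized variants (S-PSP, S-PSBP) are not shown.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Tuple

Phones = Tuple[str, ...]


def positional_probabilities(
    lexicon: Dict[Phones, int],
    # Hypothetical weighting: log-compressed corpus frequency (assumption).
    weight: Callable[[int], float] = lambda f: math.log10(f + 1),
):
    """Return (psp, psbp) for a lexicon mapping transcriptions to frequencies.

    psp[(i, seg)]     = weighted share of words with `seg` at position i,
                        among all words that have a position i.
    psbp[(i, (a, b))] = the same, for the biphone a+b starting at position i.
    """
    seg_num = defaultdict(float)  # (position, segment) -> summed weight
    seg_den = defaultdict(float)  # position -> summed weight of all segments
    bi_num = defaultdict(float)   # (position, biphone) -> summed weight
    bi_den = defaultdict(float)   # position -> summed weight of all biphones

    for phones, freq in lexicon.items():
        w = weight(freq)
        for i, seg in enumerate(phones):
            seg_num[(i, seg)] += w
            seg_den[i] += w
        for i in range(len(phones) - 1):
            bi_num[(i, (phones[i], phones[i + 1]))] += w
            bi_den[i] += w

    # Skip positions whose total weight is zero to avoid division by zero.
    psp = {k: v / seg_den[k[0]] for k, v in seg_num.items() if seg_den[k[0]] > 0}
    psbp = {k: v / bi_den[k[0]] for k, v in bi_num.items() if bi_den[k[0]] > 0}
    return psp, psbp


# Toy lexicon with made-up frequencies (weights become 1, 1 and 2).
lexicon = {("k", "a", "t"): 9, ("k", "o", "t"): 9, ("h", "a", "t"): 99}
psp, psbp = positional_probabilities(lexicon)
print(psp[(1, "a")])          # 0.75
print(psbp[(0, ("h", "a"))])  # 0.5
```

Passing `weight=lambda f: 1.0` gives unweighted type counts instead. Whether and how the published metrics weight by frequency is described in the article, not here.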
Results: The results of this study have been gathered in a Swedish word metric database called the AFC-list. The AFC-list consists of 816,404 phonetically transcribed Swedish words, each supplied with its calculated word metrics. The full AFC-list is publicly available under the Creative Commons Attribution 4.0 International license.
Conclusion: The results of this study constitute an extensive linguistic resource for selecting test items for new, well-controlled speech perception tests in Swedish.

Supplemental Material S1. List of all phonetic characters used in the AFC-list, with their corresponding Unicode hexadecimal codes, underlying phonemes, and example transcriptions.

Supplemental Material S2. Swedish positional segment probability (PSP) data as well as standardized positional segment probability (S-PSP) data.

Supplemental Material S3. Swedish position-specific biphone probability (PSBP) data as well as standardized position-specific biphone probability (S-PSBP) data.

Supplemental Material S4. AFC-list syllabification clusters (ASC).

Supplemental Material S5. Swedish stress- and syllable-structure-based normalized phonotactic probability (SSPP) data.

Supplemental Material S6. Swedish pronunciation-to-grapheme rules used in the current study.

Supplemental Material S7. Swedish grapheme-to-pronunciation orthographic transparency (G2P-OT) data.

Supplemental Material S8. Swedish grapheme-initial letter-to-pronunciation orthographic transparency (GIL2P-OT) data.

Supplemental Material S9. Swedish pronunciation-initial phone-to-grapheme orthographic transparency (PIP2G-OT) data.

Supplemental Material S10. A tab-delimited text file (UTF-8) containing the AFC-list.

Supplemental Material S11. A Microsoft Access database file containing the AFC-list.

Supplemental Material S12. Description of all columns used in the AFC-list.
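Because S10 is described as a tab-delimited UTF-8 text file, it can be read with Python's standard `csv` module. The sketch below assumes the first row is a header. It also assumes that fields are not quoted, so it sets `csv.QUOTE_NONE`, and it opens the file as `utf-8-sig` so that an optional byte-order mark is tolerated. The file path and the column names `Word` and `Transcription` are hypothetical placeholders; the real column names are given in Supplemental Material S12.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def iter_tab_delimited(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per row, keyed by the (assumed) header row."""
    # QUOTE_NONE: treat quote characters as ordinary data (assumption).
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    yield from reader


def iter_afc_list(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows from a local copy of the S10 file (path is hypothetical)."""
    # utf-8-sig decodes plain UTF-8 and also strips a leading BOM if present;
    # newline="" is what the csv module documentation recommends for open().
    with open(path, encoding="utf-8-sig", newline="") as f:
        yield from iter_tab_delimited(f)


# Tiny in-memory sample; column names are hypothetical (see S12).
sample = "Word\tTranscription\nhund\thʉnd\nkatt\tkatː\n"
rows = list(iter_tab_delimited(io.StringIO(sample)))
print(rows[0]["Word"])  # hund
```

The readers are generators, so the roughly 816,000 rows can be filtered one at a time rather than held in memory all at once.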

Witte, E., & Köbler, S. (2019). Linguistic materials and metrics for the creation of well-controlled Swedish speech perception tests. Journal of Speech, Language, and Hearing Research, 62, 2280–2294. https://doi.org/10.1044/2019_JSLHR-S-18-0454


This work was supported by Nyckelfonden (Grant OLL-597471) and the Research Committee (Grant OLL-551401) at Region Örebro County, Sweden.