Simon Todd

Current and Recent Research

This page is out of date!
Please see my lab website for recent publications.

Non-Māori-speaking New Zealanders have a Māori proto-lexicon

With Yoon Mi Oh, Clay Beckner, Jen Hay, Jeanette King, and Jeremy Needle; published 2020 in Scientific Reports

We investigate implicit vocabulary learning by adults who are exposed to a language in their ambient environment. Most New Zealanders do not speak Māori, yet are exposed to it throughout their lifetime. We show that this exposure leads to a large proto-lexicon: implicit knowledge of the existence of approximately 1500 morphemes, without any associated meaning. Non-speakers of Māori are able to generalize over this proto-lexicon to generate sophisticated intuitions about the well-formedness of Māori words, and these intuitions are indistinguishable from those of fluent Māori speakers.

Word Frequency Effects in Sound Change as a Consequence of Perceptual Asymmetries

With Janet Pierrehumbert and Jen Hay; published 2019 in Cognition

We present a computational exemplar-theoretic model of the perception/production loop, with which we explore how items of different lexical frequencies may respond to changes in discriminability in the phonological space during sound change. The model predicts that high-frequency words change faster than low-frequency words when the change decreases discriminability, more slowly than low-frequency words when the change increases discriminability, and at the same rate as low-frequency words when the change does not affect discriminability.

Diagnosing Change in a Sparse, Bursty Variable: Eh in Pākehā English

I investigate change in the degree of use of the discourse particle eh among Pākehā (white) New Zealanders. Diagnosing change in eh is made difficult by the fact that it is both sparse – not used at all by many speakers – and bursty – infrequent among most users, but excessive among a few. Thus, many speakers may fail to use eh in speech observed in a corpus not because they never use it, but simply because they do not use it often enough for it to appear. To address this difficulty, and to allow change-in-progress to be distinguished from age-grading, I introduce a new statistical method into the quantitative linguistics toolbox: zero-inflated negative binomial regression. I find quantitative evidence that eh has spread from indigenous Māori to white Pākehā over time, and qualitative evidence that this spread was facilitated by the decrease in social stigma attached to Māori.
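
To make the idea behind the method concrete, here is a minimal sketch of the zero-inflated negative binomial distribution itself. It is not the analysis code from this project. The names `pi` (probability that a speaker is a structural non-user), `mu` (mean count among potential users), and `r` (dispersion) are assumptions chosen for illustration. In a full regression, `pi` and `mu` would each depend on predictors such as speaker age; that part is omitted here.

```python
import math

def nb_pmf(k, mu, r):
    """Negative binomial probability of count k, with mean mu and dispersion r."""
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu))
             + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(k, pi, mu, r):
    """Zero-inflated negative binomial probability of count k.

    A zero can arise in two ways: the speaker never uses the form at all
    (probability pi), or the speaker is a potential user who happened to
    produce no tokens in the observed sample (negative binomial zero).
    """
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * nb_pmf(k, mu, r)

# Hypothetical parameter values, chosen only to illustrate the shape.
example_pi, example_mu, example_r = 0.6, 2.0, 0.3
probabilities = [zinb_pmf(k, example_pi, example_mu, example_r) for k in range(5)]
```

A small `r` gives the bursty pattern (most users produce few tokens, a few produce many), while `pi` captures sparsity (speakers who never use the form).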

Unsupervised Morphological Segmentation in a Language with Reduplication

With Annie Huang, Jeremy Needle, Jen Hay, and Jeanette King; presented 2022 at SIGMORPHON

We present an extension of the Morfessor family of unsupervised morphological segmentation algorithms that can recognize phonological reduplication. We show that the extension improves accuracy of morphological analyses in Māori, and we conduct a detailed error analysis that demonstrates the reasons for this improvement and the remaining limitations.
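
As a rough illustration of what recognizing reduplication involves, the sketch below checks whether a word begins with an immediately repeated substring. It is a plain string-matching toy, not the Morfessor extension described above. The function name `find_left_reduplication` and the `min_len` threshold are hypothetical. It also works on orthographic characters, so it ignores details such as the digraphs wh and ng and vowel length.

```python
def find_left_reduplication(word, min_len=2):
    """Return (copy, remainder) if word starts with a repeated substring.

    The longest repeated prefix is preferred. Returns None if no prefix of
    at least min_len characters is immediately repeated.
    """
    for size in range(len(word) // 2, min_len - 1, -1):
        if word[:size] == word[size:2 * size]:
            # The first `size` characters are treated as the copy,
            # and everything after them as the remainder.
            return word[:size], word[size:]
    return None

# Illustrative inputs: full copy, partial copy, and no copy.
examples = {w: find_left_reduplication(w) for w in ["pakipaki", "papaki", "whare"]}
```

A segmentation model could use a check of this kind to propose a reduplicant as a candidate morph. How the actual extension scores such candidates is not shown here.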

Quantitative Insights into Māori Word Structure

With Jeremy Needle, Jeanette King, and Jen Hay

We develop quantitative insights into Māori word structure, based on morphological segmentations provided by an unsupervised machine learning system and by fluent speakers of Māori. We analyse over 18,000 words from the Te Aka dictionary, and we find that fewer than 10% of them are unambiguously monomorphemic. Thus, the vast majority of words in Māori were potentially formed via morphological processes. We present a statistical breakdown of the morphological patterns in Māori, in which we quantify and rank the productivities of the morphological processes from the literature. By presenting this breakdown both over word types in the dictionary and over word tokens in corpora of running speech, we obtain probabilistic strategies for Māori word formation, as well as an indication of which strategies are likely to yield words that will be widely used by speakers.

Talker and Stereotype in the False Recall of Spoken Words

With Zion Mengesha and Meghan Sumner

We explore the implicit biases that link the use of African American Vernacular English (AAVE) to negative and violent stereotypes, and how these biases interact with linguistic memory. In a DRM task, we find that listeners more often falsely remember hearing words such as gun and gang from an AAVE speaker than from a (white) General American English speaker. We argue that meaning is constructed from both words and voice, and that the strength of a memory for a word is mediated by the word's consistency with stereotypes inferred from the corresponding voice.

Syntactic Embedding and Visibility of Morphophonological Structure

Best student paper at Phonetics and Phonology in Europe, June 2015

I present experiments that investigate gradience and variation in the propensity to suppress the English possessive morpheme /z/ with possessors ending in the plural morpheme /z/. I find that suppression becomes less necessary as the number of syntactic brackets between the two /z/s increases. I sketch formal implementations of this finding in representational and derivational morphophonological frameworks, and I argue that it implies that morphophonological processing must be guided by syntactic (or corresponding prosodic) structure in a stochastic fashion.
