A quantitative framework for historical linguistics
Historical linguistics is not commonly associated with corpus and quantitative approaches. Although examples of quantitative corpus studies are attested throughout the history of modern historical linguistics, they have not yet been adopted by the majority of scholars. Using the ‘chasm’ model of technology adoption introduced in marketing research (Moore 1998), I will report on a survey which analysed research articles from six volumes of historical linguistics journals and categorized them along two dimensions: qualitative vs. quantitative, and corpus-based vs. non-corpus-based. The results show that historical linguistics lags behind general linguistics in the adoption of quantitative corpus methods.
In this talk I will argue that with respect to quantitative corpus methods, historical linguistics is in the pre-chasm stage, meaning that widespread adoption of these techniques is still difficult. Instead, early innovators are crucial for spreading new ideas further, and for building a framework that will appeal to the majority of historical linguists. I will present the framework described in Jenset & McGillivray (2017), which integrates quantitative corpus approaches into new research practices for historical linguistics, illustrating the power of quantitative corpus studies for nuanced explorations of historical linguistic phenomena.
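The two-dimensional categorization described in the survey can be sketched as a simple contingency-table tally. The article records below are invented placeholders for illustration only, not the survey's actual data:

```python
# Tallying articles along two binary dimensions, as in the survey
# described above. The records here are invented examples.
from collections import Counter

# Each article is tagged (methodology, data source).
articles = [
    ("qualitative", "non-corpus"),
    ("qualitative", "corpus"),
    ("quantitative", "corpus"),
    ("qualitative", "non-corpus"),
    ("quantitative", "non-corpus"),
]

# A 2x2 contingency table of counts per category pair.
table = Counter(articles)
for method in ("qualitative", "quantitative"):
    for data in ("corpus", "non-corpus"):
        print(f"{method:12s} {data:10s} {table[(method, data)]}")
```

With real survey data, the proportion of articles in the quantitative/corpus cell per volume is what locates a field on the adoption curve.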
Language & perception: The tasty, smelly side of language
When word lists and grammar rules are replaced with cognitively viable building blocks, optimized for human learning.
In this talk I will present work done during the first year of the Leverhulme-funded Out of Our Minds project [https://outofourminds.shef.ac.uk].
The overarching aim of this project is to propose a new way of describing language data that yields a cognitively plausible description of speakers’ linguistic knowledge. Research has shown that traditional linguistic categories do not adequately capture language users’ intuitions about their native tongue (Dąbrowska 2008 for morphology, Frank 2013 for syntax, Divjak et al. 2015 for semantics). Yet these categories form the foundation for work on language across disciplines, from Language Teaching to Language Engineering (e.g., Natural Language Processing, Machine Learning, Artificial Intelligence). We set out to change the ways in which languages are described, modelled and taught by taking an interdisciplinary approach involving linguistics, psychology and engineering.
To achieve our goal, we implement the requirement for cognitive reality in linguistic analysis at the theoretical, methodological and descriptive levels. The cornerstone of our approach is computational models of learning that incorporate insights from Learning Theory and can mimic the way in which humans learn from exposure to language. The patterns we find are then constrained experimentally. By providing researchers across disciplines with linguistic abstractions that matter to the cognitive systems of speakers, this project paves the way for cognitively plausible models of language. This would, then, provide insight into the way in which languages are learned when word lists and grammar rules are replaced with a set of cognitively viable building blocks, optimized for human learning. I will illustrate the results we have achieved so far with case studies on morphology (allomorphy), syntax (construal) and semantics (Tense-Aspect-Mood concepts).
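The abstract does not name the specific learning models used, but one standard example of an error-driven model from Learning Theory is the Rescorla-Wagner rule, sketched here as an illustration (the cue and outcome names are invented):

```python
# Minimal Rescorla-Wagner learning sketch: association strengths between
# cues and an outcome are updated in proportion to prediction error.
# This is a generic textbook example, not the project's actual model.

def rw_update(weights, cues, outcome_present, alpha=0.1, lam=1.0):
    """Update cue->outcome association strengths for one learning event."""
    total = sum(weights.get(c, 0.0) for c in cues)   # summed prediction
    target = lam if outcome_present else 0.0
    delta = alpha * (target - total)                 # prediction error
    for c in cues:
        weights[c] = weights.get(c, 0.0) + delta
    return weights

# Toy events: a (hypothetical) cue "plural-s" repeatedly co-occurs with
# the outcome PLURAL; its association strength approaches the asymptote.
w = {}
for _ in range(50):
    rw_update(w, ["plural-s"], outcome_present=True)
print(round(w["plural-s"], 2))
```

The same update, run over realistic corpus-derived cue/outcome events, is what lets such models mimic learning from exposure.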
An arbitrary language system? A case for symbol interdependency
Using network science to understand the mental lexicon
Network science approaches are increasingly used in the study of human cognition. Depicting cognitive systems, such as semantic memory or the mental lexicon, as a cognitive network consisting of nodes and edges permits the application of a suite of computational and quantitative tools that allows the cognitive scientist to explicitly examine the structural properties of cognitive systems and the processes that occur in those systems. In this talk, I discuss how network science approaches can address questions related to the representation of cognitive systems and the cognitive and language-related processes that necessarily occur within those systems, with a specific focus on the structure of the mental lexicon, the part of long-term memory where phonological and orthographic representations are stored, and the processes related to lexical retrieval, production, and language acquisition. Using a network science framework, we will examine how process and structure interact to produce observable behavioural patterns in psycholinguistic studies, and how the structure of the mental lexicon changes over time as new lexical representations are acquired.
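A common way to build such a lexical network is to link words that are phonological neighbours (differing by one segment). The toy word list below is invented for illustration; a node's degree is then its neighbourhood density, one of the structural properties such analyses examine:

```python
# Toy phonological-neighbour network: nodes are words, edges link pairs
# at edit distance 1. Illustrative word list, not a real lexicon.
from itertools import combinations

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

words = ["cat", "hat", "bat", "cot", "cut", "dog", "dot"]
edges = {(a, b) for a, b in combinations(words, 2)
         if edit_distance(a, b) == 1}

# Degree = number of phonological neighbours of each word.
degree = {w: sum(w in e for e in edges) for w in words}
print(degree["cat"])
```

On a full lexicon, the same construction yields the sparse, clustered networks whose structure is linked to retrieval and acquisition in the studies discussed in the talk.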
How learning and interaction lead to language simplification
Languages spoken in larger populations are relatively simple. This has been attributed to the influence of adult learners: adult learners (i.e. non-native speakers) who don’t achieve native-like competence tend to end up speaking a simplified version of the target language, and since languages with more speakers also tend to have a higher proportion of non-native speakers, the relative simplicity of those languages may be due to the accumulation of the simplifications made by non-native speakers. I will present a series of studies testing how and when a language can be changed by its non-native speakers, exploring the role of learning and communicative interaction in this process. We run artificial language experiments in the lab to see how languages are changed during their learning and use, and then use computational models to explore how small-scale lab results scale to larger populations. Our initial results suggest that accommodation by native speakers to non-native speakers plays a crucial role in explaining the reported relationship between population size and linguistic complexity; the ways in which native speakers adapt their own language use to the simplified usage of their non-native interlocutors allow a relatively small proportion of non-native speakers to have a disproportionately large influence on a language.
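The accommodation mechanism can be illustrated with a toy agent-based model (a sketch of the general idea under invented parameters, not the authors' actual simulations): each agent has a rate of using a "complex" variant, listeners shift toward speakers, and natives shift only in proportion to an accommodation rate.

```python
# Toy population model of contact-driven simplification. All parameter
# values are invented for illustration.
import random

def simulate(n_agents=100, prop_nonnative=0.3, accommodation=0.5,
             rounds=4000, seed=0):
    rng = random.Random(seed)
    n_native = int(n_agents * (1 - prop_nonnative))
    # Each agent's "complexity": its rate of using the complex variant.
    pop = [1.0] * n_native + [0.2] * (n_agents - n_native)
    is_native = [True] * n_native + [False] * (n_agents - n_native)
    for _ in range(rounds):
        listener, speaker = rng.sample(range(n_agents), 2)
        # Listeners shift toward the speaker's usage; native listeners
        # shift only in proportion to their accommodation rate.
        rate = accommodation if is_native[listener] else 1.0
        pop[listener] += 0.05 * rate * (pop[speaker] - pop[listener])
    return sum(pop) / n_agents

# Comparing mean complexity with and without native accommodation:
print(simulate(accommodation=1.0), simulate(accommodation=0.0))
```

With accommodation switched off, non-natives simply converge toward native usage; with accommodation on, the minority's simplified usage pulls the whole population's mean down, matching the mechanism described above.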
Bart de Boer
Computer Simulation of Language Evolution
This talk will illustrate how one particular quantitative approach, namely computer simulation, can help to answer questions in the field of language evolution. Language evolution is concerned with understanding the biological (cognitive and anatomical) basis of language, as well as how languages change over time from an evolutionary perspective. Language evolution involves the interaction of many complex systems: language itself, learning mechanisms in the brain, biological and cultural evolution. Moreover, it is about historical processes in which randomness has played a role, and about which a lot of information has been lost. Therefore, it turns out that it is often hard to understand the implications of theories of language evolution, and to understand how empirical data is relevant to these theories. This talk will illustrate with a number of examples (drawn mostly from the presenter's area of expertise: the study of the evolution of speech) how computer simulation can help to understand language evolution, and how it can help to interpret empirical work as well as to suggest new experiments. The examples will include models based on artificial intelligence techniques, physical simulation models and theoretical (numerical) models. The advantages and disadvantages of the different models, as well as when to apply each, will be discussed.
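Agent-based simulations of speech typically let agents adapt their productions to what they hear. The sketch below is a heavily simplified imitation model in that spirit (invented parameters, not the presenter's actual models): agents hold a vowel prototype in a 2D formant-like space and shift toward noisy tokens they hear, so the population converges on a shared vowel.

```python
# Toy imitation model of speech convergence. A simplified illustration
# of the agent-based simulation approach; all settings are invented.
import random

def imitation_game(n_agents=20, rounds=3000, noise=0.02, step=0.1, seed=1):
    rng = random.Random(seed)
    # Random initial (F1, F2)-like prototypes in the unit square.
    agents = [[rng.random(), rng.random()] for _ in range(n_agents)]
    for _ in range(rounds):
        speaker, hearer = rng.sample(range(n_agents), 2)
        # Speaker produces a noisy token; hearer shifts toward it.
        token = [v + rng.gauss(0, noise) for v in agents[speaker]]
        agents[hearer] = [v + step * (t - v)
                          for v, t in zip(agents[hearer], token)]
    # Spread of prototypes along the first dimension after interaction.
    xs = [a[0] for a in agents]
    return max(xs) - min(xs)

print(imitation_game())
```

Running the model shows the spread shrinking from the initial random scatter to roughly the noise scale, a minimal example of how simulation makes a theory's dynamics concrete and testable.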
Ramon Ferrer i Cancho
Statistical laws of language in humans, other species and genomes: amusement or real science?
Quantitative linguistics has been feeding science with statistical regularities. George Kingsley Zipf introduced many of them, such as the law that bears his name, the law of abbreviation and the meaning-frequency law. Zipf's original explanations for these laws, based on a general principle of least effort, as well as the discovery of the laws in other species, have been overshadowed by arguments that typing at random or rolling a die would yield Zipf's law for word frequencies and related laws anyway. The same shadow has been cast on the finding of another statistical law of language, Menzerath's law, in genomes. Here we will review the finding of these laws across species as well as the statistical flaws of these criticisms. We will show how information theory can help us to understand their theoretical weakness and formalize Zipf's original ideas. We will also show how information theory provides us with parsimonious hypotheses for the emergence of these laws in natural systems, hypotheses that challenge the construction of a theory of natural communication systems by simple aggregation of the simplistic, brute-force or ill-defined random models of the past century.
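The "typing at random" null model referred to in the abstract is easy to reproduce: letters and a space are typed at random, and the resulting "words" already show a heavy-tailed, Zipf-like rank-frequency curve. The alphabet size and text length below are arbitrary choices for illustration:

```python
# Random-typing ("intermittent silence") null model for Zipf's law.
import random
from collections import Counter

rng = random.Random(42)
alphabet = "abcde "  # five letters plus space as word delimiter
text = "".join(rng.choice(alphabet) for _ in range(200_000))
freqs = Counter(text.split())

# Frequencies sorted from most to least common word.
ranked = [n for _, n in freqs.most_common()]
print(ranked[0], ranked[9], ranked[99])
```

Frequency falls steeply with rank even though the "language" is meaningless; the talk's point is that this resemblance has statistical flaws, and that information theory offers a sounder account of why the laws emerge in real communication systems.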