Statistical methods for linguistic research: Foundational Ideas

Language & Computation

Statistical methods have become central to almost every data-driven research problem in computational linguistics and in linguistics more broadly, in industry and academia alike. And yet many users of statistical tools have only a vague understanding of the central ideas that underpin statistical theory. For example, many researchers do not understand what a p-value tells them about the research hypothesis, and even professional users of statistics, with many years of practical experience behind them, cannot accurately explain what a confidence interval is. Understanding these concepts is crucial for drawing correct inferences from data, so learning the concepts of statistical inference is vitally important for students of language and computation. This course and its follow-up course aim to teach these concepts over the two weeks of ESSLLI 2015. In this course, we will cover the foundational ideas of classical frequentist statistics. The major topics I will cover constitute the core knowledge that I feel everyone in linguistics should have: an introduction to R; the sampling distribution of the sample mean; power and Type I, Type II, Type M, and Type S errors; (generalized) linear models; and linear mixed models.

First week
14:00 - 15:30 - slot 3