session 1: introduction
Datasets and classes. Basic operations. Scan, read and write. Help. Subsetting and splitting.
- tutorial about Statistics with R
- some functions
- script for session 1
session 2: exploratory data analysis; simple tests
Frequency distribution. Checking for normality. Creating your own functions.
t test and effect size.
Scattergrams. Using colors and legends. Pairs of scattergrams. Trellis plots.
- questionnaire for students in Statistiek 2011 course: questionnaire, data.
- script for session 2
session 3: regression
Linear and logistic regression. Correlation and covariance. Collinearity and factor analysis. Cross-validation and bootstrapping.
- script for session 3 (updated).
- script for function eda.fnc.
- data on gender bias in syntactic textbooks, after Macaulay & Brice (1997), Language 73(4), http://www.jstor.org/stable/417327
- illustration of an ROC curve.
session 4: mixed-effects modeling
Logistic regression: remainder of session 3.
The materials about mixed-effects modeling were also used for my presentation at the Bielefeld Mixed Model Workshop, 22-24 Feb 2012, Bielefeld. This is probably more material than we can cover in one session.
- part 1: warming up using QB08 data: handout, script qb08.R, data.
- part 2: on random intercepts and random slopes, using Hardman data: handout, script hardman.R, adjusted data, hardman.trialbytalkers.r, hardman.trialbywords.r, hardman.trialbylisteners.r, hardman.boot.r.
- part 3: on growth curve analysis (Visual World data): handout, script kb07.R, data kb07full.txt.zip, kb07m02.timebysubj.R, kb07m02.timebyitem.R.
- part 4: on reducing prediction error (R2), using Arnhold data: handout, script arnholdM.R, data arnhold2.txt.
- (new) explanation about "explained variance", or proportional reduction in mean square prediction error.
session 5: Bring Your Own Data
If you wish to analyze your own data in class, please (a) provide a brief description of your study, your research questions and your data, (b) make sure to identify and describe all variables and their values in a "code book", and (c) send me your data, description and code book ASAP as an Excel (.XLS), .CSV or .TXT file.
- data by Marcelle Cole: script
- data by Sophia Manika: script
- data by Merel Keijzer and Rias van den Doel: script
- data by Brigitta Keij: script
Recommended reading:
- webpage of the R Study Group at UPenn Linguistics Dept (thanks to Marcelle Cole).
- Allerhand, M. (2012). A Tiny Handbook of R. Berlin: Springer. ISBN 9783642179808.
- Baayen, R. H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.
- Faraway, J. J. (2006). Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models. Boca Raton, FL: Chapman and Hall.
- Johnson, K. (2008). Quantitative Methods in Linguistics. Malden, MA: Blackwell. ISBN 978-1-4051-4425-4.
- Menard, S. W. (2009). Logistic Regression: From Introductory to Advanced Concepts and Applications. Thousand Oaks, CA: Sage. ISBN 978-1-4129-7483-7.
- Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-Plus. New York: Springer.
- Quené, H. & Van Delft, L.E. (2010). Non-native durational patterns decrease speech intelligibility. Speech Communication, 52 (11-12), 911-918. [doi:10.1016/j.specom.2010.03.005].
- Venables, W. N., & Ripley, B. D. (1994). Modern Applied Statistics with S-Plus. New York: Springer.