Basics of R
Data import and manipulation, graphs, functions, modeling
R. Condit
Ohio State University Department of Music
May 2013
1 Course overview
The course will start with basic R, covering how to import and manipulate data. We will subsequently
cover three main topics: graphs, functions, and basic regression models.
During the first half of each session, I will explain methods and present examples of
their use; in the second half, students will work on their own using the same methods.
Datasets will be provided, but students are encouraged to bring their own data as well. A
course web site will provide sample code, data, and a list of key R functions. Students
should have their own computer to work on during the sessions, know the file system on it
(i.e., how to find a file by typing its path plus file name), and be familiar with ASCII text
files.
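As a rough illustration of what "path plus file name" means in practice, the following minimal sketch writes a tiny text file and reads it back by its full path. The file name example.csv and its contents are made up for this illustration and are not part of the course data.

```r
# Hypothetical example: write a tiny CSV file to a temporary folder,
# then read it back using its full path (folder plus file name).
folder <- tempdir()                               # a folder that exists on any system
path <- file.path(folder, "example.csv")          # hypothetical file name
writeLines(c("id,value", "1,10", "2,20"), path)   # plain ASCII text, made-up values

file.exists(path)                                 # TRUE when the path is typed correctly
dat <- read.csv(path)                             # import the file as a data frame
str(dat)                                          # inspect column names and types
```

On your own computer, path would instead be the location of a file you already have.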
1.1 To join
- Contact Nat Condit-Schultz
1.2 Schedule
- When: Four Sessions, 10AM-2PM (with lunch break), 7-10 May 2013
- Where: Huron Lab, OSU Music
2 Software required
3 Course web site
4 Contents and approximate scheduling
- Basics of R [day 1]
- Data types
- Atomic (or scalar): a single value (in R, a vector of length 1)
- Vector: one dimensional array of values
- Dataframe: two dimensional table
- Advanced: matrix, array, list
- Character vs. numeric variables
- Command line assigning and manipulating
- Import
- Reading ASCII
- Loading: load vs. attach
- Export
- Working with dataframes
- Array elements
- The $ symbol
- Extracting one or a few rows (columns)
- Subsetting rows
- Adding a new column
- Repeatability
- Scripts
- The source command
- Functions (see day 4)
- Graphs [start day 1, continue day 2]
- Scatter plot
- The functions plot and points
- Data on tempo and temperature (studyDataOneClean.csv): Tempo vs. AverageHigh
- Lines
- Manipulating the appearance
- Arguments to plot: xlab, ylab, pch, col, lty
- Log-transformation
- Command-line export
- Advanced: manipulating axes with axis and box
- Power of R (a brief introduction to things spreadsheets can’t accomplish) [day 2]
- Vectorized calculations
- Filtering data to get one or more rows (columns)
- colSums and rowSums
- table
- tapply
- subset
- Modeling with standard regression [day 3]
- Linear regression
- The function lm
- Regression line
- Capturing the results
- Data treemass: log(agb) vs. log(dbh)
- Creating your own functions [day 4]
- Function definition and the curly braces
- Understanding arguments
- Local names and values
- Default values
- Loops
- for
- while
- Curly braces required when the loop body has more than one statement
- Using if and else
- Subroutines
- Functions within functions
- Adds work now to save work later
- Returns value
- Single variable
- Multiple variable returns with list data type
- Advanced topic: Multi-level models [if time and interest allow]
- Why multi-level modeling?
- Multi-level vs. standard regression (Bates Chap. 4, Section 4.4; Gelman & Hill pp. 251-259)
- Regression with one group using lmer
- output of display
- graphs using the coefficients
- variable intercept, slope, or both
- Random and fixed effects
- Good references
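Several of the day 1, 2, and 4 topics listed above (adding a column, vectorized calculations, subsetting, table, tapply, a function with a default argument, a for loop, and returning a list) can be sketched in a few lines. This is only an illustrative sketch: the data frame and the function name summarize_column are invented here and are not taken from the course materials.

```r
# Made-up data, for illustration only
dat <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 4, 5),
  y     = c(2, 4, 6, 8, 10)
)

# Vectorized calculation: one expression works on the whole column
dat$ratio <- dat$y / dat$x           # this also adds a new column

# Subsetting rows with a logical condition, two equivalent ways
big1 <- dat[dat$x > 2, ]
big2 <- subset(dat, x > 2)

# Counting and group summaries
counts <- table(dat$group)                 # number of rows per group
means  <- tapply(dat$y, dat$group, mean)   # mean of y within each group

# A function with a default argument that returns several values in a list
summarize_column <- function(v, digits = 2) {
  total <- 0
  for (value in v) {                 # explicit loop, used here only for illustration
    total <- total + value
  }
  list(n = length(v), mean = round(total / length(v), digits))
}

result <- summarize_column(dat$y)    # digits takes its default value
result$n                             # list elements are extracted with $
result$mean
```

In everyday use mean(v) would replace the loop; the loop is written out only to show the syntax.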
5 Key R functions
- Basics
- length
- dim
- str
- read.table
- write.table
- attach
- load
- colnames
- Data creation
- read.table (also read.delim, read.csv)
- write.table
- numeric
- c
- : (a colon)
- character
- data.frame
- matrix
- array
- Numeric
- math operators: * + - : / ^
- math functions: log, log10, sin, cos, sqrt (and more)
- mean
- median
- sd
- var
- summary
- Character
- table
- unique
- nchar
- Graphics
- hist
- plot
- points
- lines
- curve
- abline
- box
- axis
- X11
- dev.set
- Functions
- function
- args
- loops
- if, else
- return
- browser
- Data extraction
- subset
- apply
- tapply
- cut
- dim
- Modeling
- model
- return of lm: coefficients, fitted.values, residuals
- result of summary(lm): also r.squared
- abline(lm) to add line
- predict.lm for predictions
- details
- lm assumes approximately normally distributed residuals; a strongly skewed y often violates this
- logarithms when data are skewed
- for small integers or binomial data (0 vs. 1) there is an alternative, glm
- when x is numeric, it is helpful to 'center' it by subtracting the mean (especially if x is very far from zero)
- advanced
- glm
- lmer [lme4 package]
- coef
- summary
- fixef [lme4 package]
- ranef [lme4 package]
- display [arm package]
- dotplot [lattice package]
- xyplot [lattice package]
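The modeling functions and details listed above (lm, its coefficients, fitted values and residuals, r.squared, abline, predict, log-transforming a skewed y, and centering x) might be combined roughly as follows. This is a minimal sketch on simulated data; the variable names and numbers are assumptions made for the illustration and are not from the course datasets.

```r
# Simulated data, for illustration only
set.seed(1)
x <- runif(50, min = 100, max = 200)          # predictor far from zero
y <- exp(0.02 * x + rnorm(50, sd = 0.1))      # right-skewed response

x_centered <- x - mean(x)                     # center x by subtracting its mean
fit <- lm(log(y) ~ x_centered)                # log-transform the skewed y

coef(fit)                                     # intercept and slope
head(fitted(fit))                             # fitted values
head(residuals(fit))                          # residuals
summary(fit)$r.squared                        # proportion of variance explained

plot(x_centered, log(y))                      # scatter plot of the transformed data
abline(fit)                                   # add the regression line

new_x <- data.frame(x_centered = c(-10, 0, 10))
predict(fit, newdata = new_x)                 # predictions at new x values
```

After centering, the intercept is the predicted log(y) at the mean of x, which is usually easier to interpret than a prediction at x = 0.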