Stat 230 Notes
Getting Started
1
Review of Statistical Inference
1.1
Sampling Distribution
1.2
Central Limit Theorem
1.2.1
Standard error
1.3
Hypothesis testing
1.4
Confidence Intervals
1.5
Review activity (day 2)
1.5.1
General questions
1.5.2
Comparing two means
2
Simple Linear Regression
2.1
The variables
2.2
EDA for SLR
2.3
The model form
2.3.1
Interpretation
2.3.2
Example: Woodpecker nests
2.4
Theory: Estimation
2.4.1
Sampling Distributions for SLR estimates
2.5
Example: Nest depth
2.5.1
Load data
2.5.2
EDA for SLR
2.5.3
The least squares line (the estimated SLR model):
2.5.4
Additional lm information
2.6
SLR model simulation
2.6.1
Simulation function
2.6.2
Run the function once
2.6.3
Simulated sampling distribution for \(\hat{\beta}_1\)
2.6.4
Are slope and intercept estimates correlated?
2.7
Inference for mean parameters
2.7.1
Confidence Intervals
2.7.2
Hypothesis tests
2.7.3
Example: Inference for nest depth coefficients
2.8
Inference for average or predicted response
2.8.1
Confidence intervals for \(\mu_{y \mid x}\)
2.8.2
Prediction intervals for new cases
2.8.3
Example: Inference for mean or predicted nest depth
2.9
Checking model assumptions and fit
2.9.1
Residuals
2.9.2
Residual plot: linearity and constant variance
2.9.3
Residual normal QQ plot
2.9.4
Independence
2.9.5
Robustness against violations
2.9.6
“Fixes” to violations
2.10
Example: SLR assumptions
2.10.1
Drug offender sentences
2.10.2
Case study 15.2 - Global Warming
2.11
Transformations
2.11.1
Transformation choices
2.11.2
Transformations in R
2.11.3
Interpretation
2.11.4
Review: Logarithms
2.12
Examples: Transformations
2.12.1
Cars 2004
2.12.2
2005 Residential Energy Survey (RECS)
2.13
\(R^2\) and ANOVA for SLR
2.13.1
Example: \(R^2\)
2.13.2
ANOVA for SLR
2.13.3
Example: ANOVA
3
Multiple Regression
3.1
The variables
3.2
The model form
3.2.1
Interpretation
3.3
Example: MLR fit and visuals
3.3.1
lm fit
3.3.2
Graphics for MLR
3.3.3
Residual plots for MLR
3.3.4
EDA for interactions
3.3.5
Quadratic models: Corn yields (exercise 9.15)
3.4
Categorical Predictors
3.4.1
Interpretation: adding a categorical variable
3.4.2
Interpretation: adding a categorical interaction
3.5
Inference for MLR
3.5.1
Inference for a linear combination of \(\beta\)’s
3.5.2
Example: Palmer penguins
3.5.3
Example: more penguins
3.5.4
Example: Sleep
3.6
ANOVA for MLR
3.6.1
Mean Squares
3.6.2
\(R^2\) and adjusted \(R^2\)
3.6.3
ANOVA F-tests
3.6.4
Example: Sleep
3.7
Model Checking
3.7.1
Residual plots
3.7.2
Outliers
3.8
MLR: Visualizing effects
3.8.1
Partial residual plots
3.8.2
Example: Sleep
3.9
Collinearity
3.9.1
Example: Sleep
4
Logistic Regression
4.1
The variables
4.1.1
Example: Donner party EDA
4.2
The Bernoulli distribution
4.3
The logistic model form
4.3.1
Interpretation
4.4
Inference and estimation
4.4.1
Confidence intervals for \(\pmb{\beta_i}\)
4.4.2
Hypothesis tests for \(\pmb{\beta_i}\)
4.4.3
R glm
4.4.4
Example: Donner party model
4.4.5
Example: Donner party, adding sex
4.5
Deviance
4.5.1
Drop in Deviance test
4.5.2
Example: NES
4.6
Checking Assumptions
4.6.1
Example: Boundary Waters Canoe Area (BWCA) blowdown
4.7
Residuals and Case influence
4.7.1
Example: Boundary Waters Canoe Area (BWCA) blowdown
4.8
Binomial responses
4.8.1
Connection to binary responses
4.9
Inference for Binomial response models
4.9.1
Example: Krunnit Islands archipelago (Sleuth Case Study 21.1)
4.10
Deviance for Binomial responses
4.10.1
Goodness-of-fit test
4.10.2
Example: Krunnit Islands archipelago (Sleuth Case Study 21.1)
4.11
Checking Assumptions
4.11.1
Example: Krunnit Islands archipelago (Sleuth Case Study 21.1)
4.12
Residuals and case influence for binomial responses
4.12.1
Example: Krunnit Islands archipelago (Sleuth Case Study 21.1)
4.13
Quasi-binomial logistic model
4.13.1
Example: Moth predation (Case Study 21.2)
5
Poisson Regression
5.1
The Poisson distribution
5.2
The Poisson model form
5.2.1
Interpretation
5.3
EDA for Poisson regression
5.3.1
Example: Possums
5.4
Inference and estimation
5.4.1
Confidence intervals for \(\pmb{\beta_i}\)
5.4.2
Hypothesis tests for \(\pmb{\beta_i}\)
5.4.3
R glm
5.4.4
Example: Possums
5.5
Deviance for Poisson responses
5.5.1
Drop-in-deviance test
5.6
Checking Assumptions
5.6.1
Goodness-of-fit test
5.6.2
GOF alternative
5.6.3
Example: Possums
5.7
Residuals and case influence for Poisson responses
5.7.1
Example: Possums
5.8
Quasi-Poisson model
Appendix
A
R and RStudio
A.1
Running RStudio
A.2
Installing R
A.3
Installing RStudio
A.4
Installing R packages
B
The R environment
B.1
Workspace
B.2
Working directory
B.3
RStudio projects
C
R for basic data analysis
C.1
Basics
C.1.1
Quick Tips
C.1.2
Objects
C.1.3
Vectors
C.1.4
Arithmetic
C.1.5
Subsetting
C.2
Data
C.2.1
Reading Data into R
C.2.2
Investigating a Data Frame
C.2.3
Accessing Data
C.2.4
Subsetting a Data Frame
C.2.5
Creating a data frame
C.2.6
Adding a new column to a data frame
C.2.7
Missing Data
C.3
EDA
C.3.1
Categorical:
C.3.2
Quantitative:
C.3.3
Quantitative grouped by a categorical
C.3.4
Graphs
C.3.5
Reporting Results
C.4
Factor variables
C.4.1
Renaming factor levels
C.4.2
Recode a categorical variable with many levels
C.4.3
Converting some factor levels to NAs
C.4.4
Changing the order of levels
C.4.5
Recode a numerically coded categorical variable
C.4.6
Recode a factor into a numeric
D
R Markdown
D.1
How to write an R Markdown document
D.2
Changing R Markdown chunk evaluation behavior
D.3
Creating a new R Markdown document
D.4
Extra: Graph formatting
D.4.1
Adding figure numbers and captions
D.4.2
Resizing graphs in Markdown
D.4.3
Changing graph formatting in R
D.4.4
Hiding R commands
D.4.5
Global changes in graph format
D.4.6
Comments:
D.5
Extra: Table formatting
D.5.1
Hiding R commands and R output
D.5.2
Markdown tables
D.5.3
Markdown tables via kable
D.5.4
The pander package
D.5.5
The stargazer package
E
Math review
E.1
Linear equations
E.2
Logarithms
E.2.1
Interpreting logged variables
E.2.2
Inverse (i.e. reversing the log, getting rid of the log, …)
E.2.3
Logarithms in R
E.3
Exercises
References