Introduction
What will we learn?
- Introduction to statistical models
- Unreplicable vs. replicable studies
- The collision of designed experiments and observational spatio-temporal data
- Example 1: Spatial statistics
- Example 2: Trajectories
- Models for human movement/trajectories
Intro to statistical models
- What is data?
- Something in the real world that you can, in some way, observe and measure with or without error
- What is a statistic?
- What is a model?
- Simplification of something real, designed to serve a purpose
- What is a statistical model?
- Simplification of a real data-generating mechanism
- Constructed from deterministic mathematical equations and probability density / mass functions
- Capable of generating data
- Generative vs. non-generative models
- What is the purpose of a statistical model?
- Capable of making predictions, forecasts, and hindcasts
- Enables statistical inference about observable and unobservable quantities
- Reliably quantifies and communicates uncertainty
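
As a minimal sketch of what "constructed from deterministic mathematical equations and probability density / mass functions" and "capable of generating data" can mean in practice, the Python snippet below simulates from a simple linear model with normally distributed errors. The function name `simulate_linear_model`, the parameter values, and the choice of a linear mean with a normal distribution are illustrative assumptions, not something taken from these notes.

```python
import random


def simulate_linear_model(x, beta0, beta1, sigma, seed=0):
    """Generate y_i = beta0 + beta1 * x_i + e_i with e_i ~ Normal(0, sigma^2).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    y = []
    for xi in x:
        mean = beta0 + beta1 * xi         # deterministic equation
        y.append(rng.gauss(mean, sigma))  # probability density (normal)
    return y


# Assumed inputs: five covariate values and made-up parameter values
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = simulate_linear_model(x, beta0=1.0, beta1=2.0, sigma=0.5)
```

Because the model can produce a `y` for any `x`, it is generative in the sense used above: the same equations that describe the data can also be run forward to simulate new data or make predictions.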
Intro to spatio-temporal data
- What is spatial data?
- What is a time series?
- What is spatio-temporal data?
Half-baked opinions about designed experiments
- Where did these half-baked opinions come from?
- 15 years of experience in statistical consulting (i.e., watching people struggle)
- Authoring or co-authoring ~100 publications and being the proud owner of a high rejection rate!
- Writing a textbook (link)
- Teaching 20+ graduate-level courses on the topic
- Gold standard: designed experiments, replication, and randomization
- Replication crisis
- ASA’s statement on p-values (link)
- Model systems vs. reality
- My observations
- Most of what I see or work on is almost purely observational data or studies (e.g., here), or data or studies that have some elements of a designed experiment (e.g., the ability to apply a treatment) but lack others (e.g., the ability to replicate)
- Experimental design, like other frameworks, is a tool that works for some but not all studies (e.g., studies of plant vs. animal diseases)
- Some professions seem very hesitant to use any tool other than designed experiments even when key features are missing
- Ideas and methods from spatial statistics, time series analysis, and spatio-temporal statistics offer an alternative view
- Example from the book Range
- At the end of the day, it is all about trade-offs in assumptions (e.g., regression vs. ANOVA)
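
One way to make the regression vs. ANOVA trade-off concrete is to fit both to the same small data set. The sketch below uses made-up numbers and two hypothetical helpers, `fit_anova` and `fit_line`; it is meant only to show where the assumptions differ, not to reproduce any analysis from these notes.

```python
from statistics import mean


def fit_anova(x, y):
    """One-way ANOVA-style fit: a separate free mean for each level of x."""
    levels = sorted(set(x))
    return {lv: mean(yi for xi, yi in zip(x, y) if xi == lv) for lv in levels}


def fit_line(x, y):
    """Simple linear regression by ordinary least squares."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data: three treatment levels, two observations each
x = [1, 1, 2, 2, 3, 3]
y = [1.0, 1.2, 3.9, 4.1, 4.9, 5.1]

group_means = fit_anova(x, y)      # three parameters, no assumed shape
intercept, slope = fit_line(x, y)  # two parameters, assumes a straight line
```

The ANOVA-style fit makes no assumption about how the response changes between levels, but it only describes the levels that were observed. The regression fit assumes linearity, and in exchange it can predict at an unobserved value such as `x = 2.5`.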