Course description
This short course explores recent advances in hierarchical models fit using Markov chain Monte Carlo (MCMC) methods. The focus is on linear and generalized linear modeling frameworks that accommodate spatial and temporal associations. Lectures and exercises offer an applied and practical perspective on model specification, identifiability of parameters, and computational considerations for Bayesian inference using posterior distributions. We begin with a basic introduction to fitting Bayesian hierarchical linear models and proceed to address several common challenges in environmental analysis, including missing data and datasets with too many observations to fit the desired models efficiently. The exercises blend modeling, computing, and data analysis, including a brief introduction to the spBayes and spNNGP R packages. We will also take a “look under the hood” with the aim of developing efficient MCMC algorithms written in C/C++ and FORTRAN that can serve as standalone programs or be called from R. Special attention will be given to setting up an efficient R computing environment that leverages multiple CPUs via OpenMP and threaded BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage).
Participants
The course should be useful for students in applied fields working with spatial and spatiotemporal data who are starting to think about model development and available software. Students in more quantitative fields such as statistics and data analytics will gain from the more detailed look at writing efficient MCMC samplers for random effects models with structured covariance matrices, coding tips and tricks for parallelization, and a few worked examples from which they can build their own codebase.
Getting started (some pre-course work)
This course offers lectures, discussion, and hands-on exercises on efficient computing for spatial data models. The exercises blend C/C++, FORTRAN, and R, so please bring your laptop. To participate fully in the exercises, you’ll need the most recent version of R (version 3.6.1) and RStudio Desktop (at least version 1.1.456) installed. We will make use of RStudio’s new Terminal feature to simplify integration among participants using different operating systems. Please note, you do not need to know C/C++ or FORTRAN to participate in and benefit from the course; however, familiarity with R or a similar high-level language (e.g., Python, MATLAB, or Julia) will be helpful.
Additionally, if you want to work through the exercises (optional), you’ll need to install some development software (e.g., compilers and efficient matrix libraries). Follow the instructions here to install Rtools, XCode, and r-base-dev for Windows, Mac OS X, and Linux, respectively. You do not need the LaTeX libraries for our course.
For Windows users, when installing Rtools please keep all suggested install settings with the addition of checking the box “Add rtools to system Path” in the “Select Additional Tasks” dialog window (it is not checked by default).
For Mac OS X users, even the most recent XCode installed via the App Store does not include the most recent version of the clang compiler. You will therefore need to take the additional step of opening a terminal, typing
xcode-select --install
and then agreeing to the prompts in the subsequent install dialog windows. Next, if you want to compile C/C++ code that uses the OpenMP library for parallel computing (which is the focus of several of our exercises), you’ll need to do a bit more work. The issue is that the clang compilers that come with XCode are old and don’t include OpenMP support. Read Sections C.3.1 and C.3.2 here. In a nutshell, you’ll need to:
- install gfortran-6.1.pkg from https://cran.r-project.org/bin/macosx/tools/.
- install pcre and xz; see the third bullet in Section C.3.1 for software location and install directions (you can likely just copy those commands into your terminal and run them).
- install clang-8.0.0.pkg from https://cran.r-project.org/bin/macosx/tools/.
I’m not a Mac user, but was able to get the course code to run using the steps above. If this is too much trouble, then we can try to troubleshoot it a bit the morning of the course. Again, running the exercises yourself is completely optional and you might actually get more from the course by just sitting back and watching the rest of us struggle :-)
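If you want a quick check that your compiler setup picked up OpenMP before the course, a small C function along the lines of the sketch below should do. This is only an illustration: the file name omp_check.c and the function name omp_sum are made up here and are not part of the course materials. The arguments are all pointers and the return type is void so that the function could, in principle, be called through R's .C interface.

```c
/* omp_check.c (hypothetical file name, not part of the course code) */
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum the first *n entries of x into *total.
 * *n_threads reports how many threads OpenMP may use,
 * or 0 if the file was compiled without OpenMP support. */
void omp_sum(double *x, int *n, double *total, int *n_threads)
{
    double sum = 0.0;
    int i;

#ifdef _OPENMP
    *n_threads = omp_get_max_threads();
#else
    *n_threads = 0; /* OpenMP flag was not passed or is unsupported */
#endif

    /* With OpenMP enabled, the loop iterations are split across
     * threads and the partial sums are combined at the end. */
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (i = 0; i < *n; i++) {
        sum += x[i];
    }

    *total = sum;
}
```

When the file is compiled with your compiler's OpenMP flag (-fopenmp for gcc and for clang builds that ship OpenMP), n_threads should come back as a positive number; a value of 0 means the build did not enable OpenMP. Assuming you build it into a shared library with R CMD SHLIB and load it with dyn.load, a call from R would look something like .C("omp_sum", as.double(x), as.integer(length(x)), total = double(1), n_threads = integer(1)). How the OpenMP flag gets passed through R CMD SHLIB depends on your platform, so treat that part as something to adapt.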
Please email me (finleya@msu.edu) if you have any questions or if issues arise. This software is always changing, so the install process that worked a few weeks ago might not work today.
Course schedule and materials (download all in one zip or tar.gz)
- 09:00 - 09:15 Set up, welcome, and overview slides
- 9:15 - 10:15 Introduction/review of point-referenced spatial regression models slides
- Exercise 1
- Markdown doc
- Code zip or tar.gz
- 10:15 - 10:45 Break
- 10:45 - 12:00 Calling C/C++ and FORTRAN from R and fun with OpenMP (R API and OpenMP slides)
- Exercise 2
- Markdown doc
- Code zip or tar.gz
- Exercise 3
- Markdown doc
- Code zip or tar.gz
- Rmath.h bessel_k.c
- 12:00 - 1:00 Lunch
- 1:00 - 2:00 Leveraging your computer’s CPUs using threaded BLAS and LAPACK slides
- Exercise 4
- Markdown doc
- Code zip or tar.gz
- 2:00 - 3:30 Computing for mixed effects models with dense covariance matrices slides
- Exercise 5
- Markdown doc
- Code zip or tar.gz
Supplemental material
- Parameter recovery from collapsed mixed effects models slides
- Calling external libraries and sparse matrices slides