FISH 560: Applied Multivariate Statistics for Ecologists

 

Winter 2009

INSTRUCTOR: Julian D. Olden
TA: Eric Larson

 

Multivariate statistics describes the collection of procedures involving the observation and analysis of two or more dependent variables.

 

Office Location: Fisheries Science Bldg., Room 318A
Office hours: Thursday 1:00 - 3:00
Contact information: olden@u.washington.edu, 616-3112
Course web page: http://www.fish.washington.edu/classes/fish560
Class: Tuesday and Thursday 3:00 - 4:20; Fisheries Science Bldg., Room 136
Prerequisite(s): QSCI 482 or equivalent or permission from instructor


TESTIMONIALS

"This class introduced multivariate methods and forced me to think about how each method could be applied to my research. It stretched my intellect and I now consider multivariate statistics as a tool that I'm comfortable to use."

"Everything we learned in class was immediately applied to the class data set or our own data to give hands-on experience."
"The use of our data was key in making the class an individualized success. It challenged my understanding of techniques and also assumptions in my own data and will contribute to my grad school progress like no other class has."
"I would highly recommend this course to any and every ecologists, and I would lobby anyone to take it from Julian"

 

Examples are taken from all sub-disciplines of ecology, including both aquatic and terrestrial (i.e., this is not a fish-centric course). Previous students have been from Fisheries, Oceanography, Forestry and Biology.

 

ELECTRONIC JOURNAL OF APPLIED MULTIVARIATE STATISTICS (EJAMS)

 

LECTURE NOTES and LAB EXERCISES

 

COURSE DESCRIPTION
With recent advances in data collection technology and ambitious field research, ecologists are increasingly calling upon multivariate statistics to explore and test for patterns in their data.  The goal of this course is to introduce graduate students in the ecological sciences (that’s you!) to the multivariate statistical techniques necessary to carry out sophisticated analyses and to critically evaluate scientific papers that use these approaches.  This is a practical, hands-on course emphasizing the application and interpretation of multivariate methods, and it covers the majority of approaches in common use by ecologists.  The emphasis is on the conceptual understanding and practical use of the methods (not the matrix algebra), with the singular hope of de-mystifying the "alphabet soup" of multivariate analysis. 

There are three main categories of multivariate analysis that are common in ecology: (i) clustering, (ii) ordination and (iii) statistical tests of hypotheses.  We will cover all three categories in detail.  The intent of this course is to provide you with the following: (1) an introduction to the use of multivariate statistics in ecological research; (2) a conceptual organization of the various multivariate techniques, with respect to the types of research questions and data sets appropriate for each technique; and (3) a working understanding of how to use and interpret the results of each technique, including a conceptual overview, list of assumptions, diagnostics for assessing the assumptions, mechanics of performing the analysis using a variety of software, and how to interpret the statistical output of the analysis. 
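To give a flavor of the ordination category, here is a minimal sketch in Python (standard library only; the course itself uses R and PC-ORD) of a correlation-based principal component analysis. It standardizes a small sites × environmental variables matrix, finds the first principal component by power iteration, and projects the sites onto that axis. The variable names and values are invented for illustration, and power iteration is just one simple way to obtain the leading eigenvector; it is not necessarily how R or PC-ORD compute PCA.

```python
import math

# Hypothetical sites x environmental variables matrix (illustrative values only).
# Rows = sites; columns = temperature (deg C), depth (m), conductivity (uS/cm).
SITES = [
    [12.1, 3.0, 210.0],
    [14.3, 2.1, 250.0],
    [9.8, 5.5, 180.0],
    [16.0, 1.2, 300.0],
    [11.5, 4.0, 205.0],
]

def standardize(rows):
    """Center each column to mean 0 and scale to unit (n - 1) SD; assumes no constant columns."""
    n, p = len(rows), len(rows[0])
    cols = []
    for j in range(p):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        cols.append([(v - mean) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]

def correlation_matrix(z):
    """Correlation matrix of an already standardized rows x columns matrix."""
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_axis(r, iters=1000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix by power iteration."""
    p = len(r)
    v = [float(i + 1) for i in range(p)]  # arbitrary start, assumed not orthogonal to the answer
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v' R v estimates the eigenvalue for the current vector.
        new_lam = sum(v[i] * sum(r[i][j] * v[j] for j in range(p)) for i in range(p))
        if abs(new_lam - lam) < tol:
            return new_lam, v
        lam = new_lam
    return lam, v

z = standardize(SITES)
lam1, loadings = leading_axis(correlation_matrix(z))
scores = [sum(zi * li for zi, li in zip(row, loadings)) for row in z]  # site positions on PC1
print("eigenvalue:", round(lam1, 3), "share of variance:", round(lam1 / len(loadings), 3))
print("PC1 loadings:", [round(x, 3) for x in loadings])
print("PC1 site scores:", [round(s, 3) for s in scores])
```

Because the trace of a correlation matrix equals the number of variables, the first eigenvalue divided by that number gives the share of total variance captured by the first axis. Later components could be found by deflating the matrix and repeating, although a library eigensolver or singular value decomposition is the usual choice in practice.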

METHOD OF INSTRUCTION
Lectures/labs – Lectures will integrate the theoretical aspects of multivariate statistics with solutions and interpretations from various software packages.  For each topic there will be a formal lecture followed by a computer-based lab in which software is used to analyze ecological data with the particular multivariate technique.  The course will focus on R and PC-ORD, and will also point to other available software packages.

Pop-quiz – A portion of your grade is based on a pop-quiz that will be administered at some point during the quarter. The quiz is used to test your understanding of the material and to promote self-evaluation of your progress in the course.

Final report and peer review – A significant portion of your grade is based on a final written paper and peer review of other class members’ papers (see below).  The final paper will consist of a statistical analysis of a multivariate data set (approved by your instructor).  The nature of the question, the source of the data, and the kinds of analysis employed are flexible.  The primary requirement is that the data and analysis must address one or more specific biological hypotheses, which are to be tested using an appropriate method(s) of multivariate analysis.  The primary goal is a coherent scientific paper, not excessive number crunching.
                                           
DATASETS
Personal dataset – A primary goal of this course is to provide you with the opportunity to get better acquainted with your own data.  The data set may be your own, one obtained from the literature, or one provided by the instructor.  Ideally you should use data that you have collected or are otherwise familiar with. The data set should be one or more matrices of entities × attributes (e.g., samples × species, species × characteristics of species, sites × environmental factors).  The only data requirement is that the data be adequate to test the hypotheses addressed in your final report. If you do not have access to a multivariate dataset, I would be pleased to provide one.
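As an illustration of the entities × attributes layout, here is a minimal Python sketch (standard library only) of a samples × species abundance matrix and a pairwise resemblance matrix computed from it. The site names, species names, and counts are hypothetical, and Bray-Curtis is only one of the many distance coefficients covered under ecological resemblance; it is used here as an example, not as the recommended choice for every data set.

```python
# Hypothetical samples x species abundance matrix (counts invented for illustration).
# Each row is one sample (entity); each column is one species (attribute).
SPECIES = ["sp_A", "sp_B", "sp_C", "sp_D"]
SAMPLES = {
    "site1": [10, 0, 3, 7],
    "site2": [4, 2, 0, 9],
    "site3": [0, 12, 5, 1],
}

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity, sum|x_k - y_k| / sum(x_k + y_k), for non-negative abundances.

    Returns 0 for identical samples and 1 when the two samples share no species.
    """
    if len(x) != len(y):
        raise ValueError("samples must be scored on the same species")
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return 0.0  # two empty samples; treating them as identical is a convention, not a rule
    return sum(abs(a - b) for a, b in zip(x, y)) / total

names = list(SAMPLES)
# Q-mode analysis: pairwise resemblance among samples (rows) rather than among species (columns).
dissim = {(a, b): bray_curtis(SAMPLES[a], SAMPLES[b]) for a in names for b in names}
for a in names:
    print(a, [round(dissim[(a, b)], 3) for b in names])
```

The resulting symmetric matrix, with zeros on the diagonal, is the kind of input that cluster analysis, PCoA and NMDS, and tests such as ANOSIM or Mantel tests operate on.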

Class dataset – Even if you do have a multivariate dataset, it is unlikely to be suitable for all the techniques covered in class.  To address this, I will provide a common dataset to all students at the beginning of the quarter.  This dataset is in addition to the personal dataset that forms the basis for your final report.  With the class data you will be able to apply all the statistical approaches listed in the syllabus, and this dataset will also serve as the basis for the short assignments.  You will be expected to work with both your own dataset and the class dataset during the labs.

TEXTBOOK(S) AND REQUIRED TOOLS OR SUPPLIES
There is no required text for this course; however, I highly recommend:
McGarigal, K., S. Cushman, and S. Stafford. 2000. Multivariate Statistics for Wildlife and Ecology Research. Springer.

Other statistical texts that are likely to be helpful (in order of value based on my personal experience) include:
Legendre, P., and L. Legendre. 1998. Numerical Ecology. 2nd edn. Elsevier Science.
Gauch, H.G. 1982. Multivariate Analysis in Community Ecology. Cambridge University Press.
Manly, B.F.J. 2004. Multivariate Statistical Methods: A Primer. Chapman & Hall.
Digby, P.G.N., and R.A. Kempton. 1987. Multivariate Analysis of Ecological Communities. Chapman & Hall.
Jongman, R.H.G., C.J.F. ter Braak, and O.F.R. van Tongeren. 1995. Data Analysis in Community and Landscape Ecology. Cambridge University Press.
Pielou, E.C. 1984. The Interpretation of Ecological Data: A Primer on Classification and Ordination. Wiley-Interscience.

You will need to bring a USB memory stick to class.

GRADING PLAN
The standard UW numerical grading system will be used according to the breakdown provided below.  See http://www.washington.edu/students/gencat/front/Grading_Sys.html for the university's description.

 

Task                                Due date          % of grade
Participation in lecture and lab    Ongoing           10
One-page proposal                   Feb 5, 2009       10
Pop-quiz                            Unannounced       10
Final paper                         March 5, 2009     50
Peer-review reports                 March 12, 2009    20
Total                                                 100

 

TENTATIVE SCHEDULE

Date Topic
Jan-6 Course overview
  • The nature of multivariate statistics
Jan-8 Data screening
 
Jan-13 Data screening
 
Jan-15 Ecological resemblance
  • Modes of analysis, analytical spaces
  • Similarity coefficients (binary, categorical, quantitative)
 
Jan-20 Ecological resemblance
  • Distance coefficients
  • Coefficients of dependence
  • Choice of coefficients
 
Jan-22 Cluster analysis
  • Introduction, diversity of approaches
  • Hierarchical agglomerative clustering (e.g., linkage, UPGMA)
 
Jan-27 Cluster analysis
  • Hierarchical divisive clustering (e.g., TWINSPAN)
  • Non-hierarchical partitioning (e.g., K-means)
Jan-29 Cluster analysis
  • Cluster diagnostics, limitations, recommendations
  • Presenting results from cluster analyses: The dos and don’ts!
Feb-3 Indirect (Unconstrained) Ordination
  • Principal component analysis (PCA)
  • Introduction, purpose, Shepard diagrams
  • Computing eigenvalues, principal components
  • Covariance vs. correlation, meaningful components, misuses
Feb-5 Indirect (Unconstrained) Ordination
  • Principal component analysis (PCA), continued
Feb-10 Indirect (Unconstrained) Ordination
  • Principal coordinate analysis (PCoA)
  • Non-metric multidimensional scaling (NMDS)
Feb-12 Indirect (Unconstrained) Ordination
  • Correspondence analysis (CA)
  • Detrended correspondence analysis (DCA)
  • Presenting results from ordination analyses: The dos and don’ts!
Feb-17 Direct (Constrained) Ordination
  • Redundancy analysis (RDA)
  • Canonical correspondence analysis (CCA)
  • Canonical correlation analysis (CCorA)
Feb-19 Direct (Constrained) Ordination
  • Partial RDA and CCA
  • Hierarchical RDA and CCA
Feb-24 Discriminant Function Analysis
Feb-26 Testing for Differences among Groups
  • Analysis of similarities (ANOSIM)
Mar-3 Testing for Differences among Groups
  • Multi-response Permutation Procedure (MRPP)
Mar-5 Testing for Differences among Groups
  • Permutational MANOVA (perMANOVA)
  • Permutation test of multivariate dispersion
 
Mar-10 Testing for Associations among Matrices
  • Mantel tests
  • Procrustes Analysis
Mar-12 Putting it all together: A general strategy for multivariate analysis