This document provides an introduction to RLQ analysis, as part of a series of tutorials for studying trait-environment relationship. The tutorial targets students and scientists in ecology with previous knowledge of the R software.

Please consult the tutorial by Dray S. et al. 2016 for more details about the RLQ analysis DOI 10.6084/m9.figshare.c.3306393.v1

The example dataset is available for download here ( NEAtl_FishTraitEnv.Rdata) and the script here (RLQ_FishTraitEnv.R)

1. Preliminaries

1.1. Load packages and dataset

The RLQ analyses require the R packages ade4 (v ≥ 1.7.16).

library(ade4)

To plot maps with country border, you also need the packages ggplot2 (v ≥ 3.3).

library(ggplot2)

If you get an error message, check that the R packages are installed correctly. If not, use the command: install.packages(c("ade4", "ggplot2")).

The example dataset is available as the Rdata file NEAtl_FishTraitEnv.Rdata, available for download here.

1.2 Load the example dataset

Make sure the file NEAtl_FishTraitEnv.Rdata is in your working directory, then load it in R.

load("NEAtl_FishTraitEnv.Rdata")

The Rdata file contains four objects:

  • abu containing the abundance of taxa in grid cells
  • env containing the environmental condition per grid cell
  • trait containing the trait information per taxa
  • coo: the coordinates of each grid cell

Importantly, the rows in abu correspond to the same grid cell than the rows in env, and the column in abu correspond to the same taxa than the rows in trait.

all(row.names(abu)==row.names(env))
## [1] TRUE
all(colnames(abu)==row.names(trait))
## [1] TRUE

If you want to learn how to create such dataset, see the short tutorial on setting trait-environement dataset.

Using the fish community of the Northeast Atlantic as an example, we will explore the trait-environment relationship using the RLQ analysis.

1.3 Quick summary of the variables

dim(trait)
## [1] 148   7
names(trait)
## [1] "Trophic.level"      "K"                  "Lmax"              
## [4] "Lifespan"           "Age.maturity"       "Fecundity_log"     
## [7] "Offspring.size_log"

The trait table contains 7 traits (i.e variable, in column) characterizing 148 taxa (in rows). The 7 traits broadly represent the life history and ecology of fish in terms of their feeding, growth, survival and reproduction. These are:

  • Trophic level
  • K: the growth rate, calculated as Von Bertalanffy growth coefficient in year\(^{-1}\)
  • Lmax: maximum body length in cm
  • Lifespan
  • Offspring.size_log: egg diameter, length of egg case or length of pup in mm
  • Fecundity_log: number of offspring produced by a female per year
  • Age.maturity: in years

Trait values for fecundity and offspring size were log-transformed to reduce the influence of outliers.

dim(env)
## [1] 169   7
names(env)
## [1] "Depth"   "SBT"     "SBS"     "Chl"     "SBT_sea" "Chl_sea" "Fishing"

The env table contains 7 environmental variables (in column) characterizing 169 grid cells (in rows). The environmental variables measure hydrography, habitat, food availability and anthropogenic pressures, which are known to affect the distribution of fish species. These are:

  • Depth: depth in meter, directly measured during the survey.
  • SBT: monthly sea bottom temperature in °C from the Global Ocean Physics Reanalysis (GLORYSs2v4)
  • SBS: monthly sea bottom salinity from the Global Ocean Physics Reanalysis (GLORYSs2v4)
  • Chl: Chlorophyll a concentration (in \(mg.m^{-3}\)) as a proxy for primary production and food availability from the GlobColour database
  • SBT_sea: seasonality of sea bottom temperature, calculated as the difference between the warmest and the coldest month of the year.
  • Chl_sea: seasonality of chlorophyll a concentration, calculated as the difference between the highest and the lowest primary production in the year
  • Fishing: the cumulative demersal fishing pressure in 2013, estimated globally by Halpern et al. 2015, DOI 10.1038/ncomms8615.

2. Single unconstrained multivariate analysis

The RLQ analysis is performed by four successive steps:

  1. a correspondence analysis (COA) on abundance matrix
  2. a multivariate analysis on trait matrix using column weights from step 1
  3. a multivariate analysis on environmental matrix using row weights from step 1
  4. the RLQ analysis comparing the co-variance of the three previous steps with co-intertia analysis

2.1 COA on abundance matrix L (sites x species)

Correspondence analysis (COA) is a multivariate method suited to frequency data, such as count, presence-absence, or abundance data. It looks for ‘correspondence’, i.e. highlight which sites do the species prefer, in other words, which sites correspond to my species. Compare to PCA which maximize the variance explained, COA maximize the inertia explained, i.e. the ‘correspondence’ between the rows and columns of the table. For more explanation about COA, I recommend the short explanation in GustaMe or the book Numerical Ecology by Legendre brothers.

coa.abu <- dudi.coa(abu, scannf = FALSE, nf=2)

The objective is not to look in details at each step, so we don’t interpret the results of COA here.

2.2 PCA on traits matrix Q (species x traits)

The second step it to analyze the trait matrix with a multivariate analysis. The choice of multivariate method (in the second and third step) is strongly linked with the type of data. It can be:

  • Principal Component Analysis (PCA) if all variables are continuous. PCA is calculated with dudi.pca().
  • Multiple Correspondence Analysis (MCA) if all variables are categorical. MCA is calculated with dudi.acm().
  • Hill and smith analysis if there is a mix of continuous and categorical variables. Hill and smith analysis is calculated with dudi.hillsmith().

In all cases, it is necessary to specify the row weights (corresponding to the row or columns weights of the previous COA) with the parameter row.w.

In our case all traits are continuous, so we use a PCA with the row weights corresponding to the column weight of the previous COA (row.w = coa.abu$cw).

pca.trait <- dudi.pca(trait, scannf = FALSE, 
                      row.w = coa.abu$cw)

2.3 PCA on environment matrix R (sites x environment)

Similarly here, with our seven continuous environmental variable, we chose to use a PCA with the row weights corresponding to the row weight of the previous COA (row.w = coa.abu$lw).

pca.env <- dudi.pca(env, scannf = FALSE, 
                     row.w = coa.abu$lw)

3. Run RLQ analysis and interpret the results

3.1 Compute RLQ analysis

To compute a RLQ analysis, we use the rlq() function with the results of the three previous steps, with R the environment, L the abundance, and Q the traits. RLQ is based on co-inertia analysis, which is a unconstrained symmetrical analysis that look for a compromise between the axes of three unconstrained analyses. RLQ combines the three separate analyses and aims at identifying the main relationships between environmental gradients and trait syndromes mediated by species abundances. For more mathematical definition, see Dray et al. 2003 and Dray et al. 2014

rlqF <- rlq(pca.env, coa.abu, pca.trait, 
            scannf = FALSE)

summary(rlqF)
## RLQ analysis
## 
## Class: rlq dudi
## Call: rlq(dudiR = pca.env, dudiL = coa.abu, dudiQ = pca.trait, scannf = FALSE)
## 
## Total inertia: 0.5998
## 
## Eigenvalues:
##       Ax1       Ax2       Ax3       Ax4       Ax5 
## 0.4764397 0.0725046 0.0472571 0.0027566 0.0005925 
## 
## Projected inertia (%):
##      Ax1      Ax2      Ax3      Ax4      Ax5 
## 79.43448 12.08833  7.87894  0.45959  0.09878 
## 
## Cumulative projected inertia (%):
##     Ax1   Ax1:2   Ax1:3   Ax1:4   Ax1:5 
##   79.43   91.52   99.40   99.86   99.96 
## 
## (Only 5 dimensions (out of 7) are shown)
## 
## 
## Eigenvalues decomposition:
##          eig     covar      sdR      sdQ      corr
## 1 0.47643973 0.6902461 1.622070 1.710762 0.2487396
## 2 0.07250456 0.2692667 1.050307 1.156022 0.2217686
## 
## Inertia & coinertia R (pca.env):
##     inertia      max     ratio
## 1  2.631110 3.095306 0.8500321
## 12 3.734255 4.535678 0.8233069
## 
## Inertia & coinertia Q (pca.trait):
##     inertia      max     ratio
## 1  2.926706 3.173574 0.9222113
## 12 4.263094 4.824000 0.8837259
## 
## Correlation L (coa.abu):
##        corr       max     ratio
## 1 0.2487396 0.8494256 0.2928327
## 2 0.2217686 0.6599228 0.3360523

The projected inertia is the amount of co-inertia explained by the the ordination provided by RLQ analysis assigns scores to species, samples, traits, and environmental variables along orthogonal axes and yields graphical summary of the main structures.

The first axis of co-inertia (RLQ1) explain 79% of covariance. Hence, we will only interpret the scores of the first axis (but similar visualization could be carried out for subsequent axis).

3.2 Trait scores

#Plot traits score
t1 <- order(rlqF$c1[,1])
dotchart(rlqF$c1[t1,1], pch=16, 
         labels = names(trait)[t1])
abline(v=0, lty=2)

RLQ1 show the fast-slow continuum in species traits. With positive score (right side) for traits associated to slow life history (high age at maturity, large offspring, large size and high life expectancy) and negative score for traits associated to fast life history (fast growth).

3.3 Species scores

Due to the high number of species (nrow(trait)), it is hard to visualize the score of each individual species. But we can see the species with highest and lowest score on RLQ1.

# top 10 species with positive score
rlqF$mQ[order(rlqF$mQ[,1], decreasing = TRUE)[1:10],]
##                              NorS1      NorS2
## Dipturus batis            3.841011 -1.4317468
## Squalus acanthias         3.263909 -0.0987961
## Galeorhinus galeus        3.091271  0.1618693
## Dipturus oxyrinchus       3.013613 -1.6887759
## Hippoglossus hippoglossus 2.806801  0.9677443
## Bathyraja spinicauda      2.670585 -0.8336734
## Amblyraja hyperborea      2.380230 -1.5471741
## Malacocephalus laevis     2.006567 -0.4382923
## Amblyraja radiata         1.970720 -1.9066521
## Chimaera monstrosa        1.962441 -2.3833948
# top 10 species with negative score
rlqF$mQ[order(rlqF$mQ[,1])[1:10],]
##                             NorS1       NorS2
## Gasterosteus aculeatus  -4.667595 -1.92735675
## Maurolicus muelleri     -2.289692 -1.72408520
## Myctophidae             -1.731315 -0.92239685
## Sprattus sprattus       -1.685007 -1.24161265
## Liparis liparis liparis -1.682222 -0.97866063
## Gobiidae                -1.631882 -1.04425917
## Liparis bathyarcticus   -1.585132 -0.82377709
## Gaidropsarus            -1.528951 -0.05689407
## Engraulis               -1.511147 -0.64166809
## Syngnathus acus         -1.509618 -1.47410553

Among the slowest species (here with positive score) are many sharks, skates and rays (e.g. the Greenland shark Somniosus microcephalus, the common skate Dipturus batis) but also some large bony fish such as the Atlantic halibut Hippoglossus hippoglossus.
Among the fastest species (here with negative score) are small fish such as three-spined stickleback Gasterosteus aculeatus, sprat Sprattus sprattus and gobbies Gobiidae.

3.4 Environmental scores

#Plot environment score
e1 <- order(rlqF$l1[,1])
dotchart(rlqF$l1[e1,1], pch=16,
         labels = names(env)[e1])
abline(v=0, lty=2)

RLQ1 show a gradient between shallow-warm and deep-cold waters. With positive scores are environment with high depth, but low and constant temperature as well as low primary production. Sites with negative scores have high average temperature and high seasonal fluctuation of temperature, as well as high primary production and low depth.

3.5 Sites scores

# Choice of diverging color scale
colpal <- terrain.colors(7)[-7]
# or from RColorBrewer package
# colpal <- rev(RColorBrewer::brewer.pal(6,"RdYlBu")) 

mapggplot(coo[,1], coo[,2], rlqF$lR[,1], 
          colpal, main="RLQ1")

This map summarize the association of trait and environment identified by RLQ analysis. Cells with green color (representing negative scores) are shallower and have higher and more fluctuating water temperature. They host species that are faster, especially with a fast growth coefficient, low age at maturity, small offspring and overall smaller and shorter lived species. On the contrary, cells with pinkish color are deeper and have constantly cold water temperature, which host a fish community with, in average, slower growth, slower maturity, larger offspring and larger and longer lived species.

References

Buttigieg PL, Ramette A (2014) A Guide to Statistical Analysis in Microbial Ecology: a community-focused, living review of multivariate data analyses. FEMS Microbiol Ecol. 90: 543–550.

Dray, S., Choler, P., Dolédec, S., Peres-Neto, P. R., Thuiller, W., Pavoine, S., & ter Braak, C. J. (2014). Combining the fourth‐corner and the RLQ methods for assessing trait responses to environmental variation. Ecology, 95(1), 14-21. DOI 10.1890/13-0196.1

Legendre P, Legendre L. Numerical Ecology. 2nd ed. Amsterdam: Elsevier, 1998. ISBN 978-0444892508.