This document provides an introduction to RLQ analysis, as part of a series of tutorials for studying trait-environment relationship. The tutorial targets students and scientists in ecology with previous knowledge of the R software.
Please consult the tutorial by Dray S. et al. 2016 for more details about the RLQ analysis DOI 10.6084/m9.figshare.c.3306393.v1
The example dataset is available for download here ( NEAtl_FishTraitEnv.Rdata) and the script here (RLQ_FishTraitEnv.R)
The RLQ analyses require the R packages ade4 (v ≥ 1.7.16).
library(ade4)
To plot maps with country border, you also need the packages ggplot2 (v ≥ 3.3)
.
library(ggplot2)
If you get an error message, check that the R packages are installed correctly. If not, use the command: install.packages(c("ade4", "ggplot2"))
.
The example dataset is available as the Rdata file NEAtl_FishTraitEnv.Rdata
, available for download here.
Make sure the file NEAtl_FishTraitEnv.Rdata
is in your working directory, then load it in R.
load("NEAtl_FishTraitEnv.Rdata")
The Rdata file contains four objects:
abu
containing the abundance of taxa in grid cellsenv
containing the environmental condition per grid celltrait
containing the trait information per taxacoo
: the coordinates of each grid cellImportantly, the rows in abu
correspond to the same grid cell than the rows in env
, and the column in abu
correspond to the same taxa than the rows in trait
.
all(row.names(abu)==row.names(env))
## [1] TRUE
all(colnames(abu)==row.names(trait))
## [1] TRUE
If you want to learn how to create such dataset, see the short tutorial on setting trait-environement dataset.
Using the fish community of the Northeast Atlantic as an example, we will explore the trait-environment relationship using the RLQ analysis.
dim(trait)
## [1] 148 7
names(trait)
## [1] "Trophic.level" "K" "Lmax"
## [4] "Lifespan" "Age.maturity" "Fecundity_log"
## [7] "Offspring.size_log"
The trait
table contains 7 traits (i.e variable, in column) characterizing 148 taxa (in rows). The 7 traits broadly represent the life history and ecology of fish in terms of their feeding, growth, survival and reproduction. These are:
Trait values for fecundity and offspring size were log-transformed to reduce the influence of outliers.
dim(env)
## [1] 169 7
names(env)
## [1] "Depth" "SBT" "SBS" "Chl" "SBT_sea" "Chl_sea" "Fishing"
The env
table contains 7 environmental variables (in column) characterizing 169 grid cells (in rows). The environmental variables measure hydrography, habitat, food availability and anthropogenic pressures, which are known to affect the distribution of fish species. These are:
The RLQ analysis is performed by four successive steps:
Correspondence analysis (COA) is a multivariate method suited to frequency data, such as count, presence-absence, or abundance data. It looks for ‘correspondence’, i.e. highlight which sites do the species prefer, in other words, which sites correspond to my species. Compare to PCA which maximize the variance explained, COA maximize the inertia explained, i.e. the ‘correspondence’ between the rows and columns of the table. For more explanation about COA, I recommend the short explanation in GustaMe or the book Numerical Ecology by Legendre brothers.
coa.abu <- dudi.coa(abu, scannf = FALSE, nf=2)
The objective is not to look in details at each step, so we don’t interpret the results of COA here.
The second step it to analyze the trait matrix with a multivariate analysis. The choice of multivariate method (in the second and third step) is strongly linked with the type of data. It can be:
dudi.pca()
.dudi.acm()
.dudi.hillsmith()
.In all cases, it is necessary to specify the row weights (corresponding to the row or columns weights of the previous COA) with the parameter row.w
.
In our case all traits are continuous, so we use a PCA with the row weights corresponding to the column weight of the previous COA (row.w = coa.abu$cw
).
pca.trait <- dudi.pca(trait, scannf = FALSE,
row.w = coa.abu$cw)
Similarly here, with our seven continuous environmental variable, we chose to use a PCA with the row weights corresponding to the row weight of the previous COA (row.w = coa.abu$lw
).
pca.env <- dudi.pca(env, scannf = FALSE,
row.w = coa.abu$lw)
To compute a RLQ analysis, we use the rlq()
function with the results of the three previous steps, with R the environment, L the abundance, and Q the traits. RLQ is based on co-inertia analysis, which is a unconstrained symmetrical analysis that look for a compromise between the axes of three unconstrained analyses. RLQ combines the three separate analyses and aims at identifying the main relationships between environmental gradients and trait syndromes mediated by species abundances. For more mathematical definition, see Dray et al. 2003 and Dray et al. 2014
rlqF <- rlq(pca.env, coa.abu, pca.trait,
scannf = FALSE)
summary(rlqF)
## RLQ analysis
##
## Class: rlq dudi
## Call: rlq(dudiR = pca.env, dudiL = coa.abu, dudiQ = pca.trait, scannf = FALSE)
##
## Total inertia: 0.5998
##
## Eigenvalues:
## Ax1 Ax2 Ax3 Ax4 Ax5
## 0.4764397 0.0725046 0.0472571 0.0027566 0.0005925
##
## Projected inertia (%):
## Ax1 Ax2 Ax3 Ax4 Ax5
## 79.43448 12.08833 7.87894 0.45959 0.09878
##
## Cumulative projected inertia (%):
## Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
## 79.43 91.52 99.40 99.86 99.96
##
## (Only 5 dimensions (out of 7) are shown)
##
##
## Eigenvalues decomposition:
## eig covar sdR sdQ corr
## 1 0.47643973 0.6902461 1.622070 1.710762 0.2487396
## 2 0.07250456 0.2692667 1.050307 1.156022 0.2217686
##
## Inertia & coinertia R (pca.env):
## inertia max ratio
## 1 2.631110 3.095306 0.8500321
## 12 3.734255 4.535678 0.8233069
##
## Inertia & coinertia Q (pca.trait):
## inertia max ratio
## 1 2.926706 3.173574 0.9222113
## 12 4.263094 4.824000 0.8837259
##
## Correlation L (coa.abu):
## corr max ratio
## 1 0.2487396 0.8494256 0.2928327
## 2 0.2217686 0.6599228 0.3360523
The projected inertia is the amount of co-inertia explained by the the ordination provided by RLQ analysis assigns scores to species, samples, traits, and environmental variables along orthogonal axes and yields graphical summary of the main structures.
The first axis of co-inertia (RLQ1) explain 79% of covariance. Hence, we will only interpret the scores of the first axis (but similar visualization could be carried out for subsequent axis).
#Plot traits score
t1 <- order(rlqF$c1[,1])
dotchart(rlqF$c1[t1,1], pch=16,
labels = names(trait)[t1])
abline(v=0, lty=2)
RLQ1 show the fast-slow continuum in species traits. With positive score (right side) for traits associated to slow life history (high age at maturity, large offspring, large size and high life expectancy) and negative score for traits associated to fast life history (fast growth).
Due to the high number of species (nrow(trait)
), it is hard to visualize the score of each individual species. But we can see the species with highest and lowest score on RLQ1.
# top 10 species with positive score
rlqF$mQ[order(rlqF$mQ[,1], decreasing = TRUE)[1:10],]
## NorS1 NorS2
## Dipturus batis 3.841011 -1.4317468
## Squalus acanthias 3.263909 -0.0987961
## Galeorhinus galeus 3.091271 0.1618693
## Dipturus oxyrinchus 3.013613 -1.6887759
## Hippoglossus hippoglossus 2.806801 0.9677443
## Bathyraja spinicauda 2.670585 -0.8336734
## Amblyraja hyperborea 2.380230 -1.5471741
## Malacocephalus laevis 2.006567 -0.4382923
## Amblyraja radiata 1.970720 -1.9066521
## Chimaera monstrosa 1.962441 -2.3833948
# top 10 species with negative score
rlqF$mQ[order(rlqF$mQ[,1])[1:10],]
## NorS1 NorS2
## Gasterosteus aculeatus -4.667595 -1.92735675
## Maurolicus muelleri -2.289692 -1.72408520
## Myctophidae -1.731315 -0.92239685
## Sprattus sprattus -1.685007 -1.24161265
## Liparis liparis liparis -1.682222 -0.97866063
## Gobiidae -1.631882 -1.04425917
## Liparis bathyarcticus -1.585132 -0.82377709
## Gaidropsarus -1.528951 -0.05689407
## Engraulis -1.511147 -0.64166809
## Syngnathus acus -1.509618 -1.47410553
Among the slowest species (here with positive score) are many sharks, skates and rays (e.g. the Greenland shark Somniosus microcephalus, the common skate Dipturus batis) but also some large bony fish such as the Atlantic halibut Hippoglossus hippoglossus.
Among the fastest species (here with negative score) are small fish such as three-spined stickleback Gasterosteus aculeatus, sprat Sprattus sprattus and gobbies Gobiidae.
#Plot environment score
e1 <- order(rlqF$l1[,1])
dotchart(rlqF$l1[e1,1], pch=16,
labels = names(env)[e1])
abline(v=0, lty=2)
RLQ1 show a gradient between shallow-warm and deep-cold waters. With positive scores are environment with high depth, but low and constant temperature as well as low primary production. Sites with negative scores have high average temperature and high seasonal fluctuation of temperature, as well as high primary production and low depth.
# Choice of diverging color scale
colpal <- terrain.colors(7)[-7]
# or from RColorBrewer package
# colpal <- rev(RColorBrewer::brewer.pal(6,"RdYlBu"))
mapggplot(coo[,1], coo[,2], rlqF$lR[,1],
colpal, main="RLQ1")
This map summarize the association of trait and environment identified by RLQ analysis. Cells with green color (representing negative scores) are shallower and have higher and more fluctuating water temperature. They host species that are faster, especially with a fast growth coefficient, low age at maturity, small offspring and overall smaller and shorter lived species. On the contrary, cells with pinkish color are deeper and have constantly cold water temperature, which host a fish community with, in average, slower growth, slower maturity, larger offspring and larger and longer lived species.
Buttigieg PL, Ramette A (2014) A Guide to Statistical Analysis in Microbial Ecology: a community-focused, living review of multivariate data analyses. FEMS Microbiol Ecol. 90: 543–550.
Dray, S., Choler, P., Dolédec, S., Peres-Neto, P. R., Thuiller, W., Pavoine, S., & ter Braak, C. J. (2014). Combining the fourth‐corner and the RLQ methods for assessing trait responses to environmental variation. Ecology, 95(1), 14-21. DOI 10.1890/13-0196.1
Legendre P, Legendre L. Numerical Ecology. 2nd ed. Amsterdam: Elsevier, 1998. ISBN 978-0444892508.