This document is a work in progress ! Use only with extreme care.
This document provides an introduction to double constrained correspondence analysis (DC-CA). The tutorial targets students and scientists in ecology with previous knowledge of the R software.
This tutorial is greatly inspired from the tutorial by ter Braak, C. J., and van Rossum, B. J. (2025). Please read the original publication for more details about the double constrained correspondence analysis analysis and the douconca R-package documentation
The example dataset is available for download here ( NEAtl_FishTraitEnv.Rdata).
The DC-CA analyses require the R packages douconca.
library(douconca)
To plot maps with country border, you also need the packages
ggplot2 (v ≥ 3.3).
library(ggplot2)
If you get an error message, check that the R packages are installed
correctly. If not, use the command:
install.packages(c("douconca", "ggplot2")).
The example dataset is available as the Rdata file
NorthSea_FishTraitEnv.Rdata, available for download here.
Make sure the file NorthSea_FishTraitEnv.Rdata is in
your working directory, then load it in R.
load("NEAtl_FishTraitEnv.Rdata")
The Rdata file contains four objects:
abu containing the abundance of taxa in grid cellsenv containing the environmental condition per grid
celltrait containing the trait information per taxacoo: the coordinates of each grid cellImportantly, the rows in abu correspond to the same grid
cell than the rows in env, and the column in
abu correspond to the same taxa than the rows in
trait.
all(row.names(abu) == row.names(env))
## [1] TRUE
all(colnames(abu) == row.names(trait))
## [1] TRUE
If you want to learn how to create such dataset, see the short tutorial on setting trait-environement dataset.
Using the fish community of the Northeast Atlantic as an example, we will explore the trait-environment relationship using the DC-CA analysis.
dim(trait)
## [1] 148 7
names(trait)
## [1] "Trophic.level" "K" "Lmax" "Lifespan" "Age.maturity" "Fecundity_log" "Offspring.size_log"
The trait table contains 7 traits (i.e variable, in
column) characterizing 148 taxa (in rows). The 7 traits broadly
represent the life history and ecology of fish in terms of their
feeding, growth, survival and reproduction. These are:
Trait values for fecundity and offspring size were log-transformed to reduce the influence of outliers.
dim(env)
## [1] 169 7
names(env)
## [1] "Depth" "SBT" "SBS" "Chl" "SBT_sea" "Chl_sea" "Fishing"
The env table contains 7 environmental variables (in
column) characterizing 169 grid cells (in rows). The environmental
variables measure hydrography, habitat, food availability and
anthropogenic pressures, which are known to affect the distribution of
fish species. These are:
In this first step, we use DC-CA on all traits and square root transformed abundance values.
mod_CCA <- dc_CA(
formulaEnv = ~.,
formulaTraits = ~.,
response = sqrt(abu),
env,
trait
)
## Step 1: the CCA ordination of the transposed matrix with trait constraints,
## useful in itself and also yielding CWMs of the orthonormalized traits for step 2.
##
## Call: cca0(formula = formulaTraits, response = tY, data = out0$data$dataTraits)
##
## Inertia Proportion Rank
## Total 3.1274 1.0000
## Constrained 0.3202 0.1024 7
## Unconstrained 2.8073 0.8976 147
##
## Inertia is scaled Chi-square
##
## Eigenvalues for constrained axes:
## CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7
## 0.11388 0.09042 0.04998 0.02617 0.01842 0.01265 0.00863
##
## Eigenvalues for unconstrained axes:
## CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8
## NA NA NA NA NA NA NA NA
## (Showing 8 of 147 unconstrained eigenvalues)
##
## mean, sd, VIF and canonical coefficients with their optimistic [!] t-values:
## Avg SDS VIF Regr1 tval1
## Trophic.level 3.7100 0.4113 1.6815 -0.0992 -1.2971
## K 0.2570 0.1966 1.5301 -0.1052 -1.4421
## Lmax 57.1571 41.6248 2.1722 0.0299 0.3438
## Lifespan 18.5241 12.1773 1.8984 -0.1377 -1.6945
## Age.maturity 4.7287 2.7762 2.0248 0.2553 3.0424
## Fecundity_log 10.5245 3.9287 3.3117 -0.3319 -3.0925
## Offspring.size_log 0.8439 1.1866 3.6046 -0.1997 -1.7834
##
## Step 2: the RDA ordination of CWMs of the orthonormalized traits
## of step 1 with environmental constraints:
##
## Call: rda(formula = out1$CWMs_orthonormal_traits ~ Depth + SBT + SBS + Chl + SBT_sea + Chl_sea + Fishing, data = out1$data$dataEnv)
##
## Inertia Proportion Rank
## Total 0.3202 1.0000
## Constrained 0.1813 0.5662 7
## Unconstrained 0.1389 0.4338 7
##
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7
## 0.09736 0.05189 0.02348 0.00551 0.00173 0.00079 0.00050
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## 0.04666 0.02826 0.02365 0.01505 0.01184 0.00732 0.00613
##
## mean, sd, VIF and canonical coefficients with their optimistic [!] t-values:
## Avg SDS VIF Regr1 tval1
## Depth 245.5576 172.4993 1.4687 0.0055 0.4888
## SBT 6.1456 4.3773 2.7029 0.3205 20.8297
## SBS 35.0341 0.8677 2.5862 -0.0176 -1.1722
## Chl 0.4256 0.3858 2.5851 0.0107 0.7107
## SBT_sea 2.6779 2.5838 3.1039 -0.0270 -1.6351
## Chl_sea 2.6074 1.4216 1.5362 0.0204 1.7609
## Fishing 0.5838 0.2651 1.1001 0.0323 3.2929
## Avg SDS VIF Regr1 tval1
## Trophic.level 3.7100 0.4113 1.6815 0.3309 1.2685
## K 0.2570 0.1966 1.5301 0.2071 0.8323
## Lmax 57.1571 41.6248 2.1722 -0.1306 -0.4405
## Lifespan 18.5241 12.1773 1.8984 0.3440 1.2411
## Age.maturity 4.7287 2.7762 2.0248 -0.9481 -3.3122
## Fecundity_log 10.5245 3.9287 3.3117 0.9000 2.4586
## Offspring.size_log 0.8439 1.1866 3.6046 0.5796 1.5175
##
## weighted variance
## total 3.127
## traits_explain 0.320
## env_explain 1.359
## constraintsTE 0.181
## attr(,"meaning")
## meaning
## total "total inertia (= weighted variation)"
## traits_explain "trait-constrained variation"
## env_explain "environment-constrained variation"
## constraintsTE "trait-constrained variation explained by the predictors in formulaEnv"
anova(mod_CCA, by = "axis")
## $species
## Species-level permutation test using dc-CA
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = sqrt(abu), dataEnv = env, dataTraits = trait)
## Residualized predictor permutation
##
## df ChiSquare R2 F Pr(>F)
## dcCA1 1 0.09736 0.071668 11.5784 0.093 .
## dcCA2 1 0.05189 0.038193 6.1703 0.493
## dcCA3 1 0.02348 0.017284 2.7923 0.912
## dcCA4 1 0.00551 0.004056 0.6553 1.000
## dcCA5 1 0.00173 0.001271 0.2053 1.000
## dcCA6 1 0.00079 0.000581 0.0939 1.000
## dcCA7 1 0.00050 0.000371 0.0599 1.000
## Residual 140 1.17729
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $sites
## Df ChiSquare R2 F Pr(>F)
## dcCA1 1 0.097365 0.304116 112.8621 0.001 ***
## dcCA2 1 0.051887 0.162068 60.5193 0.001 ***
## dcCA3 1 0.023481 0.073342 27.5564 0.001 ***
## dcCA4 1 0.005510 0.017211 6.5062 0.003 **
## dcCA5 1 0.001727 0.005393 2.0511 0.384
## dcCA6 1 0.000790 0.002467 0.9442 0.765
## dcCA7 1 0.000504 0.001574 0.6059 0.765
## Residual 161 0.138893
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $maxP
## Max test combining the community- and species- level tests
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = sqrt(abu), dataEnv = env, dataTraits = trait)
##
## Taken from the species-level test:
## Residualized predictor permutation
## Permutation: free
## Number of permutations: 999
##
## df ChiSquare R2 F Pr(>F)
## dcCA1 1 0.09736 0.071668 11.5784 0.093 .
## dcCA2 1 0.05189 0.038193 6.1703 0.493
## dcCA3 1 0.02348 0.017284 2.7923 0.912
## dcCA4 1 0.00551 0.004056 0.6553 1.000
## dcCA5 1 0.00173 0.001271 0.2053 1.000
## dcCA6 1 0.00079 0.000581 0.0939 1.000
## dcCA7 1 0.00050 0.000371 0.0599 1.000
## Residual 140 1.17729
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The axis are not significant, so we try to transform the abundance with square root transformation and Hill-N2.
# N2 transformation
abuN2 <- ipf2N2(sqrt(abu), N2N_N2_species = TRUE)
mod_CCA_N2 <- dc_CA(
formulaEnv = ~.,
formulaTraits = ~.,
response = abuN2,
env,
trait
)
## Argument divideBySiteTotals set to FALSE, as species totals are proportional to N2(N-N2).
## You can overrule this by specifying divideBySiteTotals explicitly.
## Step 1: the CCA ordination of the transposed matrix with trait constraints,
## useful in itself and also yielding CWMs of the orthonormalized traits for step 2.
##
## Call: cca0(formula = formulaTraits, response = tY, data = out0$data$dataTraits)
##
## Inertia Proportion Rank
## Total 3.9330 1.0000
## Constrained 0.3957 0.1006 7
## Unconstrained 3.5374 0.8994 147
##
## Inertia is scaled Chi-square
##
## Eigenvalues for constrained axes:
## CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7
## 0.16336 0.10217 0.04527 0.03456 0.02320 0.01767 0.00943
##
## Eigenvalues for unconstrained axes:
## CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8
## NA NA NA NA NA NA NA NA
## (Showing 8 of 147 unconstrained eigenvalues)
##
## mean, sd, VIF and canonical coefficients with their optimistic [!] t-values:
## Avg SDS VIF Regr1 tval1
## Trophic.level 3.6775 0.4162 1.5108 -0.0528 -0.7254
## K 0.2761 0.2273 1.3809 -0.0994 -1.4267
## Lmax 58.1653 49.2647 2.3277 0.0393 0.4345
## Lifespan 17.0936 12.8535 2.0269 -0.1269 -1.5044
## Age.maturity 4.3760 2.8810 2.0242 0.2160 2.5621
## Fecundity_log 9.6063 3.9939 3.3568 -0.5403 -4.9756
## Offspring.size_log 0.9873 1.4794 4.5020 -0.4033 -3.2074
##
## Step 2: the RDA ordination of CWMs of the orthonormalized traits
## of step 1 with environmental constraints:
##
## Call: wrda(formula = formulaEnv, response = out1$CWMs_orthonormal_traits, data = out1$data$dataEnv, weights = out1$weights$rows)
##
## Inertia Proportion Rank
## Total 0.3957 1.0000
## Constrained 0.2049 0.5179 7
## Unconstrained 0.1907 0.4821 7
##
## Inertia is weighted variance
##
## Eigenvalues for constrained axes:
## wRDA1 wRDA2 wRDA3 wRDA4 wRDA5 wRDA6 wRDA7
## 0.13122 0.06142 0.00644 0.00359 0.00119 0.00081 0.00024
##
## Eigenvalues for unconstrained axes:
## wPCA1 wPCA2 wPCA3 wPCA4 wPCA5 wPCA6 wPCA7
## NA NA NA NA NA NA NA
##
## mean, sd, VIF and canonical coefficients with their optimistic [!] t-values:
## Avg SDS VIF Regr1 tval1
## Depth 218.1632 149.1385 1.5944 0.0219 1.2831
## SBT 6.8961 4.0517 2.6386 0.3690 16.8119
## SBS 35.0051 0.7140 2.6002 -0.0435 -1.9977
## Chl 0.4819 0.3678 2.6502 0.0062 0.2805
## SBT_sea 2.8974 2.5909 3.4150 -0.0076 -0.3045
## Chl_sea 2.6674 1.2925 1.5924 0.0095 0.5546
## Fishing 0.6110 0.2448 1.0565 0.0847 6.0992
##
## Avg SDS VIF Regr1 tval1
## Trophic.level 3.6775 0.4162 1.5108 0.1052 0.5025
## K 0.2761 0.2273 1.3809 0.1669 0.8337
## Lmax 58.1653 49.2647 2.3277 -0.1091 -0.4199
## Lifespan 17.0936 12.8535 2.0269 0.3253 1.3416
## Age.maturity 4.3760 2.8810 2.0242 -0.7291 -3.0087
## Fecundity_log 9.6063 3.9939 3.3568 1.3057 4.1839
## Offspring.size_log 0.9873 1.4794 4.5020 0.9717 2.6886
##
## weighted variance
## total 3.933
## traits_explain 0.396
## env_explain 1.373
## constraintsTE 0.205
## attr(,"meaning")
## meaning
## total "total inertia (= weighted variation)"
## traits_explain "trait-constrained variation"
## env_explain "environment-constrained variation"
## constraintsTE "trait-constrained variation explained by the predictors in formulaEnv"
anova(mod_CCA_N2, by = "axis")
## $species
## Species-level permutation test using dc-CA
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = abuN2, dataEnv = env, dataTraits = trait)
## Residualized predictor permutation
##
## df ChiSquare R2 F Pr(>F)
## dcCA1 1 0.13122 0.095560 15.7249 0.003 **
## dcCA2 1 0.06142 0.044727 7.3600 0.031 *
## dcCA3 1 0.00644 0.004687 0.7713 0.999
## dcCA4 1 0.00359 0.002615 0.4303 1.000
## dcCA5 1 0.00119 0.000868 0.1428 1.000
## dcCA6 1 0.00081 0.000592 0.0974 1.000
## dcCA7 1 0.00024 0.000172 0.0284 1.000
## Residual 140 1.16830
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $sites
## sites-level permutation test using dc-CA
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = abuN2, dataEnv = env, dataTraits = trait)
## Residualized predictor permutation
##
## df ChiSquare R2 F Pr(>F)
## dcCA1 1 0.131224 0.33166 110.7606 0.001 ***
## dcCA2 1 0.061419 0.15523 51.8413 0.001 ***
## dcCA3 1 0.006437 0.01627 5.4329 0.008 **
## dcCA4 1 0.003591 0.00908 3.0312 0.170
## dcCA5 1 0.001192 0.00301 1.0059 0.920
## dcCA6 1 0.000813 0.00205 0.6862 0.920
## dcCA7 1 0.000237 0.00060 0.1998 0.981
## Residual 161 0.190745
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $maxP
## Max test combining the community- and species- level tests
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = abuN2, dataEnv = env, dataTraits = trait)
##
## Taken from the species-level test:
## Residualized predictor permutation
## Permutation: free
## Number of permutations: 999
##
## df ChiSquare R2 F Pr(>F)
## dcCA1 1 0.13122 0.095560 15.7249 0.003 **
## dcCA2 1 0.06142 0.044727 7.3600 0.031 *
## dcCA3 1 0.00644 0.004687 0.7713 0.999
## dcCA4 1 0.00359 0.002615 0.4303 1.000
## dcCA5 1 0.00119 0.000868 0.1428 1.000
## dcCA6 1 0.00081 0.000592 0.0974 1.000
## dcCA7 1 0.00024 0.000172 0.0284 1.000
## Residual 140 1.16830
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(mod_CCA_N2)
As with RLQ analysis, the DCCA provide scores for species, sites, traits and environmental variables. Nothing is significant.
The eigen values are stored as eigenvalues which are
also the fourth corner correlation.
# dc-CA eigen values
rbind(
"raw" = mod_CCA$eigenvalues,
"N2" = mod_CCA_N2$eigenvalues
)
## dcCA1 dcCA2 dcCA3 dcCA4 dcCA5 dcCA6 dcCA7
## raw 0.09736478 0.05188702 0.023480929 0.005510177 0.001726588 0.0007899790 0.0005039198
## N2 0.13122404 0.06141918 0.006436646 0.003591181 0.001191694 0.0008129612 0.0002367682
# Explained variance (cum)
round(cumsum(mod_CCA_N2$eigenvalues) * 100, 1)
## dcCA1 dcCA2 dcCA3 dcCA4 dcCA5 dcCA6 dcCA7
## 13.1 19.3 19.9 20.3 20.4 20.5 20.5
# Explained fitted variation (cum.)
round(cumsum(mod_CCA_N2$eigenvalues) / sum(mod_CCA_N2$eigenvalues) * 100, 1)
## dcCA1 dcCA2 dcCA3 dcCA4 dcCA5 dcCA6 dcCA7
## 64.0 94.0 97.2 98.9 99.5 99.9 100.0
The first axis explains 64% of covariance. In the next step, we will interpret the scores of the two first axis (but similar visualization could be carried out for subsequent axis).
#Plot traits score
t1 <- mod_CCA_N2$c_traits_normed[, "Regr1"]
t0 <- mod_CCA$c_traits_normed[, "Regr1"]
dotchart(
t1[order(t1)],
pch = 16,
labels = names(t1)[order(t1)],
main = "PC1",
xlim = range(t0, t1)
)
points(t0[order(t1)], 1:length(t0), pch = 16, col = "red")
abline(v = 0, lty = 2)
legend("bottomright", c("N2", "raw"), col = c("black", "red"), pch = 16)
Due to the high number of species (nrow(trait)), it is
hard to visualize the score of each individual species. But we can see
the species with highest and lowest score on PC1.
# top 10 species with positive score
t1 <- mod_CCA_N2$species_axes$species_scores$lc_traits_scores[, 1]
t1[order(t1, decreasing = TRUE)[1:10]]
## Scophthalmus rhombus Sebastes Zeugopterus punctatus Phrynorhombus norvegicus Pollachius pollachius Scophthalmus maximus Melanogrammus aeglefinus
## 2.032850 1.864277 1.826683 1.772409 1.681289 1.664316 1.640513
## Lophius piscatorius Galeorhinus galeus Gadiculus argenteus
## 1.540725 1.488182 1.388923
# top 10 species with negative score
t1[order(t1)[1:10]]
## Coelorinchus caelorhincus Macrourus berglax Malacocephalus laevis Artediellus atlanticus Amblyraja radiata Anisarchus medius Lycenchelys kolthoffi
## -2.502391 -2.077637 -1.954231 -1.924759 -1.919262 -1.836804 -1.764259
## Lycenchelys muraena Lumpenus lampretaeformis Gymnelus retrodorsalis
## -1.725072 -1.677110 -1.608773
#Plot environment score
e1 <- mod_CCA_N2$c_env_normed[, "Regr1"]
e0 <- mod_CCA$c_env_normed[, "Regr1"]
dotchart(
e1[order(e1)],
pch = 16,
labels = names(e1)[order(e1)],
main = "PC1",
xlim = range(e0, e1)
)
points(e0[order(e1)], 1:length(e0), pch = 16, col = "red")
abline(v = 0, lty = 2)
legend("bottomright", c("N2", "raw"), col = c("black", "red"), pch = 16)
# Choice of diverging color scale
colpal <- terrain.colors(7)[-7]
mapggplot(
coo[, 1],
coo[, 2],
mod_CCA_N2$site_axes$site_scores$lc_env_scores[, 1],
colpal,
main = "PC1"
)
In fact PC2 also explain a large proportion of the variance too (30%). So let’s have a quick look at its interpretation.
par(mfrow = c(1, 2))
t2 <- mod_CCA_N2$c_traits_normed[, "Regr2"]
dotchart(t2[order(t2)], pch = 16, labels = names(t2)[order(t2)], main = "PC2")
abline(v = 0, lty = 2)
e2 <- mod_CCA_N2$c_env_normed[, "Regr2"]
dotchart(e2[order(e2)], pch = 16, labels = names(e2)[order(e2)], main = "PC2")
abline(v = 0, lty = 2)
mapggplot(
coo[, 1],
coo[, 2],
mod_CCA_N2$site_axes$site_scores$lc_env_scores[, 2],
colpal,
main = "PC2"
)
ter Braak, C. J., & van Rossum, B. J. (2025). Linking multivariate trait variation to the environment: the advantages of double constrained correspondence analysis with the R package douconca. Ecological Informatics, 88, 103143.
ter Braak, C.J.F., van Rossum, B.-J., 2024. R Package Douconca: Double Constrained Correspondence Analysis for Multi-trait Multi-environment Analysis v1.2.2.