This document is a work in progress ! Use only with extreme care.

This document provides an introduction to double constrained correspondence analysis (DC-CA). The tutorial targets students and scientists in ecology with previous knowledge of the R software.

This tutorial is greatly inspired from the tutorial by ter Braak, C. J., and van Rossum, B. J. (2025). Please read the original publication for more details about the double constrained correspondence analysis analysis and the douconca R-package documentation

The example dataset is available for download here ( NEAtl_FishTraitEnv.Rdata).

0. Preliminaries

Load packages and dataset

The DC-CA analyses require the R packages douconca.

library(douconca)

To plot maps with country border, you also need the packages ggplot2 (v ≥ 3.3).

library(ggplot2)

If you get an error message, check that the R packages are installed correctly. If not, use the command: install.packages(c("douconca", "ggplot2")).

The example dataset is available as the Rdata file NorthSea_FishTraitEnv.Rdata, available for download here.

Load the example dataset

Make sure the file NorthSea_FishTraitEnv.Rdata is in your working directory, then load it in R.

load("NEAtl_FishTraitEnv.Rdata")

The Rdata file contains four objects:

  • abu containing the abundance of taxa in grid cells
  • env containing the environmental condition per grid cell
  • trait containing the trait information per taxa
  • coo: the coordinates of each grid cell

Importantly, the rows in abu correspond to the same grid cell than the rows in env, and the column in abu correspond to the same taxa than the rows in trait.

all(row.names(abu) == row.names(env))
## [1] TRUE
all(colnames(abu) == row.names(trait))
## [1] TRUE

If you want to learn how to create such dataset, see the short tutorial on setting trait-environement dataset.

Using the fish community of the Northeast Atlantic as an example, we will explore the trait-environment relationship using the DC-CA analysis.

Quick summary of the variables

dim(trait)
## [1] 148   7
names(trait)
## [1] "Trophic.level"      "K"                  "Lmax"               "Lifespan"           "Age.maturity"       "Fecundity_log"      "Offspring.size_log"

The trait table contains 7 traits (i.e variable, in column) characterizing 148 taxa (in rows). The 7 traits broadly represent the life history and ecology of fish in terms of their feeding, growth, survival and reproduction. These are:

  • Trophic level
  • K: the growth rate, calculated as Von Bertalanffy growth coefficient in year\(^{-1}\)
  • Lmax: maximum body length in cm
  • Lifespan
  • Offspring.size_log: egg diameter, length of egg case or length of pup in mm
  • Fecundity_log: number of offspring produced by a female per year
  • Age.maturity: in years

Trait values for fecundity and offspring size were log-transformed to reduce the influence of outliers.

dim(env)
## [1] 169   7
names(env)
## [1] "Depth"   "SBT"     "SBS"     "Chl"     "SBT_sea" "Chl_sea" "Fishing"

The env table contains 7 environmental variables (in column) characterizing 169 grid cells (in rows). The environmental variables measure hydrography, habitat, food availability and anthropogenic pressures, which are known to affect the distribution of fish species. These are:

  • Depth: depth in meter, directly measured during the survey.
  • SBT: monthly sea bottom temperature in °C from the Global Ocean Physics Reanalysis (GLORYSs2v4)
  • SBS: monthly sea bottom salinity from the Global Ocean Physics Reanalysis (GLORYSs2v4)
  • Chl: Chlorophyll a concentration (in \(mg.m^{-3}\)) as a proxy for primary production and food availability from the GlobColour database
  • SBT_sea: seasonality of sea bottom temperature, calculated as the difference between the warmest and the coldest month of the year.
  • Chl_sea: seasonality of chlorophyll a concentration, calculated as the difference between the highest and the lowest primary production in the year
  • Fishing: the cumulative demersal fishing pressure in 2013, estimated globally by Halpern et al. 2015, DOI 10.1038/ncomms8615.

1. Run DC-CA

In this first step, we use DC-CA on all traits and square root transformed abundance values.

mod_CCA <- dc_CA(
  formulaEnv = ~.,
  formulaTraits = ~.,
  response = sqrt(abu),
  env,
  trait
)
## Step 1: the CCA ordination of the transposed matrix with trait constraints,
##         useful in itself and also yielding CWMs of the orthonormalized traits for step 2.
## 
## Call: cca0(formula = formulaTraits, response = tY, data = out0$data$dataTraits)
## 
##               Inertia Proportion Rank
## Total          3.1274     1.0000     
## Constrained    0.3202     0.1024    7
## Unconstrained  2.8073     0.8976  147
## 
## Inertia is scaled Chi-square
## 
## Eigenvalues for constrained axes:
##    CCA1    CCA2    CCA3    CCA4    CCA5    CCA6    CCA7 
## 0.11388 0.09042 0.04998 0.02617 0.01842 0.01265 0.00863 
## 
## Eigenvalues for unconstrained axes:
## CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 
##  NA  NA  NA  NA  NA  NA  NA  NA 
## (Showing 8 of 147 unconstrained eigenvalues)
## 
## mean, sd, VIF and canonical coefficients with their optimistic [!] t-values:
##                        Avg     SDS    VIF   Regr1   tval1
## Trophic.level       3.7100  0.4113 1.6815 -0.0992 -1.2971
## K                   0.2570  0.1966 1.5301 -0.1052 -1.4421
## Lmax               57.1571 41.6248 2.1722  0.0299  0.3438
## Lifespan           18.5241 12.1773 1.8984 -0.1377 -1.6945
## Age.maturity        4.7287  2.7762 2.0248  0.2553  3.0424
## Fecundity_log      10.5245  3.9287 3.3117 -0.3319 -3.0925
## Offspring.size_log  0.8439  1.1866 3.6046 -0.1997 -1.7834
## 
## Step 2: the RDA ordination of CWMs of the orthonormalized traits 
##         of step 1 with environmental constraints:
## 
## Call: rda(formula = out1$CWMs_orthonormal_traits ~ Depth + SBT + SBS + Chl + SBT_sea + Chl_sea + Fishing, data = out1$data$dataEnv)
## 
##               Inertia Proportion Rank
## Total          0.3202     1.0000     
## Constrained    0.1813     0.5662    7
## Unconstrained  0.1389     0.4338    7
## 
## Inertia is variance
## 
## Eigenvalues for constrained axes:
##    RDA1    RDA2    RDA3    RDA4    RDA5    RDA6    RDA7 
## 0.09736 0.05189 0.02348 0.00551 0.00173 0.00079 0.00050 
## 
## Eigenvalues for unconstrained axes:
##     PC1     PC2     PC3     PC4     PC5     PC6     PC7 
## 0.04666 0.02826 0.02365 0.01505 0.01184 0.00732 0.00613 
## 
## mean, sd, VIF and canonical coefficients with their optimistic [!] t-values:
##              Avg      SDS    VIF   Regr1   tval1
## Depth   245.5576 172.4993 1.4687  0.0055  0.4888
## SBT       6.1456   4.3773 2.7029  0.3205 20.8297
## SBS      35.0341   0.8677 2.5862 -0.0176 -1.1722
## Chl       0.4256   0.3858 2.5851  0.0107  0.7107
## SBT_sea   2.6779   2.5838 3.1039 -0.0270 -1.6351
## Chl_sea   2.6074   1.4216 1.5362  0.0204  1.7609
## Fishing   0.5838   0.2651 1.1001  0.0323  3.2929
##                        Avg     SDS    VIF   Regr1   tval1
## Trophic.level       3.7100  0.4113 1.6815  0.3309  1.2685
## K                   0.2570  0.1966 1.5301  0.2071  0.8323
## Lmax               57.1571 41.6248 2.1722 -0.1306 -0.4405
## Lifespan           18.5241 12.1773 1.8984  0.3440  1.2411
## Age.maturity        4.7287  2.7762 2.0248 -0.9481 -3.3122
## Fecundity_log      10.5245  3.9287 3.3117  0.9000  2.4586
## Offspring.size_log  0.8439  1.1866 3.6046  0.5796  1.5175
## 
##                weighted variance
## total                      3.127
## traits_explain             0.320
## env_explain                1.359
## constraintsTE              0.181
## attr(,"meaning")
##                meaning                                                                
## total          "total inertia (= weighted variation)"                                 
## traits_explain "trait-constrained variation"                                          
## env_explain    "environment-constrained variation"                                    
## constraintsTE  "trait-constrained variation explained by the predictors in formulaEnv"
anova(mod_CCA, by = "axis")
## $species
## Species-level permutation test using dc-CA
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = sqrt(abu), dataEnv = env, dataTraits = trait) 
## Residualized predictor permutation
## 
##           df ChiSquare       R2       F Pr(>F)  
## dcCA1      1   0.09736 0.071668 11.5784  0.093 .
## dcCA2      1   0.05189 0.038193  6.1703  0.493  
## dcCA3      1   0.02348 0.017284  2.7923  0.912  
## dcCA4      1   0.00551 0.004056  0.6553  1.000  
## dcCA5      1   0.00173 0.001271  0.2053  1.000  
## dcCA6      1   0.00079 0.000581  0.0939  1.000  
## dcCA7      1   0.00050 0.000371  0.0599  1.000  
## Residual 140   1.17729                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $sites
##           Df ChiSquare       R2        F Pr(>F)    
## dcCA1      1  0.097365 0.304116 112.8621  0.001 ***
## dcCA2      1  0.051887 0.162068  60.5193  0.001 ***
## dcCA3      1  0.023481 0.073342  27.5564  0.001 ***
## dcCA4      1  0.005510 0.017211   6.5062  0.003 ** 
## dcCA5      1  0.001727 0.005393   2.0511  0.384    
## dcCA6      1  0.000790 0.002467   0.9442  0.765    
## dcCA7      1  0.000504 0.001574   0.6059  0.765    
## Residual 161  0.138893                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $maxP
## Max test combining the community- and species- level tests 
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = sqrt(abu), dataEnv = env, dataTraits = trait) 
## 
## Taken from the species-level test:
## Residualized predictor permutation
## Permutation: free
## Number of permutations: 999
## 
##           df ChiSquare       R2       F Pr(>F)  
## dcCA1      1   0.09736 0.071668 11.5784  0.093 .
## dcCA2      1   0.05189 0.038193  6.1703  0.493  
## dcCA3      1   0.02348 0.017284  2.7923  0.912  
## dcCA4      1   0.00551 0.004056  0.6553  1.000  
## dcCA5      1   0.00173 0.001271  0.2053  1.000  
## dcCA6      1   0.00079 0.000581  0.0939  1.000  
## dcCA7      1   0.00050 0.000371  0.0599  1.000  
## Residual 140   1.17729                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The axis are not significant, so we try to transform the abundance with square root transformation and Hill-N2.

# N2 transformation
abuN2 <- ipf2N2(sqrt(abu), N2N_N2_species = TRUE)
mod_CCA_N2 <- dc_CA(
  formulaEnv = ~.,
  formulaTraits = ~.,
  response = abuN2,
  env,
  trait
)
## Argument divideBySiteTotals set to FALSE, as species totals are proportional to N2(N-N2).
## You can overrule this by specifying divideBySiteTotals explicitly.
## Step 1: the CCA ordination of the transposed matrix with trait constraints,
##         useful in itself and also yielding CWMs of the orthonormalized traits for step 2.
## 
## Call: cca0(formula = formulaTraits, response = tY, data = out0$data$dataTraits)
## 
##               Inertia Proportion Rank
## Total          3.9330     1.0000     
## Constrained    0.3957     0.1006    7
## Unconstrained  3.5374     0.8994  147
## 
## Inertia is scaled Chi-square
## 
## Eigenvalues for constrained axes:
##    CCA1    CCA2    CCA3    CCA4    CCA5    CCA6    CCA7 
## 0.16336 0.10217 0.04527 0.03456 0.02320 0.01767 0.00943 
## 
## Eigenvalues for unconstrained axes:
## CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8 
##  NA  NA  NA  NA  NA  NA  NA  NA 
## (Showing 8 of 147 unconstrained eigenvalues)
## 
## mean, sd, VIF and canonical coefficients with their optimistic [!] t-values:
##                        Avg     SDS    VIF   Regr1   tval1
## Trophic.level       3.6775  0.4162 1.5108 -0.0528 -0.7254
## K                   0.2761  0.2273 1.3809 -0.0994 -1.4267
## Lmax               58.1653 49.2647 2.3277  0.0393  0.4345
## Lifespan           17.0936 12.8535 2.0269 -0.1269 -1.5044
## Age.maturity        4.3760  2.8810 2.0242  0.2160  2.5621
## Fecundity_log       9.6063  3.9939 3.3568 -0.5403 -4.9756
## Offspring.size_log  0.9873  1.4794 4.5020 -0.4033 -3.2074
## 
## Step 2: the RDA ordination of CWMs of the orthonormalized traits 
##         of step 1 with environmental constraints:
## 
## Call: wrda(formula = formulaEnv, response = out1$CWMs_orthonormal_traits, data = out1$data$dataEnv, weights = out1$weights$rows)
## 
##               Inertia Proportion Rank
## Total          0.3957     1.0000     
## Constrained    0.2049     0.5179    7
## Unconstrained  0.1907     0.4821    7
## 
## Inertia is weighted variance
## 
## Eigenvalues for constrained axes:
##   wRDA1   wRDA2   wRDA3   wRDA4   wRDA5   wRDA6   wRDA7 
## 0.13122 0.06142 0.00644 0.00359 0.00119 0.00081 0.00024 
## 
## Eigenvalues for unconstrained axes:
## wPCA1 wPCA2 wPCA3 wPCA4 wPCA5 wPCA6 wPCA7 
##    NA    NA    NA    NA    NA    NA    NA 
## 
## mean, sd, VIF and canonical coefficients with their optimistic [!] t-values:
##              Avg      SDS    VIF   Regr1   tval1
## Depth   218.1632 149.1385 1.5944  0.0219  1.2831
## SBT       6.8961   4.0517 2.6386  0.3690 16.8119
## SBS      35.0051   0.7140 2.6002 -0.0435 -1.9977
## Chl       0.4819   0.3678 2.6502  0.0062  0.2805
## SBT_sea   2.8974   2.5909 3.4150 -0.0076 -0.3045
## Chl_sea   2.6674   1.2925 1.5924  0.0095  0.5546
## Fishing   0.6110   0.2448 1.0565  0.0847  6.0992
## 
##                        Avg     SDS    VIF   Regr1   tval1
## Trophic.level       3.6775  0.4162 1.5108  0.1052  0.5025
## K                   0.2761  0.2273 1.3809  0.1669  0.8337
## Lmax               58.1653 49.2647 2.3277 -0.1091 -0.4199
## Lifespan           17.0936 12.8535 2.0269  0.3253  1.3416
## Age.maturity        4.3760  2.8810 2.0242 -0.7291 -3.0087
## Fecundity_log       9.6063  3.9939 3.3568  1.3057  4.1839
## Offspring.size_log  0.9873  1.4794 4.5020  0.9717  2.6886
## 
##                weighted variance
## total                      3.933
## traits_explain             0.396
## env_explain                1.373
## constraintsTE              0.205
## attr(,"meaning")
##                meaning                                                                
## total          "total inertia (= weighted variation)"                                 
## traits_explain "trait-constrained variation"                                          
## env_explain    "environment-constrained variation"                                    
## constraintsTE  "trait-constrained variation explained by the predictors in formulaEnv"
anova(mod_CCA_N2, by = "axis")
## $species
## Species-level permutation test using dc-CA
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = abuN2, dataEnv = env, dataTraits = trait) 
## Residualized predictor permutation
## 
##           df ChiSquare       R2       F Pr(>F)   
## dcCA1      1   0.13122 0.095560 15.7249  0.003 **
## dcCA2      1   0.06142 0.044727  7.3600  0.031 * 
## dcCA3      1   0.00644 0.004687  0.7713  0.999   
## dcCA4      1   0.00359 0.002615  0.4303  1.000   
## dcCA5      1   0.00119 0.000868  0.1428  1.000   
## dcCA6      1   0.00081 0.000592  0.0974  1.000   
## dcCA7      1   0.00024 0.000172  0.0284  1.000   
## Residual 140   1.16830                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $sites
## sites-level permutation test using dc-CA
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = abuN2, dataEnv = env, dataTraits = trait) 
## Residualized predictor permutation
## 
##           df ChiSquare      R2        F Pr(>F)    
## dcCA1      1  0.131224 0.33166 110.7606  0.001 ***
## dcCA2      1  0.061419 0.15523  51.8413  0.001 ***
## dcCA3      1  0.006437 0.01627   5.4329  0.008 ** 
## dcCA4      1  0.003591 0.00908   3.0312  0.170    
## dcCA5      1  0.001192 0.00301   1.0059  0.920    
## dcCA6      1  0.000813 0.00205   0.6862  0.920    
## dcCA7      1  0.000237 0.00060   0.1998  0.981    
## Residual 161  0.190745                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $maxP
## Max test combining the community- and species- level tests 
## Model: dc_CA(formulaEnv = ~., formulaTraits = ~., response = abuN2, dataEnv = env, dataTraits = trait) 
## 
## Taken from the species-level test:
## Residualized predictor permutation
## Permutation: free
## Number of permutations: 999
## 
##           df ChiSquare       R2       F Pr(>F)   
## dcCA1      1   0.13122 0.095560 15.7249  0.003 **
## dcCA2      1   0.06142 0.044727  7.3600  0.031 * 
## dcCA3      1   0.00644 0.004687  0.7713  0.999   
## dcCA4      1   0.00359 0.002615  0.4303  1.000   
## dcCA5      1   0.00119 0.000868  0.1428  1.000   
## dcCA6      1   0.00081 0.000592  0.0974  1.000   
## dcCA7      1   0.00024 0.000172  0.0284  1.000   
## Residual 140   1.16830                           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(mod_CCA_N2)

As with RLQ analysis, the DCCA provide scores for species, sites, traits and environmental variables. Nothing is significant.

The eigen values are stored as eigenvalues which are also the fourth corner correlation.

# dc-CA eigen values
rbind(
  "raw" = mod_CCA$eigenvalues,
  "N2" = mod_CCA_N2$eigenvalues
)
##          dcCA1      dcCA2       dcCA3       dcCA4       dcCA5        dcCA6        dcCA7
## raw 0.09736478 0.05188702 0.023480929 0.005510177 0.001726588 0.0007899790 0.0005039198
## N2  0.13122404 0.06141918 0.006436646 0.003591181 0.001191694 0.0008129612 0.0002367682
# Explained variance (cum)
round(cumsum(mod_CCA_N2$eigenvalues) * 100, 1)
## dcCA1 dcCA2 dcCA3 dcCA4 dcCA5 dcCA6 dcCA7 
##  13.1  19.3  19.9  20.3  20.4  20.5  20.5
# Explained fitted variation (cum.)
round(cumsum(mod_CCA_N2$eigenvalues) / sum(mod_CCA_N2$eigenvalues) * 100, 1)
## dcCA1 dcCA2 dcCA3 dcCA4 dcCA5 dcCA6 dcCA7 
##  64.0  94.0  97.2  98.9  99.5  99.9 100.0

The first axis explains 64% of covariance. In the next step, we will interpret the scores of the two first axis (but similar visualization could be carried out for subsequent axis).

1.2 Trait scores

#Plot traits score
t1 <- mod_CCA_N2$c_traits_normed[, "Regr1"]
t0 <- mod_CCA$c_traits_normed[, "Regr1"]
dotchart(
  t1[order(t1)],
  pch = 16,
  labels = names(t1)[order(t1)],
  main = "PC1",
  xlim = range(t0, t1)
)

points(t0[order(t1)], 1:length(t0), pch = 16, col = "red")
abline(v = 0, lty = 2)
legend("bottomright", c("N2", "raw"), col = c("black", "red"), pch = 16)

1.3 Species scores

Due to the high number of species (nrow(trait)), it is hard to visualize the score of each individual species. But we can see the species with highest and lowest score on PC1.

# top 10 species with positive score
t1 <- mod_CCA_N2$species_axes$species_scores$lc_traits_scores[, 1]
t1[order(t1, decreasing = TRUE)[1:10]]
##     Scophthalmus rhombus                 Sebastes    Zeugopterus punctatus Phrynorhombus norvegicus    Pollachius pollachius     Scophthalmus maximus Melanogrammus aeglefinus 
##                 2.032850                 1.864277                 1.826683                 1.772409                 1.681289                 1.664316                 1.640513 
##      Lophius piscatorius       Galeorhinus galeus      Gadiculus argenteus 
##                 1.540725                 1.488182                 1.388923
# top 10 species with negative score
t1[order(t1)[1:10]]
## Coelorinchus caelorhincus         Macrourus berglax     Malacocephalus laevis    Artediellus atlanticus         Amblyraja radiata         Anisarchus medius     Lycenchelys kolthoffi 
##                 -2.502391                 -2.077637                 -1.954231                 -1.924759                 -1.919262                 -1.836804                 -1.764259 
##       Lycenchelys muraena  Lumpenus lampretaeformis    Gymnelus retrodorsalis 
##                 -1.725072                 -1.677110                 -1.608773

1.4 Environmental scores

#Plot environment score
e1 <- mod_CCA_N2$c_env_normed[, "Regr1"]
e0 <- mod_CCA$c_env_normed[, "Regr1"]
dotchart(
  e1[order(e1)],
  pch = 16,
  labels = names(e1)[order(e1)],
  main = "PC1",
  xlim = range(e0, e1)
)

points(e0[order(e1)], 1:length(e0), pch = 16, col = "red")
abline(v = 0, lty = 2)
legend("bottomright", c("N2", "raw"), col = c("black", "red"), pch = 16)

1.5 Sites scores

# Choice of diverging color scale
colpal <- terrain.colors(7)[-7]

mapggplot(
  coo[, 1],
  coo[, 2],
  mod_CCA_N2$site_axes$site_scores$lc_env_scores[, 1],
  colpal,
  main = "PC1"
)

1.6 Second axis

In fact PC2 also explain a large proportion of the variance too (30%). So let’s have a quick look at its interpretation.

par(mfrow = c(1, 2))
t2 <- mod_CCA_N2$c_traits_normed[, "Regr2"]
dotchart(t2[order(t2)], pch = 16, labels = names(t2)[order(t2)], main = "PC2")
abline(v = 0, lty = 2)

e2 <- mod_CCA_N2$c_env_normed[, "Regr2"]
dotchart(e2[order(e2)], pch = 16, labels = names(e2)[order(e2)], main = "PC2")
abline(v = 0, lty = 2)

mapggplot(
  coo[, 1],
  coo[, 2],
  mod_CCA_N2$site_axes$site_scores$lc_env_scores[, 2],
  colpal,
  main = "PC2"
)

References

ter Braak, C. J., & van Rossum, B. J. (2025). Linking multivariate trait variation to the environment: the advantages of double constrained correspondence analysis with the R package douconca. Ecological Informatics, 88, 103143.

ter Braak, C.J.F., van Rossum, B.-J., 2024. R Package Douconca: Double Constrained Correspondence Analysis for Multi-trait Multi-environment Analysis v1.2.2.