This document provides an example data and R code for estimating Community Weighted Mean (CWM) traits and diversity indices, as well as plotting patterns and identifying key drivers and state-pressure relationships (using GAMs and random forest). It is part of a series of tutorials for studying trait-environment relationship.

1. Preliminaries

1.1. Load packages

The analyses require the R packages FD, mgcv, ggplot2, RcolorBrewer, randomForest and MuMIn.

# Clear memory
rm(list=ls())
# Load packages
library(FD) ## Calculate functional diversity indices
library(mgcv)
library(ggplot2)
library(RColorBrewer)
library(randomForest)
library(MuMIn)

If you get an error message, check that the R packages are installed correctly. If not, use the command: install.packages(c("FD", "mgcv", "ggplot2", "RColorBrewer", "randomForest", "MuMIn")).

The example dataset is available as the Rdata file NEAtl_FishTraitEnv.Rdata, available for download here.

1.2 Load and inspect data

Make sure the file NEAtl_FishTraitEnv.Rdata is in your working directory, then load it in R.

load("NEAtl_FishTraitEnv.Rdata")

The Rdata file contains four objects:

  • abu Species abundance by grid cell (the community matrix)
  • env Environmental conditions by grid cell
  • trait Traits by species
  • coo The spatial coordinates (lat/lon) for each grid cell
# Inspect data tables 
dim(abu) 
## [1] 169 148
dim(env)
## [1] 169   7
names(env)
## [1] "Depth"   "SBT"     "SBS"     "Chl"     "SBT_sea" "Chl_sea" "Fishing"
dim(trait)
## [1] 148   7
names(trait)
## [1] "Trophic.level"      "K"                  "Lmax"              
## [4] "Lifespan"           "Age.maturity"       "Fecundity_log"     
## [7] "Offspring.size_log"

The trait table contains 7 traits (i.e variable, in column) characterizing 148 taxa (in rows). The 7 traits broadly represent the life history and ecology of fish in terms of their feeding, growth, survival and reproduction. These are:

  • Trophic level
  • K: the growth rate, calculated as Von Bertalanffy growth coefficient in year\(^{-1}\)
  • Lmax: maximum body length in cm
  • Lifespan
  • Offspring.size_log: egg diameter, length of egg case or length of pup in mm
  • Fecundity_log: number of offspring produced by a female per year
  • Age.maturity: in years

Trait values for fecundity and offspring size were log-transformed to reduce the influence of outliers.

The env table contains 7 environmental variables (in column) characterizing 169 grid cells (in rows). The environmental variables measure hydrography, habitat, food availability and anthropogenic pressures, which are known to affect the distribution of fish species. These are:

  • Depth: depth in meter, directly measured during the survey.
  • SBT: monthly sea bottom temperature in °C from the Global Ocean Physics Reanalysis (GLORYSs2v4)
  • SBS: monthly sea bottom salinity from the Global Ocean Physics Reanalysis (GLORYSs2v4)
  • Chl: Chlorophyll a concentration (in \(mg.m^{-3}\)) as a proxy for primary production and food availability from the GlobColour database
  • SBT_sea: seasonality of sea bottom temperature, calculated as the difference between the warmest and the coldest month of the year.
  • Chl_sea: seasonality of chlorophyll a concentration, calculated as the difference between the highest and the lowest primary production in the year
  • Fishing: the cumulative demersal fishing pressure in 2013, estimated globally by Halpern et al. 2015, DOI 10.1038/ncomms8615.

Importantly, the rows in abu correspond to the same grid cell than the rows in env, and the column in abu correspond to the same taxa than the rows in trait. If you want to learn how to create such dataset, see the short tutorial on setting trait-environement dataset.

all(row.names(abu)==row.names(env))
## [1] TRUE
all(colnames(abu)==row.names(trait))
## [1] TRUE

Using this fish community of the Northeast Atlantic as an example, you will learn how to estimate CWM traits and diversity indices, as well as plotting patterns and identifying key drivers and state-pressure relationships (using GAMs and random forest)

2. Calculate Community-weighted mean (CWM) traits and various diversity indices

2.1 Calculate diversity indices

We use the function FD() from the FD package. Please see the help ?dbFD to inspect the function and all its options. The option stand.FRic=T means that FRic will range between 0 and 1

Be aware, the computation of dbFD can take several minutes.

fishFD<-dbFD(trait, abu, stand.FRic=T)  
## FRic: Dimensionality reduction was required. The last 2 PCoA axes (out of 7 in total) were removed. 
## FRic: Quality of the reduced-space representation = 0.9339844

2.2 Inspect results

names(fishFD)
## [1] "nbsp"      "sing.sp"   "FRic"      "qual.FRic" "FEve"      "FDiv"     
## [7] "FDis"      "RaoQ"      "CWM"
  • nbsp is the Species Richness
  • FRic is the Functional Richness
  • FEve is the Functional Eveness
  • FDiv is the Functional Divergence
  • RaoQ is the Functional Entropy
  • CWM contains the Community weighted mean traits

2.3 Reformat the output

# Reformat into an output data frame
fishFD<-do.call(cbind, fishFD)

## Add lon, lat to output data frame
fishFD[,c("lon","lat")]<-coo

2.4 Map biodiversity indices

colpal<-rev(brewer.pal(11,"RdYlBu")) # Color palette for the map
#We plot the CWM of Trophic level
mapggplot(fishFD$lon, fishFD$lat, 
          fishFD$CWM.Trophic.level, colpal = colpal)

3. Statistically investigate trait-environment relationships

Here, we model with GAMS and random forest the trait responses at the community level (using CWM).

3.1 Preparation of the dataset

We want to model CWM from environmental data, so we need to merge these information in one data frame.

# Merge fishFD with env data frame
# Because the rows of fishFD match the rows of env, we can combine them
FDenv<-cbind(fishFD,env)
head(FDenv)
##      nbsp sing.sp       FRic qual.FRic      FEve      FDiv     FDis     RaoQ
## 1555   23      23 0.03318215 0.9339844 0.4844362 0.8170802 2.056787 4.557791
## 1578   49      49 0.17086472 0.9339844 0.3444103 0.7191432 2.038024 4.461121
## 1579   56      56 0.13707499 0.9339844 0.3921806 0.8019998 2.061788 4.572184
## 1602   57      57 0.22411443 0.9339844 0.2897074 0.6978616 2.081040 4.713043
## 1603   56      56 0.09093367 0.9339844 0.3275283 0.8356100 2.050836 4.552714
## 1626   55      55 0.15636872 0.9339844 0.3190296 0.7080933 2.071518 4.735231
##      CWM.Trophic.level     CWM.K CWM.Lmax CWM.Lifespan CWM.Age.maturity
## 1555          3.759600 0.1323085 77.20661     30.00660         6.602444
## 1578          3.779644 0.1851375 69.73056     28.40112         5.860068
## 1579          3.765347 0.1877436 75.54200     26.97240         6.036463
## 1602          3.815806 0.1951130 71.59213     26.36614         5.779212
## 1603          3.771634 0.1771489 77.12801     25.39299         6.305185
## 1626          3.789088 0.2164347 68.08252     24.52774         5.460321
##      CWM.Fecundity_log CWM.Offspring.size_log    lon   lat    Depth      SBT
## 1555          11.43293              1.2055065 -28.75 66.25 344.1053 3.174351
## 1578          12.57550              0.9443465 -26.25 63.75 235.9173 6.407298
## 1579          12.53985              0.9843635 -26.25 66.25 194.9718 5.241003
## 1602          12.74557              0.8876792 -23.75 63.75 211.2289 6.478890
## 1603          12.29180              1.0182321 -23.75 66.25 154.5477 3.792796
## 1626          12.74245              0.7414449 -21.25 63.75 174.4151 6.669821
##           SBS       Chl  SBT_sea  Chl_sea   Fishing
## 1555 34.98827 0.0980102 1.155758 2.001407 0.3313633
## 1578 35.18146 0.1746692 1.180325 2.457544 1.0856360
## 1579 35.12616 0.2138716 2.429845 2.769558 1.0601547
## 1602 35.18106 0.2857816 1.561733 3.132057 1.0111725
## 1603 34.98987 0.3290308 3.570157 2.763918 0.8878969
## 1626 35.17215 0.2756463 1.682268 3.165201 0.9561967

3.2 Simple exploratory GAMs

We use the function gam() from the mgcv package. Please see the documentation ?gam to inspect the function and all its options.

We define a function to quickly plot GAM partial smooth plots and return summary statistics for the fitted model

funGam<-function(Var){
  colnames(FDenv)[which(colnames(FDenv)==Var)]<-"Var"
  fit<-gam(Var~s(Depth, k=3)+s(SBT,k=3)+s(SBS,k=3)+s(Chl,k=3)+s(SBT_sea,k=3)+ s(Chl_sea, k=3)+s(Fishing, k=3), na.action=na.exclude, data=FDenv)
  par(mfrow=c(4,2),mar=c(4,4,0.2,0.2))
  plot(fit,shade=T,shade.col="grey",res=T,rug=F,pch=20,ylab="")
  par(mfrow=c(1,1),mar=c(5,4,2,2))
  return(summary(fit))
}

Now, let’s try to model trophic level from environmental data

# list all output of FD calculation
Metrics<-colnames(fishFD)
Metrics
##  [1] "nbsp"                   "sing.sp"                "FRic"                  
##  [4] "qual.FRic"              "FEve"                   "FDiv"                  
##  [7] "FDis"                   "RaoQ"                   "CWM.Trophic.level"     
## [10] "CWM.K"                  "CWM.Lmax"               "CWM.Lifespan"          
## [13] "CWM.Age.maturity"       "CWM.Fecundity_log"      "CWM.Offspring.size_log"
## [16] "lon"                    "lat"
#Plot GAM partial smooth plots and return summary statistics for the fitted model
funGam(Metrics[9])

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Var ~ s(Depth, k = 3) + s(SBT, k = 3) + s(SBS, k = 3) + s(Chl, 
##     k = 3) + s(SBT_sea, k = 3) + s(Chl_sea, k = 3) + s(Fishing, 
##     k = 3)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.740309   0.006468   578.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df      F  p-value    
## s(Depth)   1.000  1.000 76.237  < 2e-16 ***
## s(SBT)     1.000  1.000 13.650 0.000304 ***
## s(SBS)     2.000  2.000  6.258 0.002427 ** 
## s(Chl)     1.756  1.940  7.699 0.001761 ** 
## s(SBT_sea) 1.887  1.987 17.608 1.51e-07 ***
## s(Chl_sea) 1.000  1.000  0.845 0.359415    
## s(Fishing) 1.960  1.998 28.887  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.621   Deviance explained = 64.5%
## GCV = 0.0075905  Scale est. = 0.0070694  n = 169

3.3 Detailed and thorough model selection

3.3.1 Considering potential spatial autocorrelation

Here we use a mixed GAM approach taking into account potential spatial autocorrelation (by allowing for correlated error structures) We use the function uGamm() from the MuMIn package which is a wrapper function for gamm. Please see the documentation ?uGamm to inspect the function and all its options.

We will use trophic level as an example. First, we check distribution of response (below with hist()). Consider log transforming or change family statement in gam in case of strong deviations from normal distribution.

# Check distribution of response
hist(FDenv$CWM.Trophic.level) 

Then we can compute the mixed GAM considering potential spatial autocorrelation.

Be aware, the computation can take several minutes.

# Create dummy variable to include as random factor
dummy <- rep(1, dim(FDenv)[1]) 
### Dummy variables to be add as random factor (mandatory, but won't 'change' anything)

fitTL <- uGamm(CWM.Trophic.level ~ s(Depth, k=3)
                   +s(SBT,k=3)
                   +s(SBS,k=3)
                   +s(Chl,k=3)
                   +s(SBT_sea,k=3)
                   +s(Chl_sea, k=3)
                   +s(Fishing, k=3),
                   gaussian(link = "identity"),random = list(dummy=~1), correlation = corGaus(form = ~ lon+lat), # Can try other error structures
                   data = FDenv, control=lmeControl(opt="optim"))
# Inspect summary statistics
summary(fitTL$lme)
## Linear mixed-effects model fit by maximum likelihood
##   Data: strip.offset(mf) 
##         AIC       BIC   logLik
##   -369.4445 -313.1063 202.7222
## 
## Random effects:
##  Formula: ~Xr - 1 | g
##                   Xr
## StdDev: 8.225054e-05
## 
##  Formula: ~Xr.0 - 1 | g.0 %in% g
##                 Xr.0
## StdDev: 0.0006573795
## 
##  Formula: ~Xr.1 - 1 | g.1 %in% g.0 %in% g
##              Xr.1
## StdDev: 0.3050302
## 
##  Formula: ~Xr.2 - 1 | g.2 %in% g.1 %in% g.0 %in% g
##                Xr.2
## StdDev: 3.31291e-05
## 
##  Formula: ~Xr.3 - 1 | g.3 %in% g.2 %in% g.1 %in% g.0 %in% g
##                 Xr.3
## StdDev: 0.0002430148
## 
##  Formula: ~Xr.4 - 1 | g.4 %in% g.3 %in% g.2 %in% g.1 %in% g.0 %in% g
##                 Xr.4
## StdDev: 0.0002030332
## 
##  Formula: ~Xr.5 - 1 | g.5 %in% g.4 %in% g.3 %in% g.2 %in% g.1 %in% g.0 %in% g
##              Xr.5
## StdDev: 0.3853628
## 
##  Formula: ~1 | dummy %in% g.5 %in% g.4 %in% g.3 %in% g.2 %in% g.1 %in% g.0 %in% g
##          (Intercept)   Residual
## StdDev: 6.875295e-06 0.09010437
## 
## Correlation Structure: Gaussian spatial correlation
##  Formula: ~lon + lat | g/g.0/g.1/g.2/g.3/g.4/g.5/dummy 
##  Parameter estimate(s):
##    range 
## 2.856249 
## Fixed effects:  y ~ X - 1 
##                    Value   Std.Error  DF   t-value p-value
## X(Intercept)    3.737548 0.012296004 161 303.96447  0.0000
## Xs(Depth)Fx1   -0.055514 0.009432417 161  -5.88545  0.0000
## Xs(SBT)Fx1     -0.057006 0.014433316 161  -3.94963  0.0001
## Xs(SBS)Fx1      0.002583 0.013556229 161   0.19052  0.8491
## Xs(Chl)Fx1     -0.006707 0.009871167 161  -0.67941  0.4979
## Xs(SBT_sea)Fx1  0.049210 0.013750423 161   3.57883  0.0005
## Xs(Chl_sea)Fx1 -0.003280 0.008450415 161  -0.38809  0.6985
## Xs(Fishing)Fx1 -0.029487 0.008172496 161  -3.60808  0.0004
##  Correlation: 
##                X(Int) X(D)F1 X(SBT) X(SBS) X(C)F1 X(SBT_ X(C_)F
## Xs(Depth)Fx1    0.082                                          
## Xs(SBT)Fx1      0.029  0.191                                   
## Xs(SBS)Fx1     -0.012  0.026 -0.485                            
## Xs(Chl)Fx1     -0.026  0.161 -0.276  0.156                     
## Xs(SBT_sea)Fx1 -0.002  0.168 -0.477  0.689  0.019              
## Xs(Chl_sea)Fx1  0.008  0.074  0.214  0.140 -0.381 -0.043       
## Xs(Fishing)Fx1 -0.050  0.165 -0.156  0.029  0.122  0.077 -0.177
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.37147404 -0.46470308  0.04422587  0.56661012  3.39099376 
## 
## Number of Observations: 169
## Number of Groups: 
##                                                                  g 
##                                                                  1 
##                                                         g.0 %in% g 
##                                                                  1 
##                                                g.1 %in% g.0 %in% g 
##                                                                  1 
##                                       g.2 %in% g.1 %in% g.0 %in% g 
##                                                                  1 
##                              g.3 %in% g.2 %in% g.1 %in% g.0 %in% g 
##                                                                  1 
##                     g.4 %in% g.3 %in% g.2 %in% g.1 %in% g.0 %in% g 
##                                                                  1 
##            g.5 %in% g.4 %in% g.3 %in% g.2 %in% g.1 %in% g.0 %in% g 
##                                                                  1 
## dummy %in% g.5 %in% g.4 %in% g.3 %in% g.2 %in% g.1 %in% g.0 %in% g 
##                                                                  1
summary(fitTL$gam)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## CWM.Trophic.level ~ s(Depth, k = 3) + s(SBT, k = 3) + s(SBS, 
##     k = 3) + s(Chl, k = 3) + s(SBT_sea, k = 3) + s(Chl_sea, k = 3) + 
##     s(Fishing, k = 3)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.73755    0.01204   310.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df      F  p-value    
## s(Depth)   1.000  1.000 36.145  < 2e-16 ***
## s(SBT)     1.000  1.000 16.278 8.49e-05 ***
## s(SBS)     1.485  1.485  0.276 0.658821    
## s(Chl)     1.000  1.000  0.482 0.488674    
## s(SBT_sea) 1.000  1.000 13.365 0.000348 ***
## s(Chl_sea) 1.000  1.000  0.157 0.692312    
## s(Fishing) 1.948  1.948 11.284 2.11e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.54   
##   Scale est. = 0.0081188  n = 169
# Check diagnostics
par(mfrow=c(2,2))
gam.check(fitTL$gam)

## 
## 'gamm' based fit - care required with interpretation.
## Checks based on working residuals may be misleading.
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##              k'  edf k-index p-value    
## s(Depth)   2.00 1.00    0.90   0.115    
## s(SBT)     2.00 1.00    0.78  <2e-16 ***
## s(SBS)     2.00 1.48    0.96   0.240    
## s(Chl)     2.00 1.00    0.81   0.005 ** 
## s(SBT_sea) 2.00 1.00    0.95   0.180    
## s(Chl_sea) 2.00 1.00    0.98   0.310    
## s(Fishing) 2.00 1.95    0.86   0.030 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow=c(4,2),mar=c(4,4,0.2,0.2))
plot(fitTL$gam,shade=T,shade.col="grey",res=T,rug=F,pch=20)
par(mfrow=c(1,1),mar=c(5,4,2,2))

3.3.2 Automated model selection

Here, we perform automated model selection testing all combinations of predictors using the dredge() function (and rank according to AICc) Please see the documentation ?dredge from the MuMIn package to inspect the function and all its options.

Be aware, the computation can take several minutes.

results <- dredge(fitTL, m.lim=c(1,4), rank="AICc") # Here test max 4 variables per model to reduce run time
subset(results, delta <5)  # Depth, SBT and fishing are key variables
## Global model call: uGamm(formula = CWM.Trophic.level ~ s(Depth, k = 3) + s(SBT, 
##     k = 3) + s(SBS, k = 3) + s(Chl, k = 3) + s(SBT_sea, k = 3) + 
##     s(Chl_sea, k = 3) + s(Fishing, k = 3), random = list(dummy = ~1), 
##     gaussian(link = "identity"), correlation = corGaus(form = ~lon + 
##         lat), data = FDenv, control = lmeControl(opt = "optim"))
## ---
## Model selection table 
##     (Int) s(Dpt,3) s(Fsh,3) s(SBT_sea,3) s(SBT,3) df  logLik   AICc delta
## 109 3.738        +        +            +        + 12 201.895 -377.8     0
##     weight
## 109      1
## Models ranked by AICc(x)
# Calculate and view relative variable importance (RVI) scores
importance(results) 
## Warning: 'importance' is deprecated.
## Use 'sw' instead.
## See help("Deprecated")
## function (x) 
## UseMethod("sw")
## <bytecode: 0x7f80a54f99e0>
## <environment: namespace:MuMIn>
# Fit and inspect the "best" model
fitTLb <- uGamm(CWM.Trophic.level ~ s(Depth, k=3)
               +s(SBT,k=3),
               #+s(Fishing, k=3),
               gaussian(link = "identity"),random = list(dummy=~1), correlation = corGaus(form = ~ lon+lat), # Can try other error structures
               data = FDenv, control=lmeControl(opt="optim"))
# Inspect summary statistics
summary(fitTLb$lme)
## Linear mixed-effects model fit by maximum likelihood
##   Data: strip.offset(mf) 
##         AIC       BIC   logLik
##   -351.8951 -326.8559 183.9476
## 
## Random effects:
##  Formula: ~Xr - 1 | g
##                Xr
## StdDev: 0.2691244
## 
##  Formula: ~Xr.0 - 1 | g.0 %in% g
##              Xr.0
## StdDev: 0.1859293
## 
##  Formula: ~1 | dummy %in% g.0 %in% g
##          (Intercept)  Residual
## StdDev: 0.0002000571 0.1066766
## 
## Correlation Structure: Gaussian spatial correlation
##  Formula: ~lon + lat | g/g.0/dummy 
##  Parameter estimate(s):
##    range 
## 3.018607 
## Fixed effects:  y ~ X - 1 
##                  Value   Std.Error  DF   t-value p-value
## X(Intercept)  3.732015 0.014839298 166 251.49538   0e+00
## Xs(Depth)Fx1 -0.061379 0.009882393 166  -6.21096   0e+00
## Xs(SBT)Fx1   -0.044205 0.012825777 166  -3.44661   7e-04
##  Correlation: 
##              X(Int) X(D)F1
## Xs(Depth)Fx1 0.097        
## Xs(SBT)Fx1   0.036  0.316 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.7278180 -0.3977666  0.2045820  0.6788054  2.9158327 
## 
## Number of Observations: 169
## Number of Groups: 
##                     g            g.0 %in% g dummy %in% g.0 %in% g 
##                     1                     1                     1
summary(fitTLb$gam)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## CWM.Trophic.level ~ s(Depth, k = 3) + s(SBT, k = 3)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.73201    0.01475     253   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df      F  p-value    
## s(Depth) 1.725  1.725 17.653 9.84e-07 ***
## s(SBT)   1.588  1.588  9.963  0.00117 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.355   
##   Scale est. = 0.01138   n = 169
# Check diagnostics
par(mfrow=c(2,2))
gam.check(fitTLb$gam)

## 
## 'gamm' based fit - care required with interpretation.
## Checks based on working residuals may be misleading.
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##            k'  edf k-index p-value   
## s(Depth) 2.00 1.72    0.83   0.010 **
## s(SBT)   2.00 1.59    0.80   0.005 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Plot model smooth terms
par(mfrow=c(2,1),mar=c(4,4,0.2,0.2))
plot(fitTLb$gam,shade=T,shade.col="grey",res=T,rug=F,pch=20)

4. Random forest

Instead of GAM, we can try to model the CWM trait with random forest. We use the function randomForest() from the randomForest package. Please see the documentation ?randomForest to inspect the function and all its options.

Again we create a function to compute random forests and plot the outputs.

# Wrapper function to explore RF for a given trait
funRf<-function(Var){
  colnames(FDenv)[which(colnames(FDenv)==Var)]<-"Var"
  fit<-randomForest(Var~Depth+SBT+SBS+Chl+SBT_sea+Chl_sea+Fishing, data=FDenv,ntree=1000,importance=T, mtry=2)
  par(mfrow=c(4,2),mar=c(4,4,0.2,0.2))
  partialPlot(fit, x.var=Depth,FDenv,main="")
  partialPlot(fit, x.var=SBT,FDenv,main="")
  partialPlot(fit, x.var=SBS,FDenv,main="")
  partialPlot(fit, x.var=Chl,FDenv,main="")
  partialPlot(fit, x.var=SBT_sea,FDenv,main="")
  partialPlot(fit, x.var=Chl_sea,FDenv,main="")
  partialPlot(fit, x.var=Fishing,FDenv,main="")
  par(mfrow=c(1,1),mar=c(5,4,2,2))
  return(fit)
}

Now, we compute the randomForest for CWM trophic level.

fitTLrf <- funRf(Metrics[9])# Plot random forest response plots and summary stats

Check error against number of trees used for training

plot(fitTLrf) 

It is stable after 200 (hence 100 trees is more than enough).

Inspect summary stats

print(fitTLrf) # Inspect summary stats
## 
## Call:
##  randomForest(formula = Var ~ Depth + SBT + SBS + Chl + SBT_sea +      Chl_sea + Fishing, data = FDenv, ntree = 1000, importance = T,      mtry = 2) 
##                Type of random forest: regression
##                      Number of trees: 1000
## No. of variables tried at each split: 2
## 
##           Mean of squared residuals: 0.006384695
##                     % Var explained: 65.55

Check variable importance plots

varImpPlot(fitTLrf)

Optional - Generalized linear mixed models (GLMM)

We test GLMM and extract parameters to facilitate comparisons across organism groups and areas.

Please note that you may need to introduce square terms to account for non-linearities.

Additionally, you need to load the package nlme.

library(nlme)
fitTLglm <- lme(CWM.Trophic.level~Depth+SBT+SBS+Chl+SBT_sea+Chl_sea+Fishing,
                random = list(dummy=~1), correlation = corGaus(form = ~ lon+lat),
                data = FDenv, control=lmeControl(opt="optim"))
summary(fitTLglm)
## Linear mixed-effects model fit by REML
##   Data: FDenv 
##         AIC       BIC   logLik
##   -300.0037 -266.1083 161.0019
## 
## Random effects:
##  Formula: ~1 | dummy
##         (Intercept)  Residual
## StdDev:  0.02069699 0.1008978
## 
## Correlation Structure: Gaussian spatial correlation
##  Formula: ~lon + lat | dummy 
##  Parameter estimate(s):
##    range 
## 2.969371 
## Fixed effects:  CWM.Trophic.level ~ Depth + SBT + SBS + Chl + SBT_sea + Chl_sea +      Fishing 
##                 Value Std.Error  DF   t-value p-value
## (Intercept)  3.756969 0.5970636 161  6.292411  0.0000
## Depth        0.000330 0.0000582 161  5.672968  0.0000
## SBT          0.014832 0.0034706 161  4.273509  0.0000
## SBS         -0.005815 0.0168659 161 -0.344788  0.7307
## Chl          0.011201 0.0264687 161  0.423197  0.6727
## SBT_sea     -0.018698 0.0056356 161 -3.317887  0.0011
## Chl_sea      0.003276 0.0061583 161  0.531993  0.5955
## Fishing      0.077550 0.0317978 161  2.438846  0.0158
##  Correlation: 
##         (Intr) Depth  SBT    SBS    Chl    SBT_se Chl_se
## Depth   -0.068                                          
## SBT      0.473  0.175                                   
## SBS     -0.997  0.021 -0.503                            
## Chl     -0.145  0.175 -0.259  0.140                     
## SBT_sea -0.704  0.153 -0.521  0.699  0.033              
## Chl_sea -0.163  0.062  0.191  0.143 -0.390 -0.032       
## Fishing -0.062  0.193 -0.115  0.032  0.110  0.098 -0.180
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.7833294 -0.3417456  0.1134884  0.5412211  3.1613291 
## 
## Number of Observations: 169
## Number of Groups: 1
anova(fitTLglm)
##             numDF denDF   F-value p-value
## (Intercept)     1   161 22756.033  <.0001
## Depth           1   161    29.730  <.0001
## SBT             1   161    16.630  0.0001
## SBS             1   161     7.270  0.0078
## Chl             1   161     0.404  0.5257
## SBT_sea         1   161    12.676  0.0005
## Chl_sea         1   161     0.973  0.3254
## Fishing         1   161     5.948  0.0158
# Extract parameters
summary(fitTLglm)
## Linear mixed-effects model fit by REML
##   Data: FDenv 
##         AIC       BIC   logLik
##   -300.0037 -266.1083 161.0019
## 
## Random effects:
##  Formula: ~1 | dummy
##         (Intercept)  Residual
## StdDev:  0.02069699 0.1008978
## 
## Correlation Structure: Gaussian spatial correlation
##  Formula: ~lon + lat | dummy 
##  Parameter estimate(s):
##    range 
## 2.969371 
## Fixed effects:  CWM.Trophic.level ~ Depth + SBT + SBS + Chl + SBT_sea + Chl_sea +      Fishing 
##                 Value Std.Error  DF   t-value p-value
## (Intercept)  3.756969 0.5970636 161  6.292411  0.0000
## Depth        0.000330 0.0000582 161  5.672968  0.0000
## SBT          0.014832 0.0034706 161  4.273509  0.0000
## SBS         -0.005815 0.0168659 161 -0.344788  0.7307
## Chl          0.011201 0.0264687 161  0.423197  0.6727
## SBT_sea     -0.018698 0.0056356 161 -3.317887  0.0011
## Chl_sea      0.003276 0.0061583 161  0.531993  0.5955
## Fishing      0.077550 0.0317978 161  2.438846  0.0158
##  Correlation: 
##         (Intr) Depth  SBT    SBS    Chl    SBT_se Chl_se
## Depth   -0.068                                          
## SBT      0.473  0.175                                   
## SBS     -0.997  0.021 -0.503                            
## Chl     -0.145  0.175 -0.259  0.140                     
## SBT_sea -0.704  0.153 -0.521  0.699  0.033              
## Chl_sea -0.163  0.062  0.191  0.143 -0.390 -0.032       
## Fishing -0.062  0.193 -0.115  0.032  0.110  0.098 -0.180
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.7833294 -0.3417456  0.1134884  0.5412211  3.1613291 
## 
## Number of Observations: 169
## Number of Groups: 1
summary(fitTLglm)$coefficients
## $fixed
##   (Intercept)         Depth           SBT           SBS           Chl 
##  3.7569695000  0.0003300202  0.0148315218 -0.0058151504  0.0112014983 
##       SBT_sea       Chl_sea       Fishing 
## -0.0186984120  0.0032761845  0.0775498552 
## 
## $random
## $random$dummy
##     (Intercept)
## 1 -5.331865e-16
##Plot some diagnostics
plot(fitTLglm)

qqnorm(residuals(fitTLglm))

#observed versus fitted values
plot(fitTLglm, CWM.Trophic.level~ fitted(.), abline = c(0,1))