This tutorial illustrates how to build and post-process a spatial HMSC model including traits and species ‘phylogeny’. The focus is set on modeling trait-environment relationships but we will also explore some other species and community based outputs.

1. Preliminaries

Load packages and dataset

library(Hmsc) 
library(tidyverse)
library(viridis)
library(vioplot)
library(abind)
library(RColorBrewer)
library(ape)
library(corrplot)

If you get an error message, check that the R packages and their dependencies are installed correctly. If not, use the command to install all listed packages: install.packages(c("Hmsc", "tidyverse", "viridis","vioplot","abind","RColorBrewer, "ape", "corrplot")).

The example data set is available as the Rdata file NEAtl_FishTraitEnv.Rdata, available for download here.

Make sure the file NEAtl_FishTraitEnv.Rdata is in your working directory, then load it in R.

load("NEAtl_FishTraitEnv.Rdata")

The Rdata file contains four objects:

abu containing the abundance of taxa in grid cells
env containing the environmental condition per grid cell
trait containing the trait information per taxa
coo the coordinates of each grid cell
taxo taxonomic relationship of species (taxonomic/phylogenetic tree)

Importantly, the rows in abu correspond to the same grid cell than the rows in env, and the column in abu correspond to the same taxa than the rows in trait.

all(row.names(abu)==row.names(env))

## [1] TRUE

all(colnames(abu)==row.names(trait))

## [1] TRUE

Quick summary of the variables

Traits

dim(trait)

## [1] 148   7

names(trait)

## [1] "Trophic.level"      "K"                  "Lmax"              
## [4] "Lifespan"           "Age.maturity"       "Fecundity_log"     
## [7] "Offspring.size_log"

The trait table contains 7 traits (i.e variable, in column) characterizing 148 taxa (in rows). The 7 traits broadly represent the life history and ecology of fish in terms of their feeding, growth, survival and reproduction. These are:

Trophic level
K: the growth rate, calculated as Von Bertalanffy growth coefficient in year$^{-1}$
Lmax: maximum body length in cm
Lifespan
Offspring.size_log: egg diameter, length of egg case or length of pup in mm
Fecundity_log: number of offspring produced by a female per year
Age.maturity: in years

Trait values for fecundity and offspring size were log-transformed to reduce the influence of outliers.

For further analysis in HMSC, convert trait names to one one word per column in case traits consist of multiple words

trait<-trait %>%  
  dplyr::rename_all(make.names)
head(trait)

##                      Trophic.level    K Lmax Lifespan Age.maturity
## Agonus cataphractus           3.43 0.47   21      1.0          3.0
## Alosa alosa                   2.95 0.32   65     10.0          3.5
## Alosa fallax                  4.03 0.33   43     25.0          2.5
## Amblyraja hyperborea          4.31 0.15   89     24.0         11.0
## Amblyraja radiata             4.20 0.15   69     16.0         11.0
## Ammodytes                     3.15 0.55   23      7.7          3.2
##                      Fecundity_log Offspring.size_log
## Agonus cataphractus       7.843849          0.6931472
## Alosa alosa              12.652360          0.0000000
## Alosa fallax             11.875831          0.0000000
## Amblyraja hyperborea      3.401197          4.6298628
## Amblyraja radiata         2.833213          4.0073332
## Ammodytes                 9.429476         -0.0618754

Environment

dim(env)

## [1] 169   7

names(env)

## [1] "Depth"   "SBT"     "SBS"     "Chl"     "SBT_sea" "Chl_sea" "Fishing"

The env table contains 7 environmental variables (in column) characterizing 169 grid cells (in rows). The environmental variables measure hydrography, habitat, food availability and anthropogenic pressures, which are known to affect the distribution of fish species. These are:

Depth: depth in meter, directly measured during the survey.
SBT: monthly sea bottom temperature in °C from the Global Ocean Physics Reanalysis (GLORYSs2v4)
SBS: monthly sea bottom salinity from the Global Ocean Physics Reanalysis (GLORYSs2v4)
Chl: Chlorophyll a concentration (in $mg.m^{-3}$) as a proxy for primary production and food availability from the GlobColour database
SBT_sea: seasonality of sea bottom temperature, calculated as the difference between the warmest and the coldest month of the year.
Chl_sea: seasonality of chlorophyll a concentration, calculated as the difference between the highest and the lowest primary production in the year
Fishing: the cumulative demersal fishing pressure in 2013, estimated globally by Halpern et al. 2015, DOI 10.1038/ncomms8615.

Taxonomic tree

We build a phylo object from the taxonomic relationships provided in taxo

First make sure the taxonomic units are factors

str(taxo) #listed as chr

## 'data.frame':    148 obs. of  6 variables:
##  $ phylum : chr  "Chordata" "Chordata" "Chordata" "Chordata" ...
##  $ class  : chr  "Actinopterygii" "Actinopterygii" "Actinopterygii" "Elasmobranchii" ...
##  $ order  : chr  "Scorpaeniformes" "Clupeiformes" "Clupeiformes" "Rajiformes" ...
##  $ family : chr  "Agonidae" "Clupeidae" "Clupeidae" "Rajidae" ...
##  $ genus  : chr  "Agonus" "Alosa" "Alosa" "Amblyraja" ...
##  $ species: chr  "Agonus cataphractus" "Alosa alosa" "Alosa fallax" "Amblyraja hyperborea" ...

taxo <- as.data.frame(lapply(taxo, as.factor)) # make factors
str(taxo) # now they are!

## 'data.frame':    148 obs. of  6 variables:
##  $ phylum : Factor w/ 1 level "Chordata": 1 1 1 1 1 1 1 1 1 1 ...
##  $ class  : Factor w/ 4 levels "Actinopterygii",..: 1 1 1 2 2 1 1 1 1 1 ...
##  $ order  : Factor w/ 20 levels "Anguilliformes",..: 16 6 6 15 15 12 12 12 12 12 ...
##  $ family : Factor w/ 54 levels "Agonidae","Ammodytidae",..: 1 12 12 38 38 2 3 3 3 48 ...
##  $ genus  : Factor w/ 105 levels "Agonus","Alosa",..: 1 2 2 3 3 4 5 5 5 6 ...
##  $ species: Factor w/ 148 levels "Agonus cataphractus",..: 1 2 3 4 5 6 7 8 9 10 ...

# build the tree  
tree<-as.phylo(~class/order/family/genus/species, data=taxo, collapse = FALSE)

## Warning in as.phylo.formula(~class/order/family/genus/species, data = taxo, :
## NAs introduced by coercion

tree$edge.length<-rep(1,length(tree$edge))

# It's important to check if the tip lables of tree correspond to the names in abu
if(all(sort(tree$tip.label) == sort(colnames(abu)))){
  print("species names in tree and abu match")
} else{
  print("species names in tree and abu do not match")
} #there is some problem which we can already see from the NA message before.

## Warning in sort(tree$tip.label) == sort(colnames(abu)): longer object length is
## not a multiple of shorter object length

## [1] "species names in tree and abu do not match"

# Looks like we have some NA in genus, so what we do now is take the species information for all NA at genus level to fill the NA correctly.
taxo2<-taxo%>% 
  mutate(genus = coalesce(genus,species))

# Build tree again with new taxo2
tree<-as.phylo(~class/order/family/genus/species, data=taxo2, collapse = FALSE) # no NA message
tree$edge.length<-rep(1,length(tree$edge))

# Check lable correspondence
 if(all(sort(tree$tip.label) == sort(colnames(abu)))){
  print("species names in tree and abu match")
} else{
  print("species names in tree and abu do not match")
} # All good now!

## [1] "species names in tree and abu match"

#Have a look at the tree
str(tree)

## List of 5
##  $ edge       : int [1:334, 1:2] 149 150 151 152 153 152 154 151 155 156 ...
##  $ Nnode      : int 187
##  $ node.label : chr [1:187] "" "Actinopterygii" "Scorpaeniformes" "Agonidae" ...
##  $ tip.label  : chr [1:148] "Agonus cataphractus" "Leptagonus decagonus" "Artediellus atlanticus" "Icelus bicornis" ...
##  $ edge.length: num [1:668] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"

plot(tree, cex=0.5)

2. Setting up the HMSC model

2.1 Define regression model for environmental covariates

XFormula = as.formula(paste("~",paste(colnames(env), collapse="+")))
XFormula

## ~Depth + SBT + SBS + Chl + SBT_sea + Chl_sea + Fishing

You can also specify second order polynomial response terms for your regression model. Then HMSC fits both, the liner term and the quadratic term as covariates. Here, this is exemplified for SBT, SBS and Chl, where one could ecologically assume bell shaped responses. For reasons of simplicity, we continue with the above defined regression model.

XFormula = ~ Depth + poly(SBT, degree = 2, raw= TRUE) + poly(SBT, degree = 2, raw= TRUE) + poly(SBS, degree = 2, raw= TRUE) + poly(Chl, degree = 2, raw= TRUE) + SBT_sea + Chl_sea + Fishing

2.2 Define regression model for traits

TrFormula = as.formula(paste("~",paste(colnames(trait), collapse="+")))
TrFormula

## ~Trophic.level + K + Lmax + Lifespan + Age.maturity + Fecundity_log + 
##     Offspring.size_log

2.3 Set up study design with spatial random effect

studyDesign = data.frame(grid.cell = as.factor(rownames(coo))) # "sample" level is defined as grid.cell
rL = HmscRandomLevel(sData = coo) # here the random effect is defined to be spatially explicit based on the coordinates of the grid cell
rL = setPriors(rL,nfMin=1,nfMax=2) # here we set the number of latent variable factors. In this example they are low for computational reasons 

# Have a look at the random effect
rL

## Hmsc random level object with 169 units. Spatial dimensionality is 2 and number of covariates is 0.

head(rL$s)

##      Longitude Latitude
## 1554    -28.75    66.25
## 1577    -26.25    63.75
## 1578    -26.25    66.25
## 1601    -23.75    63.75
## 1602    -23.75    66.25
## 1625    -21.25    63.75

2.4 Set up MCMC sample specifications

test.run = TRUE # TEST RUN TAKES ABOUT 2 MIN ON MY COMPUTER
if (test.run){
  # with this option, the model runs fast but results are not reliable
  thin = 1 # interval of iterations per samples 
  samples = 10 # number of samples taken per chain
  transient = ceiling(0.5*samples*thin) # burn-in, i.e. number of first iterations which will be cut off
  nChains = 2 # number of independent MCMC chains 
} else { 
  # with this option, the model evaluates slow but it reproduces the results shown in this tutorial
  thin = 100 # interval of iterations per samples 
  samples = 250 # number of samples taken per chain
  transient = ceiling(0.5*samples*thin) # burn-in, i.e. number of first iterations which will be cut off
  nChains = 4 # number of independent MCMC chains 
  }

2.4 Specify (unfitted) HMSC model structure

m = Hmsc(Y= abu, XData = env,  XFormula = XFormula, TrFormula = TrFormula, TrData = trait, phyloTree = tree, studyDesign = studyDesign, ranLevels = list("grid.cell"= rL),  distr = "normal")

2.5 Fit and save model

 m = sampleMcmc(m, thin = thin, samples = samples, transient = transient, nChains = nChains) 

# You can also run chains in parallel by adding 'nParallel = nChains' in the sampleMCMC() function, but you won't see the progress of the run until it finished.

filename =  paste("NorthAtlantic","_thin_", as.character(thin),"_samples_", as.character(samples),"_chains_", as.character(nChains), ".Rdata",sep = "")

save(m,file=filename)

You can now either continue with the post-processing of the model with your fitted test run, or maybe better, load the previously fitted model which used a higher thinning, more samples and chains, hence, included more iterations for better convergence of the MCMC chains.

load("NorthAtlantic_rL_tree_thin_10_samples_250_chains_4.Rdata")

3. Check MCMC convergence diagnostics

# convert model to coda object
mpost = convertToCodaObject(m)

You can visually check the MCMC convergence and mixing for beta (species-environment) and gamma (trait-environment) parameters

Since we have 7 environmental covariates and 148 species we get 1036 specific mixing plots for the beta parameters … not too convenient to go through.

plot(mpost$Beta)

Rather have a look at summarized diagnostics

# check effective sample size 
# should ideally be close to your drawn samples, here, 4 (chains) * 250 (samples) = 1000 samples)
summary(effectiveSize(mpost$Beta))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   786.3   973.5  1000.0  1039.7  1073.1  3009.1

# check the distribution of the effective sample size (ess) and the Gelman-Rubin potential scale reduction factor (psrf), the latter should be ideally (very) close to 1, if not, chains did not properly converge and need more iterations. 
# These should be checked for all parameters of interest, i.e. beta and gamma
# Provided model looks quite okay already but could be better.

par(mfrow=c(1,3))
# Beta parameters (species-environment) 
hist(effectiveSize(mpost$Beta), main="ess(beta)")
hist(gelman.diag(mpost$Beta, multivariate = FALSE)$psrf, main="psrf(beta)")
vioplot(gelman.diag(mpost$Beta,multivariate = FALSE)$psrf,main="psrf(beta)")

# Gamma parameters (trait-environment)
hist(effectiveSize(mpost$Gamma), main="ess(gamma)")
hist(gelman.diag(mpost$Gamma, multivariate = FALSE)$psrf, main="psrf(gamma)")
vioplot(gelman.diag(mpost$Gamma,multivariate = FALSE)$psrf,main="psrf(gamma)")

4. Post process model results

4.1 Compute predicted values

# compute predicted species abundance matrix/ posterior samples 
predY=computePredictedValues(m)

4.2 Evaluate model fit (MF)

For our model type (abundance, normal distribution) MF provides two metrics, calculated for all individual species.

Root Mean Square Error ($RMSE) is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far from the regression line data points are; RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit.

R$^{2}$ ($R2): your “usual” measure of explained variation. Here prvided species specifically as result of the regression model (XFormula), with included enviornmental covariates

MF = evaluateModelFit(hM = m, predY = predY)
MF

## $RMSE
##   [1] 0.0241029358 0.0014314489 0.0073169183 0.0187911408 0.0885159012
##   [6] 0.0346446105 0.0136305201 0.0943027509 0.0360033458 0.0208712288
##  [11] 0.0154651006 0.1461136596 0.1203870472 0.0408219267 0.0515242462
##  [16] 0.1210255635 0.0055227474 0.0005995548 0.0128888613 0.2045381038
##  [21] 0.0579907823 0.0367240697 0.0543206618 0.0523487887 0.0018713588
##  [26] 0.0433405301 0.0615989253 0.0079690515 0.0177833047 0.0849765722
##  [31] 0.0060515463 0.0011669438 0.1376531756 0.0258356317 0.0392692258
##  [36] 0.0162264416 0.0014285170 0.0304224151 0.0137556508 0.0134335595
##  [41] 0.0086522366 0.0373137086 0.0269322690 0.0591678346 0.0319173319
##  [46] 0.0637858504 0.0186484259 0.0894529194 0.1557369308 0.1736746185
##  [51] 0.0166861096 0.0037573490 0.0933817135 0.0470956150 0.0798659479
##  [56] 0.0521142782 0.0052529823 0.0070319185 0.1078269908 0.1851185561
##  [61] 0.0080399183 0.0027945350 0.0321503914 0.0729486076 0.0010771206
##  [66] 0.0015120361 0.0068494672 0.0894043068 0.1055254614 0.0739688439
##  [71] 0.0971860282 0.0018487378 0.0032043031 0.0232099973 0.1300051939
##  [76] 0.0113907181 0.0892338450 0.0425038936 0.0304542148 0.0363220015
##  [81] 0.0790463131 0.0033406344 0.0035364000 0.0410142846 0.0282597170
##  [86] 0.0653487791 0.0696642229 0.0198900945 0.0620962660 0.0385681678
##  [91] 0.0160730309 0.0289539578 0.0165150585 0.1855458666 0.0340455503
##  [96] 0.2751537032 0.1429938941 0.1637984239 0.0696600465 0.2129265801
## [101] 0.0929376991 0.0361639325 0.0448428580 0.0274158349 0.0298960751
## [106] 0.0259057176 0.0392662996 0.0219753868 0.0354233989 0.0022716614
## [111] 0.0003792495 0.0030218258 0.0041627969 0.0608144431 0.0444505572
## [116] 0.0768815680 0.0095162587 0.1055773264 0.0035414957 0.0281635514
## [121] 0.0263790250 0.0130644601 0.0014693578 0.1418579748 0.0368109862
## [126] 0.0763812344 0.0054467997 0.0151676489 0.0103469626 0.0912308275
## [131] 0.0101719696 0.3477377602 0.0160081278 0.0495907360 0.1163158962
## [136] 0.0169024463 0.0347735323 0.0356100947 0.1023453991 0.0090917374
## [141] 0.0832952640 0.1899526158 0.0255823513 0.3298954455 0.0362762143
## [146] 0.1280924593 0.0010536769 0.0229889704
## 
## $R2
##   [1] 0.48334800 0.21134569 0.26939255 0.49018658 0.37220233 0.30125187
##   [7] 0.35742836 0.21470593 0.23726842 0.10341809 0.28461468 0.40550609
##  [13] 0.54343096 0.36378860 0.28327302 0.51454944 0.17075013 0.09448774
##  [19] 0.14183871 0.53572196 0.30319968 0.45354565 0.66146518 0.43327172
##  [25] 0.34344043 0.37855572 0.70240299 0.40640819 0.42261932 0.35717771
##  [31] 0.36822101 0.19487403 0.80322460 0.24793953 0.45209886 0.05802683
##  [37] 0.28380160 0.37262344 0.55688773 0.29448202 0.28956787 0.34404460
##  [43] 0.34351027 0.39777003 0.31827264 0.27473236 0.12880359 0.83042682
##  [49] 0.32802159 0.68048040 0.10038272 0.32936502 0.54997327 0.10285840
##  [55] 0.41673962 0.43296509 0.09756813 0.19446871 0.41832397 0.67546889
##  [61] 0.44414168 0.38859948 0.40332475 0.21799855 0.34886891 0.29572280
##  [67] 0.05778352 0.44492038 0.43538088 0.55718027 0.59974749 0.08081121
##  [73] 0.11308655 0.57573746 0.80031489 0.23978694 0.22967848 0.07085071
##  [79] 0.48016469 0.51944714 0.24126949 0.05348689 0.04748534 0.24422022
##  [85] 0.34328053 0.18463908 0.20770095 0.26799344 0.29196864 0.27014509
##  [91] 0.14385522 0.28130739 0.14911916 0.53910893 0.18958411 0.44271186
##  [97] 0.83163447 0.60234361 0.44140420 0.57077242 0.61137988 0.05872545
## [103] 0.38538305 0.38681484 0.55980961 0.39099866 0.22710801 0.52688414
## [109] 0.56314817 0.21602132 0.02156530 0.27497435 0.10001841 0.53105487
## [115] 0.63789700 0.81485566 0.19083529 0.47385487 0.40923046 0.52483313
## [121] 0.51025962 0.12886184 0.09380564 0.64007033 0.52590710 0.76114005
## [127] 0.60816816 0.43082696 0.17797052 0.76162523 0.31433613 0.40154301
## [133] 0.60903099 0.52615478 0.81181975 0.46821117 0.04339615 0.33103648
## [139] 0.84848783 0.15524245 0.27244413 0.30134776 0.28207730 0.42933768
## [145] 0.62776174 0.75578451 0.29152432 0.76118208

mean(MF$R2)

## [1] 0.3793841

#Here, we take the mean of posterior samples for further processing
predY = apply(abind(predY,along=3),c(1,2), mean)

# Plot species specific R2 in relation to prevalence
plot(colSums(((m$Y>0)*1)/m$ny), MF$R2,main=paste("Mean R2 = ", round(mean(MF$R2),2),".", sep=""), xlab = "Prevalence")

4.3 Varience Partitioning

par(mfrow=c(1,1))
# Specify groups of how the variation should be partitioned You can also combine groups 
group=c(1,2,3,4,5,6,7)
# Specify group names m$covNames[-1] gives the included covariate names excluding the intercept
groupnames = m$covNames[-1]
#compte species specific variance partitioning
VP = computeVariancePartitioning(hM = m, group = group, groupnames = groupnames)

plotVariancePartitioning(m, VP, viridis(8))

4.4 Plot species-environment relationships

# Shown are all species with 90% posterior support for having positive (red) or negative(blue) responses to environmental covariates
beta = getPostEstimate(m, "Beta")
plotBeta(m, beta, supportLevel=.9, spNamesNumbers = c(FALSE, FALSE), covNamesNumbers = c(TRUE, FALSE), plotTree = T)

## [1] 0.3 1.0 0.0 1.0

4.5 Plot trait-environment relationships

gamma = getPostEstimate(m, "Gamma")
plotGamma(m, gamma, supportLevel=.8)

# modify support level to highlight stronger or weaker posterior support for positive/ negative relationship

4.6 Trait specifc metrics

We can ask about the influence of traits on species abundances, i.e. how much do traits explain out of the variation in species abundances

VP$R2T$Y

## [1] 0.01933062

# The traits explain only 1.93% of variation in species abundances

Let us then ask how much the traits explain out of the variation among the species in their responses to environmental covariates.

VP$R2T$Beta

## (Intercept)       Depth         SBT         SBS         Chl     SBT_sea 
## 0.011770572 0.010954368 0.024170906 0.011867357 0.007802353 0.023027623 
##     Chl_sea     Fishing 
## 0.008830540 0.026924321

barplot(VP$R2T$Beta[-1])

#The traits explain also not much of the variation in the species niches, with traits explaining most out of the variation in species responses to SBT, SBT_sea and fishing with ca. 2.5% each

4.7 Trait-enviornment relationships

Now let’s see how the predicted trait-environment responses are over the realized environmental gradient. Here we exemplified for all CWM trait responses as well as species richness with SBT as focal variable.

# Construct environmental Gradient based on fitted model, specify your focal variable. 
Gradient = constructGradient(m, focalVariable="SBT", ngrid = 25)
# make predictions based on fitted model 
predYgradient = predict(m,XData=Gradient$XDataNew, ranLevels = Gradient$rLNew, studyDesign = Gradient$studyDesignNew, expected = TRUE) 

par(mfrow=c(2,4))
# measure = T implies trait relationship to focal variable, index = 2 implies second trait of trait matrix (1 is the intercept). 

# traits ~ SBT 
plotGradient(m, Gradient, pred=predYgradient, measure="T", index = 2, showData =F)

## [1] 0.999

plotGradient(m, Gradient, pred=predYgradient, measure="T", index = 3, showData =F)

## [1] 0.422

plotGradient(m, Gradient, pred=predYgradient, measure="T", index = 4, showData =F)

## [1] 0.021

plotGradient(m, Gradient, pred=predYgradient, measure="T", index = 5, showData =F)

## [1] 0.977

plotGradient(m, Gradient, pred=predYgradient, measure="T", index = 6, showData =F)

## [1] 0.003

plotGradient(m, Gradient, pred=predYgradient, measure="T", index = 7, showData =F)

## [1] 0.984

plotGradient(m, Gradient, pred=predYgradient, measure="T", index = 8, showData =F)

## [1] 0.045

# Species  ~ SBT // If you specify measure = Y, this is the species response, with index being the number of species in the abu matrix
plotGradient(m, Gradient, pred=predYgradient, measure="Y", index = 50, showData =F) #here examplified for species 50, cod

## [1] 0

# Note that the scale of all relationships is very small for the traits

Plotting predicted trait distribution over space

S=rowSums(predY)
#predicted CWM
predT = (predY%*%m$Tr)/matrix(rep(S,m$nt),ncol=m$nt)

#make data frame for plotting
data<-data.frame(predT, coo)
colpal<-rev(brewer.pal(11,"RdYlBu")) # Color palette for the map

funPlot<-function(Var){
colnames(data)[which(colnames(data)==Var)]<-"Var"
ggplot(data, aes(x= Longitude, y= Latitude, fill= Var))+
  geom_tile(data=data,aes(x=Longitude,y=Latitude,fill= Var)) + # 
  scale_fill_gradientn(name = Var, colours=colpal, na.value = 'white')+
  borders(fill="gray44",colour="black") +
  coord_quickmap(xlim=c(range(data$Longitude)),ylim=c(range(data$Latitude)))+
  labs(x = "Lon",y="Lat")+
  theme(legend.position = c(0.93,0.8))+
  theme(legend.title = element_text(size = 8),legend.text = element_text(size = 7))+
  guides(shape = guide_legend(override.aes = list(size = 0.2)))+
  theme(panel.background = element_rect(fill=alpha('light blue', 0.4), colour = 'black'))
}
# list all output 
Metrics<-colnames(data)
Metrics

##  [1] "X.Intercept."       "Trophic.level"      "K"                 
##  [4] "Lmax"               "Lifespan"           "Age.maturity"      
##  [7] "Fecundity_log"      "Offspring.size_log" "Longitude"         
## [10] "Latitude"

#We plot the CWM of Trophic level, which is the 2nd variable
funPlot(Var=Metrics[2])# Plot CWM traits and diversity metrics

4.8 Species co-occurrance patterns at random effect level

Here we illustrate the residual association plot among species at the level of grid cell. Pairs of species illustrated by red and blue show positive and negative associations, respectively, with statistical support of at least 95% posterior probability

par(mfrow=c(1,1))
OmegaCor = computeAssociations(m)
supportLevel = 0.9
for (r in 1:m$nr){ 
  plotOrder = corrMatOrder(OmegaCor[[r]]$mean,order="AOE") 
  toPlot = ((OmegaCor[[r]]$support>supportLevel) +                                    (OmegaCor[[r]]$support<(1-supportLevel))>0)*OmegaCor[[r]]$mean 
  par(xpd=T) 
  colnames(toPlot)=rownames(toPlot)=gsub("_"," ",x=colnames(toPlot)) 
  corrplot(toPlot[plotOrder,plotOrder], method = "color", col=colorRampPalette(c("blue","white","red"))(200), title=paste("random effect level:",m$rLNames[r]),type="full",tl.col="black",tl.cex=.4, mar=c(0,0,6,0))
}

5. A brief excursion considering only occurrance data instead of abundance

We build a presence-absence model with the same structure and specifications as above. For this we need to change the link function to “probit” and transform the abundance data to presence-absence.

m = Hmsc(Y= 1*(abu>0), XData = env,  XFormula = XFormula, TrFormula = TrFormula, TrData = trait, phyloTree = tree, studyDesign = studyDesign, ranLevels = list("grid.cell"= rL),  distr = "probit")

Next we fit the model as above.

 m = sampleMcmc(m, thin = thin, samples = samples, transient = transient, nChains = nChains)

# compute predicted species occurrence matrix from posterior samples 
predY=computePredictedValues(m)

#evaluate model fit
MF = evaluateModelFit(hM = m, predY = predY)
MF

## $RMSE
##   [1] 0.2043474 0.2112074 0.2452425 0.3058084 0.2307849 0.3485816 0.2276085
##   [8] 0.2664577 0.2199623 0.2859988 0.3718063 0.2314359 0.2754418 0.1829732
##  [15] 0.1117695 0.2274460 0.3024150 0.2876661 0.2177659 0.1902863 0.2194968
##  [22] 0.1349278 0.1103794 0.2041797 0.2620985 0.2736369 0.1643216 0.2375135
##  [29] 0.1119734 0.2920832 0.2595443 0.2685171 0.2100611 0.2605150 0.1702702
##  [36] 0.2864073 0.2233830 0.2460489 0.2368907 0.3380119 0.2811300 0.1741510
##  [43] 0.2302583 0.1583545 0.3596973 0.3233873 0.2793448 0.1780917 0.2581432
##  [50] 0.1778816 0.3800001 0.2473961 0.2974098 0.3329670 0.2583362 0.1822965
##  [57] 0.2383175 0.2752547 0.2543954 0.1325111 0.2346596 0.2164220 0.2649561
##  [64] 0.2766946 0.2061530 0.2200791 0.2679550 0.2447704 0.2416894 0.2825703
##  [71] 0.2272519 0.2339338 0.2523387 0.2338831 0.2363901 0.2265008 0.2880394
##  [78] 0.4123583 0.1896096 0.1966470 0.2626998 0.2666296 0.2586606 0.2789196
##  [85] 0.2918235 0.2798699 0.2651267 0.2742749 0.2031151 0.3345763 0.2587659
##  [92] 0.2986930 0.1827315 0.1912792 0.3201268 0.2072077 0.1754651 0.2724417
##  [99] 0.1722335 0.2541437 0.1881461 0.3214160 0.2202247 0.2494580 0.1653224
## [106] 0.2242608 0.4324346 0.3492411 0.1767657 0.1827091 0.2555083 0.1501612
## [113] 0.2802601 0.2442998 0.2532104 0.1988362 0.2845569 0.2122642 0.1762328
## [120] 0.2956183 0.2106699 0.2638517 0.2324514 0.1995834 0.1204574 0.2286059
## [127] 0.2095511 0.1784304 0.3070114 0.1350802 0.2580376 0.1770515 0.1826851
## [134] 0.1120672 0.2027442 0.2687508 0.2148657 0.2503981 0.1339848 0.2621735
## [141] 0.2830376 0.1895614 0.3055313 0.2290751 0.1843219 0.1699762 0.1875599
## [148] 0.1318489
## 
## $AUC
##   [1] 0.9855643 0.9624724 0.9633987 0.9401018 0.9840217 0.9079710 0.9842262
##   [8] 0.9699015 0.9860160 0.9416462 0.8536232 0.9803783 0.9648426 0.9913511
##  [15] 0.9993197 0.9839074 0.9366899 0.9105373 0.9691120 0.9915179 0.9869702
##  [22] 0.9991810 0.9995610 0.9887574 0.9500000 0.9657612 0.9956083 0.9660929
##  [29] 0.9995069 0.9525307 0.9328774 0.9221477 0.9834152 0.9571846 0.9961240
##  [36] 0.8968391 0.9702204 0.9788173 0.9666145 0.9097479 0.8992921 0.9935255
##  [43] 0.9841737 0.9951691 0.8976731 0.9272183 0.9244217 0.9934118 0.9739423
##  [50] 0.9833244 0.8706525 0.9604863 0.9457594 0.8872093 0.9726661 0.9931694
##  [57] 0.9431579 0.9530572 0.9688073 0.9938424 0.9805855 0.9769444 0.9610909
##  [64] 0.9583708 0.9785965 0.9750000 0.8649381 0.9726284 0.9745617 0.9630641
##  [71] 0.9817876 0.9462074 0.9462677 0.9734889 0.9827781 0.9710145 0.9422637
##  [78] 0.8084746 0.9918699 0.9899972 0.9689755 0.8980868 0.9395109 0.9593364
##  [85] 0.9534017 0.9631505 0.9650231 0.9564004 0.9895050 0.9092437 0.9419540
##  [92] 0.9231283 0.9819820 0.9925750 0.9374825 0.9801309 0.9980023 0.9580339
##  [99] 0.9948552 0.9629630 0.9934063 0.9193548 0.9879518 0.9520115 0.9906602
## [106] 0.9781037 0.7862349 0.8730909 0.9885591 0.9870175 0.9049123 0.9940351
## [113] 0.9564086 0.9739007 0.9535058 0.9933989 0.9544283 0.9871302 0.9915849
## [120] 0.9528139 0.9863910 0.9673950 0.9697222 0.9904735 0.9980481 0.9819668
## [127] 0.9832578 0.9911957 0.8299320 0.9980267 0.9277193 0.9933977 0.9918699
## [134] 0.9996321 0.9803818 0.9662080 0.9747546 0.9577001 0.9987267 0.9445939
## [141] 0.9611814 0.9883629 0.9358844 0.9757394 0.9911406 0.9953623 0.9863946
## [148] 0.9986133
## 
## $TjurR2
##   [1] 0.7427867 0.5028891 0.6037010 0.5659250 0.7146765 0.4921621 0.7239853
##   [8] 0.6830180 0.7644368 0.4751269 0.3286146 0.7256685 0.6468773 0.7029917
##  [15] 0.9102108 0.7762363 0.5322472 0.3527813 0.5125437 0.8061841 0.7468028
##  [22] 0.8424068 0.9176976 0.7976506 0.4868334 0.6797622 0.8408287 0.5283133
##  [29] 0.9065774 0.6091045 0.4922892 0.2966961 0.7104487 0.5441139 0.7764817
##  [36] 0.2830684 0.5236197 0.7016190 0.5466084 0.4316732 0.2211566 0.7832787
##  [43] 0.7467659 0.8592530 0.4630760 0.4678506 0.3745661 0.8305136 0.6802259
##  [50] 0.7137340 0.3866086 0.5422406 0.5495501 0.3555109 0.7061227 0.8377863
##  [57] 0.3834153 0.5243009 0.6821491 0.8312885 0.7089314 0.5623340 0.6227560
##  [64] 0.6342641 0.4902411 0.5707337 0.1859354 0.6272984 0.7235269 0.6703727
##  [71] 0.7690837 0.3332472 0.3438651 0.6991626 0.7277365 0.6099776 0.5149654
##  [78] 0.2461256 0.7689551 0.8120902 0.6694316 0.2250497 0.3389160 0.5869128
##  [85] 0.5851379 0.6569832 0.6748573 0.5967688 0.7729210 0.4519119 0.3835149
##  [92] 0.3970930 0.6253413 0.8150964 0.5616674 0.6696742 0.8185810 0.6731251
##  [99] 0.7843435 0.5449114 0.8212888 0.4441025 0.7667406 0.4430943 0.8329886
## [106] 0.6785970 0.2419629 0.3396439 0.6627763 0.5878529 0.2922326 0.7134422
## [113] 0.5882655 0.7219765 0.6214107 0.7786137 0.5839923 0.7677816 0.7288576
## [120] 0.6008543 0.7557928 0.6130714 0.4826802 0.8047514 0.8983628 0.7785269
## [127] 0.7583392 0.7767699 0.1580521 0.8901874 0.3212448 0.7939858 0.8096178
## [134] 0.7636307 0.7542837 0.6548449 0.6414627 0.5224251 0.8941671 0.4636315
## [141] 0.6572211 0.7452532 0.5236303 0.7085883 0.7859511 0.8475944 0.6198922
## [148] 0.8928947

mean(MF$AUC)

## [1] 0.9613633

mean(MF$TjurR2)

## [1] 0.6226501

#Here, we take the mean of posterior samples for further processing
predY = apply(abind(predY,along=3),c(1,2), mean)

# Plot species specific R2 in relation to prevalence
plot(colSums(((m$Y>0)*1)/m$ny), MF$AUC,main=paste("Mean AUC = ", round(mean(MF$AUC),2),".", sep=""), xlab = "Prevalence")

Let’s jump straight to the model outputs we examined earlier (NOTE: this model did not converge as well and would need to run it longer for more reliable results)

par(mfrow=c(1,1))
# Specify groups of how the variation should be partitioned You can also combine groups 
group=c(1,2,3,4,5,6,7)
# Specify group names m$covNames[-1] gives the included covariate names excluding the intercept
groupnames = m$covNames[-1]
#compte species specific variance partitioning
VP = computeVariancePartitioning(hM = m, group = group, groupnames = groupnames)

plotVariancePartitioning(m, VP, viridis(8))

Note that e.g. SBT seems to contribute more to the explained variation for the occurrences of species (17.4%) compared to the abundances (computed above 10.8%)

# Shown are all species with 90% posterior support for having positive (red) or negative(blue) responses to environmental covariates
beta = getPostEstimate(m, "Beta")
plotBeta(m, beta, supportLevel=.9, spNamesNumbers = c(FALSE, FALSE), covNamesNumbers = c(TRUE, FALSE), plotTree = T)

## [1] 0.3 1.0 0.0 1.0

Also, responses of species to covariates structuring their presence or absence differ to those structuring abundances. See e.g. clear change of species response patterns in Chl and fishing (now positive) compared to the more negative responses for abundances calculated above.

gamma = getPostEstimate(m, "Gamma")
plotGamma(m, gamma, supportLevel=.9)

VP$R2T$Beta

## (Intercept)       Depth         SBT         SBS         Chl     SBT_sea 
##  0.08335066  0.10070599  0.12260729  0.06524582  0.13810405  0.20474257 
##     Chl_sea     Fishing 
##  0.12547593  0.11947270

VP$R2T$Y

## [1] 0.1431769

Here we also see more relationships with high statistical support of traits and environment. However, the patterns we saw earlier, i.e. positive relationships between SBT and Lifespan and Fecundity, are also found here.

We also discover that traits contribute substantially more to the variation of species occurrences than to species abundances (14.3% vs. ~2%).

Locking at VP$R2T$Beta we see that traits explain 20% of the variation among the species in their responses SBT_sea

# Construct environmental Gradient based on fitted model, specify your focal variable. 
Gradient = constructGradient(m, focalVariable="SBT", ngrid = 25)
# make predictions based on fitted model 
predYgradient = predict(m,XData=Gradient$XDataNew, ranLevels = Gradient$rLNew, studyDesign = Gradient$studyDesignNew, expected = TRUE) 

par(mfrow=c(2,4))
# measure = T implies trait relationship to focal variable, index = 2 implies second trait of trait matrix (1 is the intercept). 

# traits ~ SBT 
plotGradient(m, Gradient, pred=predYgradient, measure="T", 
             index = 2, showData =F, showPosteriorSupport=F)

## [1] 0.994

plotGradient(m, Gradient, pred=predYgradient, measure="T", 
             index = 3, showData =F, showPosteriorSupport=F)

## [1] 0.539

plotGradient(m, Gradient, pred=predYgradient, measure="T", 
             index = 4, showData =F, showPosteriorSupport=F)

## [1] 0.712

plotGradient(m, Gradient, pred=predYgradient, measure="T", 
             index = 5, showData =F, showPosteriorSupport=F)

## [1] 0.979

plotGradient(m, Gradient, pred=predYgradient, measure="T", 
             index = 6, showData =F, showPosteriorSupport=F)

## [1] 0.001

plotGradient(m, Gradient, pred=predYgradient, measure="T", 
             index = 7, showData =F, showPosteriorSupport=F)

## [1] 0.971

plotGradient(m, Gradient, pred=predYgradient, measure="T", 
             index = 8, showData =F, showPosteriorSupport=F)

## [1] 0.923

#in a presence absence probit model, measure S provides the response of species richness against your chosen covariate
plotGradient(m, Gradient, pred=predYgradient, measure="S", 
             index = 1, showData =F, showPosteriorSupport=F)

## [1] 0.998

Introduction to Hierarchical Modeling of Species Communities (HMSC) - trait environment responses

Weigel, B.

7th June 2021