I am an ecologist, mainly using the vegan package in R.
I have 2 abundance matrices (see data below):
matrix 1 / nrow = 6 replicates * 24 sites, ncol = 15 species (fish)
matrix 2 / nrow = 3 replicates * 24 sites, ncol = 10 taxa (macro-invertebrates)
In both matrices, the sites are the same. I want to obtain an overall resemblance (taking both matrices into account) between pairs of sites. I see 2 options:
option 1: averaging the fish and macro-invertebrate abundances over the replicates (at the site scale), cbind-ing the two mean-abundance matrices (nrow = 24 sites, ncol = 15 + 10 mean abundances), and calculating Bray-Curtis.
option 2: for each assemblage, calculating the Bray-Curtis dissimilarity between pairs of replicates, then calculating the distances between the site centroids, and finally summing the two distance matrices.
In case this is unclear, I have implemented both options in the R code below.
Could you please tell me whether option 2 is correct and more suitable than option 1?
Thank you in advance.
Pierre
R code examples are below.
data generation
library(plyr);library(vegan)
#assemblage 1: 15 fish species, 6 replicates per site
a1.env=data.frame(
  Habitat=paste("H",gl(2,12*6),sep=""),
  Site=paste("S",gl(24,6),sep=""),
  Replicate=rep(paste("R",1:6,sep=""),24))
summary(a1.env)
a1.bio=as.data.frame(replicate(15,rpois(144,sample(1:10,1))))
names(a1.bio)=paste("F",1:15,sep="")
a1.bio[1:72,]=2*a1.bio[1:72,]
#assemblage 2: 10 taxa of macro-invertebrates, 3 replicates per site
a2.env=a1.env[a1.env$Replicate%in%c("R1","R2","R3"),]
summary(a2.env)
a2.bio=as.data.frame(replicate(10,rpois(72,sample(10:100,1))))
names(a2.bio)=paste("I",1:10,sep="")
a2.bio[1:36,]=0.5*a2.bio[1:36,]
#environmental data at the site scale
env=unique(a1.env[,c("Habitat","Site")])
env=env[order(env$Site),]
OPTION 1, averaging abundance and cbind
a1.bio.mean=ddply(cbind(a1.bio,a1.env),.(Habitat,Site),numcolwise(mean))
a1.bio.mean=a1.bio.mean[order(a1.bio.mean$Site),]
a2.bio.mean=ddply(cbind(a2.bio,a2.env),.(Habitat,Site),numcolwise(mean))
a2.bio.mean=a2.bio.mean[order(a2.bio.mean$Site),]
bio.mean=cbind(a1.bio.mean[,-c(1:2)],a2.bio.mean[,-c(1:2)])
dist.mean=vegdist(sqrt(bio.mean),"bray")
OPTION 2, calculating for each assemblage the distances between centroids and summing the 2 distance matrices
a1.dist=vegdist(sqrt(a1.bio),"bray")
a1.coord.centroid=betadisper(a1.dist,a1.env$Site)$centroids
a1.dist.centroid=vegdist(a1.coord.centroid,"eucl")
a2.dist=vegdist(sqrt(a2.bio),"bray")
a2.coord.centroid=betadisper(a2.dist,a2.env$Site)$centroids
a2.dist.centroid=vegdist(a2.coord.centroid,"eucl")
summing the two distance matrices using Gavin Simpson's fuse() (from the analogue package)
library(analogue) #fuse() comes from the analogue package, not vegan
dist.centroid=fuse(a1.dist.centroid,a2.dist.centroid,weights=c(15/25,10/25))
summing the two Euclidean distance matrices (following a correction by Jari Oksanen)
dist.centroid=sqrt(a1.dist.centroid^2 + a2.dist.centroid^2)
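As a quick check of the reasoning behind this formula, here is a minimal base-R sketch. It assumes the idea is that binding two sets of coordinates side by side gives Euclidean distances equal to the square root of the summed squared distances. X and Y are made-up toy coordinates, not my data.

```r
# toy coordinates for 5 "sites" in two separate spaces (hypothetical data)
set.seed(1)
X <- matrix(rnorm(5 * 3), nrow = 5)
Y <- matrix(rnorm(5 * 2), nrow = 5)

d.x <- dist(X)                    # Euclidean distances in space 1
d.y <- dist(Y)                    # Euclidean distances in space 2
d.combined <- dist(cbind(X, Y))   # Euclidean distances on the joined coordinates
d.sumsq <- sqrt(d.x^2 + d.y^2)    # same formula as used for dist.centroid

# the two results should agree up to numerical tolerance
all.equal(as.vector(d.combined), as.vector(d.sumsq))
```

This only illustrates the arithmetic for ordinary real-valued coordinates; it says nothing about whether the centroid coordinates themselves are appropriate.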
and "coord.centroid" below for further distance-based analyses (is this correct?)
coord.centroid=cmdscale(dist.centroid,k=23,add=TRUE)
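One thing I am unsure about here: as far as I can tell from the documentation, cmdscale() returns a list rather than a plain matrix when add=TRUE, so the coordinates would sit in the $points component. A minimal sketch with made-up distances (the object d is hypothetical toy data):

```r
# toy Euclidean distances among 6 "sites" (hypothetical data)
set.seed(1)
d <- dist(matrix(rnorm(6 * 4), nrow = 6))

fit <- cmdscale(d, k = 2, add = TRUE)
is.list(fit)           # with add = TRUE the result is a list
names(fit)             # includes "points" among its components
coords <- fit$points   # site coordinates on the k axes
dim(coords)
```

If that is right, later analyses would need coord.centroid$points rather than coord.centroid itself.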
COMPARISON OF OPTION 1 AND 2
pco.mean=cmdscale(vegdist(sqrt(bio.mean),"bray"))
pco.centroid=cmdscale(dist.centroid)
comparison=procrustes(pco.centroid,pco.mean)
protest(pco.centroid,pco.mean)