I have calculated an NMDS from vegetation community data taken from two habitat types using the `metaMDS()`

function in `vegan`

. My goal is to see if habitat type helps explain community composition and it was suggested to me that I could use the site scores of the NMDS axes as response variables in a linear model with habitat type as an explanatory variable.

I have also been told to NEVER do this. It is a bit unclear to me why it is not appropriate however. I know that it is relatively common to include PCA scores/projection as variables in linear models, but I am told that the “arbitrary” nature of NMDS axes maxes this inappropriate and meaningless. An analogy was made to coordinate systems in that it would not make sense to use Northing data without Easting data, but I find this confusing considering that 1 dimensional distances are commonly used as explanatory variables.

I would appreciate any help in distinguishing why it is okay to use principal components as variables, but not NMDS axes.

**Contents**hide

#### Best Answer

As someone who has used that analogy and tells people that it is incorrect or inappropriate to take nMDS "axes" and use them in subsequent analyses as if they were principal components, I should probably address this.

Firstly, the analogy; when I have used this, perhaps I have not been clear enough to specifically mention the issue of *mapping*. Yes, people use easting and northing, or latitude and longitude as explanatory variables in subsequent models, but they do this because these variables *mean something*. By that I mean these variables are related to features of interest, such as climate. A species probably doesn't care nor know what latitude it's supposed to live at, but where that species is found is determined by many things, some of which are related in some way to the latitudes of the locations where that species is observed to inhabit.

What I have always meant to express with the analogy is that nMDS is also a mapping of the original dissimilarities into a (usually) 2D space. If you were to map locations of towns or cities in Europe say, you'd do a poor job if you only used one of the X or Y spatial coordinates. Put another way, seeing as these locations are already mapped, you'd do a pretty poor job of explaining or summarizing that map if you only used one of the X or Y spatial coordinates. You need both of these coordinates to do a reasonable job of preserving the distances between the locations.

Even that analogy is incomplete however. At least there, an easting and northing or a latitude and longitude describe locations of places in a system where the two axes of the space relate physically to something. These are not arbitrary coordinate systems at the scales at which they are used. It's not like latitude actually measures distances from a line passing through the earth at some randomly determined angle to the equator.

In an nMDS, the mapping you get is just that, a mapping. But there the "axes" are entirely arbitrary; you can rotate and reflect the resultant mapping an infinite number of ways and you will never change the information contained in the mapping or configuration of points. To me, it seems quite strange to think that including an arbitrary "axis" like this in a model makes any kind of intuitive sense. Why use the horizontal nMDS "axis" when a line at 7 degrees to the horizontal might better explain the spread of samples in the 2D space? Or 37 degrees, or 12? nMDS is trying to find a low-D mapping of the data that only tries to preserve the rank order of the original dissimilarities between sites. It is a step removed from the original data. PCA on the other hand is directly decomposing the data so that the largest directions of variation in the original data are represented using only the first few PCs.

Remember also that this is a mapping that aims to preserve in $k$ dimensions the *rank order* of the dissimilarities between sites. If you ask for a 2D mapping, you need both coordinates to preserve the resultant mapping. Neither of the horizontal or vertical "axes" correspond to the mapping you'd get if you asked for a 1D solution or a 3D solution. If you want an nMDS mapping you need to use the full solution. You can't pick and choose.

Why doing what you describe might work quite well in practice (not that that is a good reason to do anything) is because of tricks that are often employed in software that implements nMDS. In the **vegan** package for R, Jari Oksanen implemented some best-practice recommendations from the ecology literature. One of those involves rotating the resultant nMDS mapping using principal components so that we get a nicer map; what this does it set the direction of greatest spread in the locations of samples in the original nMDS mapping as the new horizontal axis (NMDS1), and the next largest direction of spread (subject to that direction being orthogonal to the first) is used as the new vertical axis of the plot (NMDS2). Now, if there are strong gradients in your species data, these should be evident in the dissimilarities and evident in the nMDS mapping, and because we just rotated the mapping using principal components, it is not surprising that NMDS1 can also be interpreted in terms of the environment or as representing some component of *species compositional change* for use in a subsequent model. So taking NMDS1 *only* works as well as it does in practice because we made the arbitrary choice to rotate the original nMDS solution using PCA.

Finally, a comment on the use of PCA axes. Often people just take PC1. Or perhaps they take PCs 1-3 because some test or other suggested that they were the "significant" axes. What they are doing is taking the coordinates of a mapping of the data into 1 or 3 dimensions and then using all of the dimensions that represent structure in the original data. That some people sometimes take PC2 only, reflects the fact that the individual PCs are orthogonal to one another — PC2 explains a different part of the variation in the original data set (when decomposed via linear combinations of the original variables). nMDS solutions are not necessarily (and are unlikely to be) orthogonal *in terms of the original data* (the fact that in some software we do rotation so that in the rotated coordinate system the new axes are orthogonal to one another, this is not with respect to the original data).

in short nMDS "axes" and PCA axes are two very different things and shouldn't be thought of as being the same kind of thing.

This is not to say that you can't take the nMDS mapping and use it in a model; we already do that in two different ways in **vegan** for example. In `envfit()`

we fit the linear model

$$y = beta_1 X_1 + beta_2 X_2 + varepsilon$$

where $y$ is the thing we're interested in and $X_1$ and $X_2$ are the coordinates on NMDS1 and NMDS2 respectively. This fits a plane through the mapping in terms of these two coordinates. You could extend this to an interaction term too, although because of the PCA rotation we do with `metaMDS()`

, this is probably not that useful if you're assuming a linear model. I also suspect that this is going to give you a terrible fit if the truth is not linear and I think we would have good reason to think the underlying relationships are nonlinear because the nMDS only aims to preserve the rank order of the original dissimilarities.

We also fit a nonlinear model of a similar nature using `ordisurf()`

, which fits the GAM

$$y = f(X_1, X_2) + varepsilon$$

with the same meaning for $X_1$ and $X_2$. Now the model describes a smooth surface.

If you were to include something like the latter in a subsequent model, then I would argue that this usage is perfectly acceptable. What I disagree with is why it makes any sense whatsoever to pick and choose from the resulting nMDS "axes" to use one of them in a model. What the GAM/`ordisurf()`

method is doing is determining if there is any relationship between the *spatial configuration* of the samples in nMDS space and the response. As we're using the configuration as a whole, we're respecting the way that configuration was determined and the special nature of the rank-order preserving behaviour of Euclidean distances among the samples in the configuration.