I am computing a distance matrix for categorical data. I am using the Jaccard distance because, as far as I understand, it should work properly with this kind of data. My data contain BOTH binary and non-binary variables.

My question is: can I use the Jaccard method to compute distances for data that include BOTH binary and non-binary variables (as in `Mydata` in the example below) WITHOUT transforming the non-binary variables into binary ones? If the answer is no, is there an alternative, or do I have to transform every attribute into a (0,1) variable? A Jaccard implementation in `R` (function `vegdist` in package `vegan`) gives me results, but I am not able to reproduce them when I include both the binary and non-binary attributes.

I provide an example of the data I have:

    a <- c(1,1,0,0)
    b <- c(0,1,0,1)
    c <- c(3,2,1,0)
    Mydata <- as.data.frame(cbind(a,b,c))
    > Mydata
      a b c
    1 1 0 3
    2 1 1 2
    3 0 0 1
    4 0 1 0

where the attribute `c` is the non-binary one, with possible values within (0,4). The `R` function gives me the following distance matrix for `Mydata`, but I am not able to reproduce it manually. For instance, the first element, `0.40`, is the distance between observations 1 and 2 across the 3 attributes:

         1    2    3
    2 0.40
    3 0.75 0.75
    4 1.00 0.75 1.00
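For anyone trying to reproduce these numbers by hand, here is a minimal base-R sketch. It assumes `vegdist` was called with `method = "jaccard"` and its default `binary = FALSE`, in which case the `vegan` documentation describes the Jaccard index as 2B/(1+B), where B is the Bray-Curtis dissimilarity. Algebraically this equals 1 - sum(min(x, y)) / sum(max(x, y)), so the non-binary column enters through its actual values rather than through presence/absence. The helper `jaccard_quant` is a hypothetical name used only for illustration; it is not part of `vegan`.

```r
# Toy data from the question
a <- c(1, 1, 0, 0)
b <- c(0, 1, 0, 1)
c <- c(3, 2, 1, 0)
Mydata <- as.data.frame(cbind(a, b, c))

# Hypothetical helper (not a vegan function): quantitative Jaccard between
# two rows, written as 2B / (1 + B) with B the Bray-Curtis dissimilarity.
# Assumes the two rows are not both all zero (otherwise B is 0/0).
jaccard_quant <- function(x, y) {
  B <- sum(abs(x - y)) / sum(x + y)
  2 * B / (1 + B)
}

X <- as.matrix(Mydata)
n <- nrow(X)
D <- matrix(0, n, n, dimnames = list(1:n, 1:n))
for (i in 1:n) {
  for (j in 1:n) {
    if (i != j) D[i, j] <- jaccard_quant(X[i, ], X[j, ])
  }
}

# Lower triangle, in the same layout as the matrix shown above
jaccard_manual <- as.dist(D)
print(round(jaccard_manual, 2))
```

For observations 1 and 2 this gives B = 2/8 = 0.25 and 2B/(1+B) = 0.40, matching the first element of the matrix.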


#### Best Answer

If you are willing to treat `c` as a continuous variable, you can use Gower's dissimilarity coefficient, which handles a mixture of binary and continuous data. Treating ordered categorical variables this way can sometimes be done with no ill effects.

For your toy data, this would look like:

         obs1      obs2      obs3      obs4
    obs1 0
    obs2 .44444444 0
    obs3 .55555556 .77777778 0
    obs4 1         .55555556 .44444444 0
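The values above can be reproduced by hand with a short base-R sketch. It assumes the simplest form of Gower's coefficient: every variable gets equal weight, each variable contributes |x - y| divided by its observed range, and the contributions are averaged. For the binary columns the range is 1, so they contribute a plain 0/1 mismatch. The helper name `gower_pair` is hypothetical and used only for illustration.

```r
# Toy data from the question
a <- c(1, 1, 0, 0)
b <- c(0, 1, 0, 1)
c <- c(3, 2, 1, 0)
Mydata <- as.data.frame(cbind(a, b, c))

X <- as.matrix(Mydata)

# Observed range of each column (a: 1, b: 1, c: 3)
ranges <- apply(X, 2, function(v) diff(range(v)))

# Hypothetical helper: equal-weight Gower dissimilarity between two rows,
# treating every column as numeric and scaling by the column range.
gower_pair <- function(x, y) mean(abs(x - y) / ranges)

n <- nrow(X)
labels <- paste0("obs", 1:n)
G <- matrix(0, n, n, dimnames = list(labels, labels))
for (i in 1:n) {
  for (j in 1:n) {
    if (i != j) G[i, j] <- gower_pair(X[i, ], X[j, ])
  }
}

gower_manual <- as.dist(G)
print(round(gower_manual, 8))

# If the cluster package is installed, the following should agree for
# all-numeric columns (an assumption worth checking on your own data):
# cluster::daisy(Mydata, metric = "gower")
```

For obs1 and obs2 the per-variable contributions are 0, 1 and 1/3, whose mean is 0.4444. Note that this version treats the binary variables symmetrically, so a shared 0 counts as agreement, whereas Jaccard ignores joint absences.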
