# Solved – Distances for binary and non-binary categorical data

I am computing a distance matrix for categorical data. I am using the Jaccard distance because, as far as I understand, it should work properly with this kind of data. My data contain both binary and non-binary variables.

My question is: can I use the Jaccard method to compute distances for data that include both binary and non-binary variables (as in `Mydata` in the example below) without transforming the non-binary variables into binary ones? If the answer is no, is there an alternative, or do I have to transform every attribute into a (0,1) variable? A Jaccard implementation in `R` (function `vegdist` in package `vegan`) gives me results, but I cannot reproduce them by hand when I include both the binary and the non-binary attributes.

Here is an example of my data:

```
a <- c(1,1,0,0)
b <- c(0,1,0,1)
c <- c(3,2,1,0)
Mydata <- as.data.frame(cbind(a,b,c))

> Mydata
1 0 3
1 1 2
0 0 1
0 1 0
```

where the attribute `c` is the non-binary one, with possible values within (0,4). The `R` function gives me the following distance matrix for `Mydata`, but I am not able to reproduce it manually. For instance, the first element, `0.40`, is the distance between observations 1 and 2 across the 3 attributes.

```
     1    2    3
2 0.40
3 0.75 0.75
4 1.00 0.75 1.00
```
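For what it's worth, the values in this matrix appear to match the quantitative form of the Jaccard dissimilarity, 1 - sum(min) / sum(max), which is what `vegdist` computes when `binary = FALSE` (its default), rather than the presence/absence form. Below is a minimal base-R sketch under that assumption; `quant_jaccard` is a hypothetical helper name, not a function from `vegan`.

```r
# Toy data from the question
Mydata <- data.frame(a = c(1, 1, 0, 0),
                     b = c(0, 1, 0, 1),
                     c = c(3, 2, 1, 0))

# Quantitative Jaccard between two non-negative numeric vectors:
# 1 - (sum of element-wise minima) / (sum of element-wise maxima)
quant_jaccard <- function(x, y) 1 - sum(pmin(x, y)) / sum(pmax(x, y))

X <- as.matrix(Mydata)
n <- nrow(X)
D <- matrix(0, n, n)

# Fill the full pairwise matrix row by row
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    D[i, j] <- quant_jaccard(X[i, ], X[j, ])
  }
}

# Show the lower triangle, as vegdist does
as.dist(D)
```

For observations 1 and 2 this gives 1 - (1 + 0 + 2) / (1 + 1 + 3) = 0.40, which agrees with the first element of the matrix.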

If you are willing to treat `c` as a continuous variable, you can use Gower's dissimilarity coefficient, which handles a mixture of binary and continuous data. Treating ordered categorical variables this way can sometimes be done with no ill effects.

For your toy data, this would look like:

```
           obs1       obs2       obs3       obs4
obs1          0
obs2  .44444444          0
obs3  .55555556  .77777778          0
obs4          1  .55555556  .44444444          0
```
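The matrix above can be reproduced with a few lines of base R. This is only a sketch, assuming every variable (including the binary ones) is treated as numeric and scaled by its observed range, so that the Gower dissimilarity is the mean over variables of the range-scaled absolute differences. The variable names `ranges`, `scaled` and `gower` are illustrative.

```r
# Toy data from the question
Mydata <- data.frame(a = c(1, 1, 0, 0),
                     b = c(0, 1, 0, 1),
                     c = c(3, 2, 1, 0))

# Observed range (max - min) of each variable: 1, 1 and 3 here
ranges <- sapply(Mydata, function(x) diff(range(x)))

# Divide each column by its range
scaled <- sweep(as.matrix(Mydata), 2, ranges, "/")

# Manhattan distance sums the scaled absolute differences;
# dividing by the number of variables turns the sum into a mean
gower <- dist(scaled, method = "manhattan") / ncol(Mydata)
round(gower, 4)
```

For observations 1 and 2 this is (0/1 + 1/1 + 1/3) / 3 = 0.4444. In practice, `daisy(Mydata, metric = "gower")` from the `cluster` package is the usual route, and it also lets you declare variables as nominal or asymmetric binary, which would change the numbers.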
