Solved – PCA/factor analysis of mixed (quantitative + qualitative) data: inconsistent results

I have a dataset composed of 4 variables: 2 numerical and 2 categorical (ordinal, in fact). All four are indicators/measures of the same phenomenon.
I want to analyse them in a multivariate way. I first tried a PCA on the 4 variables (coercing the ordinal variables to numeric, which is sometimes suggested) and got this graph:

[Figure: PCA plot of the four variables]

Then I tried a FAMD (factor analysis of mixed data) with the FactoMineR package, which had been recommended to me. Unfortunately, there is not much documentation about it.
This is the output:
[Figure: FAMD variables plot]

And the other output (observations + levels):
[Figure: FAMD plot of observations and factor levels]

My problem: the FAMD variable graph gives completely different results. From the data (and the PCA), quanti2 and quali2 appear to be closely related, but that is not what the FAMD variable plot shows. Why?

Moreover, in the second FAMD graph (observations), I get this "V" shape. How can I interpret it and draw conclusions about the relationship between these 4 indicators?

And of course, if you have a more clever way to analyse this dataset, please explain it!

I dput my data here:

    data <- structure(list(
      quanti1 = c(0.57, 0.56, 0.46, 0.63, 0.71, 0.66, 0.48, 0.39, 0.57, 0.78,
                  0.67, 0.63, 0.55, 0.62, 0.66, 0.5, 0.5, 0.41, 0.5, 0.46,
                  0.53, 0.59, 0.58, 0.66, 0.62, 0.65, 0.58, 0.62, 0.66, 0.67,
                  0.66, 0.59, 0.41, 0.57, 0.6, 0.42, 0.48, 0.44, 0.47),
      quanti2 = c(3.01, 2.71, 2.51, 5.26, 5.36, 2.66, 3.01, 5.31, 4.71, 5.76,
                  7.01, 5.96, 4.01, 2.86, 5.26, 3.26, 4.51, 3.41, 2.61, 3.66,
                  3.01, 3.76, 4.26, 4.01, 4.76, 4.66, 2.76, 3.96, 5.01, 6.16,
                  7.86, 5.96, 2.51, 3.21, 5.51, 4.41, 4.01, 2.21, 2.51),
      quali1 = structure(c(3L, 2L, 1L, 4L, 4L, 3L, 2L, 3L, 3L, 4L, 4L, 4L, 3L,
                           2L, 4L, 3L, 3L, 2L, 1L, 3L, 2L, 4L, 3L, 4L, 4L, 4L,
                           3L, 4L, 4L, 4L, 4L, 4L, 1L, 3L, 3L, 3L, 3L, 1L, 1L),
                         .Label = c("1", "2", "3", "4"), class = "factor"),
      quali2 = structure(c(4L, 3L, 3L, 7L, 4L, 4L, 2L, 4L, 5L, 7L, 7L, 6L, 6L,
                           4L, 7L, 4L, 5L, 3L, 2L, 5L, 4L, 5L, 5L, 5L, 7L, 6L,
                           2L, 5L, 5L, 7L, 7L, 7L, 2L, 5L, 6L, 4L, 4L, 1L, 5L),
                         .Label = c("1", "2", "3", "4", "5", "6", "7"), class = "factor")),
      .Names = c("quanti1", "quanti2", "quali1", "quali2"),
      row.names = c(NA, -39L), class = "data.frame")

    library(FactoMineR); library(dplyr)
    lapply(data, as.numeric) %>% as.data.frame %>% PCA
    FAMD(data)

This is a very small sample for factor-analytic procedures. That aside, I used the data to fit a unidimensional factor model with lavaan. Here is the code:

    # load package
    library(lavaan)

    # lavaan requires the categorical indicators to be ordered factors
    data$quali1 <- factor(data$quali1, ordered = TRUE)
    data$quali2 <- factor(data$quali2, ordered = TRUE)

    # specify a unidimensional model
    mod1 <- 'f =~ quanti1 + quanti2 + quali1 + quali2'
    # estimate model parameters (std.lv = TRUE fixes the factor variance to 1)
    mod1.cfa <- cfa(model = mod1, data = data, ordered = c("quali1", "quali2"), std.lv = TRUE)
    # assess model fit
    fitMeasures(mod1.cfa)
    # show results, including standardized estimates
    summary(mod1.cfa, standardized = TRUE)

The results are clearly in favour of a unidimensional model!

Note, however, that statistics textbooks usually recommend at least n = 50 subjects for factor analysis, and even that only when the factor structure is clear and the model is not too complex. This dataset has only 39 observations.
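Given the small sample, a rough, model-free cross-check of the unidimensional reading may be useful. If a single dimension drives all four indicators, their pairwise rank correlations should be clearly positive and the first eigenvalue of the correlation matrix should dominate. The sketch below uses base R only. The name `dat` is mine, and storing the ordinal variables as integer codes is an assumption that is harmless here because the factor labels ("1" to "4" and "1" to "7") coincide with the codes. It is an informal check, not a substitute for the model above.

```r
# Data from the question, rebuilt compactly (same values as the dput output).
quanti1 <- c(0.57, 0.56, 0.46, 0.63, 0.71, 0.66, 0.48, 0.39, 0.57, 0.78,
             0.67, 0.63, 0.55, 0.62, 0.66, 0.5, 0.5, 0.41, 0.5, 0.46,
             0.53, 0.59, 0.58, 0.66, 0.62, 0.65, 0.58, 0.62, 0.66, 0.67,
             0.66, 0.59, 0.41, 0.57, 0.6, 0.42, 0.48, 0.44, 0.47)
quanti2 <- c(3.01, 2.71, 2.51, 5.26, 5.36, 2.66, 3.01, 5.31, 4.71, 5.76,
             7.01, 5.96, 4.01, 2.86, 5.26, 3.26, 4.51, 3.41, 2.61, 3.66,
             3.01, 3.76, 4.26, 4.01, 4.76, 4.66, 2.76, 3.96, 5.01, 6.16,
             7.86, 5.96, 2.51, 3.21, 5.51, 4.41, 4.01, 2.21, 2.51)
# Ordinal variables kept as integer codes (assumption: labels equal codes).
quali1 <- c(3, 2, 1, 4, 4, 3, 2, 3, 3, 4, 4, 4, 3, 2, 4, 3, 3, 2, 1, 3,
            2, 4, 3, 4, 4, 4, 3, 4, 4, 4, 4, 4, 1, 3, 3, 3, 3, 1, 1)
quali2 <- c(4, 3, 3, 7, 4, 4, 2, 4, 5, 7, 7, 6, 6, 4, 7, 4, 5, 3, 2, 5,
            4, 5, 5, 5, 7, 6, 2, 5, 5, 7, 7, 7, 2, 5, 6, 4, 4, 1, 5)
dat <- data.frame(quanti1, quanti2, quali1, quali2)

# Spearman correlations use only the ordering of the values, so the
# ordinal variables are not treated as if they were interval-scaled.
rho <- cor(dat, method = "spearman")
print(round(rho, 2))

# Eigenvalues of the rank-correlation matrix; a single dominant eigenvalue
# is consistent with (but does not prove) one underlying dimension.
ev <- eigen(rho, symmetric = TRUE)$values
prop_first <- ev[1] / sum(ev)
print(round(ev, 2))
print(round(prop_first, 2))
```

Spearman correlations are used rather than Pearson so that the check depends only on the order of the ordinal levels, not on the spacing between them.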
