How to run a logistic regression with multiple related dependent variables

Background
In a small online study (n=100) I asked people which products they would purchase. The choice set contained 20 products, and for each one respondents had to indicate whether they would buy it or not (the DV, in yes/no format). Respondents could say yes to any number of products.
The IVs were product and presentation characteristics.

Data format
One row per respondent (resp_id), with buy_prod_1, buy_prod_2 … and the IVs in columns

Previous analyses
I ran a “simple” logistic regression in Stata. For this I reshaped the data so that there is one row for each respondent-product combination (100*20 = 2,000 rows, with buy_prod and the IVs in columns). buy_prod was the DV in the logistic regression.
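For readers who want to see the reshape spelled out, here is a minimal sketch in plain Python (not Stata). The function name stack_wide_to_long and the toy data with three products are illustrative assumptions, and the IV columns are left out for brevity.

```python
def stack_wide_to_long(wide_rows, n_products):
    """Turn one row per respondent into one row per respondent-product pair."""
    long_rows = []
    for row in wide_rows:
        for j in range(1, n_products + 1):
            long_rows.append({
                "resp_id": row["resp_id"],         # kept so rows can be grouped by respondent later
                "product": j,                      # which product this row refers to
                "buy_prod": row[f"buy_prod_{j}"],  # the yes/no DV for that product
            })
    return long_rows


# Toy wide data (hypothetical): 2 respondents, 3 products
wide = [
    {"resp_id": 1, "buy_prod_1": 1, "buy_prod_2": 0, "buy_prod_3": 1},
    {"resp_id": 2, "buy_prod_1": 0, "buy_prod_2": 0, "buy_prod_3": 1},
]
long_rows = stack_wide_to_long(wide, n_products=3)
print(len(long_rows))  # 2 respondents * 3 products = 6 rows
```

Keeping resp_id on every stacked row is what later lets a model treat the 20 answers from one person as a group.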

Problem
I do not know how to account for the fact that each respondent's product choices are not independent, i.e. how to combine the different DVs correctly in one model.

It sounds like you have run a binary discrete choice experiment. Combining your 20 products the way you did is often called "stacking", and most software for analyzing choice models expects data in this format. There are two common ways of modeling the non-independence of the choices:

  • Random parameters logit model, estimating a multivariate normal or some other distribution for the parameters (also sometimes referred to as "mixed logit").
  • Latent class logit, estimating a series of segments, each with a separate set of parameters.
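To make the difference between the two approaches concrete, here is a rough sketch in plain Python of the likelihood each one works with. It rests on simplifying assumptions that are not in the post: a single product characteristic x, only the intercept varying across respondents, and made-up toy data. All function names are hypothetical, and this is not what any Stata routine does internally. In both cases the probability of a respondent's whole sequence of answers is computed first and then averaged, over normal draws for the random parameters model and over segments for the latent class model. This is how the dependence between one person's choices enters.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sequence_prob(choices, xs, intercept, beta):
    """Probability of one respondent's whole yes/no sequence for fixed parameters."""
    p = 1.0
    for y, x in zip(choices, xs):
        p_yes = logistic(intercept + beta * x)
        p *= p_yes if y == 1 else 1.0 - p_yes
    return p


def mixed_logit_loglik(respondents, beta, mu, sigma, n_draws=500, seed=1):
    """Simulated log-likelihood with a respondent-level intercept ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    # Standard normal draws, reused for every respondent
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    total = 0.0
    for choices, xs in respondents:
        # Average the sequence probability over draws of the random intercept
        avg = sum(sequence_prob(choices, xs, mu + sigma * d, beta)
                  for d in draws) / n_draws
        total += math.log(avg)
    return total


def latent_class_loglik(respondents, class_params, class_shares):
    """Log-likelihood of a latent class logit: a finite mixture over segments."""
    total = 0.0
    for choices, xs in respondents:
        # Weight each segment's sequence probability by its share
        mix = sum(share * sequence_prob(choices, xs, intercept, beta)
                  for share, (intercept, beta) in zip(class_shares, class_params))
        total += math.log(mix)
    return total


# Toy stacked data (hypothetical): per respondent, (choices, product characteristic x)
respondents = [
    ([1, 0, 1], [0.5, -1.0, 2.0]),
    ([0, 0, 1], [0.5, -1.0, 2.0]),
]
ll_mixed = mixed_logit_loglik(respondents, beta=0.8, mu=-0.2, sigma=1.0)
ll_lc = latent_class_loglik(respondents,
                            class_params=[(-1.0, 0.5), (0.5, 1.2)],
                            class_shares=[0.6, 0.4])
print(ll_mixed, ll_lc)
```

Estimation would then mean maximizing one of these functions over its parameters with an optimizer. With sigma set to 0, or with a single segment, both reduce to the ordinary stacked logit likelihood.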

I believe that both of these models can be estimated in Stata.

Train's Discrete Choice Methods with Simulation is the bible in this particular field.
