Solved – Emphasize a link between two predictor variables (Machine Learning)

I am creating a machine learning application which will use logistic regression (though I haven't ruled out Bayesian regression). I have multiple predictor variables that I believe to be non-orthogonal. But let me emphasize that I believe my question is intrinsically different from those asked here regarding collinearity, as I am not too concerned about collinearity skewing my results (should I be?).

What concerns me more is whether logistic regression is powerful enough to take into account that there is some relationship between different features/dimensions. For instance, in text classification the word "base" would have a dramatically different impact when seen with the word "acid" (chemistry context) than with the word "structure" (engineering context).

Likewise, a measurement of an ambient temperature of 90 degrees Fahrenheit in Houston, TX wouldn't be statistically significant unless that temperature was registered in mid-January (don't worry, our climate isn't that screwed up yet!).

Whether it is text classification or some other kind of classification, are there any methods for helping the model determine when two features/dimensions are related?

EDIT:

I am currently reading up on n-grams, which look promising for text classification, but is there something similar for classification involving continuous values or constant values?

Thanks

I think what you need are interaction terms. In your Houston, TX example, temperature and date interact: we can say that there is something wrong with the climate only if the temperature is above a threshold and it is mid-January. A model that has temperature and date as separate inputs assumes their effects on the outcome are additive (in logistic regression, additive on the log-odds scale), so it cannot represent an effect of temperature that depends on the date. However, if you add an interaction term (e.g., temperature*date), the regression model can capture that dependence. The Wikipedia page on interaction might be useful: http://en.wikipedia.org/wiki/Interaction_(statistics)
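To make the idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration, not something from the question: the four synthetic readings, the +1/-1 coding of `hot` and `winter`, the "out of season" label, and the small hand-rolled gradient-descent fit (`fit_logistic` is a hypothetical helper, not a library function). It fits one logistic regression on the two features alone and one with their product added as a third feature.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    # weights[0] is the intercept; the rest pair up with the features in x.
    return sigmoid(weights[0] + sum(w * xj for w, xj in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    # Batch gradient descent on the mean log-loss, starting from all-zero weights.
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def log_loss(weights, rows, labels):
    total = 0.0
    for x, y in zip(rows, labels):
        p = predict(weights, x)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(rows)

def accuracy(weights, rows, labels):
    hits = sum((predict(weights, x) > 0.5) == (y == 1) for x, y in zip(rows, labels))
    return hits / len(rows)

# Hypothetical coded readings: hot = +1 (warm) or -1 (cold),
# winter = +1 (January) or -1 (July).
X_main = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
# Label 1 = out of season: warm in January or cold in July (hot * winter == +1).
y = [1, 0, 0, 1]

# Same readings with the interaction term hot * winter appended as a third feature.
X_inter = [(hot, winter, hot * winter) for hot, winter in X_main]

main_w = fit_logistic(X_main, y)
inter_w = fit_logistic(X_inter, y)
main_loss = log_loss(main_w, X_main, y)
inter_loss = log_loss(inter_w, X_inter, y)

print("main effects only: loss", round(main_loss, 3), "accuracy", accuracy(main_w, X_main, y))
print("with interaction:  loss", round(inter_loss, 3), "accuracy", accuracy(inter_w, X_inter, y))
```

On this toy data the main-effects model cannot do better than predicting 0.5 for every reading, so its log-loss stays at about 0.693 (ln 2), while the model with the product term separates the two classes. The data were deliberately chosen so that the outcome depends only on the combination of the two features; with real data the gain from an interaction term will usually be smaller. If you use scikit-learn, `PolynomialFeatures(interaction_only=True)` is one way to generate such product columns automatically.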
