I'm currently trying to predict the probability of rare events (~1%).
I have a large database of ~200,000 vectors (with ~2,000 positive examples) and ~200 features.
I'm trying to find the best features for my problem. What are the recommended methods? (Preferably in Python or R, but not necessarily.)
Best Answer
My first advice would be that unless identifying the informative features is itself a goal of the analysis, don't bother with feature selection: just use a regularised model, such as penalised logistic regression, ridge regression or an SVM, and let the regularisation handle the over-fitting. It is often claimed that feature selection improves classifier performance, but it isn't always true.
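As a concrete illustration, here is a minimal sketch of that approach in Python with scikit-learn; the data is randomly generated just to make the snippet runnable, so substitute your own `X` and `y`:

```python
# Minimal sketch (scikit-learn); X and y are stand-ins for your data.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 200))           # ~200 features, as in the question
y = (rng.random(5000) < 0.01).astype(int)  # rare positives (~1%)

# L2-penalised (ridge) logistic regression: the penalty strength is
# chosen by internal cross-validation instead of selecting features.
clf = LogisticRegressionCV(Cs=10, cv=5, penalty="l2",
                           scoring="neg_log_loss", max_iter=1000)
clf.fit(X, y)
probs = clf.predict_proba(X)[:, 1]  # predicted event probabilities
```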
To deal with the class imbalance, give different weights to the patterns from each class when computing the loss function used to fit the model, and choose the ratio of weights by cross-validation (for a probabilistic classifier you can work out the asymptotically optimal weights, but they generally won't give optimal results on a finite sample). If you are using a classifier that can't weight the classes differently, then sub-sample the majority class instead, where again the ratio of positive to negative patterns is determined by cross-validation (make sure the test partition in each fold of the cross-validation procedure has the same relative class frequencies you expect to see in operation).
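A hedged sketch of tuning the class-weight ratio by cross-validation, again with scikit-learn (the weight grid here is illustrative, and it reuses `X` and `y` from the sketch above). `StratifiedKFold` keeps the rare-class frequency the same in every test fold, matching the advice about class frequencies:

```python
# Hedged sketch: tune the class-weight ratio by cross-validation
# (reuses X, y from the previous sketch; the weight grid is illustrative).
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

grid = {"class_weight": [{0: 1, 1: w} for w in (1, 5, 10, 25, 50, 100)]}
# Stratification keeps each test partition's class frequencies equal to
# those of the full data, i.e. what you expect to see in operation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=cv,
                      scoring="neg_log_loss")
search.fit(X, y)
best_weight = search.best_params_["class_weight"]
```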
Lastly, in practical applications with a class imbalance it is often the case that false positives and false negatives are not equally serious, so incorporate the relative costs into the construction of the classifier.
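For a probabilistic classifier, one standard way to do this is at decision time: predict positive whenever the expected cost of a miss exceeds that of a false alarm, i.e. when p * c_FN > (1 - p) * c_FP, which gives the cutoff c_FP / (c_FP + c_FN) on the predicted probability p. A sketch with illustrative cost values:

```python
# Hedged sketch: decisions under asymmetric costs (cost values are
# illustrative). For a well-calibrated probabilistic classifier the
# expected-cost-minimising cutoff is c_fp / (c_fp + c_fn).
c_fp, c_fn = 1.0, 20.0            # a missed rare event is 20x worse here
threshold = c_fp / (c_fp + c_fn)  # ~0.048 with these costs
decisions = probs >= threshold    # probs from the first sketch
```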