I have a training set where the inputs and outputs are all present, but I suspect that in the data where I want to do prediction, I will occasionally encounter cases where a small fraction of the input features are missing. Are there any machine learning methods that, once learning is complete, can provide reasonable predictions despite missing inputs like this? If it matters, I'm looking for real-valued predictions (ideally multivariate, as I have 2 outputs to predict per input set).
Best Answer
Substituting the mean value for a missing input is problematic and can lead to poor results. A principled way to tackle this problem is described in this paper. The idea is to formulate the problem as a probabilistic model in which the missing components are treated as hidden variables, and to use the EM algorithm to estimate them. The paper also explains why using the mean value is not recommended.
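To make the hidden-variable idea concrete, here is a minimal sketch of EM for one simple probabilistic model: a single multivariate Gaussian fitted to data containing NaNs. The Gaussian model, the function name `em_gaussian_impute`, and the synthetic data are all assumptions chosen for illustration, and this is not necessarily the model used in the paper. The E-step replaces each missing entry with its conditional expectation given the observed entries in the same row; the M-step re-estimates the mean and covariance, including the conditional covariance of the missing entries.

```python
import numpy as np

def em_gaussian_impute(X, n_iter=50):
    """Fit a multivariate Gaussian to data with NaNs via EM.

    Returns (mu, sigma, X_filled). Hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    missing = np.isnan(X)
    # Initialise from per-column statistics of the observed entries.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    X_filled = np.where(missing, mu, X)
    for _ in range(n_iter):
        cov_correction = np.zeros((d, d))
        # E-step: expected value and covariance of the missing entries per row.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                # gain = S_mo S_oo^{-1}, the regression of missing on observed.
                gain = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                X_filled[i, m] = mu[m] + gain @ (X[i, o] - mu[o])
                cond_cov = sigma[np.ix_(m, m)] - gain @ sigma[np.ix_(o, m)]
            else:
                # Nothing observed in this row: fall back to the prior.
                X_filled[i, m] = mu[m]
                cond_cov = sigma[np.ix_(m, m)]
            cov_correction[np.ix_(m, m)] += cond_cov
        # M-step: update parameters from the completed data.
        mu = X_filled.mean(axis=0)
        centered = X_filled - mu
        sigma = (centered.T @ centered + cov_correction) / n
    return mu, sigma, X_filled

# Synthetic, strongly correlated data with about 20% of entries removed.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X_true = np.array([1.0, -2.0, 0.5]) + z + 0.3 * rng.normal(size=(500, 3))
mask = rng.random(X_true.shape) < 0.2
X_obs = X_true.copy()
X_obs[mask] = np.nan

mu, sigma, X_em = em_gaussian_impute(X_obs)
X_mean = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)

rmse_em = np.sqrt(np.mean((X_em[mask] - X_true[mask]) ** 2))
rmse_mean = np.sqrt(np.mean((X_mean[mask] - X_true[mask]) ** 2))
print(f"RMSE on missing entries: EM={rmse_em:.3f}, mean fill={rmse_mean:.3f}")
```

On correlated features like these, the EM estimate can use the observed entries of a row to guess the missing ones, which mean substitution ignores entirely.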
If your model is a probabilistic graphical model, then you can just integrate (marginalize) over the missing components. This gives you the distribution of the output given the values of the observed components, averaged over all possible combinations of the missing values, from which you can read off the most likely output.
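As a rough sketch of what integrating over the missing components looks like in the simplest case, assume inputs and outputs are jointly Gaussian (an assumption for illustration, not something stated in the question). Marginalizing a Gaussian over missing inputs amounts to dropping their rows and columns, so the prediction is the conditional mean of the outputs given only the observed inputs. The helper names `fit_joint_gaussian` and `predict_with_missing` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_joint_gaussian(X, Y):
    """Estimate mean and covariance of the stacked vector [x, y] from complete data."""
    Z = np.hstack([X, Y])
    return Z.mean(axis=0), np.cov(Z, rowvar=False)

def predict_with_missing(x, mu, sigma):
    """Return E[y | observed entries of x]; NaN marks a missing input."""
    x = np.asarray(x, dtype=float)
    n_inputs = x.shape[0]
    obs = np.flatnonzero(~np.isnan(x))
    out = np.arange(n_inputs, mu.shape[0])
    if obs.size == 0:
        return mu[out].copy()  # nothing observed: fall back to the output mean
    S_oo = sigma[np.ix_(obs, obs)]
    S_yo = sigma[np.ix_(out, obs)]
    # Conditional mean of a Gaussian: mu_y + S_yo S_oo^{-1} (x_o - mu_o).
    return mu[out] + S_yo @ np.linalg.solve(S_oo, x[obs] - mu[obs])

# Synthetic example: 3 correlated inputs, 2 real-valued outputs.
rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0], [2.0, -1.0], [0.0, 3.0]])

def make_data(n):
    X = rng.normal(size=(n, 1)) + 0.3 * rng.normal(size=(n, 3))
    Y = X @ W + 0.1 * rng.normal(size=(n, 2))
    return X, Y

X_train, Y_train = make_data(2000)  # complete training set
X_test, Y_test = make_data(500)
mu, sigma = fit_joint_gaussian(X_train, Y_train)

# Single prediction with all inputs, then with the second input missing.
y_full = predict_with_missing(X_test[0], mu, sigma)
x_part = X_test[0].copy()
x_part[1] = np.nan
y_part = predict_with_missing(x_part, mu, sigma)

# Compare marginalization against mean substitution on the whole test set.
X_miss = X_test.copy()
X_miss[:, 1] = np.nan
pred_cond = np.array([predict_with_missing(x, mu, sigma) for x in X_miss])
X_fill = np.where(np.isnan(X_miss), mu[:3], X_miss)
pred_fill = np.array([predict_with_missing(x, mu, sigma) for x in X_fill])
mse_cond = np.mean((pred_cond - Y_test) ** 2)
mse_fill = np.mean((pred_fill - Y_test) ** 2)
print(f"MSE with input 1 missing: marginalized={mse_cond:.3f}, mean fill={mse_fill:.3f}")
```

The model is trained once on complete data, and the same fitted parameters handle any pattern of missing inputs at prediction time, which matches the setting in the question. For non-Gaussian data a richer model (for example a mixture) would be needed, but the marginalization step follows the same pattern.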