Solved – SVM prediction for instances with missing features

Say I am working on a binary classification problem and that I have a feature matrix $X$ where some entries are missing (NaN). The rest of the entries in $X$ are real numbers.

How can I apply SVMs on this data?

This largely depends on the nature of your data. Ideally, a domain expert could supply the missing values. When no prior knowledge about the data is available, the following procedures are commonly used:

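  • Replace the missing value with the mean (for continuous features) or the median (for ordinal features) of that feature. For nominal features, whose categories have no order, use the mode (the most frequent value) instead.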

    or

  • Treat the instance with missing value(s) as a query and search the data for the $K$ instances closest to it, using only the features whose values are known. Each missing feature is then set to an aggregate (e.g., the mean or median) of that feature's values among the query's nearest neighbors. The procedure above is the special case of this one in which $K$ is as large as possible (every instance with that feature known counts as a neighbor) and the aggregation is the same statistic used there. This procedure requires a suitable distance function on the feature space. Euclidean distance, computed over the features known in both instances, is often used, ideally after standardizing the features so that no single feature dominates the distance.
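The two procedures above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `mean_impute` and `knn_impute` are hypothetical, and the distance (Euclidean over the features observed in both instances, rescaled by the fraction of shared features, similar in spirit to scikit-learn's `nan_euclidean_distances`) is an assumed choice. The last lines check that mean imputation is indeed the special case with the largest possible $K$.

```python
import numpy as np


def mean_impute(X):
    """Fill each NaN with the mean of the observed values in its column.

    Suitable for continuous features only; nominal features would need the
    mode instead (not handled in this sketch).
    """
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)        # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))       # coordinates of the missing entries
    X[rows, cols] = col_means[cols]
    return X


def knn_impute(X, k, agg=np.mean):
    """Fill each NaN from the k nearest rows that have that feature observed.

    Assumed distance: Euclidean over the features observed in BOTH rows,
    scaled by (total features / shared features) so that rows sharing only
    a few features do not look artificially close.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n, d = X.shape
    observed = ~np.isnan(X)

    for i in range(n):
        missing = np.where(~observed[i])[0]
        if missing.size == 0:
            continue  # complete row, nothing to impute

        # Distance from the query row i to every other row.
        dists = np.full(n, np.inf)  # rows with no shared features stay at inf
        for j in range(n):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(d / shared.sum() * np.sum(diff ** 2))

        for f in missing:
            # Donors: rows where feature f is known (row i is excluded
            # automatically because f is missing there). If f is missing in
            # every row, there are no donors and the entry stays NaN.
            donors = np.where(observed[:, f])[0]
            nearest = donors[np.argsort(dists[donors], kind="stable")][:k]
            out[i, f] = agg(X[nearest, f])  # e.g. np.mean or np.median
    return out


# Toy data: 4 instances, 3 features, two missing entries.
X = np.array([
    [1.0, 2.0, np.nan],
    [2.0, np.nan, 6.0],
    [3.0, 4.0, 9.0],
    [10.0, 20.0, 30.0],
])

print(knn_impute(X, k=1))
# With k at least the number of donors, kNN imputation with np.mean
# reproduces plain mean imputation.
print(np.allclose(knn_impute(X, k=len(X)), mean_impute(X)))  # True
```

Once the matrix has no NaNs, any standard SVM implementation (for example scikit-learn's `SVC`) can be trained on it. For prediction, the imputation statistics or the neighbor pool should come from the training data only, so that test instances are filled in the same way. scikit-learn also provides `SimpleImputer` and `KNNImputer` for these two procedures.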
