Say I am working on a **binary classification problem** and that I have a feature matrix $X$ where some entries are **missing** (`NaN`). The rest of the entries in $X$ are real numbers.

How can I apply **SVMs** on this data?


#### Best Answer

This largely depends on the nature of your data. Ideally, a domain expert could specify the missing values. When no prior knowledge about your data exists, the following procedures are commonly used:

Replace the missing value with the mean or median (for continuous values) or the mode (for nominal values, where a mean or median is not defined) of that feature.
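As a minimal sketch of this first option, assuming scikit-learn is available: `SimpleImputer` fills each `NaN` with its column mean, and the result is passed to an SVM inside one pipeline, so the means are learned from the training data only. The toy data, the scaling step, and the RBF kernel are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data (made up for illustration): two real-valued features, some entries missing.
X = np.array([
    [1.0, 2.0],
    [np.nan, 1.5],
    [1.2, np.nan],
    [8.0, 9.0],
    [np.nan, 8.5],
    [7.5, np.nan],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Impute -> scale -> SVM. Fitting the pipeline learns the column means
# from the data passed to fit(), then trains the classifier on the filled matrix.
model = make_pipeline(
    SimpleImputer(strategy="mean"),  # "median" or "most_frequent" are also accepted
    StandardScaler(),                # assumption: SVMs usually benefit from scaled inputs
    SVC(kernel="rbf"),               # assumption: any SVC kernel could be used here
)
model.fit(X, y)

# Inspect the imputed matrix and predict on data that still contains NaNs.
X_imputed = model.named_steps["simpleimputer"].transform(X)
predictions = model.predict(X)
```

Keeping the imputer inside the pipeline means new instances with missing entries can be passed straight to `model.predict`, and cross-validation does not leak statistics from the held-out fold.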

or

Take the instance with missing value(s) as a query and search the data for its $K$ closest instances, using all features whose values are known. Each missing feature is then set to a value aggregated from the instance's nearest neighbors, for example their mean or median. The procedure above is the special case in which $K$ is set to the largest possible value, so that every instance counts as a neighbor. This approach requires a suitable distance function on the feature space; Euclidean distance is often used.
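The nearest-neighbor procedure can be sketched with NumPy alone. The helper name `knn_impute` is hypothetical, and the sketch makes a few assumptions that the answer does not spell out: distances are root-mean-square differences over the features observed in both rows (a Euclidean distance rescaled so rows with different numbers of shared features are comparable), only rows that actually have the missing feature can act as neighbors, and the aggregation is the mean.

```python
import numpy as np

def knn_impute(X, k):
    """Fill NaNs using the mean of the k nearest rows (hypothetical helper).

    Assumes every feature is observed in at least one row.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)

    for i in np.where(missing.any(axis=1))[0]:
        known = ~missing[i]  # features observed in the query row
        for j in np.where(missing[i])[0]:
            # Candidate neighbors: rows where feature j is observed.
            donors = np.where(~missing[:, j])[0]
            dists = []
            for d in donors:
                shared = known & ~missing[d]  # features observed in both rows
                if shared.any():
                    diff = X[i, shared] - X[d, shared]
                    dists.append(np.sqrt(np.mean(diff ** 2)))
                else:
                    dists.append(np.inf)  # nothing to compare on: treat as farthest
            nearest = donors[np.argsort(dists)[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled

# Toy data (made up for illustration): row 1 is missing its second feature.
X = np.array([
    [1.0, 2.0],
    [1.5, np.nan],
    [3.0, 6.0],
    [10.0, 20.0],
])

one_neighbor = knn_impute(X, k=1)[1, 1]    # copies the closest row's value
two_neighbors = knn_impute(X, k=2)[1, 1]   # averages the two closest rows
all_neighbors = knn_impute(X, k=3)[1, 1]   # equals the plain column mean
```

With `k` at its maximum the result coincides with mean imputation, which is the relationship between the two procedures described above. For real use, scikit-learn ships a `KNNImputer` class in `sklearn.impute` that can replace this sketch and be placed in front of an SVM in a pipeline.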
