Solved – Do we have to scale new unseen feature data for prediction

In machine learning, most algorithms require some kind of feature scaling to reduce error. This is my code:

# ensure results are repeatable
set.seed(7)
# load the library
library(caret)
# load the dataset
data(iris)
head(iris)
X <- scale(iris[,-5])
X <- data.frame(X)
head(X)
y <- iris[,5]
y <- data.frame(y)
head(y)
X <- cbind(X, y)
# prepare training scheme
control <- trainControl(method="repeatedcv", number=5, repeats=1)
# train the model
model <- train(y~., data=X, method="svmLinear2", trControl=control, tuneLength=5)
# summarize the model
print(model)
# saving model
save(model, file="model.Rdata")
# loading model
supmod <- load("model.Rdata")
# new data
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 4.2          3.2          1.7         0.23
new <- c(4.2, 3.2, 1.7, 0.23)
pre <- predict(supmod, new) # don't know how to predict with this model on unseen data

In the above code I have two questions: one about scaling the new data, and the other about a coding error when passing the new data to the loaded model.

The real iris feature data looks like this:

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5         0.2
5          5.0         3.6          1.4         0.2
6          5.4         3.9          1.7         0.4

But before passing it to the SVM algorithm we have to scale the data. I use scale() for this, and the result looks like this:

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1   -0.8976739  1.01560199    -1.335752   -1.311052
2   -1.1392005 -0.13153881    -1.335752   -1.311052
3   -1.3807271  0.32731751    -1.392399   -1.311052
4   -1.5014904  0.09788935    -1.279104   -1.311052
5   -1.0184372  1.24503015    -1.335752   -1.311052
6   -0.5353840  1.93331463    -1.165809   -1.048667

It is this scaled data that we use for training and testing the model. Let's say I have successfully trained the model and want to use it to predict new unseen data (e.g. this one row):

Sepal.Length Sepal.Width Petal.Length Petal.Width
         4.2         3.2          1.7        0.23
  1. Do I need to scale this new data, or can I pass it to the model directly?
  2. The next question is related to a coding error

predict(supmod, new) returns this error:

Error in UseMethod("predict") : no applicable method for 'predict'
applied to an object of class "character"

1) You should scale the new data as well. If possible, you can scale all the data, training and new, together. Otherwise, store the scaling parameters and apply them to the new data later. If your data d is normally distributed with, say, mean m and standard deviation s, you scale it as (d - m) / s. Apply this same transformation to the new data, using the mean and standard deviation computed on the training data.
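In R, scale() stores the per-column means and standard deviations as attributes of its result, so one way to reuse them on new data is the following sketch (variable names are illustrative):

```r
# scale the training features and keep the fitted parameters
X_train <- scale(iris[, -5])
centers <- attr(X_train, "scaled:center")  # per-column means
scales  <- attr(X_train, "scaled:scale")   # per-column standard deviations

# apply the SAME transformation to a new observation
new <- data.frame(Sepal.Length = 4.2, Sepal.Width = 3.2,
                  Petal.Length = 1.7, Petal.Width = 0.23)
new_scaled <- scale(new, center = centers, scale = scales)
```

This guarantees the new row lives in the same scaled space the model was trained in.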

2) You can't assign the data you load directly like this:

# loading model
supmod <- load("model.Rdata")

The resulting variable supmod only contains the character string "model" (the name of the loaded object), not the model itself.

Try this:

load("model.Rdata")

This loads the model into the workspace; the variable is named "model".
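To illustrate the load() behavior with a small sketch: load() returns the names of the restored objects as a character vector, so get() can fetch the object if you only have that name (the lm model here is just a stand-in):

```r
m <- lm(Sepal.Length ~ Sepal.Width, data = iris)  # any model object
save(m, file = "model.Rdata")

loaded_names <- load("model.Rdata")  # restores `m` into the workspace
print(loaded_names)                  # "m" -- just the object's name
restored <- get(loaded_names[1])     # fetch the actual object by name
```

An alternative is saveRDS()/readRDS(), which serializes a single object and returns it directly, so model <- readRDS("model.rds") works as you expected load() to.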

3) Further, you have to pass a data.frame (with the same column names as the training dataset) to predict:

new <- data.frame(Sepal.Length=4.2, Sepal.Width=3.2, Petal.Length=1.7, Petal.Width=0.23)
pre <- predict(model, new)
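Putting the pieces together, here is a sketch of the full flow: train on scaled features, then scale the new row with the same parameters before predicting (this assumes the caret and e1071 packages are installed, as in the question):

```r
library(caret)
set.seed(7)

# train on scaled features, keeping the scaling parameters
X <- scale(iris[, -5])
centers <- attr(X, "scaled:center")
scales  <- attr(X, "scaled:scale")
train_df <- cbind(data.frame(X), y = iris[, 5])

control <- trainControl(method = "repeatedcv", number = 5, repeats = 1)
model <- train(y ~ ., data = train_df, method = "svmLinear2",
               trControl = control, tuneLength = 5)

# scale the new observation with the SAME parameters, then predict
new <- data.frame(Sepal.Length = 4.2, Sepal.Width = 3.2,
                  Petal.Length = 1.7, Petal.Width = 0.23)
new_scaled <- data.frame(scale(new, center = centers, scale = scales))
pre <- predict(model, new_scaled)
```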
