In machine learning, many algorithms (SVMs among them) are sensitive to the scale of the features, so the features are usually scaled first to reduce error. This is my code:
```r
# ensure results are repeatable
set.seed(7)
# load the library
library(caret)
# load the dataset
data(iris)
head(iris)
X=scale(iris[,-5])
X=data.frame(X)
head(X)
y=iris[,5]
y=data.frame(y)
head(y)
X=cbind(X,y)
# prepare training scheme
control <- trainControl(method="repeatedcv", number=5, repeats=1)
# train the model
model <- train(y~., data=X, method="svmLinear2", trControl=control, tuneLength=5)
# summarize the model
print(model)
#saving model
save(model, file="model.Rdata")
#loading model
supmod<-load("model.Rdata")
#new data
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 4.2 3.2 1.7 0.23
new<-c(4.2,3.2,1.7,0.23)
pre<-predict(supmod,new)
#dont know how to predict this model with unseen data
```
I have two questions about the code above: one about scaling the new data, and one about the error I get when passing the new data to the loaded model.
The raw iris feature data looks like this:

```
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5         0.2
5          5.0         3.6          1.4         0.2
6          5.4         3.9          1.7         0.4
```
Before passing the data to the SVM, I scale it with `scale()`, which centers each column to mean 0 and scales it to standard deviation 1. The scaled data looks like this:

```
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1   -0.8976739  1.01560199    -1.335752   -1.311052
2   -1.1392005 -0.13153881    -1.335752   -1.311052
3   -1.3807271  0.32731751    -1.392399   -1.311052
4   -1.5014904  0.09788935    -1.279104   -1.311052
5   -1.0184372  1.24503015    -1.335752   -1.311052
6   -0.5353840  1.93331463    -1.165809   -1.048667
```
This scaled data is what I use to train and test the model. Suppose the model has been trained successfully and I now want to use it to predict new, unseen data, for example this single row:

```
  Sepal.Length Sepal.Width Petal.Length Petal.Width
           4.2         3.2          1.7        0.23
```
- Do I need to scale this new data, or can I pass it directly to my model?
- My second question is about a coding error:
```r
predict(supmod,new)
```

returns this error:

```
Error in UseMethod("predict") :
  no applicable method for 'predict' applied to an object of class "character"
```
Best Answer
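1) You should scale the new data as well, using the same parameters that were applied to the training data. If all the data is available up front, you can scale the training data and the new data together, but the scaling then changes, so the model has to be retrained on the jointly scaled data. The more general option is to store the scaling parameters and apply them later to the new data: if the training data d has mean m and standard deviation s, it is scaled as (d - m) / s. Apply exactly this transformation to the new data, with the training m and s rather than statistics computed from the new data.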
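A minimal base-R sketch of the "store the scaling parameters" approach, assuming the training predictors are `iris[, -5]` as in the question. `scale()` records the column means and standard deviations it used in the attributes `"scaled:center"` and `"scaled:scale"`, so they can be reused on new rows. The names `train_scaled`, `m`, `s` and `new_scaled` are illustrative only.

```r
# Scale the training predictors; scale() keeps the parameters it used as attributes
train_scaled <- scale(iris[, -5])
m <- attr(train_scaled, "scaled:center")  # per-column training means
s <- attr(train_scaled, "scaled:scale")   # per-column training standard deviations

# The new, unseen observation from the question, in raw units
new <- data.frame(Sepal.Length = 4.2, Sepal.Width = 3.2,
                  Petal.Length = 1.7, Petal.Width = 0.23)

# Apply (d - m) / s with the *training* m and s; scale() matches center/scale
# to columns by position, so the columns must be in the training order
new_scaled <- as.data.frame(scale(new, center = m, scale = s))
print(new_scaled)
```

Calling `scale(new)` without `center` and `scale` would instead use the single row's own mean and standard deviation, which yields `NaN` values rather than a usable input.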
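2) `load()` does not return the saved object, so you can't assign its result directly:

```r
#loading model
supmod<-load("model.Rdata")
```

`load()` restores the saved objects into the current environment under their original names and invisibly returns a character vector of those names, so `supmod` only contains the string `"model"`. That is why `predict()` complains about an object of class "character".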
Instead, call `load()` on its own:

```r
load("model.Rdata")
```

This restores the model under the name it was saved with, `model`.
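The behaviour of `load()` can be checked with any small object. This sketch uses a hypothetical stand-in model (a base-R `lm()` fit, not the caret SVM) and a temporary file instead of `model.Rdata`, so it runs with base R only. It also shows two alternatives: loading into a separate environment and extracting the object by the returned name, or using `saveRDS()`/`readRDS()`, which store a single object and return it on reading, so the result can be assigned to any variable.

```r
# Hypothetical stand-in model so the example needs only base R
model <- lm(Petal.Width ~ Petal.Length, data = iris)
path <- tempfile(fileext = ".Rdata")
save(model, file = path)

# load() restores objects by name and returns only their names
loaded_names <- load(path)
print(loaded_names)   # the character string "model", not the model itself

# Option A: load into a fresh environment and extract the object by name
env <- new.env()
obj_name <- load(path, envir = env)
supmod <- env[[obj_name]]

# Option B: saveRDS()/readRDS() store one object and return it when read back
rds_path <- tempfile(fileext = ".rds")
saveRDS(model, rds_path)
supmod2 <- readRDS(rds_path)
```

Option B avoids depending on the name the object had when it was saved.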
3) Further, `predict()` expects a data.frame whose column names match the predictors used for training, not a plain numeric vector:
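```r
new <- data.frame(Sepal.Length=4.2, Sepal.Width=3.2, Petal.Length=1.7, Petal.Width=0.23)
# scale with the training means and sds (point 1), since the model was trained on scaled data
new <- as.data.frame(scale(new, center=colMeans(iris[,-5]), scale=apply(iris[,-5], 2, sd)))
pre <- predict(model, new)
```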
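If the new observation is already stored in a plain numeric vector, as `new` is in the question, it can be turned into the one-row data.frame that `predict()` expects by attaching the training column names. A minimal sketch, assuming the predictor columns are `iris[, -5]` in their original order; `new_vec`, `predictor_names` and `new_df` are illustrative names, and the final `predict()` call is left as a comment because it needs the trained caret model.

```r
# The raw observation as a plain vector, in the same order as the training columns
new_vec <- c(4.2, 3.2, 1.7, 0.23)
predictor_names <- names(iris)[1:4]

# setNames() + as.list() + as.data.frame() gives one row with one named column per predictor
new_df <- as.data.frame(as.list(setNames(new_vec, predictor_names)))
str(new_df)

# Check that the columns match the training predictors before predicting
stopifnot(identical(names(new_df), colnames(iris[, -5])))
# pre <- predict(model, new_df)  # scale new_df with the training parameters first (point 1)
```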