I have made a one-class SVM model using the Linear kernel.
I trained my model with 100000 positive examples, each one having a vector of 59 dimensions.
The examples come from a public dataset, which I consider to be high quality for my purpose.
I chose to set the nu parameter to 0.01, and the prediction results I am getting are not that bad. I was wondering whether my nu value is set correctly, so I started looking for advice on setting nu for a one-class SVM with a linear kernel, but couldn't find any resources specific to that.
I am aware that:
The parameter nu is an upper bound on the fraction of margin errors
and a lower bound of the fraction of support vectors relative to the
total number of training examples
however, I am not sure how I should apply the above definition to a one-class SVM with a linear kernel.
Any advice/resources on that would be really helpful. Thanks in advance.
One-class SVMs are hard!
1) First, in general, a one-class SVM is an unsupervised learning technique. So there is no correct answer, just as there is no correct answer for the number of clusters in k-means. As with k-means, there may be metrics that evaluate the quality of the solution, but they are all heuristic, and therefore there are many of them (http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation lists some of the cluster metrics implemented in sklearn). Unfortunately, I do not know of any quality metrics for 1-class. Hopefully someone on CV can answer that.
Nor can you use cross validation to select the hyperparameter, because again you have no correct solution against which to measure some form of accuracy.
So unfortunately you have to try some candidate values of the nu hyperparameter and check whether the solution "makes sense" for your problem. As far as I know, there is no simpler heuristic for this problem.
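One way to make the "try some values and look" step concrete is to fit a model per candidate nu and print the two quantities that the definition of nu bounds: the fraction of training points flagged as outliers and the fraction of support vectors. This is only a minimal sketch. It assumes scikit-learn's `OneClassSVM`, and the data, the candidate values, and the helper name `nu_sweep` are made up for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def nu_sweep(X, nus, kernel="linear"):
    """Fit one model per nu and summarise its behaviour on the training data.

    Hypothetical helper, not part of scikit-learn.
    """
    rows = []
    for nu in nus:
        model = OneClassSVM(kernel=kernel, nu=nu).fit(X)
        # predict() returns -1 for points the model treats as outliers.
        outlier_frac = float(np.mean(model.predict(X) == -1))
        # support_ holds the indices of the support vectors.
        sv_frac = len(model.support_) / len(X)
        rows.append((nu, outlier_frac, sv_frac))
    return rows


# Assumed stand-in data: 500 points in 2-D, shifted away from the origin.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=1.0, size=(500, 2))

results = nu_sweep(X, nus=[0.01, 0.05, 0.1, 0.2])
for nu, outlier_frac, sv_frac in results:
    print(f"nu={nu:.2f}  flagged={outlier_frac:.3f}  support vectors={sv_frac:.3f}")
```

The flagged fraction should come out roughly at or below nu (numerical tolerance in the solver can push it slightly above), and the support-vector fraction at or above nu. Whether the flagged points are the ones you would call abnormal is still something you have to judge by looking at them.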
Of course, if you have the correct class for the data, that is, you know which data is normal and which is not normal, then you can use cross validation to select the hyperparameters. In this case, your problem is really a classification problem, and you are using a 1-class as a classifier. Almost always this is not a good idea: if you have a classification problem, use a classification algorithm, not an unsupervised algorithm!
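If you do have some labelled normal and abnormal examples, the selection could look roughly like the sketch below: train on normal data only, score each candidate nu on a labelled validation set, and keep the best one. Everything here is an assumption for illustration: the synthetic data, the choice of balanced accuracy as the score, and the helper name `select_nu`.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.svm import OneClassSVM


def select_nu(X_train, X_val, y_val, nus, kernel="linear"):
    """Return the nu with the best validation score, plus all scores.

    Hypothetical helper. y_val uses scikit-learn's convention:
    +1 for normal points, -1 for abnormal points.
    """
    scores = {}
    for nu in nus:
        model = OneClassSVM(kernel=kernel, nu=nu).fit(X_train)
        scores[nu] = balanced_accuracy_score(y_val, model.predict(X_val))
    best_nu = max(scores, key=scores.get)
    return best_nu, scores


# Assumed toy data: normal points around (5, 5), abnormal points around (0, 0).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=1.0, size=(400, 2))
X_val = np.vstack([
    rng.normal(loc=5.0, scale=1.0, size=(100, 2)),  # normal
    rng.normal(loc=0.0, scale=1.0, size=(20, 2)),   # abnormal
])
y_val = np.array([1] * 100 + [-1] * 20)

candidates = [0.01, 0.05, 0.1, 0.2]
best_nu, scores = select_nu(X_train, X_val, y_val, candidates)
print("best nu:", best_nu, "balanced accuracy:", round(scores[best_nu], 3))
```

The abnormal points are deliberately placed on the origin side of the normal data, which is the only side a linear-kernel model can cut off. With real data there is no such guarantee, which is part of why a proper classifier is usually the better tool once labels exist.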
2) I do not think a linear-kernel 1-class makes a lot of sense. What 1-class does is pick at most a fraction nu of your data to be considered non-normal (let us call it the positive class), with the rest being the negative class, and then solve the usual SVM optimization to find the location of the separating hyperplane. With a linear kernel, the hyperplane is a flat plane in the original input space (that is an odd sentence, I know!). So what you get at the end is that half of the space is called negative (the normal) and the other half positive.
Usually what one wants is to create some curved closed surface that contains the negative data, so that everything "outside" is positive. The normal data is contained in the curved closed surface (or surfaces). This can only be accomplished with a non-linear kernel; the RBF kernel will certainly do it (I don't know about the polynomial kernel).
But using an RBF kernel makes your problem harder, because now you have two hyperparameters, nu and gamma, and no way to select them.
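The half-space versus closed-surface difference in point 2 can be seen on toy data. The sketch below assumes scikit-learn's `OneClassSVM`; the data, the value of nu, the `gamma="scale"` setting, and the test point are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Assumed toy data: 300 "normal" points clustered around (5, 5).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=1.0, size=(300, 2))

# A point nowhere near the training data, on the side away from the origin.
far_point = np.array([[100.0, 100.0]])

linear = OneClassSVM(kernel="linear", nu=0.05).fit(X)
rbf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)

# predict() returns +1 for "normal" and -1 for "outlier".
linear_label = int(linear.predict(far_point)[0])
rbf_label = int(rbf.predict(far_point)[0])

print("linear kernel label for far point:", linear_label)
print("RBF kernel label for far point:   ", rbf_label)
```

On this data the linear model should accept the far point as normal, because it only cuts off the half-space on the origin side, while the RBF model should reject it, because its normal region is closed around the training data.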
3) There is a technique called SVDD (support vector data description) that tries to create hyperspheres around your negative data (which can be deformed by using kernels). At least in the linear case, you still have only one hyperparameter, nu, but your negative space will be contained inside a sphere rather than being a half-space as in the case of the 1-class linear SVM. I have never used SVDD, so I have no experience with it.
And that is why 1-class SVMs are hard. Sorry, not much help there.