Solved – LSTM network window size selection and effect

I'm working with an LSTM network in Keras. The first layer has the input_shape parameter shown below: model.add(LSTM(50, input_shape=(window_size, num_features), return_sequences=True)) I don't quite follow the window_size parameter and the effect it will have on the model. As far as I understand, to make a decision the network not only makes use of current … Read more
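A minimal sketch of what that parameter means, assuming the usual time-series setup: window_size is the number of past time steps packed into each training sample, so Keras receives input of shape (samples, window_size, num_features). The helper name make_windows is hypothetical.

```python
import numpy as np

def make_windows(series, window_size):
    """Slice a (T, num_features) series into overlapping windows of
    shape (window_size, num_features), one per training sample."""
    return np.stack([series[i:i + window_size]
                     for i in range(len(series) - window_size + 1)])

series = np.arange(20, dtype=float).reshape(10, 2)   # T=10, num_features=2
X = make_windows(series, window_size=4)
print(X.shape)   # (7, 4, 2): 7 samples, each 4 time steps of 2 features
```

Each sample the LSTM sees is one such window, so window_size bounds how far back in time a single prediction can directly look.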

Solved – Intuitive Understanding of Expected Improvement for Gaussian Process

So I am learning Bayesian Optimization and came across expected improvement. My question is: are we searching for the point in the Gaussian Process model whose expected value (determined by mean and confidence) would decrease the most if sampled at that point? So is the starting criterion to take the lowest point in … Read more
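A hedged sketch of the standard closed-form expected improvement for minimization, which may help ground the question: EI(x) = (f_best − μ(x)) Φ(z) + σ(x) φ(z) with z = (f_best − μ(x)) / σ(x), so it rewards both a low posterior mean and high uncertainty.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimization under a Gaussian posterior."""
    sigma = np.maximum(sigma, 1e-12)          # guard against zero variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([0.0, 0.5, 1.0])      # posterior means at 3 candidate points
sigma = np.array([1.0, 0.1, 2.0])   # posterior standard deviations
ei = expected_improvement(mu, sigma, f_best=0.2)
print(ei)   # non-negative; large where the mean is low or uncertainty is high
```

Note EI is always non-negative, so the acquisition does not simply pick the current lowest posterior mean.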

Solved – How to handle high dimensional feature vector in probability graph model

I was doing some NLP-related work which involves training a hidden Markov model and using the model to segment sentences. For every sentence, I translate the tokens into feature vectors. The features are manually picked by me, and I can only think of 20 features for now. All of the features are binary. So an … Read more
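A sketch of why this setup gets awkward, under the assumption that the whole binary vector is treated as one discrete HMM observation symbol: 20 binary features yield 2^20 possible symbols, so the emission table explodes. The helper name features_to_symbol is hypothetical.

```python
def features_to_symbol(features):
    """Pack a binary feature vector into a single integer observation index."""
    symbol = 0
    for bit in features:
        symbol = (symbol << 1) | int(bit)
    return symbol

print(features_to_symbol([1, 0, 1]))   # 5
print(2 ** 20)                          # 1048576 possible symbols for 20 binary features
```

With that many symbols, most emission probabilities can never be estimated from realistic data, which is the usual motivation for feature-based emission models instead of a plain lookup table.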

Solved – Increasing the sample size does not help the classification performance

I am training an SVM classifier on a given document collection. I started by using 500 documents for training, then added another 500, and so on. In other words, I have three training sets of 500, 1000, and 1500 documents, and each smaller training set is a subset of the next larger one. I … Read more

Solved – Model that optimizes mean absolute error always gives same prediction

My gradient boosting regression model (GBM) is trained to minimize mean absolute error (MAE) but gives the same prediction for every record on my highly skewed dataset. I believe there is a quick fix for the immediate problem (use RMSE), but my situation is complicated, and I worry that using RMSE will lead to a … Read more
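A sketch of the likely mechanism behind the symptom, assuming a skewed target: for a constant prediction, MAE is minimized by the median, so a MAE-trained model that fails to find useful splits can collapse to (roughly) the sample median for every record.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=10_000)      # heavily right-skewed target

# Brute-force the constant c that minimizes mean |y - c|.
candidates = np.linspace(0, 5, 501)
mae = np.array([np.abs(y - c).mean() for c in candidates])
best = candidates[mae.argmin()]

print(best, np.median(y))   # the MAE-minimizing constant sits at the median
```

RMSE would instead pull the constant toward the mean, which on skewed data differs noticeably from the median; that difference is what the trade-off in the question is about.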

Solved – Softmax with log-likelihood cost

I am working on my understanding of neural networks using Michael Nielsen's "Neural networks and deep learning." Now in the third chapter, I am trying to develop an intuition of how softmax works together with a log-likelihood cost function. Nielsen defines the log-likelihood cost associated with a training input (eq. 80) as $$C \equiv … Read more
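A small numeric sketch of the pairing Nielsen describes (his eq. 80 defines the cost for one input as the negative log of the softmax activation of the correct class, C = −ln a_y):

```python
import numpy as np

def softmax(z):
    """Softmax over the final layer's weighted inputs."""
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])       # weighted inputs to the output layer
a = softmax(z)
y = 0                               # index of the correct class
cost = -np.log(a[y])                # Nielsen's log-likelihood cost

print(a.sum())   # 1.0: the activations form a probability distribution
print(cost)      # small when a[y] is near 1, large when it is near 0
```

The intuitive point: the softmax outputs behave like class probabilities, and the cost is large exactly when the network assigns low probability to the right class.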

Solved – “mixture” in a gaussian mixture model

We often study the Gaussian Mixture model as a useful model in machine learning and its applications. What is the physical significance of this "Mixture"? Is it used because a Gaussian Mixture Model models the probability of a number of random variables, each with its own mean? If not, then what is the correct … Read more
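A sketch of what "mixture" denotes: the density is a weighted sum of component Gaussians, p(x) = Σ_k π_k N(x | μ_k, σ_k), with mixing weights π_k that sum to 1. The specific weights and parameters below are illustrative.

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.3, 0.7])      # mixing proportions, sum to 1
means = np.array([-2.0, 1.0])       # one mean per component
stds = np.array([0.5, 1.0])

def mixture_pdf(x):
    """Weighted sum of the component Gaussian densities."""
    return sum(w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, stds))

xs = np.linspace(-6.0, 6.0, 2001)
area = mixture_pdf(xs).sum() * (xs[1] - xs[0])   # crude Riemann integral
print(area)   # ~1.0: the mixture is itself a valid density
```

So the "mixture" is not several random variables at once; it is one random variable whose single density is a convex combination of Gaussian densities.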

Solved – Random Forest and Decision Tree Algorithm

A random forest is a collection of decision trees built following the bagging concept. When we move from one decision tree to the next, how does the information learned by the last decision tree carry forward to the next? Because, as per my understanding, there is nothing like a trained model which gets created … Read more
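A sketch of the bagging answer to this question: nothing carries forward. Each tree is fit independently on its own bootstrap sample, and the trees only interact at prediction time, when their outputs are averaged or majority-voted (boosting is the scheme where trees are built sequentially).

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_trees = 100, 3

# One independent bootstrap sample (sampling with replacement) per tree;
# each tree would be fit on X[idx], y[idx] in isolation.
bootstraps = [rng.integers(0, n_samples, size=n_samples) for _ in range(n_trees)]

for i, idx in enumerate(bootstraps):
    print(f"tree {i}: {len(np.unique(idx))} unique samples out of {n_samples}")
```

Because of sampling with replacement, each tree typically sees about 63% of the distinct training rows; the held-out rest is what out-of-bag estimates use.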

Solved – Normalizing SVM predictions to [0,1]

I have trained a linear SVM which takes a pair of objects, computes features, and is expected to learn a semantic similarity function between objects (we can say that it predicts whether the two objects are similar enough that they should be merged or not). The problem I am facing is that the predictions can be … Read more
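A sketch of the usual remedy, Platt-style scaling: squash the unbounded SVM decision scores through a sigmoid so the outputs land in [0, 1]. The parameters A and B would normally be fit on held-out scores; the fixed values and the helper name platt_scale here are just for illustration.

```python
import numpy as np

def platt_scale(scores, A=-1.5, B=0.0):
    """Map raw SVM decision values to (0, 1) via a sigmoid, Platt-style.
    A and B are placeholders; in practice they are fit by logistic
    regression on validation-set scores."""
    return 1.0 / (1.0 + np.exp(A * scores + B))

raw = np.array([-3.2, -0.4, 0.0, 0.7, 4.1])   # unbounded SVM outputs
probs = platt_scale(raw)
print(probs)   # monotone in the raw score, always within (0, 1)
```

The mapping is monotone, so score rankings are preserved while the outputs become usable as similarity probabilities or thresholds.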

Solved – Can someone please explain to me what the particular scenarios mean

"The set of points in $\mathbb{R}^2$ classified ORANGE corresponds to $\{x : x^T\beta > 0.5\}$, indicated in Figure 2.1, and the two predicted classes are separated by the decision boundary $\{x : x^T\beta = 0.5\}$, which is linear in this case. We see that for these data there are several misclassifications on both sides of the decision boundary. Perhaps our linear model is … Read more
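A small sketch of the rule quoted above: a point x in the plane is labeled ORANGE when x^T β exceeds 0.5 and BLUE otherwise, and the set where x^T β equals 0.5 is a straight line. The coefficient vector beta below is an assumed example, not the one fit in the book's Figure 2.1.

```python
import numpy as np

beta = np.array([0.4, 0.6])           # illustrative coefficients
points = np.array([[1.0, 1.0],        # x^T beta = 1.00 -> ORANGE
                   [0.5, 0.25],       # x^T beta = 0.35 -> BLUE
                   [0.5, 0.5]])       # x^T beta = 0.50 -> on the boundary

scores = points @ beta
labels = np.where(scores > 0.5, "ORANGE", "BLUE")
print(list(zip(scores.round(2), labels)))
```

"Linear in this case" means exactly that the boundary is a line in the plane; misclassifications are simply training points whose true color disagrees with the side of that line they fall on.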