Solved – Training in one step vs multiple steps

I'm about 5 minutes into learning about machine learning (using the TensorFlow MNIST tutorial) and have already managed to confuse myself. No big surprise there. But Google isn't giving me any good answers, so I was hoping someone here could.

Why does training this example model in multiple smaller batches produce so much better results than training the model in one batch?

And if that's always going to be the case, is there a rule of thumb for the training batch size and number of batches? It appeared that in the example 1000 and 100 were just chosen arbitrarily.

For example, here is the tutorial file.

I replaced:

# Train
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

With a single training step on the whole data:

sess.run(train_step, feed_dict={x: mnist.test.images,
                                y_: mnist.test.labels})

And the accuracy dropped from 0.91 to 0.66.

Training a neural net consists of finding the parameters that minimize a cost function. This is an iterative process. At each step, the parameters are updated using data in the current batch, which may consist of the entire dataset (batch training) or a subset (minibatch training). Each step gives a small reduction in the cost function, so multiple steps are needed (whether using batch or minibatch training). The reason your modification to the code doesn't perform well is that you're using only a single step, so the parameters are only updated once. This is not enough to learn a good model; instead, multiple sweeps through the data are necessary.
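The iterative update described above can be sketched outside TensorFlow. Below is a minimal NumPy illustration (not the tutorial's code) of minibatch gradient descent on a logistic regression model, using synthetic data as a stand-in for MNIST; the data shapes, learning rate, and batch size are arbitrary choices for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data (a stand-in for MNIST).
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = (X @ true_w + rng.normal(scale=0.1, size=1000) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(5)   # parameters start at zero
lr = 0.5          # learning rate

# Minibatch training: many small parameter updates, each using 100 examples.
for step in range(1000):
    idx = rng.integers(0, len(X), size=100)           # draw a minibatch
    xb, yb = X[idx], y[idx]
    grad = xb.T @ (sigmoid(xb @ w) - yb) / len(xb)    # logistic-loss gradient
    w -= lr * grad                                    # one small step downhill

acc = np.mean((sigmoid(X @ w) > 0.5) == y)
print(f"accuracy after 1000 minibatch steps: {acc:.2f}")
```

Running the same code with `range(1)` instead of `range(1000)` reproduces the question's problem: a single update barely moves the parameters away from their initial values.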

Your example consists of fitting a logistic regression model. The cost function in this case is convex. So, if you train until convergence (that is, the point where the parameters stop changing because no further improvements are possible), all consistent training procedures should find the same parameters and therefore have the same accuracy. In general, this is not always true.
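To illustrate the convexity point, here is a small NumPy sketch (again on synthetic data, not MNIST) showing that full-batch gradient descent run to convergence reaches the same parameters from two different initializations. A small L2 penalty is added here, an assumption beyond the plain model in the text, to keep the minimizer finite and unique on this toy data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_to_convergence(w):
    # Full-batch gradient descent until the parameters stop changing.
    for _ in range(20000):
        grad = X.T @ (sigmoid(X @ w) - y) / len(X) + 0.01 * w  # + tiny L2 term
        w_new = w - 0.5 * grad
        if np.max(np.abs(w_new - w)) < 1e-8:   # converged: update is negligible
            break
        w = w_new
    return w

w1 = train_to_convergence(np.zeros(3))          # start at zero
w2 = train_to_convergence(rng.normal(size=3))   # start at a random point
print(np.allclose(w1, w2, atol=1e-3))
```

Because the (regularized) cost is convex with a unique minimum, both runs end at essentially the same parameter vector; for a non-convex model such as a deep network, different initializations can end in different solutions.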

is there a rule of thumb for training batch size and number of batches? It appeared that in the example 1000 and 100 were just chosen arbitrarily.

Batch size and number of iterations can affect convergence speed, and even produce different solutions with different generalization performance. Number of iterations can be chosen by checking for convergence (as above). A better strategy is to use early stopping, which chooses the number of iterations to maximize performance on held-out data. Batch size can also be tuned this way, but it's common in practice to use arbitrary values that are known to work well (e.g. 10 or 100). To learn about the effect of batch size, try searching for something like 'batch vs. stochastic gradient descent' and 'minibatch size'.
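The early-stopping idea can be sketched as follows. This is an illustrative NumPy version on synthetic data; the patience value, batch size, and split sizes are arbitrary choices for the demonstration, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 10))
y = (X @ rng.normal(size=10) + rng.normal(size=600) > 0).astype(float)

# Hold out a validation split; early stopping is tuned on this, not on training data.
X_tr, y_tr = X[:400], y[:400]
X_val, y_val = X[400:], y[400:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(10)
best_acc, best_w = 0.0, w.copy()
patience, bad_steps = 100, 0   # stop after 100 steps without improvement

for step in range(5000):
    idx = rng.integers(0, len(X_tr), size=32)   # minibatch of 32
    grad = X_tr[idx].T @ (sigmoid(X_tr[idx] @ w) - y_tr[idx]) / 32
    w -= 0.1 * grad
    val_acc = np.mean((sigmoid(X_val @ w) > 0.5) == y_val)
    if val_acc > best_acc:                      # keep the best model so far
        best_acc, best_w, bad_steps = val_acc, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:               # validation stalled: stop early
            break

print(f"stopped at step {step}, best validation accuracy {best_acc:.2f}")
```

The number of iterations is thus chosen by the data rather than fixed in advance, and the returned model is the snapshot (`best_w`) that did best on the held-out split.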

I'm about 5 minutes in to learning about machine learning (using the Tensorflow MNIST tutorial)

I would recommend against using the TensorFlow tutorial to familiarize yourself with machine learning. It's great if you already know the concepts and want to learn how to implement them using TensorFlow. Otherwise, it's best to learn the concepts and math first; you can get a much better understanding by working with courses, textbooks, and tutorials that are dedicated to these topics.
