Han et al. (2015) used iterative pruning to reduce their network to only 10% of its original size with no loss of accuracy, by removing weights with very small magnitudes, since these changed very little. As someone new to machine learning, I'm wondering: why wouldn't you do this (unless your network is already very small)? It seems to me that, for deep learning, your network would end up smaller, faster, more energy efficient, etc. at no real cost. Should we all use this method for larger neural networks?
Pruning is indeed remarkably effective, and I think it is pretty commonly used on networks that are "deployed" for use after training.
The catch with pruning is that you can only gain efficiency, speed, etc. after training is done. You still have to train the full-size network. Most of the computation over the lifetime of a model's development and deployment is spent during development: training networks, playing with model architectures, tweaking hyperparameters, etc. You might train a network several hundred times before you settle on the final model. Reducing the computation of the deployed network is a drop in the bucket compared to this.
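To make that workflow concrete, here is a minimal sketch of iterative magnitude pruning in the spirit of Han et al., under loose assumptions: a single linear layer on a hypothetical toy regression task stands in for a deep network, and the sparsity schedule (50%, 70%, 90%), learning rate, and step counts are arbitrary choices, not values from the paper. Note that step 1 still trains every weight; only what comes after benefits from the sparsity.

```python
import numpy as np


def magnitude_mask(w, sparsity):
    """0/1 mask that zeroes the `sparsity` fraction of weights with smallest |w|."""
    k = int(round(sparsity * w.size))
    mask = np.ones_like(w)
    if k > 0:
        smallest = np.argsort(np.abs(w), axis=None)[:k]  # flat indices
        mask.flat[smallest] = 0.0
    return mask


def train(w, X, y, mask, steps, lr=0.1):
    """Gradient descent on mean squared error; masked weights are held at zero."""
    n = X.shape[0]
    for _ in range(steps):
        grad = 2.0 / n * X.T @ (X @ w - y)
        w = (w - lr * grad) * mask
    return w


rng = np.random.default_rng(0)
n_features = 50

# Hypothetical toy task: only the first 5 of 50 inputs actually matter.
true_w = np.zeros(n_features)
true_w[:5] = 3.0 * rng.normal(size=5)
X = rng.normal(size=(1000, n_features))
y = X @ true_w + 0.01 * rng.normal(size=1000)

# Step 1: train the full, dense model (pruning does NOT reduce this cost).
mask = np.ones(n_features)
w = train(0.1 * rng.normal(size=n_features), X, y, mask, steps=500)

# Step 2: iteratively prune the smallest-magnitude weights, then fine-tune
# the survivors. This schedule is an arbitrary assumption.
for sparsity in (0.5, 0.7, 0.9):
    mask = mask * magnitude_mask(w, sparsity)  # never revive pruned weights
    w = train(w * mask, X, y, mask, steps=200)

print(np.count_nonzero(w), "of", w.size, "weights remain")
```

In a real network the same prune-and-fine-tune loop would typically run per layer, and the deployed model only gets faster if the runtime actually exploits the zeros (e.g. through sparse storage or sparse kernels).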
As ML researchers, we're mainly trying to improve training techniques for DNNs. We usually aren't concerned with deployment, so pruning isn't used much in research.
There is some research on using pruning techniques to speed up network training, but not much progress has been made. See, for example, my own 2018 paper, which experimented with training on pruned and other structurally sparse NN architectures: https://arxiv.org/abs/1810.00299
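For contrast, here is a minimal sketch of the general idea of training sparse from the start, so that training itself gets cheaper. It is not the method of the linked paper: it assumes a fixed random sparsity pattern chosen before training, again on a hypothetical single-layer toy regression task in NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, density = 50, 0.2

# Hypothetical toy task standing in for one layer of a network.
true_w = rng.normal(size=n_features)
X = rng.normal(size=(1000, n_features))
y = X @ true_w

# Fix the sparsity pattern BEFORE training: a random 20% of the weights.
# (Assumption for illustration; choosing good patterns is the hard part.)
keep = np.sort(rng.choice(n_features, size=int(density * n_features), replace=False))

# Only the kept weights are ever trained. Each step multiplies by X[:, keep]
# (1000 x 10) rather than the full X (1000 x 50), so per-step work scales
# with the density instead of the full layer size.
X_kept = X[:, keep]
w_kept = np.zeros(keep.size)
lr, n = 0.1, X.shape[0]
for _ in range(500):
    grad = 2.0 / n * X_kept.T @ (X_kept @ w_kept - y)
    w_kept -= lr * grad

# Scatter back into a full-size, mostly-zero weight vector.
w = np.zeros(n_features)
w[keep] = w_kept
```

In this toy task the random pattern discards most of the inputs, so the sparse model fits worse than a dense one would; that gap is a (very simplified) illustration of why picking the sparsity pattern before training is difficult.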