Solved – Garson’s algorithm for fully connected LSTMs

Garson proposed an algorithm, later modified by Goh (1995), for determining the relative importance of an input node to a network. In the case of a single layer of hidden units, the equation is

$$ Q_{ik} = \frac{\sum_{j=1}^L \big( |w_{ij} v_{jk}| / \sum_{r=1}^N |w_{rj}| \big)}{\sum_{i=1}^N \sum_{j=1}^L \big( |w_{ij} v_{jk}| / \sum_{r=1}^N |w_{rj}| \big)} $$

where $w_{ij}$ is the weight between the $i$th input and the $j$th hidden unit, and $v_{jk}$ is the weight between the $j$th hidden unit and the $k$th output.
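To make the computation concrete, here is a minimal numpy sketch of the formula above. The matrix names `W` (inputs × hidden units) and `V` (hidden units × outputs) and the function name are illustrative assumptions, not part of Garson's original presentation.

```python
import numpy as np

def garson_importance(W, V):
    """Garson/Goh relative importance for a single-hidden-layer network.

    W : (N, L) array, weights from the N inputs to the L hidden units
    V : (L, K) array, weights from the L hidden units to the K outputs
    Returns Q : (N, K) array; Q[i, k] is the relative importance of input i
    for output k, and each column sums to 1.
    """
    W, V = np.abs(W), np.abs(V)
    # |w_ij v_jk| / sum_r |w_rj|: input i's share of hidden unit j's input
    # weight mass, scaled by the hidden-to-output weight of unit j.
    contrib = (W / W.sum(axis=0, keepdims=True))[:, :, None] * V[None, :, :]
    # sum over hidden units j, then normalise over the inputs i
    Q = contrib.sum(axis=1)
    return Q / Q.sum(axis=0, keepdims=True)
```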

I am interested in the case where the neural network is fully connected and has a single output. In this case, the only difference between the $Q_i$s for each input $i$ is the $\sum_{j=1}^L |w_{ij}|$, and so if we only care about the relative importance between the inputs, we can define
$$ Q_{ik} = \sum_{j=1}^L |w_{ij}|.$$
That is, all that matters are the input weights leaving each input node, even if this is generalized to a neural network with multiple hidden layers.
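Under that simplification the ranking reduces to a single line; a short sketch using the same hypothetical `W` as above (it only gives the relative ordering of the inputs, not normalised importances):

```python
def relative_importance(W):
    """Simplified score for a fully connected, single-output network:
    the sum of absolute weights leaving each input node."""
    return np.abs(W).sum(axis=1)  # one score per input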

I was wondering whether the same would hold if the hidden layer were replaced by a layer of LSTM units. My rationale is that, since LSTMs are fully connected, we would still be able to say that
$$ Q_{ik} = \sum_{j=1}^L |w_{ij}|.$$
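For what it's worth, if one wanted to try the same sum on an LSTM layer, the input-to-hidden kernel stacks the weights of all four gates. A purely hypothetical sketch, assuming a Keras-style kernel of shape `(n_inputs, 4 * n_units)`:

```python
def lstm_input_weight_sum(kernel):
    """Sum of absolute input-to-LSTM weights per input.

    kernel : (n_inputs, 4 * n_units) array (Keras convention), with the
    columns stacking the input, forget, cell and output gate weights.
    """
    return np.abs(kernel).sum(axis=1)  # one score per input
```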

I'm actually doing a bit of work on this at the moment. From what I've read in the literature, the connection weight method is better, as it takes into account both the magnitude and the sign of the weights. So maybe that would be a better starting point.

http://www.sciencedirect.com/science/article/pii/S0304380004001565
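For comparison, here is a rough sketch of that connection weight idea, reusing the same hypothetical `W` and `V` as in the question (raw signed weights, no absolute values):

```python
def connection_weights(W, V):
    """Connection weight scores: for each input i and output k,
    the sum over hidden units j of the signed product w_ij * v_jk."""
    return W @ V  # shape (n_inputs, n_outputs)
```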

Disclaimer: I've just had a paper accepted where I generalise that method to deep networks of arbitrary depth 🙂 I can post a link when it's up.

Regarding LSTMs, I'd say it would be a lot more complex to do, but either method could be used; there's just a bit more thinking involved. Hope this helps a bit. There aren't many people working in this area, i.e. network summarisation.
