#tweeprint time for our new work out on arXiv!
We've been trying to understand how recurrent neural networks (RNNs) work, by reverse engineering them using tools from dynamical systems analysis, with @SussilloDavid. https://arxiv.org/abs/2004.08013

We wanted to understand how neural networks process contextual information, as in "This movie is not awesome." vs. "This movie is extremely awesome." Here, the words "not" and "extremely" act as modifier words.


At last year's NeurIPS, we presented work that showed that RNNs trained on sentiment classification use a line attractor to integrate positive/negative valence from words in a review (with Alex Williams, Matt Golub, & Surya Ganguli). http://papers.nips.cc/paper/9700-reverse-engineering-recurrent-networks-for-sentiment-classification-reveals-line-attractor-dynamics
However, line attractor dynamics cannot explain how modifier words (e.g. "not") change the meaning of valence words (e.g. "good"), so we were left with a big mystery... but now, we've solved it!
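To make the line attractor picture concrete, here's a hand-built toy (a caricature with a made-up valence lexicon, not the trained network from the paper) that also shows why pure integration can't handle "not":

```python
# Toy line-attractor integrator (hand-built illustration, NOT the trained
# network from the paper). Along a line attractor the dynamics are
# marginally stable (a ~= 1), so word valences simply sum over the review.
VALENCE = {"awesome": +1.0, "terrible": -1.0}  # made-up valence lexicon

def integrate(review, a=1.0):
    """h_{t+1} = a * h_t + valence(word_t); a ~= 1 gives a line attractor."""
    h = 0.0
    for word in review.lower().split():
        h = a * h + VALENCE.get(word, 0.0)
    return h

print(integrate("this movie is awesome"))      # +1.0
print(integrate("this movie is not awesome"))  # also +1.0: pure integration
# is blind to "not" -- additive dynamics alone can't flip a sign.
```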


We show that modifier words place the RNN state in a low-d "modifier subspace". In this modifier subspace, the valence of words changes dramatically, e.g. potentially flipping sign ("not") or being strongly accentuated ("extremely").
There are transient dynamics in this modifier subspace, which let us quantify the strength and timescale of modifier effects. Moreover, preventing the RNN from entering the modifier subspace completely abolishes the network's ability to understand modifier words.
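Here's a minimal caricature of that mechanism (hand-built, with made-up kick and decay values rather than anything read out of a trained RNN): a modifier word kicks a coordinate m in the modifier subspace, m decays transiently and multiplicatively rescales the valence of later words, and zeroing it out mimics the ablation:

```python
# A hand-built caricature of the modifier-subspace mechanism (made-up
# kick/decay values, not weights from a trained RNN). h lives on the
# line attractor; m is the coordinate in the modifier subspace.
VALENCE = {"awesome": +1.0, "terrible": -1.0}
MODIFIER_KICK = {"not": -2.0, "extremely": +1.5}  # illustrative values

def run(review, decay=0.7, ablate_modifier=False):
    h, m = 0.0, 0.0
    for word in review.lower().split():
        h += (1.0 + m) * VALENCE.get(word, 0.0)       # m rescales valence
        m = decay * m + MODIFIER_KICK.get(word, 0.0)  # transient dynamics
        if ablate_modifier:
            m = 0.0  # keep the state out of the modifier subspace
    return h

print(run("this movie is awesome"))            # +1.0
print(run("this movie is not awesome"))        # -1.0: sign flipped
print(run("this movie is extremely awesome"))  # +2.5: accentuated
print(run("this movie is not awesome",
          ablate_modifier=True))               # +1.0: ablation abolishes "not"
```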
Understanding the modifier subspace also led us to discover new types of contextual effects that we had no idea were lurking inside the RNNs:
First, we figured out that RNNs emphasize words at the beginning of reviews. This is implemented by having the trained initial condition project into the modifier subspace. The initial condition (t=0) itself acts as an intensification modifier, like "extremely"!
Second, we figured out that RNNs accentuate the end of reviews by projecting fast-decaying modes onto the readout: when the review ends, the still-transient valence gets counted toward the prediction. We could understand all of this through analyses of linear approximations.
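If you want to try this kind of analysis yourself, here's a generic sketch of the basic recipe (not our actual code): linearize a vanilla tanh RNN around a fixed point and read per-mode timescales off the Jacobian's eigenvalues:

```python
# Generic recipe (a sketch, not code from the paper) for the linear
# analysis above: linearize h_{t+1} = tanh(W h_t) around a fixed point
# and read per-mode timescales off the Jacobian's eigenvalues.
import numpy as np

rng = np.random.default_rng(0)
n = 64
W = rng.normal(scale=0.9 / np.sqrt(n), size=(n, n))  # stand-in weights
h_star = np.zeros(n)                                 # assume a fixed point at 0

# Jacobian of tanh(W h) at h*: diag(1 - tanh(W h*)^2) @ W
J = np.diag(1.0 - np.tanh(W @ h_star) ** 2) @ W

eigvals = np.linalg.eigvals(J)
# Discrete-time timescale of mode i (in steps): tau_i = -1 / log|lambda_i|
taus = -1.0 / np.log(np.abs(eigvals))

order = np.argsort(np.abs(eigvals))[::-1]  # slowest (largest |lambda|) first
print("slowest |lambda|:", np.round(np.abs(eigvals[order][:3]), 3))
print("their timescales:", np.round(taus[order][:3], 1), "steps")
# Modes with |lambda| ~= 1 (tau -> inf) support integration, i.e. the line
# attractor; fast-decaying modes (small tau) carry the transient modifier
# and end-of-review effects.
```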
As Feynman wrote, "What I cannot create, I do not understand." We augmented Bag-of-Words baseline models with modifier effects based on our analyses, and found that we could recover nearly all of the difference in performance between the Bag-of-Words model and the best RNN.
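A toy version of that augmentation, with an invented lexicon (the real models and numbers are in the paper): the augmented score adds the decaying multiplicative modifier gain, plus a nonzero initial modifier state standing in for the beginning-of-review emphasis:

```python
# Toy augmented Bag-of-Words comparison (invented lexicon and gain values;
# the actual models and numbers are in the paper).
VALENCE = {"awesome": 1.0, "great": 0.8, "terrible": -1.0, "boring": -0.6}
MODIFIER_KICK = {"not": -2.0, "extremely": 1.5, "very": 0.8}

def bow_score(review):
    # Plain Bag-of-Words: order-blind sum of word valences.
    return sum(VALENCE.get(w, 0.0) for w in review.lower().split())

def augmented_bow_score(review, decay=0.7, m0=0.5):
    # m0 > 0 plays the role of the trained initial condition, which acts
    # like an "extremely" at t=0 and emphasizes early words.
    score, m = 0.0, m0
    for w in review.lower().split():
        score += (1.0 + m) * VALENCE.get(w, 0.0)
        m = decay * m + MODIFIER_KICK.get(w, 0.0)
    return score

for r in ["the plot was great", "the plot was not great",
          "the plot was extremely boring"]:
    print(f"{r!r}: bow={bow_score(r):+.2f} "
          f"augmented={augmented_bow_score(r):+.2f}")
```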

We think of this work as building new tools for reverse engineering neural networks to really understand their learned mechanisms and how to perturb/amplify/isolate their effects.
For more information, check out the paper!
https://arxiv.org/abs/2004.08013
