#tweeprint time for our new work out on arXiv! 📖 We've been trying to understand how recurrent neural networks (RNNs) work by reverse engineering them with tools from dynamical systems analysis, with @SussilloDavid. https://arxiv.org/abs/2004.08013
We wanted to understand how neural networks process contextual information, such as phrases like “This movie is not awesome.” đŸ‘ŽđŸŸ vs “This movie is extremely awesome.” đŸ‘đŸŸ Here, the words “not” and “extremely” act as modifier words.
At last year's NeurIPS, we presented work that showed that RNNs trained on sentiment classification use a line attractor to integrate positive/negative valence from words in a review (with Alex Williams, Matt Golub, & Surya Ganguli). http://papers.nips.cc/paper/9700-reverse-engineering-recurrent-networks-for-sentiment-classification-reveals-line-attractor-dynamics
However, line attractor dynamics cannot explain how modifier words (e.g. “not”) change the meaning of valence words (e.g. “good”), so we were left with a big mystery.
But now, we’ve solved it! 🎉🎊
We show that modifier words place the RNN state in a low-d “modifier subspace”. In this modifier subspace, the valence of words changes dramatically, e.g. potentially flipping sign (“not”) or being strongly accentuated (“extremely”).
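One way to picture finding such a subspace (a hypothetical sketch with made-up data, not the paper's actual pipeline): compare hidden states on matched inputs with and without a modifier word, and take the top principal components of the state displacements.

```python
# Hypothetical sketch: identify a low-d "modifier subspace" by comparing RNN
# hidden states on sentences with vs. without a modifier word, then running
# PCA on the state differences. All names and data here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_pairs = 64, 200

# Stand-ins for hidden states h(t) after "... awesome" vs "... not awesome".
h_plain = rng.normal(size=(n_pairs, n_hidden))
true_dirs = rng.normal(size=(2, n_hidden))   # pretend the true subspace is 2-d
coefs = rng.normal(size=(n_pairs, 2))
h_modified = h_plain + coefs @ true_dirs

deltas = h_modified - h_plain                # modifier-induced displacement
deltas -= deltas.mean(axis=0)
# Top principal components of the displacements span the modifier subspace.
_, s, vt = np.linalg.svd(deltas, full_matrices=False)
var_explained = (s**2) / np.sum(s**2)
modifier_basis = vt[:2]                      # (2, n_hidden) basis vectors
print(f"variance explained by top 2 PCs: {var_explained[:2].sum():.2f}")
```

In this toy the displacements are exactly rank 2, so two components capture essentially all the variance; in a trained network you'd look for a similar sharp elbow in the spectrum.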
The dynamics in this modifier subspace are transient, which lets us quantify the strength and timescale of modifier effects. Moreover, preventing the RNN from entering the modifier subspace completely abolishes the network’s ability to understand modifier words.
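Extracting a timescale from such a transient can be as simple as fitting an exponential decay to the projection onto the modifier subspace (a toy sketch with synthetic numbers, just to illustrate the idea):

```python
# Hypothetical sketch: quantify the timescale of a modifier effect by fitting
# an exponential decay to the hidden state's projection onto the modifier
# subspace after the modifier word. The trace below is synthetic.
import numpy as np

tau_true = 5.0
t = np.arange(20)
proj = 2.0 * np.exp(-t / tau_true)   # projection onto a modifier direction

# A linear fit in log space recovers the decay timescale.
slope, _ = np.polyfit(t, np.log(proj), 1)
tau_est = -1.0 / slope
print(f"estimated timescale: {tau_est:.1f} steps")
```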
This work on understanding the modifier subspace also led us to understand new types of contextual effects that we had no idea were lurking inside the RNNs:
First, we figured out that RNNs emphasize words at the beginning of reviews. This is implemented by making the trained initial condition project into the modifier subspace. The initial condition (t=0) itself is an intensification modifier, like “extremely”!
Second, we figured out that RNNs accentuate the end of reviews by projecting fast-decaying modes onto the readout. Thus, when the review ends, the transient valence still counts toward the prediction. We could understand all of this through analyses of linear approximations.
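The flavor of that linearized analysis (a sketch under made-up numbers, not the paper's fitted Jacobian): near a fixed point h*, the dynamics look like h_{t+1} ≈ h* + J(h_t − h*); eigenmodes of J with small |λ| decay fast, and their overlap with the readout vector tells you how much they contribute transiently near the end of a review.

```python
# Hypothetical sketch of the linearized analysis: eigendecompose a stand-in
# recurrent Jacobian J and measure how strongly each decaying mode projects
# onto the readout direction. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 8
J = rng.normal(size=(n, n)) / np.sqrt(n) * 0.5   # random stable-ish Jacobian stand-in
readout = rng.normal(size=n)

eigvals, eigvecs = np.linalg.eig(J)
decay_rate = np.abs(eigvals)                     # smaller |lambda| => faster decay
# Overlap of each (right) eigenmode with the readout direction.
overlap = np.abs(readout @ eigvecs) / np.linalg.norm(readout)
fast = decay_rate < 0.5
print("fast-mode overlaps with readout:", np.round(overlap[fast], 2))
```

Fast modes with large readout overlap are exactly the ones whose transient contribution survives only if the review ends soon after they are excited.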
As Feynman wrote, “What I cannot create, I do not understand.” We augmented Bag-of-Words baseline models with modifier effects based on our analyses, and found that we could recover nearly all of the difference in performance between the Bag-of-Words model and the best RNN. ⚙
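A toy version of such an augmented Bag-of-Words model (lexicon, gain values, and decay constant are all invented for the demo, not the paper's fitted parameters): sum per-word valences, but let modifier words rescale the valence of subsequent words with a transient gain.

```python
# Hypothetical toy of an "augmented Bag-of-Words" classifier: sum per-word
# valences, but let modifier words apply a multiplicative gain that relaxes
# back to 1 over subsequent tokens. Lexicon and constants are made up.
valence = {"awesome": 1.0, "terrible": -1.0, "movie": 0.0, "this": 0.0, "is": 0.0}
modifier = {"not": -1.0, "extremely": 2.0}   # multiplicative gain per modifier
decay = 0.5                                  # gain relaxes toward 1 each step

def score(tokens):
    gain, total = 1.0, 0.0
    for tok in tokens:
        if tok in modifier:
            gain = modifier[tok]             # enter the "modifier subspace"
        else:
            total += gain * valence.get(tok, 0.0)
            gain = 1.0 + decay * (gain - 1.0)  # transient: gain decays toward 1
    return total

print(score("this movie is not awesome".split()))        # negative score
print(score("this movie is extremely awesome".split()))  # strongly positive score
```

A plain Bag-of-Words model would score both sentences identically; the transient gain is what lets the modifier flip or amplify the following word's valence.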
We think of this work as building new tools for reverse engineering neural networks to really understand their learned mechanisms and how to perturb/amplify/isolate their effects.

For more information, check out the paper! 😍 https://arxiv.org/abs/2004.08013 
You can follow @niru_m.