Our work 'Systematic generalization emerges in Seq2Seq models with variability in data' was published at the #ICLR2020 workshop on Bridging AI and Cognitive Science (BAICS).

We show that an LSTM+attention model can exhibit systematic generalization, and we analyze its behavior when it does. 👇 Thread.

https://baicsworkshop.github.io/papers/BAICS_11.pdf
Fodor & Pylyshyn (1988) argue that language is systematic: if we understand 'John loves Mary', we also understand 'Mary loves John', since the underlying concepts needed to process both sentences (in fact, anything of the form NP Vt NP) are the same.
Fodor argues that since language is systematic and thought/cognition is language-like (Language of Thought, 1975), connectionist approaches can't model cognition. Classical approaches (symbolic AI), by contrast, handle systematicity by design, since they have combinatorial syntax & semantics.
Systematicity, seen as an aspect of generalization, requires that a model understand all data points built from the same concepts if it understands any one of them. Going further, after learning N concepts, a model should generalize to new combinations of those concepts.
@LakeBrenden et al. (2017)'s SCAN dataset evaluates the systematic generalization of a seq2seq model trained on pairs like ('walk', 'WALK'), ('walk twice', 'WALK WALK'), and ('jump', 'JUMP') by testing on pairs like ('jump twice', 'JUMP JUMP'), which are just new combinations of already-learned concepts.
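To make the compositional structure concrete, here is a minimal sketch of how SCAN-style commands map to actions (an illustrative toy interpreter, not the official dataset code; the primitive/modifier names are assumptions):

```python
# Toy SCAN-style interpreter: the meaning of 'twice' is the same
# regardless of which primitive it modifies, which is exactly what
# a systematic model should exploit.
PRIMITIVES = {"walk": "WALK", "run": "RUN", "jump": "JUMP"}

def interpret(command: str) -> str:
    words = command.split()
    action = PRIMITIVES[words[0]]
    if len(words) == 2 and words[1] == "twice":
        return f"{action} {action}"
    if len(words) == 2 and words[1] == "thrice":
        return f"{action} {action} {action}"
    return action

print(interpret("walk twice"))  # WALK WALK  (seen in training)
print(interpret("jump twice"))  # JUMP JUMP  (held-out combination)
```

A model that has truly learned 'twice' as an operator should produce the second output without ever having seen 'jump' modified.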
@FelixHill84 @santoroAI et al. (2019) showed that an RL agent systematically generalizes a command like 'pick' to a new object when it is trained with more objects, i.e., it generalizes to 'pick NewObj' after being trained on 'pick Obj1', 'pick Obj2', ..., 'pick ObjN', where N is sufficiently large for the task.
Given the above work, it is natural to ask whether a seq2seq model exhibits systematic generalization, solving ('jump twice', 'JUMP JUMP') after being trained on a sufficiently large number of distinct primitives: 'walk1 twice', 'walk2 twice', ..., 'walkN twice'.
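A sketch of that augmented training set (the 'walkN' naming follows the thread; the function itself is illustrative, not the paper's data-generation code):

```python
# Build a SCAN-like split with N distinct primitives, each paired with
# 'twice' in training, while 'jump' appears only on its own. The test
# set holds out the new combination 'jump twice'.
def build_dataset(n_primitives: int):
    train, test = [], []
    for i in range(1, n_primitives + 1):
        prim, act = f"walk{i}", f"WALK{i}"
        train.append((prim, act))
        train.append((f"{prim} twice", f"{act} {act}"))
    train.append(("jump", "JUMP"))            # primitive seen alone
    test.append(("jump twice", "JUMP JUMP"))  # held-out combination
    return train, test

train, test = build_dataset(3)
```

The question is then whether accuracy on the held-out pair rises as `n_primitives` grows.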
Turns out, just an LSTM+attention model learns 6 of SCAN's modifiers (out of 8 modifiers & 2 conjunctions) as the number of distinct primitives in the dataset increases, and it generalizes to commands with new primitives never trained with any modifier, like ('jump twice', 'JUMP JUMP').
We found a behavior that is highly correlated with systematic generalization: if we subtract the vector for 'walk' from the encoding of the command 'walk twice' and add the vector for 'jump', the model outputs 'JUMP JUMP'. The vectors added/subtracted are from the encoder. This behavior and systematic generalization have a Pearson coefficient of 0.99.
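The probe can be sketched as follows. In the paper the vectors come from the trained LSTM encoder; here, as a hedged toy stand-in, the 'encoder' just sums word embeddings, which makes it perfectly compositional so the arithmetic works exactly:

```python
import numpy as np

# Toy stand-in for the trained encoder: fixed random word embeddings,
# command encoding = sum of word embeddings (perfectly compositional).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=8) for w in ["walk", "jump", "twice"]}

def encode(command: str) -> np.ndarray:
    return sum(emb[w] for w in command.split())

# Arithmetic probe: encode('walk twice') - emb('walk') + emb('jump')
probe = encode("walk twice") - emb["walk"] + emb["jump"]

# Decode by nearest neighbour among candidate command encodings.
candidates = ["walk twice", "jump twice", "walk", "jump"]
nearest = min(candidates, key=lambda c: np.linalg.norm(encode(c) - probe))
print(nearest)  # jump twice
```

With an instance-independent representation of 'twice', the real encoder's vectors behave approximately like this additive toy, which is what the 0.99 correlation with systematic generalization suggests.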
Insight: the more primitives a modifier operates on, the more the model exhibits systematic generalization, and the more it represents the modifier independently of any particular primitive it operates on. These instance-independent representations for modifiers are highly correlated with systematic generalization.
Models trained on primitive variables like 'jump twice' generalized to compound variables like '{jump twice} twice', though in a limited way. Syntactic Attention and meta seq2seq learning, which solved SCAN, did not show this behavior; more interestingly, they didn't even when trained with 300 distinct primitives.
If you've come this far into this thread, I'd really like to talk to you and hear your comments on this simple but (hopefully) interesting work.

Also, check out all the work on the SCAN (@LakeBrenden), CLOSURE (@DBahdanau), and gSCAN (@jacobandreas, @LauraRuis7) datasets.
Finally, I would like to thank my pal @DMohith_ for great discussions.

* I somehow suspect that @tallinzen was the one who reviewed this work (the review was double-blind) and gave it the green light. Nonetheless, I am inspired by and thankful for his opinions/thoughts/papers.