| Are Sixteen Heads Really Better than One? | Paul Michel · Omer Levy · Graham Neubig |
| Compositional De-Attention Networks | Yi Tay · Anh Tuan Luu · Aston Zhang · Shuohang Wang · Siu Cheung Hui |
| Geometry-Aware Neural Rendering | Joshua Tobin · Wojciech Zaremba · Pieter Abbeel |
| Image Captioning: Transforming Objects into Words | Simao Herdade · Armin Kappeler · Kofi Boakye · Joao Soares |
| Learning by Abstraction: The Neural State Machine | Drew Hudson · Christopher Manning |
| Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time | Karlis Freivalds · Emīls Ozoliņš · Agris Šostaks |
| Novel positional encodings to enable tree-based transformers | Vighnesh Shiv · Chris Quirk |
| Self-attention with Functional Time Representation Learning | Da Xu · Chuanwei Ruan · Evren Korpeoglu · Sushant Kumar · Kannan Achan |
| Understanding Attention and Generalization in Graph Neural Networks | Boris Knyazev · Graham W Taylor · Mohamed Amer |