| SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems | Alex Wang · Yada Pruksachatkun · Nikita Nangia · Amanpreet Singh · Julian Michael · Felix Hill · Omer Levy · Samuel Bowman |
| A Tensorized Transformer for Language Modeling | Xindian Ma · Peng Zhang · Shuai Zhang · Nan Duan · Yuexian Hou · Ming Zhou · Dawei Song |
| AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification | Ronghui You · Zihan Zhang · Ziye Wang · Suyang Dai · Hiroshi Mamitsuka · Shanfeng Zhu |
| Comparing Unsupervised Word Translation Methods Step by Step | Mareike Hartmann · Yova Kementchedjhieva · Anders Søgaard |
| Glyce: Glyph-vectors for Chinese Character Representations | Yuxian Meng · Wei Wu · Fei Wang · Xiaoya Li · Ping Nie · Fan Yin · Muyu Li · Qinghong Han · Jiwei Li |
| Hierarchical Optimal Transport for Document Representation | Mikhail Yurochkin · Sebastian Claici · Edward Chien · Farzaneh Mirzazadeh · Justin M Solomon |
| Improving Textual Network Learning with Variational Homophilic Embeddings | Wenlin Wang · Chenyang Tao · Zhe Gan · Guoyin Wang · Liqun Chen · Xinyuan Zhang · Ruiyi Zhang · Qian Yang · Ricardo Henao · Lawrence Carin |
| Ouroboros: On Accelerating Training of Transformer-Based Language Models | Qian Yang · Zhouyuan Huo · Wenlin Wang · Lawrence Carin |
| Fast Structured Decoding for Sequence Models | Zhiqing Sun · Zhuohan Li · Haoqing Wang · Di He · Zi Lin · Zhihong Deng |
| Can Unconditional Language Models Recover Arbitrary Sentences? | Nishant Subramani · Samuel Bowman · Kyunghyun Cho |
| Controllable Unsupervised Text Attribute Transfer via Editing Entangled Latent Representation | Ke Wang · Hang Hua · Xiaojun Wan |
| Defending Against Neural Fake News | Rowan Zellers · Ari Holtzman · Hannah Rashkin · Yonatan Bisk · Ali Farhadi · Franziska Roesner · Yejin Choi |
| Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain) | Mariya Toneva · Leila Wehbe |
| Invariance and identifiability issues for word embeddings | Rachel Carrington · Karthik Bharath · Simon Preston |
| Kernelized Bayesian Softmax for Text Generation | Ning Miao · Hao Zhou · Chengqi Zhao · Wenxian Shi · Lei Li |
| Levenshtein Transformer | Jiatao Gu · Changhan Wang · Junbo Zhao |
| Neural Machine Translation with Soft Prototype | Yiren Wang · Yingce Xia · Fei Tian · Fei Gao · Tao Qin · Cheng Xiang Zhai · Tie-Yan Liu |
| Paraphrase Generation with Latent Bag of Words | Yao Fu · Yansong Feng · John Cunningham |
| Unified Language Model Pre-training for Natural Language Understanding and Generation | Li Dong · Nan Yang · Wenhui Wang · Furu Wei · Xiaodong Liu · Yu Wang · Jianfeng Gao · Ming Zhou · Hsiao-Wuen Hon |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le |