No. | Title |
1 | Empirical Bayes Transductive Meta-Learning with Synthetic Gradients |
2 | Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering |
3 | Generalized Domain Adaptation with Covariate and Label Shift CO-ALignment |
4 | Quaternion Equivariant Capsule Networks for 3D Point Clouds |
5 | Pay Attention to Features, Transfer Learn faster CNNs |
6 | Differentiable Hebbian Consolidation for Continual Learning |
7 | Generative Hierarchical Models for Parts, Objects, and Scenes |
8 | Mixture Distributions for Scalable Bayesian Inference |
9 | Best feature performance in codeswitched hate speech texts |
10 | Geom-GCN: Geometric Graph Convolutional Networks |
11 | Smart Ternary Quantization |
12 | HIPPOCAMPAL NEURONAL REPRESENTATIONS IN CONTINUAL LEARNING |
13 | A GOODNESS OF FIT MEASURE FOR GENERATIVE NETWORKS |
14 | Gradients as Features for Deep Representation Learning |
15 | Deceptive Opponent Modeling with Proactive Network Interdiction for Stochastic Goal Recognition Control |
16 | Monotonic Multihead Attention |
17 | Massively Multilingual Sparse Word Representations |
18 | Attention over Phrases |
19 | Query-efficient Meta Attack to Deep Neural Networks |
20 | BREAKING CERTIFIED DEFENSES: SEMANTIC ADVERSARIAL EXAMPLES WITH SPOOFED ROBUSTNESS CERTIFICATES |
21 | Meta-Learning Initializations for Image Segmentation |
22 | Privacy-preserving Representation Learning by Disentanglement |
23 | Building Hierarchical Interpretations in Natural Language via Feature Interaction Detection |
24 | AN EXPONENTIAL LEARNING RATE SCHEDULE FOR BATCH NORMALIZED NETWORKS |
25 | End-to-end learning of energy-based representations for irregularly-sampled signals and images |
26 | Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation |
27 | How to 0wn the NAS in Your Spare Time |
28 | Generalized Zero-shot ICD Coding |
29 | EXACT ANALYSIS OF CURVATURE CORRECTED LEARNING DYNAMICS IN DEEP LINEAR NETWORKS |
30 | WEEGNET: a wavelet based Convnet for Brain-computer interfaces |
31 | Meta Label Correction for Learning with Weak Supervision |
32 | Toward Controllable Text Content Manipulation |
33 | NAMSG: An Efficient Method for Training Neural Networks |
34 | Learning to Reason: Distilling Hierarchy via Self-Supervision and Reinforcement Learning |
35 | The Shape of Data: Intrinsic Distance for Data Distributions |
36 | Measuring Numerical Common Sense: Is A Word Embedding Approach Effective? |
37 | Learning DNA folding patterns with Recurrent Neural Networks |
38 | Generative Adversarial Nets for Multiple Text Corpora |
39 | Understanding Generalization in Recurrent Neural Networks |
40 | Measure by Measure: Automatic Music Composition with Traditional Western Music Notation |
41 | Weakly-Supervised Trajectory Segmentation for Learning Reusable Skills |
42 | Learn Interpretable Word Embeddings Efficiently with von Mises-Fisher Distribution |
43 | Goten: GPU-Outsourcing Trusted Execution of Neural Network Training and Prediction |
44 | Limitations for Learning from Point Clouds |
45 | DOUBLE-HARD DEBIASING: TAILORING WORD EMBEDDINGS FOR GENDER BIAS MITIGATION |
46 | Conservative Uncertainty Estimation By Fitting Prior Networks |
47 | Re-Examining Linear Embeddings for High-dimensional Bayesian Optimization |
48 | ASYNCHRONOUS MULTI-AGENT GENERATIVE ADVERSARIAL IMITATION LEARNING |
49 | Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards |
50 | NORML: Nodal Optimization for Recurrent Meta-Learning |
51 | Keyword Spotter Model for Crop Pest and Disease Monitoring from Community Radio Data |
52 | NAS-BENCH-1SHOT1: BENCHMARKING AND DISSECTING ONE-SHOT NEURAL ARCHITECTURE SEARCH |
53 | Defense against Adversarial Examples by Encoder-Assisted Search in the Latent Coding Space |
54 | Fuzzing-Based Hard-Label Black-Box Attacks Against Machine Learning Models |
55 | Conditional generation of molecules from disentangled representations |
56 | Dataset Distillation |
57 | Learning RNNs with Commutative State Transitions |
58 | XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings |
59 | LAVAE: Disentangling Location and Appearance |
60 | Sparse Skill Coding: Learning Behavioral Hierarchies with Sparse Codes |
61 | REFINING MONTE CARLO TREE SEARCH AGENTS BY MONTE CARLO TREE SEARCH |
62 | WHAT DATA IS USEFUL FOR MY DATA: TRANSFER LEARNING WITH A MIXTURE OF SELF-SUPERVISED EXPERTS |
63 | A Bilingual Generative Transformer for Semantic Sentence Embedding |
64 | Learning to Coordinate Manipulation Skills via Skill Behavior Diversification |
65 | DeepPCM: Predicting Protein-Ligand Binding using Unsupervised Learned Representations |
66 | Ternary MobileNets via Per-Layer Hybrid Filter Banks |
67 | Constant Curvature Graph Convolutional Networks |
68 | Variational Information Bottleneck for Unsupervised Clustering: Deep Gaussian Mixture Embedding |
69 | Combining graph and sequence information to learn protein representations |
70 | FINBERT: FINANCIAL SENTIMENT ANALYSIS WITH PRE-TRAINED LANGUAGE MODELS |
71 | Cancer homogeneity in single cell revealed by Bi-state model and Binary matrix factorization |
72 | Robust Subspace Recovery Layer for Unsupervised Anomaly Detection |
73 | Learning Nearly Decomposable Value Functions Via Communication Minimization |
74 | Batch Normalization is a Cause of Adversarial Vulnerability |
75 | Undersensitivity in Neural Reading Comprehension |
76 | Extreme Classification via Adversarial Softmax Approximation |
77 | IS THE LABEL TRUSTFUL: TRAINING BETTER DEEP LEARNING MODEL VIA UNCERTAINTY MINING NET |
78 | Information Geometry of Orthogonal Initializations and Training |
79 | Multi-Step Decentralized Domain Adaptation |
80 | Mixed Precision DNNs: All you need is a good parametrization |
81 | PROGRESSIVE LEARNING AND DISENTANGLEMENT OF HIERARCHICAL REPRESENTATIONS |
82 | Co-Attentive Equivariant Neural Networks: Focusing Equivariance On Transformations Co-Occurring in Data |
83 | Improving the Gating Mechanism of Recurrent Neural Networks |
84 | Learning to Transfer via Modelling Multi-level Task Dependency |
85 | Latent Variables on Spheres for Sampling and Inference |
86 | Deep Orientation Uncertainty Learning based on a Bingham Loss |
87 | Analyzing Privacy Loss in Updates of Natural Language Models |
88 | Learning from Positive and Unlabeled Data with Adversarial Training |
89 | Deep exploration by novelty-pursuit with maximum state entropy |
90 | Reconstructing continuous distributions of 3D protein structure from cryo-EM images |
91 | Deep Evidential Uncertainty |
92 | Tree-structured Attention Module for Image Classification |
93 | Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint |
94 | Better Knowledge Retention through Metric Learning |
95 | Winning the Lottery with Continuous Sparsification |
96 | Critical initialisation in continuous approximations of binary neural networks |
97 | Learning to Learn via Gradient Component Corrections |
98 | LEARNING DIFFICULT PERCEPTUAL TASKS WITH HODGKIN-HUXLEY NETWORKS |
99 | Filter redistribution templates for iteration-less convolutional model reduction |
100 | Universal Safeguarded Learned Convex Optimization with Guaranteed Convergence |
101 | A Gradient-Based Approach to Neural Networks Structure Learning |
102 | Sub-policy Adaptation for Hierarchical Reinforcement Learning |
103 | AdvCodec: Towards A Unified Framework for Adversarial Text Generation |
104 | PROVABLE BENEFITS OF DEEP HIERARCHICAL RL |
105 | Learning Latent State Spaces for Planning through Reward Prediction |
106 | Variational lower bounds on mutual information based on nonextensive statistical mechanics |
107 | Hope For The Best But Prepare For The Worst: Cautious Adaptation In RL Agents |
108 | Semi-Supervised Boosting via Self Labelling |
109 | Fractional Graph Convolutional Networks (FGCN) for Semi-Supervised Learning |
110 | Antifragile and Robust Heteroscedastic Bayesian Optimisation |
111 | Generalizing Reinforcement Learning to Unseen Actions |
112 | Provable Representation Learning for Imitation Learning via Bi-level Optimization |
113 | Episodic Reinforcement Learning with Associative Memory |
114 | Flexible and Efficient Long-Range Planning Through Curious Exploration |
115 | Learning to Prove Theorems by Learning to Generate Theorems |
116 | Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem |
117 | Common sense and Semantic-Guided Navigation via Language in Embodied Environments |
118 | Gradient-based training of Gaussian Mixture Models in High-Dimensional Spaces |
119 | Neural Phrase-to-Phrase Machine Translation |
120 | At Your Fingertips: Automatic Piano Fingering Detection |
121 | Energy-based models for atomic-resolution protein conformations |
122 | Federated Learning with Matched Averaging |
123 | Clustered Reinforcement Learning |
124 | Understanding the (Un)interpretability of Natural Image Distributions Using Generative Models |
125 | Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning |
126 | Efficient and Robust Asynchronous Federated Learning with Stragglers |
127 | Handwritten Amharic Character Recognition System Using Convolutional Neural Networks |
128 | Effects of Linguistic Labels on Learned Visual Representations in Convolutional Neural Networks: Labels matter! |
129 | Differentiable Programming for Physical Simulation |
130 | Fooling Pre-trained Language Models: An Evolutionary Approach to Generate Wrong Sentences with High Acceptability Score |
131 | Implicit Rugosity Regularization via Data Augmentation |
132 | A Mutual Information Maximization Perspective of Language Representation Learning |
133 | Goal-Conditioned Video Prediction |
134 | Accelerate DNN Inference By Inter-Operator Parallelization |
135 | Compression without Quantization |
136 | Geometry-Aware Visual Predictive Models of Intuitive Physics |
137 | Growing Up Together: Structured Exploration for Large Action Spaces |
138 | Adversarial Training with Voronoi Constraints |
139 | A Non-asymptotic comparison of SVRG and SGD: tradeoffs between compute and speed |
140 | RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers |
141 | Towards Understanding the Spectral Bias of Deep Learning |
142 | Domain Adaptive Multiflow Networks |
143 | Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models |
144 | Unsupervised Distillation of Syntactic Information from Contextualized Word Representations |
145 | Optimal Unsupervised Domain Translation |
146 | Multi-task Network Embedding with Adaptive Loss Weighting |
147 | Biologically Plausible Neural Networks via Evolutionary Dynamics and Dopaminergic Plasticity |
148 | ON SOLVING COOPERATIVE DECENTRALIZED MARL PROBLEMS WITH SPARSE REINFORCEMENTS |
149 | Continual Learning using the SHDL Framework with Skewed Replay Distributions |
150 | Semi-supervised Autoencoding Projective Dependency Parsing |
151 | Differentiable Reasoning over a Virtual Knowledge Base |
152 | Making Sense of Reinforcement Learning and Probabilistic Inference |
153 | Negative Sampling in Variational Autoencoders |
154 | Improved Training of Certifiably Robust Models |
155 | Unsupervised Generative 3D Shape Learning from Natural Images |
156 | Diagnosing the Environment Bias in Vision-and-Language Navigation |
157 | Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation |
158 | Learning Mahalanobis Metric Spaces via Geometric Approximation Algorithms |
159 | Laconic Image Classification: Human vs. Machine Performance |
160 | Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks |
161 | Reinforcement Learning with Structured Hierarchical Grammar Representations of Actions |
162 | The Usual Suspects? Reassessing Blame for VAE Posterior Collapse |
163 | Dynamical System Embedding for Efficient Intrinsically Motivated Artificial Agents |
164 | BERT for Sequence-to-Sequence Multi-Label Text Classification |
165 | SCALABLE OBJECT-ORIENTED SEQUENTIAL GENERATIVE MODELS |
166 | Evaluations and Methods for Explanation through Robustness Analysis |
167 | Attributed Graph Learning with 2-D Graph Convolution |
168 | Stochastic Neural Physics Predictor |
169 | Neural tangent kernels, transportation mappings, and universal approximation |
170 | Pragmatic Evaluation of Adversarial Examples in Natural Language |
171 | Learning to Move with Affordance Maps |
172 | Towards Interpreting Deep Neural Networks via Understanding Layer Behaviors |
173 | Deep Learning For Symbolic Mathematics |
174 | Deep Interaction Processes for Time-Evolving Graphs |
175 | Differentiable learning of numerical rules in knowledge graphs |
176 | Consistency Regularization for Generative Adversarial Networks |
177 | On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning |
178 | Lyceum: An efficient and scalable ecosystem for robot learning |
179 | SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models |
180 | In-training Matrix Factorization for Parameter-frugal Neural Machine Translation |
181 | Benefits of Overparameterization in Single-Layer Latent Variable Generative Models |
182 | Implicit competitive regularization in GANs |
183 | Scale-Equivariant Steerable Networks |
184 | Extreme Language Model Compression with Optimal Subwords and Shared Projections |
185 | DeepSphere: a graph-based spherical CNN |
186 | Improved Training Techniques for Online Neural Machine Translation |
187 | GRASPEL: GRAPH SPECTRAL LEARNING AT SCALE |
188 | Overcoming Catastrophic Forgetting via Hessian-free Curvature Estimates |
189 | Score and Lyrics-Free Singing Voice Generation |
190 | Neural Video Encoding |
191 | Interactive Classification by Asking Informative Questions |
192 | Classification-Based Anomaly Detection for General Data |
193 | Mixture Density Networks Find Viewpoint the Dominant Factor for Accurate Spatial Offset Regression |
194 | Distributed Training Across the World |
195 | Unrestricted Adversarial Examples via Semantic Manipulation |
196 | Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model |
197 | Closed loop deep Bayesian inversion: Uncertainty driven acquisition for fast MRI |
198 | OBJECT-ORIENTED REPRESENTATION OF 3D SCENES |
199 | Discriminative Particle Filter Reinforcement Learning for Complex Partial observations |
200 | Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories |
201 | State Alignment-based Imitation Learning |
202 | Reweighted Proximal Pruning for Large-Scale Language Representation |
203 | Neural Arithmetic Units |
204 | Lipschitz constant estimation for Neural Networks via sparse polynomial optimization |
205 | Random Bias Initialization Improving Binary Neural Network Training |
206 | Meta-RCNN: Meta Learning for Few-Shot Object Detection |
207 | Adversarially learned anomaly detection for time series data |
208 | HOW THE CHOICE OF ACTIVATION AFFECTS TRAINING OF OVERPARAMETRIZED NEURAL NETS |
209 | Multi-Precision Policy Enforced Training (MuPPET) : A precision-switching strategy for quantised fixed-point training of CNNs |
210 | Deep Spike Decoder (DSD) |
211 | Isolating Latent Structure with Cross-population Variational Autoencoders |
212 | Learning Compact Embedding Layers via Differentiable Product Quantization |
213 | Accelerating First-Order Optimization Algorithms |
214 | Physics-Aware Flow Data Completion Using Neural Inpainting |
215 | Imagine That! Leveraging Emergent Affordances for Tool Synthesis in Reaching Tasks |
216 | Provable Filter Pruning for Efficient Neural Networks |
217 | ADAPTIVE GENERATION OF PROGRAMMING PUZZLES |
218 | Learning transitional skills with intrinsic motivation |
219 | Quantifying uncertainty with GAN-based priors |
220 | End to End Trainable Active Contours via Differentiable Rendering |
221 | Plan2Vec: Unsupervised Representation Learning by Latent Plans |
222 | Uncertainty-aware Variational-Recurrent Imputation Network for Clinical Time Series |
223 | Compositional Continual Language Learning |
224 | Out-of-Distribution Image Detection Using the Normalized Compression Distance |
225 | Discriminative Variational Autoencoder for Continual Learning with Generative Replay |
226 | Connectivity-constrained interactive annotations for panoptic segmentation |
227 | On learning visual odometry errors |
228 | Regularization Matters in Policy Optimization |
229 | Adaptive Online Planning for Continual Lifelong Learning |
230 | Measuring causal influence with back-to-back regression: the linear case |
231 | Regularizing Predictions via Class-wise Self-knowledge Distillation |
232 | Multi-source Multi-view Transfer Learning in Neural Topic Modeling with Pretrained Topic and Word Embeddings |
233 | Adversarial Lipschitz Regularization |
234 | Reasoning-Aware Graph Convolutional Network for Visual Question Answering |
235 | SGD Learns One-Layer Networks in WGANs |
236 | Localized Meta-Learning: A PAC-Bayes Analysis for Meta-Learning Beyond Global Prior |
237 | FNNP: Fast Neural Network Pruning Using Adaptive Batch Normalization |
238 | Adversarial Training and Provable Defenses: Bridging the Gap |
239 | Finding Deep Local Optima Using Network Pruning |
240 | Adversarial Training Generalizes Data-dependent Spectral Norm Regularization |
241 | Knowledge Transfer via Student-Teacher Collaboration |
242 | A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case |
243 | Weight-space symmetry in neural network loss landscapes revisited |
244 | Differentiable Bayesian Neural Network Inference for Data Streams |
245 | Efficient Transformer for Mobile Applications |
246 | Learning by shaking: Computing policy gradients by physical forward-propagation |
247 | Occlusion resistant learning of intuitive physics from videos |
248 | Quantum Graph Neural Networks |
249 | Statistical Verification of General Perturbations by Gaussian Smoothing |
250 | Localised Generative Flows |
251 | TransINT: Embedding Implication Rules in Knowledge Graphs with Isomorphic Intersections of Linear Subspaces |
252 | Robust Few-Shot Learning with Adversarially Queried Meta-Learners |
253 | Certifying Neural Network Audio Classifiers |
254 | Collaborative Training of Balanced Random Forests for Open Set Domain Adaptation |
255 | PAC-Bayesian Neural Network Bounds |
256 | Semi-Implicit Back Propagation |
257 | Mutual Information Gradient Estimation for Representation Learning |
258 | Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning |
259 | Iterative Deep Graph Learning for Graph Neural Networks |
260 | Mint: Matrix-Interleaving for Multi-Task Learning |
261 | Learning Cluster Structured Sparsity by Reweighting |
262 | Selfish Emergent Communication |
263 | Decoupling Adaptation from Modeling with Meta-Optimizers for Meta Learning |
264 | Imitation Learning of Robot Policies using Language, Vision and Motion |
265 | Improving Visual Relation Detection using Depth Maps |
266 | Semi-supervised Pose Estimation with Geometric Latent Representations |
267 | Identifying Weights and Architectures of Unknown ReLU Networks |
268 | Unsupervised Domain Adaptation through Self-Supervision |
269 | Improving Gradient Estimation in Evolutionary Strategies With Past Descent Directions |
270 | $\alpha^{\alpha}$-Rank: Scalable Multi-agent Evaluation through Evolution |
271 | Variable Complexity in the Univariate and Multivariate Structural Causal Model |
272 | Regularizing activations in neural networks via distribution matching with the Wasserstein metric |
273 | RefNet: Automatic Essay Scoring by Pairwise Comparison |
274 | Gradient Descent Maximizes the Margin of Homogeneous Neural Networks |
275 | Mixed Precision Training With 8-bit Floating Point |
276 | An Empirical and Comparative Analysis of Data Valuation with Scalable Algorithms |
277 | Consistent Meta-Reinforcement Learning via Model Identification and Experience Relabeling |
278 | Transferring Optimality Across Data Distributions via Homotopy Methods |
279 | Latent Normalizing Flows for Many-to-Many Cross Domain Mappings |
280 | Learning Multi-Agent Communication Through Structured Attentive Reasoning |
281 | Dynamic Model Pruning with Feedback |
282 | $\ell_1$ Adversarial Robustness Certificates: a Randomized Smoothing Approach |
283 | On the interaction between supervision and self-play in emergent communication |
284 | CNAS: Channel-Level Neural Architecture Search |
285 | FLAT MANIFOLD VAES |
286 | Slow Thinking Enables Task-Uncertain Lifelong and Sequential Few-Shot Learning |
287 | A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms |
288 | Expected Information Maximization: Using the I-Projection for Mixture Density Estimation |
289 | Through the Lens of Neural Network: Analyzing Neural QA Models via Quantized Latent Representation |
290 | All Simulations Are Not Equal: Simulation Reweighing for Imperfect Information Games |
291 | Truth or backpropaganda? An empirical investigation of deep learning theory |
292 | Learning to Rank Learning Curves |
293 | Set Functions for Time Series |
294 | I love your chain mail! Making knights smile in a fantasy game world |
295 | Masked Translation Model |
296 | MissDeepCausal: causal inference from incomplete data using deep latent variable models |
297 | Variational Constrained Reinforcement Learning with Application to Planning at Roundabout |
298 | Efficient Deep Representation Learning by Adaptive Latent Space Sampling |
299 | Learning Functionally Decomposed Hierarchies for Continuous Navigation Tasks |
300 | Deep Audio Priors Emerge From Harmonic Convolutional Networks |
301 | Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks |
302 | On Understanding Knowledge Graph Representation |
303 | Encoding Musical Style with Transformer Autoencoders |
304 | Collaborative Inter-agent Knowledge Distillation for Reinforcement Learning |
305 | Gauge Equivariant Spherical CNNs |
306 | INTERPRETING CNN PREDICTION THROUGH LAYER-WISE SELECTED DISCERNIBLE NEURONS |
307 | Preventing Imitation Learning with Adversarial Policy Ensembles |
308 | On the Anomalous Generalization of GANs |
309 | Improving Generalization in Meta Reinforcement Learning using Neural Objectives |
310 | A closer look at the approximation capabilities of neural networks |
311 | VIMPNN: A physics informed neural network for estimating potential energies of out-of-equilibrium systems |
312 | SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning |
313 | Resolving Lexical Ambiguity in English–Japanese Neural Machine Translation |
314 | Data-Efficient Image Recognition with Contrastive Predictive Coding |
315 | Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps |
316 | wMAN: WEAKLY-SUPERVISED MOMENT ALIGNMENT NETWORK FOR TEXT-BASED VIDEO SEGMENT RETRIEVAL |
317 | Residual Energy-Based Models for Text Generation |
318 | AtomNAS: Fine-Grained End-to-End Neural Architecture Search |
319 | The Power of Semantic Similarity based Soft-Labeling for Generalized Zero-Shot Learning |
320 | AugMix: A Simple Method to Improve Robustness and Uncertainty under Data Shift |
321 | Learning Latent Dynamics for Partially-Observed Chaotic Systems |
322 | Exploration via Flow-Based Intrinsic Rewards |
323 | Learning Underlying Physical Properties From Observations For Trajectory Prediction |
324 | SPREAD DIVERGENCE |
325 | GraphQA: Protein Model Quality Assessment using Graph Convolutional Network |
326 | Disentanglement through Nonlinear ICA with General Incompressible-flow Networks (GIN) |
327 | DEEP GRAPH SPECTRAL EVOLUTION NETWORKS FOR GRAPH TOPOLOGICAL TRANSFORMATION |
328 | Angular Visual Hardness |
329 | Deep Relational Factorization Machines |
330 | Towards Scalable Imitation Learning for Multi-Agent Systems with Graph Neural Networks |
331 | On the Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks |
332 | MEMORY-BASED GRAPH NETWORKS |
333 | Mem2Mem: Learning to Summarize Long Texts with Memory-to-Memory Transfer |
334 | GQ-Net: Training Quantization-Friendly Deep Networks |
335 | An Empirical Study of Encoders and Decoders in Graph-Based Dependency Parsing |
336 | ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks |
337 | Variational Template Machine for Data-to-Text Generation |
338 | Phase Transitions for the Information Bottleneck in Representation Learning |
339 | PopSGD: Decentralized Stochastic Gradient Descent in the Population Model |
340 | Symmetric-APL Activations: Training Insights and Robustness to Adversarial Attacks |
341 | Faster and Just As Accurate: A Simple Decomposition for Transformer Models |
342 | Hidden incentives for self-induced distributional shift |
343 | The divergences minimized by non-saturating GAN training |
344 | The Differentiable Cross-Entropy Method |
345 | Atomic Compression Networks |
346 | Continual learning with hypernetworks |
347 | Few-Shot Regression via Learning Sparsifying Basis Functions |
348 | Understanding and Training Deep Diagonal Circulant Neural Networks |
349 | Removing input features via a generative model to explain their attributions to classifier's decisions |
350 | Top-down training for neural networks |
351 | Demystifying Graph Neural Network Via Graph Filter Assessment |
352 | Towards Certified Defense for Unrestricted Adversarial Attacks |
353 | Permutation Equivariant Models for Compositional Generalization in Language |
354 | Training binary neural networks with real-to-binary convolutions |
355 | DO-AutoEncoder: Learning and Intervening Bivariate Causal Mechanisms in Images |
356 | StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding |
357 | Multichannel Generative Language Models |
358 | Smooth markets: A basic mechanism for organizing gradient-based learners |
359 | Enhancing the Transformer with explicit relational encoding for math problem solving |
360 | Ergodic Inference: Accelerate Convergence by Optimisation |
361 | SemanticAdv: Generating Adversarial Examples via Attribute-Conditional Image Editing |
362 | Uncertainty-sensitive learning and planning with ensembles |
363 | Fair Resource Allocation in Federated Learning |
364 | Continual Learning via Principal Components Projection |
365 | Task-Mediated Representation Learning |
366 | Convolutional Conditional Neural Processes |
367 | Self-Induced Curriculum Learning in Neural Machine Translation |
368 | CWAE-IRL: Formulating a supervised approach to Inverse Reinforcement Learning problem |
369 | A Quality-Diversity Controllable GAN for Text Generation |
370 | Newton Residual Learning |
371 | Hydra: Preserving Ensemble Diversity for Model Distillation |
372 | Few-Shot Few-Shot Learning and the role of Spatial Attention |
373 | BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning |
374 | Lossless Data Compression with Transformer |
375 | Meta-Learning with Warped Gradient Descent |
376 | Never Give Up: Learning Directed Exploration Strategies |
377 | AdvectiveNet: An Eulerian-Lagrangian Fluidic Reservoir for Point Cloud Processing |
378 | Unsupervised Spatiotemporal Data Inpainting |
379 | Transferable Recognition-Aware Image Processing |
380 | GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modelling |
381 | Transfer Active Learning For Graph Neural Networks |
382 | Trajectory growth through random deep ReLU networks |
383 | Frequency Pooling: Shift-Equivalent and Anti-Aliasing Down Sampling |
384 | Improving Sequential Latent Variable Models with Autoregressive Flows |
385 | SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference |
386 | Sparse Transformer: Concentrated Attention Through Explicit Selection |
387 | Minimizing Change in Classifier Likelihood to Mitigate Catastrophic Forgetting |
388 | Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration |
389 | You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings |
390 | Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport |
391 | Defensive Tensorization: Randomized Tensor Parametrization for Robust Neural Networks |
392 | Question Generation from Paragraphs: A Tale of Two Hierarchical Models |
393 | Robust Reinforcement Learning via Adversarial Training with Langevin Dynamics |
394 | Embodied Multimodal Multitask Learning |
395 | High Fidelity Speech Synthesis with Adversarial Networks |
396 | Autoencoder-based Initialization for Recurrent Neural Networks with a Linear Memory |
397 | Test-Time Training for Out-of-Distribution Generalization |
398 | Distance-based Composable Representations with Neural Networks |
399 | At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks? |
400 | GPU Memory Management for Deep Neural Networks Using Deep Q-Network |
401 | FRICATIVE PHONEME DETECTION WITH ZERO DELAY |
402 | Walking on the Edge: Fast, Low-Distortion Adversarial Examples |
403 | Disentangling Trainability and Generalization in Deep Learning |
404 | Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization |
405 | Functional Regularisation for Continual Learning with Gaussian Processes |
406 | Verification of Generative-Model-Based Visual Transformations |
407 | A Graph Neural Network Assisted Monte Carlo Tree Search Approach to Traveling Salesman Problem |
408 | Residual EBMs: Does Real vs. Fake Text Discrimination Generalize? |
409 | Learning Likelihoods with Conditional Normalizing Flows |
410 | Informed Temporal Modeling via Logical Specification of Factorial LSTMs |
411 | Auto Network Compression with Cross-Validation Gradient |
412 | Regularly varying representation for sentence embedding |
413 | A Simple and Scalable Shape Representation for 3D Reconstruction |
414 | Learning Through Limited Self-Supervision: Improving Time-Series Classification Without Additional Data via Auxiliary Tasks |
415 | EvoNet: A Neural Network for Predicting the Evolution of Dynamic Graphs |
416 | Few-Shot One-Class Classification via Meta-Learning |
417 | Training a Constrained Natural Media Painting Agent using Reinforcement Learning |
418 | Fix-Net: pure fixed-point representation of deep neural networks |
419 | Learning Semantic Correspondences from Noisy Data-text Pairs by Local-to-Global Alignments |
420 | The Role of Embedding Complexity in Domain-invariant Representations |
421 | Learning Curves for Deep Neural Networks: A field theory perspective |
422 | Zero-Shot Policy Transfer with Disentangled Attention |
423 | Disentangled Cumulants Help Successor Representations Transfer to New Tasks |
424 | Learning vector representation of local content and matrix representation of local motion, with implications for V1 |
425 | Online Learned Continual Compression with Stacked Quantization Modules |
426 | Gumbel-Matrix Routing for Flexible Multi-task Learning |
427 | The Frechet Distance of training and test distribution predicts the generalization gap |
428 | Mixed Setting Training Methods for Incremental Slot-Filling Tasks |
429 | Selective sampling for accelerating training of deep neural networks |
430 | Representing Unordered Data Using Multiset Automata and Complex Numbers |
431 | Robust Natural Language Representation Learning for Natural Language Inference by Projecting Superficial Words out |
432 | Deep Nonlinear Stochastic Optimal Control for Systems with Multiplicative Uncertainties |
433 | Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network |
434 | Sentence embedding with contrastive multi-views learning |
435 | Dynamics-Aware Embeddings |
436 | Learning Multi-facet Embeddings of Phrases and Sentences using Sparse Coding for Unsupervised Semantic Applications |
437 | AN ATTENTION-BASED DEEP NET FOR LEARNING TO RANK |
438 | RaPP: Novelty Detection with Reconstruction along Projection Pathway |
439 | SAFE-DNN: A Deep Neural Network with Spike Assisted Feature Extraction for Noise Robust Inference |
440 | Putting Machine Translation in Context with the Noisy Channel Model |
441 | Deep geometric matrix completion: Are we doing it right? |
442 | Progressive Compressed Records: Taking a Byte Out of Deep Learning Data |
443 | Robustness and/or Redundancy Emerge in Overparametrized Deep Neural Networks |
444 | The Intriguing Effects of Focal Loss on the Calibration of Deep Neural Networks |
445 | Hypermodels for Exploration |
446 | Denoising Improves Latent Space Geometry in Text Autoencoders |
447 | Provable Convergence and Global Optimality of Generative Adversarial Network |
448 | On Symmetry and Initialization for Neural Networks |
449 | Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies |
450 | Policy path programming |
451 | Meta-Learning with Network Pruning for Overfitting Reduction |
452 | Kernel and Rich Regimes in Overparametrized Models |
453 | A Boolean Task Algebra for Reinforcement Learning |
454 | Explanation by Progressive Exaggeration |
455 | Quantum Optical Experiments Modeled by Long Short-Term Memory |
456 | Why do These Match? Explaining the Behavior of Image Similarity Models |
457 | Mode Connectivity and Sparse Neural Networks |
458 | Monte Carlo Deep Neural Network Arithmetic |
459 | Shape Features Improve General Model Robustness |
460 | Random Partition Relaxation for Training Binary and Ternary Weight Neural Network |
461 | How can we generalise learning distributed representations of graphs? |
462 | Relation-based Generalized Zero-shot Classification with the Domain Discriminator on the shared representation |
463 | Self-supervised Training of Proposal-based Segmentation via Background Prediction |
464 | Influence-aware Memory for Deep Reinforcement Learning |
465 | Gating Revisited: Deep Multi-layer RNNs That Can Be Trained |
466 | Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses |
467 | A Simple Geometric Proof for the Benefit of Depth in ReLU Networks |
468 | Avoiding Negative Side-Effects and Promoting Safe Exploration with Imaginative Planning |
469 | BayesOpt Adversarial Attack |
470 | CrossNorm: On Normalization for Off-Policy Reinforcement Learning |
471 | A Simple Technique to Enable Saliency Methods to Pass the Sanity Checks |
472 | Directional Message Passing for Molecular Graphs |
473 | Unsupervised Learning of Efficient and Robust Speech Representations |
474 | Compositional Embeddings: Joint Perception and Comparison of Class Label Sets |
475 | Model-based reinforcement learning for biological sequence design |
476 | Learning to Optimize via Dual space Preconditioning |
477 | Self-Attentional Credit Assignment for Transfer in Reinforcement Learning |
478 | AdaGAN: Adaptive GAN for Many-to-Many Non-Parallel Voice Conversion |
479 | City Metro Network Expansion with Reinforcement Learning |
480 | BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations |
481 | ShardNet: One Filter Set to Rule Them All |
482 | Towards Interpretable Evaluations: A Case Study of Named Entity Recognition |
483 | Mixed-curvature Variational Autoencoders |
484 | Rethinking deep active learning: Using unlabeled data at model training |
485 | Blurring Structure and Learning to Optimize and Adapt Receptive Fields |
486 | Layerwise Learning Rates for Object Features in Unsupervised and Supervised Neural Networks And Consequent Predictions for the Infant Visual System |
487 | Continual Deep Learning by Functional Regularisation of Memorable Past |
488 | Demystifying Inter-Class Disentanglement |
489 | On the implicit minimization of alternative loss functions when training deep networks |
490 | Dynamic Graph Message Passing Networks |
491 | A Deep Recurrent Neural Network via Unfolding Reweighted l1-l1 Minimization |
492 | Differentially Private Mixed-Type Data Generation For Unsupervised Learning |
493 | Learning from Rules Generalizing Labeled Exemplars |
494 | Group-Transformer: Towards A Lightweight Character-level Language Model |
495 | Language-independent Cross-lingual Contextual Representations |
496 | Understanding the Limitations of Conditional Generative Models |
497 | Skew-Explore: Learn faster in continuous spaces with sparse rewards |
498 | Diversely Stale Parameters for Efficient Training of Deep Convolutional Networks |
499 | Exploring the Correlation between Likelihood of Flow-based Generative Models and Image Semantics |
500 | Anomaly Detection Based on Unsupervised Disentangled Representation Learning in Combination with Manifold Learning |
501 | Neural Arithmetic Unit by reusing many small pre-trained networks |
502 | On Stochastic Sign Descent Methods |
503 | GENN: Predicting Correlated Drug-drug Interactions with Graph Energy Neural Networks |
504 | Event Discovery for History Representation in Reinforcement Learning |
505 | Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning |
506 | Are Powerful Graph Neural Nets Necessary? A Dissection on Graph Classification |
507 | Domain-Invariant Representations: A Look on Compression and Weights |
508 | Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack |
509 | Spike-based causal inference for weight alignment |
510 | Symmetry and Systematicity |
511 | Efficacy of Pixel-Level OOD Detection for Semantic Segmentation |
512 | PatchFormer: A neural architecture for self-supervised representation learning on images |
513 | Address2vec: Generating vector embeddings for blockchain analytics |
514 | Attack-Resistant Federated Learning with Residual-based Reweighting |
515 | Learning scalable and transferable multi-robot/machine sequential assignment planning via graph embedding |
516 | Learning a Spatio-Temporal Embedding for Video Instance Segmentation |
517 | Efficient Exploration via State Marginal Matching |
518 | Side-Tuning: Network Adaptation via Additive Side Networks |
519 | Lookahead: A Far-sighted Alternative of Magnitude-based Pruning |
520 | SCELMo: Source Code Embeddings from Language Models |
521 | Detecting Change in Seasonal Pattern via Autoencoder and Temporal Regularization |
522 | CopyCAT: Taking Control of Neural Policies with Constant Attacks |
523 | VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning |
524 | A Generalized Training Approach for Multiagent Learning |
525 | Quantum Semi-Supervised Kernel Learning |
526 | Unsupervised Meta-Learning for Reinforcement Learning |
527 | Making Efficient Use of Demonstrations to Solve Hard Exploration Problems |
528 | Training individually fair ML models with sensitive subspace robustness |
529 | Meta-learning curiosity algorithms |
530 | vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations |
531 | The Secret Revealer: Generative Model Inversion Attacks Against Deep Neural Networks |
532 | Leveraging Entanglement Entropy for Deep Understanding of Attention Matrix in Text Matching |
533 | Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies |
534 | Under what circumstances do local codes emerge in feed-forward neural networks |
535 | MMA Training: Direct Input Space Margin Maximization through Adversarial Training |
536 | Forecasting Deep Learning Dynamics with Applications to Hyperparameter Tuning |
537 | Batch Normalization has Multiple Benefits: An Empirical Study on Residual Networks |
538 | Building Deep Equivariant Capsule Networks |
539 | Learning to Infer User Interface Attributes from Images |
540 | Attacking Graph Convolutional Networks via Rewiring |
541 | Incorporating BERT into Neural Machine Translation |
542 | Unsupervised Hierarchical Graph Representation Learning with Variational Bayes |
543 | Copy That! Editing Sequences by Copying Spans |
544 | DeepXML: Scalable & Accurate Deep Extreme Classification for Matching User Queries to Advertiser Bid Phrases |
545 | What Can Neural Networks Reason About? |
546 | Structured Object-Aware Physics Prediction for Video Modeling and Planning |
547 | A multi-task U-net for segmentation with lazy labels |
548 | Neural Design of Contests and All-Pay Auctions using Multi-Agent Simulation |
549 | CaptainGAN: Navigate Through Embedding Space For Better Text Generation |
550 | Learning-Augmented Data Stream Algorithms |
551 | word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement |
552 | On Weight-Sharing and Bilevel Optimization in Architecture Search |
553 | Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models |
554 | Imbalanced Classification via Adversarial Minority Over-sampling |
555 | Compositional Transfer in Hierarchical Reinforcement Learning |
556 | On the Relationship between Self-Attention and Convolutional Layers |
557 | PolyGAN: High-Order Polynomial Generators |
558 | Dynamic Scale Inference by Entropy Minimization |
559 | SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes |
560 | Rethinking Data Augmentation: Self-Supervision and Self-Distillation |
561 | GENERALIZATION GUARANTEES FOR NEURAL NETS VIA HARNESSING THE LOW-RANKNESS OF JACOBIAN |
562 | Learning to Remember from a Multi-Task Teacher |
563 | Gradient $\ell_1$ Regularization for Quantization Robustness |
564 | Coloring graph neural networks for node disambiguation |
565 | Spectral Embedding of Regularized Block Models |
566 | On Federated Learning of Deep Networks from Non-IID Data: Parameter Divergence and the Effects of Hyperparametric Methods |
567 | Improved Detection of Adversarial Attacks via Penetration Distortion Maximization |
568 | Barcodes as summary of objective functions' topology |
569 | Unsupervised Video-to-Video Translation via Self-Supervised Learning |
570 | Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control |
571 | STYLE EXAMPLE-GUIDED TEXT GENERATION USING GENERATIVE ADVERSARIAL TRANSFORMERS |
572 | LEARNING TO IMPUTE: A GENERAL FRAMEWORK FOR SEMI-SUPERVISED LEARNING |
573 | Geometry-aware Generation of Adversarial and Cooperative Point Clouds |
574 | Crafting Data-free Universal Adversaries with Dilate Loss |
575 | Efficient Bi-Directional Verification of ReLU Networks via Quadratic Programming |
576 | Improving Sample Efficiency in Model-Free Reinforcement Learning from Images |
577 | Improving Exploration of Deep Reinforcement Learning using Planning for Policy Search |
578 | Spatial Information is Overrated for Image Classification |
579 | A Theoretical Analysis of Deep Q-Learning |
580 | Decentralized Deep Learning with Arbitrary Communication Compression |
581 | Can I Trust the Explainer? Verifying Post-Hoc Explanatory Methods |
582 | D3PG: Deep Differentiable Deterministic Policy Gradients |
583 | Deep Ensembles: A Loss Landscape Perspective |
584 | A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation |
585 | MULTI-STAGE INFLUENCE FUNCTION |
586 | Impact of the latent space on the ability of GANs to fit the distribution |
587 | Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators |
588 | Combining Q-Learning and Search with Amortized Value Estimates |
589 | Hyperbolic Image Embeddings |
590 | Infinite-Horizon Differentiable Model Predictive Control |
591 | Neural Reverse Engineering of Stripped Binaries |
592 | Anchor & Transform: Learning Sparse Representations of Discrete Objects |
593 | Emergence of Collective Policies Inside Simulations with Biased Representations |
594 | Projection Based Constrained Policy Optimization |
595 | GraphFlow: Exploiting Conversation Flow with Graph Neural Networks for Conversational Machine Comprehension |
596 | Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning |
597 | Recurrent Layer Attention Network |
598 | Towards Effective 2-bit Quantization: Pareto-optimal Bit Allocation for Deep CNNs Compression |
599 | You Only Train Once: Loss-Conditional Training of Deep Networks |
600 | Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization |
601 | Using Explainability to Detect Adversarial Attacks |
602 | Feature Selection using Stochastic Gates |
603 | SpectroBank: A filter-bank convolutional layer for CNN-based audio applications |
604 | Testing For Typicality with Respect to an Ensemble of Learned Distributions |
605 | Emergent Communication in Networked Multi-Agent Reinforcement Learning |
606 | GraphSAINT: Graph Sampling Based Inductive Learning Method |
607 | Adversarial Filters of Dataset Biases |
608 | Value-Driven Hindsight Modelling |
609 | Incorporating Perceptual Prior to Improve Model's Adversarial Robustness |
610 | Learning Neural Causal Models from Unknown Interventions |
611 | Adaptive Generation of Unrestricted Adversarial Inputs |
612 | P-BN: Towards Effective Batch Normalization in the Path Space |
613 | Efficient Probabilistic Logic Reasoning with Graph Neural Networks |
614 | On the geometry and learning low-dimensional embeddings for directed graphs |
615 | GATO: Gates Are Not the Only Option |
616 | Probabilistic View of Multi-agent Reinforcement Learning: A Unified Approach |
617 | Neural Subgraph Isomorphism Counting |
618 | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments |
619 | Continual Learning with Delayed Feedback |
620 | Neural Non-additive Utility Aggregation |
621 | Bayesian Variational Autoencoders for Unsupervised Out-of-Distribution Detection |
622 | "Best-of-Many-Samples" Distribution Matching |
623 | Dynamically Balanced Value Estimates for Actor-Critic Methods |
624 | Spatially Parallel Attention and Component Extraction for Scene Decomposition |
625 | Efficient generation of structured objects with Constrained Adversarial Networks |
626 | Deep Variational Semi-Supervised Novelty Detection |
627 | Cross-Lingual Ability of Multilingual BERT: An Empirical Study |
628 | Towards Understanding Generalization in Gradient-Based Meta-Learning |
629 | Towards Finding Longer Proofs |
630 | Probing Emergent Semantics in Predictive Agents via Question Answering |
631 | Revisiting the Information Plane |
632 | Deep 3D-Zoom Net: Unsupervised Learning of Photo-Realistic 3D-Zoom |
633 | Hierarchical Graph Matching Networks for Deep Graph Similarity Learning |
634 | A Simple Approach to the Noisy Label Problem Through the Gambler's Loss |
635 | On the Reflection of Sensitivity in the Generalization Error |
636 | Redundancy-Free Computation Graphs for Graph Neural Networks |
637 | Toward Understanding The Effect of Loss Function on The Performance of Knowledge Graph Embedding |
638 | Reducing Transformer Depth on Demand with Structured Dropout |
639 | Semi-Supervised Learning with Normalizing Flows |
640 | Neural Communication Systems with Bandwidth-limited Channel |
641 | Reducing Computation in Recurrent Networks by Selectively Updating State Neurons |
642 | A Novel Analysis Framework of Lower Complexity Bounds for Finite-Sum Optimization |
643 | Neural Outlier Rejection for Self-Supervised Keypoint Learning |
644 | Exploring the Pareto-Optimality between Quality and Diversity in Text Generation |
645 | B-Spline CNNs on Lie groups |
646 | EMS: End-to-End Model Search for Network Architecture, Pruning and Quantization |
647 | Feature-based Augmentation for Semi-Supervised Learning |
648 | Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel |
649 | Progressive Knowledge Distillation For Generative Modeling |
650 | EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks |
651 | Learning To Explore Using Active Neural Mapping |
652 | Adversarial Robustness Against the Union of Multiple Perturbation Models |
653 | Understanding and Improving Information Transfer in Multi-Task Learning |
654 | Hyperparameter Tuning and Implicit Regularization in Minibatch SGD |
655 | Searching for Stage-wise Neural Graphs In the Limit |
656 | Restricting the Flow: Information Bottlenecks for Attribution |
657 | Stein Bridging: Enabling Mutual Reinforcement between Explicit and Implicit Generative Models |
658 | Step Size Optimization |
659 | Equilibrium Propagation with Continual Weight Updates |
660 | Global Adversarial Robustness Guarantees for Neural Networks |
661 | A Stochastic Derivative Free Optimization Method with Momentum |
662 | Coresets for Accelerating Incremental Gradient Methods |
663 | A Greedy Approach to Max-Sliced Wasserstein GANs |
664 | Off-Policy Actor-Critic with Shared Experience Replay |
665 | Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems |
666 | The Ingredients of Real World Robotic Reinforcement Learning |
667 | Causal Discovery with Reinforcement Learning |
668 | Modelling the influence of data structure on learning in neural networks |
669 | Task-agnostic Continual Learning via Growing Long-Term Memory Networks |
670 | Scaling Autoregressive Video Models |
671 | TOWARDS FEATURE SPACE ADVERSARIAL ATTACK |
672 | Generative Integration Networks |
673 | Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Nonconvex Optimization |
674 | Compressive Transformers for Long-Range Sequence Modelling |
675 | Global Momentum Compression for Sparse Communication in Distributed SGD |
676 | State2vec: Off-Policy Successor Feature Approximators |
677 | Differentiation of Blackbox Combinatorial Solvers |
678 | Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs |
679 | Lagrangian Fluid Simulation with Continuous Convolutions |
680 | Graph-based motion planning networks |
681 | Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks |
682 | Semi-supervised semantic segmentation needs strong, high-dimensional perturbations |
683 | Learning to Guide Random Search |
684 | Attentive Sequential Neural Processes |
685 | The intriguing role of module criticality in the generalization of deep networks |
686 | Yet another but more efficient black-box adversarial attack: tiling and evolution strategies |
687 | TreeCaps: Tree-Structured Capsule Networks for Program Source Code Processing |
688 | Learning with Social Influence through Interior Policy Differentiation |
689 | SPROUT: Self-Progressing Robust Training |
690 | Alleviating Privacy Attacks via Causal Learning |
691 | Hybrid Weight Representation: A Quantization Method Represented with Ternary and Sparse-Large Weights |
692 | Self-labelling via simultaneous clustering and representation learning |
693 | Meta Decision Trees for Explainable Recommendation Systems |
694 | Continual Learning with Gated Incremental Memories for Sequential Data Processing |
695 | Policy Optimization by Local Improvement through Search |
696 | Improving Model Compatibility of Generative Adversarial Networks by Boundary Calibration |
697 | Data Annealing Transfer learning Procedure for Informal Language Understanding Tasks |
698 | Robust anomaly detection and backdoor attack detection via differential privacy |
699 | CAT: Compression-Aware Training for bandwidth reduction |
700 | Scheduling the Learning Rate Via Hypergradients: New Insights and a New Algorithm |
701 | Learning Entailment-Based Sentence Embeddings from Natural Language Inference |
702 | Invariance vs Robustness of Neural Networks |
703 | Asymptotic learning curves of kernel methods: empirical data vs. Teacher-Student paradigm |
704 | LARGE SCALE REPRESENTATION LEARNING FROM TRIPLET COMPARISONS |
705 | Irrationality can help reward inference |
706 | Learning to Reach Goals Without Reinforcement Learning |
707 | Pruning Depthwise Separable Convolutions for Extra Efficiency Gain of Lightweight Models |
708 | Subjective Reinforcement Learning for Open Complex Environments |
709 | Deep probabilistic subsampling for task-adaptive compressed sensing |
710 | Text Embedding Bank Module for Detailed Image Paragraph Caption |
711 | Semi-supervised 3D Face Reconstruction with Nonlinear Disentangled Representations |
712 | Representing Model Uncertainty of Neural Networks in Sparse Information Form |
713 | GroSS Decomposition: Group-Size Series Decomposition for Whole Search-Space Training |
714 | Neural Tangents: Fast and Easy Infinite Neural Networks in Python |
715 | Sparse Weight Activation Training |
716 | Learning Robust Representations via Multi-View Information Bottleneck |
717 | Batch-shaping for learning conditional channel gated networks |
718 | Making the Shoe Fit: Architectures, Initializations, and Tuning for Learning with Privacy |
719 | Universal Adversarial Attack Using Very Few Test Examples |
720 | Rotation-invariant clustering of functional cell types in primary visual cortex |
721 | Solving single-objective tasks by preference multi-objective reinforcement learning |
722 | Deep automodulators |
723 | Enhanced Convolutional Neural Tangent Kernels |
724 | Revisiting Gradient Episodic Memory for Continual Learning |
725 | Inductive and Unsupervised Representation Learning on Graph Structured Objects |
726 | A new perspective in understanding of Adam-Type algorithms and beyond |
727 | Causally Correct Partial Models for Reinforcement Learning |
728 | Spectral Nonlocal Block for Neural Network |
729 | U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation |
730 | Masked Based Unsupervised Content Transfer |
731 | Efficient meta reinforcement learning via meta goal generation |
732 | Learning robust visual representations using data augmentation invariance |
733 | A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs |
734 | DropEdge: Towards Deep Graph Convolutional Networks on Node Classification |
735 | Simple but effective techniques to reduce dataset biases |
736 | Projected Canonical Decomposition for Knowledge Base Completion |
737 | Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue |
738 | AMUSED: A Multi-Stream Vector Representation Method for Use In Natural Dialogue |
739 | Measuring the Reliability of Reinforcement Learning Algorithms |
740 | Semi-Supervised Named Entity Recognition with CRF-VAEs |
741 | Stable Rank Normalization for Improved Generalization in Neural Networks and GANs |
742 | Graph Neural Networks for Soft Semi-Supervised Learning on Hypergraphs |
743 | Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks |
744 | Deep Neural Forests: An Architecture for Tabular Data |
745 | Self-Imitation Learning via Trajectory-Conditioned Policy for Hard-Exploration Tasks |
746 | ICNN: INPUT-CONDITIONED FEATURE REPRESENTATION LEARNING FOR TRANSFORMATION-INVARIANT NEURAL NETWORK |
747 | Data Augmentation in Training CNNs: Injecting Noise to Images |
748 | VAENAS: Sampling Matters in Neural Architecture Search |
749 | Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following |
750 | Model-Agnostic Feature Selection with Additional Mutual Information |
751 | Do Deep Neural Networks for Segmentation Understand Insideness? |
752 | Adversarial Robustness as a Prior for Learned Representations |
753 | Explaining Time Series by Counterfactuals |
754 | Variational Diffusion Autoencoders with Random Walk Sampling |
755 | Probability Calibration for Knowledge Graph Embedding Models |
756 | Contrastive Multiview Coding |
757 | Fast Sparse ConvNets |
758 | Reformer: The Efficient Transformer |
759 | BasisVAE: Orthogonal Latent Space for Deep Disentangled Representation |
760 | Target-Embedding Autoencoders for Supervised Representation Learning |
761 | Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search |
762 | Conditional Flow Variational Autoencoders for Structured Sequence Prediction |
763 | High-Frequency guided Curriculum Learning for Class-specific Object Boundary Detection |
764 | On the Equivalence between Node Embeddings and Structural Graph Representations |
765 | Disagreement-Regularized Imitation Learning |
766 | Shifted Randomized Singular Value Decomposition |
767 | PassNet: Learning pass probability surfaces from single-location labels. An architecture for visually-interpretable soccer analytics |
768 | On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints |
769 | Are Few-shot Learning Benchmarks Too Simple? |
770 | UNIVERSAL MODAL EMBEDDING OF DYNAMICS IN VIDEOS AND ITS APPLICATIONS |
771 | Universality Theorems for Generative Models |
772 | Function Feature Learning of Neural Networks |
773 | Manifold Learning and Alignment with Generative Adversarial Networks |
774 | Learning Deep-Latent Hierarchies by Stacking Wasserstein Autoencoders |
775 | Scalable Deep Neural Networks via Low-Rank Matrix Factorization |
776 | NoiGAN: NOISE AWARE KNOWLEDGE GRAPH EMBEDDING WITH GAN |
777 | Fast Task Adaptation for Few-Shot Learning |
778 | Weighted Empirical Risk Minimization: Transfer Learning based on Importance Sampling |
779 | Neural Program Synthesis By Self-Learning |
780 | Neural Epitome Search for Architecture-Agnostic Network Compression |
781 | Learning from Label Proportions with Consistency Regularization |
782 | Do recent advancements in model-based deep reinforcement learning really improve data efficiency? |
783 | Evo-NAS: Evolutionary-Neural Hybrid Agent for Architecture Search |
784 | Mixing Up Real Samples and Adversarial Samples for Semi-Supervised Learning |
785 | Task-Agnostic Robust Encodings for Combating Adversarial Typos |
786 | When Covariate-shifted Data Augmentation Increases Test Error And How to Fix It |
787 | Accelerated Variance Reduced Stochastic Extragradient Method for Sparse Machine Learning Problems |
788 | AdamT: A Stochastic Optimization with Trend Correction Scheme |
789 | The Variational InfoMax AutoEncoder |
790 | Skew-Fit: State-Covering Self-Supervised Reinforcement Learning |
791 | LOGAN: Latent Optimisation for Generative Adversarial Networks |
792 | Hyper-SAGNN: a self-attention based graph neural network for hypergraphs |
793 | A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning |
794 | Global-Local Network for Learning Depth with Very Sparse Supervision |
795 | CEB Improves Model Robustness |
796 | Music Source Separation in the Waveform Domain |
797 | Information lies in the eye of the beholder: The effect of representations on observed mutual information |
798 | On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach |
799 | Distributionally Robust Neural Networks |
800 | Distilling the Knowledge of BERT for Text Generation |
801 | Kernel of CycleGAN as a principal homogeneous space |
802 | Cross-Lingual Vision-Language Navigation |
803 | Molecule Property Prediction and Classification with Graph Hypernetworks |
804 | A Syntax-Aware Approach for Unsupervised Text Style Transfer |
805 | Relevant-features based Auxiliary Cells for Robust and Energy Efficient Deep Learning |
806 | Don't Use Large Mini-batches, Use Local SGD |
807 | Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$ |
808 | Model Based Reinforcement Learning for Atari |
809 | Generating Multi-Sentence Abstractive Summaries of Interleaved Texts |
810 | On Universal Equivariant Set Networks |
811 | Compressive Hyperspherical Energy Minimization |
812 | OPTIMAL BINARY QUANTIZATION FOR DEEP NEURAL NETWORKS |
813 | Deep End-to-end Unsupervised Anomaly Detection |
814 | Tensor Decompositions for Temporal Knowledge Base Completion |
815 | CloudLSTM: A Recurrent Neural Model for Spatiotemporal Point-cloud Stream Forecasting |
816 | Neural Approximation of an Auto-Regressive Process through Confidence Guided Sampling |
817 | A Simple Randomization Technique for Generalization in Deep Reinforcement Learning |
818 | Stochastic Latent Residual Video Prediction |
819 | AlignNet: Self-supervised Alignment Module |
820 | Learning with Protection: Rejection of Suspicious Samples under Adversarial Environment |
821 | QXplore: Q-Learning Exploration by Maximizing Temporal Difference Error |
822 | Walking the Tightrope: An Investigation of the Convolutional Autoencoder Bottleneck |
823 | Partial Simulation for Imitation Learning |
824 | Few-shot Learning by Focusing on Differences |
825 | Robustness Verification for Transformers |
826 | EnsembleNet: A novel architecture for Incremental Learning |
827 | Anomalous Pattern Detection in Activations and Reconstruction Error of Autoencoders |
828 | Fantastic Generalization Measures and Where to Find Them |
829 | Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks |
830 | Learning De-biased Representations with Biased Representations |
831 | Weakly Supervised Disentanglement with Guarantees |
832 | Imagining the Latent Space of a Variational Auto-Encoder |
833 | A Copula approach for hyperparameter transfer learning |
834 | THE EFFECT OF ADVERSARIAL TRAINING: A THEORETICAL CHARACTERIZATION |
835 | Provenance detection through learning transformation-resilient watermarking |
836 | Regulatory Focus: Promotion and Prevention Inclinations in Policy Search |
837 | Fairness with Wasserstein Adversarial Networks |
838 | Diagonal Graph Convolutional Networks with Adaptive Neighborhood Aggregation |
839 | Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth |
840 | The Dual Information Bottleneck |
841 | Deep Auto-Deferring Policy for Combinatorial Optimization |
842 | Towards trustworthy predictions from deep neural networks with fast adversarial calibration |
843 | Abductive Commonsense Reasoning |
844 | Variance Reduction With Sparse Gradients |
845 | BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget |
846 | RNA Secondary Structure Prediction By Learning Unrolled Algorithms |
847 | Learning transport cost from subset correspondence |
848 | Attentive Weights Generation for Few Shot Learning via Information Maximization |
849 | Semi-Supervised Few-Shot Learning with a Controlled Degree of Task-Adaptive Conditioning |
850 | Detecting Noisy Training Data with Loss Curves |
851 | Reducing Sentiment Bias in Language Models via Counterfactual Evaluation |
852 | Near-Zero-Cost Differentially Private Deep Learning with Teacher Ensembles |
853 | Neural Network Out-of-Distribution Detection for Regression Tasks |
854 | Rényi Fair Inference |
855 | Reject Illegal Inputs: Scaling Generative Classifiers with Supervised Deep Infomax |
856 | Lean Images for Geo-Localization |
857 | WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia |
858 | Deep Lifetime Clustering |
859 | Towards Understanding the Transferability of Deep Representations |
860 | Meta Dropout: Learning to Perturb Latent Features for Generalization |
861 | Adversarial AutoAugment |
862 | When Robustness Doesn’t Promote Robustness: Synthetic vs. Natural Distribution Shifts on ImageNet |
863 | Understanding Why Neural Networks Generalize Well Through GSNR of Parameters |
864 | State-only Imitation with Transition Dynamics Mismatch |
865 | Measuring and Improving the Use of Graph Information in Graph Neural Networks |
866 | Meta-Learning by Hallucinating Useful Examples |
867 | Pixel Co-Occurrence Based Loss Metrics for Super Resolution Texture Recovery |
868 | A Latent Morphology Model for Open-Vocabulary Neural Machine Translation |
869 | Sample-Based Point Cloud Decoder Networks |
870 | AUGMENTED POLICY GRADIENT METHODS FOR EFFICIENT REINFORCEMENT LEARNING |
871 | BETANAS: Balanced Training and selective drop for Neural Architecture Search |
872 | Connecting the Dots Between MLE and RL for Sequence Prediction |
873 | Universal Approximation with Certified Networks |
874 | Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency |
875 | SEERL: Sample Efficient Ensemble Reinforcement Learning |
876 | Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks |
877 | DyNet: Dynamic Convolution for Accelerating Convolution Neural Networks |
878 | Deep Symbolic Superoptimization Without Human Knowledge |
879 | Unsupervised domain adaptation with imputation |
880 | Sample Efficient Policy Gradient Methods with Recursive Variance Reduction |
881 | A Generative Model for Molecular Distance Geometry |
882 | Generating Biased Datasets for Neural Natural Language Processing |
883 | Robustified Importance Sampling for Covariate Shift |
884 | Fast Task Inference with Variational Intrinsic Successor Features |
885 | Certified Defenses for Adversarial Patches |
886 | Hardware-aware One-Shot Neural Architecture Search in Coordinate Ascent Framework |
887 | Contrastive Representation Distillation |
888 | Generating valid Euclidean distance matrices |
889 | Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions |
890 | Information Theoretic Model Predictive Q-Learning |
891 | On Predictive Information Sub-optimality of RNNs |
892 | Model Inversion Networks for Model-Based Optimization |
893 | Learning to Recognize the Unseen Visual Predicates |
894 | Continuous Control with Contexts, Provably |
895 | Stabilizing Transformers for Reinforcement Learning |
896 | A FRAMEWORK FOR ROBUSTNESS CERTIFICATION OF SMOOTHED CLASSIFIERS USING F-DIVERGENCES |
897 | The Detection of Distributional Discrepancy for Text Generation |
898 | Relative Pixel Prediction For Autoregressive Image Generation |
899 | FACE SUPER-RESOLUTION GUIDED BY 3D FACIAL PRIORS |
900 | Natural- to formal-language generation using Tensor Product Representations |
901 | Three-Head Neural Network Architecture for AlphaZero Learning |
902 | Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Budget |
903 | Interpretable Network Structure for Modeling Contextual Dependency |
904 | Policy Tree Network |
905 | Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks |
906 | Characterize and Transfer Attention in Graph Neural Networks |
907 | Adversarial Neural Pruning |
908 | Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering |
909 | A Baseline for Few-Shot Image Classification |
910 | Abstract Diagrammatic Reasoning with Multiplex Graph Networks |
911 | Emergent Systematic Generalization In a Situated Agent |
912 | SoftAdam: Unifying SGD and Adam for better stochastic gradient descent |
913 | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators |
914 | Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning |
915 | Amharic Text Normalization with Sequence-to-Sequence Models |
916 | Thinking While Moving: Deep Reinforcement Learning with Concurrent Control |
917 | RATE-DISTORTION OPTIMIZATION GUIDED AUTOENCODER FOR GENERATIVE APPROACH |
918 | On the expected running time of nonconvex optimization with early stopping |
919 | Knossos: Compiling AI with AI |
920 | Multiagent Reinforcement Learning in Games with an Iterated Dominance Solution |
921 | CP-GAN: Towards a Better Global Landscape of GANs |
922 | Jacobian Adversarially Regularized Networks for Robustness |
923 | Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems |
924 | Improving Federated Learning Personalization via Model Agnostic Meta Learning |
925 | Towards Verified Robustness under Text Deletion Interventions |
926 | Discovering Topics With Neural Topic Models Built From PLSA Loss |
927 | And the Bit Goes Down: Revisiting the Quantization of Neural Networks |
928 | Meta-Learning Runge-Kutta |
929 | RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis |
930 | Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks |
931 | Instant Quantization of Neural Networks using Monte Carlo Methods |
932 | Hallucinative Topological Memory for Zero-Shot Visual Planning |
933 | Learning Good Policies By Learning Good Perceptual Models |
934 | Implementation Matters in Deep RL: A Case Study on PPO and TRPO |
935 | A Closer Look at Deep Policy Gradients |
936 | Plug and Play Language Model: A simple baseline for controlled language generation |
937 | Efficient High-Dimensional Data Representation Learning via Semi-Stochastic Block Coordinate Descent Methods |
938 | Understanding and Robustifying Differentiable Architecture Search |
939 | Rethinking the Hyperparameters for Fine-tuning |
940 | UNITER: Learning UNiversal Image-TExt Representations |
941 | Self-Supervised GAN Compression |
942 | Retrieving Signals in the Frequency Domain with Deep Complex Extractors |
943 | Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings |
944 | Implementing Inductive bias for different navigation tasks through diverse RNN attractors |
945 | Disentangling Style and Content in Anime Illustrations |
946 | Dynamic Instance Hardness |
947 | Multi-step Greedy Policies in Model-Free Deep Reinforcement Learning |
948 | A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions |
949 | Is my Deep Learning Model Learning more than I want it to? |
950 | LIA: Latently Invertible Autoencoder with Adversarial Learning |
951 | PCMC-Net: Feature-based Pairwise Choice Markov Chains |
952 | Multi-Agent Interactions Modeling with Correlated Policies |
953 | Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning |
954 | Once for All: Train One Network and Specialize it for Efficient Deployment |
955 | Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition |
956 | Acutum: When Generalization Meets Adaptability |
957 | FR-GAN: Fair and Robust Training |
958 | SNODE: Spectral Discretization of Neural ODEs for System Identification |
959 | Guiding Program Synthesis by Learning to Generate Examples |
960 | Fast Neural Network Adaptation via Parameters Remapping |
961 | Measuring Calibration in Deep Learning |
962 | R2D2: Reuse & Reduce via Dynamic Weight Diffusion for Training Efficient NLP Models |
963 | Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep RL |
964 | On the Distribution of Penultimate Activations of Classification Networks |
965 | Divide-and-Conquer Adversarial Learning for High-Resolution Image Enhancement |
966 | Meta-Learning Deep Energy-Based Memory Models |
967 | Mutual Information Maximization for Robust Plannable Representations |
968 | Depth creates no more spurious local minima in linear networks |
969 | WORD SEQUENCE PREDICTION FOR AMHARIC LANGUAGE |
970 | YaoGAN: Learning Worst-case Competitive Algorithms from Self-generated Inputs |
971 | Annealed Denoising score matching: learning Energy based model in high-dimensional spaces |
972 | Finding Winning Tickets with Limited (or No) Supervision |
973 | Graph Convolutional Reinforcement Learning |
974 | Open-Set Domain Adaptation with Category-Agnostic Clusters |
975 | Deep Generative Classifier for Out-of-distribution Sample Detection |
976 | Reparameterized Variational Divergence Minimization for Stable Imitation |
977 | Learning Function-Specific Word Representations |
978 | Swoosh! Rattle! Thump! - Actions that Sound |
979 | Improving and Stabilizing Deep Energy-Based Learning |
980 | Perception-Driven Curiosity with Bayesian Surprise |
981 | Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning |
982 | Towards Effective and Efficient Zero-shot Learning by Fine-tuning with Task Descriptions |
983 | TWIN GRAPH CONVOLUTIONAL NETWORKS: GCN WITH DUAL GRAPH SUPPORT FOR SEMI-SUPERVISED LEARNING |
984 | Continual Density Ratio Estimation (CDRE): A new method for evaluating generative models in continual learning |
985 | CONTRIBUTION OF INTERNAL REFLECTION IN LANGUAGE EMERGENCE WITH AN UNDER-RESTRICTED SITUATION |
986 | Kernelized Wasserstein Natural Gradient |
987 | The Curious Case of Neural Text Degeneration |
988 | Universal approximations of permutation invariant/equivariant functions by deep neural networks |
989 | Improving Robustness Without Sacrificing Accuracy with Patch Gaussian Augmentation |
990 | What Can Learned Intrinsic Rewards Capture? |
991 | On Iterative Neural Network Pruning, Reinitialization, and the Similarity of Masks |
992 | Implicit Generative Modeling for Efficient Exploration |
993 | Continuous Meta-Learning without Tasks |
994 | Counterfactual Regularization for Model-Based Reinforcement Learning |
995 | Multilingual Alignment of Contextual Word Representations |
996 | A bi-diffusion based layer-wise sampling method for deep learning in large graphs |
997 | Learning Video Representations using Contrastive Bidirectional Transformer |
998 | Unrestricted Adversarial Attacks For Semantic Segmentation |
999 | Randomness in Deconvolutional Networks for Visual Representation |
1000 | HUBERT Untangles BERT to Improve Transfer across NLP Tasks |
1001 | The Gambler's Problem and Beyond |
1002 | CRAP: Semi-supervised Learning via Conditional Rotation Angle Prediction |
1003 | Noisy $\ell^{0}$-Sparse Subspace Clustering on Dimensionality Reduced Data |
1004 | GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation |
1005 | Off-policy Multi-step Q-learning |
1006 | Axial Attention in Multidimensional Transformers |
1007 | Joint text classification on multiple levels with multiple labels |
1008 | Fully Quantized Transformer for Improved Translation |
1009 | The Surprising Behavior Of Graph Neural Networks |
1010 | Double Neural Counterfactual Regret Minimization |
1011 | Resizable Neural Networks |
1012 | Multitask Soft Option Learning |
1013 | Adaptive Adversarial Imitation Learning |
1014 | Representation Learning with Multisets |
1015 | Improving Confident-Classifiers For Out-of-distribution Detection |
1016 | Cyclic Graph Dynamic Multilayer Perceptron for Periodic Signals |
1017 | Accelerating Monte Carlo Bayesian Inference via Approximating Predictive Uncertainty over the Simplex |
1018 | Capsule Networks without Routing Procedures |
1019 | Certifiably Robust Interpretation in Deep Learning |
1020 | Continuous Convolutional Neural Network for Nonuniform Time Series |
1021 | DS-VIC: Unsupervised Discovery of Decision States for Transfer in RL |
1022 | Neural Policy Gradient Methods: Global Optimality and Rates of Convergence |
1023 | Multi-objective Neural Architecture Search via Predictive Network Performance Optimization |
1024 | Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference |
1025 | Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers |
1026 | A Mean-Field Theory for Kernel Alignment with Random Features in Generative Adversarial Networks |
1027 | Learning Key Steps to Attack Deep Reinforcement Learning Agents |
1028 | Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks |
1029 | On PAC-Bayes Bounds for Deep Neural Networks using the Loss Curvature |
1030 | Deep Graph Matching Consensus |
1031 | Self-Supervised Learning of Appliance Usage |
1032 | Gaussian Conditional Random Fields for Classification |
1033 | Fourier networks for uncertainty estimates and out-of-distribution detection |
1034 | Semantic Hierarchy Emerges in the Deep Generative Representations for Scene Synthesis |
1035 | Quantum Algorithms for Deep Convolutional Neural Networks |
1036 | TWO-STEP UNCERTAINTY NETWORK FOR TASK-DRIVEN SENSOR PLACEMENT |
1037 | EXPLOITING SEMANTIC COHERENCE TO IMPROVE PREDICTION IN SATELLITE SCENE IMAGE ANALYSIS: APPLICATION TO DISEASE DENSITY ESTIMATION |
1038 | Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds |
1039 | Abstractive Dialog Summarization with Semantic Scaffolds |
1040 | Evaluating Semantic Representations of Source Code |
1041 | Searching to Exploit Memorization Effect in Learning from Corrupted Labels |
1042 | Study of a Simple, Expressive and Consistent Graph Feature Representation |
1043 | Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness |
1044 | Balancing Cost and Benefit with Tied-Multi Transformers |
1045 | End-to-End Multi-Domain Task-Oriented Dialogue Systems with Multi-level Neural Belief Tracker |
1046 | All Neural Networks are Created Equal |
1047 | Construction of Macro Actions for Deep Reinforcement Learning |
1048 | BOSH: An Efficient Meta Algorithm for Decision-based Attacks |
1049 | MGP-AttTCN: An Interpretable Machine Learning Model for the Prediction of Sepsis |
1050 | Unsupervised Representation Learning by Predicting Random Distances |
1051 | ConQUR: Mitigating Delusional Bias in Deep Q-Learning |
1052 | Where is the Information in a Deep Network? |
1053 | Extreme Values are Accurate and Robust in Deep Networks |
1054 | Statistically Consistent Saliency Estimation |
1055 | Domain-Independent Dominance of Adaptive Methods |
1056 | Neural Networks for Principal Component Analysis: A New Loss Function Provably Yields Ordered Exact Eigenvectors |
1057 | Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control |
1058 | PNEN: Pyramid Non-Local Enhanced Networks |
1059 | Interpretations are useful: penalizing explanations to align neural networks with prior knowledge |
1060 | FreeLB: Enhanced Adversarial Training for Language Understanding |
1061 | Behaviour Suite for Reinforcement Learning |
1062 | Strategies for Pre-training Graph Neural Networks |
1063 | GRAPHS, ENTITIES, AND STEP MIXTURE |
1064 | Refining the variational posterior through iterative optimization |
1065 | Aggregating explanation methods for neural networks stabilizes explanations |
1066 | Recurrent Hierarchical Topic-Guided Neural Language Models |
1067 | Invertible generative models for inverse problems: mitigating representation error and dataset bias |
1068 | An Algorithm-Agnostic NAS Benchmark |
1069 | Learning World Graph Decompositions To Accelerate Reinforcement Learning |
1070 | Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems |
1071 | Controlling generative models with continuous factors of variations |
1072 | Emergent Tool Use From Multi-Agent Autocurricula |
1073 | The fairness-accuracy landscape of neural classifiers |
1074 | Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee |
1075 | Unsupervised Clustering using Pseudo-semi-supervised Learning |
1076 | Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning |
1077 | POLYNOMIAL ACTIVATION FUNCTIONS |
1078 | PairNorm: Tackling Oversmoothing in GNNs |
1079 | Training-Free Uncertainty Estimation for Neural Networks |
1080 | Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning |
1081 | Empirical Studies on the Properties of Linear Regions in Deep Neural Networks |
1082 | SNOW: Subscribing to Knowledge via Channel Pooling for Transfer & Lifelong Learning |
1083 | Smoothness and Stability in GANs |
1084 | Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation |
1085 | On Bonus Based Exploration Methods In The Arcade Learning Environment |
1086 | Power up! Robust Graph Convolutional Network based on Graph Powering |
1087 | Global graph curvature |
1088 | Deep k-NN for Noisy Labels |
1089 | Filling the Soap Bubbles: Efficient Black-Box Adversarial Certification with Non-Gaussian Smoothing |
1090 | Guided Adaptive Credit Assignment for Sample Efficient Policy Optimization |
1091 | A Theory of Usable Information under Computational Constraints |
1092 | On the Invertibility of Invertible Neural Networks |
1093 | Shallow VAEs with RealNVP Prior Can Perform as Well as Deep Hierarchical VAEs |
1094 | GAN-based Gaussian Mixture Model Responsibility Learning |
1095 | Information-Theoretic Local Minima Characterization and Regularization |
1096 | Well-Read Students Learn Better: On the Importance of Pre-training Compact Models |
1097 | IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks |
1098 | UWGAN: UNDERWATER GAN FOR REAL-WORLD UNDERWATER COLOR RESTORATION AND DEHAZING |
1099 | HiLLoC: lossless image compression with hierarchical latent variable models |
1100 | Learning to Learn Kernels with Variational Random Features |
1101 | Efficient Wrapper Feature Selection using Autoencoder and Model Based Elimination |
1102 | Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics |
1103 | Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks |
1104 | Dual Sequential Monte Carlo: Tunneling Filtering and Planning in Continuous POMDPs |
1105 | Enhancing Language Emergence through Empathy |
1106 | The Generalization-Stability Tradeoff in Neural Network Pruning |
1107 | Word embedding re-examined: is the symmetrical factorization optimal? |
1108 | Empowering Graph Representation Learning with Paired Training and Graph Co-Attention |
1109 | Learning representations for binary-classification without backpropagation |
1110 | Deep unsupervised feature selection |
1111 | WaveFlow: A Compact Flow-based Model for Raw Audio |
1112 | Mathematical Reasoning in Latent Space |
1113 | Black Box Recursive Translations for Molecular Optimization |
1114 | Improved Generalization Bound of Permutation Invariant Deep Neural Networks |
1115 | Frequency-based Search-control in Dyna |
1116 | Off-policy Bandits with Deficient Support |
1117 | Implicit λ-Jeffreys Autoencoders: Taking the Best of Both Worlds |
1118 | Super-AND: A Holistic Approach to Unsupervised Embedding Learning |
1119 | FLUID FLOW MASS TRANSPORT FOR GENERATIVE NETWORKS |
1120 | Recognizing Plans by Learning Embeddings from Observed Action Distributions |
1121 | LEX-GAN: Layered Explainable Rumor Detector Based on Generative Adversarial Networks |
1122 | Towards Stable and Efficient Training of Verifiably Robust Neural Networks |
1123 | Multi-hop Question Answering via Reasoning Chains |
1124 | Factorized Multimodal Transformer for Multimodal Sequential Learning |
1125 | Learning in Confusion: Batch Active Learning with Noisy Oracle |
1126 | Iterative energy-based projection on a normal data manifold for anomaly localization |
1127 | Counting the Paths in Deep Neural Networks as a Performance Predictor |
1128 | Chart Auto-Encoders for Manifold Structured Data |
1129 | Optimizing Loss Landscape Connectivity via Neuron Alignment |
1130 | CROSS-DOMAIN CASCADED DEEP TRANSLATION |
1131 | V1Net: A computational model of cortical horizontal connections |
1132 | Distribution Matching Prototypical Network for Unsupervised Domain Adaptation |
1133 | Deep amortized clustering |
1134 | Using Objective Bayesian Methods to Determine the Optimal Degree of Curvature within the Loss Landscape |
1135 | Towards neural networks that provably know when they don't know |
1136 | BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning |
1137 | Fully Convolutional Graph Neural Networks using Bipartite Graph Convolutions |
1138 | Inductive representation learning on temporal graphs |
1139 | Attention on Abstract Visual Reasoning |
1140 | Starfire: Regularization-Free Adversarially-Robust Structured Sparse Training |
1141 | Convolutional Tensor-Train LSTM for Long-Term Video Prediction |
1142 | An Information Theoretic Approach to Distributed Representation Learning |
1143 | PatchVAE: Learning Local Latent Codes for Recognition |
1144 | A Probabilistic Formulation of Unsupervised Text Style Transfer |
1145 | ROBUST GENERATIVE ADVERSARIAL NETWORK |
1146 | Feature Map Transform Coding for Energy-Efficient CNN Inference |
1147 | Generative Models for Effective ML on Private, Decentralized Datasets |
1148 | Learning from Partially-Observed Multimodal Data with Variational Autoencoders |
1149 | A SIMPLE AND EFFECTIVE FRAMEWORK FOR PAIRWISE DEEP METRIC LEARNING |
1150 | A Group-Theoretic Framework for Knowledge Graph Embedding |
1151 | A⋆MCTS: SEARCH WITH THEORETICAL GUARANTEE USING POLICY AND VALUE FUNCTIONS |
1152 | Picking Winning Tickets Before Training by Preserving Gradient Flow |
1153 | Exploring Cellular Protein Localization Through Semantic Image Synthesis |
1154 | Learning Calibratable Policies using Programmatic Style-Consistency |
1155 | Contextual Temperature for Language Modeling |
1156 | Retrospection: Leveraging the Past for Efficient Training of Deep Neural Networks |
1157 | Curriculum Loss: Robust Learning and Generalization against Label Corruption |
1158 | Discrete Transformer |
1159 | Adversarially Robust Generalization Just Requires More Unlabeled Data |
1160 | Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference |
1161 | DeepSFM: Structure From Motion Via Deep Bundle Adjustment |
1162 | IsoNN: Isomorphic Neural Network for Graph Representation Learning and Classification |
1163 | Uncertainty-guided Continual Learning with Bayesian Neural Networks |
1164 | Spline Templated Based Handwriting Generation |
1165 | On Empirical Comparisons of Optimizers for Deep Learning |
1166 | On Evaluating Explainability Algorithms |
1167 | Deep Hierarchical-Hyperspherical Learning (DH^2L) |
1168 | Versatile Anomaly Detection with Outlier Preserving Distribution Mapping Autoencoders |
1169 | Ladder Polynomial Neural Networks |
1170 | Training Recurrent Neural Networks Online by Learning Explicit State Variables |
1171 | How fine can fine-tuning be? Learning efficient language models |
1172 | Improved Modeling of Complex Systems Using Hybrid Physics/Machine Learning/Stochastic Models |
1173 | LEARNING TO LEARN WITH BETTER CONVERGENCE |
1174 | Deep Expectation-Maximization in Hidden Markov Models via Simultaneous Perturbation Stochastic Approximation |
1175 | Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework |
1176 | Compositional Visual Generation with Energy Based Models |
1177 | Learning Sparsity and Quantization Jointly and Automatically for Neural Network Compression via Constrained Optimization |
1178 | Hierarchical Bayes Autoencoders |
1179 | Wyner VAE: A Variational Autoencoder with Succinct Common Representation Learning |
1180 | Granger Causal Structure Reconstruction from Heterogeneous Multivariate Time Series |
1181 | CGT: Clustered Graph Transformer for Urban Spatio-temporal Prediction |
1182 | Robust Reinforcement Learning for Continuous Control with Model Misspecification |
1183 | Decoupling Representation and Classifier for Long-Tailed Recognition |
1184 | SDGM: Sparse Bayesian Classifier Based on a Discriminative Gaussian Mixture Model |
1185 | Which Tasks Should Be Learned Together in Multi-task Learning? |
1186 | COMBINED FLEXIBLE ACTIVATION FUNCTIONS FOR DEEP NEURAL NETWORKS |
1187 | Empirical observations pertaining to learned priors for deep latent variable models |
1188 | MetaPoison: Learning to craft adversarial poisoning examples via meta-learning |
1189 | Teacher-Student Compression with Generative Adversarial Networks |
1190 | Visual Hide and Seek |
1191 | Unsupervised Temperature Scaling: Robust Post-processing Calibration for Domain Shift |
1192 | Pareto Optimality in No-Harm Fairness |
1193 | Domain Adaptation Through Label Propagation: Learning Clustered and Aligned Features |
1194 | Visual Representation Learning with 3D View-Contrastive Inverse Graphics Networks |
1195 | Dream to Control: Learning Behaviors by Latent Imagination |
1196 | From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech |
1197 | Active Learning Graph Neural Networks via Node Feature Propagation |
1198 | Real or Not Real, that is the Question |
1199 | Deep Reinforcement Learning with Implicit Human Feedback |
1200 | Multi-Sample Dropout for Accelerated Training and Better Generalization |
1201 | MelNet: A Generative Model for Audio in the Frequency Domain |
1202 | Semi-Supervised Semantic Dependency Parsing Using CRF Autoencoders |
1203 | Image Classification Through Top-Down Image Pyramid Traversal |
1204 | Cross Domain Imitation Learning |
1205 | FAST LEARNING VIA EPISODIC MEMORY: A PERSPECTIVE FROM ANIMAL DECISION-MAKING |
1206 | DCTD: Deep Conditional Target Densities for Accurate Regression |
1207 | Blending Diverse Physical Priors with Neural Networks |
1208 | VISUALIZING POINT CLOUD CLASSIFIERS BY MORPHING POINT CLOUDS INTO POTATOES |
1209 | Read, Highlight and Summarize: A Hierarchical Neural Semantic Encoder-based Approach |
1210 | Posterior Control of Blackbox Generation |
1211 | A closer look at network resolution for efficient network design |
1212 | Efficient Systolic Array Based on Decomposable MAC for Quantized Deep Neural Networks |
1213 | Improved Image Augmentation for Convolutional Neural Networks by Copyout and CopyPairing |
1214 | On the Evaluation of Conditional GANs |
1215 | JAUNE: Justified And Unified Neural language Evaluation |
1216 | Classification as Decoder: Trading Flexibility for Control in Multi Domain Dialogue |
1217 | Statistical Adaptive Stochastic Optimization |
1218 | Scalable Neural Learning for Verifiable Consistency with Temporal Specifications |
1219 | Model Comparison of Beer data classification using an electronic nose |
1220 | Non-linear System Identification from Partial Observations via Iterative Smoothing and Learning |
1221 | Evaluating Lossy Compression Rates of Deep Generative Models |
1222 | LambdaNet: Probabilistic Type Inference using Graph Neural Networks |
1223 | Variational Autoencoders with Normalizing Flow Decoders |
1224 | Model-Augmented Actor-Critic: Backpropagating through Paths |
1225 | Metagross: Meta Gated Recursive Controller Units for Sequence Modeling |
1226 | Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension |
1227 | Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities |
1228 | Stochastic Mirror Descent on Overparameterized Nonlinear Models |
1229 | Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators |
1230 | Recurrent Chunking Mechanisms for Conversational Machine Reading Comprehension |
1231 | Frequency Analysis for Graph Convolution Network |
1232 | Network Deconvolution |
1233 | Revisiting Self-Training for Neural Sequence Generation |
1234 | Generative Cleaning Networks with Quantized Nonlinear Transform for Deep Neural Network Defense |
1235 | Mutual Exclusivity as a Challenge for Deep Neural Networks |
1236 | Meta-Q-Learning |
1237 | CURSOR-BASED ADAPTIVE QUANTIZATION FOR DEEP NEURAL NETWORK |
1238 | Natural Image Manipulation for Autoregressive Models Using Fisher Scores |
1239 | Unifying Part Detection And Association For Multi-person Pose Estimation |
1240 | Towards a Deep Network Architecture for Structured Smoothness |
1241 | A novel text representation which enables image classifiers to perform text classification |
1242 | On the Global Convergence of Training Deep Linear ResNets |
1243 | A Closer Look at the Optimization Landscapes of Generative Adversarial Networks |
1244 | Perceptual Generative Autoencoders |
1245 | Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning |
1246 | JAX MD: End-to-End Differentiable, Hardware Accelerated, Molecular Dynamics in Pure Python |
1247 | Deflecting Adversarial Attacks |
1248 | Biologically inspired sleep algorithm for increased generalization and adversarial robustness in deep neural networks |
1249 | MUSE: Multi-Scale Attention Model for Sequence to Sequence Learning |
1250 | Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication |
1251 | Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? |
1252 | Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks |
1253 | Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks |
1254 | Intriguing Properties of Adversarial Training at Scale |
1255 | Point Process Flows |
1256 | Cover Filtration and Stable Paths in the Mapper |
1257 | Fully Polynomial-Time Randomized Approximation Schemes for Global Optimization of High-Dimensional Folded Concave Penalized Generalized Linear Models |
1258 | Learning Neural Surrogate Model for Warm-Starting Bayesian Optimization |
1259 | Scalable Differentially Private Data Generation via Private Aggregation of Teacher Ensembles |
1260 | Knowledge Graph Embedding: A Probabilistic Perspective and Generalization Bounds |
1261 | Stabilizing Neural ODE Networks with Stochasticity |
1262 | Adversarial Partial Multi-label Learning |
1263 | Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness |
1264 | Agent as Scientist: Learning to Verify Hypotheses |
1265 | CRNet: Image Super-Resolution Using A Convolutional Sparse Coding Inspired Network |
1266 | Deep Double Descent: Where Bigger Models and More Data Hurt |
1267 | Multigrid Neural Memory |
1268 | ASGen: Answer-containing Sentence Generation to Pre-Train Question Generator for Scale-up Data in Question Answering |
1269 | Distribution-Guided Local Explanation for Black-Box Classifiers |
1270 | Decoding As Dynamic Programming For Recurrent Autoregressive Models |
1271 | Compressed Sensing with Deep Image Prior and Learned Regularization |
1272 | Gradient Surgery for Multi-Task Learning |
1273 | SINGLE PATH ONE-SHOT NEURAL ARCHITECTURE SEARCH WITH UNIFORM SAMPLING |
1274 | Synthesizing Programmatic Policies that Inductively Generalize |
1275 | Transformer-XH: Multi-hop question answering with eXtra Hop attention |
1276 | Variational Hyper RNN for Sequence Modeling |
1277 | Generalization through Memorization: Nearest Neighbor Language Models |
1278 | Comparing Fine-tuning and Rewinding in Neural Network Pruning |
1279 | Simple is Better: Training an End-to-end Contract Bridge Bidding Agent without Human Knowledge |
1280 | The Sooner The Better: Investigating Structure of Early Winning Lottery Tickets |
1281 | Long History Short-Term Memory for Long-Term Video Prediction |
1282 | Adversarial training with perturbation generator networks |
1283 | Single episode transfer for differing environmental dynamics in reinforcement learning |
1284 | Inducing Stronger Object Representations in Deep Visual Trackers |
1285 | TOWARDS STABILIZING BATCH STATISTICS IN BACKWARD PROPAGATION OF BATCH NORMALIZATION |
1286 | STABILITY AND CONVERGENCE THEORY FOR LEARNING RESNET: A FULL CHARACTERIZATION |
1287 | Training Deep Neural Networks with Partially Adaptive Momentum |
1288 | NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension |
1289 | Learning Latent Representations for Inverse Dynamics using Generalized Experiences |
1290 | Learning The Difference That Makes A Difference With Counterfactually-Augmented Data |
1291 | Differentiable Architecture Compression |
1292 | The Early Phase of Neural Network Training |
1293 | Chordal-GCN: Exploiting sparsity in training large-scale graph convolutional networks |
1294 | On The Difficulty of Warm-Starting Neural Network Training |
1295 | NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks |
1296 | Distilled embedding: non-linear embedding factorization using knowledge distillation |
1297 | Incremental RNN: A Dynamical View |
1298 | Domain-Relevant Embeddings for Question Similarity |
1299 | Actor-Critic Approach for Temporal Predictive Clustering |
1300 | Adversarial Privacy Preservation under Attribute Inference Attack |
1301 | Behavior-Guided Reinforcement Learning |
1302 | Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates |
1303 | Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling |
1304 | Extreme Tensoring for Low-Memory Preconditioning |
1305 | Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning |
1306 | Collapsed amortized variational inference for switching nonlinear dynamical systems |
1307 | Non-Autoregressive Dialog State Tracking |
1308 | Channel Equilibrium Networks |
1309 | Independence-aware Advantage Estimation |
1310 | Bayesian Meta Sampling for Fast Uncertainty Adaptation |
1311 | Salient Explanation for Fine-grained Classification |
1312 | SIMULTANEOUS ATTRIBUTED NETWORK EMBEDDING AND CLUSTERING |
1313 | Stochastic Gradient Methods with Block Diagonal Matrix Adaptation |
1314 | Harnessing Structures for Value-Based Planning and Reinforcement Learning |
1315 | The Dynamics of Signal Propagation in Gated Recurrent Neural Networks |
1316 | Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality |
1317 | Discriminability Distillation in Group Representation Learning |
1318 | Calibration, Entropy Rates, and Memory in Language Models |
1319 | Rethinking Generalized Matrix Factorization for Recommendation: The Importance of Multi-hot Encoding |
1320 | Efficient Saliency Maps for Explainable AI |
1321 | Reinforcement Learning with Probabilistically Complete Exploration |
1322 | Unaligned Image-to-Sequence Transformation with Loop Consistency |
1323 | Learning to Generate 3D Training Data through Hybrid Gradient |
1324 | Removing the Representation Error of GAN Image Priors Using the Deep Decoder |
1325 | MEMO: A Deep Network for Flexible Combination of Episodic Memories |
1326 | Superbloom: Bloom filter meets Transformer |
1327 | Longitudinal Enrichment of Imaging Biomarker Representations for Improved Alzheimer's Disease Diagnosis |
1328 | Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks |
1329 | Generating Semantic Adversarial Examples with Differentiable Rendering |
1330 | Guided variational autoencoder for disentanglement learning |
1331 | ManiGAN: Text-Guided Image Manipulation |
1332 | Quantum algorithm for finding the negative curvature direction |
1333 | Dual-module Inference for Efficient Recurrent Neural Networks |
1334 | GUIDEGAN: ATTENTION BASED SPATIAL GUIDANCE FOR IMAGE-TO-IMAGE TRANSLATION |
1335 | MixUp as Directional Adversarial Training |
1336 | Towards Interpretable Molecular Graph Representation Learning |
1337 | Representation Learning Through Latent Canonicalizations |
1338 | Winning Privately: The Differentially Private Lottery Ticket Mechanism |
1339 | Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization |
1340 | WHAT ILLNESS OF LANDSCAPE CAN OVER-PARAMETERIZATION ALONE CURE? |
1341 | Correctness Verification of Neural Network |
1342 | Generalizing Natural Language Analysis through Span-relation Representations |
1343 | Jelly Bean World: A Testbed for Never-Ending Learning |
1344 | Characterizing convolutional neural networks with one-pixel signature |
1345 | A Deep Dive into Count-Min Sketch for Extreme Classification in Logarithmic Memory |
1346 | Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs |
1347 | Learning from Explanations with Neural Module Execution Tree |
1348 | A Coordinate-Free Construction of Scalable Natural Gradient |
1349 | Discovering Motor Programs by Recomposing Demonstrations |
1350 | How Aggressive Can Adversarial Attacks Be: Learning Ordered Top-k Attacks |
1351 | Adaptive Learned Bloom Filter (Ada-BF): Efficient Utilization of the Classifier |
1352 | Convergence Behaviour of Some Gradient-Based Methods on Bilinear Zero-Sum Games |
1353 | Aging Memories Generate More Fluent Dialogue Responses with Memory Networks |
1354 | DSReg: Using Distant Supervision as a Regularizer |
1355 | Iterative Target Augmentation for Effective Conditional Generation |
1356 | Composing Task-Agnostic Policies with Deep Reinforcement Learning |
1357 | The Local Elasticity of Neural Networks |
1358 | Gradient-Based Neural DAG Learning |
1359 | On Concept-Based Explanations in Deep Neural Networks |
1360 | Policy Message Passing: A New Algorithm for Probabilistic Graph Inference |
1361 | Learning to Control Latent Representations for Few-Shot Learning of Named Entities |
1362 | Amortized Nesterov's Momentum: Robust and Lightweight Momentum for Deep Learning |
1363 | Recurrent Event Network: Global Structure Inference Over Temporal Knowledge Graph |
1364 | Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Imbalanced Data |
1365 | Composition-based Multi-Relational Graph Convolutional Networks |
1366 | Capsules with Inverted Dot-Product Attention Routing |
1367 | The Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions |
1368 | Insights on Visual Representations for Embodied Navigation Tasks |
1369 | Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos |
1370 | On the Unintended Social Bias of Training Language Generation Models with News Articles |
1371 | Role-Wise Data Augmentation for Knowledge Distillation |
1372 | Learning Classifier Synthesis for Generalized Few-Shot Learning |
1373 | Attention Forcing for Sequence-to-sequence Model Training |
1374 | Topic Models with Survival Supervision: Archetypal Analysis and Neural Approaches |
1375 | FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary |
1376 | On Need for Topology-Aware Generative Models for Manifold-Based Defenses |
1377 | Neural Execution of Graph Algorithms |
1378 | Objective Mismatch in Model-based Reinforcement Learning |
1379 | Molecular Graph Enhanced Transformer for Retrosynthesis Prediction |
1380 | Non-Sequential Melody Generation |
1381 | Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning |
1382 | Visual Explanation for Deep Metric Learning |
1383 | Deep Innovation Protection |
1384 | Alternating Recurrent Dialog Model with Large-Scale Pre-Trained Language Models |
1385 | BERTScore: Evaluating Text Generation with BERT |
1386 | Octave Graph Convolutional Network |
1387 | Learning from Imperfect Annotations: An End-to-End Approach |
1388 | Zeroth Order Optimization by a Mixture of Evolution Strategies |
1389 | Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History |
1390 | Machine Truth Serum |
1391 | Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control |
1392 | GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding |
1393 | Sensible adversarial learning |
1394 | Attention Interpretability Across NLP Tasks |
1395 | Neuron ranking - an informed way to compress convolutional neural networks |
1396 | MoET: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees |
1397 | AdaScale SGD: A Scale-Invariant Algorithm for Distributed Training |
1398 | INTERNAL-CONSISTENCY CONSTRAINTS FOR EMERGENT COMMUNICATION |
1399 | Bio-Inspired Hashing for Unsupervised Similarity Search |
1400 | Simplicial Complex Networks |
1401 | BEYOND SUPERVISED LEARNING: RECOGNIZING UNSEEN ATTRIBUTE-OBJECT PAIRS WITH VISION-LANGUAGE FUSION AND ATTRACTOR NETWORKS |
1402 | Underwhelming Generalization Improvements From Controlling Feature Attribution |
1403 | Graph Constrained Reinforcement Learning for Natural Language Action Spaces |
1404 | Solving Packing Problems by Conditional Query Learning |
1405 | Task-Relevant Adversarial Imitation Learning |
1406 | Generative Restricted Kernel Machines |
1407 | Towards Fast Adaptation of Neural Architectures with Meta Learning |
1408 | RL-ST: Reinforcing Style, Fluency and Content Preservation for Unsupervised Text Style Transfer |
1409 | A Functional Characterization of Randomly Initialized Gradient Descent in Deep ReLU Networks |
1410 | Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling |
1411 | Toward Understanding Generalization of Over-parameterized Deep ReLU network trained with SGD in Student-teacher Setting |
1412 | Asymptotics of Wide Networks from Feynman Diagrams |
1413 | Symplectic Recurrent Neural Networks |
1414 | Representational Disentanglement for Multi-Domain Image Completion |
1415 | Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks |
1416 | Learning Cross-Context Entity Representations from Text |
1417 | SPECTRA: Sparse Entity-centric Transitions |
1418 | DeepSimplex: Reinforcement Learning of Pivot Rules Improves the Efficiency of Simplex Algorithm in Solving Linear Programming Problems |
1419 | Learning Temporal Abstraction with Information-theoretic Constraints for Hierarchical Reinforcement Learning |
1420 | Selective Brain Damage: Measuring the Disparate Impact of Model Pruning |
1421 | Asynchronous Stochastic Subgradient Methods for General Nonsmooth Nonconvex Optimization |
1422 | Improved Structural Discovery and Representation Learning of Multi-Agent Data |
1423 | Quantized Reinforcement Learning (QuaRL) |
1424 | R-TRANSFORMER: RECURRENT NEURAL NETWORK ENHANCED TRANSFORMER |
1425 | NADS: Neural Architecture Distribution Search for Uncertainty Awareness |
1426 | Rigging the Lottery: Making All Tickets Winners |
1427 | CAPACITY-LIMITED REINFORCEMENT LEARNING: APPLICATIONS IN DEEP ACTOR-CRITIC METHODS FOR CONTINUOUS CONTROL |
1428 | Discovering the compositional structure of vector representations with Role Learning Networks |
1429 | Higher-Order Function Networks for Learning Composable 3D Object Representations |
1430 | Adapting to Label Shift with Bias-Corrected Calibration |
1431 | Neural Module Networks for Reasoning over Text |
1432 | Strong Baseline Defenses Against Clean-Label Poisoning Attacks |
1433 | MANIFOLD FORESTS: CLOSING THE GAP ON NEURAL NETWORKS |
1434 | Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees |
1435 | Improved memory in recurrent neural networks with sequential non-normal dynamics |
1436 | Model Imitation for Model-Based Reinforcement Learning |
1437 | Embodied Language Grounding with Implicit 3D Visual Feature Representations |
1438 | Likelihood Contribution based Multi-scale Architecture for Generative Flows |
1439 | A Base Model Selection Methodology for Efficient Fine-Tuning |
1440 | Rethinking Curriculum Learning With Incremental Labels And Adaptive Compensation |
1441 | Graph Neural Networks for Reasoning 2-Quantified Boolean Formulas |
1442 | Learn to Explain Efficiently via Neural Logic Inductive Learning |
1443 | NormLime: A New Feature Importance Metric for Explaining Deep Neural Networks |
1444 | Pre-trained Contextual Embedding of Source Code |
1445 | Certified Robustness to Adversarial Label-Flipping Attacks via Randomized Smoothing |
1446 | Benefit of Interpolation in Nearest Neighbor Algorithms |
1447 | {COMPANYNAME}11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery |
1448 | Neural Clustering Processes |
1449 | Improving Neural Language Generation with Spectrum Control |
1450 | Span Recovery for Deep Neural Networks with Applications to Input Obfuscation |
1451 | Unknown-Aware Deep Neural Network |
1452 | MODELLING BIOLOGICAL ASSAYS WITH ADAPTIVE DEEP KERNEL LEARNING |
1453 | A Memory-augmented Neural Network by Resembling Human Cognitive Process of Memorization |
1454 | A Perturbation Analysis of Input Transformations for Adversarial Attacks |
1455 | ADA+: A GENERIC FRAMEWORK WITH MORE ADAPTIVE EXPLICIT ADJUSTMENT FOR LEARNING RATE |
1456 | Locally Constant Networks |
1457 | Smooth Kernels Improve Adversarial Robustness and Perceptually-Aligned Gradients |
1458 | Multi-View Summarization and Activity Recognition Meet Edge Computing in IoT Environments |
1459 | Neural ODEs for Image Segmentation with Level Sets |
1460 | Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations |
1461 | PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction |
1462 | Low Rank Training of Deep Neural Networks for Emerging Memory Technology |
1463 | Decentralized Distributed PPO: Mastering PointGoal Navigation |
1464 | MultiGrain: a unified image embedding for classes and instances |
1465 | Learning to Learn by Zeroth-Order Oracle |
1466 | Neural Embeddings for Nearest Neighbor Search Under Edit Distance |
1467 | ADAPTING PRETRAINED LANGUAGE MODELS FOR LONG DOCUMENT CLASSIFICATION |
1468 | Robust Federated Learning Through Representation Matching and Adaptive Hyper-parameters |
1469 | ROS-HPL: Robotic Object Search with Hierarchical Policy Learning and Intrinsic-Extrinsic Modeling |
1470 | Knockoff-Inspired Feature Selection via Generative Models |
1471 | MetaPix: Few-Shot Video Retargeting |
1472 | SloMo: Improving Communication-Efficient Distributed SGD with Slow Momentum |
1473 | Stochastic Prototype Embeddings |
1474 | Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog |
1475 | Generalized Transformation-based Gradient |
1476 | Targeted sampling of enlarged neighborhood via Monte Carlo tree search for TSP |
1477 | Black-box Adversarial Attacks with Bayesian Optimization |
1478 | Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving |
1479 | Learning to Combat Compounding-Error in Model-Based Reinforcement Learning |
1480 | Understanding Attention Mechanisms |
1481 | Beyond GANs: Transforming without a Target Distribution |
1482 | Four Things Everyone Should Know to Improve Batch Normalization |
1483 | Learning to solve the credit assignment problem |
1484 | Improving Multi-Manifold GANs with a Learned Noise Prior |
1485 | Overparameterized Neural Networks Can Implement Associative Memory |
1486 | Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts |
1487 | Sampling-Free Learning of Bayesian Quantized Neural Networks |
1488 | A Hierarchy of Graph Neural Networks Based on Learnable Local Features |
1489 | The Blessing of Dimensionality: An Empirical Study of Generalization |
1490 | DeFINE: Deep Factorized Input Word Embeddings for Neural Sequence Modeling |
1491 | NEURAL EXECUTION ENGINES |
1492 | Learning to Make Generalizable and Diverse Predictions for Retrosynthesis |
1493 | Disentangled GANs for Controllable Generation of High-Resolution Images |
1494 | Continuous Graph Flow |
1495 | Benchmarking Adversarial Robustness |
1496 | ROBUST SINGLE-STEP ADVERSARIAL TRAINING |
1497 | Wasserstein-Bounded Generative Adversarial Networks |
1498 | DBA: Distributed Backdoor Attacks against Federated Learning |
1499 | Learning Generative Models using Denoising Density Estimators |
1500 | Fast is better than free: Revisiting adversarial training |
1501 | LOSSLESS SINGLE IMAGE SUPER RESOLUTION FROM LOW-QUALITY JPG IMAGES |
1502 | Improving Neural Abstractive Summarization Using Transfer Learning and Factuality-Based Evaluation: Towards Automating Science Journalism |
1503 | Deep Multivariate Mixture of Gaussians for Object Detection under Occlusion |
1504 | iWGAN: an Autoencoder WGAN for Inference |
1505 | BERT-AL: BERT for Arbitrarily Long Document Understanding |
1506 | Novelty Search in representational space for sample efficient exploration |
1507 | Switched linear projections and inactive state sensitivity for deep neural network interpretability |
1508 | An Optimization Principle Of Deep Learning? |
1509 | Testing Robustness Against Unforeseen Adversaries |
1510 | Thieves on Sesame Street! Model Extraction of BERT-based APIs |
1511 | Understanding Knowledge Distillation in Non-autoregressive Machine Translation |
1512 | Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning |
1513 | Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data |
1514 | Locality and Compositionality in Zero-Shot Learning |
1515 | Optimistic Adaptive Acceleration for Optimization |
1516 | Situating Sentence Embedders with Nearest Neighbor Overlap |
1517 | Posterior Sampling: Make Reinforcement Learning Sample Efficient Again |
1518 | Generalized Clustering by Learning to Optimize Expected Normalized Cuts |
1519 | Mix-review: Alleviate Forgetting in the Pretrain-Finetune Framework for Neural Language Generation Models |
1520 | The function of contextual illusions |
1521 | Disentangling neural mechanisms for perceptual grouping |
1522 | Adversarial Imitation Attack |
1523 | Regularizing Trajectories to Mitigate Catastrophic Forgetting |
1524 | When Do Variational Autoencoders Know What They Don't Know? |
1525 | Semantic Pruning for Single Class Interpretability |
1526 | Analyzing the Role of Model Uncertainty for Electronic Health Records |
1527 | Chameleon: Adaptive Code Optimization For Expedited Deep Neural Network Compilation |
1528 | Weakly-supervised Knowledge Graph Alignment with Adversarial Learning |
1529 | Auto Completion of User Interface Layout Design Using Transformer-Based Tree Decoders |
1530 | Not All Features Are Equal: Feature Leveling Deep Neural Networks for Better Interpretation |
1531 | Intrinsic Motivation for Encouraging Synergistic Behavior |
1532 | Noisy Machines: Understanding noisy neural networks and enhancing robustness to analog hardware errors using distillation |
1533 | Perceptual Regularization: Visualizing and Learning Generalizable Representations |
1534 | Neural networks with motivation |
1535 | Improving One-Shot NAS By Suppressing The Posterior Fading |
1536 | Toward Amortized Ranking-Critical Training For Collaborative Filtering |
1537 | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations |
1538 | Curriculum Learning for Deep Generative Models with Clustering |
1539 | Should All Cross-Lingual Embeddings Speak English? |
1540 | Sign-OPT: A Query-Efficient Hard-label Adversarial Attack |
1541 | Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP |
1542 | Learning Space Partitions for Nearest Neighbor Search |
1543 | Visual Interpretability Alone Helps Adversarial Robustness |
1544 | One-Shot Neural Architecture Search via Compressive Sensing |
1545 | Learning Adversarial Grammars for Future Prediction |
1546 | End-to-end named entity recognition and relation extraction using pre-trained language models |
1547 | How noise affects the Hessian spectrum in overparameterized neural networks |
1548 | A Simple Recurrent Unit with Reduced Tensor Product Representations |
1549 | Parallel Neural Text-to-Speech |
1550 | Context-Aware Object Detection With Convolutional Neural Networks |
1551 | DeepV2D: Video to Depth with Differentiable Structure from Motion |
1552 | TPO: TREE SEARCH POLICY OPTIMIZATION FOR CONTINUOUS ACTION SPACES |
1553 | Gaussian Process Meta-Representations Of Neural Networks |
1554 | CAN ALTQ LEARN FASTER: EXPERIMENTS AND THEORY |
1555 | The Break-Even Point on the Optimization Trajectories of Deep Neural Networks |
1556 | Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets |
1557 | Exploration Based Language Learning for Text-Based Games |
1558 | Robust And Interpretable Blind Image Denoising Via Bias-Free Convolutional Neural Networks |
1559 | CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning |
1560 | Deep Imitative Models for Flexible Inference, Planning, and Control |
1561 | Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness |
1562 | Defensive Quantization Layer For Convolutional Network Against Adversarial Attack |
1563 | Defective Convolutional Layers Learn Robust CNNs |
1564 | DASGrad: Double Adaptive Stochastic Gradient |
1565 | Finding Mixed Strategy Nash Equilibrium for Continuous Games through Deep Learning |
1566 | The Logical Expressiveness of Graph Neural Networks |
1567 | GOING BEYOND TOKEN-LEVEL PRE-TRAINING FOR EMBEDDING-BASED LARGE-SCALE RETRIEVAL |
1568 | Conditional Out-of-Sample Generation For Unpaired Data using trVAE |
1569 | The Benefits of Over-parameterization at Initialization in Deep ReLU Networks |
1570 | UniLoss: Unified Surrogate Loss by Adaptive Interpolation |
1571 | A Training Scheme for the Uncertain Neuromorphic Computing Chips |
1572 | Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently |
1573 | Deep Graph Translation |
1574 | Are Transformers universal approximators of sequence-to-sequence functions? |
1575 | Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples |
1576 | Decoupling Weight Regularization from Batch Size for Model Compression |
1577 | Zero-Shot Out-of-Distribution Detection with Feature Correlations |
1578 | Proactive Sequence Generator via Knowledge Acquisition |
1579 | Interpretable Deep Neural Network Models: Hybrid of Image Kernels and Neural Networks |
1580 | Multi-scale Attributed Node Embedding |
1581 | $\textrm{D}^2$GAN: A Few-Shot Learning Approach with Diverse and Discriminative Feature Synthesis |
1582 | Understanding the functional and structural differences across excitatory and inhibitory neurons |
1583 | One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation |
1584 | Differentially Private Meta-Learning |
1585 | Leveraging Adversarial Examples to Obtain Robust Second-Order Representations |
1586 | CLEVRER: Collision Events for Video Representation and Reasoning |
1587 | Using Logical Specifications of Objectives in Multi-Objective Reinforcement Learning |
1588 | Efficient Training of Robust and Verifiable Neural Networks |
1589 | Learning Compositional Koopman Operators for Model-Based Control |
1590 | Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness |
1591 | Confidence-Calibrated Adversarial Training: Towards Robust Models Generalizing Beyond the Attack Used During Training |
1592 | All SMILES Variational Autoencoder for Molecular Property Prediction and Optimization |
1593 | Generating Dialogue Responses From A Semantic Latent Space |
1594 | Is There Mode Collapse? A Case Study on Face Generation and Its Black-box Calibration |
1595 | Overlearning Reveals Sensitive Attributes |
1596 | Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks |
1597 | A Kolmogorov Complexity Approach to Generalization in Deep Learning |
1598 | Towards Modular Algorithm Induction |
1599 | Optimal Strategies Against Generative Attacks |
1600 | One Generation Knowledge Distillation by Utilizing Peer Samples |
1601 | Stein Self-Repulsive Dynamics: Benefits from Past Samples |
1602 | Adversarially robust transfer learning |
1603 | One Demonstration Imitation Learning |
1604 | Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation |
1605 | Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning |
1606 | Improving Irregularly Sampled Time Series Learning with Dense Descriptors of Time |
1607 | Contextual Text Style Transfer |
1608 | Modeling question asking using neural program generation |
1609 | Learning to Link |
1610 | Adversarial Attacks on Copyright Detection Systems |
1611 | Detecting Extrapolation with Local Ensembles |
1612 | Revisiting Fine-tuning for Few-shot Learning |
1613 | Global Relational Models of Source Code |
1614 | MONET: Debiasing Graph Embeddings via the Metadata-Orthogonal Training Unit |
1615 | Selection via Proxy: Efficient Data Selection for Deep Learning |
1616 | Deep Learning-Based Average Consensus |
1617 | Meta Learning via Learned Loss |
1618 | Short and Sparse Deconvolution --- A Geometric Approach |
1619 | If MaxEnt RL is the Answer, What is the Question? |
1620 | Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well |
1621 | Characterizing Missing Information in Deep Networks Using Backpropagated Gradients |
1622 | INVOCMAP: MAPPING METHOD NAMES TO METHOD INVOCATIONS VIA MACHINE LEARNING |
1623 | Scaleable input gradient regularization for adversarial robustness |
1624 | Adjustable Real-time Style Transfer |
1625 | Unsupervised Progressive Learning and the STAM Architecture |
1626 | Wasserstein Robust Reinforcement Learning |
1627 | Knowledge Hypergraphs: Prediction Beyond Binary Relations |
1628 | Dynamics-Aware Unsupervised Skill Discovery |
1629 | A Fine-Grained Spectral Perspective on Neural Networks |
1630 | Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent |
1631 | UNPAIRED POINT CLOUD COMPLETION ON REAL SCANS USING ADVERSARIAL TRAINING |
1632 | Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform |
1633 | DIME: AN INFORMATION-THEORETIC DIFFICULTY MEASURE FOR AI DATASETS |
1634 | Structured consistency loss for semi-supervised semantic segmentation |
1635 | AMRL: Aggregated Memory For Reinforcement Learning |
1636 | Adapting Behaviour for Learning Progress |
1637 | Pretraining boosts out-of-domain robustness for pose estimation |
1638 | GraphMix: Regularized Training of Graph Neural Networks for Semi-Supervised Learning |
1639 | Synthetic vs Real: Deep Learning on Controlled Noise |
1640 | Detecting malicious PDF using CNN |
1641 | NESTED LEARNING FOR MULTI-GRANULAR TASKS |
1642 | Scalable Model Compression by Entropy Penalized Reparameterization |
1643 | Stochastic Geodesic Optimization for Neural Networks |
1644 | Dynamic Time Lag Regression: Predicting What & When |
1645 | Scholastic-Actor-Critic For Multi Agent Reinforcement Learning |
1646 | On summarized validation curves and generalization |
1647 | Convolutional Bipartite Attractor Networks |
1648 | Anomaly Detection by Deep Direct Density Ratio Estimation |
1649 | New Loss Functions for Fast Maximum Inner Product Search |
1650 | Lipschitz Lifelong Reinforcement Learning |
1651 | Local Label Propagation for Large-Scale Semi-Supervised Learning |
1652 | GumbelClip: Off-Policy Actor-Critic Using Experience Replay |
1653 | Going Deeper with Lean Point Networks |
1654 | Improved Mutual Information Estimation |
1655 | Semi-Supervised Generative Modeling for Controllable Speech Synthesis |
1656 | Towards Physics-informed Deep Learning for Turbulent Flow Prediction |
1657 | Unsupervised Learning from Video with Deep Neural Embeddings |
1658 | Neural Text Generation With Unlikelihood Training |
1659 | Pure and Spurious Critical Points: a Geometric Study of Linear Networks |
1660 | Surrogate-Based Constrained Langevin Sampling With Applications to Optimal Material Configuration Design |
1661 | Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning |
1662 | Mean Field Models for Neural Networks in Teacher-student Setting |
1663 | A Causal View on Robustness of Neural Networks |
1664 | Striving for Simplicity in Off-Policy Deep Reinforcement Learning |
1665 | White Box Network: Obtaining a right composition ordering of functions |
1666 | Deep neuroethology of a virtual rodent |
1667 | DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression |
1668 | Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks |
1669 | Causal Induction from Visual Observations for Goal Directed Tasks |
1670 | Duration-of-Stay Storage Assignment under Uncertainty |
1671 | CAQL: Continuous Action Q-Learning |
1672 | GRAPH ANALYSIS AND GRAPH POOLING IN THE SPATIAL DOMAIN |
1673 | Your classifier is secretly an energy based model and you should treat it like one |
1674 | On the Linguistic Capacity of Real-time Counter Automata |
1675 | Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels |
1676 | Adaptive Structural Fingerprints for Graph Attention Networks |
1677 | Inductive Matrix Completion Based on Graph Neural Networks |
1678 | Neural Operator Search |
1679 | Time2Vec: Learning a Vector Representation of Time |
1680 | ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring |
1681 | Conditional Learning of Fair Representations |
1682 | Mean-field Behaviour of Neural Tangent Kernel for Deep Neural Networks |
1683 | TabNet: Attentive Interpretable Tabular Learning |
1684 | Adapt-to-Learn: Policy Transfer in Reinforcement Learning |
1685 | Identity Crisis: Memorization and Generalization Under Extreme Overparameterization |
1686 | Stiffness: A New Perspective on Generalization in Neural Networks |
1687 | Linguistic Embeddings as a Common-Sense Knowledge Repository: Challenges and Opportunities |
1688 | First-Order Preconditioning via Hypergradient Descent |
1689 | Feature Partitioning for Efficient Multi-Task Architectures |
1690 | Layer Flexible Adaptive Computation Time for Recurrent Neural Networks |
1691 | Curvature-based Robustness Certificates against Adversarial Examples |
1692 | Adversarial Video Generation on Complex Datasets |
1693 | Topological Autoencoders |
1694 | Context-Gated Convolution |
1695 | Reinforcement Learning without Ground-Truth State |
1696 | Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin |
1697 | In-Domain Representation Learning For Remote Sensing |
1698 | Training Neural Networks for and by Interpolation |
1699 | FAN: Focused Attention Networks |
1700 | Unsupervised Data Augmentation for Consistency Training |
1701 | Assessing Generalization in TD methods for Deep Reinforcement Learning |
1702 | Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning |
1703 | Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? |
1704 | The Effect of Neural Net Architecture on Gradient Confusion & Training Performance |
1705 | Making DenseNet Interpretable: A Case Study in Clinical Radiology |
1706 | Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space |
1707 | Regularizing Deep Multi-Task Networks using Orthogonal Gradients |
1708 | Fast Training of Sparse Graph Neural Networks on Dense Hardware |
1709 | Simultaneous Classification and Out-of-Distribution Detection Using Deep Neural Networks |
1710 | Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML |
1711 | Long-term planning, short-term adjustments |
1712 | Imitation Learning via Off-Policy Distribution Matching |
1713 | Unsupervised Learning of Automotive 3D Crash Simulations using LSTMs |
1714 | Augmenting Transformers with KNN-Based Composite Memory |
1715 | SGD with Hardness Weighted Sampling for Distributionally Robust Deep Learning |
1716 | Constrained Markov Decision Processes via Backward Value Functions |
1717 | Reanalysis of Variance Reduced Temporal Difference Learning |
1718 | Meta-Learning for Variational Inference |
1719 | CONFEDERATED MACHINE LEARNING ON HORIZONTALLY AND VERTICALLY SEPARATED MEDICAL DATA FOR LARGE-SCALE HEALTH SYSTEM INTELLIGENCE |
1720 | Defending Against Adversarial Examples by Regularized Deep Embedding |
1721 | Minimizing FLOPs to Learn Efficient Sparse Representations |
1722 | Neural-Guided Symbolic Regression with Asymptotic Constraints |
1723 | Policy Optimization In the Face of Uncertainty |
1724 | DropGrad: Gradient Dropout Regularization for Meta-Learning |
1725 | Understanding Top-k Sparsification in Distributed Deep Learning |
1726 | Entropy Penalty: Towards Generalization Beyond the IID Assumption |
1727 | Improving Semantic Parsing with Neural Generator-Reranker Architecture |
1728 | Learning a Behavioral Repertoire from Demonstrations |
1729 | GRAPH NEIGHBORHOOD ATTENTIVE POOLING |
1730 | Deep symbolic regression |
1731 | Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification |
1732 | Doubly Normalized Attention |
1733 | Uncertainty-Aware Prediction for Graph Neural Networks |
1734 | Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space |
1735 | Lattice Representation Learning |
1736 | Omnibus Dropout for Improving The Probabilistic Classification Outputs of ConvNets |
1737 | Deep Multiple Instance Learning for Taxonomic Classification of Metagenomic read sets |
1738 | Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints |
1739 | RoBERTa: A Robustly Optimized BERT Pretraining Approach |
1740 | Deep Semi-Supervised Anomaly Detection |
1741 | GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation |
1742 | Out-of-distribution Detection in Few-shot Classification |
1743 | Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification |
1744 | Mirror-Generative Neural Machine Translation |
1745 | Frustratingly easy quasi-multitask learning |
1746 | Interpreting video features: a comparison of 3D convolutional networks and convolutional LSTM networks |
1747 | TrojanNet: Exposing the Danger of Trojan Horse Attack on Neural Networks |
1748 | Robust Learning with Jacobian Regularization |
1749 | Generalized Inner Loop Meta-Learning |
1750 | Sign Bits Are All You Need for Black-Box Attacks |
1751 | Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech |
1752 | Pre-training as Batch Meta Reinforcement Learning with tiMe |
1753 | On Global Feature Pooling for Fine-grained Visual Categorization |
1754 | Exploring by Exploiting Bad Models in Model-Based Reinforcement Learning |
1755 | Reinforced active learning for image segmentation |
1756 | Variational inference of latent hierarchical dynamical systems in neuroscience: an application to calcium imaging data |
1757 | Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search |
1758 | Gradientless Descent: High-Dimensional Zeroth-Order Optimization |
1759 | Equivariant Entity-Relationship Networks |
1760 | Modeling Fake News in Social Networks with Deep Multi-Agent Reinforcement Learning |
1761 | Unsupervised Few-shot Object Recognition by Integrating Adversarial, Self-supervision, and Deep Metric Learning of Latent Parts |
1762 | On the "steerability" of generative adversarial networks |
1763 | GASL: Guided Attention for Sparsity Learning in Deep Neural Networks |
1764 | Affine Self Convolution |
1765 | Improving Differentially Private Models with Active Learning |
1766 | Matrix Multilayer Perceptron |
1767 | BEAN: Interpretable Representation Learning with Biologically-Enhanced Artificial Neuronal Assembly Regularization |
1768 | Feature-Robustness, Flatness and Generalization Error for Deep Neural Networks |
1769 | TriMap: Large-scale Dimensionality Reduction Using Triplets |
1770 | LEARNED STEP SIZE QUANTIZATION |
1771 | Frontal low-rank random tensors for high-order feature representation |
1772 | Learning General and Reusable Features via Racecar-Training |
1773 | Higher-order Weighted Graph Convolutional Networks |
1774 | Estimating counterfactual treatment outcomes over time through adversarially balanced representations |
1775 | Poincaré Wasserstein Autoencoder |
1776 | Robust Instruction-Following in a Situated Agent via Transfer-Learning from Text |
1777 | Stochastic Conditional Generative Networks with Basis Decomposition |
1778 | Task-Based Top-Down Modulation Network for Multi-Task-Learning Applications |
1779 | Global reasoning network for image super-resolution |
1780 | Tensor Graph Convolutional Networks for Prediction on Dynamic Graphs |
1781 | Matching Distributions via Optimal Transport for Semi-Supervised Learning |
1782 | GraphNVP: an Invertible Flow-based Model for Generating Molecular Graphs |
1783 | Language GANs Falling Short |
1784 | GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations |
1785 | Last-iterate convergence rates for min-max optimization |
1786 | Poisoning Attacks with Generative Adversarial Nets |
1787 | Parameterized Action Reinforcement Learning for Inverted Index Match Plan Generation |
1788 | Learnable Group Transform For Time-Series |
1789 | From English to Foreign Languages: Transferring Pre-trained Language Models |
1790 | COPHY: Counterfactual Learning of Physical Dynamics |
1791 | Semi-Supervised Few-Shot Learning with Prototypical Random Walks |
1792 | Why Convolutional Networks Learn Oriented Bandpass Filters: A Hypothesis |
1793 | Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning |
1794 | Unsupervised Out-of-Distribution Detection with Batch Normalization |
1795 | Understanding the Limitations of Variational Mutual Information Estimators |
1796 | Latent Question Reformulation and Information Accumulation for Multi-Hop Machine Reading |
1797 | Hamiltonian Generative Networks |
1798 | Customizing Sequence Generation with Multi-Task Dynamical Systems |
1799 | Extracting and Leveraging Feature Interaction Interpretations |
1800 | Zero-Shot Medical Image Artifact Reduction |
1801 | Quantum Expectation-Maximization for Gaussian Mixture Models |
1802 | Behavior Regularized Offline Reinforcement Learning |
1803 | Encoder-Agnostic Adaptation for Conditional Language Generation |
1804 | Optimizing Data Usage via Differentiable Rewards |
1805 | Dropout: Explicit Forms and Capacity Control |
1806 | Training Interpretable Convolutional Neural Networks towards Class-specific Filters |
1807 | Faster Neural Network Training with Data Echoing |
1808 | Kronecker Attention Networks |
1809 | Farkas layers: don't shift the data, fix the geometry |
1810 | Non-Gaussian processes and neural networks at finite widths |
1811 | Unsupervised Model Selection for Variational Disentangled Representation Learning |
1812 | Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation |
1813 | How much Position Information Do Convolutional Neural Networks Encode? |
1814 | A Theoretical Analysis of the Number of Shots in Few-Shot Learning |
1815 | Event extraction from unstructured Amharic text |
1816 | Representation Learning for Remote Sensing: An Unsupervised Sensor Fusion Approach |
1817 | Natural Language State Representation for Reinforcement Learning |
1818 | Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery |
1819 | Project and Forget: Solving Large Scale Metric Constrained Problems |
1820 | On the Variance of the Adaptive Learning Rate and Beyond |
1821 | Translation Between Waves, wave2wave |
1822 | Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations |
1823 | Improving End-to-End Object Tracking Using Relational Reasoning |
1824 | Attention Privileged Reinforcement Learning for Domain Transfer |
1825 | Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations |
1826 | On Variational Learning of Controllable Representations for Text without Supervision |
1827 | Disentangled Representation Learning with Sequential Residual Variational Autoencoder |
1828 | Improved Training Speed, Accuracy, and Data Utilization via Loss Function Optimization |
1829 | Using Hindsight to Anchor Past Knowledge in Continual Learning |
1830 | Empirical confidence estimates for classification by deep neural networks |
1831 | iSOM-GSN: An Integrative Approach for Transforming Multi-omic Data into Gene Similarity Networks via Self-organizing Maps |
1832 | Learning Numeral Embedding |
1833 | Localized Generations with Deep Neural Networks for Multi-Scale Structured Datasets |
1834 | AlgoNet: $C^\infty$ Smooth Algorithmic Neural Networks |
1835 | Temporal-difference learning for nonlinear value function approximation in the lazy training regime |
1836 | A Bayes-Optimal View on Adversarial Examples |
1837 | Efficient Content-Based Sparse Attention with Routing Transformers |
1838 | Good Semi-supervised VAE Requires Tighter Evidence Lower Bound |
1839 | Option Discovery using Deep Skill Chaining |
1840 | HOPPITY: LEARNING GRAPH TRANSFORMATIONS TO DETECT AND FIX BUGS IN PROGRAMS |
1841 | PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization |
1842 | Deep Randomized Least Squares Value Iteration |
1843 | Self-Supervised Policy Adaptation |
1844 | RTC-VAE: HARNESSING THE PECULIARITY OF TOTAL CORRELATION IN LEARNING DISENTANGLED REPRESENTATIONS |
1845 | OmniNet: A unified architecture for multi-modal multi-task learning |
1846 | Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition |
1847 | LEVERAGING AUXILIARY TEXT FOR DEEP RECOGNITION OF UNSEEN VISUAL RELATIONSHIPS |
1848 | TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising |
1849 | V4D: 4D Convolutional Neural Networks for Video-level Representations Learning |
1850 | ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems and GANs |
1851 | Learning to Represent Programs with Property Signatures |
1852 | Unified recurrent network for many feature types |
1853 | Restoration of Video Frames from a Single Blurred Image with Motion Understanding |
1854 | Improving Dirichlet Prior Network for Out-of-Distribution Example Detection |
1855 | Variational Autoencoders for Opponent Modeling in Multi-Agent Systems |
1856 | Prototype Recalls for Continual Learning |
1857 | Generative Ratio Matching Networks |
1858 | Emergence of Compositional Language with Deep Generational Transmission |
1859 | Deep Gradient Boosting -- Layer-wise Input Normalization of Neural Networks |
1860 | A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models |
1861 | Bridging ELBO objective and MMD |
1862 | In Search for a SAT-friendly Binarized Neural Network Architecture |
1863 | EfferenceNets for latent space planning |
1864 | Neural networks are a priori biased towards Boolean functions with low entropy |
1865 | DUAL ADVERSARIAL MODEL FOR GENERATING 3D POINT CLOUD |
1866 | Wider Networks Learn Better Features |
1867 | Conditional Invertible Neural Networks for Guided Image Generation |
1868 | Cost-Effective Testing of a Deep Learning Model through Input Reduction |
1869 | Hebbian Graph Embeddings |
1870 | NeuralUCB: Contextual Bandits with Neural Network-Based Exploration |
1871 | Meta-Graph: Few shot Link Prediction via Meta Learning |
1872 | Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games |
1873 | An implicit function learning approach for parametric modal regression |
1874 | The asymptotic spectrum of the Hessian of DNN throughout training |
1875 | Auto-Encoding Explanatory Examples |
1876 | RISE and DISE: Two Frameworks for Learning from Time Series with Missing Data |
1877 | Fast Machine Learning with Byzantine Workers and Servers |
1878 | How the Softmax Activation Hinders the Detection of Adversarial and Out-of-Distribution Examples in Neural Networks |
1879 | Tree-Structured Attention with Hierarchical Accumulation |
1880 | Deep 3D Pan via Local adaptive "t-shaped" convolutions with global and local adaptive dilations |
1881 | MANAS: Multi-Agent Neural Architecture Search |
1882 | SimulS2S: End-to-End Simultaneous Speech to Speech Translation |
1883 | Enhancing Attention with Explicit Phrasal Alignments |
1884 | LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning |
1885 | Robust saliency maps with distribution-preserving decoys |
1886 | Role of two learning rates in convergence of model-agnostic meta-learning |
1887 | Low-Resource Knowledge-Grounded Dialogue Generation |
1888 | Generative Multi Source Domain Adaptation |
1889 | GResNet: Graph Residual Network for Reviving Deep GNNs from Suspended Animation |
1890 | Realism Index: Interpolation in Generative Models With Arbitrary Prior |
1891 | Deep RL for Blood Glucose Control: Lessons, Challenges, and Opportunities |
1892 | A TARGET-AGNOSTIC ATTACK ON DEEP MODELS: EXPLOITING SECURITY VULNERABILITIES OF TRANSFER LEARNING |
1893 | Training Provably Robust Models by Polyhedral Envelope Regularization |
1894 | FleXOR: Trainable Fractional Quantization |
1895 | DP-LSSGD: An Optimization Method to Lift the Utility in Privacy-Preserving ERM |
1896 | Multi-Task Learning via Scale Aware Feature Pyramid Networks and Effective Joint Head |
1897 | AdaX: Adaptive Gradient Descent with Exponential Long Term Memory |
1898 | ON COMPUTATION AND GENERALIZATION OF GENERATIVE ADVERSARIAL IMITATION LEARNING |
1899 | Disentangling Improves VAEs' Robustness to Adversarial Attacks |
1900 | Sparsity Meets Robustness: Channel Pruning for the Feynman-Kac Formalism Principled Robust Deep Neural Nets |
1901 | FEW-SHOT LEARNING ON GRAPHS VIA SUPER-CLASSES BASED ON GRAPH SPECTRAL MEASURES |
1902 | On Recovering Latent Factors From Sampling And Firing Graph |
1903 | Influence-Based Multi-Agent Exploration |
1904 | Demonstration Actor Critic |
1905 | Deep Coordination Graphs |
1906 | Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation |
1907 | How Well Do WGANs Estimate the Wasserstein Metric? |
1908 | Revisiting the Generalization of Adaptive Gradient Methods |
1909 | An Information Theoretic Perspective on Disentangled Representation Learning |
1910 | Multiplicative Interactions and Where to Find Them |
1911 | SELF-KNOWLEDGE DISTILLATION ADVERSARIAL ATTACK |
1912 | DIVA: Domain Invariant Variational Autoencoder |
1913 | Continual Learning with Bayesian Neural Networks for Non-Stationary Data |
1914 | RPGAN: random paths as a latent space for GAN interpretability |
1915 | SAdam: A Variant of Adam for Strongly Convex Functions |
1916 | Improving the Generalization of Visual Navigation Policies using Invariance Regularization |
1917 | Improving the robustness of ImageNet classifiers using elements of human visual cognition |
1918 | Differentially Private Survival Function Estimation |
1919 | Size-free generalization bounds for convolutional neural networks |
1920 | Scaling Laws for the Principled Design, Initialization, and Preconditioning of ReLU Networks |
1921 | A Fair Comparison of Graph Neural Networks for Graph Classification |
1922 | Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents |
1923 | Computation Reallocation for Object Detection |
1924 | MULTI-LABEL METRIC LEARNING WITH BIDIRECTIONAL REPRESENTATION DEEP NEURAL NETWORKS |
1925 | Sparse Networks from Scratch: Faster Training without Losing Performance |
1926 | Modeling Winner-Take-All Competition in Sparse Binary Projections |
1927 | Laplacian Denoising Autoencoder |
1928 | Training Data Distribution Search with Ensemble Active Learning |
1929 | Meta-Learning without Memorization |
1930 | COMMUNITY PRESERVING NODE EMBEDDING |
1931 | From Variational to Deterministic Autoencoders |
1932 | Adversarially Robust Representations with Smooth Encoders |
1933 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures |
1934 | Representation Quality Explain Adversarial Attacks |
1935 | Inferring Dynamical Systems with Long-Range Dependencies through Line Attractor Regularization |
1936 | End-To-End Input Selection for Deep Neural Networks |
1937 | Hierarchical Graph-to-Graph Translation for Molecules |
1938 | Teaching GAN to generate per-pixel annotation |
1939 | ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning |
1940 | DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine |
1941 | A NEW POINTWISE CONVOLUTION IN DEEP NEURAL NETWORKS THROUGH EXTREMELY FAST AND NON PARAMETRIC TRANSFORMS |
1942 | Decaying momentum helps neural network training |
1943 | Regularizing Black-box Models for Improved Interpretability |
1944 | GPNET: MONOCULAR 3D VEHICLE DETECTION BASED ON LIGHTWEIGHT WHEEL GROUNDING POINT DETECTION NETWORK |
1945 | Needles in Haystacks: On Classifying Tiny Objects in Large Images |
1946 | Quadratic GCN for graph classification |
1947 | The advantage of using Student's t-priors in variational autoencoders |
1948 | Finite Depth and Width Corrections to the Neural Tangent Kernel |
1949 | Order Learning and Its Application to Age Estimation |
1950 | Couple-VAE: Mitigating the Encoder-Decoder Incompatibility in Variational Text Modeling with Coupled Deterministic Networks |
1951 | Distilling Neural Networks for Faster and Greener Dependency Parsing |
1952 | Model-based Saliency for the Detection of Adversarial Examples |
1953 | Online Meta-Critic Learning for Off-Policy Actor-Critic Methods |
1954 | BUZz: BUffer Zones for defending adversarial examples in image classification |
1955 | Efficient and Information-Preserving Future Frame Prediction and Beyond |
1956 | Path Space for Recurrent Neural Networks with ReLU Activations |
1957 | Wasserstein Adversarial Regularization (WAR) on label noise |
1958 | Self-Supervised Speech Recognition via Local Prior Matching |
1959 | SRDGAN: learning the noise prior for Super Resolution with Dual Generative Adversarial Networks |
1960 | Amata: An Annealing Mechanism for Adversarial Training Acceleration |
1961 | An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on Smoothly Varying Weight Hypothesis |
1962 | Context Based Machine Translation With Recurrent Neural Network For English-Amharic Translation |
1963 | Robust Domain Randomization for Reinforcement Learning |
1964 | NAS evaluation is frustratingly hard |
1965 | Ellipsoidal Trust Region Methods for Neural Network Training |
1966 | Learning Semantically Meaningful Representations Through Embodiment |
1967 | Superseding Model Scaling by Penalizing Dead Units and Points with Separation Constraints |
1968 | Artificial Design: Modeling Artificial Super Intelligence with Extended General Relativity and Universal Darwinism via Geometrization for Universal Design Automation |
1969 | Robust Graph Representation Learning via Neural Sparsification |
1970 | Hyperbolic Discounting and Learning Over Multiple Horizons |
1971 | CLN2INV: Learning Loop Invariants with Continuous Logic Networks |
1972 | Gated Channel Transformation for Visual Recognition |
1973 | Federated User Representation Learning |
1974 | INSTANCE CROSS ENTROPY FOR DEEP METRIC LEARNING |
1975 | Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base |
1976 | Variational pSOM: Deep Probabilistic Clustering with Self-Organizing Maps |
1977 | Augmenting Self-attention with Persistent Memory |
1978 | Information Plane Analysis of Deep Neural Networks via Matrix-Based Renyi's Entropy and Tensor Kernels |
1979 | Ridge Regression: Structure, Cross-Validation, and Sketching |
1980 | Hindsight Trust Region Policy Optimization |
1981 | Policy Optimization with Stochastic Mirror Descent |
1982 | Graph convolutional networks for learning with few clean and many noisy labels |
1983 | A Constructive Prediction of the Generalization Error Across Scales |
1984 | MLModelScope: A Distributed Platform for ML Model Evaluation and Benchmarking at Scale |
1985 | A Mention-Pair Model of Annotation with Nonparametric User Communities |
1986 | An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality |
1987 | NPTC-net: Narrow-Band Parallel Transport Convolutional Neural Network on Point Clouds |
1988 | Mogrifier LSTM |
1989 | Individualised Dose-Response Estimation using Generative Adversarial Nets |
1990 | Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video |
1991 | Trajectory representation learning for Multi-Task NMRDPs planning |
1992 | Incorporating Horizontal Connections in Convolution by Spatial Shuffling |
1993 | Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field |
1994 | Counterfactuals uncover the modular structure of deep generative models |
1995 | Pushing the bounds of dropout |
1996 | Confidence Scores Make Instance-dependent Label-noise Learning Possible |
1997 | Gap-Aware Mitigation of Gradient Staleness |
1998 | Evaluating and Calibrating Uncertainty Prediction in Regression Tasks |
1999 | Ensemble Distribution Distillation |
2000 | Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation |
2001 | On the Tunability of Optimizers in Deep Learning |
2002 | Gradient Perturbation is Underrated for Differentially Private Convex Optimization |
2003 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations |
2004 | Credible Sample Elicitation by Deep Learning, for Deep Learning |
2005 | Neural Markov Logic Networks |
2006 | Optimistic Exploration even with a Pessimistic Initialisation |
2007 | Better Optimization for Neural Architecture Search with Mixed-Level Reformulation |
2008 | Risk Averse Value Expansion for Sample Efficient and Robust Policy Learning |
2009 | Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing |
2010 | LabelFool: A Trick in the Label Space |
2011 | RGTI: Response generation via templates integration for End to End dialog |
2012 | Towards Disentangling Non-Robust and Robust Components in Performance Metric |
2013 | A Mechanism of Implicit Regularization in Deep Learning |
2014 | Feature-map-level Online Adversarial Knowledge Distillation |
2015 | Optimising Neural Network Architectures for Provable Adversarial Robustness |
2016 | Recurrent Independent Mechanisms |
2017 | An Explicitly Relational Neural Network Architecture |
2018 | Branched Multi-Task Networks: Deciding What Layers To Share |
2019 | MxPool: Multiplex Pooling for Hierarchical Graph Representation Learning |
2020 | Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations |
2021 | Temporal Difference Weighted Ensemble For Reinforcement Learning |
2022 | Task Level Data Augmentation for Meta-Learning |
2023 | Effect of top-down connections in Hierarchical Sparse Coding |
2024 | Compressive Recovery Defense: A Defense Framework for $\ell_0, \ell_2$ and $\ell_\infty$ norm attacks. |
2025 | Match prediction from group comparison data using neural networks |
2026 | Extractor-Attention Network: A New Attention Network with Hybrid Encoders for Chinese Text Classification |
2027 | Identifying through Flows for Recovering Latent Representations |
2028 | Robust training with ensemble consensus |
2029 | Fault Tolerant Reinforcement Learning via A Markov Game of Control and Stopping |
2030 | BRIDGING ADVERSARIAL SAMPLES AND ADVERSARIAL NETWORKS |
2031 | Hierarchical Summary-to-Article Generation |
2032 | Unsupervised-Learning of time-varying features |
2033 | Self-Adversarial Learning with Comparative Discrimination for Text Generation |
2034 | A General Upper Bound for Unsupervised Domain Adaptation |
2035 | Vid2Game: Controllable Characters Extracted from Real-World Videos |
2036 | Action Semantics Network: Considering the Effects of Actions in Multiagent Systems |
2037 | Growing Action Spaces |
2038 | Learning Generative Image Object Manipulations from Language Instructions |
2039 | Discourse-Based Evaluation of Language Understanding |
2040 | Learning Efficient Parameter Server Synchronization Policies for Distributed SGD |
2041 | Relational State-Space Model for Stochastic Multi-Object Systems |
2042 | TSInsight: A local-global attribution framework for interpretability in time-series data |
2043 | OPTIMAL TRANSPORT, CYCLEGAN, AND PENALIZED LS FOR UNSUPERVISED LEARNING IN INVERSE PROBLEMS |
2044 | Structural Language Models for Any-Code Generation |
2045 | How does Lipschitz Regularization Influence GAN Training? |
2046 | Simple and Effective Stochastic Neural Networks |
2047 | Robust Reinforcement Learning with Wasserstein Constraint |
2048 | Cross-Iteration Batch Normalization |
2049 | Model Ensemble-Based Intrinsic Reward for Sparse Reward Reinforcement Learning |
2050 | The Effect of Residual Architecture on the Per-Layer Gradient of Deep Networks |
2051 | Prune or quantize? Strategy for Pareto-optimally low-cost and accurate CNN |
2052 | Graph Residual Flow for Molecular Graph Generation |
2053 | Nonlinearities in activations substantially shape the loss surfaces of neural networks |
2054 | Attention over Parameters for Dialogue Systems |
2055 | The Convex Information Bottleneck Lagrangian |
2056 | The problem with DDPG: understanding failures in deterministic environments with sparse rewards |
2057 | LocalGAN: Modeling Local Distributions for Adversarial Response Generation |
2058 | Hierarchical Image-to-image Translation with Nested Distributions Modeling |
2059 | Generative Adversarial Networks For Data Scarcity Industrial Positron Images With Attention |
2060 | OvA-INN: Continual Learning with Invertible Neural Networks |
2061 | Contextual Inverse Reinforcement Learning |
2062 | Mining GANs for knowledge transfer to small domains |
2063 | Learning Time-Aware Assistance Functions for Numerical Fluid Solvers |
2064 | Transition Based Dependency Parser for Amharic Language Using Deep Learning |
2065 | Samples Are Useful? Not Always: denoising policy gradient updates using variance explained |
2066 | Learning Surrogate Losses |
2067 | Boosting Network: Learn by Growing Filters and Layers via SplitLBI |
2068 | Split LBI for Deep Learning: Structural Sparsity via Differential Inclusion Paths |
2069 | Generalizing Deep Multi-task Learning with Heterogeneous Structured Networks |
2070 | Unsupervised Universal Self-Attention Network for Graph Classification |
2071 | FairFace: A Novel Face Attribute Dataset for Bias Measurement and Mitigation |
2072 | Manifold Modeling in Embedded Space: A Perspective for Interpreting "Deep Image Prior" |
2073 | Novelty Detection Via Blurring |
2074 | Small-GAN: Speeding up GAN Training using Core-Sets |
2075 | Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks |
2076 | Data-Independent Neural Pruning via Coresets |
2077 | Deeper Insights into Weight Sharing in Neural Architecture Search |
2078 | Learnable Higher-order Representation for Action Recognition |
2079 | Dirichlet Wrapper to Quantify Classification Uncertainty in Black-Box Systems |
2080 | S2VG: Soft Stochastic Value Gradient method |
2081 | Deep Network classification by Scattering and Homotopy dictionary learning |
2082 | Scalable Generative Models for Graphs with Graph Attention Mechanism |
2083 | Continuous Adaptation in Multi-agent Competitive Environments |
2084 | Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP |
2085 | Combiner: Inductively Learning Tree Structured Attention in Transformers |
2086 | Robust Cross-lingual Embeddings from Parallel Sentences |
2087 | Semi-supervised Learning by Coaching |
2088 | DYNAMIC SELF-TRAINING FRAMEWORK FOR GRAPH CONVOLUTIONAL NETWORKS |
2089 | Blockwise Self-Attention for Long Document Understanding |
2090 | Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models |
2091 | I am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively |
2092 | Black-Box Adversarial Attack with Transferable Model-based Embedding |
2093 | Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients |
2094 | Understanding Distributional Ambiguity via Non-robust Chance Constraint |
2095 | MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer |
2096 | Do Image Classifiers Generalize Across Time? |
2097 | Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation |
2098 | Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination |
2099 | A shallow feature extraction network with a large receptive field for stereo matching tasks |
2100 | Learning Boolean Circuits with Neural Networks |
2101 | ProxNet: End-to-End Learning of Structured Representation by Proximal Mapping |
2102 | Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets |
2103 | Towards Principled Objectives for Contrastive Disentanglement |
2104 | Compositional languages emerge in a neural iterated learning model |
2105 | Population-Guided Parallel Policy Search for Reinforcement Learning |
2106 | Classification Logit Two-sample Testing by Neural Networks |
2107 | Variational Recurrent Models for Solving Partially Observable Control Tasks |
2108 | Learning to Discretize: Solving 1D Scalar Conservation Laws via Deep Reinforcement Learning |
2109 | Towards Unifying Neural Architecture Space Exploration and Generalization |
2110 | Composable Semi-parametric Modelling for Long-range Motion Generation |
2111 | Towards an Adversarially Robust Normalization Approach |
2112 | Generative Latent Flow |
2113 | Adversarial Example Detection and Classification with Asymmetrical Adversarial Training |
2114 | CZ-GEM: A FRAMEWORK FOR DISENTANGLED REPRESENTATION LEARNING |
2115 | Generalized Natural Language Grounded Navigation via Environment-agnostic Multitask Learning |
2116 | Global Concavity and Optimization in a Class of Dynamic Discrete Choice Models |
2117 | Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information |
2118 | On the Pareto Efficiency of Quantized CNN |
2119 | BANANAS: Bayesian Optimization with Neural Networks for Neural Architecture Search |
2120 | Potential Flow Generator with $L_2$ Optimal Transport Regularity for Generative Models |
2121 | Integrative Tensor-based Anomaly Detection System For Satellites |
2122 | Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions |
2123 | MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius |
2124 | TinyBERT: Distilling BERT for Natural Language Understanding |
2125 | UW-NET: AN INCEPTION-ATTENTION NETWORK FOR UNDERWATER IMAGE CLASSIFICATION |
2126 | Semantically-Guided Representation Learning for Self-Supervised Monocular Depth |
2127 | Stochastic AUC Maximization with Deep Neural Networks |
2128 | Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures |
2129 | Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity |
2130 | Why ADAM Beats SGD for Attention Models |
2131 | Reflection-based Word Attribute Transfer |
2132 | Difference-Seeking Generative Adversarial Network--Unseen Sample Generation |
2133 | EINS: Long Short-Term Memory with Extrapolated Input Network Simplification |
2134 | FasterSeg: Searching for Faster Real-time Semantic Segmentation |
2135 | LEARNING EXECUTION THROUGH NEURAL CODE FUSION |
2136 | Meta Module Network for Compositional Visual Reasoning |
2137 | Min-max Entropy for Weakly Supervised Pointwise Localization |
2138 | Editable Neural Networks |
2139 | Parallel Scheduled Sampling |
2140 | Learning Explainable Models Using Attribution Priors |
2141 | Efficient Inference and Exploration for Reinforcement Learning |
2142 | Leveraging inductive bias of neural networks for learning without explicit human annotations |
2143 | Bias-Resilient Neural Network |
2144 | Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis |
2145 | Accelerating Reinforcement Learning Through GPU Atari Emulation |
2146 | Can gradient clipping mitigate label noise? |
2147 | Concise Multi-head Attention Models |
2148 | Tensorized Embedding Layers for Efficient Model Compression |
2149 | Rethinking Neural Network Quantization |
2150 | Zero-shot task adaptation by homoiconic meta-mapping |
2151 | iSparse: Output Informed Sparsification of Neural Networks |
2152 | HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled embedding of n-gram statistics |
2153 | Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model |
2154 | Fast Linear Interpolation for Piecewise-Linear Functions, GAMs, and Deep Lattice Networks |
2155 | Adversarial Training: embedding adversarial perturbations into the parameter space of a neural network to build a robust system |
2156 | Collaborative Generated Hashing for Market Analysis and Fast Cold-start Recommendation |
2157 | Pruned Graph Scattering Transforms |
2158 | DDSP: Differentiable Digital Signal Processing |
2159 | Continual Learning via Neural Pruning |
2160 | Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML |
2161 | XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering |
2162 | Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning |
2163 | GLAD: Learning Sparse Graph Recovery |
2164 | PDP: A General Neural Framework for Learning SAT Solvers |
2165 | Adaptive Loss Scaling for Mixed Precision Training |
2166 | Quantifying Exposure Bias for Neural Language Generation |
2167 | How many weights are enough: can tensor factorization learn efficient policies? |
2168 | Domain Aggregation Networks for Multi-Source Domain Adaptation |
2169 | Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming |
2170 | AHash: A Load-Balanced One Permutation Hash |
2171 | Ordinary differential equations on graph networks |
2172 | Lift-the-flap: what, where and when for context reasoning |
2173 | Unifying Question Answering, Text Classification, and Regression via Span Extraction |
2174 | Supervised learning with incomplete data via sparse representations |
2175 | Conversation Generation with Concept Flow |
2176 | The Probabilistic Fault Tolerance of Neural Networks in the Continuous Limit |
2177 | Variational Hashing-based Collaborative Filtering with Self-Masking |
2178 | Neural Network Branching for Neural Network Verification |
2179 | SoftLoc: Robust Temporal Localization under Label Misalignment |
2180 | VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation |
2181 | Adaptive Data Augmentation with Deep Parallel Generative Models |
2182 | Domain-invariant Learning using Adaptive Filter Decomposition |
2183 | Topology of deep neural networks |
2184 | Adversarial Policies: Attacking Deep Reinforcement Learning |
2185 | Escaping Saddle Points Faster with Stochastic Momentum |
2186 | Few-shot Text Classification with Distributional Signatures |
2187 | RotationOut as a Regularization Method for Neural Network |
2188 | Universal Approximation with Deep Narrow Networks |
2189 | A Dynamic Approach to Accelerate Deep Learning Training |
2190 | Geometric Insights into the Convergence of Nonlinear TD Learning |
2191 | Efficient Multivariate Bandit Algorithm with Path Planning |
2192 | Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling |
2193 | Exploring Model-based Planning with Policy Networks |
2194 | Benchmarking Model-Based Reinforcement Learning |
2195 | Encoder-decoder Network as Loss Function for Summarization |
2196 | Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks |
2197 | On Identifiability in Transformers |
2198 | Automated curriculum generation through setter-solver interactions |
2199 | Deep Multi-View Learning via Task-Optimal CCA |
2200 | Bandlimiting Neural Networks Against Adversarial Attacks |
2201 | Progressive Memory Banks for Incremental Domain Adaptation |
2202 | MMD GAN with Random-Forest Kernels |
2203 | What graph neural networks cannot learn: depth vs width |
2204 | INFERENCE, PREDICTION, AND ENTROPY RATE OF CONTINUOUS-TIME, DISCRETE-EVENT PROCESSES |
2205 | Learning an off-policy predictive state representation for deep reinforcement learning for vision-based steering in autonomous driving |
2206 | RTFM: Generalising to New Environment Dynamics via Reading |
2207 | MIM: Mutual Information Machine |
2208 | Real or Fake: An Empirical Study and Improved Model for Fake Face Detection |
2209 | Constant Time Graph Neural Networks |
2210 | AutoLR: A Method for Automatic Tuning of Learning Rate |
2211 | Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping |
2212 | Optimal Attacks on Reinforcement Learning Policies |
2213 | Multi-Agent Hierarchical Reinforcement Learning for Humanoid Navigation |
2214 | SMiRL: Surprise Minimizing RL in Entropic Environments |
2215 | Mesh-Free Unsupervised Learning-Based PDE Solver of Forward and Inverse problems |
2216 | Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models |
2217 | Sparse and Structured Visual Attention |
2218 | Network Pruning for Low-Rank Binary Index |
2219 | Style-based Encoder Pre-training for Multi-modal Image Synthesis |
2220 | LDMGAN: Reducing Mode Collapse in GANs with Latent Distribution Matching |
2221 | Bootstrapping the Expressivity with Model-based Planning |
2222 | DeepAGREL: Biologically plausible deep learning via direct reinforcement |
2223 | Homogeneous Linear Inequality Constraints for Neural Network Activations |
2224 | Leveraging Simple Model Predictions for Enhancing its Performance |
2225 | Modeling treatment events in disease progression |
2226 | DG-GAN: the GAN with the duality gap |
2227 | Stochastic Gradient Descent with Biased but Consistent Gradient Estimators |
2228 | One-way prototypical networks |
2229 | Encoding word order in complex embeddings |
2230 | ADASAMPLE: ADAPTIVE SAMPLING OF HARD POSITIVES FOR DESCRIPTOR LEARNING |
2231 | Functional vs. parametric equivalence of ReLU networks |
2232 | A New Multi-input Model with the Attention Mechanism for Text Classification |
2233 | Multi-Dimensional Explanation of Reviews |
2234 | A Uniform Generalization Error Bound for Generative Adversarial Networks |
2235 | QGAN: Quantize Generative Adversarial Networks to Extreme low-bits |
2236 | Learning to Transfer Learn |
2237 | Contrastive Learning of Structured World Models |
2238 | Disentangling Factors of Variations Using Few Labels |
2239 | Detecting Out-of-Distribution Inputs to Deep Generative Models Using Typicality |
2240 | EDUCE: Explaining model Decision through Unsupervised Concepts Extraction |
2241 | Target-directed Atomic Importance Estimation via Reverse Self-attention |
2242 | A critical analysis of self-supervision, or what we can learn from a single image |
2243 | Accelerating SGD with momentum for over-parameterized learning |
2244 | Discrete InfoMax Codes for Meta-Learning |
2245 | The Geometry of Sign Gradient Descent |
2246 | Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation |
2247 | Attributes Obfuscation with Complex-Valued Features |
2248 | V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control |
2249 | MDE: Multiple Distance Embeddings for Link Prediction in Knowledge Graphs |
2250 | Improving Adversarial Robustness Requires Revisiting Misclassified Examples |
2251 | Learning to Sit: Synthesizing Human-Chair Interactions via Hierarchical Control |
2252 | InfoCNF: Efficient Conditional Continuous Normalizing Flow Using Adaptive Solvers |
2253 | Mirror Descent View For Neural Network Quantization |
2254 | Hierarchical Disentangle Network for Object Representation Learning |
2255 | Deep Multiple Instance Learning with Gaussian Weighting |
2256 | Mitigating Posterior Collapse in Strongly Conditioned Variational Autoencoders |
2257 | Zeno++: Robust Fully Asynchronous SGD |
2258 | DivideMix: Learning with Noisy Labels as Semi-supervised Learning |
2259 | PAD-Nets: Learning Dynamic Receptive Fields via Pixel-Wise Adaptive Dilation |
2260 | PLEX: PLanner and EXecutor for Embodied Learning in Navigation |
2261 | DeepObfusCode: Source Code Obfuscation Through Sequence-to-Sequence Networks |
2262 | Extreme Value k-means Clustering |
2263 | Adaptive network sparsification with dependent variational beta-Bernoulli dropout |
2264 | Data-dependent Gaussian Prior Objective for Language Generation |
2265 | Learning Representations in Reinforcement Learning: an Information Bottleneck Approach |
2266 | LSTOD: Latent Spatial-Temporal Origin-Destination prediction model and its applications in ride-sharing platforms |
2267 | Ecological Reinforcement Learning |
2268 | Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection |
2269 | Towards Understanding the Regularization of Adversarial Robustness on Neural Networks |
2270 | MaskConvNet: Training Efficient ConvNets from Scratch via Budget-constrained Filter Pruning |
2271 | Fast Bilinear Matrix Normalization via Rank-1 Update |
2272 | Scale-Equivariant Neural Networks with Decomposed Convolutional Filters |
2273 | A novel Bayesian estimation-based word embedding model for sentiment analysis |
2274 | Attacking Lifelong Learning Models with Gradient Reversion |
2275 | Learning with Long-term Remembering: Following the Lead of Mixed Stochastic Gradient |
2276 | A Harmonic Structure-Based Neural Network Model for Musical Pitch Detection |
2277 | Fooling Detection Alone is Not Enough: Adversarial Attack against Multiple Object Tracking |
2278 | Towards A Unified Min-Max Framework for Adversarial Exploration and Robustness |
2279 | Domain-Agnostic Few-Shot Classification by Learning Disparate Modulators |
2280 | Anomaly Detection and Localization in Images using Guided Attention |
2281 | Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards |
2282 | Logic and the 2-Simplicial Transformer |
2283 | PAC-Bayes Few-shot Meta-learning with Implicit Learning of Model Prior Distribution |
2284 | Reinforcement Learning with Chromatic Networks |
2285 | AE-OT: A NEW GENERATIVE MODEL BASED ON EXTENDED SEMI-DISCRETE OPTIMAL TRANSPORT |
2286 | Deep Mining: Detecting Anomalous Patterns in Neural Network Activations with Subset Scanning |
2287 | A Data-Efficient Mutual Information Neural Estimator for Statistical Dependency Testing |
2288 | Enhancing Adversarial Defense by k-Winners-Take-All |
2289 | Thwarting finite difference adversarial attacks with output randomization |
2290 | Exploration in Reinforcement Learning with Deep Covering Options |
2291 | Towards Controllable and Interpretable Face Completion via Structure-Aware and Frequency-Oriented Attentive GANs |
2292 | Learning audio representations with self-supervision |
2293 | Learning Disentangled Representations for CounterFactual Regression |
2294 | Learning relevant features for statistical inference |
2295 | VILD: Variational Imitation Learning with Diverse-quality Demonstrations |
2296 | Entropy Minimization In Emergent Languages |
2297 | A Unified framework for randomized smoothing based certified defenses |
2298 | Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification |
2299 | MIST: Multiple Instance Spatial Transformer Networks |
2300 | ISBNet: Instance-aware Selective Branching Networks |
2301 | MODiR: Multi-Objective Dimensionality Reduction for Joint Data Visualisation |
2302 | Robust Local Features for Improving the Generalization of Adversarial Training |
2303 | Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach |
2304 | Distributed Online Optimization with Long-Term Constraints |
2305 | Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives |
2306 | Learning the Arrow of Time for Problems in Reinforcement Learning |
2307 | Topological based classification using graph convolutional networks |
2308 | The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget |
2309 | AutoGrow: Automatic Layer Growing in Deep Convolutional Networks |
2310 | Sequence-level Intrinsic Exploration Model for Partially Observable Domains |
2311 | Pipelined Training with Stale Weights of Deep Convolutional Neural Networks |
2312 | StacNAS: Towards Stable and Consistent Optimization for Differentiable Neural Architecture Search |
2313 | Universal Learning Approach for Adversarial Defense |
2314 | Boosting Generative Models by Leveraging Cascaded Meta-Models |
2315 | Quantitatively Disentangling and Understanding Part Information in CNNs |
2316 | The Implicit Bias of Depth: How Incremental Learning Drives Generalization |
2317 | FAKE CAN BE REAL IN GANS |
2318 | Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness |
2319 | Measuring Compositional Generalization: A Comprehensive Method on Realistic Data |
2320 | Theory and Evaluation Metrics for Learning Disentangled Representations |
2321 | Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks |
2322 | Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning |
2323 | A TWO-STAGE FRAMEWORK FOR MATHEMATICAL EXPRESSION RECOGNITION |
2324 | Universal Source-Free Domain Adaptation |
2325 | Learning Invariants through Soft Unification |
2326 | Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction |
2327 | Macro Action Ensemble Searching Methodology for Deep Reinforcement Learning |
2328 | INTERPRETING CNN COMPRESSION USING INFORMATION BOTTLENECK |
2329 | Increasing batch size through instance repetition improves generalization |
2330 | FSPool: Learning Set Representations with Featurewise Sort Pooling |
2331 | Recurrent Neural Networks are Universal Filters |
2332 | On the Convergence of FedAvg on Non-IID Data |
2333 | Adversarially Robust Neural Networks via Optimal Control: Bridging Robustness with Lyapunov Stability |
2334 | Multi-agent Reinforcement Learning for Networked System Control |
2335 | Learning to Anneal and Prune Proximity Graphs for Similarity Search |
2336 | Deep Bayesian Structure Networks |
2337 | Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation |
2338 | Keyframing the Future: Discovering Temporal Hierarchy with Keyframe-Inpainter Prediction |
2339 | Differential Privacy in Adversarial Learning with Provable Robustness |
2340 | Topology-Aware Pooling via Graph Attention |
2341 | Siamese Attention Networks |
2342 | Neural Stored-program Memory |
2343 | ES-MAML: Simple Hessian-Free Meta Learning |
2344 | Enforcing Physical Constraints in Neural Networks through Differentiable PDE Layer |
2345 | TabFact: A Large-scale Dataset for Table-based Fact Verification |
2346 | Evidence-Aware Entropy Decomposition For Active Deep Learning |
2347 | Learning to Generate Grounded Visual Captions without Localization Supervision |
2348 | Extreme Triplet Learning: Effectively Optimizing Easy Positives and Hard Negatives |
2349 | Implicit Bias of Gradient Descent based Adversarial Training on Separable Data |
2350 | Graph Warp Module: an Auxiliary Module for Boosting the Power of Graph Neural Networks in Molecular Graph Analysis |
2351 | BERT Wears GloVes: Distilling Static Embeddings from Pretrained Contextual Representations |
2352 | The Visual Task Adaptation Benchmark |
2353 | Input Alignment along Chaotic directions increases Stability in Recurrent Neural Networks |
2354 | 3D-SIC: 3D Semantic Instance Completion for RGB-D Scans |
2355 | Learning Similarity Metrics for Numerical Simulations |
2356 | Image-guided Neural Object Rendering |
2357 | MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics |
2358 | Effective and Robust Detection of Adversarial Examples via Benford-Fourier Coefficients |
2359 | Stabilizing Adversarial Invariance Induction by Discriminator Matching |
2360 | Natural Language Adversarial Attack and Defense in Word Level |
2361 | Amharic Light Stemmer |
2362 | Dynamical Clustering of Time Series Data Using Multi-Decoder RNN Autoencoder |
2363 | POP-Norm: A Theoretically Justified and More Accelerated Normalization Approach |
2364 | Programmable Neural Network Trojan for Pre-trained Feature Extractor |
2365 | Cost-Effective Interactive Neural Attention Learning |
2366 | On Layer Normalization in the Transformer Architecture |
2367 | PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search |
2368 | Knowledge Consistency between Neural Networks and Beyond |
2369 | Temporal Probabilistic Asymmetric Multi-task Learning |
2370 | Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information |
2371 | Corpus Based Amharic Sentiment Lexicon Generation |
2372 | Principled Weight Initialization for Hypernetworks |
2373 | Additive Powers-of-Two Quantization: A Non-uniform Discretization for Neural Networks |
2374 | Transfer Alignment Network for Double Blind Unsupervised Domain Adaptation |
2375 | Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods |
2376 | Neural Architecture Search in Embedding Space |
2377 | Enhancing Transformation-Based Defenses Against Adversarial Attacks with a Distribution Classifier |
2378 | Single Deep Counterfactual Regret Minimization |
2379 | HaarPooling: Graph Pooling with Compressive Haar Basis |
2380 | Safe Policy Learning for Continuous Control |
2381 | A Stochastic Trust Region Method for Non-convex Minimization |
2382 | Learning Effective Exploration Strategies For Contextual Bandits |
2383 | Improving Batch Normalization with Skewness Reduction for Deep Neural Networks |
2384 | Adversarial Inductive Transfer Learning with input and output space adaptation |
2385 | Graph Neural Networks For Multi-Image Matching |
2386 | An Empirical Study on Post-processing Methods for Word Embeddings |
2387 | AN EFFICIENT HOMOTOPY TRAINING ALGORITHM FOR NEURAL NETWORKS |
2388 | High performance RNNs with spiking neurons |
2389 | CLAREL: classification via retrieval loss for zero-shot learning |
2390 | Observational Overfitting in Reinforcement Learning |
2391 | On Mutual Information Maximization for Representation Learning |
2392 | Localizing and Amortizing: Efficient Inference for Gaussian Processes |
2393 | PNAT: Non-autoregressive Transformer by Position Learning |
2394 | On unsupervised-supervised risk and one-class neural networks |
2395 | Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds |
2396 | Distillation $\approx$ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized NN |
2397 | Bayesian Inference for Large Scale Image Classification |
2398 | Ranking Policy Gradient |
2399 | How Does Learning Rate Decay Help Modern Neural Networks? |
2400 | Random Matrix Theory Proves that Deep Learning Representations of GAN-data Behave as Gaussian Mixtures |
2401 | SVQN: Sequential Variational Soft Q-Learning Networks |
2402 | Classification Attention for Chinese NER |
2403 | Understanding Isomorphism Bias in Graph Data Sets |
2404 | Neural Machine Translation with Universal Visual Representation |
2405 | Towards More Realistic Neural Network Uncertainties |
2406 | Understanding Architectures Learnt by Cell-based Neural Architecture Search |
2407 | Soft Token Matching for Interpretable Low-Resource Classification |
2408 | Beyond Classical Diffusion: Ballistic Graph Neural Network |
2409 | Hierarchical Complement Objective Training |
2410 | Understanding and Stabilizing GANs' Training Dynamics with Control Theory |
2411 | Variance Reduced Local SGD with Lower Communication Complexity |
2412 | AutoQ: Automated Kernel-Wise Neural Network Quantization |
2413 | Quantifying Layerwise Information Discarding of Neural Networks and Beyond |
2414 | GDP: Generalized Device Placement for Dataflow Graphs |
2415 | Unveiling Hidden Biases in Deep Networks with Classification Images and Spike Triggered Analysis |
2416 | Generalization Puzzles in Deep Networks |
2417 | Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization |
2418 | Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring |
2419 | HighRes-net: Multi-Frame Super-Resolution by Recursive Fusion |
2420 | A Learning-based Iterative Method for Solving Vehicle Routing Problems |
2421 | Transferable Perturbations of Deep Feature Distributions |
2422 | Rethinking the Security of Skip Connections in ResNet-like Neural Networks |
2423 | ProtoAttend: Attention-Based Prototypical Learning |
2424 | A Signal Propagation Perspective for Pruning Neural Networks at Initialization |
2425 | Wildly Unsupervised Domain Adaptation and Its Powerful and Efficient Solution |
2426 | Automatically Learning Feature Crossing from Model Interpretation for Tabular Data |
2427 | Continual Learning with Adaptive Weights (CLAW) |
2428 | Interpretability Evaluation Framework for Deep Neural Networks |
2429 | Progressive Upsampling Audio Synthesis via Effective Adversarial Training |
2430 | Learning Compact Reward for Image Captioning |
2431 | S-Flow GAN |
2432 | Gradient-free Neural Network Training by Multi-convex Alternating Optimization |
2433 | Semi-supervised Semantic Segmentation using Auxiliary Network |
2434 | Intensity-Free Learning of Temporal Point Processes |
2435 | Scalable and Order-robust Continual Learning with Additive Parameter Decomposition |
2436 | Discriminator Based Corpus Generation for General Code Synthesis |
2437 | Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning |
2438 | BOOSTING ENCODER-DECODER CNN FOR INVERSE PROBLEMS |
2439 | Weakly Supervised Clustering by Exploiting Unique Class Count |
2440 | Domain Adaptation via Low-Rank Basis Approximation |
2441 | Learning to Control PDEs with Differentiable Physics |
2442 | Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware |
2443 | Estimating Gradients for Discrete Random Variables by Sampling without Replacement |
2444 | Structural Multi-agent Learning |
2445 | A Gradient-based Architecture HyperParameter Optimization Approach |
2446 | On importance-weighted autoencoders |
2447 | FALCON: Fast and Lightweight Convolution for Compressing and Accelerating CNN |
2448 | Multi-Task Adapters for On-Device Audio Inference |
2449 | Mincut Pooling in Graph Neural Networks |
2450 | Dual Graph Representation Learning |
2451 | Unsupervised Few Shot Learning via Self-supervised Training |
2452 | To Relieve Your Headache of Training an MRF, Take AdVIL |
2453 | ILS-SUMM: Iterated Local Search for Unsupervised Video Summarization |
2454 | On the Dynamics and Convergence of Weight Normalization for Training Neural Networks |
2455 | CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition |
2456 | Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View |
2457 | Revisit Knowledge Distillation: a Teacher-free Framework |
2458 | SesameBERT: Attention for Anywhere |
2459 | Automated Relational Meta-learning |
2460 | Training Deep Networks with Stochastic Gradient Normalized by Layerwise Adaptive Second Moments |
2461 | Boosting Ticket: Towards Practical Pruning for Adversarial Training with Lottery Ticket Hypothesis |
2462 | Moniqua: Modulo Quantized Communication in Decentralized SGD |
2463 | Defending Against Physically Realizable Attacks on Image Classification |
2464 | Certifying Distributional Robustness using Lipschitz Regularisation |
2465 | A SPIKING SEQUENTIAL MODEL: RECURRENT LEAKY INTEGRATE-AND-FIRE |
2466 | N-BEATS: Neural basis expansion analysis for interpretable time series forecasting |
2467 | Subgraph Attention for Node Classification and Hierarchical Graph Pooling |
2468 | Are there any 'object detectors' in the hidden layers of CNNs trained to identify objects or scenes? |
2469 | Learning Human Postural Control with Hierarchical Acquisition Functions |
2470 | Unsupervised Intuitive Physics from Past Experiences |
2471 | Expected Tight Bounds for Robust Deep Neural Network Training |
2472 | Analytical Moment Regularizer for Training Robust Networks |
2473 | Model Architecture Controls Gradient Descent Dynamics: A Combinatorial Path-Based Formula |
2474 | Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient |
2475 | Collaborative Filtering With A Synthetic Feedback Loop |
2476 | Self-Supervised State-Control through Intrinsic Mutual Information Rewards |
2477 | Stagnant zone segmentation with U-net |
2478 | Distance-Based Learning from Errors for Confidence Calibration |
2479 | Curvature Graph Network |
2480 | Learning Algorithmic Solutions to Symbolic Planning Tasks with a Neural Computer |
2481 | Generative Imputation and Stochastic Prediction |
2482 | PROTOTYPE-ASSISTED ADVERSARIAL LEARNING FOR UNSUPERVISED DOMAIN ADAPTATION |
2483 | Learning Expensive Coordination: An Event-Based Deep RL Approach |
2484 | Unifying Graph Convolutional Networks as Matrix Factorization |
2485 | Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks |
2486 | Model-free Learning Control of Nonlinear Stochastic Systems with Stability Guarantee |
2487 | Depth-Recurrent Residual Connections for Super-Resolution of Real-Time Renderings |
2488 | LAMAL: LAnguage Modeling Is All You Need for Lifelong Language Learning |
2489 | GenDICE: Generalized Offline Estimation of Stationary Values |
2490 | Deep Audio Prior |
2491 | Compressing Deep Neural Networks With Learnable Regularization |
2492 | ATLPA: ADVERSARIAL TOLERANT LOGIT PAIRING WITH ATTENTION FOR CONVOLUTIONAL NEURAL NETWORK |
2493 | SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering |
2494 | Make Lead Bias in Your Favor: A Simple and Effective Method for News Summarization |
2495 | Learning Out-of-distribution Detection without Out-of-distribution Data |
2496 | Prox-SGD: Training Structured Neural Networks under Regularization and Constraints |
2497 | Unsupervised Learning of Node Embeddings by Detecting Communities |
2498 | Diverse Trajectory Forecasting with Determinantal Point Processes |
2499 | Bridging the domain gap in cross-lingual document classification |
2500 | Evaluating The Search Phase of Neural Architecture Search |
2501 | Learning to Defense by Learning to Attack |
2502 | Smooth Regularized Reinforcement Learning |
2503 | On Robustness of Neural Ordinary Differential Equations |
2504 | Diving into Optimization of Topology in Neural Networks |
2505 | FoveaBox: Beyond Anchor-based Object Detection |
2506 | Cascade Style Transfer |
2507 | Advantage Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning |
2508 | Unifying Graph Convolutional Neural Networks and Label Propagation |
2509 | Equivariant neural networks and equivarification |
2510 | Towards a Unified Evaluation of Explanation Methods without Ground Truth |
2511 | Data Valuation using Reinforcement Learning |
2512 | RL-LIM: Reinforcement Learning-based Locally Interpretable Modeling |
2513 | BackPACK: Packing more into Backprop |
2514 | DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures |
2515 | Regional based query in graph active learning |
2516 | Group-Connected Multilayer Perceptron Networks |
2517 | Towards Stable and comprehensive Domain Alignment: Max-Margin Domain-Adversarial Training |
2518 | Depth-Adaptive Transformer |
2519 | VUSFA: Variational Universal Successor Features Approximator |
2520 | InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization |
2521 | Federated Adversarial Domain Adaptation |
2522 | CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning |
2523 | Learning Structured Communication for Multi-agent Reinforcement Learning |
2524 | Utilizing Edge Features in Graph Neural Networks via Variational Information Maximization |
2525 | Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters |
2526 | Utility Analysis of Network Architectures for 3D Point Cloud Processing |
2527 | Effective Mechanism to Mitigate Injuries During NFL Plays |
2528 | TechKG: A Large-Scale Chinese Technology-Oriented Knowledge Graph |
2529 | Learning Reusable Options for Multi-Task Reinforcement Learning |
2530 | Maxmin Q-learning: Controlling the Estimation Bias of Q-learning |
2531 | X-Forest: Approximate Random Projection Trees for Similarity Measurement |
2532 | From Here to There: Video Inbetweening Using Direct 3D Convolutions |
2533 | Low Bias Gradient Estimates for Very Deep Boolean Stochastic Networks |
2534 | Automatically Discovering and Learning New Visual Categories with Ranking Statistics |
2535 | Support-guided Adversarial Imitation Learning |
2536 | Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification |
2537 | Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells |
2538 | Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data |
2539 | Data augmentation instead of explicit regularization |
2540 | SCL: Towards Accurate Domain Adaptive Object Detection via Gradient Detach Based Stacked Complementary Losses |
2541 | SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards |
2542 | Label Cleaning with Likelihood Ratio Test |
2543 | Visual Imitation with Reinforcement Learning using Recurrent Siamese Networks |
2544 | Graph Neural Networks Exponentially Lose Expressive Power for Node Classification |
2545 | VIDEO AFFECTIVE IMPACT PREDICTION WITH MULTIMODAL FUSION AND LONG-SHORT TEMPORAL CONTEXT |
2546 | Graph inference learning for semi-supervised classification |
2547 | Sparse Coding with Gated Learned ISTA |
2548 | Dimensional Reweighting Graph Convolution Networks |
2549 | ROBUST DISCRIMINATIVE REPRESENTATION LEARNING VIA GRADIENT RESCALING: AN EMPHASIS REGULARISATION PERSPECTIVE |
2550 | Explaining A Black-box By Using A Deep Variational Information Bottleneck Approach |
2551 | Learning deep graph matching with channel-independent embedding and Hungarian attention |
2552 | EnsembleNet: End-to-End Optimization of Multi-headed Models |
2553 | Out-of-Distribution Detection Using Layerwise Uncertainty in Deep Neural Networks |
2554 | Semantics Preserving Adversarial Attacks |
2555 | Ensemble methods and LSTM outperformed other eight machine learning classifiers in an EEG-based BCI experiment |
2556 | Scaling Up Neural Architecture Search with Big Single-Stage Models |
2557 | AutoSlim: Towards One-Shot Architecture Search for Channel Numbers |
2558 | Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching |
2559 | EgoMap: Projective mapping and structured egocentric memory for Deep RL |
2560 | Accelerated Information Gradient flow |
2561 | Adversarial Attribute Learning by Exploiting negative correlated attributes |
2562 | StructPool: Structured Graph Pooling via Conditional Random Fields |
2563 | On the Decision Boundaries of Deep Neural Networks: A Tropical Geometry Perspective |
2564 | Probabilistic modeling the hidden layers of Deep Neural Networks |
2565 | IEG: Robust neural net training with severe label noises |
2566 | VideoEpitoma: Efficient Recognition of Long-range Actions |
2567 | On the Weaknesses of Reinforcement Learning for Neural Machine Translation |
2568 | Stochastically Controlled Compositional Gradient for the Composition problem |
2569 | Sharing Knowledge in Multi-Task Deep Reinforcement Learning |
2570 | HOW IMPORTANT ARE NETWORK WEIGHTS? TO WHAT EXTENT DO THEY NEED AN UPDATE? |
2571 | Deep Reasoning Networks: Thinking Fast and Slow, for Pattern De-mixing |
2572 | When Does Self-supervision Improve Few-shot Learning? |
2573 | Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation |
2574 | Context-aware Attention Model for Coreference Resolution |
2575 | SELF: Learning to Filter Noisy Labels with Self-Ensembling |
2576 | Neural Maximum Common Subgraph Detection with Guided Subgraph Extraction |
2577 | Amharic Negation Handling |
2578 | Noise Regularization for Conditional Density Estimation |
2579 | Star-Convexity in Non-Negative Matrix Factorization |
2580 | Count-guided Weakly Supervised Localization Based on Density Map |
2581 | Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization |
2582 | SSE-PT: Sequential Recommendation Via Personalized Transformer |
2583 | Wide Neural Networks are Interpolating Kernel Methods: Impact of Initialization on Generalization |
2584 | Improving Evolutionary Strategies with Generative Neural Networks |
2585 | Analysis and Interpretation of Deep CNN Representations as Perceptual Quality Features |
2586 | Program Guided Agent |
2587 | Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency |
2588 | Prestopping: How Does Early Stopping Help Generalization Against Label Noise? |
2589 | Carpe Diem, Seize the Samples Uncertain "at the Moment" for Adaptive Batch Selection |
2590 | Large Batch Optimization for Deep Learning: Training BERT in 76 minutes |