|
Google Scholar
|
External |
Passing Juice |
|
VibeVoice Technical Report
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
demo
|
External |
Passing Juice |
|
checkpoints
|
External |
Passing Juice |
|
Reward Reasoning Model
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
checkpoints
|
External |
Passing Juice |
|
Native Hybrid Thinking Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Scaling Laws of Synthetic Data for Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Reinforcement Pre-Training
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Rectified Sparse Attention
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
On-Policy RL with Optimal Reward Baseline
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code merged to verl
|
External |
Passing Juice |
|
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Differential Transformer
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Data Selection via Optimal Control for Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Self-Boosting Large Language Models with Synthetic Preference Data
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Semi-Parametric Retrieval via Binary Bag-of-Tokens Index
|
External |
Passing Juice |
|
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Multimodal Latent Language Modeling with Next-Token Diffusion
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
RedStone: Curating General, Code, Math, and QA Data for Large Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
You Only Cache Once: Decoder-Decoder Architectures for Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Multi-Head Mixture-of-Experts
|
External |
Passing Juice |
|
Direct Preference Knowledge Distillation for Large Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Towards Optimal Learning of Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Kosmos-E: Learning to Follow Instruction for Robotic Grasping
|
Internal |
Passing Juice |
|
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Kosmos-2: Grounding Multimodal Large Language Models to the World
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
demo
|
External |
Passing Juice |
|
Knowledge Distillation of Large Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
BioCLIP: A Vision Foundation Model for the Tree of Life
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
model
|
External |
Passing Juice |
|
demo
|
External |
Passing Juice |
|
BitNet: Scaling 1-bit Transformers for Large Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Kosmos-2.5: A Multimodal Literate Model
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Large Language Model for Science: A Study on P vs. NP
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Retentive Network: A Successor to Transformer for Large Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
LongNet: Scaling Transformers to 1,000,000,000 Tokens
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Language Is Not All You Need: Aligning Perception with Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
MetaLM
|
External |
Passing Juice |
|
Augmenting Language Models with Long-Term Memory
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Optimizing Prompts for Text-to-Image Generation
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
demo
|
External |
Passing Juice |
|
Extensible Prompts for Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Pre-Training to Learn in Context
|
External |
Passing Juice |
|
pdf
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
A Length-Extrapolatable Transformer
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Magneto: A Foundation Transformer
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
VL-BEiT
|
External |
Passing Juice |
|
Non-Contrastive Learning Meets Language-Image Pre-Training
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Generic-to-Specific Distillation of Masked Autoencoders
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
A Unified View of Masked Image Modeling
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Visually-Augmented Language Modeling
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Corrupted Image Modeling for Self-Supervised Visual Pre-Training
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Prototypical Calibration for Few-shot Learning of Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Structured Prompting: Scaling In-Context Learning to 1,000 Examples
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Language Models are General-Purpose Interfaces
|
External |
Passing Juice |
|
On the Representation Collapse of Sparse Mixture of Experts
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
BEiT: BERT Pre-Training of Image Transformers
|
External |
Passing Juice |
|
bib
|
Internal |
Passing Juice |
|
code
|
External |
Passing Juice |
|
AdaPrompt: Adaptive Model Training for Prompt-based NLP
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Knowledge Neurons in Pretrained Transformers
|
External |
Passing Juice |
|
bib
|
Internal |
Passing Juice |
|
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
StableMoE: Stable Routing Strategy for Mixture of Experts
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Controllable Natural Language Generation with Contrastive Prefixes
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Swin Transformer V2: Scaling Up Capacity and Resolution
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Allocating Large Vocabulary Capacity for Cross-Lingual Language Model Pre-Training
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
|
External |
Passing Juice |
|
bib
|
Internal |
Passing Juice |
|
Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Consistency Regularization for Cross-Lingual Fine-Tuning
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Learning to Sample Replacements for ELECTRA Pre-Training
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Memory-Efficient Differentiable Transformer Architecture Search
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
|
External |
Passing Juice |
|
bib
|
Internal |
Passing Juice |
|
code
|
External |
Passing Juice |
|
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Cross-Lingual Natural Language Generation via Pre-Training
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
bib
|
Internal |
Passing Juice |
|
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
blog
|
External |
Passing Juice |
|
Harvesting and Refining Question-Answer Pairs for Unsupervised QA
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Unified Language Model Pre-training for Natural Language Understanding and Generation
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Visualizing and Understanding the Effectiveness of BERT
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Data-to-text Generation with Entity Modeling
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Learning to Ask Unanswerable Questions for Machine Reading Comprehension
|
External |
Passing Juice |
|
data
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Data-to-Text Generation with Content Selection and Planning
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
data
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Coarse-to-Fine Decoding for Neural Semantic Parsing
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Confidence Modeling for Neural Semantic Parsing
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Learning to Paraphrase for Question Answering
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Learning to Generate Product Reviews from Attributes
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
data
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Language to Logical Form with Neural Attention
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
slides
|
Internal |
Passing Juice |
|
Long Short-Term Memory-Networks for Machine Reading
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Solving and Generating Chinese Character Riddles
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Question Answering over Freebase with Multi-Column Convolutional Neural Networks
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
slides
|
Internal |
Passing Juice |
|
A Hybrid Neural Model for Type Classification of Entity Mentions
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
slides
|
Internal |
Passing Juice |
|
Ranking with Recursive Neural Networks and Its Application to Multi-document Summarization
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification
|
External |
Passing Juice |
|
data
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
slides
|
Internal |
Passing Juice |
|
A Joint Segmentation and Classification Framework for Sentiment Analysis
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
The Automated Acquisition of Suggestions from Tweets
|
External |
Passing Juice |
|
slides
|
Internal |
Passing Juice |
|
data
|
Internal |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
MoodLens: An Emoticon-Based Sentiment Analysis System for Chinese Tweets
|
Internal |
Passing Juice |
|
demo
|
External |
Passing Juice |
|
poster
|
External |
Passing Juice |
|
video
|
External |
Passing Juice |
|
data
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Model as a Game: On Numerical and Spatial Consistency for Generative Games
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Inspecting Unification of Encoding and Matching with Transformer: A Case Study of Machine Reading Comprehension
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
Splusplus: A Feature-Rich Two-stage Classifier for Sentiment Analysis of Tweets
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
code
|
External |
Passing Juice |
|
DeepNet: Scaling Transformers to 1,000 Layers
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Generic-to-Specific Distillation of Masked Autoencoders
|
External |
Passing Juice |
|
Transforming Wikipedia into Augmented Data for Query-Focused Summarization
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
Adaptive Multi-Compositionality for Recursive Neural Network Models
|
External |
Passing Juice |
|
A Statistical Parsing Framework for Sentiment Classification
|
External |
Passing Juice |
|
bib
|
External |
Passing Juice |
|
slides
|
Internal |
Passing Juice |
|
A Joint Segmentation and Classification Framework for Sentence Level Sentiment Classification
|
External |
Passing Juice |
|
Unraveling the origin of exponential law in intra-urban human mobility
|
External |
Passing Juice |
|
Performance of Local Information Based Link Prediction: A Sampling Perspective
|
External |
Passing Juice |
|
Learning Natural Language Interfaces with Neural Models
|
External |
Passing Juice |
|
AIMatters (invited)
|
External |
Passing Juice |
|
Principal Researcher
|
External |
Passing Juice |
|
Mirella Lapata
|
External |
Passing Juice |
|
Chris Quirk
|
External |
Passing Juice |
|
Furu Wei
|
External |
Passing Juice |
|
Ke Xu
|
External |
Passing Juice |