Reading list 2024/2025
2025-01-21
Introduction
My reading list for 2024/2025, kept in one place for easy access.
Reading list
LLM Research Papers: The 2024 List
Phi-4 Technical Report
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Evolving Deeper LLM Thinking
Qwen2.5 Technical Report
Beating cuBLAS in Single-Precision General Matrix Multiplication
Triton resources
A Survey on Deep Neural Network Pruning
Starting with TK
1.58-bit FLUX
Noteworthy AI Research Papers of 2024 (Part One)
Noteworthy AI Research Papers of 2024 (Part Two)
Monolith: Real Time Recommendation System With Collisionless Embedding Table
Titans: Learning to Memorize at Test Time
Spanner: Google’s Globally-Distributed Database
Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks
Converting a From-Scratch GPT Architecture to Llama 2
Scaling Reinforcement Learning with LLMs
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Preference Discerning with LLM-Enhanced Generative Retrieval
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Memory Layers at Scale
Large Concept Models: Language Modeling in a Sentence Representation Space
What Matters in Transformers? Not All Attention Is Needed
Coalescence: making LLM inference 5x faster
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Training Large Language Models to Reason in a Continuous Latent Space
ElasticTok: Adaptive Tokenization for Image and Video
JetFormer: An Autoregressive Generative Model of Raw Images and Text
Theoretical Analysis of Byte-Pair Encoding
Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-V3 Technical Report
Hymba: A Hybrid-head Architecture for Small Language Models
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities
Language Model Can Listen While Speaking
Multi-megabase scale genome interpretation with genetic language models
MiniMax-01: Scaling Foundation Models with Lightning Attention
Modeling the hallucinatory effects of classical psychedelics in terms of replay-dependent plasticity mechanisms
Transformer²: Self-adaptive LLMs
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Titans: Learning to Memorize at Test Time (v1)
Jasper and Stella: distillation of SOTA embedding models
Grokking at the Edge of Numerical Stability
The GAN is dead; long live the GAN! A Modern GAN Baseline
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Differential Transformer
The State of Generative Models
Augmented Neural ODEs
Some Math behind Neural Tangent Kernel