Writing

Research notes, technical essays, and system-level thinking

Original essays on efficient LLMs, long-context systems, agent engineering, model compression, and AI-native product interfaces.

Latest Essay

On-Policy Distillation

Training Students on Their Own Mistakes

This slide deck introduces On-Policy Distillation (OPD): a family of distillation methods where the student learns from trajectories, states, prompts, or mistakes that arise fro...

2026-05-22 model compression distillation on policy llm systems

16 posts

2026-05-22 Updated 2026.05.22

On-Policy Distillation

Training Students on Their Own Mistakes

This slide deck introduces On-Policy Distillation (OPD): a family of distillation methods where the student learns from trajector...

model compression distillation on policy llm systems

2026-04-29 Updated 2026.04.29

Trace Data Foundry: When Agent Execution Traces Become Tradable Model-Capability Assets

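This essay is a competitive analysis and product-design exploration of the "agent trace commoditization" direction. It is not a literature survey; it tries to answer one concrete question: now that work such as SWE-Gym, SWE-smith, AgentRR, and Datacurve has appeared, a "verified agen...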

agent data marketplace rl

2026-04-29 Updated 2026.04.29

Your Relationship with AI Determines Where Your Abilities Go

Over the past two years, the most popular anxiety about AI has centered on the wrong question: will the model replace my job?

agent skill harness education memory

2026-04-24 Updated 2026.04.27

DeepSeek-V4 Paper Explained: Million-Token Context Is Not a Window Race but Systems Engineering

DeepSeek-V4 Paper Notes: Million-Token Context as Systems Engineering

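This essay is a rewrite based on the paper "DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence" and the original Chinese notes. It is not a paragraph-by-paragraph translation; instead it reorganizes the paper into a technical through-line that is easier to read: why million...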

paper reading long context llm systems

2026-04-21 Updated 2026.04.27

The Endgame of Long Context Is Not a Bigger Window

An industry and technology commentary on long-horizon agents, inference cost, external memory, and turning state into an asset.

long context llm systems agent

2026-04-21 Updated 2026.04.27

From Ultra-Long Context to Lifelong Service: Why Open Models Are Likely to Become State-Centric Serving Systems

A research-style blog on architectural divergence, long-horizon agents, external memory, and possible state assets.

long context llm systems agent

2026-04-16 Updated 2026.04.27

A Reading Map for Claude Code Source Analysis

To read Claude Code, you cannot just chase the "source code incident"

agent source reading

2026-04-15 Updated 2026.04.27

Harness Engineering: Turning Models That Can Write Code into Software Systems That Actually Deliver

Prologue: The watershed in AI programming is not whether a model can write code, but whether it can deliver continuously

agent engineering

2026-03-18 Updated 2026.04.27

The Death of the App: Why the "Intent Canvas" is the Endgame of Operating Systems

We carry supercomputers in our pockets, yet we still interact with them like 1990s filing cabinets.

product thinking agent

2026-03-18 Updated 2026.04.27

The End of the App Is the "Canvas": The Underlying Architectural Logic of the Zero-App Era

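The most absurd technological mismatch today is this: the phones in our pockets already have the on-device compute to run large models with tens of billions of parameters, yet the way we interact with this "supercomputer" is still stuck in the "icon grid" logic of the 1990s.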

product thinking agent

2025-08-22 Updated 2026.04.27

A Panorama of LLM Conversation Formats

A Panorama of LLM Conversation Formats: From Chat Template to Tool-Use to Cross-Stage Token Design

llm systems engineering

2025-08-07 Updated 2026.04.27

Memory Management Approaches for LLM Agents

A Survey of Memory Management Approaches for LLM Agents

agent memory

2025-08-06 Updated 2026.04.27

GPT-OSS Model Card Analysis

A Deep Dive into the GPT-OSS Model: Technical Specifications, Performance, and Application Scenarios

open models model card

2025-08-05 Updated 2026.04.27

Native Small Language Models

efficient llm llm systems

2025-08-05 Updated 2026.04.27

Reinforcement Learning with Verifiable Rewards (RLVR)

reasoning llm training

2025-08-05 Updated 2026.04.27

LLM Quantization: Principles, Frontiers and Practice

model compression efficient llm