Summary: The speaker is Karina Nguyen, a researcher at OpenAI (previously at Anthropic), who worked on Claude 1, ChatGPT Canvas, and Tasks.
I worked at Anthropic for about two years, working on Claude.
Today, I would love to chat more about the scaling paradigms that have happened in the past two to four years in AI research, and how these paradigms unlocked new frontier product research.
I’m also going to share some of the lessons learned from developing the Claude and ChatGPT products, some design challenges and lessons, and how I think about the future of agents as they evolve from collaborators to co-innovators.
Afterwards, I would also love to invite you to engage in the conversation, and I’d be more than happy to answer some questions at the end.
I think there are two scaling paradigms that have happened in AI research over the past few years.
The first paradigm is next-token prediction, also called pre-training.
What’s amazing about next-token prediction is that it's essentially a world-building machine.
The model learns to understand the world by predicting the next word, fundamentally because certain sequences are caused by initial actions which are irreversible, so the model learns some of the physics of the world.
Tokens can be anything: strings, words, pixels. So the model must understand how the world works to predict what comes next.
Next-token prediction is essentially massive multitask learning.
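The training objective behind this can be sketched very roughly. The snippet below is illustrative only (a hand-set toy bigram table, not a real language model): pre-training minimizes the negative log-likelihood of the observed next token at every position, so continuations the model considers likely incur lower loss.

```python
import math

# Toy next-token "model": a bigram table assigning probabilities to the next
# token given the current one. All probabilities are hypothetical, hand-set
# for illustration.
bigram_probs = {
    ("the", "cat"): 0.5,
    ("the", "dog"): 0.3,
    ("the", "sky"): 0.2,
    ("cat", "sat"): 0.9,
    ("cat", "ran"): 0.1,
}

def sequence_nll(tokens):
    """Negative log-likelihood of a token sequence under the bigram table.

    This is the quantity pre-training drives down, summed over every
    next-token prediction in the corpus.
    """
    nll = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = bigram_probs.get((prev, nxt), 1e-9)  # tiny floor for unseen pairs
        nll += -math.log(p)
    return nll

# A likely continuation scores a lower loss than an unlikely one.
likely = sequence_nll(["the", "cat", "sat"])
unlikely = sequence_nll(["the", "cat", "ran"])
```

A real model replaces the lookup table with a neural network conditioned on the whole preceding context, but the loss being minimized has the same shape.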
During pre-training, some tasks are easy, such as translation, while others, like physics problem solving, logical expressions, and spatial reasoning, are very hard.
Tasks involving computation, like math, require a chain of thought or extra computational resources at inference time.
Creative writing is particularly challenging because it involves world-building, storytelling, and maintaining plot coherence, and a single small mistake can make the model lose coherence entirely.
Evaluating creative writing is also difficult, making it one of the hardest open-ended AI research problems today.
From 2020 to 2021, the first major product based on scaling pre-training was GitHub Copilot, which was trained on billions of tokens of code from open-source projects.
Researchers improved its usability through Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF).
This introduced a "post-training" phase focused on completing functions, generating multi-line completions, and predicting diffs.
The next major paradigm shift, which OpenAI published last year, is scaling reinforcement learning on chain of thought (CoT), enabling models to tackle highly complex reasoning tasks.
With CoT, models spend significantly more computation at inference time, reasoning through problems step by step.
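The step-by-step idea can be illustrated with a toy stand-in (hypothetical code, not a real language model): instead of emitting only the final answer, the solver emits each intermediate step, and every extra step is extra inference-time computation spent on the problem.

```python
# Minimal illustration of chain of thought: emit intermediate reasoning
# steps rather than a one-shot answer. The "model" here is a hypothetical
# hand-written solver, used only to show the shape of the trace.

def solve_with_chain_of_thought(a, b, c):
    """Compute a*b + c while recording each intermediate reasoning step."""
    steps = []
    product = a * b
    steps.append(f"First, {a} * {b} = {product}")
    total = product + c
    steps.append(f"Then, {product} + {c} = {total}")
    steps.append(f"Answer: {total}")
    return steps

trace = solve_with_chain_of_thought(12, 7, 5)
# trace holds two reasoning steps followed by the final answer line.
```

In a real reasoning model the steps are generated tokens, so a longer trace directly means more compute per query, which is exactly the design tension the next point raises.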
A major design challenge is how to present the model's complex thought processes to users without making them wait too long.
This year is considered the "year of agents," characterized by complex reasoning, tool use, and long-context interactions.
The next stage will be agents evolving into co-innovators, through creativity enabled by human-AI collaboration.
Future product research will involve rapidly iterating between highly complex models and smaller, faster distilled models.
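A rough sketch of what distillation optimizes (all numbers are hypothetical, for illustration only): a small "student" distribution is trained to match a large "teacher" distribution over next tokens, typically by minimizing the KL divergence between them.

```python
import math

# Hypothetical distillation sketch: the student's next-token distribution
# should move toward the teacher's. Probabilities are illustrative only.
teacher = [0.7, 0.2, 0.1]            # teacher's next-token probabilities
student_before = [0.34, 0.33, 0.33]  # near-uniform student at the start
student_after = [0.65, 0.22, 0.13]   # student after some distillation steps

def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

gap_before = kl_divergence(teacher, student_before)
gap_after = kl_divergence(teacher, student_after)
# gap_after is smaller: the student has moved closer to the teacher.
```

The payoff is the iteration loop the talk describes: the expensive model sets the quality bar, and the distilled model makes that behavior cheap and fast enough to ship in a product.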
Design challenges include making unfamiliar capabilities feel familiar (e.g., via file uploads), and enabling modular product features that scale easily.
Trust remains a key bottleneck; solutions include better collaborative interfaces that allow real-time user feedback and verification.
Innovative tools like Claude's Slack integration, ChatGPT Tasks, and Canvas illustrate the potential of collaborative and multimodal AI interfaces.
Ultimately, the future involves "invisible software creation," allowing anyone, even without coding experience, to create and deploy tools through AI.
AI interfaces will evolve into highly personalized, multimodal, and interactive canvases, fundamentally changing how we interact with technology and the internet.
“My prediction is that you will click less and less on internet links, and the way you will access the internet will be via model lenses, which will be much cleaner and much more personalized.”
“And you can imagine having very personalized multimodal outputs: let’s say I want to learn more about the solar system. Instead of giving me a text output, it should give you a 3D interactive visualization of the solar system, with highly rich interactive features to learn more.”
“I think there will be this sort of cool future of generative entertainment, where people learn and share new games with other people.”
“The way I’m thinking about it is that the interface to AI is a blank canvas that molds to your intent. For example, if you come to work today and your intention is just to write code, then the canvas becomes more of an IDE, like Cursor or another coding IDE, although future programming might change.”
“Or if you’re a writer and you’ve decided to write a novel together, the model can start creating tools on the fly for you, so that it becomes much easier to brainstorm, edit the writing, create character plots, and visualize the structure of the plot itself.”
“Finally, I think co-innovation is actually going to happen through co-directed creative collaboration with the models themselves. It’s through collaboration with highly capable reasoning agent systems, capable of superhuman tasks, that we will create new novels, films, games, and essentially new science and new knowledge.”
“Cool. Um, thank you so much.”
Source: 亲爱的数据一点号