
In generative AI applications, context really is everything. The quality of an LLM's output depends heavily on the quality and quantity of the information you provide. And many real-world scenarios come with enormous context: generating code within a huge codebase, querying a complex knowledge system, or even rambling back and forth while researching the perfect vacation spot (we've all been there).
Unfortunately, LLMs only work effectively with a limited amount of context. The issue is not just the hard limit of the context window, especially now that frontier models support hundreds of thousands or even millions of tokens, with those limits still growing. The bigger challenge is the phenomenon known as context rot: model performance degrades as the context gets longer.
This effect is clearly demonstrated in the paper by Hsieh et al., "RULER: What's the Real Context Size of Your Long-Context Language Models?". The authors introduce RULER, a benchmark for evaluating long-context performance, and test a range of models with it. The results show a consistent pattern: as context length increases, performance drops significantly across all models. In many cases, the effective context length (where performance remains comparable to Llama 2-7B) is only about 50% of the advertised context window, or even less.
In their recent paper "Recursive Language Models", Zhang et al. propose a promising approach to the context rot problem. In this article, I want to dig into this idea and explore how it works in practice, using the support recently added to DSPy.
Recursive Language Models
Recursive Language Models (RLMs) were introduced to combat the performance degradation that comes with growing context and to let LLMs handle very large contexts (up to two orders of magnitude larger than the model's native context window). This matters more and more as we apply LLMs to tasks involving hundreds of millions of tokens.
Several approaches for handling long-context tasks already exist. The most common is context summarization (you have probably run into it when using coding agents like Cursor). In this setup, the conversation or working context is repeatedly summarized whenever it approaches the model's context limit. In practice, however, this often breaks down: important details and nuances from earlier steps get lost, because the model "forgets" old information to make room for new tokens.
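To make that failure mode concrete, here is a minimal sketch of the compaction pattern, assuming hypothetical llm and count_tokens helpers (neither is a specific library's API):
# Minimal sketch of context summarization (the approach RLMs move away from).
# `llm` and `count_tokens` are hypothetical placeholders, not a real API.
MAX_TOKENS = 200_000
KEEP_RECENT = 10  # always keep the last N messages verbatim

def compact(messages, llm, count_tokens):
    """Replace older messages with a summary once the context nears the limit."""
    if count_tokens(messages) < int(MAX_TOKENS * 0.8):
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = llm(f"Summarize the following conversation:\n{old}")
    # Anything from `old` that didn't survive summarization is gone for good:
    # this lossy step is where early details get "forgotten".
    return [{"role": "system", "content": f"Summary so far: {summary}"}] + recent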
Recursive Language Models take a different and surprisingly elegant approach. Instead of passing the entire prompt to the LLM every time, the prompt is stored as a set of variables in a Python REPL (Read-Eval-Print Loop) environment. The model is given tools to inspect these variables and to recursively invoke an LLM on programmatically selected fragments.
In other words, an RLM encourages the LLM to write code, construct its own subtasks, and then solve them through recursive calls to itself. This shifts the burden from compressing everything into a single prompt to structuring the problem so that the model can navigate large contexts more reliably.
The authors evaluate their approach on several long-context tasks, and RLMs achieve strong results with both GPT-5 and Qwen-3 models.
The results look very promising: RLMs (with or without sub-calls) consistently outperform the alternatives across all tasks. Let's see how this works in practice.
Using RLMs in Practice
A common scenario where long context shines is answering questions over a large knowledge base, so that's what we'll try here. I loaded all the articles I've published on Towards Data Science into a single Markdown file. Somewhat surprisingly, this adds up to almost 1.5 MB of text. Hopefully, that's enough for the model to dig into.
Luckily, DSPy recently shipped an implementation of the Recursive Language Model inference approach. That means we don't need to build anything ourselves; we just need DSPy 3.1.2 or later.
pip install dspy --upgrade
Let's start by loading the dataset.
with open('articles.md', 'r') as f:
    articles = f.read()
Next, let's see how many tokens it contains.
import anthropic

# `config` is a dict with API keys, loaded elsewhere
client = anthropic.Anthropic(api_key=config['ANTHROPIC_API_KEY'])

# Count tokens for messages
token_count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=[
        {"role": "user", "content": articles}
    ]
)

print(f"Input tokens: {token_count.input_tokens}")
# Input tokens: 386768
We have almost 400K tokens of context. Claude Sonnet 4.5 supports a 200K context window, so processing the entire dataset in a single prompt is not feasible. This is exactly where RLMs come in.
To use DSPy, we first need to configure the language model. As mentioned earlier, an RLM relies on code to interact with the prompt, so it works best with models that have strong coding abilities. That's why I'm using Claude for this example: it performs well on code-related tasks.
import dspy

lm = dspy.LM('anthropic/claude-sonnet-4-5', api_key=config['ANTHROPIC_API_KEY'])
dspy.configure(lm=lm)
Next, we initialize the RLM by specifying its signature. For this use case, I will pass the full set of articles along with a question, and expect the model to extract the key trends and return them as a list of strings.
rlm = dspy.RLM('articles, question -> trends: list[str]')
Here, I use Claude Sonnet 4.5 for both the main model and the recursive sub-calls. DSPy also lets you use a smaller model for the sub-calls via the sub_lm parameter, which can help reduce costs.
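For example, delegating the sub-calls to a cheaper model might look like this (a sketch: the sub_lm parameter is mentioned above, while the specific model choice is just an illustration):
# Sketch: a cheaper model handles recursive sub-calls, Sonnet stays the planner.
# `sub_lm` comes from DSPy's RLM; the Haiku model choice here is illustrative.
sub_lm = dspy.LM('anthropic/claude-haiku-4-5', api_key=config['ANTHROPIC_API_KEY'])
rlm = dspy.RLM('articles, question -> trends: list[str]', sub_lm=sub_lm)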
Now we can execute the RLM and access the trends field in its output.
output = rlm(
    articles=articles,
    question='''What were the main AI trends of 2025 based on provided
    articles? Pay attention to the content not only the titles.'''
)

print('\n'.join(output.trends))
# Agentic AI and Autonomous Systems: Self-reflection patterns, iterative improvement through feedback loops (Self-Refine, Reflexion, CRITIC), and LLMs as reasoning engines that can autonomously plan and execute tasks
# Multi-Agent AI Systems: Evolution from single agents to collaborative teams of specialized agents with distinct roles, using frameworks like LangGraph, CrewAI, and AutoGen for orchestration
# Context Engineering and RAG: Moving beyond static prompting to dynamic retrieval-augmented generation (RAG), adaptive memory systems, and self-improving contexts that learn from experience without retraining
# Standardization Protocols: Emergence of Model Context Protocol (MCP) for standardizing LLM-tool integrations and Agent Communication Protocol (ACP) for inter-agent communication, reducing integration complexity
# Tool-Using LLMs and Function Calling: LLMs equipped with ability to invoke external tools, execute SQL queries, browse web, and interact with APIs through structured function calling mechanisms
# Production-Ready AI Frameworks: Mature ecosystem including LangGraph, DSPy, LangChain, NeMo Agent Toolkit, CrewAI, and AutoGen, focused on moving from prototype to production with built-in observability
# LLM Evaluation and Observability: LLM-as-judge evaluation patterns, comprehensive metrics frameworks (Ragas, DeepEval, Evidently), trajectory evaluation, and continuous monitoring as essential production infrastructure
# Programming Over Prompting: Shift toward declarative, code-based AI development with frameworks like DSPy and configuration-driven approaches (YAML-based) replacing prompt engineering
# Framework Interoperability: Tools designed to integrate across multiple frameworks rather than create silos, enabling composable AI architectures that leverage best features from different ecosystems
# Local and Cost-Effective LLM Deployment: Running smaller efficient models locally (Llama, Ollama) to reduce API costs and enable experimentation, with focus on cost-quality-latency tradeoffs
# SQL Agents and Data Analysis Automation: LLM agents specialized in data analysis tasks, generating and executing SQL queries, with applications in replacing or augmenting traditional data analyst workflows
# Production Quality and Accuracy Enhancement: Techniques for improving LLM accuracy including chain-of-thought reasoning, structured outputs through function calling, and iterative refinement for business-critical applications
The whole run took about three minutes, and the result is a surprisingly reasonable summary of the main topics across my articles. But we're not here for the trends themselves. The more interesting question is how the RLM got there in the first place. So let's dig in.
RLM Under the Hood
Of course, the most interesting part is understanding what actually happens under the hood.
The RLM Implementation
As discussed above, the core idea of Recursive Language Models is to treat the long context as part of the external environment rather than feeding it to the model directly as a single prompt. Instead, the LLM writes Python code to programmatically inspect and decompose the data, recursively invoking sub-LLMs on smaller pieces.
At a high level, the implementation has a few core features: the input data lives as variables inside a Python REPL instead of in the prompt; the model iteratively writes and executes code against those variables; it can recursively query sub-LLMs via llm_query and llm_query_batched; and it returns the final answer with SUBMIT(). A minimal sketch of this loop is shown below.
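Here is a heavily simplified reconstruction of that orchestration loop, assuming a generic chat-completion function llm() and a generate_code() helper that asks the main model for its next snippet; this is my sketch of the idea, not DSPy's actual implementation.
import io
import contextlib

def llm_query(prompt: str) -> str:
    # Recursive sub-call on a programmatically selected fragment of the context
    return llm(prompt)

def run_rlm(articles: str, question: str, max_iterations: int = 20):
    # The long context lives in the REPL namespace, not in the model's prompt
    namespace = {'articles': articles, 'question': question, 'llm_query': llm_query}
    result, history = None, []

    def SUBMIT(answer):
        nonlocal result
        result = answer

    namespace['SUBMIT'] = SUBMIT
    for _ in range(max_iterations):
        # The main model sees only variable metadata and past REPL outputs
        code = generate_code(question, history)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):  # capture print() output
            exec(code, namespace)                 # state persists across iterations
        history.append((code, buffer.getvalue()))
        if result is not None:  # the model called SUBMIT()
            return result
    return result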
The Prompt
To really understand the mechanics, I found it helpful to inspect the actual messages exchanged with the LLM. DSPy makes this easy with the following command.
# Inspect the raw LLM calls - shows the actual prompts sent to the model
dspy.inspect_history(n=39)
This gives us full visibility into what is shared with the model at each step.
Let's start with the system message. It defines the input variables available to the model (as specified in the RLM signature) and lists the functions the model can call, such as recursive LLM calls, print, and standard library utilities.
It's worth noting that the RLM module also lets you expose custom functions to the Python REPL via the tools argument when initializing the RLM, as sketched below. In this example, I rely only on the defaults, but in more advanced setups this can be a powerful extension point.
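As an illustration, exposing a custom keyword-search helper might look like this (a sketch: the tools argument is mentioned above, but the exact accepted format is my assumption, not verified against the API docs):
# Sketch only: `tools` comes from DSPy's RLM module, but the accepted format
# (a list of callables) is an assumption; check the DSPy docs before relying on it.
def search_articles(keyword: str) -> str:
    """Return up to five short snippets around matches of `keyword`."""
    import re
    snippets = [articles[max(m.start() - 100, 0):m.start() + 400]
                for m in re.finditer(re.escape(keyword), articles)]
    return '\n---\n'.join(snippets[:5])

rlm = dspy.RLM('articles, question -> trends: list[str]', tools=[search_articles])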
Your input fields are:
1. `variables_info` (str): Metadata about the variables available in the REPL
2. `repl_history` (REPLHistory): Previous REPL code executions and their outputs
3. `iteration` (str): Current iteration number (1-indexed) out of max_iterations
Your output fields are:
1. `reasoning` (str): Think step-by-step: what do you know? What remains? Plan your next action.
2. `code` (str): Python code to execute.
All interactions will be structured in the following way, with the appropriate values filled in.
[[ ## variables_info ## ]]
{variables_info}
[[ ## repl_history ## ]]
{repl_history}
[[ ## iteration ## ]]
{iteration}
[[ ## reasoning ## ]]
{reasoning}
[[ ## code ## ]]
{code}
[[ ## completed ## ]]
In adhering to this structure, your objective is:
Given the fields `articles`, `question`, produce the fields `trends`.
You are tasked with producing the following outputs given the inputs `articles`, `question`:
- {trends} # note: the value you produce must adhere to the JSON schema: {"type": "array", "items": {"type": "string"}}
You have access to a Python REPL environment. Write Python code and it will be executed. You will see the output, then write more code based on what you learned. This is an iterative process.
Available:
- Variables: `articles`, `question` (your input data)
- `llm_query(prompt)` - query a sub-LLM (~500K char capacity) for semantic analysis
- `llm_query_batched(prompts)` - query multiple prompts concurrently (much faster for multiple queries)
- `print()` - ALWAYS print to see results
- `SUBMIT(trends)` - submit final output when done
- Standard libraries: re, json, collections, math, etc.
IMPORTANT: This is ITERATIVE. Each code block you write will execute, you'll see the output, then you decide what to do next. Do NOT try to solve everything in one step.
1. EXPLORE FIRST - Look at your data before processing it. Print samples, check types/lengths, understand the structure.
2. ITERATE - Write small code snippets, observe outputs, then decide next steps. State persists between iterations.
3. VERIFY BEFORE SUBMITTING - If results seem wrong (zeros, empty, unexpected), reconsider your approach.
4. USE llm_query FOR SEMANTICS - String matching finds WHERE things are; llm_query understands WHAT things mean.
5. MINIMIZE RETYPING (INPUTS & OUTPUTS) - When values are long, precise, or error-prone (IDs, numbers, code, quotes), re-access them via variables and parse/compute in code instead of retyping. Use small, targeted prints to sanity-check, but avoid manual copying when variables can carry the exact value.
6. SUBMIT ONLY AFTER SEEING OUTPUTS - SUBMIT ends the current run immediately. If you need to inspect printed output, run it in one step, review the result, then call SUBMIT in a later step.
You have max 50 sub-LLM calls. When done, call SUBMIT() with your output.
Let's also look at the first user message sent to the LLM. It gives the model an overview of the input variables: their types, their lengths (so the model knows how much it can read at once), and a preview of the content so it can understand the structure.
[[ ## variables_info ## ]]
[1] «««
Variable: `articles` (access it in your code)
Type: str
Total length: 1,392,921 characters
Preview:
```
---
title: "Agentic AI from First Principles: Reflection"
date: "2025-10-24T12:00:00-05:00"
category: "Agentic AI"
read_time: "21 min read"
url: "https://towardsdatascience.com/agentic-ai-from-first-principles-reflection/"
---
Arthur C. Clarke's Third Law says that "*any sufficiently advanced technology is indistinguishable from magic*". That's exactly how a lot of today's AI frameworks feel. Tools like GitHub Copilot, Claude Desktop, OpenAI Operator, and Perplexity Comet are automating everyda...
```
»»»
[2] «««
Variable: `question` (access it in your code)
Type: str
Total length: 114 characters
Preview:
```
What were the main AI trends of 2025 based on provided articles? Pay attention to the content not only the titles.
```
»»»
[3] «««
Variable: `verbose` (access it in your code)
Type: bool
Total length: 4 characters
Preview:
```
True
```
»»»
[[ ## repl_history ## ]]
You have not interacted with the REPL environment yet.
[[ ## iteration ## ]]
1/20
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## code ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
Here is the model's initial response, following the format specified in the user message.
[[ ## reasoning ## ]]
I need to analyze articles to identify main AI trends of 2025. Let me start by exploring the data:
1. First, I'll examine the structure and content of the `articles` variable to understand what I'm working with
2. I'll need to look at multiple articles, not just titles, to understand the actual content
3. Then I'll use semantic analysis (llm_query) to extract and identify trends
4. Finally, I'll compile the trends into a list
Let me start by exploring the articles data to see its structure and get a sense of what's available.
[[ ## code ## ]]
# First, let's understand the structure of the articles
print("Type of articles:", type(articles))
print("\nLength of articles:", len(articles))
print("\n" + "="*80)
print("First 2000 characters:")
print(articles[:2000])
print("\n" + "="*80)
# Let's see if we can identify the structure - looks like articles are separated
# Let's try to split and count them
article_separator = "---\ntitle:"
article_parts = articles.split(article_separator)
print(f"\nNumber of article parts found: {len(article_parts)}")
print(f"First part length: {len(article_parts[0])} (likely empty or header)")
print(f"Second part length: {len(article_parts[1]) if len(article_parts) > 1 else 0}")
[[ ## completed ## ]]
The raw logs give us a solid understanding of how the LLM interacts with the environment under the hood. Next, let's trace the trajectory the LLM followed to reach its final output.
The Trajectory
To view the trajectory in a structured way, we can inspect output.trajectory. This gives us access to each reasoning step, the generated code, and the resulting output. Essentially, it provides a complete view of all REPL interactions.
# Examine the full trajectory - this shows all REPL interactions
for i, step in enumerate(output.trajectory):
    print(f"\n{'='*60}")
    print(f"STEP {i+1}")
    print(f"{'='*60}")
    print(f"\n📝 REASONING:\n{step['reasoning']}")
    print(f"\n💻 CODE:\n{step['code']}")
    print(f"\n📤 OUTPUT:\n{step['output'][:1000]}{'...' if len(step['output']) > 1000 else ''}")
Reviewing the trajectory helps us understand the main patterns the LLM used to solve the problem. This run took 13 steps in total to reach the final answer.
The first step, as expected, was to explore the articles variable: checking its structure and getting a sense of the available data. This initial exploration laid the groundwork for all subsequent reasoning and sub-calls.
# First, let's understand the structure of the articles
print("Type of articles:", type(articles))
print("\nLength of articles:", len(articles))
print("\n" + "="*80)
print("First 2000 characters:")
print(articles[:2000])
print("\n" + "="*80)
# Let's see if we can identify the structure - looks like articles are separated
# Let's try to split and count them
article_separator = "---\ntitle:"
article_parts = articles.split(article_separator)
print(f"\nNumber of article parts found: {len(article_parts)}")
print(f"First part length: {len(article_parts[0])} (likely empty or header)")
print(f"Second part length: {len(article_parts[1]) if len(article_parts) > 1 else 0}")
Next, the LLM realized there are 40 articles in total, each separated by ---\ntitle: and carrying metadata fields (title, date, category, read_time, and url) followed by the main content. Identifying this structure from a raw 1.5 MB text file is already a significant achievement on its own.
Building on this understanding, the LLM parsed the articles and produced a summary of their titles and main topics.
# Parse articles into structured format
import re

# Split by the article separator
raw_articles = articles.split("---\ntitle:")

# Process each article (skip the first empty element)
parsed_articles = []
for i, raw_article in enumerate(raw_articles[1:], 1):  # Skip first empty element
    # Add back the "title:" that was removed by split
    raw_article = "title:" + raw_article
    # Extract metadata using regex
    title_match = re.search(r'title:\s*"([^"]*)"', raw_article)
    date_match = re.search(r'date:\s*"([^"]*)"', raw_article)
    category_match = re.search(r'category:\s*"([^"]*)"', raw_article)
    # Find where content starts (after the second "---")
    content_start = raw_article.find("---", 10)  # Skip the first part
    content = raw_article[content_start+3:].strip() if content_start != -1 else ""
    parsed_articles.append({
        'index': i,
        'title': title_match.group(1) if title_match else "Unknown",
        'date': date_match.group(1) if date_match else "Unknown",
        'category': category_match.group(1) if category_match else "Unknown",
        'content': content,
        'content_length': len(content)
    })

print(f"Successfully parsed {len(parsed_articles)} articles\n")
print("="*80)
print("Sample of articles (titles and categories):")
print("="*80)
for i, article in enumerate(parsed_articles[:10], 1):
    print(f"{i}. [{article['category']}] {article['title']}")
    print(f"   Content length: {article['content_length']} chars")
    print()
After parsing the articles, the LLM decided to focus on AI-related content, since the question specifically asked about AI trends.
# Let's focus on AI-related articles (AI, Agentic AI, LLM Applications, ChatGPT categories)
# and sample strategically to understand content-based trends
ai_related = [a for a in parsed_articles if a['category'] in [
    'Artificial Intelligence', 'Agentic AI', 'LLM Applications', 'ChatGPT', 'Deep Learning'
]]
Next, the LLM identified the key articles it wanted to focus on, a subjective choice based on relevance and content. To analyze them in detail, it used recursive sub-calls, processing each article programmatically.
To manage the workload efficiently, the LLM split the articles into two batches and processed them sequentially. Each article's content was also truncated to the first 15K characters. This strategy let the model work through a large volume of text without exceeding the context window, while still capturing the most important insights from each article.
# Create prompts to extract AI trends from article content
# I'll include a substantial portion of each article's content for analysis
prompts_batch1 = []
for i, article in enumerate(selected_for_analysis[:8]):  # First batch of 8
    # Take first ~15K chars of content to stay within LLM limits while getting substance
    content_sample = article['content'][:15000]
    prompt = f"""Analyze this article about AI from 2025 and identify the main AI trends, technologies, methodologies, or concepts it discusses.

Article Title: {article['title']}

Article Content (excerpt):
{content_sample}

Based on the content above, list the key AI trends, technologies, or concepts discussed in this article. Focus on:
- New frameworks, tools, or methodologies mentioned
- Emerging patterns in AI development (e.g., agentic AI, multi-agent systems)
- Important techniques or approaches
- Production/deployment considerations

Provide a concise list of 3-5 main trends/concepts from this article."""
    prompts_batch1.append(prompt)

print(f"Created {len(prompts_batch1)} prompts for first batch")
print("\nProcessing first batch of articles...")

# Query the first batch
results_batch1 = llm_query_batched(prompts_batch1)

print(f"\nReceived {len(results_batch1)} results")
print("\nResults from first batch:")
print("="*80)
for i, (article, result) in enumerate(zip(selected_for_analysis[:8], results_batch1), 1):
    print(f"\n{i}. {article['title']}")
    print("-"*80)
    print(result)
    print()
Finally, the LLM reviewed the full list of articles once more, identified additional relevant topics, and processed them in the same way with recursive sub-calls. Once all the information was retrieved, it synthesized the results into the trends_2025 variable (a list of strings representing the main AI trends) and submitted it as the final answer.
##### STEP 12 #####
# Now let me synthesize all findings into comprehensive AI trends
# I'll review all the results and create a structured summary
print("SYNTHESIZING ALL FINDINGS FROM 20 ANALYZED ARTICLES")
print("="*80)
print("\nBased on deep content analysis of 20 AI-focused articles from 2025,")
print("here are the main AI trends identified:\n")

# Compile the comprehensive list of trends
trends_2025 = [
    "Agentic AI and Autonomous Systems: Self-reflection patterns, iterative improvement through feedback loops (Self-Refine, Reflexion, CRITIC), and LLMs as reasoning engines that can autonomously plan and execute tasks",
    "Multi-Agent AI Systems: Evolution from single agents to collaborative teams of specialized agents with distinct roles, using frameworks like LangGraph, CrewAI, and AutoGen for orchestration",
    "Context Engineering and RAG: Moving beyond static prompting to dynamic retrieval-augmented generation (RAG), adaptive memory systems, and self-improving contexts that learn from experience without retraining",
    "Standardization Protocols: Emergence of Model Context Protocol (MCP) for standardizing LLM-tool integrations and Agent Communication Protocol (ACP) for inter-agent communication, reducing integration complexity",
    "Tool-Using LLMs and Function Calling: LLMs equipped with ability to invoke external tools, execute SQL queries, browse web, and interact with APIs through structured function calling mechanisms",
    "Production-Ready AI Frameworks: Mature ecosystem including LangGraph, DSPy, LangChain, NeMo Agent Toolkit, CrewAI, and AutoGen, focused on moving from prototype to production with built-in observability",
    "LLM Evaluation and Observability: LLM-as-judge evaluation patterns, comprehensive metrics frameworks (Ragas, DeepEval, Evidently), trajectory evaluation, and continuous monitoring as essential production infrastructure",
    "Programming Over Prompting: Shift toward declarative, code-based AI development with frameworks like DSPy and configuration-driven approaches (YAML-based) replacing prompt engineering",
    "Framework Interoperability: Tools designed to integrate across multiple frameworks rather than create silos, enabling composable AI architectures that leverage best features from different ecosystems",
    "Local and Cost-Effective LLM Deployment: Running smaller efficient models locally (Llama, Ollama) to reduce API costs and enable experimentation, with focus on cost-quality-latency tradeoffs",
    "SQL Agents and Data Analysis Automation: LLM agents specialized in data analysis tasks, generating and executing SQL queries, with applications in replacing or augmenting traditional data analyst workflows",
    "Production Quality and Accuracy Enhancement: Techniques for improving LLM accuracy including chain-of-thought reasoning, structured outputs through function calling, and iterative refinement for business-critical applications"
]

print("MAIN AI TRENDS OF 2025:")
print("="*80)
for i, trend in enumerate(trends_2025, 1):
    print(f"\n{i}. {trend}")

print("\n" + "="*80)
print(f"\nTotal trends identified: {len(trends_2025)}")
print("\nThese trends are based on content analysis of articles covering:")
print("- Agentic AI systems and design patterns")
print("- Multi-agent architectures and frameworks")
print("- Production deployment and evaluation")
print("- Emerging protocols and standardization")
print("- Tool use and external system integration")
print("- Cost-effective and accurate LLM applications")

##### STEP 13 #####
# Submit the final list of AI trends from 2025
SUBMIT(trends_2025)
We can see the model was quite creative: it understood the structure of our content, parsed the articles, and processed them one by one with recursive calls. However, there is one notable gap in this analysis: it completely ignored filtering by year, even though the metadata of every article explicitly includes a publication date.
To address this, I ran the request again, this time stressing the importance of the 2025 filter so the model would focus only on articles from that year.
output2 = rlm(
    articles=articles,
    question='''What were the main AI trends of 2025 based on provided
    articles? Pay attention to the content not only the titles.
    Don't forget to filter by year, this information is present in
    metadata for each article.'''
)
This time, the model took the year into account and performed more sophisticated parsing, using a regular expression to extract each article's publication date.
import re
from datetime import datetime

# Split articles by the separator "---" at the beginning of metadata blocks
# The pattern is: --- followed by metadata, followed by --- followed by content
article_blocks = re.split(r'\n---\n', articles)
print(f"Total article blocks found: {len(article_blocks)}")
print("\n" + "="*80)

# Let's examine the structure - articles seem to start with "---\ntitle:"
# Let me try a different approach - find all articles by their metadata pattern
article_pattern = r'---\ntitle: "(.*?)"\ndate: "(.*?)"\ncategory: "(.*?)"\nread_time: "(.*?)"\nurl: "(.*?)"\n---\n(.*?)(?=\n---\ntitle:|$)'
matches = re.findall(article_pattern, articles, re.DOTALL)
print(f"Articles found with regex: {len(matches)}")

if len(matches) > 0:
    print("\nFirst article metadata:")
    print(f"  Title: {matches[0][0]}")
    print(f"  Date: {matches[0][1]}")
    print(f"  Category: {matches[0][2]}")
    print(f"  Content preview (first 200 chars): {matches[0][5][:200]}")
It then filtered the articles by year and found 13 articles from 2025.
# Parse articles and filter for 2025
articles_2025 = []
for match in matches:
    title, date_str, category, read_time, url, content = match
    # Extract year from date string (format: "2025-10-24T12:00:00-05:00")
    year = date_str.split('-')[0]
    if year == "2025":
        articles_2025.append({
            'title': title,
            'date': date_str,
            'category': category,
            'content': content.strip()
        })

print(f"Total articles: {len(matches)}")
print(f"Articles from 2025: {len(articles_2025)}")
print("\n" + "="*80)
print("2025 Articles:")
print("="*80)
for i, article in enumerate(articles_2025, 1):
    print(f"\n{i}. {article['title']}")
    print(f"   Date: {article['date']}")
    print(f"   Category: {article['category']}")
    print(f"   Content length: {len(article['content'])} chars")
It then analyzed these 13 articles using recursive sub-calls and produced an updated list of trends, and this time the task was done correctly. As with so many other examples, it highlights how important it is to ask clear, unambiguous questions and to spell out the criteria and steps we expect the model to follow.
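Printing the result the same way as before (assuming the run succeeded, output2 exposes the trends field defined in the signature) yields the updated list:
print('\n'.join(output2.trends))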
Agentic AI and Multi-Agent Systems: Building autonomous AI agents capable of multi-step reasoning, tool use, planning, and reflection, often with multiple specialized agents collaborating on complex tasks
Code Agents: AI agents that execute tool calls using actual code (Python) instead of JSON-based tool calling, enabling dynamic function creation and achieving higher success rates with fewer steps
Model Context Protocol (MCP): Anthropic's standardization protocol for connecting AI applications to external tools and data sources, reducing integration complexity from M*N to M+N and enabling reusable, framework-agnostic components
Agent Communication Protocol (ACP): Emerging open protocol under the Linux Foundation for standardizing communication between AI agents via RESTful APIs, enabling interoperability across different frameworks
Reflection and Self-Refinement Patterns: LLMs reviewing and improving their own outputs through iterative feedback loops, including self-feedback, verbal reinforcement learning, and tool-interactive critiquing, achieving 10-30% accuracy improvements
Framework Ecosystem Proliferation: Multiple competing frameworks including LangGraph, smolagents, CrewAI, DSPy, and NeMo Agent Toolkit, with growing emphasis on interoperability and declarative configuration approaches
Production-Ready LLM Infrastructure: Moving beyond prototypes to address 'day 2' problems like API exposure, observability, monitoring, evaluation frameworks, and deployment at scale
Parameter-Efficient Fine-Tuning (PEFT) and LoRA: Techniques for customizing LLMs by updating only small subsets of parameters, enabling task-specific optimization while reducing computational costs and enabling on-premises deployment
Advanced Fine-Tuning with Memory Experts: Lamini's Mixture of Memory Experts (MoME) using ~1 million LoRA adapters for near-perfect factual accuracy (95%) with zero loss on specific information
Shift from Prompting to Programming Paradigm: Frameworks like DSPy treating LLM tasks as modular programming rather than manual prompt crafting, with structured signatures and reusable components
LLM Evaluation and Quality Assurance: Comprehensive evaluation frameworks (Evidently, DeepEval, MLFlow, LangSmith) for testing, monitoring, and ensuring reliability in production, especially for regulated industries
RAG (Retrieval-Augmented Generation): Providing relevant context and knowledge bases to enhance LLM precision and enable specialized capabilities in agents
Tool Use and Orchestration: AI systems dynamically selecting and executing tools from multiple sources, with standardized tool definitions and safety controls
YAML-Based Declarative Configuration: Defining LLM workflows, models, and agent behavior through configuration files rather than purely code-based implementations
Interpretable AI and Rule-Based Systems: Using decision trees, extracted rules, and transparent models as alternatives to black-box approaches for compliance, explainability, and rapid deployment in regulated industries
Overall, Recursive Language Models offer a practical and elegant way to push the limits of context length, enabling LLMs to tackle complex, large-scale tasks more effectively. That said, clear instructions and careful guidance remain key to getting the best results.
