Building an AI Agent in Rust: Architecture Design and Core Implementation

1 March 2026
10 min read
Tags:
Rust,
AI Agent,
Architecture Design,
LLM,
Tool Calling

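Have you ever written an Agent in Python? LangChain gets a demo running fast, but once you actually ship it, memory keeps growing, concurrency falls apart, and a single tool timeout stalls the whole chain. Chances are you've run into at least one of these.

I recently migrated a log-analysis Agent from Python to Rust. Not to show off: this Agent has to run 24/7 and process tens of thousands of log entries a day, and the Python version ran out of memory (OOM) every few days. After the migration, memory stays under 50 MB and tool-call latency dropped from 200+ ms to around 30 ms. This article shares the architecture and key implementation details I settled on along the way.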

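What an AI Agent actually needs

Let's narrow the concept first. Setting the fancy definitions aside, an Agent that can run in production comes down to four core modules: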

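Module	Responsibility	Analogy
LLM Client	Calls the model and gets the reasoning result	Brain
Tool Registry	Manages and executes external tools	Hands and feet
Memory	Maintains context and history	Memory
Agent Loop	Drives the "plan → act → observe" cycle	Nervous system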

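On top of that, one thing cuts across all of them: guardrails. An Agent without guardrails is a time bomb. Infinite loops, unauthorized tool calls, runaway token usage: any one of them is enough to ruin your day.

Project structure: draw the boundaries first

What hurts a Rust project most is coupling between modules. My approach is to split directories strictly by responsibility, with each module exposing only a trait or a minimal interface: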

agent-rs/
  src/
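    main.rs              # entry point that wires everything together
    types.rs             # core data structures
    agent/
      mod.rs
      loop.rs            # Agent main loop (loop is a Rust keyword; declare it with #[path = "loop.rs"] mod agent_loop;)
    llm/
      mod.rs             # LlmClient trait
    tools/
      mod.rs             # Tool trait + Registry
      http.rs            # HTTP tool
      fs.rs              # file-reading tool
    memory/
      mod.rs
      short_term.rs      # conversation context
      long_term.rs       # persistent storage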

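The payoff of this structure: switching model vendors touches only llm/, adding a tool touches only tools/, and moving Memory from files to Redis touches only memory/. The Agent Loop itself doesn't change.

Data protocol: define how components talk before writing logic

An Agent's core interaction boils down to two things: the LLM tells you what it wants to do, and a tool tells the LLM what the result was. Pin this protocol down first and the rest of the logic falls into place.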

// src/types.rs
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ChatMessage {
    pub role: String,        // "system" | "user" | "assistant" | "tool"
    pub content: String,
    pub name: Option<String>, // tool name (used when role is "tool")
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ToolSpec {
    pub name: String,
    pub description: String,
    pub input_schema: serde_json::Value, // JSON Schema
}

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum AgentAction {
    ToolCall {
        tool_name: String,
        input: serde_json::Value,
    },
    Final {
        answer: String,
    },
}

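AgentAction uses serde(tag = "type") to make it an internally tagged enum, so the LLM's JSON output deserializes into it directly. For example, {"type":"ToolCall","tool_name":"read_file","input":{"path":"./log.txt"}} parses straight into AgentAction::ToolCall.

Why not use each vendor's native tool-calling protocol? Because every vendor's format is different, and plain JSON output is the lowest common denominator. The same parsing logic works whether you're connected to OpenAI, Anthropic, Tongyi, or Zhipu.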
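In practice, models don't always return bare JSON: some wrap it in a Markdown code fence or add a sentence before or after it, and serde_json::from_str then rejects the whole reply. Below is a minimal std-only sketch of a cleanup step you might run before parsing. extract_json_object is a hypothetical helper, and its first-'{'-to-last-'}' rule is a heuristic that assumes the reply contains a single top-level object.

```rust
/// Hypothetical helper: pull the outermost JSON object out of a raw LLM reply.
/// Heuristic only: it keeps everything from the first '{' to the last '}',
/// which drops Markdown code fences and chatty text around the object.
/// It validates nothing; serde_json::from_str still does that.
pub fn extract_json_object(raw: &str) -> Option<&str> {
    let start = raw.find('{')?;
    let end = raw.rfind('}')?;
    if end < start {
        return None;
    }
    // '{' and '}' are single-byte ASCII, so both indices are char boundaries.
    Some(&raw[start..=end])
}
```

In the loop you would then parse with serde_json::from_str::<AgentAction>(extract_json_object(&raw).unwrap_or(&raw)). It is deliberately crude; if it proves too fragile, a vendor-side JSON output mode, where one exists, is the sturdier fix.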

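Tool layer: turn capabilities into plugins with a trait

The tool layer is the Agent's hands and feet. There's one key design decision here: abstract with a trait instead of hard-coding.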

// src/tools/mod.rs
use async_trait::async_trait;
use serde_json::Value;
use anyhow::Result;

#[async_trait]
pub trait Tool: Send + Sync {
    fn name(&self) -> &str;
    fn spec(&self) -> crate::types::ToolSpec;
    async fn call(&self, input: Value) -> Result<Value>;
}

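Three methods, each with a clear job: name() returns the identifier, spec() returns the JSON Schema description (fed to the LLM so it knows how to call the tool), and call() runs the actual logic. The Send + Sync bounds are there because tools are called across threads inside the tokio async runtime.

Then a Registry manages them in one place: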

pub struct ToolRegistry {
    tools: std::collections::HashMap<String, std::sync::Arc<dyn Tool>>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self { tools: Default::default() }
    }

    pub fn register<T: Tool + 'static>(&mut self, tool: T) {
        self.tools.insert(tool.name().to_string(), std::sync::Arc::new(tool));
    }

    pub fn list_specs(&self) -> Vec<crate::types::ToolSpec> {
        self.tools.values().map(|t| t.spec()).collect()
    }

    pub fn get(&self, name: &str) -> Option<std::sync::Arc<dyn Tool>> {
        self.tools.get(name).cloned()
    }
}

Arc<dyn Tool> lets tool instances be shared across multiple tokio tasks. Adding a new tool means implementing the Tool trait and calling registry.register(MyNewTool); the Agent Loop doesn't change at all.
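To see the plugin pattern in isolation, and to unit-test dispatch without an async runtime, here is a synchronous, std-only stand-in. SyncTool, UpperTool, SyncRegistry and dispatch are hypothetical names, and input/output are plain String rather than serde_json::Value; the shape (a trait object in an Arc, looked up by the name the LLM emitted) mirrors the registry above.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical synchronous stand-in for the async Tool trait.
pub trait SyncTool: Send + Sync {
    fn name(&self) -> &str;
    fn call(&self, input: &str) -> Result<String, String>;
}

/// Hypothetical example tool: upper-cases its input.
pub struct UpperTool;

impl SyncTool for UpperTool {
    fn name(&self) -> &str {
        "upper"
    }
    fn call(&self, input: &str) -> Result<String, String> {
        Ok(input.to_uppercase())
    }
}

#[derive(Default)]
pub struct SyncRegistry {
    tools: HashMap<String, Arc<dyn SyncTool>>,
}

impl SyncRegistry {
    pub fn register<T: SyncTool + 'static>(&mut self, tool: T) {
        self.tools.insert(tool.name().to_string(), Arc::new(tool));
    }

    /// Look a tool up by the name the LLM emitted and run it.
    /// Unknown names become an error value instead of a panic.
    pub fn dispatch(&self, name: &str, input: &str) -> Result<String, String> {
        let tool = self
            .tools
            .get(name)
            .ok_or_else(|| format!("unknown tool: {name}"))?;
        tool.call(input)
    }
}
```

Returning an error for an unknown name, rather than panicking, matters because the name comes straight from model output and can be misspelled or hallucinated.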

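Two example tools

An HTTP GET tool, the most commonly used one; fetching web pages and calling APIs both go through it:

// src/tools/http.rs
use async_trait::async_trait;

pub struct HttpGetTool;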

#[async_trait]
impl crate::tools::Tool for HttpGetTool {
    fn name(&self) -> &str { "http_get" }

    fn spec(&self) -> crate::types::ToolSpec {
        crate::types::ToolSpec {
            name: self.name().into(),
            description: "Send HTTP GET request and return response text".into(),
            input_schema: serde_json::json!({
                "type": "object",
                "properties": {
                    "url": {"type": "string"}
                },
                "required": ["url"]
            }),
        }
    }

    async fn call(&self, input: serde_json::Value) -> anyhow::Result<serde_json::Value> {
        let url = input.get("url")
            .and_then(|v| v.as_str())
            .ok_or_else(|| anyhow::anyhow!("missing url"))?;
        let resp = reqwest::get(url).await?.text().await?;
        Ok(serde_json::json!({ "text": resp }))
    }
}

A file-reading tool, indispensable for scenarios like log analysis and config checks:

// src/tools/fs.rs
use async_trait::async_trait;

pub struct ReadFileTool;

#[async_trait]
impl crate::tools::Tool for ReadFileTool {
    fn name(&self) -> &str { "read_file" }

    fn spec(&self) -> crate::types::ToolSpec {
        crate::types::ToolSpec {
            name: self.name().into(),
            description: "Read a local text file (UTF-8)".into(),
            input_schema: serde_json::json!({
                "type": "object",
                "properties": { "path": {"type": "string"} },
                "required": ["path"]
            }),
        }
    }

    async fn call(&self, input: serde_json::Value) -> anyhow::Result<serde_json::Value> {
        let path = input.get("path")
            .and_then(|v| v.as_str())
            .ok_or_else(|| anyhow::anyhow!("missing path"))?;
        let text = tokio::fs::read_to_string(path).await?;
        Ok(serde_json::json!({ "text": text }))
    }
}

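In production you'll certainly also need tools for database queries, RPC calls, message-queue writes and the like. The pattern is always the same: implement the trait, register it, done.

Memory: a window for the short term, persistence for the long term

An Agent's memory has two layers, each solving a different problem.

Short-term memory: the conversation context window

This simply keeps the most recent N messages in memory and sends them with every LLM call. The key is trimming the window; otherwise the token count blows up: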

// src/memory/short_term.rs
use crate::types::ChatMessage;

pub struct ShortTermMemory {
    pub messages: Vec<ChatMessage>,
    pub max_messages: usize,
}

impl ShortTermMemory {
    pub fn new(max_messages: usize) -> Self {
        Self { messages: vec![], max_messages }
    }

    pub fn push(&mut self, msg: ChatMessage) {
        self.messages.push(msg);
        if self.messages.len() > self.max_messages {
            let overflow = self.messages.len() - self.max_messages;
            self.messages.drain(0..overflow);
        }
    }

    pub fn all(&self) -> &[ChatMessage] {
        &self.messages
    }
}

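What's a good value for max_messages? In my experience, 20-30. Too few and the Agent "forgets"; too many and token costs climb. If your tools return long content (say, a large file was read), compress it into a summary before storing it in memory. One caveat: this simple window always drops the oldest messages first, so after enough steps the original user goal itself falls out of context; if that matters, keep the first message pinned.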
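For the "tool output is too long" case, the simplest guard is a hard cap applied before the result is pushed into short-term memory. A minimal sketch, assuming plain truncation is acceptable: it is not the summarization suggested above, just a cheap bound on token cost. truncate_for_memory and its marker format are hypothetical.

```rust
/// Hypothetical helper: cap a tool result before it goes into short-term memory.
/// Plain truncation, not summarization. It counts and cuts by chars, so
/// multi-byte UTF-8 text is never split in the middle of a character.
pub fn truncate_for_memory(text: &str, max_chars: usize) -> String {
    let total = text.chars().count();
    if total <= max_chars {
        return text.to_string();
    }
    let kept: String = text.chars().take(max_chars).collect();
    // Tell the model that content was cut, and by how much.
    format!("{kept}\n...[truncated {} of {} chars]", total - max_chars, total)
}
```

In the Agent Loop this would wrap the tool output, e.g. content: truncate_for_memory(&output.to_string(), 4_000), while the full output still goes to long-term memory for auditing. The 4_000-character cap is an arbitrary example, not a tuned value.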

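Long-term memory: an append-only event stream

Long-term memory exists for tracing and auditing. The simplest approach is appending to a JSONL file: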

// src/memory/long_term.rs
use anyhow::Result;
use serde_json::Value;
use tokio::io::AsyncWriteExt;

pub struct LongTermMemory {
    path: String,
}

impl LongTermMemory {
    pub fn new(path: impl Into<String>) -> Self {
        Self { path: path.into() }
    }

    pub async fn append_event(&self, event: &Value) -> Result<()> {
        let mut f = tokio::fs::OpenOptions::new()
            .create(true).append(true)
            .open(&self.path).await?;
        f.write_all(event.to_string().as_bytes()).await?;
        f.write_all(b"\n").await?;
        Ok(())
    }
}

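The evolution path from here: JSONL → SQLite (indexed queries) → vector database (semantic retrieval). At the MVP stage, though, JSONL is enough; don't over-engineer.

LLM Client: one trait to isolate all vendor differences

The design philosophy of this layer fits in one sentence: the Agent should not know which model it's talking to.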

// src/llm/mod.rs
use async_trait::async_trait;
use anyhow::Result;
use crate::types::{ChatMessage, ToolSpec};

#[async_trait]
pub trait LlmClient: Send + Sync {
    async fn complete(
        &self,
        system_prompt: &str,
        messages: &[ChatMessage],
        tools: &[ToolSpec],
    ) -> Result<String>;
}

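complete() takes the system prompt, the conversation history, and the tool list, and returns a string (the LLM's raw output). How to build the HTTP request, handle streaming responses, and count tokens is all encapsulated in each vendor's implementation.

To switch models, you only write a new struct that implements LlmClient and change one line of initialization code in main.rs. Not a single line of the Agent Loop changes.

Agent Loop: the heart of the system

Now for the core part. The Agent Loop drives the "plan → act → observe" cycle until the LLM decides the task is done or a safety limit is hit.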

// src/agent/loop.rs
use anyhow::{Result, anyhow};
use crate::types::{AgentAction, ChatMessage};

pub struct AgentLoop<L: crate::llm::LlmClient> {
    pub llm: std::sync::Arc<L>,
    pub tools: crate::tools::ToolRegistry,
    pub short_memory: crate::memory::short_term::ShortTermMemory,
    pub long_memory: crate::memory::long_term::LongTermMemory,
    pub system_prompt: String,
    pub max_steps: usize,
}

impl<L: crate::llm::LlmClient> AgentLoop<L> {
    pub async fn run(&mut self, user_goal: &str) -> Result<String> {
        self.short_memory.push(ChatMessage {
            role: "user".into(),
            content: user_goal.into(),
            name: None,
        });

        for step in 0..self.max_steps {
            // 1. Send the current context and the tool list to the LLM
            let tool_specs = self.tools.list_specs();
            let raw = self.llm
                .complete(&self.system_prompt, self.short_memory.all(), &tool_specs)
                .await?;

            // 2. Parse the LLM's decision
            let action: AgentAction = serde_json::from_str(&raw)
                .map_err(|e| anyhow!("LLM output is not valid JSON: {e}\nraw={raw}"))?;

            match action {
                // 3a. Task complete: return the result
                AgentAction::Final { answer } => {
                    self.long_memory.append_event(&serde_json::json!({
                        "type": "final", "step": step, "answer": &answer
                    })).await?;
                    return Ok(answer);
                }
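                // 3b. A tool call is needed
                AgentAction::ToolCall { tool_name, input } => {
                    // Record the model's own decision first, so the next round
                    // sees which call produced the tool result that follows
                    self.short_memory.push(ChatMessage {
                        role: "assistant".into(),
                        content: raw,
                        name: None,
                    });

                    let tool = self.tools.get(&tool_name)
                        .ok_or_else(|| anyhow!("tool not found: {tool_name}"))?;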

                    let output = tool.call(input).await?;

                    // Write the tool result into short-term memory for the next round of reasoning
                    self.short_memory.push(ChatMessage {
                        role: "tool".into(),
                        name: Some(tool_name.clone()),
                        content: output.to_string(),
                    });

                    // Also append it to long-term memory for later auditing
                    self.long_memory.append_event(&serde_json::json!({
                        "type": "tool_result",
                        "step": step,
                        "tool": tool_name,
                        "output": output
                    })).await?;
                }
            }
        }

        Err(anyhow!("reached max_steps ({}) without the Agent completing the task", self.max_steps))
    }
}

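The loop logic is straightforward: ask the LLM → parse the action → run the tool → feed the result back → ask the LLM again, until the LLM returns Final or the step budget runs out.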
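The control flow can be exercised without a real model or a tokio runtime. Below is a synchronous, std-only sketch with a scripted "LLM" that replays canned actions; Action, ScriptedLlm and run_loop are hypothetical, and actions arrive already parsed instead of as JSON. One deliberate difference from the loop above, a design choice rather than the original behavior: a failing tool call becomes an observation for the next round instead of aborting the run with ?.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for AgentAction, already parsed (no JSON involved).
#[derive(Debug, Clone)]
pub enum Action {
    ToolCall { tool_name: String, input: String },
    Final { answer: String },
}

/// Hypothetical scripted "LLM": replays canned actions in order and records
/// every observation it was shown, which is handy in unit tests.
pub struct ScriptedLlm {
    script: VecDeque<Action>,
    pub seen_observations: Vec<String>,
}

impl ScriptedLlm {
    pub fn new(script: Vec<Action>) -> Self {
        Self { script: script.into(), seen_observations: Vec::new() }
    }

    fn next_action(&mut self, last_observation: Option<&str>) -> Option<Action> {
        if let Some(obs) = last_observation {
            self.seen_observations.push(obs.to_string());
        }
        self.script.pop_front()
    }
}

/// Plan -> act -> observe, bounded by max_steps.
/// A failing tool does not abort the run: its error text becomes the next
/// observation, so the model gets a chance to try something else.
pub fn run_loop<F>(llm: &mut ScriptedLlm, tool: F, max_steps: usize) -> Result<String, String>
where
    F: Fn(&str, &str) -> Result<String, String>,
{
    let mut observation: Option<String> = None;
    for _ in 0..max_steps {
        let action = llm
            .next_action(observation.as_deref())
            .ok_or("script exhausted")?;
        match action {
            Action::Final { answer } => return Ok(answer),
            Action::ToolCall { tool_name, input } => {
                observation = Some(match tool(&tool_name, &input) {
                    Ok(out) => format!("{tool_name} ok: {out}"),
                    Err(e) => format!("{tool_name} error: {e}"),
                });
            }
        }
    }
    Err(format!("reached max_steps {max_steps} without a final answer"))
}
```

A scripted client like this is also a cheap way to test the real AgentLoop: implement LlmClient on a struct that returns pre-written JSON strings, then assert on the final answer and on the step at which max_steps fires.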

How to write the system prompt

The system prompt defines the Agent's behavioral boundaries. Here's the template I use:

const SYSTEM_PROMPT: &str = r#"
You are a helpful AI agent.
You MUST respond in valid JSON matching one of:
1) {"type":"ToolCall","tool_name":"...","input":{...}}
2) {"type":"Final","answer":"..."}

Rules:
- Use tools when you need external data.
- Tool input MUST follow the tool's JSON schema.
- If you have enough information, respond with Final.
- If repeated tool calls yield similar results, summarize and Final.
"#;

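The last rule matters: it's the first line of defense against infinite loops. LLMs sometimes get stuck in a "call tool → unhappy with the result → call the same tool again" cycle, and this rule teaches the model when good enough is good enough.

An engineering checklist before going live

Code that compiles and code that's ready for production are two different things. A few pitfalls I've hit in practice:

Tool calls must have timeouts and retries

Without a timeout on the HTTP tool, one slow endpoint can hang the entire Agent. Note that the reqwest::get shortcut used by HttpGetTool above builds a default client, and reqwest's default client has no timeout. With reqwest, I recommend a global 10-second timeout, with separate settings for critical endpoints. For retries, use exponential backoff; the tokio-retry crate or a few hand-written lines both work.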
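On the timeout side, reqwest::Client::builder().timeout(Duration::from_secs(10)) sets a client-wide timeout, and RequestBuilder::timeout overrides it for a single request. For the retry side, here is a hand-written sketch of exponential backoff. retry_with_backoff is hypothetical, and it is synchronous (std::thread::sleep) so it runs without a runtime; in tokio code the sleep would be tokio::time::sleep(delay).await and op would return a future.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: call `op` until it succeeds or `max_attempts` is reached
/// (at least one attempt always runs). After each failure the delay doubles
/// (base, 2*base, 4*base, ...) and is capped at `max_delay`.
/// `op` receives the 1-based attempt number, which is useful for logging.
pub fn retry_with_backoff<T, E, F>(
    mut op: F,
    max_attempts: u32,
    base: Duration,
    max_delay: Duration,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut delay = base;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
        }
    }
}
```

Only retry idempotent calls such as GET; retrying a write can duplicate its side effect. Adding random jitter to each delay is a common refinement when many agents share one upstream service.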

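Run independent tools concurrently

If the LLM plans several unrelated tool calls at once (say, querying logs and checking config at the same time), you can run them concurrently with tokio::join! or FuturesUnordered. This requires extending the protocol to allow a list of calls, since AgentAction as defined above carries only one. When merging results, keep them in order; otherwise the LLM will mix up which result belongs to which tool. FuturesUnordered yields results in completion order, not submission order, so tag each result with its tool name or index before merging.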
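Order is easy to keep if you collect results by position rather than by completion: in the futures crate, future::join_all returns outputs in the same order as its input futures, while FuturesUnordered yields them as they finish. Below is a std-only sketch of the by-position approach using scoped threads (std::thread::scope, stable since Rust 1.63) in place of tokio tasks; run_parallel and the (tool_name, input) tuple layout are hypothetical.

```rust
use std::thread;

/// Hypothetical: run independent tool calls concurrently and return
/// (tool_name, result) pairs in the same order the LLM listed them,
/// regardless of which call finishes first.
pub fn run_parallel<F>(calls: &[(&str, &str)], tool: F) -> Vec<(String, Result<String, String>)>
where
    F: Fn(&str, &str) -> Result<String, String> + Sync,
{
    thread::scope(|s| {
        // Spawn every call first so they actually overlap...
        let handles: Vec<_> = calls
            .iter()
            .map(|&(name, input)| {
                let tool = &tool;
                s.spawn(move || (name.to_string(), tool(name, input)))
            })
            .collect();
        // ...then join in spawn order, which preserves the original ordering.
        handles
            .into_iter()
            .map(|h| h.join().expect("tool thread panicked"))
            .collect()
    })
}
```

Even if the first call finishes last, its result still comes back first, because the handles are joined in spawn order.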

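Observability isn't optional

tracing + tracing-subscriber is the standard setup. Open a span for every tool call, recording the tool name, latency, payload size, and whether it errored. When something breaks in production, these traces are lifesavers.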
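The fields above are easy to capture around a single call even before tracing is wired in. A std-only sketch: ToolCallRecord and timed_call are hypothetical, and with tracing in place these values would be recorded as fields on a per-call span rather than returned in a struct.

```rust
use std::time::{Duration, Instant};

/// Hypothetical record of one tool call: tool name, latency,
/// payload sizes, and whether the call errored.
#[derive(Debug)]
pub struct ToolCallRecord {
    pub tool: String,
    pub elapsed: Duration,
    pub input_bytes: usize,
    pub output_bytes: usize,
    pub is_error: bool,
}

/// Wrap a tool call, time it, and return both the result and its record.
pub fn timed_call<F>(tool: &str, input: &str, f: F) -> (Result<String, String>, ToolCallRecord)
where
    F: FnOnce(&str) -> Result<String, String>,
{
    let start = Instant::now();
    let result = f(input);
    let record = ToolCallRecord {
        tool: tool.to_string(),
        elapsed: start.elapsed(),
        input_bytes: input.len(),
        // Failed calls report 0 output bytes.
        output_bytes: result.as_ref().map(|s| s.len()).unwrap_or(0),
        is_error: result.is_err(),
    };
    (result, record)
}
```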

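Add guardrails from day one

max_steps is the baseline; 20 steps is enough for most scenarios
Detect repeated calls: if the same tool is called 3 times in a row and the results are more than 90% similar, force a stop
High-risk tools (writing files, sending requests, deleting data) go through an allowlist plus human confirmation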
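The last two guardrails can be sketched under one explicit assumption: the text doesn't say how "similar" is measured, so this uses Jaccard similarity over whitespace-separated tokens. RepeatGuard, jaccard and needs_confirmation are hypothetical names; the window of 3 and the 0.9 threshold come from the rule above.

```rust
use std::collections::HashSet;

/// Jaccard similarity of whitespace-separated token sets, in [0.0, 1.0].
/// An assumption: the article does not specify a similarity metric.
fn jaccard(a: &str, b: &str) -> f64 {
    let sa: HashSet<&str> = a.split_whitespace().collect();
    let sb: HashSet<&str> = b.split_whitespace().collect();
    if sa.is_empty() && sb.is_empty() {
        return 1.0;
    }
    let inter = sa.intersection(&sb).count() as f64;
    let union = sa.union(&sb).count() as f64;
    inter / union
}

/// Hypothetical guard: trips when the same tool was called `window` times in a
/// row and every consecutive pair of results is at least `threshold` similar.
pub struct RepeatGuard {
    window: usize,
    threshold: f64,
    history: Vec<(String, String)>, // (tool_name, output)
}

impl RepeatGuard {
    pub fn new(window: usize, threshold: f64) -> Self {
        Self { window, threshold, history: Vec::new() }
    }

    /// Record a call; returns true if the Agent should be stopped.
    pub fn record(&mut self, tool: &str, output: &str) -> bool {
        self.history.push((tool.to_string(), output.to_string()));
        if self.history.len() < self.window {
            return false;
        }
        let recent = &self.history[self.history.len() - self.window..];
        let same_tool = recent.iter().all(|(t, _)| t == tool);
        let similar = recent
            .windows(2)
            .all(|w| jaccard(&w[0].1, &w[1].1) >= self.threshold);
        same_tool && similar
    }
}

/// Hypothetical allowlist check: high-risk tools need human sign-off.
pub fn needs_confirmation(tool: &str, high_risk: &[&str]) -> bool {
    high_risk.contains(&tool)
}
```

Token-set Jaccard is cheap but blunt: two outputs with the same words in a different order score 1.0. If that is too coarse for your tools, swap in another metric; the guard logic stays the same.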

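Don't stuff memory blindly

Trim the short-term memory window (already implemented above). Periodically compress long-term memory into summaries, retrieve it with RAG when needed, and don't dump the entire history into the prompt.

When to write an Agent in Rust

Not every Agent is worth writing in Rust. If you're just building a demo or a small internal tool, Python + LangChain gets you there faster.

But if your scenario looks like this, Rust is worth considering:

IO-bound + high concurrency: lots of tool calls and network requests, where tokio's async model is much more stable than Python's asyncio
Long-running 24/7: memory doesn't creep upward, so there's no need for periodic restarts to keep it alive
Resource-sensitive: for the same task, the Rust version's CPU and memory usage is typically 1/5 to 1/10 of the Python version's
Infrastructure-level Agents: for Agents maintained and iterated on long-term as a platform capability, Rust's type system and compile-time checks block a lot of runtime bugs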

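I've been running this architecture in production for several months, handling log analysis and anomaly alerting. If you're also considering Rust for your Agent, start from this skeleton and add tools, swap models, and tune the memory strategy to fit your needs. There isn't much code, but the boundaries between layers are clear, so evolving it later won't be too painful.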