[Agent] Study Notes on ACE (Agentic Context Engineering) and Dynamic Cheatsheet

0x00 Overview
0x01 ACE
0x02 Dynamic Cheatsheet
0x03 DC Code Analysis
- 3.1 prompt
- 3.2 Core code
0x04 ACE's Optimizations
- 4.1 DC as the foundation
  - 4.1.1 Where DC applies
  - 4.1.2 Prompts
    - Generator prompt
    - Curator prompt
- 4.2 Key innovations

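0x00 Overview

A few days ago, Stanford's ACE (Agentic Context Engineering) was getting a lot of attention. Reading the paper alone did not give me a deep enough understanding, and the paper did not come with released source code. However, ACE builds on Dynamic Cheatsheet and the two papers share authors, so I read the Dynamic Cheatsheet paper and source code instead. These notes are the result.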

Building on the agentic architecture of Dynamic Cheatsheet [41], ACE incorporates a modular workflow of generation, reflection, and curation, while adding structured, incremental updates guided by a grow-and-refine principle. This design preserves detailed, domain-specific knowledge, prevents context collapse, and yields contexts that remain comprehensive and scalable throughout adaptation.

基于動態備忘錄[41]的代理架構，ACE整合了一個包含生成、反思和策展的模塊化工作流程，并增加了由成長和完善原則指導的結構化、增量更新。這種設計保留了詳細、特定領域的知識，防止了上下文的崩潰，并產生了在整個適應過程中保持全面和可擴展的上下文。

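About Agentic Context Engineering:

Paper title: Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Paper URL: https://www.arxiv.org/abs/2510.04618
Summary: Rather than retraining the model, ACE lets the context evolve on its own. The system repeatedly generates, reflects on, and edits its own prompt until it becomes self-improving.

About Dynamic Cheatsheet:

Paper title: Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
Paper URL: https://www.arxiv.org/abs/2504.07952v1
Summary: The Dynamic Cheatsheet (DC) framework equips a black-box LLM with an evolving external memory, so that it can learn and reuse problem-solving strategies at test time. It offers a new angle on inference-time learning and continual learning research.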

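0x01 ACE

1.1 Background

One way to improve model performance is context adaptation. Instead of changing the model's internal parameters, it improves performance by adding clearer instructions, structured reasoning steps, or domain-specific input formats to the model's input.

Context adaptation plays an important role in many AI systems: system prompts that guide task execution, memory that stores past information and experience, and factual evidence that reduces errors and supplements knowledge. The approach faces two challenges, however:

The first is brevity bias. Some prompt optimizers favor short, generic instructions and in doing so neglect the need to accumulate knowledge. Treating brevity as a virtue can drop the domain expertise, tool-usage guidance, and common failure patterns that matter in practice. Such optimization works on some metrics but often fails to capture the detailed strategies that agents and knowledge-intensive applications need.
The second is context collapse. When an LLM is asked to rewrite the entire prompt, the prompt tends to degrade over time into a shorter, vaguer summary, and performance drops sharply. In tasks such as interactive agents, domain-specific programming, and financial or legal analysis, performance depends on retaining detailed, task-relevant knowledge rather than compressing it away.

In short, context adaptation improves model performance by improving the input, but it has to overcome brevity bias and context collapse so that the model can make full use of detailed knowledge and instructions.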

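1.2 Approach

ACE can be thought of as an assistant that does not merely memorize instructions or compress information. It treats its context as a living operations manual (the prompts call it a playbook) that keeps learning and improving over time.

The idea is that, rather than reducing knowledge to a short summary, the context should hold rich, detailed content and be updated as new information arrives. When the assistant has to solve a problem, it can then pull the most useful parts out of that detailed material.

The work is split across three roles, like a team whose members each own one task:

Generator: produces reasoning trajectories, that is, the steps and path taken to solve a problem.
Reflector: reviews those steps to see what worked and what needs improvement, distilling concrete insights from successes and errors.
Curator: organizes the reflections into new, more useful information and writes it into the manual, that is, integrates the insights into structured context updates.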

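1.3 Workflow

The Generator, Reflector, and Curator together keep improving the manual so that it becomes the best available guide for solving problems. The workflow is:

First, the Generator designs a sequence of solution steps for a new task or query. These trajectories surface both effective approaches and common mistakes and pitfalls.
Next, the Reflector evaluates the trajectories and extracts lessons from them. This step can be repeated several times, refining the lessons on each pass.
Then, the Curator turns the lessons into short update entries, which are added to the manual. Merging them requires no complex logic, only a few simple rules.

Because the updates are separate items, several of them can be merged at the same time, so the manual can absorb knowledge quickly. The same task can also be revisited over multiple passes, like repeated practice, and each pass makes the manual richer and more accurate.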
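The generate, reflect, curate cycle described above can be sketched in a few lines. This is a minimal illustration, not ACE's implementation (no ACE source was available to these notes): `Playbook`, `ace_step`, and the three stub role functions are hypothetical names, and the stubs stand in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Playbook:
    """Evolving context: a list of short, independent entries (hypothetical structure)."""
    entries: List[str] = field(default_factory=list)

    def render(self) -> str:
        return "\n".join(f"- {e}" for e in self.entries) or "(empty)"


def ace_step(task: str,
             playbook: Playbook,
             generate: Callable[[str, str], str],
             reflect: Callable[[str, str], List[str]],
             curate: Callable[[Playbook, List[str]], List[str]]) -> str:
    """Run one generate -> reflect -> curate cycle and extend the playbook in place."""
    trajectory = generate(task, playbook.render())  # Generator: reasoning trajectory
    insights = reflect(task, trajectory)            # Reflector: lessons from the attempt
    new_entries = curate(playbook, insights)        # Curator: compact update entries
    playbook.entries.extend(new_entries)            # merge by appending, no full rewrite
    return trajectory


# Stub roles standing in for LLM calls (assumptions for illustration only).
def fake_generate(task: str, context: str) -> str:
    return f"attempt at {task!r} using context:\n{context}"


def fake_reflect(task: str, trajectory: str) -> List[str]:
    return ["check units before reporting a number"]


def fake_curate(playbook: Playbook, insights: List[str]) -> List[str]:
    # Keep only insights the playbook does not already contain.
    return [i for i in insights if i not in playbook.entries]


playbook = Playbook()
ace_step("convert 3 km to m", playbook, fake_generate, fake_reflect, fake_curate)
ace_step("convert 5 mi to km", playbook, fake_generate, fake_reflect, fake_curate)
print(playbook.render())
```

The second call adds nothing because the stub Curator filters out an insight that is already present, which is the simplest form of the "only add what is missing" rule.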

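Overall, through repeated generation, reflection, and curation, the manual keeps evolving and keeps holding the most current and effective ways of solving problems. This mirrors how people learn: we try something (experiment), think about how it went (reflect), and fold what we learned into what we already know (consolidate). Splitting the work this way keeps the system from becoming confused by doing too much at once and avoids the bottleneck of placing every responsibility on a single model.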

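0x02 Dynamic Cheatsheet

Dynamic Cheatsheet (DC) is a test-time learning method that introduces an adaptive external memory for storing reusable strategies and code snippets. By continually updating this memory with newly encountered inputs and outputs, DC lets the model accumulate knowledge and reuse it across tasks, often with significant gains over static prompting. A key advantage of DC is that it needs no ground-truth labels: the model curates its own memory from its own generations, which makes the method flexible and broadly applicable.

During inference, DC accumulates strategies and lessons from past successes and failures. Natural-language feedback methods of this kind are an important advance: they give LLM systems a flexible, interpretable signal for improvement that does not rely on weight updates.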

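2.1 Overall idea

Dynamic Cheatsheet lets an LLM get better at solving problems as it works through them.

Think of the model as a student who used to start every new problem from scratch, remembering nothing learned before. DC gives it a notebook that can be updated continually and that helps it remember how problems were solved.

The cheatsheet requires no change to the model's own parameters; it is a learning and memory tool added outside the model. When solving a new problem, the model can consult earlier experience, do better, and be less likely to repeat the same mistake.

The method has two main parts: one generates solutions using the current memory, and the other curates the memory so that strategies are stored and updated effectively.

The authors also tried two cheatsheet strategies: one accumulates memory step by step, and the other updates memory through retrieval and synthesis. The experiments reported in the paper show clear performance gains on hard tasks such as math problems, logic puzzles, and knowledge-intensive question answering.

The study also finds that the benefit of memory augmentation depends on factors such as model size, the tools available, and the task type, which suggests directions for future work on learning at inference time.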

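2.2 Four modes

DC has four modes: default, FullHistoryAppending, DynamicCheatsheet_Cumulative, and Dynamic_Retrieval.

2.2.1 default mode (no memory): fixed empty memory and no history arguments

Characteristics: no memory mechanism; every query is handled independently.
Implementation:
- The answer is generated from the current input text and a fixed template only, with no reference to history.
- The cheatsheet is fixed to (empty); no historical knowledge is updated or stored.
- Suited to isolated queries with no context dependency.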

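2.2.2 FullHistoryAppending mode (full history concatenation): loop over and append all past inputs and outputs

Characteristics: all history is concatenated as context, with no filtering.
Implementation:
- A for loop walks original_input_corpus and generator_outputs_so_far and appends every past input/output pair to curated_cheatsheet; there is no similarity computation.
- All previous input-output pairs (previous_input_output_pairs) are attached in full to the context of the current query.
- No structured cheatsheet is used; the mode relies on the raw history.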

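2.2.3 DynamicCheatsheet_Cumulative mode (structured incremental memory): repeated "generate, then update memory" rounds

Characteristics: maintains a structured memory (the cheatsheet) that is updated incrementally and keeps accumulating knowledge.
Implementation:
- A for loop controls the number of rounds. Each round has two steps, "generate an answer" and "update the cheatsheet", and the update depends on cheatsheet_template.
- After each query, the cheatsheet is updated through cheatsheet_template based on the current question and the model's answer (keeping useful information and removing redundancy).
- For a new query, the current cheatsheet is passed in as context so the model can reuse earlier knowledge.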
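How the cheatsheet is carried from one query to the next can be sketched as follows. This is an illustrative outline only: `run_cumulative`, `stub_generate`, and `stub_curate` are hypothetical names, and the stubs replace the two LLM calls (answer generation and cheatsheet update) with trivial string operations.

```python
from typing import Callable, List, Tuple


def run_cumulative(queries: List[str],
                   generate: Callable[[str, str], str],
                   curate: Callable[[str, str, str], str],
                   cheatsheet: str = "(empty)") -> Tuple[List[str], str]:
    """Answer queries in order, rewriting the cheatsheet after each one."""
    answers = []
    for question in queries:
        answer = generate(question, cheatsheet)            # answer with the current memory
        cheatsheet = curate(question, answer, cheatsheet)  # updated memory for the next query
        answers.append(answer)
    return answers, cheatsheet


# Stubs standing in for the generator and curator model calls (assumptions).
def stub_generate(question: str, cheatsheet: str) -> str:
    return f"answer({question})"


def stub_curate(question: str, answer: str, old_cheatsheet: str) -> str:
    note = f"seen: {question}"
    return note if old_cheatsheet == "(empty)" else f"{old_cheatsheet}\n{note}"


answers, final_sheet = run_cumulative(["q1", "q2"], stub_generate, stub_curate)
print(final_sheet)
```

The point of the sketch is the data flow: the cheatsheet returned by one iteration is the only state that reaches the next one.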

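2.2.4 Dynamic_Retrieval mode (similar-history retrieval): cosine similarity plus top-k selection

Characteristics: retrieves history by similarity and selects the relevant memory dynamically.
Implementation:
- Embeddings of past inputs are stored in advance (original_input_embeddings).
- cosine_similarity compares the embedding of the new query with those of past inputs, and np.argsort selects the top_k most relevant past entries.
- Only the retrieved entries are passed to the model, not the full history.
- The Synthesis sub-mode (DynamicCheatsheet_RetrievalSynthesis) adds a step that generates a structured memory from the retrieved results.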
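The retrieval step can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions: `top_k_similar` is a hypothetical helper, the two-dimensional vectors are toy data rather than real embeddings, and cosine similarity is computed by hand instead of through the `cosine_similarity` call used in the DC code.

```python
import numpy as np


def top_k_similar(query_vec: np.ndarray, history_vecs: np.ndarray, k: int = 3):
    """Return (indices, similarities) of the k history rows most similar to the query."""
    if len(history_vecs) == 0:
        return np.array([], dtype=int), np.array([])
    q = query_vec / np.linalg.norm(query_vec)
    h = history_vecs / np.linalg.norm(history_vecs, axis=1, keepdims=True)
    sims = h @ q                          # cosine similarity of each row with the query
    order = np.argsort(sims)[::-1][:k]    # sort descending, keep the first k indices
    return order, sims[order]


# Toy "embeddings" of three past inputs and one new query (made-up numbers).
history = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.1])

indices, similarities = top_k_similar(query, history, k=2)
print(indices.tolist())
```

With these toy vectors the first past input points almost in the same direction as the query and the third is the next closest, so the two retrieved indices are 0 and 2, in that order.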

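2.2.5 Summary

The table below summarizes the four modes.

Mode	Memory mechanism	Suited to	Main advantage
`default`	None	Isolated queries	Lightweight, no dependence on history
`FullHistoryAppending`	Full history concatenation	Short, strongly context-dependent interactions	Keeps the complete context
`DynamicCheatsheet_Cumulative`	Structured incremental memory	Consecutive problem solving (e.g. math reasoning)	Knowledge accumulation and reuse, redundancy removal
`Dynamic_Retrieval`	Similarity-based retrieval of related history	Large-scale or diverse queries	Precise selection of useful information, efficient

These modes show a progression from no memory, to selective memory, to structured memory, matching the different ways a scenario may need to use its history.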

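2.3 Mode comparison

This section compares the four modes (default, FullHistoryAppending, DynamicCheatsheet_Cumulative, Dynamic_Retrieval) in more detail. They differ mainly in how they handle history and in their memory mechanism.

2.3.1 FullHistoryAppending vs. Dynamic Cheatsheet

default offers little to compare, so we go straight to FullHistoryAppending versus DC, taking DynamicCheatsheet_Cumulative as the example.

How history is handled

FullHistoryAppending:
- Appends every previous input and output to the cheatsheet.
- Presents past cases in a fixed format.
- Never modifies or updates the history.
DynamicCheatsheet_Cumulative:
- Accumulates and updates the cheatsheet in every round.
- Can optionally add answers from earlier rounds to the cheatsheet.
- Updates the cheatsheet each round through a dedicated cheatsheet template.

Iteration

FullHistoryAppending:
- Runs a single generation round; multi-round iteration is not supported.
- Uses the existing history directly as context.
DynamicCheatsheet_Cumulative:
- Supports multiple rounds.
- Updates the cheatsheet each round and uses the updated content in the next round.

Cheatsheet update

FullHistoryAppending:
- The cheatsheet is static: history is simply concatenated.
- Its content is never revised or optimized.
DynamicCheatsheet_Cumulative:
- The cheatsheet is dynamic: a dedicated template extracts and updates its content.
- Each round works with a new cheatsheet.

Use cases

FullHistoryAppending:
- "Looking back at history": scenarios that need the complete record but no dynamic adjustment of the cheatsheet.
- Scenarios where context is tightly coupled but the history is small (otherwise the long context hurts efficiency).
DynamicCheatsheet_Cumulative:
- "Learning and improving": scenarios where the cheatsheet is refined step by step and experience accumulates (for example consecutive math reasoning or problem solving), with memory growing along the query sequence.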

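2.3.2 Cumulative vs. retrieval

Next, the difference between the two dynamic modes inside Dynamic Cheatsheet.

How history is selected

DynamicCheatsheet_Cumulative:
- Uses the accumulated cheatsheet, optionally together with the answers from earlier rounds in order.
- Performs no similarity search.
Dynamic_Retrieval:
- Uses embeddings and cosine similarity to retrieve the past input-output pairs most similar to the current input.
- Selects only the top-k most relevant past cases.
- Selects by semantic similarity rather than by time order.

How the context is built

DynamicCheatsheet_Cumulative:
- Can add the answers from all earlier rounds to the cheatsheet.
- The cheatsheet grows as rounds accumulate.
Dynamic_Retrieval:
- The cheatsheet contains only the few past cases most similar to the current question.
- A note is added explaining how the past cases should be used.

Compression and filtering

DynamicCheatsheet_Cumulative: compresses mainly through iterative refinement and template-based extraction.
- Filtering:
  - A parameter (add_previous_answers_to_cheatsheet) controls whether earlier answers are added to the cheatsheet.
  - A template extracts the key information from the model output into a new cheatsheet, which in practice drops a large amount of redundant or irrelevant content.
- Iterative refinement:
  - Each round produces a more refined cheatsheet from the previous round's result.
Dynamic_Retrieval: selects the most relevant part of a large history through similarity search.
- Retrieval:
  - Only the K most relevant past cases are kept; unrelated history is dropped.
- Semantic selection:
  - Selection is by semantic similarity, not simply by time order.
  - Only the context most relevant to the current question is retained.

Computational cost

DynamicCheatsheet_Cumulative:
- No embedding or similarity computation, but every round adds an LLM call to update the cheatsheet, and the cheatsheet can grow large over time.
Dynamic_Retrieval:
- Requires computing embeddings and similarities, so the retrieval step costs more, but less history (top-k only) is passed to the model.

Use cases

DynamicCheatsheet_Cumulative:
- Little history, most of it useful.
- Scenarios where a solution is learned and refined step by step.
Dynamic_Retrieval:
- Many past cases, of which only some relate to the current question.
- Large volumes of historical data.
- Large-scale or diverse queries, where unrelated information should not interfere.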

0x03 DC Code Analysis

3.1 prompt

Excerpt from prompts\curator_prompt_for_dc_retrieval_synthesis.txt:

# CHEATSHEET CURATOR

## Purpose and Goals
You are responsible for maintaining, refining, and optimizing the Dynamic Cheatsheet, which serves as a compact yet evolving repository of problem-solving strategies, reusable code snippets, and meta-reasoning techniques. Your goal is to enhance the model’s long-term performance by continuously updating the cheatsheet with high-value insights while filtering out redundant or trivial information.

- The cheatsheet should include quick, accurate, reliable, and practical solutions to a range of technical and creative challenges. 
- After seeing each input, you should improve the content of the cheatsheet, synthesizing lessons, insights, tricks, and errors learned from past problems and adapting to new challenges.

---

### Core Responsibilities

Selective Knowledge Retention:
- Preserve only high-value strategies, code blocks, insights, and reusable patterns that significantly contribute to problem-solving.
- Discard redundant, trivial, or highly problem-specific details that do not generalize well.
- Ensure that previously effective solutions remain accessible while incorporating new, superior methods.

Continuous Refinement & Optimization:
- Improve existing strategies by incorporating more efficient, elegant, or generalizable techniques.
- Remove duplicate entries or rephrase unclear explanations for better readability.
- Introduce new meta-strategies based on recent problem-solving experiences.

Structure & Organization:
- Maintain a well-organized cheatsheet with clearly defined sections:
  - Reusable Code Snippets and Solution Strategies
  - General Problem-Solving Heuristics
  - Optimization Techniques & Edge Cases
  - Specialized Knowledge & Theorems
- Use tagging (e.g., Q14, Q22) to reference previous problems that contributed to a given strategy.

---

## Principles and Best Practices

For every new problem encountered:
1. Evaluate the Solution’s Effectiveness  
   - Was the applied strategy optimal?
   - Could the solution be improved, generalized, or made more efficient?
   - Does the cheatsheet already contain a similar strategy, or should a new one be added?

2. Curate & Document the Most Valuable Insights
   - Extract key algorithms, heuristics, and reusable code snippets that would help solve similar problems in the future.
   - Identify patterns, edge cases, and problem-specific insights worth retaining.
   - If a better approach than a previously recorded one is found, replace the old version.

3. Maintain Concise, Actionable Entries
   - Keep explanations clear, actionable, concise, and to the point.
   - Include only the most effective and widely applicable methods.
   - Seek to extract useful and general solution strategies and/or Python code snippets.

4. Implement a Usage Counter
   - Each entry must include a usage count: Increase the count every time a strategy is successfully used in problem-solving.
   - Use the count to prioritize frequently used solutions over rarely applied ones.
---

3.2 Core code

An annotated walkthrough of DC's core code follows:

def advanced_generate(self,
    approach_name: str,
    input_txt: str,
    cheatsheet: str = None,
    generator_template: str = None,
    cheatsheet_template: str = None,
    temperature: float = 0.0,
    max_tokens: int = 2048,
    max_num_rounds: int = 1,
    allow_code_execution: bool = True,
    code_execution_flag: str = "EXECUTE CODE!",
    add_previous_answers_to_cheatsheet: bool = True,
    original_input_corpus: List[str] = None,
    original_input_embeddings: np.ndarray = None,
    generator_outputs_so_far: List[str] = None,
    retrieve_top_k: int = 3,
) -> Tuple[str, str, str, str]:
    """
    Core generation method. It supports the history-handling modes default, FullHistoryAppending,
    DynamicCheatsheet_Cumulative, and Dynamic_Retrieval (plus DynamicCheatsheet_RetrievalSynthesis);
    each mode builds the context (cheatsheet) differently.

    Arguments:
        approach_name : str : name of the mode (decides how the context is built)
        input_txt : str : the current question text
        cheatsheet : str : memory carried in the context (its meaning depends on the mode)
        generator_template : str : generator template with [[QUESTION]] / [[CHEATSHEET]] placeholders
        cheatsheet_template : str : memory-update template (used by the DynamicCheatsheet modes to produce a new cheatsheet)
        temperature : float : sampling temperature (higher values give more random output)
        max_tokens : int : maximum number of generated tokens
        max_num_rounds : int : maximum number of rounds (only DynamicCheatsheet_Cumulative iterates)
        allow_code_execution : bool : whether the model may trigger code execution
        code_execution_flag : str : marker in the model output that triggers code execution
        add_previous_answers_to_cheatsheet : bool : whether to append earlier-round answers to the cheatsheet (multi-round only)
        original_input_corpus : List[str] : past input texts (used for retrieval)
        original_input_embeddings : np.ndarray : embeddings of the inputs; the last row is the current input
        generator_outputs_so_far : List[str] : past generator outputs, aligned with original_input_corpus
        retrieve_top_k : int : number of most similar past entries to retrieve

    Returns:
        dict : every branch returns a dictionary (not the 4-tuple in the annotation) with keys such as
        "input_txt", "steps", "final_answer", "final_output", and "final_cheatsheet"

    Raises:
        ValueError : when an argument required by the mode is missing, or the mode name is unsupported
    """

    # ---------------------------- 1. default模式：無記憶模式 ------------------------------
    # 核心邏輯：不依賴任何歷史信息，僅用當前輸入生成回答，cheatsheet固定為"(empty)"
    if approach_name == "default":
        # 替換生成器模板的占位符：問題用當前輸入，記憶用空值（無歷史信息）
        generator_prompt = generator_template.replace("[[QUESTION]]", input_txt).replace("[[CHEATSHEET]]", "(empty)")
        # 構造模型輸入的對話歷史（僅包含當前prompt，無歷史上下文）
        generator_history = [
            {"role": "user", "content": generator_prompt},
        ]
        # 調用基礎生成方法，生成模型輸出（控制溫度、最大長度、是否允許代碼執行）
        generator_output = self.generate(
            history=generator_history,
            temperature=temperature,
            max_tokens=max_tokens,
            allow_code_execution=allow_code_execution,
            code_execution_flag=code_execution_flag,
        )

        # 從模型原始輸出中提取關鍵回答（過濾冗余內容，保留核心結果）
        generator_answer = extract_answer(
            generator_output,
        )

        # 返回結果字典：包含輸入文本、單輪步驟日志、最終回答（無記憶故cheatsheet為None）
        return {
            "input_txt": input_txt,
            "steps": [
                {
                    "round": 0,  # 僅1輪生成
                    "generator_prompt": generator_prompt,  # 傳入模型的完整prompt
                    "generator_output": generator_output,  # 模型原始輸出
                    "generator_answer": generator_answer,  # 提取后的核心回答
                    "current_cheatsheet": None,  # 當前記憶（無）
                    "new_cheatsheet": None,       # 新生成記憶（無）
                }
            ],
            "previous_answers": None,  # 無歷史回答
            "final_answer": generator_answer,  # 最終返回的核心回答
            "final_output": generator_output,  # 最終返回的原始輸出
            "final_cheatsheet": None,  # 最終記憶（無）
            "generator_output": generator_output,
        }
    
    # ------------- 2. DynamicCheatsheet_Cumulative模式：結構化增量記憶模式 -----------------
    # 核心邏輯：維護一個可迭代更新的結構化cheatsheet，多輪生成中持續積累歷史知識
    elif approach_name == "DynamicCheatsheet_Cumulative":
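        # Validate required arguments: this mode needs an initial cheatsheet and a cheatsheet-update template
        if cheatsheet is None:
            raise ValueError("Cheatsheet must be provided for DynamicCheatsheet_Cumulative.")
        if cheatsheet_template is None:
            raise ValueError("Cheatsheet template must be provided for DynamicCheatsheet_Cumulative.")
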
        # 初始化步驟日志（記錄每輪生成細節）和歷史回答列表（用于多輪間傳遞信息）
        steps = []
        previous_answers = []

        # 初始化生成結果（多輪迭代后存儲最終輸出）
        generator_output = ''

        # 多輪迭代生成（至少1輪，最多max_num_rounds輪）
        for round in range(max(1, max_num_rounds)):
            ## 步驟1：基于當前記憶（cheatsheet）生成本輪回答
            # 基礎記憶為初始cheatsheet（后續輪次會更新）
            generator_cheatsheet_content = cheatsheet

            # 若為多輪（非第1輪）且允許加入歷史回答，則將過往回答拼接到當前記憶中
            if round > 0 and add_previous_answers_to_cheatsheet:
                # 格式化歷史回答（標注輪次，用分號分隔）
                previous_answers_txt = f"PREVIOUS ANSWERS:\n{'; '.join(previous_answers)}"
                # 更新記憶內容：原始cheatsheet + 歷史回答
                generator_cheatsheet_content = f"{generator_cheatsheet_content}\n\n{previous_answers_txt}"

            # 替換生成器模板：問題用當前輸入，記憶用更新后的cheatsheet
            generator_prompt = generator_template.replace("[[QUESTION]]", input_txt).replace("[[CHEATSHEET]]", generator_cheatsheet_content)
            # 記錄本輪生成前的當前記憶（用于后續日志和對比）
            current_cheatsheet = cheatsheet

            # 構造模型輸入的對話歷史（僅包含當前prompt）
            generator_history = [{"role": "user", "content": generator_prompt}]
            # 調用基礎生成方法，生成本輪模型輸出
            generator_output = self.generate(
                history=generator_history,
                temperature=temperature,
                max_tokens=max_tokens,
                allow_code_execution=allow_code_execution,
                code_execution_flag=code_execution_flag,
            )
            # 提取本輪核心回答（過濾冗余內容）
            generator_answer = extract_answer(generator_output)

            ## 步驟2：基于本輪回答更新結構化記憶（cheatsheet）
            # 替換記憶更新模板：問題、本輪回答、上一輪記憶分別填入對應占位符
            cheatsheet_prompt = cheatsheet_template.replace("[[QUESTION]]", input_txt).replace("[[MODEL_ANSWER]]", generator_output).replace("[[PREVIOUS_CHEATSHEET]]", current_cheatsheet)

            # 構造記憶更新的對話歷史（僅包含cheatsheet生成prompt）
            cheatsheet_history = [{"role": "user", "content": cheatsheet_prompt}]
            # 生成新的cheatsheet（禁用代碼執行，最大長度為生成器的2倍，確保記憶完整性）
            cheatsheet_output = self.generate(
                history=cheatsheet_history,
                temperature=temperature,
                max_tokens=2*max_tokens,
                allow_code_execution=False,
            )

            # 從記憶生成輸出中提取新cheatsheet：若無有效內容則保留舊記憶（避免記憶丟失）
            new_cheatsheet = extract_cheatsheet(response=cheatsheet_output, old_cheatsheet=current_cheatsheet)
            # 更新cheatsheet為新記憶（供下一輪迭代使用）
            cheatsheet = new_cheatsheet

            # 將本輪回答加入歷史回答列表（供后續輪次使用，若開啟add_previous_answers_to_cheatsheet）
            previous_answers.append(f"Round {round+1}: {generator_answer}")
        
            # 記錄本輪步驟日志（包含prompt、輸出、記憶變化等細節）
            steps.append({
                "round": round,
                "generator_prompt": generator_prompt,
                "generator_output": generator_output,
                "generator_answer": generator_answer,
                "current_cheatsheet": current_cheatsheet,  # 本輪生成前的記憶
                "new_cheatsheet": new_cheatsheet,         # 本輪生成后的新記憶
            })

        # 返回結果字典：包含多輪步驟日志、歷史回答、最終記憶和最終回答
        return {
            "input_txt": input_txt,
            "steps": steps,
            "previous_answers": previous_answers,
            "final_answer": generator_answer,
            "final_cheatsheet": new_cheatsheet,  # 多輪迭代后的最終記憶
            "final_output": generator_output,
        }
    
    # ------------------- 3. FullHistoryAppending模式：完整歷史拼接模式 ---------------------
    # 核心邏輯：無篩選拼接所有歷史輸入輸出對作為記憶，不做結構化處理
    elif approach_name == "FullHistoryAppending":
        # 獲取歷史生成結果的數量（決定拼接的歷史條數）
        length_of_history = len(generator_outputs_so_far)
        # 若有歷史記錄，則拼接所有歷史輸入輸出對作為記憶
        if length_of_history > 0:
            # 歷史輸入集合（與歷史輸出一一對應）
            top_k_original_inputs = original_input_corpus[:length_of_history]
            # 歷史輸出集合
            top_k_original_outputs = generator_outputs_so_far

            # 格式化拼接歷史：用markdown標題分隔，標注輸入序號和對應的模型回答
            curated_cheatsheet = "### PREVIOUS SOLUTIONS (START)\n\n"
            for i, (previous_input_txt, previous_output_txt) in enumerate(zip(original_input_corpus, generator_outputs_so_far)):
                curated_cheatsheet += f"#### Previous Input #{i+1}:\n\n{previous_input_txt}\n\n#### Model Solution to Previous Input #{i+1}:\n\n{previous_output_txt}\n---\n---\n\n"
            curated_cheatsheet += "#### PREVIOUS SOLUTIONS (END)"
        # 若無歷史記錄，則記憶固定為"(empty)"
        else:
            top_k_original_inputs = []
            top_k_original_outputs = []
            curated_cheatsheet = "(empty)"
        
        # 替換生成器模板：問題用當前輸入，記憶用拼接好的完整歷史
        generator_prompt = generator_template.replace("[[QUESTION]]", input_txt).replace("[[CHEATSHEET]]", curated_cheatsheet)
        # 構造模型輸入的對話歷史
        generator_history = [{"role": "user", "content": generator_prompt}]
        # 調用基礎生成方法，生成模型輸出
        generator_output = self.generate(
                history=generator_history,
                temperature=temperature,
                max_tokens=max_tokens,
                allow_code_execution=allow_code_execution,
                code_execution_flag=code_execution_flag,
            )
        # 提取核心回答
        generator_answer = extract_answer(generator_output)

        # 返回結果字典：包含拼接的歷史記憶、歷史輸入輸出集合、最終回答
        return {
            "input_txt": input_txt,
            "steps": [
                {
                    "round": 0,  # 僅1輪生成
                    "generator_prompt": generator_prompt,
                    "generator_output": generator_output,
                    "generator_answer": generator_answer,
                    "current_cheatsheet": curated_cheatsheet,  # 拼接后的完整歷史記憶
                    "new_cheatsheet": None,  # 此模式不更新記憶，僅使用歷史
                }
            ],
            "top_k_original_inputs": top_k_original_inputs,  # 歷史輸入集合
            "top_k_original_outputs": top_k_original_outputs,  # 歷史輸出集合
            "final_answer": generator_answer,
            "final_output": generator_output,
            "final_cheatsheet": curated_cheatsheet,  # 最終使用的歷史記憶
        }
    
    # --------- 4. Dynamic_Retrieval/DynamicCheatsheet_RetrievalSynthesis模式：相似歷史檢索模式 ------------------------------
    # 核心邏輯：基于向量相似度檢索最相關的top_k條歷史，用檢索結果作為記憶（后者額外支持記憶結構化）
    elif approach_name in ["Dynamic_Retrieval", "DynamicCheatsheet_RetrievalSynthesis"]:
        # 提取當前輸入的向量表示（original_input_embeddings最后一個元素為當前輸入的embedding）
        current_original_input_embedding = original_input_embeddings[-1]
        # 提取歷史輸入的向量表示（排除當前輸入，僅保留過往記錄）
        prev_original_input_embeddings = original_input_embeddings[:-1]  # 可能為空（無歷史）
        
        # 若有歷史輸入，則計算相似度并檢索top_k條最相關歷史
        if len(prev_original_input_embeddings) > 0:
            # 計算當前輸入與所有歷史輸入的余弦相似度（衡量文本相關性）
            similarities = cosine_similarity([current_original_input_embedding], prev_original_input_embeddings)
            # 對相似度排序（降序），取前retrieve_top_k個索引（最相關的歷史序號）
            top_k_indices = np.argsort(similarities[0])[::-1][:retrieve_top_k]
            # 根據索引獲取最相關的歷史輸入、歷史輸出、對應的相似度值
            top_k_original_inputs = [original_input_corpus[i] for i in top_k_indices]
            top_k_original_outputs = [generator_outputs_so_far[i] for i in top_k_indices]
            top_k_similar_values = similarities[0][top_k_indices]
            # 格式化檢索到的歷史：添加提示文本（提醒模型批判性使用歷史，避免盲目復制）
            curated_cheatsheet = "### PREVIOUS SOLUTIONS (START)\n\nNote: The input-output pairs listed below are taken from previous test cases and are meant to assist you in understanding potential solution strategies or tool usages. While they can offer insight and inspiration, they should not be blindly copied, as they may contain errors or may not fit your specific use case. Approach them with a critical mindset—analyze their logic, verify their correctness, and adapt them as needed. Your goal should be to develop a well-reasoned solution that best addresses the problem at hand.\n\n"
        # 若無歷史輸入，則記憶固定為"(empty)"
        else:
            top_k_original_inputs = []
            top_k_original_outputs = []
            top_k_similar_values = []
            curated_cheatsheet = '(empty)'
        
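        # Concatenate the retrieved history in ascending similarity (the [::-1] reversal puts the most similar example last), labeling each entry with its index and similarity value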
        for i, (previous_input_txt, previous_output_txt, similarity) in enumerate(zip(top_k_original_inputs[::-1], top_k_original_outputs[::-1], top_k_similar_values[::-1])):
            curated_cheatsheet += f"#### Previous Input #{i+1} (Similarity: {similarity:.2f}):\n\n{previous_input_txt}\n\n#### Model Solution to Previous Input  #{i+1}:\n\n{previous_output_txt}\n---\n---\n\n"
        # 去除拼接后的多余空白字符（保證格式整潔）
        curated_cheatsheet = curated_cheatsheet.strip()
        
        # 若記憶非空，則添加結束標記（保證格式完整性）
        if curated_cheatsheet != '(empty)':
            curated_cheatsheet += "\n\n#### PREVIOUS SOLUTIONS (END)"

        # 若為DynamicCheatsheet_RetrievalSynthesis模式：額外將檢索歷史結構化（生成新cheatsheet）
        previous_cheatsheet = cheatsheet
        if approach_name == "DynamicCheatsheet_RetrievalSynthesis":
            # 替換記憶更新模板：歷史輸入輸出對、當前問題、上一輪記憶分別填入對應占位符
            cheatsheet_prompt = cheatsheet_template.replace("[[PREVIOUS_INPUT_OUTPUT_PAIRS]]", curated_cheatsheet)
            cheatsheet_prompt = cheatsheet_prompt.replace("[[NEXT_INPUT]]", input_txt)
            cheatsheet_prompt = cheatsheet_prompt.replace("[[PREVIOUS_CHEATSHEET]]", previous_cheatsheet)
            # 構造記憶更新的對話歷史
            cheatsheet_history = [{"role": "user", "content": cheatsheet_prompt}]
            # 生成結構化的新cheatsheet（禁用代碼執行）
            cheatsheet_output = self.generate(
                history=cheatsheet_history,
                temperature=temperature,
                max_tokens=2*max_tokens,
                allow_code_execution=False,
            )
            # 提取新cheatsheet：若無有效內容則保留檢索到的歷史（避免記憶丟失）
            new_cheatsheet = extract_cheatsheet(response=cheatsheet_output, old_cheatsheet=curated_cheatsheet)
            # 更新記憶為結構化后的新cheatsheet
            curated_cheatsheet = new_cheatsheet

        # 替換生成器模板：問題用當前輸入，記憶用檢索（或結構化后）的歷史
        generator_prompt = generator_template.replace("[[QUESTION]]", input_txt).replace("[[CHEATSHEET]]", curated_cheatsheet)
        # 構造模型輸入的對話歷史
        generator_history = [{"role": "user", "content": generator_prompt}]
        # 調用基礎生成方法，生成模型輸出
        generator_output = self.generate(
                history=generator_history,
                temperature=temperature,
                max_tokens=max_tokens,
                allow_code_execution=allow_code_execution,
                code_execution_flag=code_execution_flag,
            )
        # 提取核心回答
        generator_answer = extract_answer(generator_output)

        # 返回結果字典：包含檢索到的歷史、步驟日志、最終記憶和最終回答
        return {
            "input_txt": input_txt,
            "steps": [
                {
                    "round": 0,  # 僅1輪生成
                    "generator_prompt": generator_prompt,
                    "generator_output": generator_output,
                    "generator_answer": generator_answer,
                    "current_cheatsheet": curated_cheatsheet,  # 檢索（或結構化后）的記憶
                    "new_cheatsheet": None,  # 僅Synthesis子模式更新記憶，此處統一為None
                }
            ],
            "top_k_original_inputs": top_k_original_inputs,  # 檢索到的歷史輸入
            "top_k_original_outputs": top_k_original_outputs,  # 檢索到的歷史輸出
            "final_answer": generator_answer,
            "final_output": generator_output,
            "final_cheatsheet": curated_cheatsheet,  # 最終使用的記憶
        }
    
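    # Raise an error if the approach name is not one of the supported modes
    else:
        raise ValueError(f"Approach '{approach_name}' not found.")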
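
The listing above calls extract_cheatsheet(response=..., old_cheatsheet=...) without showing its body. The sketch below shows one plausible shape for such a helper, under the assumption that the curator model wraps the new cheatsheet in `<cheatsheet>` tags; the tag name and the function name `extract_cheatsheet_sketch` are illustrative guesses, not taken from the DC source.

```python
import re


def extract_cheatsheet_sketch(response: str, old_cheatsheet: str) -> str:
    """Return the text of the last <cheatsheet>...</cheatsheet> block, or the old memory.

    Falling back to the previous cheatsheet means a malformed curator reply
    cannot wipe out the accumulated memory.
    """
    matches = re.findall(r"<cheatsheet>(.*?)</cheatsheet>", response, flags=re.DOTALL)
    if matches and matches[-1].strip():
        return matches[-1].strip()
    return old_cheatsheet


reply = "Some reasoning...\n<cheatsheet>\n- Use the quadratic formula (Q3)\n</cheatsheet>"
print(extract_cheatsheet_sketch(reply, old_cheatsheet="(empty)"))
```

The fallback branch corresponds to the behavior noted in the comments above: if no valid content is found, the previous memory is kept.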

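0x04 ACE's Optimizations

4.1 DC as the foundation

In its experiments, the ACE paper runs DC using the official implementation released by the DC authors, configured in cumulative mode (DC-CU). So let us look more closely at the scenarios each DC mode fits.

4.1.1 Where DC applies

Each mode was designed for a different need. The choice comes down to three questions: does the current task need history, how much history does it need, and does that history need to be structured?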

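Mode	Typical examples	Core need
default (no memory)	1. One-off factual query: "What is the circumference of the Earth's equator?" 2. Simple tool call: "Translate 'Hello' into French" 3. Standalone calculation: "What is 30% of 15?"	The task does not depend on any previous interaction; every query is isolated. The core need is a fast response, with no history to interfere and no redundant computation.
FullHistoryAppending (full history concatenation)	1. Short multi-turn dialogue: "Write a poem about spring → change 'peach blossom' in the poem to 'cherry blossom' → add a stanza about the breeze" 2. Step-by-step work on one topic: "Create a new Excel sheet → enter 'Name' in cell A1 → enter 'Age' in cell B1"	A sequence of operations on a single topic, where the history (earlier turns or steps) must be kept in full and is small (usually no more than 5 turns). The core need is coherent context; no filtering is needed because all earlier content is strongly related to the current task.
DynamicCheatsheet_Cumulative (structured incremental memory)	1. Multi-step math reasoning: "Compute f(x) = 2x + 3 at x = 5 → find the derivative of f(x) → evaluate the derivative at x = 2" 2. Decomposing a complex problem: "Analyze why a product's sales fell → step 1, changes in market demand → step 2, competitor actions → step 3, own supply-chain issues" 3. Long-term learning support: "Record the user's daily English mistakes → generate new exercises from them for the next day's review"	A long process that accumulates knowledge step by step; each step's result is saved as an intermediate conclusion for later steps to reuse. The core need is structured consolidation of knowledge, which keeps history from becoming disorganized and supports multi-round refinement (for example, correcting the current logic based on earlier conclusions).
Dynamic_Retrieval / Synthesis (similar-history retrieval)	1. Large question bank: "A student asks 'How do I solve the quadratic equation x^2 - 5x + 6 = 0?' → the system retrieves similar past 'solving quadratic equations' cases as reference" 2. Multi-domain customer support: "A user asks 'My phone screen flickers while charging, what should I do?' → the system retrieves similar past cases of 'Android phone screen problems while charging' to help produce a solution" 3. Recommendation support: "A user wants 'a camera suitable for vlogging' → the system retrieves past camera recommendations for similar users (e.g. novice vloggers) to improve the result"	The history is large (hundreds or thousands of entries) and the queries are diverse, so most of the history is unrelated to the current task and must be filtered precisely. The core need is accurate matching of history, which avoids the overlong inputs and slow responses caused by passing everything, while similarity ranking keeps the references useful. (The Synthesis sub-mode additionally fits cases where similar history should be summarized into a structure, such as retrieving 10 camera recommendations and then producing a "camera buying guide for novice vloggers".)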

4.1.2 Prompts

Since ACE builds on DC, let us look at the prompts for the two stages shown here, the Generator and the Curator.

Generator prompt

ACE Generator prompt on FINER

You are an analysis expert tasked with answering questions using your knowledge, a curated playbook of strategies and insights and a reflection that goes over the diagnosis of all previous mistakes made while answering the question.

Instructions:
- Read the playbook carefully and apply relevant strategies, formulas, and insights
- Pay attention to common mistakes listed in the playbook and avoid them
- Show your reasoning step-by-step
- Be concise but thorough in your analysis
- If the playbook contains relevant code snippets or formulas, use them appropriately
- Double-check your calculations and logic before providing the final answer

Your output should be a json object, which contains the following fields:
- reasoning: your chain of thought / reasoning / thinking process, detailed analysis and calculations
- bullet_ids: each line in the playbook has a bullet_id. all bulletpoints in the playbook that’s relevant, helpful for you to answer this question, you should include their bullet_id in this list
- final_answer: your concise final answer

Playbook:
{}
Reflection:
{}
Question:
{}
Context:
{}
Answer in this exact JSON format:
{
"reasoning": "[Your chain of thought / reasoning / thinking process, detailed analysis and calculations]",
"bullet_ids": ["calc-00001", "fin-00002"],
"final_answer": "[Your concise final answer here]"
}
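
A caller has to parse and check the JSON object that this prompt asks for. The following is a minimal sketch of that step; `GeneratorOutput` and `parse_generator_output` are hypothetical names, and only the three field names (reasoning, bullet_ids, final_answer) come from the prompt itself.

```python
import json
from typing import List, TypedDict


class GeneratorOutput(TypedDict):
    reasoning: str
    bullet_ids: List[str]
    final_answer: str


def parse_generator_output(raw: str) -> GeneratorOutput:
    """Parse the Generator's JSON reply and verify the three required fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"reasoning", "bullet_ids", "final_answer"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["bullet_ids"], list):
        raise ValueError("bullet_ids must be a list")
    return GeneratorOutput(
        reasoning=str(data["reasoning"]),
        bullet_ids=[str(b) for b in data["bullet_ids"]],
        final_answer=str(data["final_answer"]),
    )


raw_reply = '{"reasoning": "step by step", "bullet_ids": ["calc-00001"], "final_answer": "42"}'
parsed = parse_generator_output(raw_reply)
print(parsed["bullet_ids"])
```

The bullet_ids list is what allows later stages to credit or blame individual playbook entries for an answer.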

翻譯成中文如下。

您是一位分析專家，負責利用您的知識、精選的策略和見解手冊以及對之前回答問題時所犯錯誤的反思來回答問題。

指示：
- 仔細閱讀手冊，并應用相關的策略、公式和見解
- 注意手冊中列出的常見錯誤并避免它們
- 逐步展示您的推理過程
- 在分析中既要簡潔又要全面
- 如果手冊包含相關的代碼片段或公式，請適當使用它們
- 在提供最終答案之前，再次檢查您的計算和邏輯

您的輸出應該是一個 JSON 對象，其中包含以下字段：

- reasoning：您的思維鏈/推理/思考過程，詳細分析和計算
- bullet_ids：手冊中的每一行都有一個 bullet_id。手冊中所有與回答這個問題相關且有幫助的要點，您都應該在列表中包含它們的 bullet_id
- final_answer：您簡潔的最終答案

劇本： {} 反思： {} 問題： {} 上下文： {}

以這種確切的 JSON 格式回答： { "reasoning": "[您的思維鏈/推理/思考過程，詳細分析和計算]", "bullet_ids": ["calc-00001", "fin-00002"], "final_answer": "[您簡潔的最終答案]" }

Curator prompt

ACE Curator prompt on FINER

You are a master curator of knowledge. Your job is to identify what new insights should be added to an existing playbook based on a reflection from a previous attempt.

Context:
- The playbook you created will be used to help answering similar questions.
- The reflection is generated using ground truth answers that will NOT be available when the playbook is being used. So you need to come up with content that can aid the playbook user to create predictions that likely align with ground truth.

CRITICAL: You MUST respond with valid JSON only. Do not use markdown formatting or code blocks.

Instructions:
- Review the existing playbook and the reflection from the previous attempt
- Identify ONLY the NEW insights, strategies, or mistakes that are MISSING from the current playbook
- Avoid redundancy - if similar advice already exists, only add new content that is a perfect complement to the existing playbook
- Do NOT regenerate the entire playbook - only provide the additions needed
- Focus on quality over quantity - a focused, well-organized playbook is better than an exhaustive one
- Format your response as a PURE JSON object with specific sections
- For any operation if no new content to add, return an empty list for the operations field
- Be concise and specific - each addition should be actionable

Training Context:
Total token budget: {token_budget} tokens
Training progress: Sample {current_step} out of {total_samples}
Current Playbook Stats:
{playbook_stats}
Recent Reflection:
{recent_reflection}
Current Playbook:
{current_playbook}
Question Context:
{question_context}

Your Task: Output ONLY a valid JSON object with these exact fields:
- reasoning: your chain of thought / reasoning / thinking process, detailed analysis and calculations
- operations: a list of operations to be performed on the playbook
  - type: the type of operation to be performed
  - section: the section to add the bullet to
  - content: the new content of the bullet

Available Operations:
1. ADD: Create new bullet points with fresh IDs
   - section: the section to add the new bullet to
   - content: the new content of the bullet. Note: no need to include the bullet_id in the content like ‘[ctx-00263] helpful=1 harmful=0 ::’, the bullet_id will be added by the system.

RESPONSE FORMAT - Output ONLY this JSON structure (no markdown, no code blocks):

{
"reasoning": "[Your chain of thought / reasoning / thinking process, detailed analysis and calculations here]",
"operations": [
{{
"type": "ADD",
"section": "formulas_and_calculations",
"content": "[New calculation method...]"
}}
]
}
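
Applying the Curator's reply is then a simple merge, since the only operation the prompt describes is ADD. The sketch below is an assumption-laden illustration: `apply_operations` is a hypothetical helper, the playbook is modeled as a dict from section name to a list of bullets, and the ID scheme (first three letters of the section plus a five-digit counter) is invented here to resemble IDs such as ctx-00263, not taken from ACE.

```python
import json
from typing import Dict, List


def apply_operations(playbook: Dict[str, List[dict]], curator_json: str, next_id: int) -> int:
    """Append every ADD operation to its section; return the next unused numeric id."""
    operations = json.loads(curator_json).get("operations", [])
    for op in operations:
        if op.get("type") != "ADD":
            continue  # only ADD is described in the prompt; ignore anything else
        section = op["section"]
        bullet = {
            "id": f"{section[:3]}-{next_id:05d}",  # hypothetical ID scheme
            "helpful": 0,
            "harmful": 0,
            "content": op["content"],
        }
        playbook.setdefault(section, []).append(bullet)
        next_id += 1
    return next_id


curator_reply = json.dumps({
    "reasoning": "The playbook lacks a rule for percentage change.",
    "operations": [
        {"type": "ADD", "section": "formulas_and_calculations",
         "content": "Percentage change = (new - old) / old * 100"}
    ],
})

playbook: Dict[str, List[dict]] = {}
next_free_id = apply_operations(playbook, curator_reply, next_id=1)
print(playbook["formulas_and_calculations"][0]["id"])
```

Because the system assigns the ID, the model never has to reproduce existing entries, which is what keeps the update a delta rather than a rewrite.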

In short, the ACE Curator prompt on FINER does the following:

- Role: the model acts as a master curator of knowledge who, based on the reflection from a previous attempt, decides which new insights should be added to the existing playbook.
- Context: the playbook will be used to help answer similar questions. The reflection is generated with access to the ground-truth answers, which are not available when the playbook is used, so the additions must help a playbook user produce predictions that are likely to match the ground truth.
- Output discipline: the response must be valid JSON only, with no markdown formatting and no code blocks.
- Scope of edits: add only what the current playbook lacks, avoid redundancy, never regenerate the whole playbook, prefer quality over quantity, keep every addition concise and actionable, and return an empty operations list when there is nothing new to add.
- Inputs: the total token budget, the training progress (sample {current_step} out of {total_samples}), the current playbook statistics, the most recent reflection, the current playbook, and the question context.
- Output fields: reasoning (the chain of thought, with detailed analysis and calculations) and operations (a list of entries, each with type, section, and content). The only available operation is ADD, which creates a new bullet with a fresh ID. A bullet ID prefix such as ‘[ctx-00263] helpful=1 harmful=0 ::’ is added by the system and must not appear in the content.
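To make the response format concrete, here is a minimal sketch of how a caller might parse and validate the Curator's reply. The function name `parse_curator_response`, the `ALLOWED_TYPES` set, and the sample reply are assumptions made for illustration; the ACE source code has not been released, so this is not the authors' implementation.

```python
import json

# Assumption: the prompt above defines ADD as the only available operation.
ALLOWED_TYPES = {"ADD"}


def parse_curator_response(raw: str) -> list[dict]:
    """Parse the Curator's JSON reply and keep only well-formed ADD operations."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the reply is not pure JSON
    if not isinstance(data, dict):
        raise ValueError("the reply must be a JSON object")
    operations = data.get("operations", [])
    if not isinstance(operations, list):
        raise ValueError("'operations' must be a list")

    valid = []
    for op in operations:
        if not isinstance(op, dict) or op.get("type") not in ALLOWED_TYPES:
            continue  # skip unknown operation types instead of failing
        section, content = op.get("section"), op.get("content")
        if isinstance(section, str) and isinstance(content, str) and content.strip():
            valid.append({"type": "ADD", "section": section, "content": content.strip()})
    return valid


# Hypothetical reply, shaped like the RESPONSE FORMAT in the prompt.
example_reply = """
{
  "reasoning": "The playbook lacks a rule for percentage changes.",
  "operations": [
    {"type": "ADD", "section": "formulas_and_calculations",
     "content": "Percentage change = (new - old) / old * 100"},
    {"type": "REWRITE", "section": "misc", "content": "not a supported operation"}
  ]
}
"""
ops = parse_curator_response(example_reply)
print(ops)
```

Skipping unsupported operations rather than raising is a design choice of this sketch: it keeps one malformed entry from discarding the rest of the delta.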

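4.2 Key innovations

To address the brevity bias and context collapse problems described earlier, ACE introduces three key innovations:

- Incremental delta updates: localized edits replace monolithic rewrites, which significantly reduces latency and compute cost.
- Grow-and-refine mechanism: the context keeps expanding while redundancy is suppressed, so it evolves in a steady state.
- Dedicated Reflector module: evaluation and insight extraction are decoupled from curation, which improves context quality and downstream performance.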

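4.2.1 Incremental delta updates

A core design idea of ACE is to represent the context as a structured collection of entries (bullets) rather than as a single monolithic prompt. Each bullet has two parts:

- Metadata: a unique identifier, plus "helpful" / "harmful" counters.
- Content: for example a reusable strategy, a domain concept, or a common failure mode.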
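As a rough illustration of this bullet structure, the sketch below models one entry as a small dataclass. The class name `Bullet`, its field names, and the sample content are assumptions; only the rendered prefix format follows the example ‘[ctx-00263] helpful=1 harmful=0 ::’ quoted in the Curator prompt.

```python
from dataclasses import dataclass


@dataclass
class Bullet:
    """One playbook entry: metadata (ID and counters) plus content."""

    bullet_id: str    # unique identifier, e.g. "ctx-00263"
    content: str      # reusable strategy, domain concept, or failure mode
    helpful: int = 0  # how often the bullet was tagged as helpful
    harmful: int = 0  # how often the bullet was tagged as harmful

    def render(self) -> str:
        """Serialize the bullet the way it could appear inside the playbook text."""
        return f"[{self.bullet_id}] helpful={self.helpful} harmful={self.harmful} :: {self.content}"


bullet = Bullet("ctx-00263", "Check units before comparing two reported values.", helpful=1)
print(bullet.render())
# [ctx-00263] helpful=1 harmful=0 :: Check units before comparing two reported values.
```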

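Here is my reading of this design.

Picture an assistant (ACE) that does not rely on one simple prompt but works from a series of small notes, much like a memo pad or a checklist. Every item on the checklist carries two kinds of information:

- Basic information: a unique name, plus a record of whether the item helped solve problems or caused trouble.
- Actual content: a useful strategy, a piece of domain knowledge, or a common failure mode.

When the assistant works on a new problem, it marks the checklist items it used as helpful or misleading. That feedback tells the assistant which items need improvement.

This checklist-style design has three main advantages:

- Targeted updates: only the items that actually need improvement are changed, not the whole checklist.
- Precise lookup: the assistant can find the most relevant items quickly instead of searching blindly through a large body of text.
- Step-by-step improvement: while problems are being solved, new items can be added, outdated items removed, and duplicates avoided efficiently.

The assistant does not rewrite the whole checklist. It adds a small number of new, condensed items that it has distilled from its own analysis and that help solve later problems better.

This avoids the heavy compute and latency of a full rewrite while preserving old knowledge and steadily absorbing new insights. Over time, the mechanism lets the assistant adapt to complex tasks that run over long horizons or need a large amount of knowledge.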
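The contrast between a delta update and a full rewrite can be sketched in a few lines. Everything here is hypothetical: the `Playbook` class, the `apply_delta` method, and the `ctx-NNNNN` numbering scheme are assumptions chosen to match the ADD operation in the Curator prompt, not code from the ACE authors.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Bullet:
    bullet_id: str
    content: str
    helpful: int = 0
    harmful: int = 0


@dataclass
class Playbook:
    """Sections of bullets; updates append entries instead of rewriting text."""

    sections: dict[str, list[Bullet]] = field(default_factory=dict)
    _next_id: count = field(default_factory=lambda: count(1), repr=False)

    def apply_delta(self, operations: list[dict]) -> list[str]:
        """Merge ADD operations with plain code (no LLM call); return the new IDs."""
        new_ids = []
        for op in operations:
            if op.get("type") != "ADD":
                continue  # this sketch only supports the ADD operation
            bullet_id = f"ctx-{next(self._next_id):05d}"  # assumed ID scheme
            self.sections.setdefault(op["section"], []).append(Bullet(bullet_id, op["content"]))
            new_ids.append(bullet_id)
        return new_ids


playbook = Playbook()
playbook.apply_delta([
    {"type": "ADD", "section": "strategies", "content": "Restate the question before answering."},
])
first = playbook.sections["strategies"][0]

# A second delta touches nothing that already exists; it only appends.
new_ids = playbook.apply_delta([
    {"type": "ADD", "section": "strategies", "content": "Verify totals against line items."},
])
print(new_ids, len(playbook.sections["strategies"]))
```

Because the merge is ordinary code, existing bullets cannot be silently shortened or dropped, which is the failure a full LLM rewrite risks.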

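4.2.2 Grow-and-Refine

On top of continuous growth, ACE keeps the context compact and relevant through periodic or lazy refinement. In Grow-and-Refine, new bullets are appended to the context, while existing bullets are revised in place through metadata updates (for example, by incrementing counters). A de-duplication step then removes redundancy by comparing bullets through their semantic embeddings. Refinement can run proactively after every delta update, or lazily only when the context window is exceeded, depending on the latency and accuracy requirements.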

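Here is my reading of this mechanism.

Picture a smart notebook (ACE) that keeps updating itself over time so that its contents stay fresh and useful. It works like this:

- Keep adding new knowledge: whenever something new is learned, it goes into the notebook as a new page.
- Update old knowledge: existing pages are revised in light of new experience, for example by adding a note or correcting a detail.
- Remove duplicates: the notebook compares new and old pages and merges the ones that say similar things.

This clean-up can run right after every addition, or it can wait until the notebook is nearly full. The choice depends on how fast updates need to be and how accurate the information has to stay.

In this way the notebook stays rich and relevant without becoming bloated, so the needed information can be found quickly whenever the notebook is opened.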
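The de-duplication step and the two trigger modes can be sketched as follows. This is an illustration under loud assumptions: the `embed` function below is a toy bag-of-words vector standing in for a real semantic embedding model, and the 0.8 threshold, the `max_bullets` budget, and all function names are hypothetical. The sketch also drops a near-duplicate rather than merging it.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def deduplicate(bullets: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a bullet only if it is not too similar to one already kept."""
    kept, kept_vectors = [], []
    for text in bullets:
        vector = embed(text)
        if any(cosine(vector, other) >= threshold for other in kept_vectors):
            continue  # near-duplicate of an earlier bullet
        kept.append(text)
        kept_vectors.append(vector)
    return kept


def refine_lazily(bullets: list[str], max_bullets: int) -> list[str]:
    """Lazy mode: refine only once the playbook exceeds its size budget."""
    return bullets if len(bullets) <= max_bullets else deduplicate(bullets)


bullets = [
    "Always convert percentages to decimals before multiplying.",
    "Always convert percentages to decimals before multiplying values.",
    "Check the fiscal year of every figure you cite.",
]
proactive = deduplicate(bullets)                # run after every delta update
lazy = refine_lazily(bullets, max_bullets=5)    # budget not exceeded, so nothing changes
print(len(proactive), len(lazy))
```

Proactive refinement pays the comparison cost on every update but keeps the context tight. Lazy refinement is cheaper per step and tolerates some redundancy until the budget is hit.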

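4.2.3 The Reflector stage

ACE adds a Reflector stage on top of DC mainly to close the learning loop (with an emphasis on learning from failures), to strengthen diagnosis, to improve the quality of what is learned and enable preventive learning, and to use the feedback provided by the execution environment rather than relying only on whether the final answer is correct.

Reflector prompt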

ACE Reflector prompt on FINER

You are an expert analyst and educator. Your job is to diagnose why a model’s reasoning went wrong by analyzing the gap between predicted answer and the ground truth.

Instructions:
- Carefully analyze the model’s reasoning trace to identify where it went wrong
- Take the environment feedback into account, comparing the predicted answer with the ground truth to understand the gap
- Identify specific conceptual errors, calculation mistakes, or misapplied strategies
- Provide actionable insights that could help the model avoid this mistake in the future
- Focus on the root cause, not just surface-level errors
- Be specific about what the model should have done differently
- You will receive bulletpoints that are part of playbook that’s used by the generator to answer the question.
- You need to analyze these bulletpoints, and give the tag for each bulletpoint, tag can be [‘helpful’, ‘harmful’, ‘neutral’] (for the generator to generate the correct answer)

Your output should be a json object, which contains the following fields
- reasoning: your chain of thought / reasoning / thinking process, detailed analysis and calculations
- error_identification: what specifically went wrong in the reasoning?
- root_cause_analysis: why did this error occur? What concept was misunderstood?
- correct_approach: what should the model have done instead?
- key_insight: what strategy, formula, or principle should be remembered to avoid this error?
- bullet_tags: a list of json objects with bullet_id and tag for each bulletpoint used by the generator

Question:
{}
Model’s Reasoning Trace:
{}
Model’s Predicted Answer:
{}
Ground Truth Answer:
{}
Environment Feedback:
{}
Part of Playbook that’s used by the generator to answer the question:
{}
Answer in this exact JSON format:
{
"reasoning": "[Your chain of thought / reasoning / thinking process, detailed analysis and calculations]",
"error_identification": "[What specifically went wrong in the reasoning?]",
"root_cause_analysis": "[Why did this error occur? What concept was misunderstood?]",
"correct_approach": "[What should the model have done instead?]",
"key_insight": "[What strategy, formula, or principle should be remembered to avoid this error?]",
"bullet_tags": [
{{"id": "calc-00001", "tag": "helpful"}},
{{"id": "fin-00002", "tag": "harmful"}}
]
}

In short, the Reflector prompt does the following:

- Role: the model acts as an expert analyst and educator that diagnoses why a model's reasoning went wrong by analyzing the gap between the predicted answer and the ground truth.
- Analysis: it examines the reasoning trace to locate the error, takes the environment feedback into account, identifies concrete conceptual errors, calculation mistakes, or misapplied strategies, focuses on the root cause rather than surface symptoms, and states specifically what the model should have done differently.
- Bullet tagging: it receives the playbook bullets that the generator used for this question and tags each one as 'helpful', 'harmful', or 'neutral' with respect to producing the correct answer.
- Inputs: the question, the model's reasoning trace, the model's predicted answer, the ground-truth answer, the environment feedback, and the part of the playbook that the generator used.
- Output: a JSON object with the fields reasoning, error_identification, root_cause_analysis, correct_approach, key_insight, and bullet_tags, where bullet_tags is a list of objects that pair a bullet ID with its tag.
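One plausible way to consume the Reflector's output is to fold its bullet_tags into the helpful and harmful counters that each bullet carries. The sketch below assumes that mapping; the function name `apply_bullet_tags`, the dictionary layout of the counters, and the sample reply are hypothetical and are not taken from the ACE implementation.

```python
import json


def apply_bullet_tags(counters: dict[str, dict[str, int]], reflector_reply: str) -> None:
    """Update per-bullet counters in place from the Reflector's bullet_tags."""
    reply = json.loads(reflector_reply)
    for item in reply.get("bullet_tags", []):
        entry = counters.get(item.get("id"))
        if entry is None:
            continue  # the Reflector referenced an ID the playbook does not hold
        tag = item.get("tag")
        if tag in ("helpful", "harmful"):
            entry[tag] += 1
        # a "neutral" tag leaves both counters unchanged


# Hypothetical playbook state and Reflector reply, shaped like the format above.
counters = {
    "calc-00001": {"helpful": 0, "harmful": 0},
    "fin-00002": {"helpful": 2, "harmful": 0},
}
reply = json.dumps({
    "reasoning": "...",
    "error_identification": "...",
    "root_cause_analysis": "...",
    "correct_approach": "...",
    "key_insight": "...",
    "bullet_tags": [
        {"id": "calc-00001", "tag": "helpful"},
        {"id": "fin-00002", "tag": "harmful"},
        {"id": "fin-00099", "tag": "neutral"},
    ],
})
apply_bullet_tags(counters, reply)
print(counters)
```

Counters updated this way give later refinement steps a simple signal about which bullets to keep and which to revisit.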

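A closer look at the reasons

The need for error diagnosis and root-cause analysis

Limitations of DC
- Focuses mainly on extracting and accumulating knowledge from successful solutions.
- Emphasizes retaining "correct, useful, and illustrative solutions and strategies".
- Lacks a systematic mechanism for analyzing failed cases.
Role of the ACE Reflector
- Designed specifically to diagnose problems in execution trajectories.
- Required to "identify what went wrong" and to "identify the root cause".
- Understands the gap by comparing the predicted answer with the ground truth.
- Explicitly required to identify "conceptual errors, calculation mistakes, or misapplied strategies".
- Delivers actionable insights through systematic analysis.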

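Making full use of environment feedback

Limitations of DC
- Mainly iterates on the basis of the model output and the cheatsheet content.
- Makes no direct use of feedback from the execution environment.
Role of the ACE Reflector
- Directly uses execution feedback, API usage, unit test reports, and the ground truth.
- Can extract lessons from actual execution results rather than from answers that are only correct in theory.
- Explicitly required to analyze "wrong information sources, bad filters, formatting issues, or missing authentication".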

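More accurate knowledge extraction

Limitations of DC
- Extracts knowledge from the model's answers through a cheatsheet template, which may miss subtle but important error patterns.
Role of the ACE Reflector
- Provides deeper insight through expert-level analysis.
- Can identify subtle but important problems.
- Provides concrete, step-by-step corrective measures.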

posted @ 2025-10-19 20:31 羅西的思考
