folded-memory-implementation

为长周期 Agent 构建分层记忆的开发者指南：不是用单一摘要替换历史，而是维护三个不同细节程度的记忆层。旧内容被更激进地压缩——而不是丢弃。每层独立存储，按需召回。

前置条件：先阅读 compact-memory-implementation。折叠记忆在其 fork-agent 和触发概念基础上扩展。

The core idea

L1 Working memory    [ turn 38..50 ] — raw turns, full detail, short window
L2 Episodic memory   [ turn 10..37 ] — compressed episodes, medium detail
L3 Semantic memory   [ turn 1..9  ] — abstract facts and decisions, sparse

When L1 fills up → fork an episode compactor → move oldest L1 turns into L2. When L2 fills up → fork a semantic extractor → distill L2 into L3.

At each agent turn, inject the right combination of layers into the system prompt.

Step 1 — Understand the setup

Same questions as compact-memory-implementation, plus:

How long do sessions run? If sessions are short (<50 turns), flat compact is enough.
What kind of information ages badly? Decisions and patterns age well (good for L3). Exact tool outputs age badly (keep only in L1 or summarize into L2).
Does the agent need to cite past reasoning? If yes, L2/L3 must preserve decision rationale, not just conclusions.

Step 2 — Three-layer architecture

Layer 1 — Working memory

Content: raw conversation turns, full fidelity
Window: last N turns (e.g., 20 turns or ~30k tokens)
Trigger to flush: when L1 exceeds its window, oldest turns move to L2
Injected as: full message history in messages[]

Layer 2 — Episodic memory

Content: compressed episode summaries — what happened, what was decided, what was tried
Window: up to M episodes (e.g., 10 episodes, each covering ~20 turns)
Trigger to flush: when episode count exceeds M, oldest episodes distill into L3
Injected as: structured block in system prompt

Layer 3 — Semantic memory

Content: abstract facts, stable decisions, eliminated approaches, domain knowledge learned
Window: unbounded, but aggressively filtered — only durable knowledge
Trigger to flush: never purged, but updated/merged when contradicted
Injected as: compact block in system prompt, always present

Step 3 — Data structures

from dataclasses import dataclass, field
from typing import Any

@dataclass
class Episode:
    episode_id: int
    turn_range: tuple[int, int]       # (start_turn, end_turn)
    summary: str
    decisions: list[dict]             # [{"decision": ..., "reason": ...}]
    eliminated: list[dict]            # [{"approach": ..., "why": ...}]
    open_questions: list[str]
    tool_results: dict[str, Any]      # summarized results worth keeping

@dataclass
class SemanticMemory:
    facts: list[str]                  # stable domain facts
    decisions: list[dict]             # durable decisions (never re-litigate)
    eliminated_approaches: list[dict] # things proven not to work
    patterns: list[str]               # recurring patterns observed

@dataclass
class FoldedMemory:
    semantic: SemanticMemory
    episodes: list[Episode]
    current_episode_id: int = 0
    total_turns_seen: int = 0

Step 4 — Fork agent: L1 → L2 (episode compactor)

Triggered when working memory (L1) exceeds its window. Takes the oldest N turns and compresses them into one Episode.

EPISODE_COMPACTOR_PROMPT = """
You are an episode compactor. Read the provided conversation turns and produce a structured Episode summary.

An Episode captures:
- What was attempted and what happened (not the full dialogue — the outcome)
- Decisions made and WHY (the reasoning behind them, not just the choice)
- Approaches tried and ruled out, with reasons (prevents re-exploration)
- Open questions that were not resolved
- Tool results that future turns will need (summarize, don't dump raw output)

Do NOT include:
- Intermediate back-and-forth that led to a conclusion (keep the conclusion, drop the path)
- Tool outputs that have already been acted on and have no future relevance
- Anything a fresh agent could derive by reading the code or running a command

Output valid JSON:
{
  "summary": "2-3 sentence narrative of what happened in this episode",
  "decisions": [{"decision": "...", "reason": "...", "constraint": "..."}],
  "eliminated": [{"approach": "...", "why": "..."}],
  "open_questions": ["..."],
  "tool_results": {"key": "summarized result"}
}
"""

def compact_l1_to_episode(turns: list[dict], episode_id: int, turn_range: tuple) -> Episode:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        system=EPISODE_COMPACTOR_PROMPT,
        messages=[{"role": "user", "content": format_turns(turns)}],
    )
    data = json.loads(response.content[0].text)
    return Episode(
        episode_id=episode_id,
        turn_range=turn_range,
        **data,
    )

Step 5 — Fork agent: L2 → L3 (semantic extractor)

Triggered when the episode count exceeds the L2 window. Takes the oldest episodes and distills durable knowledge into SemanticMemory.

SEMANTIC_EXTRACTOR_PROMPT = """
You are a semantic memory extractor. Read the provided episode summaries and extract only
knowledge that is durable — facts, decisions, and patterns that will still matter many
sessions from now.

Extract:
- Stable domain facts discovered ("the API always returns 200 even on failure — check data.success")
- Decisions that should never be re-litigated ("chose optimistic locking because DB doesn't support SELECT FOR UPDATE")
- Approaches definitively ruled out ("tried polling every 5s — causes rate limiting, don't retry")
- Recurring patterns that should inform future behavior

Do NOT extract:
- Task-specific state that will be resolved (current blockers, in-progress work)
- Results tied to a specific turn or tool call
- Anything that changes frequently

Merge with the existing semantic memory provided — update facts that were contradicted,
remove decisions that are now resolved, add new ones.

Output valid JSON:
{
  "facts": ["..."],
  "decisions": [{"decision": "...", "reason": "...", "constraint": "..."}],
  "eliminated_approaches": [{"approach": "...", "why": "..."}],
  "patterns": ["..."]
}
"""

def distill_episodes_to_semantic(
    episodes: list[Episode],
    existing_semantic: SemanticMemory,
) -> SemanticMemory:
    payload = {
        "existing_semantic": asdict(existing_semantic),
        "episodes_to_distill": [asdict(e) for e in episodes],
    }
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=2048,
        system=SEMANTIC_EXTRACTOR_PROMPT,
        messages=[{"role": "user", "content": json.dumps(payload, indent=2)}],
    )
    data = json.loads(response.content[0].text)
    return SemanticMemory(**data)

Step 6 — When to trigger each layer

L1_MAX_TURNS = 20        # working memory window
L2_MAX_EPISODES = 8      # episodic memory window

def maybe_fold(memory: FoldedMemory, l1_turns: list[dict]) -> tuple[FoldedMemory, list[dict]]:
    """Check thresholds and fold layers if needed. Returns updated memory and remaining L1."""

    # L1 → L2: flush oldest turns when L1 is full
    if len(l1_turns) >= L1_MAX_TURNS:
        flush_count = L1_MAX_TURNS // 2             # flush half, keep recent half
        turns_to_flush = l1_turns[:flush_count]
        turn_range = (
            memory.total_turns_seen - len(l1_turns),
            memory.total_turns_seen - len(l1_turns) + flush_count - 1,
        )
        episode = compact_l1_to_episode(
            turns_to_flush,
            episode_id=memory.current_episode_id,
            turn_range=turn_range,
        )
        memory.episodes.append(episode)
        memory.current_episode_id += 1
        l1_turns = l1_turns[flush_count:]           # keep the recent half

    # L2 → L3: distill oldest episodes when L2 is full
    if len(memory.episodes) >= L2_MAX_EPISODES:
        flush_count = L2_MAX_EPISODES // 2
        episodes_to_distill = memory.episodes[:flush_count]
        memory.semantic = distill_episodes_to_semantic(episodes_to_distill, memory.semantic)
        memory.episodes = memory.episodes[flush_count:]

    return memory, l1_turns

Step 7 — Recall: build system prompt from layers

Inject L3 always. Inject L2 episodes as a digest. L1 goes into the messages[] array directly.

def build_system_with_folded_memory(base_system: str, memory: FoldedMemory) -> str:
    blocks = [base_system]

    # L3 — always present
    if memory.semantic.facts or memory.semantic.decisions:
        blocks.append(_format_semantic(memory.semantic))

    # L2 — episode digest (most recent episodes first)
    if memory.episodes:
        blocks.append(_format_episodes(memory.episodes))

    return "\n\n".join(blocks)

def _format_semantic(s: SemanticMemory) -> str:
    lines = ["## Semantic memory (durable knowledge)"]
    if s.facts:
        lines += ["**Facts**:"] + [f"- {f}" for f in s.facts]
    if s.decisions:
        lines += ["**Decisions**:"] + [f"- {d['decision']} (because {d['reason']})" for d in s.decisions]
    if s.eliminated_approaches:
        lines += ["**Ruled out**:"] + [f"- {e['approach']}: {e['why']}" for e in s.eliminated_approaches]
    if s.patterns:
        lines += ["**Patterns**:"] + [f"- {p}" for p in s.patterns]
    return "\n".join(lines)

def _format_episodes(episodes: list[Episode]) -> str:
    lines = ["## Episode memory (recent history, oldest → newest)"]
    for ep in episodes:
        lines.append(f"\n### Episode {ep.episode_id} (turns {ep.turn_range[0]}–{ep.turn_range[1]})")
        lines.append(ep.summary)
        if ep.decisions:
            lines += ["Decisions:"] + [f"- {d['decision']}" for d in ep.decisions]
        if ep.open_questions:
            lines += ["Open:"] + [f"- {q}" for q in ep.open_questions]
    return "\n".join(lines)

Step 8 — Full agent loop

def run_agent(session_id: str, user_input: str) -> str:
    memory = load_folded_memory(session_id)   # returns empty FoldedMemory if new session
    l1_turns = []

    while True:
        system = build_system_with_folded_memory(BASE_SYSTEM, memory)

        response = client.messages.create(
            model="claude-opus-4-7",
            system=system,
            messages=l1_turns + [{"role": "user", "content": user_input}],
            max_tokens=8192,
        )
        memory.total_turns_seen += 1

        # Fold if needed
        l1_turns.append({"role": "user", "content": user_input})
        l1_turns.append({"role": "assistant", "content": response.content[0].text})
        memory, l1_turns = maybe_fold(memory, l1_turns)

        save_folded_memory(session_id, memory, l1_turns)

        if response.stop_reason == "end_turn":
            return response.content[0].text

        user_input = handle_tool_calls(response)

Step 9 — Persistence

import json, pathlib
from dataclasses import asdict

MEMORY_DIR = pathlib.Path("memory")

def save_folded_memory(session_id: str, memory: FoldedMemory, l1_turns: list[dict]) -> None:
    MEMORY_DIR.mkdir(exist_ok=True)
    (MEMORY_DIR / f"{session_id}_folded.json").write_text(
        json.dumps({"memory": asdict(memory), "l1_turns": l1_turns}, indent=2)
    )

def load_folded_memory(session_id: str) -> tuple[FoldedMemory, list[dict]]:
    path = MEMORY_DIR / f"{session_id}_folded.json"
    if not path.exists():
        return FoldedMemory(semantic=SemanticMemory([], [], [], [])), []
    data = json.loads(path.read_text())
    memory = FoldedMemory(
        semantic=SemanticMemory(**data["memory"]["semantic"]),
        episodes=[Episode(**e) for e in data["memory"]["episodes"]],
        current_episode_id=data["memory"]["current_episode_id"],
        total_turns_seen=data["memory"]["total_turns_seen"],
    )
    return memory, data["l1_turns"]

When to use folded memory vs. flat compact

Situation	Use
Sessions under ~50 turns	`compact-memory-implementation`
Context overflows but information loss is acceptable	`compact-memory-implementation`
Agent runs for hundreds of turns across many sessions	folded memory
Decisions from 10 sessions ago must still be retrievable	folded memory
Domain knowledge accumulates and shouldn't be re-learned	folded memory
Team is just getting started with agent memory	`compact-memory-implementation` first

Common pitfalls

Pitfall	Fix
L3 grows unbounded with low-quality facts	Set a quality bar in the extractor prompt: "only extract if it will matter 10 sessions from now"
Episodes lose tool results needed later	Include summarized results in `Episode.tool_results`
System prompt gets too long with all three layers	Cap L2 to 5 most recent episodes; L3 to 20 items max
Semantic memory contradicts itself across updates	Include prior semantic in extractor input and instruct it to merge, not append
Folding too eagerly (every turn)	L1 window should be at least 15–20 turns

查看完整skill详情

藏在《折叠记忆实现指南》里的Skill

✦技能描述

folded-memory-implementation

The core idea

Step 1 — Understand the setup

Step 2 — Three-layer architecture

Layer 1 — Working memory

Layer 2 — Episodic memory

Layer 3 — Semantic memory

Step 3 — Data structures

Step 4 — Fork agent: L1 → L2 (episode compactor)

Step 5 — Fork agent: L2 → L3 (semantic extractor)

Step 6 — When to trigger each layer

Step 7 — Recall: build system prompt from layers

Step 8 — Full agent loop

Step 9 — Persistence

When to use folded memory vs. flat compact

Common pitfalls

安装使用

相关技能

Compact Memory 实现指南 AI Skill

Agent 记忆实现指南 AI Skill

Compact with Memory AI Skill

Session Dream AI Skill

Query Loop 实现 AI Skill

工具权限系统 AI Skill

Agent 工具构建器 AI Skill

Harness 第 3 步——会话状态管理 AI Skill

Harness 第 1 步——创建 AGENTS.md 和知识库 AI Skill

LangChain 工具构建器 AI Skill

藏在《折叠记忆实现指南》里的Skill

✦技能描述

Skill.md

folded-memory-implementation

The core idea

Step 1 — Understand the setup

Step 2 — Three-layer architecture

Layer 1 — Working memory

Layer 2 — Episodic memory

Layer 3 — Semantic memory

Step 3 — Data structures

Step 4 — Fork agent: L1 → L2 (episode compactor)

Step 5 — Fork agent: L2 → L3 (semantic extractor)

Step 6 — When to trigger each layer

Step 7 — Recall: build system prompt from layers

Step 8 — Full agent loop

Step 9 — Persistence

When to use folded memory vs. flat compact

Common pitfalls

安装使用

安装指令

直接下载

相关技能

Compact Memory 实现指南 AI Skill

Agent 记忆实现指南 AI Skill

Compact with Memory AI Skill

Session Dream AI Skill

Query Loop 实现 AI Skill

工具权限系统 AI Skill

Agent 工具构建器 AI Skill

Harness 第 3 步——会话状态管理 AI Skill

Harness 第 1 步——创建 AGENTS.md 和知识库 AI Skill

LangChain 工具构建器 AI Skill