← Back to Blog
    June 10, 20265 min readEngineering
    Article

    Agentic Memory : Context That Persists

    A field guide to seven agentic memory architectures—buffers, sliding windows, summaries, knowledge graphs, episodic, semantic, and procedural memory.

    Neelam Pawar
    Neelam Pawar
    Engineer
    Agentic Memory : Context That Persists

    A large language model, on its own, is stateless. Each call is a clean slate: it sees only the text you hand it in that exact moment and nothing else. Ask it a follow-up question and it has no idea what "it" refers to. Tell it your name twice and it greets you like a stranger both times.

    Memory is the layer we build around the model to fix this — the machinery that decides what to carry forward, in what shape, and for how long. Get it right and the agent feels like it knows you. Get it wrong and it either forgets the thread or drowns in its own history, slow and expensive. Concretely, a good memory system lets an agent:

    • Maintain context — track the flow of a conversation so you never repeat yourself.
    • Personalize — recall details you shared earlier, like your name, tone, or constraints.
    • Execute multi-step tasks — build on the output of previous steps instead of restarting.
    • Learn over time — accumulate facts and outcomes across runs to improve decisions and avoid repeating mistakes.
    Screenshot 2026-06-10 at 11.25.33 PM

    None of this is free. Every token of remembered context costs money and latency, and every model has a hard ceiling on how much it can read at once. The six methods below are really six different answers to one question: what do we keep, and what do we let go?

    Borrowed from biology

    How Human Memory works

    Engineers didn't invent these patterns from scratch. The way humans form memories — capture, consolidate, reconstruct — maps almost one-to-one onto how agents manage theirs.

    Picture a surprise birthday party. Your eyes catch the candles, your ears the singing, your tongue the chocolate — and within a year, the smell of a bakery can pull the whole scene back. That single experience moves through three distinct stages, and each one has a direct analogue in agent design.

    Screenshot 2026-06-10 at 11.02.20 PM

    Two details matter for what follows. First, retrieval is reconstructive — the brain pulls fragments from different stores and stitches them together, which is exactly what a knowledge graph or vector search does. Second, recalling a memory makes it unstable again: you re-save it with whatever you were thinking at the time, subtly rewritten. Agentic systems call this reconsolidation, and the better ones embrace it — deduplicating and updating facts rather than blindly stacking them up.

    The Agent Memory toolkit

    Seven ways Agent can use to remember

    From the dead-simple to the genuinely clever. Most production agents combine several of these — a recent-turn buffer for immediacy, plus a long-term store for everything that should outlive the session.

    1 Conversation Buffer

    Keep everything, replay everything

    A court stenographer who records every word, and before each ruling reads the entire transcript aloud from page one.

    The simplest possible strategy: store every message in a list, and re-send the whole list to the model on every call. Nothing is lost, nothing is summarized. The agent always sees the conversation in full, verbatim.

    Screenshot 2026-06-10 at 11.05.40 PM
    Screenshot 2026-06-10 at 11.07.15 PM

    2 Sliding Window

    Only the last few turns

    Screenshot 2026-06-10 at 11.08.05 PM

    The buffer's problem is that it grows without bound. The sliding window fixes that with one rule: keep only the last k interactions. When a new message arrives and the window is full, the oldest one is evicted. Inference stays inside fixed token limits, and costs become predictable.

    Screenshot 2026-06-10 at 11.09.55 PM
    Screenshot 2026-06-10 at 11.11.06 PM

    3 Summary Memory

    Compress the past into a running gist

    Screenshot 2026-06-10 at 11.11.35 PM

    Summary memory keeps the sliding window's recent turns verbatim, but instead of throwing old turns away, it folds them into a running, LLM-written summary. The agent keeps a high-level grasp of the whole conversation while token usage stays bounded and predictable.

    Screenshot 2026-06-10 at 11.12.51 PM
    Screenshot 2026-06-10 at 11.13.35 PM

    04 Knowledge Graph

    Remember relationships, not text

    Screenshot 2026-06-10 at 11.16.33 PM

    Rather than storing conversation as text, a knowledge graph extracts (subject, predicate, object) triples from each turn and weaves them into a directed graph. At query time the agent matches entities in your question and pulls a one-to-two-hop neighbourhood — structured context that grounds the answer in established relationships.

    Screenshot 2026-06-10 at 11.17.06 PM

    A lightweight in-memory graph (e.g. NetworkX) covers a lot of ground. At scale, reach for temporal graph frameworks like Graphiti or a dedicated graph database — Neo4j, FalkorDB, Memgraph — wired into your LLM workflow.

    Screenshot 2026-06-10 at 11.28.55 PM
    Screenshot 2026-06-10 at 11.18.07 PM

    05 Episodic Memory

    Recall whole sessions, later

    Screenshot 2026-06-10 at 11.19.29 PM

    Everything so far lives inside a single conversation. Episodic memory crosses the session boundary. When a session ends, the agent distills it into one indexed episode — a timestamp, the core topic, and the outcome — and stores it in a long-term engine. In a future session, a semantic query pulls the relevant episodes back.

    Screenshot 2026-06-10 at 11.32.38 PM
    Screenshot 2026-06-10 at 11.20.13 PM

    06 Semantic Memory

    Distil durable facts from the noise

    Screenshot 2026-06-10 at 11.21.53 PM

    If episodic memory is the dashcam, semantic memory is the CRM compiler. It strips away conversational fluff, isolates the persistent truths, and refines a central knowledge base. Two sentences that mean the same thing collapse to one fact — and duplicates are quietly deduplicated.

    Screenshot 2026-06-10 at 11.34.42 PM
    Screenshot 2026-06-10 at 11.22.48 PM

    07 Procedural Memory

    Procedural Memory is the agent memory technique that gives an LLM agent this same learn-by-doing ability. It captures reusable workflows (step-by-step action sequences) and stores them in a skill library. When a similar task appears, the agent retrieves the proven procedure instead of reasoning from scratch. In short, this is similar to how humans use their procedural memory to handle unconscious skills like riding a bike.

    Screenshot 2026-06-10 at 11.36.34 PM
    Screenshot 2026-06-10 at 11.42.14 PM

    Putting it together

    Choosing a strategy

    There's no single winner — the right answer is usually a stack. The axis that matters most: does this information need to outlive the session?

    Screenshot 2026-06-10 at 11.42.56 PM
    Screenshot 2026-06-10 at 11.37.09 PM

    Pick the lightest method that preserves what your agent actually needs to remember. Reach for the heavier machinery only when continuity and structure genuinely earn their cost.

    Related reading

    View all →

    More insights await

    Explore our latest articles on AI evaluation, LLM optimization, and engineering best practices.

    Read more articles →