A large language model, on its own, is stateless. Each call is a clean slate: it sees only the text you hand it in that exact moment and nothing else. Ask it a follow-up question and it has no idea what "it" refers to. Tell it your name twice and it greets you like a stranger both times.
Memory is the layer we build around the model to fix this — the machinery that decides what to carry forward, in what shape, and for how long. Get it right and the agent feels like it knows you. Get it wrong and it either forgets the thread or drowns in its own history, slow and expensive. Concretely, a good memory system lets an agent:
- Maintain context — track the flow of a conversation so you never repeat yourself.
- Personalize — recall details you shared earlier, like your name, tone, or constraints.
- Execute multi-step tasks — build on the output of previous steps instead of restarting.
- Learn over time — accumulate facts and outcomes across runs to improve decisions and avoid repeating mistakes.

None of this is free. Every token of remembered context costs money and latency, and every model has a hard ceiling on how much it can read at once. The seven methods below are really seven different answers to one question: what do we keep, and what do we let go?
Borrowed from biology
How human memory works
Engineers didn't invent these patterns from scratch. The way humans form memories — capture, consolidate, reconstruct — maps almost one-to-one onto how agents manage theirs.
Picture a surprise birthday party. Your eyes catch the candles, your ears the singing, your tongue the chocolate, and a year later the smell of a bakery can pull the whole scene back. That single experience moves through three distinct stages, and each one has a direct analogue in agent design.

Two details matter for what follows. First, retrieval is reconstructive: the brain pulls fragments from different stores and stitches them together, much as a knowledge graph or vector search assembles context from separate records. Second, recalling a memory makes it unstable again: you re-save it with whatever you were thinking at the time, subtly rewritten. Neuroscience calls this reconsolidation, and the better agent systems embrace it, deduplicating and updating facts rather than blindly stacking them up.
The agent memory toolkit
Seven ways an agent can remember
From the dead-simple to the genuinely clever. Most production agents combine several of these — a recent-turn buffer for immediacy, plus a long-term store for everything that should outlive the session.
01 Conversation Buffer
Keep everything, replay everything
A court stenographer who records every word, and before each ruling reads the entire transcript aloud from page one.
The simplest possible strategy: store every message in a list, and re-send the whole list to the model on every call. Nothing is lost, nothing is summarized. The agent always sees the conversation in full, verbatim.
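A minimal sketch of the buffer described above, using only the Python standard library. The class name `ConversationBuffer` and the `fake_llm` function are hypothetical; `fake_llm` stands in for whatever model call you actually use.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationBuffer:
    """Stores every message and replays the full list on each call."""
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def context(self) -> list:
        # Return a copy so callers cannot mutate the stored history.
        return list(self.messages)


def fake_llm(messages: list) -> str:
    """Hypothetical stand-in for a real model call (assumption)."""
    return f"(reply based on {len(messages)} messages)"


buffer = ConversationBuffer()
buffer.add("user", "My name is Ada.")
buffer.add("assistant", fake_llm(buffer.context()))
buffer.add("user", "What is my name?")

# The whole history goes to the model every time, verbatim.
reply = fake_llm(buffer.context())
```

Because the full list is re-sent on every call, the prompt grows with each turn, which is the cost the next method addresses.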


02 Sliding Window
Only the last few turns

The buffer's problem is that it grows without bound. The sliding window fixes that with one rule: keep only the last k interactions. When a new message arrives and the window is full, the oldest one is evicted. Inference stays inside fixed token limits, and costs become predictable.
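The eviction rule can be sketched with `collections.deque`, whose `maxlen` argument drops the oldest item automatically once the window is full. The class name `SlidingWindowMemory` is hypothetical, and this sketch counts individual messages rather than user/assistant pairs, which is a simplifying assumption.

```python
from collections import deque


class SlidingWindowMemory:
    """Keeps only the most recent k messages."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen evicts from the left when a new item is appended.
        self.window = deque(maxlen=k)

    def add(self, role: str, content: str) -> None:
        self.window.append({"role": role, "content": content})

    def context(self) -> list:
        return list(self.window)


memory = SlidingWindowMemory(k=3)
for i in range(1, 6):
    memory.add("user", f"message {i}")

# Messages 1 and 2 have been evicted; only the last three remain.
recent = [m["content"] for m in memory.context()]
```

Note that a fixed message count only approximates a fixed token budget; a stricter variant would evict by token count instead.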


03 Summary Memory
Compress the past into a running gist

Summary memory keeps the sliding window's recent turns verbatim, but instead of throwing old turns away, it folds them into a running, LLM-written summary. The agent keeps a high-level grasp of the whole conversation while token usage stays bounded and predictable.
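One way this could look in code: recent turns stay verbatim, and each evicted turn is folded into a running summary. The names `SummaryMemory` and `naive_summarize` are hypothetical, and `naive_summarize` is only a placeholder for an LLM summarization call. Unlike a real summarizer, it does not keep the summary bounded.

```python
from typing import Callable


def naive_summarize(summary: str, evicted: str) -> str:
    """Placeholder for an LLM call (assumption): appends a truncated snippet."""
    snippet = evicted[:40]
    return f"{summary} | {snippet}" if summary else snippet


class SummaryMemory:
    def __init__(self, keep_recent: int, summarize: Callable[[str, str], str]) -> None:
        self.keep_recent = keep_recent
        self.summarize = summarize
        self.summary = ""
        self.recent: list = []

    def add(self, message: str) -> None:
        self.recent.append(message)
        # Fold overflow into the summary, oldest turn first.
        while len(self.recent) > self.keep_recent:
            evicted = self.recent.pop(0)
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> str:
        parts = []
        if self.summary:
            parts.append(f"Summary so far: {self.summary}")
        parts.extend(self.recent)
        return "\n".join(parts)


memory = SummaryMemory(keep_recent=2, summarize=naive_summarize)
for text in ["user: I'm Ada", "assistant: Hi Ada", "user: I like tea", "assistant: Noted"]:
    memory.add(text)

prompt_context = memory.context()
```

In practice the summarizer would rewrite the whole summary within a token budget on each fold, trading an extra model call for bounded context.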


04 Knowledge Graph
Remember relationships, not text

Rather than storing conversation as text, a knowledge graph extracts (subject, predicate, object) triples from each turn and weaves them into a directed graph. At query time the agent matches entities in your question and pulls a one-to-two-hop neighbourhood — structured context that grounds the answer in established relationships.

A lightweight in-memory graph (e.g. NetworkX) covers a lot of ground. At scale, reach for temporal graph frameworks like Graphiti or a dedicated graph database — Neo4j, FalkorDB, Memgraph — wired into your LLM workflow.
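As a rough illustration of the triple store and the one-to-two-hop lookup, here is a dependency-free sketch built on plain dictionaries. The class `TripleGraph` is hypothetical, the triples are written by hand (in a real system an LLM would extract them from each turn), and entity matching is a naive substring check. A library such as NetworkX could replace the dictionary once the graph grows.

```python
from collections import defaultdict


class TripleGraph:
    """Directed graph of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        # subject -> list of (predicate, object) outgoing edges
        self.edges = defaultdict(list)

    def add_triple(self, subject: str, predicate: str, obj: str) -> None:
        if (predicate, obj) not in self.edges[subject]:
            self.edges[subject].append((predicate, obj))

    def entities(self) -> set:
        nodes = set(self.edges)
        for outgoing in self.edges.values():
            nodes.update(obj for _, obj in outgoing)
        return nodes

    def neighbourhood(self, question: str, hops: int = 2) -> list:
        """Triples within `hops` outgoing edges of entities named in the question."""
        lowered = question.lower()
        # Naive entity matching (assumption): case-insensitive substring test.
        frontier = {e for e in self.entities() if e.lower() in lowered}
        seen, found = set(frontier), []
        for _ in range(hops):
            next_frontier = set()
            for subject in sorted(frontier):
                for predicate, obj in self.edges.get(subject, []):
                    found.append((subject, predicate, obj))
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.add(obj)
            frontier = next_frontier
        return found


graph = TripleGraph()
graph.add_triple("Ada", "works_at", "Acme")
graph.add_triple("Acme", "located_in", "Berlin")
graph.add_triple("Berlin", "capital_of", "Germany")

context = graph.neighbourhood("Where does Ada work?", hops=2)
```

With `hops=2` the lookup returns the Ada and Acme triples but stops before the Berlin one, which keeps the retrieved context small and relevant.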


05 Episodic Memory
Recall whole sessions, later

Everything so far lives inside a single conversation. Episodic memory crosses the session boundary. When a session ends, the agent distills it into one indexed episode — a timestamp, the core topic, and the outcome — and stores it in a long-term engine. In a future session, a semantic query pulls the relevant episodes back.
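The store-then-recall cycle might be sketched as follows. The names `Episode` and `EpisodicStore` are hypothetical, and the `embed` function is a toy bag-of-words vector standing in for a real embedding model and vector database, so the "semantic" query here is only word overlap.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Episode:
    timestamp: datetime
    topic: str
    outcome: str


class EpisodicStore:
    def __init__(self) -> None:
        self.episodes: list = []

    def end_session(self, topic: str, outcome: str, timestamp: Optional[datetime] = None) -> Episode:
        # Distil the finished session into one indexed episode.
        episode = Episode(timestamp or datetime.now(timezone.utc), topic, outcome)
        self.episodes.append(episode)
        return episode

    def recall(self, query: str, top_k: int = 1) -> list:
        q = embed(query)
        scored = [(cosine(q, embed(f"{e.topic} {e.outcome}")), e) for e in self.episodes]
        scored = [(s, e) for s, e in scored if s > 0]  # drop episodes with no overlap
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:top_k]]


store = EpisodicStore()
store.end_session("refund for damaged laptop order", "refund approved after photo evidence")
store.end_session("password reset for billing portal", "reset link sent, user logged in")

hits = store.recall("customer asks about a laptop refund")
```

Swapping `embed` for a real embedding model would let the query match episodes that share meaning but not vocabulary.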


06 Semantic Memory
Distil durable facts from the noise

If episodic memory is the dashcam, semantic memory is the CRM compiler. It strips away conversational fluff, isolates the persistent truths, and refines a central knowledge base. Two sentences that mean the same thing collapse to one fact, so duplicates never accumulate.
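A small sketch of the deduplicate-and-update behaviour, assuming facts have already been extracted into (subject, attribute, value) form by an upstream step such as an LLM call. The class `SemanticMemory` and its normalization rule (lowercasing and whitespace collapsing) are hypothetical simplifications; real systems typically compare meaning, not just normalized strings.

```python
class SemanticMemory:
    """Keeps one current value per (subject, attribute) fact."""

    def __init__(self) -> None:
        self.facts = {}  # (subject, attribute) -> value

    @staticmethod
    def _norm(text: str) -> str:
        return " ".join(text.lower().split())

    def upsert(self, subject: str, attribute: str, value: str) -> str:
        """Store a fact and report whether it was 'added', 'updated', or a 'duplicate'."""
        key = (self._norm(subject), self._norm(attribute))
        value = value.strip()
        existing = self.facts.get(key)
        if existing is None:
            self.facts[key] = value
            return "added"
        if self._norm(existing) == self._norm(value):
            return "duplicate"  # same fact restated; nothing to store
        self.facts[key] = value  # newer value replaces the old one
        return "updated"

    def as_context(self) -> list:
        return [f"{s} {a}: {v}" for (s, a), v in sorted(self.facts.items())]


memory = SemanticMemory()
results = [
    memory.upsert("User", "diet", "vegetarian"),
    memory.upsert("user", "Diet", "Vegetarian "),
    memory.upsert("user", "diet", "vegan"),
    memory.upsert("user", "city", "Lisbon"),
]
facts = memory.as_context()
```

Four statements leave two facts in the store: the restated diet is dropped, and the changed diet overwrites the earlier value instead of sitting beside it.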


07 Procedural Memory
Humans rely on procedural memory for skills that have become automatic, like riding a bike. Procedural memory gives an LLM agent a similar learn-by-doing ability. It captures reusable workflows (step-by-step action sequences) and stores them in a skill library. When a similar task appears, the agent retrieves the proven procedure instead of reasoning from scratch.
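A skill library can be sketched as a list of task descriptions paired with their steps, plus a similarity check at lookup time. The class `SkillLibrary`, the Jaccard word-overlap score, and the `min_overlap` threshold of 0.5 are all illustrative assumptions; a production system would more likely match tasks with embeddings.

```python
import re
from typing import Optional


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SkillLibrary:
    """Stores proven step sequences and retrieves one for a similar task."""

    def __init__(self, min_overlap: float = 0.5) -> None:
        self.skills = []  # list of (task description, steps)
        self.min_overlap = min_overlap

    def record(self, task: str, steps: list) -> None:
        self.skills.append((task, list(steps)))

    def retrieve(self, task: str) -> Optional[list]:
        query = tokens(task)
        best_steps, best_score = None, 0.0
        for description, steps in self.skills:
            known = tokens(description)
            union = query | known
            # Jaccard similarity: shared words divided by all words.
            score = len(query & known) / len(union) if union else 0.0
            if score > best_score:
                best_steps, best_score = steps, score
        # Below the threshold, fall back to reasoning from scratch (None).
        return best_steps if best_score >= self.min_overlap else None


library = SkillLibrary()
library.record(
    "deploy web app to staging",
    ["run tests", "build image", "push image", "apply staging config"],
)

known_task = library.retrieve("deploy the web app to staging")
new_task = library.retrieve("write a quarterly report")
```

Returning `None` for an unfamiliar task gives the agent a clear signal to plan from scratch and, if the plan succeeds, to record it as a new skill.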


Putting it together
Choosing a strategy
There's no single winner — the right answer is usually a stack. The axis that matters most: does this information need to outlive the session?


Pick the lightest method that preserves what your agent actually needs to remember. Reach for the heavier machinery only when continuity and structure genuinely earn their cost.


