TEL301

Five Essentials of Agentic AI

Framework for designing, building, evaluating, and improving agentic AI systems based on five essential components: Agentic Harness, Unit of Work, Workflows, Memory, and Skills. Use this skill whenever someone is architecting an agentic system, evaluating whether an agent implementation is production-ready, designing an agentic development workflow, reviewing or auditing an existing agent's architecture, building a product that uses AI agents to do real work, or discussing what makes an agent different from a chatbot. Also trigger when users mention terms like 'agentic loop', 'agent architecture', 'AI agent design', 'agentic development', 'agent framework', or ask questions like 'what do I need to build an agent' or 'how do I make my agent production-ready'. This skill applies to both building agents AND doing development with agents — the same five essentials appear in both contexts.


This framework defines the five components required to build agentic AI systems that ship real work. Each essential builds on the previous one — together they form a complete system. The core insight is: the model is not the agent — the system is.

These essentials apply in two contexts:

  1. Building agents — designing systems where AI agents do work autonomously
  2. Doing development with agents — using agentic AI as a development tool (e.g. Claude Code)

Both contexts require the same five components. When advising on agentic systems, evaluate all five and identify which are missing or underdeveloped.


The Five Essentials

1. Agentic Harness

The runtime that manages the agentic loop, tool access, caching, and context compaction. This is the most commonly underestimated component — people focus on the model when they should focus on the harness.

What it does:

  • Runs the agentic loop: Plan → Act → Observe → Repeat
  • Controls which tools the agent can access and how they're invoked
  • Manages context window limits through caching and compaction
  • Handles error recovery and retry logic
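The loop and error handling above can be sketched in a few lines. This is a minimal illustration, not a real harness: `plan` and `act` are hypothetical stand-ins for the model call and tool dispatch, and the step budget stands in for smarter termination logic.

```python
# Minimal sketch of an agentic loop, assuming hypothetical plan/act callables;
# a real harness would call a model, dispatch real tools, and manage context.
from dataclasses import dataclass, field

@dataclass
class Harness:
    max_steps: int = 10                      # hard stop to prevent runaway loops
    history: list = field(default_factory=list)

    def run(self, goal, plan, act):
        for _ in range(self.max_steps):
            action = plan(goal, self.history)        # Plan
            if action is None:                       # planner signals completion
                return self.history
            try:
                observation = act(action)            # Act
            except Exception as exc:                 # error recovery: record, continue
                observation = f"tool error: {exc}"
            self.history.append((action, observation))  # Observe, then repeat
        return self.history                          # loop budget exhausted

# Usage: a toy planner that issues one action, then declares the goal met.
h = Harness()
result = h.run(
    goal="say hi",
    plan=lambda goal, hist: None if hist else "greet",
    act=lambda action: "hello",
)
```

Note the two termination paths: the planner returning `None`, and the step budget. A harness with only one of these tends to produce exactly the "stuck in loops" red flag described below.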

Key design questions:

  • How does the harness decide when to stop the loop?
  • What happens when a tool call fails?
  • How is the context window managed as work accumulates?
  • How are tool results cached to avoid redundant work?

Reference example: Claude Code is an agentic harness for software development. It manages tool access (file edit, bash, web search), handles context compaction automatically, runs the plan-act-observe loop, and integrates MCP servers for extensibility.

Red flags when this is missing or weak: The agent loses context mid-task, hits token limits unexpectedly, makes redundant tool calls, or gets stuck in loops without termination.


2. Unit of Work

The container that gives the agent scope, persistence, and the ability to finish real work. This is what separates a chatbot from a system that actually completes tasks.

Spectrum of complexity:

  Simple → Production
  • Chat session → Ticket / Job / Task
  • Starts and ends with conversation → Can span hours or days
  • Context is temporary → State is persistent
  • Work is ephemeral → Work is resumable, trackable, completable

Key design questions:

  • What defines the boundaries of one unit of work?
  • How does the agent know when the work is "done"?
  • Can work be paused and resumed?
  • How is progress tracked and reported?
  • What happens if the agent fails mid-unit?

The progression: Most teams start with chat sessions, but production systems need a more durable container. A ticket in a project management system, a job in a queue, or a task in a workflow engine — these give the agent something concrete to work against and a clear definition of done.
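A durable unit of work can be as simple as a record with a goal, a status, and a resumable trail of steps. The field names below are illustrative; in production this record would live in a database or job queue, not in memory.

```python
# Sketch of a durable unit of work, assuming hypothetical field names;
# the point is explicit status, a definition of done, and a resumable trail.
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    PAUSED = "paused"        # work can be suspended and resumed later
    DONE = "done"
    FAILED = "failed"

@dataclass
class UnitOfWork:
    id: str
    goal: str                                # what "done" means for this unit
    status: Status = Status.PENDING
    progress: list[str] = field(default_factory=list)  # trackable step history

    def record(self, step: str):
        self.status = Status.IN_PROGRESS
        self.progress.append(step)

    def complete(self):
        self.status = Status.DONE

# Usage: the agent works against a concrete ticket, not a chat transcript.
ticket = UnitOfWork(id="T-42", goal="migrate billing cron to queue")
ticket.record("audited existing cron jobs")
ticket.complete()
```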

Red flags when this is missing or weak: The agent can't do work that takes more than one session, there's no way to track what the agent has done, work gets lost when sessions end, or there's no concept of "completion."


3. Workflows & Commands

Predefined patterns that kick off the agentic loop with the right context. These are the playbooks that make agents repeatable and reliable.

The three-step pattern:

  1. Trigger — A command, event, or scheduled action initiates the workflow
  2. Context — The workflow loads relevant data, history, and constraints
  3. Execute — The agentic loop runs with clear goals and boundaries
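The trigger → context → execute pattern can be expressed as a small factory. `load_context` and `run_agent` are hypothetical hooks into your data layer and harness; the shape of the pattern is what matters.

```python
# Sketch of the trigger -> context -> execute workflow pattern; load_context
# and run_agent are assumed hooks, not an existing API.
def make_workflow(name, load_context, run_agent):
    def workflow(**params):                           # 1. Trigger: command or event
        context = load_context(**params)              # 2. Context: data + constraints
        return run_agent(goal=name, context=context)  # 3. Execute: run the loop
    return workflow

# Usage: a "triage bug" workflow parameterised by ticket id.
triage = make_workflow(
    "triage-bug",
    load_context=lambda ticket_id: {"ticket": ticket_id},
    run_agent=lambda goal, context: f"{goal} for {context['ticket']}",
)
result = triage(ticket_id="BUG-7")
```

Because workflows are just named functions over the loop, composition (one workflow calling another) falls out naturally.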

Key design questions:

  • What are the most common actions the agent needs to perform?
  • What context does each workflow need to load before starting?
  • How are workflows parameterised for different inputs?
  • Can users create their own workflows or only use predefined ones?
  • How do workflows compose (one workflow calling another)?

The insight: Without workflows, you're hoping the LLM figures out what to do from a vague prompt. Workflows encode your team's best practices into repeatable automation. They're the difference between "ask the AI to help" and "run this process."

Red flags when this is missing or weak: Every interaction starts from scratch, users get inconsistent results for the same type of task, there's no way to standardise common operations, or the agent requires extensive prompting to do routine work.


4. Memory

Not just "remember stuff" — memory must be self-learning, self-managing, and properly scoped. This is what compounds the agent's value over time. Agents without memory start from zero every time.

Three required properties:

Self-Learning — The memory system automatically updates from work the agent does. Every completed task, every correction, every observed pattern feeds back into what the agent knows. This shouldn't require manual curation — the system should learn from its own work.

Self-Managing — Memory needs to prune, prioritise, and organise itself. As the volume of remembered information grows, the system must decide what's still relevant, what can be compressed, and what should be forgotten. Unbounded memory becomes noise.

Properly Scoped — Different contexts need different memories. A well-designed memory system distinguishes between:

  • Personal — individual user preferences and history
  • Project — context specific to a piece of work
  • Organisation — shared knowledge across the team or company
  • Global — general knowledge applicable everywhere
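The scope hierarchy above can be sketched as a small store where the most specific scope wins on retrieval. This illustrates scope isolation and precedence only, not a production retrieval strategy (vector search, structured lookup, and so on are discussed below).

```python
# Minimal scoped memory store, assuming these four scope names; retrieval
# prefers the most specific scope when the same key exists at several levels.
class ScopedMemory:
    SCOPES = ("personal", "project", "organisation", "global")  # specific -> general

    def __init__(self):
        self.store = {scope: {} for scope in self.SCOPES}

    def remember(self, scope, key, value):
        if scope not in self.store:
            raise ValueError(f"unknown scope: {scope}")
        self.store[scope][key] = value

    def recall(self, key):
        # Most specific scope wins; scopes never silently merge.
        for scope in self.SCOPES:
            if key in self.store[scope]:
                return self.store[scope][key]
        return None

# Usage: a personal preference overrides an organisation-wide default.
mem = ScopedMemory()
mem.remember("global", "tone", "formal")
mem.remember("personal", "tone", "casual")
preference = mem.recall("tone")
```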

Key design questions:

  • How does the agent learn from completed work without explicit instruction?
  • What triggers memory consolidation and cleanup?
  • How are scope boundaries enforced (preventing project A's context from leaking into project B)?
  • What's the retrieval strategy (vector search, structured lookup, hybrid)?
  • How do you handle contradictions between old and new information?

Red flags when this is missing or weak: The agent asks the same questions repeatedly, doesn't improve at recurring tasks, bleeds context between unrelated projects, or requires users to manually maintain its knowledge base.


5. Skills

Reusable, testable capabilities the agent draws on, with a built-in feedback loop for continuous improvement. This is where organisational knowledge gets encoded and where the system gets better with use rather than staying static.

Skill properties:

  • System-wide — shared across the organisation, not locked to one user or project
  • Versioned — track changes and roll back when a skill regresses
  • Testable — validate against known scenarios before deploying
  • Composable — combine skills for complex tasks (a "write report" skill might use a "research" skill and a "format document" skill)
  • Self-improving — feedback from usage drives refinement

The continuous improvement loop:

  1. Deploy — Ship the skill into the system
  2. Observe — Monitor how the agent uses it and what outcomes it produces
  3. Evaluate — Measure quality against success criteria
  4. Refine — Update the skill based on what was learned
  5. Return to step 1
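The deploy → observe → evaluate → refine loop maps onto a versioned registry. The `Skill` and `SkillRegistry` names below are illustrative, not an existing library; note how refinement produces a new version rather than mutating the old one, which is what makes rollback possible.

```python
# Sketch of a versioned skill registry with a feedback loop; Skill and
# SkillRegistry are hypothetical names, not a real package.
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    version: int
    run: callable
    scores: list = field(default_factory=list)       # outcomes observed in use

class SkillRegistry:
    def __init__(self):
        self.skills = {}          # name -> version history (enables rollback)

    def deploy(self, skill):                         # 1. Deploy
        self.skills.setdefault(skill.name, []).append(skill)

    def latest(self, name):
        return self.skills[name][-1]

    def feedback(self, name, score):                 # 2-3. Observe and evaluate
        self.latest(name).scores.append(score)

    def refine(self, name, new_run):                 # 4. Refine -> back to deploy
        old = self.latest(name)
        self.deploy(Skill(old.name, old.version + 1, new_run))

# Usage: poor feedback on v1 drives a refined v2; v1 stays available.
reg = SkillRegistry()
reg.deploy(Skill("summarise", 1, lambda text: text[:10]))
reg.feedback("summarise", 0.4)
reg.refine("summarise", lambda text: text[:20])
```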

Key design questions:

  • How are skills discovered and loaded by the agent?
  • What's the mechanism for skill authors to publish and share?
  • How do you measure whether a skill is working well?
  • What prevents skill bloat (too many skills degrading selection quality)?
  • How do skills handle edge cases they weren't designed for?

The analogy: Think of skills like packages in a package manager (npm, pip), but at the knowledge layer rather than the code layer. They're the reusable units that encode "how to do X well" and improve over time.

Red flags when this is missing or weak: The agent is equally mediocre at everything, there's no way to encode domain expertise, improvements aren't captured for reuse, or the system doesn't get better with use.


How They Fit Together

The five essentials form a stack, each building on the one below:

┌─────────────────────────────────┐
│  Skills — Give it capability    │
├─────────────────────────────────┤
│  Memory — Give it context       │
├─────────────────────────────────┤
│  Workflows — Tell it what to do │
├─────────────────────────────────┤
│  Unit of Work — Give it scope   │
├─────────────────────────────────┤
│  Harness — Run the loop         │
└─────────────────────────────────┘

The harness runs the loop. The unit of work gives it boundaries. Workflows tell it what to do. Memory gives it context. Skills give it capability.


Using This Framework

For Architecture Reviews

When reviewing an agentic system, evaluate each essential on a maturity scale:

  • Missing — Not present at all
  • Ad hoc — Present but informal, inconsistent, or manual
  • Defined — Deliberately designed with clear interfaces
  • Managed — Monitored, measured, and actively maintained
  • Optimising — Self-improving with feedback loops

A system doesn't need all five at "Optimising" to be useful, but any essential at "Missing" is a significant gap. Start by getting everything to "Defined" — that's where most of the value unlocks.
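A review against this scale can be mechanised as a simple scoring pass. The function below is a hypothetical sketch; the levels and the "Defined" target come straight from the framework above.

```python
# Hypothetical scoring pass over the maturity scale; levels are ordered
# least to most mature, and "Defined" is the recommended first target.
LEVELS = ["Missing", "Ad hoc", "Defined", "Managed", "Optimising"]

def review(scores):
    """scores maps each essential to one of LEVELS."""
    critical = [name for name, lvl in scores.items() if lvl == "Missing"]
    below_defined = [name for name, lvl in scores.items()
                     if LEVELS.index(lvl) < LEVELS.index("Defined")]
    return {"critical_gaps": critical, "work_toward_defined": below_defined}

# Usage: a typical early-stage system, strong on the harness, weak elsewhere.
report = review({
    "Harness": "Managed",
    "Unit of Work": "Defined",
    "Workflows": "Ad hoc",
    "Memory": "Missing",
    "Skills": "Ad hoc",
})
```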

For New Projects

Start with the harness and unit of work — these are the foundation. You can build a useful system with just these two (many chat-based agents operate here). Add workflows when you find yourself repeatedly setting up the same context. Add memory when you notice the agent re-learning things it should already know. Add skills when you have domain expertise worth encoding for reuse.

For Evaluating Tools and Platforms

When evaluating agentic AI tools or platforms, use the five essentials as a checklist. Most tools are strong on the harness and weak on everything else. The differentiation happens in how well they handle units of work, workflows, memory, and skills.

See references/evaluation-checklist.md for a detailed evaluation template.

TEL301 — Five Essentials of Agentic AI | Telos - Public | Skillbook