Enterprise AI briefing

I Built an AI Research Agent. Here's the Unfiltered Account.

What the tutorials don't show you — including the part where the AI confidently led me in circles, and why that's actually the most important lesson.

April 19, 2026 · 8 min read

Introduction

I built an AI research agent this week. It accepts a topic, autonomously searches the web using Tavily, synthesizes findings into a structured research brief, and saves it directly to a Notion database — ready to feed into my Cup of Wit content pipeline.

It works. The path to "it works" took one full day, more than fifteen distinct errors, and several moments where I had to override the AI that was supposed to be helping me build it.

I want to be honest about that last part specifically. Because the narrative around AI-assisted building tends to skip it.

The ratio — build two, debug eight

The day broke down like this: roughly 20% building, 80% debugging.

This is the same ratio I reported from my first automation build. I'm not reporting it again because nothing changed. I'm reporting it because it's structural. It's not a beginner's tax that disappears with experience. It's the nature of integrating systems that each have their own opinions about data formats, authentication patterns, and node versions.

If you're commissioning this kind of work from a developer or agency: this ratio is what competent, experienced builders experience. Budget for it. Don't treat the debugging hours as waste or inefficiency. They are the work.

What I built

The workflow is architecturally simple:

A chat trigger receives a research topic
An AI Agent node uses Claude Sonnet to decide what to do
The agent calls a Tavily search tool to find current sources
The agent synthesizes findings into a structured brief
A Code node processes the output into clean fields
A Notion node saves the brief as a new database page

Six steps. A one-way pipeline with an agentic core. The agent decides how many times to search, which results to use, and how to structure the output. That's the difference between this and my first pipeline: the AI isn't just executing instructions, it's making decisions.
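The least agentic step, turning the agent's free-text brief into clean fields, is also the easiest to test on its own. Here is a minimal sketch of that idea in Python. It assumes the agent is prompted to emit "Title:", "Summary:" and "Sources:" labels; that layout and the name parse_brief are hypothetical, chosen for illustration, and are not the format or code used in this build.

```python
import re


def parse_brief(text: str) -> dict:
    """Split a brief into a title, a summary and a list of sources.

    Assumes a hypothetical layout in which 'Title:', 'Summary:' and
    'Sources:' labels each start their own line.
    """
    fields = {"title": "", "summary": "", "sources": []}
    current = None  # label of the section we are currently inside
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r"^(Title|Summary|Sources):\s*(.*)$", line, re.IGNORECASE)
        if match:
            current = match.group(1).lower()
            line = match.group(2)
            if not line:
                continue  # label on its own line; content follows below
        if current == "sources":
            # Drop list bullets such as "- " or "* " before the source itself.
            fields["sources"].append(line.lstrip("-* ").strip())
        elif current in ("title", "summary"):
            # Join wrapped lines into one string.
            fields[current] = (fields[current] + " " + line).strip()
    return fields
```

Anything the agent writes before the first label is ignored, which keeps a stray preamble such as "Here is your brief" out of the saved page.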

I expected it to take three hours. It took a full day.

The part the tutorials don't show you: AI going in circles

Here is the most useful thing I can tell you from this build, and I haven't seen it written honestly anywhere.

The AI helping me build this — the same model powering the agent I was building — went in circles.

Not because it was broken. Because it was doing what language models do: generating plausible next steps based on pattern recognition, without a reliable mechanism for identifying root causes. When the Tavily search tool kept returning 400 errors, I received eight different suggested fixes across as many iterations. Change the body format. Switch to key-value pairs. Use the JSON body instead. Add a placeholder definition. Try the URL field instead. Delete and recreate the node. Each suggestion was individually plausible. None diagnosed the actual problem.

The actual problem was simple: I was using a generic HTTP Request node for a tool that had a native n8n integration. The native Tavily node handled authentication and query passing correctly out of the box. We arrived there after an hour of iteration that could have been resolved in five minutes with a different first question.
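A 400 generally means the server rejected the shape of the request, so it helps to see how many details a hand-built call has to get right at once. The sketch below builds (but does not send) such a request in Python. The endpoint URL, the Bearer header, and the query and max_results body fields reflect Tavily's public documentation as best understood at the time of writing; treat all of them as assumptions to verify, and note that nothing here describes what the native n8n node does internally.

```python
import json
import urllib.request

# Assumed endpoint; check Tavily's current API reference before relying on it.
TAVILY_SEARCH_URL = "https://api.tavily.com/search"


def build_tavily_request(query: str, api_key: str, max_results: int = 5) -> urllib.request.Request:
    """Assemble a POST request for a search without sending it."""
    # The body must be JSON bytes, not form-encoded key-value pairs.
    body = json.dumps({"query": query, "max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        TAVILY_SEARCH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Method, body encoding, content type and auth header are four separate places to go wrong, and a generic HTTP node exposes every one of them as a setting.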

I led that debugging. Not the AI.

For business leaders: This is not an argument against AI assistance. It's an argument for understanding what AI assistance is. Language models are excellent at generating options. They are unreliable at identifying root causes in complex, stateful systems. The human role in AI-assisted building is not to follow instructions. It's to maintain a mental model of the system, notice when the suggested fixes aren't converging, and ask a different question. That skill — knowing when to override the AI — is not technical. It's judgment. And it cannot be automated.

Lesson 1: Human-in-the-loop is not just a safety feature — it's a debugging requirement

Every piece of writing about AI agents discusses human-in-the-loop as a governance concept. A way to catch harmful outputs before they reach the world.

That framing is correct but incomplete.

Human oversight is also what prevents a debugging session from becoming a loop. When an AI assistant is generating successive plausible-but-wrong fixes, the human in the loop is the only party capable of recognizing the pattern and breaking it. Not because the human is smarter. Because the human has continuity of context across the session in a way the model doesn't.

I noticed the pattern. I changed the question. We found the answer.

Design your AI workflows with meaningful human checkpoints — not just to catch bad outputs, but to catch misdirected effort before it compounds.

Lesson 2: Native nodes exist for a reason

One class of errors consumed more time than any other: trying to configure a generic HTTP Request node to behave like a purpose-built integration.

The $fromAI() expression syntax, the placeholder definition fields, the body format switching — all of these were attempts to make a generic tool do something that a specific tool does natively. The Tavily node. The native Notion node with Database Page resource. The difference between the generic and the native version of each was the difference between an hour of configuration and five minutes.

Check for native nodes before building custom HTTP calls. The n8n community has built integrations for most common services. The time saved is significant.

For business leaders: In organisations, this principle appears as: don't build internal tooling for problems that vendors have already solved. The custom solution feels more controllable. It usually costs three times as much to build and ten times as much to maintain.

Lesson 3: Agent memory pollutes across sessions

The AI Agent node uses Window Buffer Memory — it remembers the last N exchanges so it can respond to follow-up instructions without losing context.

This is genuinely useful. It also caused one of the more confusing failure modes of the day.

After several failed test runs where the agent returned apology messages about broken search tools, the memory stored those failures as context. When I fixed the tools and ran again, the agent read its own history of failure, concluded it was still in a broken environment, and apologized again — before even attempting a search.

The workflow was healthy. The agent's memory was poisoned.

The fix was a fresh session — one click on the session refresh button in the chat panel. But I only knew to look there because I understood what memory was doing.

Design principle: In any agentic workflow, memory is both an asset and a liability. It improves multi-turn coherence. It propagates failure state. Build in a mechanism to reset it cleanly between test runs.
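To make the failure mode concrete, here is a small Python stand-in for a windowed memory keyed by session. It is a sketch of the general idea only: the class and method names are hypothetical and it is not n8n's implementation.

```python
from collections import defaultdict, deque


class WindowBufferMemory:
    """Keep only the last `window` exchanges per session (illustrative stand-in)."""

    def __init__(self, window: int = 5):
        self.window = window
        # Each session gets its own bounded queue; old exchanges fall off the front.
        self._sessions = defaultdict(lambda: deque(maxlen=window))

    def add(self, session_id: str, user: str, agent: str) -> None:
        self._sessions[session_id].append((user, agent))

    def context(self, session_id: str) -> list:
        """Return the exchanges the agent would see on its next turn."""
        return list(self._sessions[session_id])

    def reset(self, session_id: str) -> None:
        """Forget a session entirely, e.g. between test runs."""
        self._sessions.pop(session_id, None)
```

If every stored exchange is an apology about a broken tool, that is all the agent sees on its next turn, even after the tool has been fixed. Calling reset, or simply starting a new session id, gives it a clean slate.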

Lesson 4: The Notion API cares about the difference between a page and a database

This one cost me more time than I'd like to admit.

The Notion API has two distinct parent types for page creation: page_id and database_id. If you send a page_id pointing at a database, Notion returns a 404. Not a type error. Not a helpful message about the mismatch. A 404.

The n8n Notion node's "Page" resource always sends page_id. The "Database Page" resource sends database_id. One word of difference in the Resource dropdown. The difference between the workflow working and the workflow failing with a 404 that points nowhere near the real cause.
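Seen as request bodies, the two cases differ by a single key. The Python sketch below builds the JSON body for a create-page request under each parent type. The helper names are hypothetical, and the title property name "Name" is an assumption: it is the usual default for a new Notion database, but it has to match whatever the database's title column is actually called.

```python
def build_parent(parent_id: str, kind: str) -> dict:
    """Return the `parent` object for a create-page request."""
    if kind == "database":
        return {"database_id": parent_id}
    if kind == "page":
        return {"page_id": parent_id}
    raise ValueError(f"unknown parent kind: {kind!r}")


def build_database_page(database_id: str, title: str, title_property: str = "Name") -> dict:
    """Build the body for a new row (page) inside a database."""
    return {
        "parent": build_parent(database_id, "database"),
        "properties": {
            # Assumed title column name; adjust to the real database schema.
            title_property: {"title": [{"text": {"content": title}}]},
        },
    }
```

Making the parent kind an explicit argument, rather than something inferred from the id, turns the mismatch into a decision that is visible in the code instead of a 404 discovered at run time.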

Every API has opinions like this. Finding them is the debugging work. There is no shortcut except building more things until your library of known gotchas grows large enough to recognize the pattern faster.

Lesson 5: Agents need constraints, not just capabilities

By default, the AI Agent node runs up to ten tool call iterations. On one test run, the agent called Tavily ten times before stopping — burning through a significant portion of my free monthly quota and producing no useful output.

Capability without constraint is expensive. The agent had the ability to keep searching until it hit the iteration cap. It exercised that ability.

The fix was a system prompt instruction — a hard rule telling the agent to search a maximum of twice per run and to move on if a tool fails rather than retrying. This is not a technical constraint. It's a behavioural one. A directive the agent follows because it was told to.

When you build agents, the design work is not just which tools to give them. It's which boundaries to set. An agent with five well-constrained tools is more useful and more predictable than an agent with ten unconstrained ones.
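A prompt rule is a request, so it can be paired with a limit the agent cannot talk its way past. The Python sketch below shows that complementary technique, a hard budget wrapped around a tool. It is not what this build used (the fix here was the system prompt), and the names BudgetedTool and ToolBudgetExceeded are hypothetical.

```python
class ToolBudgetExceeded(Exception):
    """Raised when a tool is called after its budget is spent or after it failed."""


class BudgetedTool:
    """Wrap a tool function with a per-run call limit and a no-retry rule."""

    def __init__(self, func, max_calls: int = 2):
        self.func = func
        self.max_calls = max_calls
        self.calls = 0
        self.failed = False

    def __call__(self, *args, **kwargs):
        if self.failed:
            raise ToolBudgetExceeded("tool failed earlier in this run; not retrying")
        if self.calls >= self.max_calls:
            raise ToolBudgetExceeded(f"limit of {self.max_calls} calls reached")
        self.calls += 1  # failed calls count against the budget too
        try:
            return self.func(*args, **kwargs)
        except Exception:
            self.failed = True  # remember the failure so later calls are refused
            raise
```

With max_calls=2, the third search is refused in code regardless of what the model decides, which protects a metered quota even when the prompt instruction is ignored.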

What I'd do differently

Use native nodes first, always. Before building a custom HTTP Request for any service, spend two minutes checking whether a native node exists.

Reset memory between test runs. Fresh session before every meaningful test.

Set agent constraints before first run. Max iterations. Max searches. A rule about not retrying failed tools.

Isolate each node before connecting them. Run every node on its own with known input and confirm its output before wiring it to the next one.

When the AI is going in circles, change the question. Not the answer — the question. Eight iterations of the same fix category is a signal that the root cause hasn't been identified.

The bottom line

The agent works. A topic goes in. A structured research brief comes out. It lands in Notion, inside my Cup of Wit content workspace, ready to use.

The honest account of getting there includes: fifteen-plus errors, one AI-assisted debugging loop that I had to recognize and break myself, a native node that solved in five minutes what I'd spent an hour trying to configure manually, and a memory reset that was one click but took too long to identify as the problem.

None of this makes the outcome less valuable. It makes it more honest.

AI builds are not smooth. They are iterative, sometimes misdirected, occasionally circular, and ultimately productive when the human in the loop maintains their own mental model and exercises judgment about when to follow the AI and when to override it.

That judgment is the skill worth building. Everything else is tooling.

Cup of Wit is a newsletter about AI strategy and business architecture — for leaders who want to think clearly about AI without the buzzwords.