Hi,

This post was written by AI as an experiment. I put a lot of effort into the guidelines and ran several rounds of improvements, but the result is still hard to grasp. It’s not bad, but it’s not really good either.

I see this as a proxy for the current state of agentic coding. Everyone uses it, but if you look closely at the outcome, it is often shallow. If you raise concerns about quality, responsibility, and ownership, you will always be ignored because of the productivity gain.

  • Who owns the quality of code that nobody fully understands?
  • Who is responsible when the generated solution is plausible, but wrong?
  • How do we keep ownership alive when the main argument for adoption is speed?

Normally, the responsibility lies with the engineer. But at the same time, engineers are pressured to jump on the bandwagon and adopt these patterns without much critical assessment. There is peer pressure, evaluation pressure, and the fear that you will be replaced if you don’t go along or if you ask questions. In the end, you may be left alone with a process you cannot fully control but still have to maintain.

I am not arguing against AI-assisted development. I use it, and I see the value. But we need to be more intentional about how we adopt it. When speed becomes the only argument, we stop asking the harder questions about quality, responsibility, and understanding. The goal is not to slow down, but to keep the engineer in control.

Comparing Agentic Design Patterns Across Frameworks

Agentic systems are often explained through abstract patterns: one agent critiques another, several agents work in parallel, an agent reflects on failed attempts, or a planner decomposes a larger task into smaller steps. Recent books and guides describe these patterns from different angles, including Albada’s Building Applications with AI Agents, Gulli’s Agentic Design Patterns, and Dibia’s Designing Multi-Agent Systems. These sources are useful conceptually, but the practical behavior of the patterns only becomes visible when they are implemented.

The notebooks in this repository explore five common agentic workflow patterns using comparable implementations across LangChain, Google ADK, and Microsoft Agent Framework. Smolagents was originally on the list, but a compatible release was not available when the notebooks were created, so it is mentioned in each notebook but not implemented. All implementations run against a local Gemma model through a HuggingFace pipeline. The goal is not to benchmark model quality, but to observe how each framework handles orchestration, state, structured outputs, tool use, and debugging.

The examples are intentionally small, but each one stresses a different aspect of agent design. Together, they provide a compact overview of the engineering questions that appear when moving from a simple LLM call to a multi-step agentic workflow.

Simple Multi-Agent System

The first implementation uses a simple poet-and-critic workflow. One agent generates a haiku, while another evaluates whether the result should be accepted. This is a minimal example of a multi-agent system: the value does not come from complex tooling, but from separating generation and evaluation into two roles.

This pattern is useful because it shows the basic mechanics of agent handoff. LangChain, Google ADK, and Microsoft Agent Framework can all express the workflow, but they expose different levels of structure. LangChain is close to manual chain composition, while Google ADK and Microsoft Agent Framework introduce more explicit lifecycle and agent concepts.
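
The handoff itself can be sketched in plain Python, independent of any framework. This is a minimal sketch, not the notebooks' code: the retry budget, the function names, and the two stub agents are assumptions for illustration. In a real run, both callables would wrap LLM calls.

```python
from typing import Callable

def run_poet_critic(poet: Callable[[str], str],
                    critic: Callable[[str], bool],
                    topic: str,
                    max_rounds: int = 3) -> tuple[str, bool, int]:
    """Ask the poet for drafts until the critic accepts one or the budget ends."""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = poet(topic)          # generation role
        if critic(draft):            # evaluation role
            return draft, True, round_no
    return draft, False, max_rounds

# Hypothetical deterministic stand-ins for the two LLM-backed agents.
def make_stub_poet() -> Callable[[str], str]:
    drafts = iter([
        "one line only",
        "old pond in stillness\na frog leaps into water\nripples fade to hush",
    ])
    return lambda topic: next(drafts)

def stub_critic(haiku: str) -> bool:
    # Assumed acceptance rule: a haiku must have exactly three lines.
    return len(haiku.splitlines()) == 3

result = run_poet_critic(make_stub_poet(), stub_critic, "pond")
```

Keeping the two roles as separate callables is what makes the handoff visible: either side can be swapped or logged without touching the other.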

Parallelization

The second implementation explores parallelization. A research request is split into several independent subtasks: summary generation, follow-up questions, and keyword extraction. A later synthesis step combines the partial outputs.

This pattern is useful when subtasks can be handled independently before being merged. In this local setup the HuggingFace pipeline does not support concurrent calls, so the “parallel” branch actually runs sequentially, but the design still captures the intended workflow. The comparison shows how frameworks differ in their handling of concurrent agents, intermediate outputs, and synthesis steps.

The main engineering question is not whether the pattern works, but how inspectable and controllable the intermediate results are.
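
One way to keep intermediate results inspectable is to return them as a named dictionary before any synthesis happens. The sketch below uses Python's standard thread pool and stub branch functions. The branch names follow the subtasks above, while the function names and the string outputs are assumptions. With a backend that cannot handle concurrent calls, setting max_workers=1 gives the sequential behavior described earlier.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Branch = Callable[[str], str]

def fan_out(request: str, branches: dict[str, Branch],
            max_workers: int = 3) -> dict[str, str]:
    """Run every branch on the same request and keep each output by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, request) for name, fn in branches.items()}
        return {name: fut.result() for name, fut in futures.items()}

def synthesize(partials: dict[str, str]) -> str:
    """Merge partial outputs in a stable order (an LLM call in a real workflow)."""
    return "\n".join(f"{name}: {text}" for name, text in sorted(partials.items()))

# Hypothetical stub branches standing in for LLM-backed agents.
branches: dict[str, Branch] = {
    "summary": lambda r: f"summary of {r}",
    "questions": lambda r: f"questions about {r}",
    "keywords": lambda r: f"keywords for {r}",
}

partials = fan_out("solar storage", branches)   # inspect or log this first
report = synthesize(partials)
```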

Reflexion

The reflexion example introduces an iterative loop. An agent writes code, the code is executed against tests, and if it fails, another step reflects on the error and updates the memory for the next attempt. The concrete task is LeetCode 224 (Basic Calculator), chosen because the executor gives real pass/fail feedback rather than an LLM opinion. Finding a suitable task was not trivial: the model had memorized large parts of the public LeetCode set, so a problem unfamiliar enough to actually exercise the reflexion loop took some searching.

This pattern highlights the importance of state management. Reflexion is not only a prompting technique; it requires a loop, an execution tool, an early-exit condition, and memory that grows across attempts. Google ADK maps relatively naturally to this structure because state is part of the agent/session lifecycle. LangChain and Microsoft Agent Framework can implement the same logic, but more of the loop and memory handling remains in plain Python.
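
The four ingredients named above (loop, execution tool, early exit, growing memory) fit into a few lines of plain Python. This is a sketch under assumptions, not the notebook implementation: the coder and reflector are deterministic stubs, the test cases are a toy stand-in for the Basic Calculator task, and exec is used without any sandboxing.

```python
from typing import Callable, Optional

def run_tests(source: str, cases: list[tuple[str, int]]) -> Optional[str]:
    """Execute candidate code; return None on success, else an error message."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # a real setup should sandbox this call
        for expr, expected in cases:
            got = namespace["calculate"](expr)
            if got != expected:
                return f"calculate({expr!r}) returned {got}, expected {expected}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None

def reflexion_loop(coder: Callable[[list[str]], str],
                   reflector: Callable[[str, str], str],
                   cases: list[tuple[str, int]],
                   max_attempts: int = 3):
    memory: list[str] = []                 # grows across attempts
    for attempt in range(1, max_attempts + 1):
        source = coder(memory)
        error = run_tests(source, cases)
        if error is None:                  # early exit on passing tests
            return source, attempt, memory
        memory.append(reflector(source, error))
    return None, max_attempts, memory

# Hypothetical stubs: the coder only "improves" once memory holds a reflection.
BAD = "def calculate(s):\n    return 0\n"
GOOD = ("def calculate(s):\n"
        "    parts = s.replace('-', '+-').split('+')\n"
        "    return sum(int(p) for p in parts if p.strip())\n")

def stub_coder(memory: list[str]) -> str:
    return GOOD if memory else BAD

def stub_reflector(source: str, error: str) -> str:
    return f"Previous attempt failed: {error}"

cases = [("1 + 1", 2), ("2-1 + 2", 3)]
solution, attempts, memory = reflexion_loop(stub_coder, stub_reflector, cases)
```

The pass/fail signal comes from the executor, not from a model opinion, which is the property the task was chosen for.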

The example also shows a limitation: adding more reflection does not automatically improve results. If the prompt accumulates too much prior code and error history, the model may become anchored to earlier failed attempts. For smaller models especially, compact state may be more useful than complete state.
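
A compact state can be as simple as keeping only the most recent reflections and truncating each one before it goes back into the prompt. The helper below is a hypothetical illustration; the name compact_memory and both limits are assumptions, and suitable values would depend on the model and task.

```python
def compact_memory(reflections: list[str],
                   keep_last: int = 2,
                   max_chars: int = 200) -> str:
    """Render only the newest reflections, each cut to max_chars characters."""
    recent = reflections[-keep_last:] if keep_last > 0 else []
    return "\n".join(f"- {text[:max_chars]}" for text in recent)

history = ["first failure", "second failure", "third failure"]
prompt_memory = compact_memory(history, keep_last=2)
```

Older attempts stay available in the full history for debugging, but they no longer anchor the next generation step.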

Plan and Execute

The plan-and-execute notebook is closer to a realistic data-agent workflow. The task is to clean a dirty cafe sales dataset and determine the best reliable item/location pair using deterministic business rules.

This pattern decomposes the work into planning, routing, specialist execution, verification, possible replanning, and final synthesis. The specialists themselves are deterministic pandas functions (profiling, cleaning, repair, metric calculation, validation); the LLM only chooses which one to run next. A final-artifact gate then checks that the required outputs exist before the synthesis step is allowed to produce an answer. The model should not invent calculations; it should coordinate steps, call reliable functions, and produce a validated final response.
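
The division of labor can be illustrated with a small controller: deterministic specialists write named artifacts into a shared state, a router picks the next specialist, and a gate blocks the answer until all artifacts exist. This is a simplified sketch, not the notebook code. Plain Python records stand in for the pandas functions so it has no dependencies, the router is a stub where the LLM would sit, and "highest total revenue" is an assumed business rule.

```python
from typing import Callable, Optional

def clean(state: dict) -> None:
    # Assumed cleaning rule: drop rows with a missing item or total.
    state["clean"] = [r for r in state["raw"]
                      if r["item"] and r["total"] is not None]

def metrics(state: dict) -> None:
    totals: dict = {}
    for r in state["clean"]:
        key = (r["item"], r["location"])
        totals[key] = totals.get(key, 0.0) + r["total"]
    state["metrics"] = totals

def validate(state: dict) -> None:
    values = state["metrics"].values()
    state["validated"] = bool(state["metrics"]) and all(v >= 0 for v in values)

# specialist name -> (artifact it needs, function that produces the next one)
SPECIALISTS = {"clean": ("raw", clean),
               "metrics": ("clean", metrics),
               "validate": ("metrics", validate)}
REQUIRED = ("clean", "metrics", "validated")

def run(state: dict, choose_next: Callable[[dict], Optional[str]],
        max_steps: int = 10):
    for _ in range(max_steps):
        if all(k in state for k in REQUIRED):
            break
        needs, fn = SPECIALISTS.get(choose_next(state), (None, None))
        if fn is None or needs not in state:
            continue                     # reject unknown or premature routing
        fn(state)
    # Final-artifact gate: no answer unless every required output exists.
    missing = [k for k in REQUIRED if k not in state]
    if missing or not state["validated"]:
        raise RuntimeError(f"gate failed, missing or invalid: {missing}")
    return max(state["metrics"], key=state["metrics"].get)

def stub_router(state: dict) -> Optional[str]:
    # Stand-in for the LLM: request whichever artifact is missing first.
    for name, artifact in (("clean", "clean"), ("metrics", "metrics"),
                           ("validate", "validated")):
        if artifact not in state:
            return name
    return None

raw = [
    {"item": "Coffee", "location": "In-store", "total": 4.0},
    {"item": "Coffee", "location": "In-store", "total": 6.0},
    {"item": "Cake", "location": "Takeaway", "total": 9.0},
    {"item": None, "location": "Takeaway", "total": 50.0},
    {"item": "Tea", "location": "In-store", "total": None},
]
best = run({"raw": raw}, stub_router)
```

Because the router can only select among existing functions, a bad routing decision costs a step but cannot produce an invented number.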

Across the frameworks, the final result can be made comparable only when the output schema and evaluation rule are explicit. This is an important lesson: as agentic workflows become more complex, evaluation becomes harder. A wrong answer may come from planning, routing, tool execution, verification, synthesis, or the handoff between steps.
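
An explicit schema and comparison rule might look like the following sketch. The field names, the normalization, and the tolerance are assumptions chosen for illustration; the point is only that every framework's final output is parsed into the same structure and judged by the same function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FinalAnswer:
    item: str
    location: str
    revenue: float

def parse_answer(payload: dict) -> FinalAnswer:
    """Coerce a framework's raw output into the shared schema, or raise."""
    return FinalAnswer(item=str(payload["item"]).strip(),
                       location=str(payload["location"]).strip(),
                       revenue=float(payload["revenue"]))

def matches(answer: FinalAnswer, expected: FinalAnswer, tol: float = 0.01) -> bool:
    """Evaluation rule: same pair (case-insensitive) and revenue within tol."""
    return (answer.item.lower() == expected.item.lower()
            and answer.location.lower() == expected.location.lower()
            and abs(answer.revenue - expected.revenue) <= tol)

expected = FinalAnswer("Coffee", "In-store", 10.0)
candidate = parse_answer({"item": " coffee ", "location": "in-store", "revenue": "10.00"})
ok = matches(candidate, expected)
```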

ReAct

The ReAct implementation uses a loop where the model alternates between choosing a tool, observing the result, and deciding the next action. The task is based on a seller eligibility decision for a fictional logistics program using the Olist e-commerce dataset.

This example is useful because it exposes the cost of agentic flexibility. The agent must read the policy, calculate metrics, inspect evidence, and return a decision. However, the same decision could also be computed directly with deterministic pandas logic. In this case, ReAct is more valuable as an evaluation environment than as a production solution.

The implementation suggests that ReAct is most appropriate when the next useful action genuinely depends on previous observations and cannot be reduced to a fixed deterministic pipeline. Otherwise, the agent loop may add orchestration code, parsing logic, retries, validation, and probabilistic failure without adding much practical value. In practice, most of the logic in the ReAct example does not live inside the framework at all but in the surrounding controller: response parsing, retries, tool-argument repair, a finish-gate that checks whether required evidence has been collected, and a deterministic fallback final answer. The framework provides the loop; the controller is what keeps it reliable.
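
A stripped-down controller of this kind is sketched below, assuming the model replies with a JSON object that names a tool. The tool names, thresholds, and scripted model replies are hypothetical, and tool-argument repair is left out to keep the sketch short.

```python
import json
from typing import Callable, Optional

def parse_action(text: str) -> Optional[dict]:
    """Return the action dict, or None if the reply is not usable JSON."""
    try:
        action = json.loads(text)
    except json.JSONDecodeError:
        return None
    return action if isinstance(action, dict) and "tool" in action else None

def react_loop(model: Callable[[dict], str],
               tools: dict[str, Callable[[], float]],
               required_evidence: set[str],
               fallback: Callable[[dict], str],
               max_steps: int = 6):
    evidence: dict = {}
    for _ in range(max_steps):
        action = parse_action(model(evidence))
        if action is None:
            continue                      # unparseable reply: retry next step
        name = action["tool"]
        if name == "finish":
            # Finish gate: only accept an answer once all evidence is present.
            if required_evidence.issubset(evidence):
                return action.get("answer", fallback(evidence)), evidence
            continue
        if name in tools:
            evidence[name] = tools[name]()  # observation from a deterministic tool
    return fallback(evidence), evidence     # deterministic fallback answer

# Hypothetical tools and decision rule for the seller-eligibility task.
tools = {"late_rate": lambda: 0.04, "review_score": lambda: 4.6}

def fallback(ev: dict) -> str:
    good = ev.get("late_rate", 1.0) < 0.1 and ev.get("review_score", 0.0) >= 4.0
    return "eligible" if good else "not eligible"

# Scripted replies standing in for the LLM, including two bad ones.
script = iter([
    "not json at all",
    '{"tool": "finish", "answer": "eligible"}',   # premature, blocked by gate
    '{"tool": "late_rate"}',
    '{"tool": "review_score"}',
    '{"tool": "finish", "answer": "eligible"}',
])
decision, evidence = react_loop(lambda ev: next(script), tools,
                                {"late_rate", "review_score"}, fallback)
```

Nearly every line here is controller logic. The loop itself is a single for statement, which is the sense in which the framework provides the loop and the controller provides the reliability.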

Conclusion

These five implementations show that agentic design patterns are useful, but that usefulness does not automatically carry over to production. Their value depends on the task structure.

Simple multi-agent and parallel workflows are good entry points because they make role separation and synthesis easy to see. Reflexion emphasizes state and memory design. Plan-and-execute shows why deterministic tools, schemas, and validation are central for data workflows. ReAct demonstrates both the power and the cost of dynamic tool use.

The framework differences are real, but they are often secondary to the workflow design itself. The most important questions are: Can intermediate steps be inspected? Is state handled explicitly? Are tool outputs deterministic? Is the final answer validated? And does the task actually need an agentic workflow?

In practice, agent frameworks should be evaluated less by whether they can express a pattern, and more by how well they support debugging, evaluation, structured outputs, state management, and recovery from failure.

Thank you for your attention.