aluzzardi's comments

aluzzardi · 2026-02-28T04:25:03 1772252703

It started with Sonnet 4.0 as a single agent and now it’s a mix of Opus 4.6 and Haiku 4.5 agents.

Opus plans the investigation and orchestrates the searches.

Haiku is the one actually querying ClickHouse and returning relevant bits

aluzzardi · 2026-02-27T23:07:08 1772233628

> it's not magic and you need to make the job of the agent easier by giving it good instructions, tools, and environments.

This. We had much better success by letting the agent pull context rather trying to push what we thought was relevant.

Turns out it's exactly like a human: if you push the wrong context, it'll influence them to follow the wrong pattern.

verdverm · 2026-02-27T23:57:05 1772236625

I'd put it somewhere in the middle, but closer to the pull end.

- I force the AGENTS.md into the system prompt if the agent reads a directory, or file within, that contains one such file. This is anecdotally very good and saves on function calls and context growth in multiple ways. Sort them. I'm now doing this with planning and long-term task tracking markdown files.

- Everything else is pull, ideally be search, yet to substantially leverage subagents for context gathering. Savings elsewhere have pushed the need out.

btw, hi Al, I see you are working on a new company since our last collaboration, want to catch up sometime and talk shop?

aluzzardi · 2026-02-27T19:33:02 1772220782

There are 2 layers of compression:

- ZSTD (actual data compression)

- De-duplication (i.e. what you're saying)

Although AFAIK it's not "just point to it" but rather storing sorted data and being able to say "the next 2M rows have the same PR Title"

aluzzardi · 2026-02-27T18:48:20 1772218100

Mendral co-founder and post author here.

I agree with your statement and explained in a few other comments how we're doing this.

tldr:

- Something happens that needs investigating

- Main (Opus) agent makes focused plan and spawns sub agents (Haiku)

- They use ClickHouse queries to grab only relevant pieces of logs and return summaries/patterns

This is what you would do manually: you're not going to read through 10 TB of logs when something happens; you make a plan, open a few tabs and start doing narrow, focused searches.

aluzzardi · 2026-02-27T18:11:20 1772215880

> My experience with LLM generated SQL in OLTP and OLAP platforms has been a mixed bag

Models are evolving fast. If your experience is older than a few months, I encourage you to try again.

I mean this with the best intentions: it's seriously mind boggling. We started doing this with Sonnet 4.0 and the relevance was okay at best. Then in September we shifted to Sonnet 4.5 and it's been night and day.

Every single model released since then (Opus 4.5, 4.6) has meaningfully improved the quality of results

whoami4041 · 2026-02-27T18:28:26 1772216906

I totally agree. However, none of them are infallible and never will be. They're nondeterministic by nature. There is an interesting psychological nuance that I've noticed even in myself that comes with AI assistance in coding, and that's the review/approval fatigue. The model could be chugging along happily for hours and make a sudden, terrific error in the 10th hour after you've been staring at reasoning and logs endlessly. The risk of missing the terrific error in that moment is very high at the tail end of the session. The point I was making (poorly) is that in this specific domain, where businesses are making data-driven decisions on output and insights that can determine the trajectory of the entire organization, human involvement is more critical than, say, writing something like a python function with an LLM.

shad42 · 2026-02-27T19:27:06 1772220426

I agree, we automated in the Mendral agent what is time consuming for human (like debugging a flaky test), but it will need permission to confirm the remediation and open a PR.

But it's night and day to fix your CI when someone (in this case an agent) already dug into the logs, the code of the test and propose options to fix. We have several customers asking us to automate the rest (all the way to merge code), but we haven't done it for the reasons you mention. Although I am sure we'll get there sometimes this year.

whoami4041 · 2026-02-27T20:39:17 1772224757

Shameless plug here for Lexega—a deterministic policy enforcement layer for SQL in CI/CD :) https://lexega.com

There are bridges here that the industry has yet to figure out. There is absolutely a place for LLMs in these workflows, and what you've done here with the Mendral agent is very disciplined, which is, I'd venture to say, uncommon. Leadership wants results, which presses teams to ship things that maybe shouldn't be shipped quite yet. IMO the industry is moving faster than they can keep up with the implications.

aluzzardi · 2026-02-27T17:55:34 1772214934

Mendral co-founder here and author of the post.

This is an interesting approach. I definitely agree with the problem statement: if the LLM has to filter by error/fatal because of context window constraints, it will miss crucial information.

We took a different approach: we have a main agent (opus 4.6) dispatching "log research" jobs to sub agents (haiku 4.5 which is fast/cheap). The sub agent reads a whole bunch of logs and returns only the relevant parts to the parent agent.

This is exactly how coding agents (e.g. Claude Code) do it as well. Except instead of having sub agents use grep/read/tail, they use plain SQL.

buryat · 2026-02-27T18:11:41 1772215901

yeah, I saw Claude Code doing lots of grepping/find and was curious if that approach might miss something in the log lines or if loading small portion of interesting log lines into the context could help. I find frequently that just looking at ERROR/WARN lines is not enough since some might not actually be errors and some other skipped log lines might have something to look into.

And I just wanted to try MCP tooling tbh hehe Took me 2 days to create this to be honest

aluzzardi · 2026-02-27T18:26:07 1772216767

From our experience running this, we're seeing patterns like these:

- Opus agent wakes up when we detect an incident (e.g. CI broke on main)

- It looks at the big picture (e.g. which job broke) and makes a plan to investigate

- It dispatches narrowly focused tasks to Haiku sub agents (e.g. "extract the failing log patterns from commit XXX on job YYY ...")

- Sub agents use the equivalent of "tail", "grep", etc (using SQL) on a very narrow sub-set of logs (as directed by Opus) and return only relevant data (so they can interpret INFO logs as actually being the problem)

- Parent Opus agent correlates between sub agents. Can decide to spawn more sub agents to continue the investigation

It's no different than what I would do as a human, really. If there are terabytes of logs, I'm not going to read all of them: I'll make a plan, open a bunch of tabs and surface interesting bits.

prescriptivist · 2026-02-27T18:56:59 1772218619

I have an agent system analyzing time series data periodically. What I've landed on is the tools themselves pre-process time series data, giving it more semantic meaning. AKA converting timestamps to human dates, additionally preprocessing it with statistical analysis, such as calculating current windows min/mean/max value for the series as well as a the same for a trailing window and surfacing those in the data. Also adding a volatility score, and doing things like collapsing runs of similar series that aren't particularly interesting from a volatility perspective and just trying to highlight anomalous series in the window in various ways.

This isn't anything new. It's not particularly technical or novel in any way, but it seems to work pretty well for identifying anomalies and comparing series over time horizons. It's even less token efficient on small windows than piping in a bunch of json, but it seems to be more effective from an analysis point of view.

The strange thing about it is that it involves fairly deterministic analysis before we even send the data to the LLM, so one might ask, what's the point if you're already doing analysis? The answer is that LLMs can actually find interesting patterns across a lot of well presented data, and they can pick up on patterns in a way that feels like they are cross-referencing many different time series and correlate signals in interesting ways. That's where the general purpose LLMs are helpful in my experience.

Breaking out analysis into sub-agents is a logical next step, we just haven't gotten there yet.

And yeah the goal is to approximate those of us engineers who are good at RCAs in the moment, who have instincts about the system and can juggle a bunch of tabs and cross reference the signals in them.

azinman2 · 2026-02-27T19:28:01 1772220481

So how can this be a company when it’s just what Claude code already does?

almosthere · 2026-02-27T20:59:32 1772225972

You may want to also have your agents write small scripts that auto flag future logs.

Have an array of scripts to run against each log (just rust code probably for speed) and have them flag for performance, errors, intrusions, etc...

aluzzardi · 2026-02-27T16:36:34 1772210194

Post author here.

Yes, it works really well.

1) The latest models are radically better at this. We noticed a massive improvement in quality starting with Sonnet 4.5

2) The context issue is real. We solve this by using sub agents that read through logs and return only relevant bits to the parent agent’s context

hinkley · 2026-02-27T18:15:29 1772216129

So you’re not getting alerts at 2 am from hallucinations?

tclancy · 2026-02-28T01:07:04 1772240824

Not from AI no.

sollewitt · 2026-02-27T16:45:53 1772210753

I would be very interested in reading about this kind of orchestration and filtering than data acquisition if you have the energy for another post :)

shad42 · 2026-02-27T16:48:24 1772210904

We started writing very recently: https://www.mendral.com/blog - there is a another post we made yesterday about the overall architecture. And we have a long list of things we're planning to write about in more details.

Taking good note of your comment :)

aluzzardi · 2026-02-27T18:01:58 1772215318

We've actually started to gather metrics this week to write that exact post :) Coming soon!