bfeynman's comments

Aren't the leading labs currently chasing not pretraining and massive parameter counts, but enriched, deep fine-tuning and post-training for agentic tasks/coding? MoE plus new post-training paradigms lets smaller models perform quite well, and it's much more pragmatic for scaling inference. Given that, this choice seems super odd, as the frontier labs stay neck and neck, and I don't even see Grok used in any benchmarks because of how poorly it performs.

Nice read, but it falls into a vast reductionist trap: a lot of survivorship bias dressed up as design philosophy or strategic bets. The context of decisions made decades ago != now; people were working under different constraints, etc. Framing the avionics example as "subtractive" innovation is the most egregious: transistors were over 1000x smaller, so weight wasn't even a consideration.

> a lot of survivorship bias dressed up as design philosophy or strategic bets

I wish more people realized this


Robinhood did the exact same thing; it's more for marketing reach and distribution. Wouldn't be surprised if in a few years they let it go or spin it down. They're just paying for a funnel/some narrative control.

Pretty horrifying. I only use it as a lightweight wrapper and will most likely move away from it entirely. Not worth the risk.


Even just having an import statement for it is enough to trigger the malware in 1.82.8.
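For anyone wondering why an import alone is enough: most languages run a module's top-level statements the moment it is imported, so no function ever needs to be called. A toy Python illustration (the module name `evil_pkg` is made up for the demo):

```python
import pathlib
import sys
import tempfile

# Write a fake "compromised" module to disk. Its only content is a
# top-level statement standing in for the real payload.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "evil_pkg.py").write_text(
    "payload_ran = True  # stand-in for the real payload\n"
)
sys.path.insert(0, tmp)

import evil_pkg  # the import statement alone executes the module body

print(evil_pkg.payload_ran)  # True -- side effect already happened
```

The same applies to JS package entry points: loading the module runs its top-level code, which is why merely having the dependency imported somewhere is enough exposure.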


not to mention they are using 3p APIs for everything... Gemini, reranking, etc.


I've lost count of how many of these startups I've seen. What I really can't fathom is the pricing, which is completely out of band. You can already talk to files directly with Gemini, so just wrapping other APIs makes no sense. This is stuff you can now easily codegen entire solutions for, especially object-storage-based ones. I don't see any actual value add or differentiators here. It's obviously not that secure, and the ingestion pipeline/connectors are also a commodity.


You're right that you can chat with files using Gemini or a codegen'd RAG pipeline, and that does work well for a lot of teams.

The problem that Captain really addresses comes when production pipelines need to run continuously over large file corpora with fast, incremental indexing, and reliable latency. The maintenance required in these situations is often quite significant.

Captain focuses specifically on making sure the retrieval layer can operate smoothly so folks don't have to scale & maintain the infrastructure themselves.
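To make "incremental indexing" concrete, here's a minimal sketch of the general idea: re-embed only files whose content hash changed since the last run, so a large corpus doesn't get reprocessed end to end. All names here are hypothetical, and `fake_embed` stands in for a real embedding model:

```python
import hashlib

def fake_embed(text: str) -> list[float]:
    return [float(len(text))]  # placeholder vector

class IncrementalIndex:
    def __init__(self):
        self.hashes: dict[str, str] = {}          # path -> content hash
        self.vectors: dict[str, list[float]] = {} # path -> embedding

    def update(self, files: dict[str, str]) -> list[str]:
        """Re-embed only new or changed files; return the paths touched."""
        touched = []
        for path, text in files.items():
            digest = hashlib.sha256(text.encode()).hexdigest()
            if self.hashes.get(path) != digest:
                self.vectors[path] = fake_embed(text)
                self.hashes[path] = digest
                touched.append(path)
        # Drop entries for deleted files so stale chunks never surface.
        for path in list(self.hashes):
            if path not in files:
                del self.hashes[path], self.vectors[path]
        return touched

idx = IncrementalIndex()
idx.update({"a.txt": "hello", "b.txt": "world"})          # embeds both
print(idx.update({"a.txt": "hello", "b.txt": "world!"}))  # ['b.txt']
```

The operational work is everything around this loop at scale: change detection across connectors, retries, and keeping latency bounded while the index churns.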


For use cases where the added value is ~20%, the cost of the distraction on that thin a margin is a hard sell. (Just based on your intro, that was my read.)

I'm not disputing the core idea; I think you're on the right track, but the pitch isn't compelling. People looking for these kinds of AI solutions tend to favor simplicity, and ~80% is fine, because the overall perceived productivity improvement is 5-10x, with error bars so wide that the remaining gain just isn't worth maximizing for right now.

You might be a few months to years early, or should target people who have maxed out because they can't retrieve from their second brain effectively. Most folks I've talked to are just trying to keep up; optimization/efficiency isn't on their radar.


Does anyone else find the way they're writing this to be full of marketing hubris that misconstrues how most people would interpret it? Codex isn't a "model" that is self-improving; it's GPT writing code for a wrapper program that also uses GPT. Sure, it's kind of a neat development loop, but why anthropomorphize it so much? People designing chips don't say computers are self-evolving; even Anthropic just says that Claude (the model) writes most of Claude Code. Heck, you could use Claude or any LLM to write code for Codex.


A lot of puffery here describing constraints and actually messy problems that are most likely just being thrown into the context for an LLM agent... None of the case studies demonstrates complex scheduling; they're all individual serial threads, and the buffers, preferences, and options are all simple. The hard part of scheduling is when multiple pending invites have to resolve against each other: someone asks for a meeting on a day where you already have a pending invite, and you have to weigh how far away that day is, how important the relationship is, etc.


The concurrent resolution problem you're describing is exactly what we deal with. When a staffing coordinator has 15 interviews to book across shared interviewers, confirming one cascades into the others. We track pending holds, rank by urgency, and when a confirmation on one thread invalidates a proposal on another, Vela detects the conflict and re-proposes.

The only real alternative is a booking link, but that slows down business, doesn't work in many real-life situations, and more :)
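The pending-hold bookkeeping described above can be sketched roughly like this: each proposal holds a time slot, and confirming one invalidates any overlapping pending holds so they can be re-proposed. All names are illustrative, not our actual implementation:

```python
def overlaps(a, b):
    # Half-open (start, end) intervals on a shared calendar.
    return a[0] < b[1] and b[0] < a[1]

class HoldTracker:
    def __init__(self):
        self.pending = {}    # meeting_id -> (start, end)
        self.confirmed = {}

    def propose(self, meeting_id, slot):
        self.pending[meeting_id] = slot

    def confirm(self, meeting_id):
        """Confirm one hold; return the ids that now need re-proposing."""
        slot = self.pending.pop(meeting_id)
        self.confirmed[meeting_id] = slot
        clashed = [m for m, s in self.pending.items() if overlaps(slot, s)]
        for m in clashed:
            del self.pending[m]  # invalidated; caller re-proposes these
        return clashed

t = HoldTracker()
t.propose("intro_call", (9, 10))
t.propose("panel", (9, 11))     # overlaps intro_call's slot
t.propose("debrief", (13, 14))
print(t.confirm("intro_call"))  # ['panel'] -- debrief is untouched
```

The real difficulty is layering ranking (urgency, relationship importance) on top of this loop and doing it across threads that move at different speeds.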


Fair feedback that the case studies don't show this well - they're simplified to demonstrate the flow. The multi-party dependency resolution is happening underneath but we could surface that better.

On the LLM point - agreed that context window alone doesn't cut it. The coordination and state management layer sits outside the model. We learned that the hard way early on.


openclaw, while cool, just allowed a larger tranche of technophiles who didn't necessarily have the skills, understanding, or time to do a bunch of things that have been readily available for over 1.5 years. There's value in that: there's a huge surge in the number of people who are now even able to take advantage of the novelty. Reminds me of when Hugging Face came out with Transformers and suddenly you no longer needed to wrestle with Anaconda and dependency install ordering.


All in on AWS and using GitOps with TF instead of the much more feature-rich CDK...


