We run AI inference workloads on Nvidia GPUs, and the cost is a real pain point. Projects like this matter because GPU vendor lock-in directly affects what startups can afford to build. Would love to see how this performs on common inference ops like conv2d and attention layers.
We've been building our frontend with AI assistance and the bottleneck has shifted from writing code to reviewing it. Faster tooling helps, but I wonder if the next big gain is in tighter feedback loops — seeing your changes live as the AI generates them, rather than waiting for a full build cycle.
Exactly this. And what makes it compound is that you can't build muscle memory, even for patterns you've already reviewed. Same prompt, different output every time, so every generation is a fresh read even if you've seen similar code before.
The feedback loop angle is interesting. Real-time linting during generation rather than after could help catch issues earlier, but I think the deeper problem is the non-determinism. Even with instant feedback, if the output changes on every run you're still starting from scratch each time.
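For the linting-during-generation idea, this is roughly the shape I have in mind. Just a sketch, assuming ESLint 9's programmatic Linter API with flat config; the chunk stream is faked with an array, and the component, rules, and file contents are placeholder choices:

    import { Linter } from "eslint";

    // ESLint 9: flat config is the default for the Linter class.
    const linter = new Linter();

    // Stand-in for the model's token stream (in reality, chunks from a streaming API).
    const chunks = [
      "export function formatPrice(cents) {\n",
      "  const dollars = cents / 100;\n",
      "  return currency.format(dollars);\n", // `currency` is never defined
      "}\n",
    ];

    let buffer = "";
    for (const chunk of chunks) {
      buffer += chunk;
      const messages = linter.verify(buffer, {
        languageOptions: { ecmaVersion: "latest", sourceType: "module" },
        rules: { "no-undef": "error", "no-unused-vars": "warn" },
      });

      // Mid-stream the buffer is usually incomplete, so skip pure parse
      // errors and only surface findings once the code actually parses.
      if (messages.some((m) => m.fatal)) continue;

      for (const m of messages) {
        console.log(`line ${m.line}: ${m.message} [${m.ruleId}]`);
      }
    }

The point of the sketch is the checkpointing: ignore the inevitable parse errors while the buffer is incomplete and lint at each parseable state, so problems like the undefined `currency` surface during generation instead of in one pass at the end.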
Have you found anything that actually reduces the review time per component, or is it mostly about finding issues faster?
Are your frontend builds actually so slow that you're not seeing them live? I've gotten used to most frontend builds being single-digit seconds or less for what feels like a decade now.
Not build speed, the human review cycle. When the AI generates a component, I still need to read through it manually to make sure it does what I intended, handles edge cases, and fits the existing patterns. That takes 8-12 minutes per component regardless of how fast the build is.
The slow part is not the computer. It is me reading AI-generated code line by line before I trust it enough to ship.
Excited to see the improvements in coding benchmarks. I use Claude daily and the jump in reliability from 4.5 to 4.6 has been noticeable, especially for debugging complex multi-step workflows.