Look at Browser Use.
They self-reported 89% on WebVoyager. On the hard tasks of a real benchmark, they score 8.1%. That's not a performance drop. That's a different product from the one being advertised.
To be fair, this isn't just a Browser Use problem. Look at the drop-off for every agent as tasks get harder:
Operator goes from 83% easy → 43% hard. That's a 40-point cliff.
Claude Computer Use: 90% easy → 32% hard. 58-point drop.
Browser Use: 55% easy → 8% hard. Just falls off a cliff entirely.
TinyFish: 97.5% easy → 81.9% hard. A 15.6-point drop.
The gap between easy and hard is where you see whether a system actually works or is just good at simple tasks. Every other agent loses half its ability or more when tasks get complex. We lose 15.6 points.
That's the difference between "cool demo" and "I can actually ship this."
To use WebMCP, you need:
- Chrome: Version 146.0.7672.0 or higher, which means an upcoming release.
- Flags: The "WebMCP for testing" flag must be enabled (a quick availability check is sketched below).
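If you want to confirm the flag actually took effect, a quick check from the DevTools console might look like the following. This assumes the API surfaces as navigator.modelContext, as in the public explainer; the property name in the flag-gated Chrome build may differ.

```ts
// Assumption: the flag exposes the API as navigator.modelContext.
// If the property is missing, re-check chrome://flags ("WebMCP for testing")
// and the browser version.
if ("modelContext" in navigator) {
  console.log("WebMCP API detected.");
} else {
  console.warn("WebMCP API not found; is the flag enabled?");
}
```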
What is WebMCP?
WebMCP is a proposed web standard that exposes structured tools for AI agents on existing websites. This would replace "screen-scraping" with robust, high-performance page interaction and knowledge retrieval.
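To make this concrete, here is a rough sketch of what exposing a tool might look like from a page's own JavaScript. This is an assumption-heavy illustration: the navigator.modelContext / provideContext / inputSchema / execute names follow my reading of the public proposal and may not match what ships behind the Chrome flag, and the search_orders tool and /api/orders endpoint are made up for the example.

```ts
// Hypothetical sketch only: names below follow the public WebMCP explainer
// (navigator.modelContext with a tools array) and may change before the
// spec, or the flag-gated Chrome build, stabilizes.
type WebMCPTool = {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute: (input: Record<string, unknown>) => Promise<string>;
};

declare global {
  interface Navigator {
    modelContext?: {
      provideContext(context: { tools: WebMCPTool[] }): void;
    };
  }
}

if (navigator.modelContext) {
  navigator.modelContext.provideContext({
    tools: [
      {
        name: "search_orders", // hypothetical tool for this example
        description: "Search the signed-in user's orders by keyword.",
        inputSchema: {
          type: "object",
          properties: { query: { type: "string" } },
          required: ["query"],
        },
        // The agent calls this directly instead of scraping the DOM
        // or driving the UI pixel by pixel.
        execute: async (input) => {
          const query = String(input.query ?? "");
          const res = await fetch(`/api/orders?q=${encodeURIComponent(query)}`);
          return JSON.stringify(await res.json());
        },
      },
    ],
  });
}

export {};
```

The point of the structured-tool shape is that the agent gets a typed, documented entry point into the site's existing functionality rather than inferring it from rendered pixels.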
News: The Agentic AI Foundation (AAIF) is a directed fund under the Linux Foundation, co-founded by Anthropic, Block, and OpenAI, with support from Google, Microsoft, AWS, Cloudflare, and Bloomberg. The AAIF aims to ensure agentic AI evolves transparently, collaboratively, and in the public interest through strategic investment, community building, and shared development of open standards.
This is another WebMCP use case: testing features during vibe-coding.
One prompt, 60 seconds from writing requirements to validating the implementation.
Check out this tutorial: https://screen.studio/share/y9b9Fmnc
I've been thinking about how we assess software engineering skills, and I'm curious about others' thoughts on using open source contributions as a primary metric.
My hypothesis is that real-world collaboration and communication skills, as demonstrated through open source work, are more indicative of a developer's capabilities than typical coding quizzes.
(I tried OtherBranch's sample coding problem mentioned in their post, https://www.otherbranch.com/practice-coding-problem, and that experience led me to this view.)
With the rise of AI tools, I believe the ability to use them effectively to enhance one's contributions is becoming increasingly valuable.
For those who hire or work with other developers:
1. How much weight do you give to a candidate's open source contributions?
2. Do you find that strong open source contributors tend to be better collaborators?
3. How do you balance assessing technical skills vs. communication/collaboration abilities?
I'm working on a platform to facilitate this kind of assessment, with personalized LLM support, so I'd love to hear about your experiences and thoughts!