Residential proxies are the only way to crawl and scrape. It's ironic for this article to come from the biggest scraping company that ever existed!
If you crawl at 1 Hz per crawled IP, no reasonable server would suffer from this (minimal sketch below). It's the few bad apples (impatient people who don't rate limit) who ruin the internet for users and hosts alike. And then there's Google.
First off: Google has not once crashed one of our sites with GoogleBot. They have never tried to bypass our caching, and they are open and honest about their IP ranges, allowing us to rate-limit if needed.
The residential proxies are not needed, if you behave. My take is that you want to scrape stuff that site owners do not want to give you, and you don't want to be told no or perhaps pay for a license. That is the only case where I can see you needing residential proxies.
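For what it's worth, staying at that kind of rate takes only a few lines of code. A minimal sketch of a polite fetcher, assuming a ~1 request/second per-host budget; the user-agent string and contact address are placeholders, not a real bot:

```python
import time
import urllib.robotparser
from urllib.parse import urlparse
from urllib.request import Request, urlopen

PER_HOST_DELAY = 1.0  # ~1 Hz per host, as discussed above
UA = "examplebot/0.1 (contact: ops@example.com)"  # placeholder identity
last_hit = {}   # host -> timestamp of the last request
robots = {}     # host -> cached RobotFileParser

def polite_get(url):
    host = urlparse(url).netloc
    # honour robots.txt (fetched once per host, then cached)
    if host not in robots:
        rp = urllib.robotparser.RobotFileParser(f"https://{host}/robots.txt")
        rp.read()
        robots[host] = rp
    if not robots[host].can_fetch(UA, url):
        return None
    # wait until at least PER_HOST_DELAY has passed since the last hit on this host
    wait = PER_HOST_DELAY - (time.time() - last_hit.get(host, 0.0))
    if wait > 0:
        time.sleep(wait)
    last_hit[host] = time.time()
    return urlopen(Request(url, headers={"User-Agent": UA}), timeout=10).read()
```

Caching robots.txt and identifying the bot in the User-Agent are what make the rate limit useful to the site operator, since they can then throttle or block it by name.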
>The residential proxies are not needed, if you behave
I'm starting to think that some users on Hacker News do not 'behave', or at least that they think they do, and provide an alibi for those that do not 'behave'.
That the 'hacker' in Hacker News attracts not just hackers as in 'hacking together features' but also hackers as in 'illegitimately gaining access to servers/data'.
As far as I can tell, as a hacker that hacks features together, resi proxies are something the enemy uses. Whenever I boot up a server and get 1000 login requests per second, plus requests for commonly exploited files from Russian and Chinese IPs, those come from resi IPs no doubt. There are two sides to this match, no more.
One thing about Google is that many anti-scraping services explicitly allow access to Google and maybe a couple of other search engines. Everybody else gets to enjoy the Cloudflare captcha, even when crawling at reasonable speeds.
Do we think a scraper should be allowed to take whatever means necessary to scrape a site if that site explicitly denies that scraper access?
If someone is abusing my site, and I block them in an attempt to stop that abuse, do we think that they are correct to tell me it doesn't matter what I think and to use any methods they want to keep abusing it?
I'd still like the ability to just block a crawler by its IP range, but these days nope.
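Blocking by range is the easy part when a crawler actually publishes its ranges (Googlebot does, as noted upthread). A rough sketch using Python's ipaddress module; the CIDRs are documentation prefixes standing in for real published ranges. The point above is exactly that residential-proxy traffic gives you nothing stable to put in such a list:

```python
from ipaddress import ip_address, ip_network

# Documentation-prefix CIDRs used as placeholders for a crawler's published ranges.
BLOCKED = [ip_network("203.0.113.0/24"), ip_network("2001:db8::/32")]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls in any blocked range."""
    addr = ip_address(client_ip)
    return any(addr in net for net in BLOCKED)

# e.g. in whatever request handler or middleware you use:
# if is_blocked(client_address): respond with 403
```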
1 Hz is 86400 hits per day, or 600k hits per week. That's just one crawler.
Just checked my access log... 958k hits in a week from 622k unique addresses.
95% of it is fetching random links from the u-boot repository that I host, in a completely random order. I blocked all of the GCP/AWS/Alibaba and of course Azure cloud IP ranges.
It's almost all now just coming from "residential" and "mobile" IP address space in completely random places all around the world. I'm pretty sure my u-boot fork is not that popular. :-D
Every request is a new IP address, and the available IP space of the crawler(s) spans millions of addresses.
I don't host a popular repo. I host a bot attraction.
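For anyone who wants the same per-IP numbers from their own logs, a quick sketch; it assumes the client address is the first field of each line (as in the common/combined log formats) and the filename is a placeholder:

```python
from collections import Counter

hits = Counter()
with open("access.log") as log:       # placeholder path
    for line in log:
        fields = line.split(maxsplit=1)
        if fields:                    # skip blank lines
            hits[fields[0]] += 1      # first field = client address

print(f"{sum(hits.values())} hits from {len(hits)} unique addresses")
print("top talkers:", hits.most_common(10))
```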
My wild guess is that jamming is local. Major cities may be fully jammed. To get an idea about GNSS jamming range (different signal of course, probably much easier to jam), there are maps online where you can see which parts of Europe are currently GNSS-jammed. But I have the same question as you.
Definitely much easier to jam. Much higher orbits for GNSS satellites, much lower signal intensity.
Also, starlink uses phased arrays with beamforming, effectively creating an electronically steerable directional antenna. It is harder to jam two directional antennas talking to each other, as your jammers are on the sides, where the lobes of the antenna radiation pattern are smaller.
Still, we're talking about signals coming from space, so maybe it is just enough to sprinkle more jammers around an urban setting. I'm curious as well.
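To put rough numbers on the signal-strength point: free-space path loss grows as 20·log10(distance), so the altitude difference between GPS (roughly 20,200 km) and a Starlink shell (roughly 550 km) is worth about 31 dB on its own. A back-of-the-envelope sketch; slant range, transmit power and beamforming gain are all ignored:

```python
from math import log10, pi

C = 299_792_458.0  # speed of light, m/s

def fspl_db(distance_m, freq_hz):
    # free-space path loss in dB: 20*log10(d) + 20*log10(f) + 20*log10(4*pi/c)
    return 20 * log10(distance_m) + 20 * log10(freq_hz) + 20 * log10(4 * pi / C)

gps_alt = 20_200e3   # GPS MEO altitude in metres (~20,200 km)
leo_alt = 550e3      # typical Starlink shell altitude in metres (~550 km)

print(f"GPS L1 (1575.42 MHz) from MEO: {fspl_db(gps_alt, 1.57542e9):.1f} dB")
print(f"Ku-band (~12 GHz) from LEO:    {fspl_db(leo_alt, 12e9):.1f} dB")
print(f"distance term alone favours LEO by {20 * log10(gps_alt / leo_alt):.1f} dB")
```

The higher Ku-band frequency claws some of that back, but it gives a feel for why a weak signal from MEO is the softer jamming target.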
The GPS jamming maps are based on commercial air traffic flying in the area.
While that gives some idea of how widespread the jamming is, it won't give accurate information about the range of the interference (air traffic avoids areas with jamming), or any information from places where there is no commercial air traffic (war zones, etc.).
A number of the Autodesk tools, and SolidWorks, for modeling. Slicers can use APIs native to Windows to perform model repairs. Bambu Lab's farm manager only runs on Windows.
Not sure about Autodesk, but have you tried FreeCAD? I own a perpetual SolidWorks license but haven't even activated it. I used SolidWorks quite a lot on another license, but I just prefer FreeCAD so much. It does choke on high primitive counts, though. It probably has worse FEA (it invokes external simulation tools), but that is an assumption; I never did FEA. I mostly did parametric CAD and not many technical drawings, so I can't say much about that either.
For slicers I use PrusaSlicer on Linux (don't have a Prusa; it's really good for generic slicing). But I can see how Bambu stuff could be an issue if it's Win only and not Wineable.
Bambu (and other slicers in the same Prusa Slicer family) runs fine on Windows and Mac. It's the automatic model repair that gives it a leg up on Windows.
The Creality one runs decently on Mac and Windows; sadly, on Linux it's a nightmare, and it's technically why I ditched Ubuntu / Pop!_OS for Arch Linux. But I can't help but still feel it runs a little weirder, plus it's out of date compared to the Mac and Windows versions. My buddy used to use Orca Slicer on my printer; that one IIRC should run on Mac too, but I haven't tried it.
SuperSlicer, PrusaSlicer and Creality Print work fine for me on Debian. Orca Slicer runs but reliably crashes when opening the preferences window, something it has been doing for a long time according to the bug report. Cura also works fine on Debian for me. Which problems did you have running any of these?
Does Creality make special changes to the slicer? If it's just the profiles, then running the PrusaSlicer AppImage might be the easiest option. The PrusaSlicer AppImage has always worked perfectly on Ubuntu 22 LTS for me.
Not the person you replied to, but I’ll go. Try experimenting with ham radio on anything but Windows. As far as I can tell, they revoke your Apple developer’s license and confiscate your Linux install disks when you start selling radio hardware.
That’s not completely true. There’s good Linux and Mac software for lots of things. But approximately 100% of radio manufacturers ship Windows software. Far fewer support anything else.
I bought a new radio at Christmas. Before buying it, I ruled out alternatives that didn’t have 1st party or good 3rd party support. It’s like trying to buy a scanner in 2003.
Are you being sarcastic or serious? Meeting requirements is implicitly part of any task. Quality/quantification will be embedded in the tasks (e.g. X must be Y <unit>); code style and quality guidelines are probably there somewhere in his task templates. And the explicit portions of tasks will implicitly be covered by testing.
I do think it's overly complex, though. But it's a novel concept.
Everything you said is also done for regular non-AI development. OP is saying there is no way to compare the two (or even compare version X of gas town to version Y of gas town), because there are zero statistics or metrics on what gas town produces.
First time I'm seeing this on HN. Maybe it was posted earlier.
I've been doing manual orchestration, where I write a big spec that contains phases (each done by an agent) and instructions for the top-level agent on how to interact with the sub-agents. It works well, but it's hard to utilize effectively. No doubt this is the future. This approach is bottlenecked by limitations of the CC client, mainly that I cannot see inter-agent interactions fully, only the tool calls. Using a hacked client or a compatible reimplementation of CC may be the answer, unless the API were priced attractively or other models could do the work. Gemini 3 may be able to handle it better than Opus 4.5. The Gemini 3 pricing model is complex, to say the least, though (really).
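For anyone curious, the rough shape of that orchestration is sketched below. run_agent() and the phase names are hypothetical stand-ins for however you invoke an agent; this is not Claude Code's actual API:

```python
# Rough shape of the spec-driven orchestration described above.
# run_agent() is a hypothetical stand-in for an agent runner
# (CLI subprocess, API call, etc.); it is NOT Claude Code's real interface.

def run_agent(role: str, prompt: str) -> str:
    raise NotImplementedError("wire this to your agent runner of choice")

PHASES = ["design", "implement", "test", "review"]  # illustrative phase names

def orchestrate(spec_path: str = "spec.md") -> str:
    context = open(spec_path).read()   # the big spec with phases
    for phase in PHASES:
        # the top-level agent decides what the sub-agent for this phase should do
        task = run_agent(
            "orchestrator",
            f"Given the spec and progress so far:\n{context}\n"
            f"Write instructions for the '{phase}' sub-agent.",
        )
        result = run_agent(phase, task)
        # feed the sub-agent's output back so later phases can see it
        context += f"\n\n## {phase} output\n{result}"
    return context
```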
> state of the art LLMs are able to complete large subtasks or medium size projects alone, almost unassisted, given a good set of hints about what the end result should be
No. I agree with the author, but it's hyperbolic of him to phrase it like this. If you have solid domain knowledge, you'll steer the model with detailed specs. It will carry those out competently and multiply your productivity. However, the quality of the output still reflects your state of knowledge; it just provides leverage. Given the best tractors, a good farmer will have much better yields than a shit one. Without good direction, even Opus 4.5 tends to create massive code repetition. That is easy to fix if you know what you are doing, albeit in a refactor pass.
I feel like a lot of the disagreement over this "large project" capability is that "large project" can mean anything. It can mean something that has a trillion GitHub repos to work with, or it can mean something that is basically uncharted territory.
If this only works for people with 10+ years of domain experience, doesn't that make this an anti-AI article? The whole vibe-coding pitch sells on the promise that it works, and that it works for every Tom and their mom.
One is LLMs writing code. Not everything, and not for everyone, but they are useful for most of the code being written.
What it does not do (yet, if ever) is bridge the gap from "idea" to a working solution. This is precisely where all the low-code ideas of the past decades fell apart. Translating an idea into formal rules is very, very hard.
Think of all of the "just add a button there"-type comments we've all suffered.
Yes, that's how I see it too. It's a productivity multiplier, but it depends on what you put in.
Sure, Opus can work fully on its own if you just tell it "add a button that does X", but do that 20 times and the result turns into mush. Steer the model with detailed tech specs, on the other hand, and the output becomes magical.
Nonsense indeed. The model's knowledge is the current state of the art, and any computation it does advances it. It re-ingests the work of prior agents every time you run it on your codebase, so even though the model initializes the same way (until they update the model), upon repeated calls it ingests more and more novel information, inching the state of the art ever forwards.
I've seen terrible things where it would overcomplicate and duplicate. But I've also seen it write really good code. I've been trying to get it to do the latter consistently. Detailed specs and heavy use of agents really helps with the code quality. The next step is editing the system prompts, to trim away any of the fat that's polluting the context.
Are you hand-fixing the issues or having the AI do it? I've found that second-pass quality is miles ahead of the initial implementation. If you're experienced, you'll know exactly where the code smells are. Point them out, and the agents will produce a much better implementation in that second pass. And have those people store the prompts in the repo! I put my specifications in ./doc/spec/*.md
Every time I got bad results, looking back I noticed my spec was just vague or relied on assumptions. Of course you can't fix your colleagues; if they suck they suck, and somebody's gotta do the mopping :)