Residential proxies are the only way to crawl and scrape. It's ironic for this article to come from the biggest scraping company that ever existed!
If you crawl at 1 Hz per crawled IP, no reasonable server would suffer from this (minimal sketch below). It's the few bad apples (impatient people who don't rate limit) who ruin the internet for users and hosts alike. And then there's Google.
First off: Google has not once crashed one of our sites with GoogleBot. They have never tried to bypass our caching, and they are open and honest about their IP ranges, allowing us to rate-limit if needed.
The residential proxies are not needed, if you behave. My take is that you want to scrape stuff that site owners do not want to give you, and you don't want to be told no or perhaps pay for a license. That is the only case where I can see you needing residential proxies.
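For what it's worth, staying at that kind of rate takes only a few lines of code. A minimal sketch of a polite fetcher, assuming a ~1 request/second per-host budget; the user-agent string and contact address are placeholders, not a real bot:

```python
import time
import urllib.robotparser
from urllib.parse import urlparse
from urllib.request import Request, urlopen

PER_HOST_DELAY = 1.0  # ~1 Hz per host, as discussed above
UA = "examplebot/0.1 (contact: ops@example.com)"  # placeholder identity
last_hit = {}   # host -> timestamp of the last request
robots = {}     # host -> cached RobotFileParser

def polite_get(url):
    host = urlparse(url).netloc
    # honour robots.txt (fetched once per host, then cached)
    if host not in robots:
        rp = urllib.robotparser.RobotFileParser(f"https://{host}/robots.txt")
        rp.read()
        robots[host] = rp
    if not robots[host].can_fetch(UA, url):
        return None
    # wait until at least PER_HOST_DELAY has passed since the last hit on this host
    wait = PER_HOST_DELAY - (time.time() - last_hit.get(host, 0.0))
    if wait > 0:
        time.sleep(wait)
    last_hit[host] = time.time()
    return urlopen(Request(url, headers={"User-Agent": UA}), timeout=10).read()
```

Caching robots.txt and identifying the bot in the User-Agent are what make the rate limit useful to the site operator, since they can then throttle or block it by name.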
>The residential proxies are not needed, if you behave
I'm starting to think that some users on Hacker News do not 'behave', or at least that they think they do, and provide an alibi for those that do not 'behave'.
That the 'hacker' in Hacker News attracts not just hackers as in 'hacking together features' but also hackers as in 'illegitimately gaining access to servers/data'.
As far as I can tell, as a hacker that hacks features together, resi proxies are something the enemy uses. Whenever I boot up a server and get 1000 login requests per second, plus requests for commonly exploited files from Russian and Chinese IPs, those come from resi IPs no doubt. There are two sides to this match, no more.
One thing about Google is that many anti-scraping services explicitly allow access to Google and maybe a couple of other search engines. Everybody else gets to enjoy the Cloudflare captcha, even when crawling at reasonable speeds.
Do we think a scraper should be allowed to take whatever means necessary to scrape a site if that site explicitly denies that scraper access?
If someone is abusing my site, and I block them in an attempt to stop that abuse, do we think that they are correct to tell me it doesn't matter what I think and to use any methods they want to keep abusing it?
I'd still like the ability to just block a crawler by its IP range, but these days nope.
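Blocking by range is the easy part when a crawler actually publishes its ranges (Googlebot does, as noted upthread). A rough sketch using Python's ipaddress module; the CIDRs are documentation prefixes standing in for real published ranges. The point above is exactly that residential-proxy traffic gives you nothing stable to put in such a list:

```python
from ipaddress import ip_address, ip_network

# Documentation-prefix CIDRs used as placeholders for a crawler's published ranges.
BLOCKED = [ip_network("203.0.113.0/24"), ip_network("2001:db8::/32")]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls in any blocked range."""
    addr = ip_address(client_ip)
    return any(addr in net for net in BLOCKED)

# e.g. in whatever request handler or middleware you use:
# if is_blocked(client_address): respond with 403
```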
1 Hz is 86400 hits per day, or 600k hits per week. That's just one crawler.
Just checked my access log... 958k hits in a week from 622k unique addresses.
95% of it is fetching random links from the u-boot repository that I host, in a completely random order. I blocked all of the GCP/AWS/Alibaba and of course Azure cloud IP ranges.
It's almost all now just coming from "residential" and "mobile" IP address space in completely random places all around the world. I'm pretty sure my u-boot fork is not that popular. :-D
Every request is a new IP address, and the available IP space of the crawler(s) spans millions of addresses.
I don't host a popular repo. I host a bot attraction.
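For anyone who wants the same per-IP numbers from their own logs, a quick sketch; it assumes the client address is the first field of each line (as in the common/combined log formats) and the filename is a placeholder:

```python
from collections import Counter

hits = Counter()
with open("access.log") as log:       # placeholder path
    for line in log:
        fields = line.split(maxsplit=1)
        if fields:                    # skip blank lines
            hits[fields[0]] += 1      # first field = client address

print(f"{sum(hits.values())} hits from {len(hits)} unique addresses")
print("top talkers:", hits.most_common(10))
```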
My wild guess is that jamming is local. Major cities may be fully jammed. To get an idea about GNSS jamming range (different signal of course, probably much easier to jam), there are maps online where you can see which parts of Europe are currently GNSS-jammed. But I have the same question as you.
Definitely much easier to jam. Much higher orbits for GNSS satellites, much lower signal intensity.
Also, starlink uses phased arrays with beamforming, effectively creating an electronically steerable directional antenna. It is harder to jam two directional antennas talking to each other, as your jammers are on the sides, where the lobes of the antenna radiation pattern are smaller.
Still, we're talking about signals coming from space, so maybe it is just enough to sprinkle more jammers around an urban setting. I'm curious as well.
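To put rough numbers on the signal-strength point: free-space path loss grows as 20·log10(distance), so the altitude difference between GPS (roughly 20,200 km) and a Starlink shell (roughly 550 km) is worth about 31 dB on its own. A back-of-the-envelope sketch; slant range, transmit power and beamforming gain are all ignored:

```python
from math import log10, pi

C = 299_792_458.0  # speed of light, m/s

def fspl_db(distance_m, freq_hz):
    # free-space path loss in dB: 20*log10(d) + 20*log10(f) + 20*log10(4*pi/c)
    return 20 * log10(distance_m) + 20 * log10(freq_hz) + 20 * log10(4 * pi / C)

gps_alt = 20_200e3   # GPS MEO altitude in metres (~20,200 km)
leo_alt = 550e3      # typical Starlink shell altitude in metres (~550 km)

print(f"GPS L1 (1575.42 MHz) from MEO: {fspl_db(gps_alt, 1.57542e9):.1f} dB")
print(f"Ku-band (~12 GHz) from LEO:    {fspl_db(leo_alt, 12e9):.1f} dB")
print(f"distance term alone favours LEO by {20 * log10(gps_alt / leo_alt):.1f} dB")
```

The higher Ku-band frequency claws some of that back, but it gives a feel for why a weak signal from MEO is the softer jamming target.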
The GPS jamming maps are based on commercial air traffic flying in the area.
While that gives some idea of how widespread the jamming is, it won't give accurate information about the range of the interference (air traffic avoids areas with jamming), or any information from places where there is no commercial air traffic (war zones, etc.).
A number of the Autodesk tools, and SolidWorks, for modeling. Slicers can use APIs native to Windows to perform model repairs. Bambu Lab's farm manager only runs on Windows.
Not sure about Autodesk, but have you tried FreeCAD? I own a perpetual SolidWorks license but haven't even activated it. I used SolidWorks quite a lot on another license, but I just prefer FreeCAD so much. It does choke on high primitive counts, though. It probably has worse FEA (it invokes external simulation tools), but that is an assumption; I never did FEA. I mostly did parametric CAD and not many technical drawings, so I can't say much about that either.
For slicers I use PrusaSlicer on Linux (don't have a Prusa; it's really good for generic slicing). But I can see how Bambu stuff could be an issue if it's Win only and not Wineable.
Bambu (and other slicers in the same Prusa Slicer family) runs fine on Windows and Mac. It's the automatic model repair that gives it a leg up on Windows.
The Creality one runs decently on Mac and Windows; sadly, on Linux it's a nightmare, and it's technically why I ditched Ubuntu / Pop!_OS for Arch Linux. But I can't help but still feel it runs a little weirder, plus it's out of date compared to the Mac and Windows versions. My buddy used to use Orca Slicer on my printer; that one IIRC should run on Mac too, but I haven't tried it.
SuperSlicer, PrusaSlicer and Creality Print work fine for me on Debian. Orca Slicer runs but reliably crashes when opening the preferences window, something it has been doing for a long time according to the bug report. Cura also works fine on Debian for me. Which problems did you have running any of these?
Does Creality make special changes to the slicer? If it's just the profiles, then running the PrusaSlicer AppImage might be the easiest option. The PrusaSlicer AppImage has always worked perfectly on Ubuntu 22 LTS for me.
Not the person you replied to, but I’ll go. Try experimenting with ham radio on anything but Windows. As far as I can tell, they revoke your Apple developer’s license and confiscate your Linux install disks when you start selling radio hardware.
That’s not completely true. There’s good Linux and Mac software for lots of things. But approximately 100% of radio manufacturers ship Windows software. Far fewer support anything else.
I bought a new radio at Christmas. Before buying it, I ruled out alternatives that didn’t have 1st party or good 3rd party support. It’s like trying to buy a scanner in 2003.
Are you being sarcastic or serious? Meeting requirements is implicitly part of any task. Quality/quantification will be embedded in the tasks (e.g. X must be Y <unit>); code style and quality guidelines are probably there somewhere in his task templates. And the explicit portions of tasks will implicitly be covered by testing.
I do think it's overly complex, though. But it's a novel concept.
Everything you said is also done for regular non-AI development. OP is saying there is no way to compare the two (or even compare version X of gas town to version Y of gas town), because there are zero statistics or metrics on what gas town produces.
First time I'm seeing this on HN. Maybe it was posted earlier.
I've been doing manual orchestration, where I write a big spec that contains phases (each done by an agent) and instructions for the top-level agent on how to interact with the sub-agents. It works well, but it's hard to utilize effectively. No doubt this is the future. This approach is bottlenecked by limitations of the CC client, mainly that I cannot see inter-agent interactions fully, only the tool calls. Using a hacked client or a compatible reimplementation of CC may be the answer, unless the API were priced attractively or other models could do the work. Gemini 3 may be able to handle it better than Opus 4.5. The Gemini 3 pricing model is complex, to say the least, though (really).
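For anyone curious, the rough shape of that orchestration is sketched below. run_agent() and the phase names are hypothetical stand-ins for however you invoke an agent; this is not Claude Code's actual API:

```python
# Rough shape of the spec-driven orchestration described above.
# run_agent() is a hypothetical stand-in for an agent runner
# (CLI subprocess, API call, etc.); it is NOT Claude Code's real interface.

def run_agent(role: str, prompt: str) -> str:
    raise NotImplementedError("wire this to your agent runner of choice")

PHASES = ["design", "implement", "test", "review"]  # illustrative phase names

def orchestrate(spec_path: str = "spec.md") -> str:
    context = open(spec_path).read()   # the big spec with phases
    for phase in PHASES:
        # the top-level agent decides what the sub-agent for this phase should do
        task = run_agent(
            "orchestrator",
            f"Given the spec and progress so far:\n{context}\n"
            f"Write instructions for the '{phase}' sub-agent.",
        )
        result = run_agent(phase, task)
        # feed the sub-agent's output back so later phases can see it
        context += f"\n\n## {phase} output\n{result}"
    return context
```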
> state of the art LLMs are able to complete large subtasks or medium size projects alone, almost unassisted, given a good set of hints about what the end result should be
No. I agree with the author, but it's hyperbolic of him to phrase it like this. If you have solid domain knowledge, you'll steer the model with detailed specs. It will carry those out competently and multiply your productivity. However, the quality of the output still reflects your state of knowledge; it just provides leverage. Given the best tractors, a good farmer will have much better yields than a shit one. Without good direction, even Opus 4.5 tends to create massive code repetition. That is easy to fix if you know what you are doing, albeit in a refactor pass.
I feel like a lot of the disagreement over this "large project" capability is that "large project" can mean anything. It can mean something that has a trillion GitHub repos to work with, or it can mean something that is basically uncharted territory.
If this only works for people with 10+ years of domain experience, doesn't that make this an anti-AI article? The whole vibe-coding pitch sells on the promise that it works, and that it works for every Tom and their mom.
One is LLMs writing code. Not everything, and not for everyone, but they are useful for most of the code being written.
What it does not do (yet, if ever) is bridge the gap from "idea" to a working solution. This is precisely where all the low-code ideas of the past decades fell apart. Translating an idea into formal rules is very, very hard.
Think of all of the "just add a button there"-type comments we've all suffered.
Yes, that's how I see it too. It's a productivity multiplier, but it depends on what you put in.
Sure, Opus can work fully on its own if you just tell it "add a button that does X", but do that 20 times and the result turns into mush. Steer the model with detailed tech specs, on the other hand, and the output becomes magical.
Nonsense indeed. The model's knowledge is the current state of the art, and any computation it does advances it. It re-ingests the work of prior agents every time you run it on your codebase, so even though the model initializes the same way (until they update the model), upon repeated calls it ingests more and more novel information, inching the state of the art ever forwards.
I've seen terrible things where it would overcomplicate and duplicate. But I've also seen it write really good code. I've been trying to get it to do the latter consistently. Detailed specs and heavy use of agents really helps with the code quality. The next step is editing the system prompts, to trim away any of the fat that's polluting the context.
Are you hand-fixing the issues or having the AI do it? I've found that second-pass quality is miles ahead of the initial implementation. If you're experienced, you'll know exactly where the code smells are. Point them out, and the agents will produce a much better implementation in that second pass. And have those people store the prompts in the repo! I put my specifications in ./doc/spec/*.md
Every time I got bad results, looking back I noticed my spec was just vague or relied on assumptions. Of course you can't fix your colleagues; if they suck they suck, and somebody's gotta do the mopping :)