It used to be super common that when you spotted a bot post and clicked through to the user's history, you'd see very average, human-looking activity from years ago, followed by a long gap of inactivity, and then a flurry of obvious bot comments.
Given their history and karma, it's very obvious that these accounts were abandoned and then either bought from their original owners or, more likely, bought from someone who compromised them.
And I would bet money that Reddit is well aware of this phenomenon, because not long after it became so common as to be impossible to ignore, they papered over it by allowing users to hide their history from public view. (AFAIK subreddit moderators can still see it, but typical users now have much less ability to see whether they're interacting with actual humans.)
Sure. C has never been the only language supported on Windows.
For instance, Delphi had a period of popularity for Windows application development, and AFAIK it has always used its own runtime library which is completely independent of the C runtime.
Go does not trigger low-level system call interrupts on Windows. (It does that on Linux, but Windows syscall numbers are not stable even across minor Windows updates, so if Go did that, its Windows binaries would be incredibly fragile.)
On Windows NT, Go uses the userspace wrappers provided in Windows system libraries such as NTDLL.DLL and KERNEL32.DLL. But those too are entirely separate from the C runtime.
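To make that concrete, here is a minimal, Windows-only sketch (it only builds with GOOS=windows) of calling one of those userspace wrappers from Go at the same stable boundary the runtime itself targets. GetTickCount64 is just a convenient, documented export of KERNEL32.DLL chosen for illustration:

```go
//go:build windows

package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Resolve a documented wrapper exported by KERNEL32.DLL at run time.
	// No raw syscall numbers are involved; the DLL's export table is the
	// stable interface Microsoft actually supports.
	kernel32 := syscall.NewLazyDLL("kernel32.dll")
	getTickCount64 := kernel32.NewProc("GetTickCount64")

	ms, _, _ := getTickCount64.Call() // milliseconds since boot
	fmt.Println("uptime (ms):", ms)
}
```

Note that nothing here touches the C runtime (MSVCRT/UCRT): the lookup goes straight from Go to the system DLL.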
It's both, isn't it? If the AI writes the policy and is also responsible for enforcing it (by handling tickets and acting as a gatekeeper for which issues are escalated to humans who can do something about them), then the hallucination becomes real.
As I read it, they didn't look up the account to process the refund. They looked up the account to decide whether to process the refund, and then the decision was "no".
The rest of the support response is just pleasantries and padding, to dance around this fact ("Your detailed reproduction steps will be valuable" blah blah).
Oh, what I wouldn't give to see the system prompt that tells Claude what it is or isn't "able" to give refunds for. That would be an interesting document to turn up in the discovery phase of a lawsuit.
I have no real quibble with the blog post itself, but I take issue with the title that calls it a "vintage model".
The blog post defines a "vintage model" as one that is trained only on data before a particular cutoff point:
> Vintage LMs are contamination-free by construction, enabling unique generalization experiments [...] The most important objective when training vintage language models is that no data leaks into the training corpus from after the intended knowledge cutoff
But as they acknowledge later, there are multiple major data leakage issues in their training pipeline, and their model does in fact have quite a bit of anachronistic knowledge. So it fails at what they call the most important objective. It's fair to say that they are working toward something that meets their definition of "vintage", but they're not there yet.
Yeah, the blog distinguishes "contamination," which it describes as polluting the training data with answers to benchmark questions, from "temporal leakage," which is polluting the training data with writing from after the target date, but those seem to be nearly the same problem.
Not necessarily. The former is about data that's supposed to be in there, but that lets a benchmark measure the model's recall rather than its reasoning (i.e. rather than actually generalizing a certain writing style, it just recites some passage it already knows in that style).
The latter is data that isn't supposed to be in there at all; in this case, anything written after 1930.
Also, running the network stack on a separate core allows it to be encrypted and signed, so that end users can't (easily) reverse engineer it. Which sucks for those of us who would like to run fully open-source code without binary blobs.
For instance, compare the reference manuals for the STM32WL3R and the STM32WB microcontrollers. The former has a single CPU, and it has almost 250 pages of detailed documentation about exactly how the hardware is controlled at a register level. The latter runs the network stack on an auxiliary CPU, and the manual just has a block diagram and a sentence that says "use our drivers" (which are only available in encrypted format).
If I take something in the public domain and make a derivative work, the original remains in the public domain, and I retain ownership of whatever additions or modifications I created. So I can attach whatever conditions I want to the copying of those additions.
For instance, Disney's "Sleeping Beauty" was protected by copyright when it was released, even though it was based on a centuries-old fairy tale that was in the public domain.
We should have learned this lesson 20 years ago when researchers were able to deanonymize a lot of the Netflix Prize dataset, which contained nothing except movie ratings and their associated dates.
If movie ratings are vulnerable to pattern-matching from noisy external sources, then it should be obvious that location data is enormously more vulnerable.
> In contrast to previous attacks on micro-data privacy [22], our de-anonymization algorithm does not assume that the attributes are divided a priori into quasi-identifiers and sensitive attributes. [...] Examples include anonymized transaction records (if the adversary knows a few of the individual's purchases, can he learn all of her purchases?), recommendation and rating services (if the adversary knows a few movies that the individual watched, can he learn all movies she watched?), Web browsing and search histories [12], and so on. In such datasets, it is impossible to tell in advance which attributes might be available to the adversary.
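The core idea is simple enough to sketch. This toy illustration is not the paper's actual scoring function (the movie names, thresholds, and dataset are invented for the example): score each anonymized record by how many of the adversary's noisy auxiliary observations it approximately matches, then pick the best-scoring record.

```go
package main

import "fmt"

// Rating is one (movie, stars, day) entry in an anonymized record.
// All names and numbers here are invented for illustration.
type Rating struct {
	Movie string
	Stars int
	Day   int // days since some fixed epoch
}

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

// score counts how many auxiliary observations approximately match a
// record: same movie, rating within 1 star, date within 14 days. The
// real algorithm uses weighted statistical scoring; this is the gist.
func score(aux, record []Rating) int {
	s := 0
	for _, a := range aux {
		for _, r := range record {
			if a.Movie == r.Movie && abs(a.Stars-r.Stars) <= 1 && abs(a.Day-r.Day) <= 14 {
				s++
				break
			}
		}
	}
	return s
}

func main() {
	// Toy "anonymized" dataset: two subscribers' rating histories.
	dataset := map[string][]Rating{
		"record-A": {{"Heat", 5, 100}, {"Alien", 4, 130}, {"Fargo", 2, 200}},
		"record-B": {{"Heat", 3, 400}, {"Brazil", 5, 410}},
	}
	// Noisy auxiliary knowledge about the target, e.g. from public reviews.
	aux := []Rating{{"Alien", 5, 135}, {"Fargo", 2, 190}}

	best, bestScore := "", -1
	for id, rec := range dataset {
		if s := score(aux, rec); s > bestScore {
			best, bestScore = id, s
		}
	}
	fmt.Println(best, bestScore) // record-A 2
}
```

Even with the dates off by several days and one rating off by a star, the auxiliary information singles out one record, which is exactly the sparsity the paper exploits.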
> Is using puppeteer to automate a form submission a violation of ToS? If so then why is using a screen reader not?
Without taking a position on the ethics of automation, surely this isn't a serious question? Things that the ToS prohibits you from doing are ToS violations, and other things aren't.
For instance, from AirBnb's terms of service: "Do not use bots, crawlers, scrapers, or other automated means to access or collect data or other content from or otherwise interact with the Airbnb Platform."
There is no similar prohibition against using screen readers.
My broader point is that these ToS clauses are often so broad and vague that they're essentially unenforceable and not meaningful in practice. "Do not use bots" covers a substantial amount of ground, and intent isn't exactly something you can screen for. Is an autofill Chrome extension a bot? If so, what separates that autofill from an accessibility extension? Is someone using Whispr flow to fill forms a bot? And Airbnb doesn't block Google's crawler; why not? A company can enforce its ToS however it wishes. My general point is that the waters are murky, and that automation is a sliding scale.
> For instance, from AirBnb's terms of service: "Do not use bots, crawlers, scrapers, or other automated means to access or collect data or other content from or otherwise interact with the Airbnb Platform."
> There is no similar prohibition against using screen readers.
A screen reader uses automated means to access or collect data or other content from or otherwise interact with a platform.
Under that ToS would a screen reader not be considered “other automated means” of “interacting with” the platform? It is automatically walking an accessibility tree.
Ah yes, AirBnB, the company that famously hacked Craigslist to achieve viral growth by using a bot, crawler, scraper, and definitely automated means to access and collect Craigslist's data and other content from, and otherwise interact with, the Craigslist platform.