While the caveman stuff is obviously not serious, there is a lot of legit research in this area.
Which means yes, you can actually influence this quite a bit. Read the paper “Compressed Chain of Thought”, for example; it shows it’s really easy to make significant reductions in reasoning tokens without affecting output quality.
There is not much research into this (about 5 papers in total), but with what exists it’s possible to reduce output tokens by about 60%. Given that output is an incredibly significant part of the total costs, this is important.
Who would suspect that the companies selling 'tokens' would (unintentionally) train their models to prefer longer answers, reaping a higher ROI (the thing a publicly traded company is legally required to pursue; good thing these are all still private...)? Because it's not like private companies want to make money...
LLM APIs sell on value they deliver to the user, not the sheer number of tokens you can buy per $. The latter is roughly labor-theory-of-value levels of wrong.
I don’t think this is a plausible argument, as they’re generally capacity constrained, and everyone would like shorter (= faster) responses.
I’m fairly certain that in a few more releases we’ll have models with shorter CoT chains. Whether they’ll still let us see those is another question, as it seems like Anthropic wants to start hiding their CoT, potentially because it reveals some secret sauce.
Some labs do it internally because RLVR is very token-expensive. But it degrades CoT readability even more than normal RL pressure does.
It isn't free either - by default, models learn to offload some of their internal computation into the "filler" tokens. So reducing raw token count always cuts into reasoning capacity somewhat. Getting closer to "compute optimal" while reducing token use isn't an easy task.
Yeah the readability suffers, but as long as the actual output (ie the non-CoT part) stays unaffected it’s reasonably fine.
I work on a few agentic open source tools and the interesting thing is that once I implemented these things, the overall feedback was a performance improvement rather than performance reduction, as the LLM would spend much less time on generating tokens.
I didn’t implement it fully, just a few basic things like “reduce prose while thinking, don’t repeat your thoughts” etc would already yield massive improvements.
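For illustration, the kind of prompt additions described might look like the sketch below. The rule text, constant, and function name are all invented for this example, not taken from any particular tool:

```python
# Hypothetical sketch of the brevity rules described above; the rule text
# and names are invented, not taken from any particular agentic tool.
BREVITY_RULES = (
    "While thinking, keep prose to a minimum.\n"
    "Do not repeat or re-summarize earlier thoughts.\n"
    "Prefer terse notes over full sentences."
)

def with_brevity(system_prompt: str) -> str:
    # Append rather than replace, so the agent's existing behavior is kept.
    return system_prompt.rstrip() + "\n\n" + BREVITY_RULES

print(with_brevity("You are a coding agent."))
```

Appending rather than rewriting the system prompt keeps the change easy to A/B test against the unmodified agent.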
Yeah, you could easily imagine stenography-like inputs and outputs for rapid iteration loops. It's also true that on social media people already want faster-to-read snippets that drop grammar, so the desire for density is already there for human authors/readers.
That's not how it works. Many people get confused by the “expert” naming, when in reality the key part of the original name “sparse mixture of experts” is sparse.
Experts are just chunks of each layer’s MLP that are only partially activated by each token; there are thousands of “experts” in such a model (for Qwen3-30B-A3B, it was 48 layers x 128 “experts” per layer, with only 8 active for each token).
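To make the sparsity concrete, here's a minimal sketch of top-k expert routing for a single token. The dimensions are made up (loosely modeled on the 128-experts/8-active layout above), and a real router, its normalization, and the expert MLPs themselves are more involved:

```python
import numpy as np

# Made-up dimensions, loosely modeled on the layout described above:
# 128 experts per layer, only 8 active per token.
NUM_EXPERTS, TOP_K, DIM = 128, 8, 64

rng = np.random.default_rng(0)
router = rng.normal(size=(DIM, NUM_EXPERTS))  # router/gating weights
token = rng.normal(size=DIM)                  # one token's hidden state

logits = token @ router                       # score every expert
top = np.argsort(logits)[-TOP_K:]             # indices of the top-k experts
scores = logits[top] - logits[top].max()      # shift for numerical stability
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the survivors

# Only these 8 expert MLPs would run for this token; the other 120 are
# skipped entirely, which is where the "sparse" in "sparse mixture of
# experts" comes from.
print(sorted(top.tolist()))
print(round(float(weights.sum()), 6))  # the routing weights sum to 1.0
```

So each "expert" is just one of many small MLP chunks, and the model's active parameter count per token is a small fraction of its total size.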
Yes, and assuming it will not become popular, this will expire / not renew in 6 months.
It’s also worth noting that the author is affiliated with a company based in Bermuda, so it doesn’t feel like it comes from a legitimate institute. For all I know this was vibe-written by an AI in an afternoon.
"Founded in 1998, One Communications Ltd. (formerly KeyTech Limited) is a diverse telecommunications holding company. Its subsidiary companies specialise in cellular voice, high-speed internet, subscription television and data solutions for both residential and corporate customers.
In 2014, One Communications Ltd. began a series of strategic mergers and acquisitions in order to position itself competitively in an industry driven by technological change. The Company acquired internet, cellular and cable television companies in both Bermuda and the Cayman Islands. These transactions have transformed One Communications Ltd. into a robust triple-play service provider with the networks and data access infrastructures needed to meet the demands of ever-growing bandwidth consumption. Through its operating subsidiaries, the Company is positioned as the leading full-service telecommunications provider for corporate and residential customers in both Bermuda and Cayman.
The operating subsidiaries of One Communications Ltd. are Logic Communications Ltd. (trading as One Communications), Bermuda Digital Communications Ltd. (trading as One Communications), Cable Co. Ltd., and WestTel Limited in the Cayman Islands (trading as Logic)."
Why not discuss the contents of the draft and why it's awful? The fact that the author works for a telecom provider in a small country does not by itself mean much. Perhaps the proposal has been trialled there.
I believe Bermuda is a tax shelter country, which means people and companies register there to hide identity and income from the nations they live and do business in. Because of that, the vast majority of businesses registered in Bermuda are not legitimate institutions; they are shell companies defrauding their home nations.
And the home nation's governments defraud their people with unnecessary wars, wasteful spending, unpayable debt, and excessive inflation. There comes a time when paying less tax is the right thing to do.
Why are you even defending this practice? It's something very wealthy people do; they're not your everyday citizens who are conscious about where their taxes go.
They evade taxes for financial reasons, not moral reasons.
I can think of few groups as likely to support wars as the ultra rich, but if you are very wealthy and don’t like your tax dollars going to military spending, just invest in Lockheed or Raytheon and get it all back as dividends. War spending doesn’t justify tax fraud, unless you’re also out on the protest line when a new war breaks out.
As the top tax rates fell, from 90% in 1950 to under 40% now, the use of tax shelters increased. So unless your “comes a time” is referencing pre-1915 USA, this isn’t a valid justification.
If inflation is the issue, keep your money in a different currency.
I just don’t see actions from the very rich (the ones using tax shelters) that back up your justifications.
I think it’s simply the collapse of any kind of cohesion between the wealthy and the nation in which they live. Or put another way: I’m rich, I shouldn’t have to pay for stuff I don’t use!
> The suddenness with which his and my accounts were canceled, coupled with the complete lack of any sort of appeals process, leads me to believe this is the result of Amazon turning over their account review process to AI.
So all this is speculation.
> Amazon created an AI agent to look at every account and, instead of flagging them for any potential violations, had them canceled outright. I'm not sure what the thinking was on their part.
The speculation is getting strangely specific…
> In theory, with something like this, you would have tested the process by simply running a report before doing anything to actually impact any accounts. But if they did that, they would've gotten tons of obviously false positives
But we don’t know anything about what happened, this is based on a hypothesis, speculating on the mistakes made, and how it should have been done, etc.
> 3. They did test it, saw that it raised more flags than they had the manpower to properly investigate, but said, "We're the 800 pound gorilla in the room here. F--- them!" and rolled the AI agent into production anyway.
> Obviously, I have no particular insight into Amazon's inner workings, but I'm inclined to think it was the last of these options.
Now you’re even suggesting that the players here are being malignant, and no, you don’t have insight into the inner workings.
We can speculate all we want about these things, but we don’t even know whether this is related to AI, other than “it happened recently and AI also happened recently”. Yes, it’s plausible, but making any more specific claims about the inner workings of Amazon and what mistakes were made, never mind suggesting what they should have done otherwise, is just reaching and needs a huge disclaimer.
> The title seems misleading since we don't even know it's AI.
We don’t even know whether it’s related to webcomics either. All we know is that OP’s account got banned a while ago, and now also that of another person who is into webcomics; that’s literally about all the facts we have. The rest of the article is pure speculation.
I can say I have seen a lot of cases where someone who was flagrantly guilty of abuse complained loudly that their account at some big tech company was unfairly canceled. I cannot say that's what is going on here, and I can also say I've seen plenty of cases where it was unfair and there was no due process.
Whether it's AI or not, the process is flawed and not giving their customers an avenue for having a human review their case is just a shitty business practice. The article is just a warning to not put all our eggs in Amazon's basket.
You are absolutely correct in everything you say. However, assumptions are usually correct, and past behavior is a good indication for present behavior.
This is roughly how I use it as well, except I made it a little bit more proactive: it actively asks me whether it should track certain things in OmniFocus and chases me about completing overdue tasks or other things I would otherwise forget.
I keep data collection out of the LLMs. I have separate scripts that push and pull data from external sources, so I don’t need to provide the LLM with auth keys, and the things it can do are very limited.
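As a hedged sketch of that separation (every name, path, and the stubbed API call below is invented): a standalone sync script is the only code that sees the credential, and it writes a plain snapshot file that the LLM-facing tooling can only read.

```python
# Hypothetical sketch: all names here are invented. The point is the
# separation of concerns, not the specifics: this sync script is the only
# code holding CALENDAR_API_KEY; the LLM side only reads snapshot.json.
import json
import os

API_KEY = os.environ.get("CALENDAR_API_KEY", "")  # never passed to the LLM
SNAPSHOT = "snapshot.json"

def pull_events():
    # A real version would call the external API using API_KEY; stubbed
    # here so the sketch stays self-contained and runnable.
    return [{"title": "dentist", "due": "2025-01-10"}]

def write_snapshot(events):
    with open(SNAPSHOT, "w") as f:
        json.dump(events, f)

def read_snapshot():
    # The only function exposed to the LLM-facing code: read-only, no auth.
    with open(SNAPSHOT) as f:
        return json.load(f)

write_snapshot(pull_events())
print(read_snapshot())
```

Run the sync script on a schedule (e.g. cron) and the LLM never holds a credential; at worst it can read a stale snapshot.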
I find it fascinating that after all this time reporters still don’t even bother to proofread content for obvious AI tells. I guess nobody really cares anymore?
No it does not. This is about as “edge” as AI gets.
In a general sense, edge just means moving the computation to the user, rather than into a central cloud (although the two aren’t mutually exclusive, e.g. Cloudflare Workers).
https://arxiv.org/abs/2412.13171