Hacker News | sipjca's comments

I don’t think it’s about literally shrinking the models via quantization, but rather training smaller/more efficient models from scratch

Smaller models have gotten much more powerful over the last two years. Qwen 3.5 is one example of this. The cost and compute required to run the same level of intelligence are going down.


I have said for a while that we need a sort of big-little-big model situation.

The inputs are parsed with a large LLM. This gets passed on to a smaller hyper specific model. That outputs to a large LLM to make it readable.

Essentially you can blend two model types: Probabilistic Input > Deterministic function > Probabilistic Output. Have multiple little deterministic models that are chosen for specific tasks. Now all of this is VERY easy to say, and VERY difficult to do.

But if it could be done, it would basically shrink all the models needed. Don't need a huge input/output model if it is more of an interpreter.
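The routing idea above can be sketched in a few lines. This is a minimal toy, not a real system: the three stages are stand-in functions (a real version would call an actual large LLM for the parse and render steps, and a small fine-tuned model or plain code for the middle step), and all the function names here are hypothetical.

```python
# Toy sketch of a big-little-big pipeline:
# large model parses -> small deterministic specialist computes -> large model renders.

def parse_with_large_model(user_input: str) -> dict:
    # Stage 1 (big): turn free-form input into a structured task.
    # Stubbed here with simple keyword/number extraction.
    tokens = user_input.lower().split()
    numbers = [int(t) for t in tokens if t.lstrip("-").isdigit()]
    op = "add" if ("add" in tokens or "sum" in tokens) else "multiply"
    return {"task": op, "args": numbers}

def small_specialist(task: dict) -> int:
    # Stage 2 (little): a deterministic function that handles one task type.
    if task["task"] == "add":
        return sum(task["args"])
    result = 1
    for n in task["args"]:
        result *= n
    return result

def render_with_large_model(task: dict, result: int) -> str:
    # Stage 3 (big): turn the raw result back into readable output.
    return f"The {task['task']} of {task['args']} is {result}."

def big_little_big(user_input: str) -> str:
    task = parse_with_large_model(user_input)
    return render_with_large_model(task, small_specialist(task))

print(big_little_big("please add 2 and 3"))
```

The hard part the comment alludes to is hidden in stage 1: getting a large model to reliably emit a structure the little models can consume.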


There are no practically useful small models, including Qwen 3.5. Yes, the small models of today are a lot more interesting than the small models of 2 years ago, but they remain broadly incoherent beyond demos and tinkering.

I don't think you can make that case for 35B and up, including the 27B dense model. A hypothetical Mac Studio with 512 GB and an M5 Ultra would be able to run the full Qwen 3.5 397B model at a decent speed, which is more like 12 months behind the current SoTA.

A lot of people got a bad first impression about the 3.5 models for a few different reasons. Llama.cpp wasn't able to run them optimally, tool calling was broken, the sampling parameters weren't documented completely, and some poor-quality quants got released. Now that these have all been addressed, they are serious models capable of doing serious business on reasonably-accessible hardware.


Yes, but bigger models are still more capable. Models shrinking (iso-performance) just means that people will train and use more capable models with a longer context.

Of course they are! Both are important and will be around and used for different reasons

This is an argument, but it’s also fundamentally comparing a computer that works out of the box to one that doesn’t.


I really don't get this comment section. You get a Macbook then you have a perfectly usable machine which will run all the mainstream software you ask of it, and then you get natively compiled well supported developer tooling, no VM required. The best argument for Chromebooks is that you can throw away ChromeOS and install Linux or use Linux in a VM. These are not even close to the same.

I think folks want to hate Apple more than they want to admit that Chromebooks kinda suck.


Yes


You literally just compared a laptop running arch to a mac. You’re not the target audience lmao


The laptop ships with Windows 11 but its parts are compatible with Linux too.

That is an important note, though: the price includes a valid Windows 11 license.


That wasn’t the point. You’re a person who runs arch, that means most likely your requirements for a computer are VERY different than the target for this Mac. There’s always some other computer you can buy, but most people will just buy the Mac


> You’re a person who runs arch, that means most likely your requirements for a computer are VERY different than the target for this Mac

I do software development, video + image editing, writing and gaming. My requirements are it runs well, I can depend on it and I don't mind if it has a fan.

I only replied because the OP's comment made it seem like it's difficult to find a good laptop in the $600 range. If macOS is optional you can get quite decent specs.


Picked up a T14s in Shenzhen for ~$250 US and it’s a screamer. Best thing for sure


I paid over $2,000 USD two years ago for a T14s, and even then it was worth it; now it's almost 10x cheaper. Incredible machine, though refurbs often come in the lower-end configurations, and with soldered RAM etc. it's not possible to upgrade.

This new modular ThinkPad will be crazy for the refurb market; I wonder how Lenovo is going to prevent that from eating into their sales. This probably means corporate bulk purchases make up the majority of their sales and they aren't afraid to cannibalize themselves, which makes you wonder why they switched to soldered RAM (an incredibly unpopular move) at all.


who was bought by openai


Great feedback :) also support for the v2 versions of the moonshine models should be out today!


There's an open PR in the repo, which will be merged, that adds this support. Post-processing is an optional feature, and when it's enabled, end-to-end latency can still easily stay under 3 seconds.


That’s awesome! The specific thing that was causing the long latency was the image LLM call to describe the current context. I’m not sure if you’ve tested Handy’s post-processing with images or if there’s a technique to get image calls to be faster locally.

Thank you for making Handy! It looks amazing and I wish I found it before making FreeFlow.


Not currently


I'm looking into porting this into transcribe-rs so handy can use it.

The first cut will probably not be a streaming implementation


Discovering Handy was a revelation - light years ahead of any other tool in this space IMO. Thank you for building it!


okay... so I cannot get this to run on my mac. maybe something with the Burn kernels for quantized models?

will report a GitHub issue


this should be fixed

