Hacker News | grungegun's comments

In a sense, I spend most of my time on material after quals because it's so much harder to understand.


Does anyone know if this works on vanilla deep networks? These quantization articles always seem to target LLMs, which leads me to wonder whether there's something special about the LLM architecture versus a vanilla deep architecture.


Transformer LLMs are just a bunch of MLPs (linear layers) where you sometimes multiply/softmax the output in a funny way (attention). In other words, they're arguably more "vanilla deep net" than most architectures (e.g., conv nets).

(There are also positional/token embeddings and normalization but those are a tiny minority of the parameters)
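To make that claim concrete, here is a toy, numpy-only sketch of single-head self-attention (dimensions and weights are made up for illustration): it really is just three linear layers plus one softmax-weighted multiply.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d = 4, 8                                  # toy sequence length, model width
x = rng.standard_normal((T, d))              # token representations
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

q, k, v = x @ Wq, x @ Wk, x @ Wv             # three plain linear layers
scores = softmax(q @ k.T / np.sqrt(d))       # the "funny" multiply/softmax step
out = scores @ v                             # weighted sum of values

print(out.shape)  # (4, 8)
```

Everything except the one softmax line is an ordinary matrix multiply, which is why quantization work on LLMs mostly reduces to quantizing linear layers.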


So there's no performance gain for quantization enabled by the transformer architecture? It seems very strange that quantization works so well, since in most of my experiments the internal weights of MLPs look random.


Ok, but what does a perceptron look like in 1-bit? Would it be just some simple logic gate, like an OR-gate?


Not my area of expertise but I'd assume it becomes a decision tree or something.

Edit: lol https://news.ycombinator.com/item?id=39868508
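Not authoritative, but as a toy illustration: a perceptron with 1-bit (±1) weights and a hard threshold really can behave exactly like a simple logic gate. A hypothetical sketch (the weights and bias here are my own choice, picked to realize OR):

```python
# Hypothetical 1-bit perceptron: inputs and weights are all +1/-1,
# activation is a hard threshold. With w1 = w2 = bias = 1 it computes
# OR on {-1 (False), +1 (True)} inputs.
def binary_perceptron(x1, x2, w1=1, w2=1, bias=1):
    s = w1 * x1 + w2 * x2 + bias
    return 1 if s > 0 else -1

truth_table = {(a, b): binary_perceptron(a, b)
               for a in (-1, 1) for b in (-1, 1)}
print(truth_table)  # output is +1 unless both inputs are -1, i.e. OR
```

Different ±1 weight/bias choices give AND, NAND, etc.; XOR, as usual, needs more than one layer.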


LLMs have been trending towards obscenely large parameter counts (314B for Grok), which makes quantization crucial if you want to run them without a Meta-sized budget.
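As rough back-of-envelope arithmetic (weights only; real runtime memory is higher once you add activations and KV cache), here's what 314B parameters cost at a few common precisions:

```python
# Approximate weight-storage cost for a 314B-parameter model
# at different numeric precisions.
params = 314e9
for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB")
# fp16: ~628 GB, int8: ~314 GB, int4: ~157 GB
```

Even at 4 bits you're still well beyond a single consumer GPU, but it's the difference between a small cluster and a data center.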


It certainly does; people have been doing this in computer vision for years.


I taught one of my younger brothers a simple sign language when I was younger. His speech was delayed by something like 8 months compared to the rest of us. Can't say for sure, but it seems there's a need to emphasize spoken communication alongside the sign language.


This is somewhat overblown, and only really an issue for some particular theorems and obscure corners of mathematics. And it isn't a recent problem: Hilbert's proof of the Nullstellensatz in the early 20th century had logical errors, but the result is nonetheless true.

If a proof is only used by three people it won't get much scrutiny, but once major results start to be built on it, it will be reviewed more carefully and either accepted or rejected in the long run. The ABC conjecture is probably the biggest example.


The infinite density actually arises after a finite amount of time, since black hole collapse has already happened; that's why it matters that they are predicted to have infinitely dense points at their centers. The key is that this isn't a limit: GR seems to predict that there are infinitely dense objects just floating around, breaking spacetime.


What does this mean? Isn't the quality of a model determined by how easy it is to get a good picture?


Not necessarily. IMO a good model needs to follow your prompt well, and that was my problem with Stable Diffusion.

I've been trying to get a good portrait picture with "neon lights" on Stable Diffusion and it is almost impossible. Meanwhile, with the new DALL-E, that was possible. The picture, especially with SDXL, is good, but it doesn't really have neon lights...

I just tried a similar prompt on DeepFloyd and managed to get there!


It would be interesting if you could use DeepFloyd first for image composition, then apply Stable Diffusion afterwards for purely stylistic modifications.


Definitely possible :) I've been doing this with new Dall-e + img2img with Stable Diffusion.

To explain: I created a model of myself and wanted to create some good realistic portrait pictures. First I tried building it on some of the custom models that already exist, and the result was bad.

Then I tried SD 1.5/2.1... It was better, but I couldn't really get some of the prompts to come out right...

Then I tried the new DALL-E, saved the result, and inserted my face with img2img in SD, and it worked much better!


Loved this book.

When I first started programming, I was convinced that text applications weren't real programs, and the visual aspect of the book pulled me in to become a lifetime programmer :)


For diversity, DALL-E 2 has a random chance of appending words like "women" or "black" to a prompt. When this happens, at least for me, it generally destroys the quality of the images. Probably "King" was identified as a gendered word. You can find some discussion of this on the subreddit r/dalle2. Sometimes the images are just poor on their own, but in this case OpenAI is doing additional tampering.

A Twitter user figured out which words they were injecting by generating a lot of images with the starting prompt "A sign being held that says "


I'd argue that the incredibly fast rise in quality over the last year alone is what most people are interested in. DALL-E and GPT-2 were always able to make heaps of trash. The trajectory of quality across DALL-E-N and GPT-N is what interests me in terms of the AI internet...


The differential equations which give rise to physical chaotic systems operate on quantum states interacting with each other at the atomic level, so there has to be a link of some kind.


I don’t understand your comment. Differential equations don’t give rise to physical systems.

Some nonlinear DEs exhibit chaos. That’s a purely mathematical property. Whether any particular DE is a useful model of a particular physical system is a matter for the imagination, and either backed up or refuted by experiment.
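As a self-contained illustration that chaos is a property of the math itself, the logistic map (a discrete stand-in here for a chaotic DE, not anything from this thread) shows sensitive dependence on initial conditions with no physics involved:

```python
# Logistic map x -> r*x*(1-x) at r=4: fully chaotic on [0, 1].
def logistic_orbit(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)   # perturb the start by one part in 10^10
gap = max(abs(x - y) for x, y in zip(a, b))
print(gap)  # order 1: the two trajectories diverge completely
```

Whether this map (or any chaotic DE) models a real physical system is a separate, empirical question; the divergence above is purely mathematical.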


> Differential equations don’t give rise to physical systems.

Sure, but differential equations describe physical systems, and there is a canonical way to derive a quantum differential equation from a classical one by quantizing the classical Lagrangian via the path-integral formulation. That gives a fairly natural distinction between the two types of equations.
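Schematically (my paraphrase of the standard construction, not a derivation from this thread), the path-integral recipe takes the classical action built from the Lagrangian and turns it into quantum transition amplitudes:

```latex
S[q] = \int L(q, \dot q)\, dt,
\qquad
\langle q_f, t_f \mid q_i, t_i \rangle
  = \int \mathcal{D}q\; e^{\, i S[q]/\hbar}
```

The classical equation of motion is recovered as the stationary-phase limit $\delta S = 0$ when $\hbar \to 0$.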

> Whether any particular DE is a useful model of a particular physical system is a matter for the imagination, and either backed up or refuted by experiment.

This doesn't make sense to me. The Navier-Stokes equations are known to describe the classical behavior of water and are experimentally confirmed to predict things like trajectory. Their effectiveness has nothing to do with my imagination. If I write x=x' for the position vector of atoms in a fluid that will completely fail to describe anything physical.


The NS equations are a good description of approximately Newtonian fluids like water within certain regimes (nonrelativistic velocities, larger than atomic scales, far above 0 K, etc.). "Imagination" was not the best choice of word on my part. I meant that DEs are mathematical objects; their connection to physical systems is made by the scientist, not inherent in themselves. Whether the scientist guessed right is determined by experiment. The first people to suggest that the NS equations were a good description of Newtonian fluids had a model for fluid behavior in their imaginations. We know it was a good model because of experiment. But even if the NS equations described nothing in nature, their solutions, chaotic and otherwise, would have whatever properties they have.

Note that there is no Lagrangian for the NS equations, by the way.


Yes, the quantumness of a differential equation is not a property of the differential equation itself, but a statement about one possible taxonomy of differential equations. Then, whether quantum-type diff eqs have unique properties pertaining to chaos, conditioned on their being labelled 'quantum', is an interesting mathematical question.

> Note that there is no Lagrangian for the NS equations, by the way.

I don't know much about fluid dynamics, but I was under the impression that Bennett derives the Lagrangian form in the book Lagrangian Fluid Dynamics


There is a clash of terminology: the Lagrangian formulation of fluid dynamics follows the path of fluid particles, in contrast to the Eulerian form, which observes the fluid passing by a fixed coordinate system. In general dissipative systems don’t have time-independent Lagrangians.
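For reference (a standard textbook relation, not something specific to this thread), the two viewpoints are linked by the material derivative, which converts Eulerian field rates of change into rates of change following a fluid particle:

```latex
\frac{D}{Dt} \;=\; \frac{\partial}{\partial t} \;+\; \mathbf{u} \cdot \nabla
```

Here $\mathbf{u}$ is the velocity field; the first term is the Eulerian (fixed-point) rate of change and the second accounts for advection along the particle's path.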


Yep, that sounds right to me. Also, I see you're the author of the Noether article from Ars Technica. I enjoyed the read.



I’m glad, thanks!


Good discussion, thanks!

