Hacker News

Paper authors (and this post's author, apparently) like to throw in lots of scary-looking maths to signal that they are smart and that what they are doing has merit. The Reinforcement Learning field is particularly notorious for this, but it's all over ML. Often it is not on purpose: everyone is taught that this is the proper "formal" way to express these things, and that any other representation is not precise or appropriate in a scientific context.

In practice, when it comes down to code, even without higher-level libraries, it is surprisingly simple, concise and intuitive.

Most of the math elements used have quite straightforward properties and utility, but of course if you combine them all into big expressions with lots of single-character variables, it's really hard for anyone to understand. You kind of need to learn to squint and recognise the basic building blocks the maths represents, but that wouldn't be necessary if it weren't obfuscated like this.
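For what it's worth, the core objective of the GAN paper this thread is discussing really does come down to a few lines of code. A minimal stdlib-only sketch (the function names and toy inputs are mine, purely illustrative, not from the paper):

```python
import math

def discriminator_loss(d_real, d_fake):
    # D wants to maximize E[log D(x)] + E[log(1 - D(G(z)))],
    # so as a loss we minimize the negation of that sum.
    v = sum(math.log(p) for p in d_real) / len(d_real)
    v += sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -v

def generator_loss(d_fake):
    # G minimizes E[log(1 - D(G(z)))] (the paper's "saturating" form;
    # in practice people often maximize E[log D(G(z))] instead).
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# At the saddle point the discriminator outputs 0.5 everywhere,
# and the value of the original objective is -log(4), so this
# discriminator loss equals log(4) ≈ 1.386.
d_real = [0.5, 0.5, 0.5, 0.5]  # D's outputs on real samples
d_fake = [0.5, 0.5, 0.5, 0.5]  # D's outputs on generated samples
```

That's the whole "scary" minimax expression: two averages of logs, one sign flip.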



I’m going to push back on this a bit. I think a simpler explanation (or at least one that doesn’t involve projecting one’s own insecurities onto the authors) is that the people who write these papers are generally comfortable enough with mathematics that they don’t believe anything has been obfuscated. ML is a mathematical science and many people in ML were trained as physicists or mathematicians (I’m one of them). People write things this way because it makes symbolic manipulations easier and you can keep the full expression in your head; what you’re proposing would actually make it significantly harder to verify results in papers.


Maybe.

But my experience as a mathematician tells me another part of that story.

Certain fields are much more used to consuming (and producing) visual noise in their notation!

Some fields even have superfluous parts in their definitions and keep them around out of tradition.

It's just as with code: Not everyone values writing readable code highly. Some are fine with 200 line function bodies.

And refactoring mathematics is even harder: There's no single codebase and the old papers don't disappear.


Maybe! I’ve found that people usually don’t do extra work if they don’t need to. The heavy notation in differential geometry, for example, can be awfully helpful when you’re actually trying to do Lagrangian mechanics on a Riemannian manifold. And superfluous bits of a definition might be kept around because going from the minimal definition to the one that is actually useful in practice can sometimes be non-trivial, so you’ll just keep the “superfluous” definition in your head.


To add to this, I'd even argue that the most "scary looking" parts of the GAN paper are where Goodfellow is just showing intermediate steps, like in (4) and (5). I guess one can argue that this is superfluous but that feels pretentious. I'd argue that the math here is helping communicate.
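For readers without the paper to hand: as I remember them, those intermediate steps just plug the optimal discriminator into the value function and massage it into a Jensen–Shannon divergence, roughly:

```latex
D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

\begin{aligned}
C(G) &= \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D^*_G(x)\right]
      + \mathbb{E}_{x \sim p_g}\!\left[\log\left(1 - D^*_G(x)\right)\right] \\
     &= \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}\right]
      + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{\text{data}}(x) + p_g(x)}\right] \\
     &= -\log 4 + 2 \cdot \mathrm{JSD}\!\left(p_{\text{data}} \,\|\, p_g\right)
\end{aligned}
```

Written out like that, it's two substitutions and a recognised divergence; the notation is doing bookkeeping, not gatekeeping.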

I think people forget why math is used. I'm always a little surprised that programmers don't see this, because programming languages are used for the same reason: precision. Both are terrible languages for communicating something like this conversation, but then again, English is a terrible way to communicate highly abstract concepts.

On the other hand, I've definitely seen people use math to make their work seem more important (definitely in some ML). More often, though, I see it just being copy-pasted (like in every diffusion paper ever). That is probably superfluous, though it's definitely debatable, and I'm absolutely certain those cases aren't for flexing lol.


Agreed. Also, fwiw, the mathematics involved in the paper is pretty simple as far as mathematical sophistication goes. Spend two to three months on one "higher level" maths course of your choosing and you'll be able to fully understand every equation in this paper relatively easily. Even a basic course in information theory coupled with some discrete maths should give you essentially all you need to comprehend the math in this post. The concepts being presented here are not mysterious and much of this math is banal. Mathematical notation can seem forbidding, but once you grasp it, you'll see, like Von Neumann said, that life is complicated but math is simple.


> like Von Neumann said, that life is complicated but math is simple

Maybe for Von Neumann math was simple...


Haha, I recognise this. I invented a fast search algorithm and worked with some academics to publish a paper on it last year.

They threw all the complex math into the paper. I could not initially understand it at all, despite having invented the damn algorithm!

Having said that, once I picked it apart and took a little time with it, it actually wasn't that hard. But it sure looked scary and incomprehensible at first!


I think you misunderstand what the math is for. The math is not for training the model but for understanding why the model can be formulated that way and why the training will work. It is the exact opposite of obfuscation.

Think of it this way:

  You don't need math to train a good model but you need math to know why your model is wrong.
It isn't about lording over others; it is that in research you care why things work just as much as that they work. The reason is very simple: it's fucking hard to improve things when you don't understand them. If you just have a black box, the only strategy available is brute force. But if you analyze things and build knowledge, then you don't have to brute force.

Also, the idea of using a paper to signal intelligence is kinda silly. Papers aren't written for the general public; papers are the communication between scientists. Who are they impressing? Each other? The very people who will call them out if they write bullshit or make convoluted arguments? I don't buy that. But maybe that's because I'm a researcher. I also don't think I need math to look smart; my PhD and publication record do a good enough job of that on their own. I don't even need it to flex on other researchers.

The math in my papers is there because it is simply easier to communicate with. I'm sure there are concepts you find easier to understand by reading code than by reading English. Same thing. Math and programming are great languages when you need high precision and when being pedantic is essential. Math is used because it is the best way to communicate, not as a flex. We flex on each other by showing whose ideas are best, and you can't do that if the other person doesn't understand you.

@staticelf and anyone else who feels that way:

That feeling is normal in the beginning. Basically your first year of a PhD is spent going "what the fuck does any of this mean?!?!" It's rough, but also normal. You're working at the bounds of human knowledge, and papers are written in the context of other papers. It's hard to jump in because it is like jumping into the middle of a decades-long (or longer) conversation. If you didn't feel lost, the conversation probably wasn't that complicated and we'd probably have solved those problems much earlier. So you sit down and read a lot of papers to get the context of that conversation.

My point is, don't put yourself down. The hill you need to climb looks steeper than it is. Unfortunately it is also hard to track your progress, so it tends to feel continually out of reach until it suddenly isn't. (It's also hard because everyone feels like an imposter and many are afraid to admit not knowing. But the whole job is about not knowing lol.)

Probably the most important skill in a PhD is persistence. I doubt you're too stupid. I'm sure you can look back and see that you've done things you or other people are really impressed by. Things that looked like giant mountains to climb, but looking back don't seem so giant anymore. We'd get nowhere in life if we didn't try things we initially thought were too hard. The truth is you never know until you try. I'm not going to say it's easy (it isn't), but it isn't insurmountable. You can't compare yourself against others who have years of training. Instead, look at them and see that that's where this training can take you. But you can't get there if you don't try.



