I think you need to understand your audience better. Remember that most people have a very shallow understanding of DL systems. You've got blog posts that talk about attention from a mathematical perspective but don't even hint at softmax tempering and just mention that there's a dot product. Or ML researchers who don't know things like why doubling the batch size doesn't cut training time in half, or why using fp16 doesn't cut memory in half. So I think toning down the language, being clearer, and adding links will help you be more successful in communicating.
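For instance, the tempering I mean is literally one extra knob in the attention math that those posts never show. A toy sketch (my own variable names, plain numpy, not anything from your post):

    # Plain scaled dot-product attention with the softmax temperature written out.
    # The usual choice is sqrt(d_k); making it a knob is the "tempering" that the
    # "it's just a dot product" posts skip over.
    import numpy as np

    def attention(Q, K, V, temperature=None):
        d_k = Q.shape[-1]
        t = np.sqrt(d_k) if temperature is None else temperature
        scores = (Q @ K.T) / t
        scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
        return weights @ V

    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
    print(attention(Q, K, V).shape)                   # (4, 8)
    out_sharp = attention(Q, K, V, temperature=0.1)   # lower temperature -> sharper softmax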
Please let me know what's not clear. Still no takers on my ML is CV comment below.
Research on on-device ASR and computer vision is primarily driven by the same organizations that stand to benefit the most from it. It's nearly impossible to talk about machine learning today without talking about computer vision. Just look at the daily papers from Hugging Face or any other outlet. Machine learning is basically synonymous with computer vision and natural language processing is basically synonymous with information retrieval. Research is corporate research. With very few exceptions.
Two popular lines of current ML research, multimodal and this latest neuro-symbolic re-hash, are not about furthering our understanding of what we currently can and cannot do. Not about doing science. They are about maintaining the status quo. They are about short-form video content and Google Knowledge Panels.
Multimodal ML research is a vision-first enterprise. This doesn't make sense for a number of reasons. Here are two: the latent space of images and the representations therein cannot adequately capture the nuances and expressivity of language; language is a more fundamental cognitive process than vision.
And what do you get for language in current ML research? How about a ``neuro-symbolic semantic parser'' that is neither a semantic parser nor symbolic. lol https://arxiv.org/abs/2402.00854v1 What's it good for? Computation graphs, domain adaptation, Google Knowledge Panels.
This is a directed attack on machine learning research taking concepts that could be pursued with scientific merit, such as multimodal perception and neuro-symbolic parsing, but are instead turned into marketing hype and leveraged by the powers that be for the things that keep them in power. My audience is anyone participating in this research.
Y Combinator... that's the mob of onewheels and URB-E scooters dodging human waste in the Tenderloin on their commute from Nob Hill to the Mission, right? Maybe you are referring to another audience.
I think it would help if you link "NeSy computation engine". I'm actually not familiar with this (I'm not in the symbolic world, but I'm interested; just never had the time, so if you've got links here I'd personally appreciate it). I can find the workshop but not the engine. Maybe bad google-fu.
>> Domain adaptation across verticals
This is also a bit vague and so I'm not sure what you _specifically_ mean.
> ... onto the latent space of images gets you crude semantic attributes sometimes... These LVMs aren't going to cut it.
I'm with you here, but it is controversial, and most people don't understand what a latent vector machine is or what the alternatives are. Remember, people think you can easily measure distances with L1, L2, or cosine, and that these metrics are well defined in R^{3x256x256} (or just fucking R^10). So they use t-SNE and UMAP to look at latent spaces for smooth semantic properties. I think the problem is that math is taught by a game of telephone. Really, all research is. Choices were made for historical reasons, but when an assumption isn't revisited after enough time it stops being a well-known assumption at all. I mean, we can mention manifolds too. Or even probability distributions or i.i.d. Reminds me that I should update my lecture slides to make these things clearer lol.
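Here's the kind of toy demo I have in mind for those slides (my own numbers, nothing rigorous): the nearest and farthest neighbors of a random query end up at nearly the same distance once the dimension gets big, which is exactly why L2 in R^{3x256x256} deserves more suspicion than it gets.

    # Toy demo (mine): distance concentration. As dimension grows, the ratio of the
    # farthest to the nearest distance from a random query shrinks toward 1, so
    # "near" and "far" stop carrying much information. 196_608 = 3*256*256.
    import numpy as np

    rng = np.random.default_rng(0)
    for d in (10, 1_000, 196_608):
        points = rng.standard_normal((100, d), dtype=np.float32)
        query = rng.standard_normal(d, dtype=np.float32)
        dists = np.linalg.norm(points - query, axis=1)
        print(f"d={d:>7}  max/min distance ratio = {dists.max() / dists.min():.3f}")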
But that said, I still think vectors can do a lot. Especially since vectors and functions are interchangeable representations. Though I think we need to do a lot more to ensure that networks are capable of learning things like equivariance and, importantly, abstract concepts. I don't see how current systems could calculate something like an ideal of a ring. But maybe someone has some formulation.
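To pin down what I mean by equivariance, here's a toy check (mine, deliberately simple): a circular convolution commutes with translation, and that's the kind of property we'd want a network to have built in or provably learn, rather than hoping it falls out of the data.

    # Toy check (mine): translation equivariance of a circular convolution,
    # f(shift(x)) == shift(f(x)). Circular convolution is used so the equality
    # is exact and boundary effects don't muddy the check.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)
    k = rng.standard_normal(5)

    def circ_conv(signal):
        # circular convolution via FFT
        return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(k, n=signal.size)))

    def shift(signal, s=3):
        return np.roll(signal, s)

    print(np.allclose(circ_conv(shift(x)), shift(circ_conv(x))))  # True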
I'm also with you on the complexity aspect. I find it silly when a Sr Director is trying to convince people Sora is learning physics while showing videos where a glass empties its contents, then spills, and neither shatters nor plastically deforms but liquefies (https://twitter.com/DrJimFan/status/1758549500585808071). I'm not sure these people understand what a physics model is or what a world model is, since there is no coherence. I mean look, we're dealing with people who think the stacking example proves a world model but don't understand how failing on a simple counterexample disproves such notions. You're right that there isn't enough subtlety and care to understand how information leakage happens, and how a lot of prompting techniques or follow-ups give away the answer rather than tease one out.
> ML research isn't meant to further your understanding of anything. You can't separate it from corporate interest and land grabbing. It's the same thing re-hashed every year by the same people. NLP is pretty much a subfield of IR at this point.
I think that's a bit too exaggerated, but hey, I've been known to say that ML research is captured by industry and we're railroading everything. And that it is silly we publish papers on GPT when we don't have the models in hand, as it just becomes free work for OpenAI and we can't verify the work because OAI will change things. But I also don't know what you mean by "IR". I'm more on the CV side, though, but like above, ehh, like there's a big difference.
> Still no takers on my ML is CV comment below.
Honestly I don't know what you mean by this. But if you are saying that the divide we create like NLP vs CV is dumb, then I'm all with you. I also think it's silly what we call generative models. Aren't all models generative? Yann talking about JEPAs does not give me anything to go on. But then again, no one has a definition for generative model and it doesn't seem like anyone cares to have one. Well, at least not one that would be consistent and include GANs, VAEs, NFs, Diffusion, and EBMs.
> My audience is anyone participating in this research.
That includes me, and even I have a hard time parsing what you're saying, and the side snipes like URB-E scooters don't help. I have no idea what that even is. I definitely get the feeling of gaslighting and railroading, but I've just come to accept the fact that people drank the kool aid. I think people like Jim are true believers and really do believe that they are right. So it doesn't help to talk like this. You gotta meet them at their level. The scaling people will lose out and we're just gonna have to be patient. My take is: if I'm wrong, so what, give Sam his $7T, we get AGI and we win. He's going to get his opportunity to scale no matter how much funding we can get into alternative views. But if I'm right and you need more than scale, then we better keep working, because I'd rather not have another AI winter. I also think it is quite odd for these companies not to be hedging their bets a little and funding other avenues more strongly. Especially the ones that are not already the biggest of the biggest, because where are you gonna get 500 racks of H100s to compete?
At this point, all I'm trying to get people around me in ML to understand is how nuance matters. That alone is a difficult battle. I'm just told to throw compute and data at a problem with no concern for the quality of that data. It does not matter how much evidence I generate to show that a model is overfit; as long as the validation loss doesn't diverge, they don't believe me. ¯\_(ツ)_/¯
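For what it's worth, the kind of proof I mean is easy to sketch (toy sklearn setup, my own settings, not anything from a real project): if the same architecture happily fits shuffled labels, then a low training loss plus a non-diverging validation loss tells you very little on its own.

    # Toy randomization test (mine): if the model can drive training accuracy up
    # on labels with the signal destroyed, then "train loss went down and val loss
    # didn't diverge" is weak evidence anything real was learned; the train/val
    # gap and checks like this are what matter.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=2000, n_features=50, n_informative=10, random_state=0)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5, random_state=0)

    clf = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=2000, tol=1e-6, random_state=0)
    clf.fit(X_tr, y_tr)
    print("real labels:     train acc", round(clf.score(X_tr, y_tr), 3),
          " val acc", round(clf.score(X_va, y_va), 3))

    y_shuffled = np.random.default_rng(0).permutation(y_tr)   # destroy the signal
    clf_mem = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=2000, tol=1e-6, random_state=0)
    clf_mem.fit(X_tr, y_shuffled)
    print("shuffled labels: train acc", round(clf_mem.score(X_tr, y_shuffled), 3),  # high = memorization
          " val acc", round(clf_mem.score(X_va, y_va), 3))                          # ~chance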
>>> I think it would help if you link "NeSy computation engine". I'm actually not familiar with this (not in the symbolic world, but interested. Just never had time, so if you got links here I'd personally appreciate it). I can find the workshop but not the engine.
I linked to it in my previous comment. I'm referring to the ``NeSy computation engine'' described in that paper (https://arxiv.org/abs/2402.00854v1). I didn't know there was a ``NeSy'' workshop and this paper was my first encounter with the term.
I think it's interesting that you mention symbolic world like it is separate from some other world. There's the AI that was and the AI that is today. There's the AI over there and the AI over here. Whenever you hear someone mention symbolic in the context of AI go ahead and grab a chair because immediately after this they are going to talk about cyc and John McCarthy for at least 20 min. If you're lucky they might throw some Prolog in there.
I don't think this is productive and I don't think there is another symbolic world. There is just the world. There are certain things in the world for which a numeric, directional representation makes sense. There are other things for which it makes no sense at all. It's my view that primitives in language are one of these things. Additionally, there are certain places where it makes sense to consider these representational approaches and other places where it only makes political sense. Lastly, there are symbols - atomic primitives - and there are ``symbols,'' objects with vectors in them and who knows what else.
What's striking to me about this paper is the coverage of formal grammars and semantic parsing entirely within the context of domain adaptation. Definitely the best part is the coverage of compositionality (https://ncatlab.org/nlab/show/compositionality) in the context of composing computational graphs. This is striking to me because all of these things (except domain adaptation) are essential to any reasonable theory of meaning but they are covered as if they've been repurposed for the practical application of populating Google Knowledge Panels, which I believe is exactly what happened. Check out the definitions of semantic parser and symbol.
>> Domain adaptation across verticals
>>> This is also a bit vague and so I'm not sure what you _specifically_ mean.
Crude semantic attributes pulled from character sequences and mapped onto the latent space of images have utility in business contexts if the mapping for some term sufficiently distinguishes it from the mapping of another term that has the same surface form. It ends there. GloVe was a half-baked representation of meaning in language when it followed word2vec in 2014. GPT-2 grabbed the torch in 2019. It still doesn't work. Well, it sometimes works for adapting a general model to a specific domain such as a business vertical, but only in a crude and superficial way. Note that almost no ML research today discusses this representational issue at all, and that almost all ML research takes this representation as a starting point. If you do publish hyperparameters in your paper, say in an appendix, the ones related to vocab size and the dimensionality of your embedding space often aren't even considered worth mentioning. That's fine, I guess, because they don't mean anything anyway, but not talking about this, in my view, is not fine.
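To make the ``same surface form'' point concrete, here's a quick toy check (mine; it uses the standard gensim loader and downloads the vectors on first use): a static table like GloVe hands the string ``jaguar'' a single vector, so the cat and the car collapse into one point and the neighborhood mixes the senses.

    # A toy check (mine, not from the paper): a static embedding assigns one vector
    # per surface form, so every sense of "jaguar" shares the same point in the
    # space and the nearest neighbors mix the senses together.
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-50")   # downloads on first use

    print(vectors["jaguar"].shape)                 # (50,) -- one vector, all senses
    for word, sim in vectors.most_similar("jaguar", topn=10):
        print(f"{sim:.3f}  {word}")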
Check out the Mamba paper, for example. Like most of ML research today, the focus is on optimization. The representation problem has been solved, so there's no need to talk about it: we map everything onto the latent space of images because short-form video content rules the day and that's how dude is gonna hit his $7T: advertising ([link redacted]).
>>> I think the problem is that math is taught by a game of telephone.
I think that, for language, the ML research community is, by and large, not even using the right maths.
>>> But that said, I still think vectors can do a lot. Especially since vectors and functions are interchangeable representations. Though I think we need to do a lot more to ensure that networks are capable of learning things like equivariance and importantly abstract concepts.
Thank you so much for highlighting the importance of equivariance. I think this is a crucial concept for work at the cross-modal interfaces, especially in the context of the Curry-Howard correspondence, or, more recently, the Curry-Howard-Lambek correspondence. Right now the ML (CV) research community is labeling nouns with bounding boxes... lol. If that doesn't illustrate the fact that multimodal work is a vision-first enterprise I don't know what will.
>>> I think a bit too exaggerated but hey, I've been known to say that ML research is captured by industry and we're railroading everything. And that it is silly we publish papers on GPT when we don't have the models in hand as it just becomes free work for OpenAI and we can't verify the works because OAI will change things.
Check out the evaluation criteria in that ``NeSy'' paper, especially the metric that's supposed to tell you something about what the system was designed to do. I'm sure OpenAI is happy to have this info about their system.
>>> But I also don't know what you mean by "IR".
Ten years ago I considered NLP adjacent to information retrieval. Today I consider it part of information retrieval. There's very little work published today that suggests otherwise.
>>> Honestly I don't know what you mean by this. But if you are saying that the divide we create like NLP vs CV is dumb, then I'm all with you.
It is not my intention at all to create or highlight any divide. If there is indeed a known divide between CV and NLP, I don't know anything about it, I don't want to know anything about it, and its existence wouldn't surprise me.
>>> I also think it's silly how we call generative models. Aren't all models generative?
Generative refers to a situation where you begin with a finite set of things and productively form any number of well-formed expressions from these things.
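A toy sketch of what I mean (my own example, nothing from any paper): a finite rule set and a finite lexicon, from which you can productively derive as many well-formed expressions as you like. The productivity comes from the recursive NP rule, not from the size of anything.

    # Toy illustration of "generative" in that sense: a finite set of rules over a
    # finite lexicon that can produce arbitrarily many well-formed expressions.
    # Nothing here is learned; the unboundedness comes from recursion.
    import random

    RULES = {
        "S":  [["NP", "VP"]],
        "NP": [["det", "noun"], ["det", "adj", "noun"], ["NP", "conj", "NP"]],
        "VP": [["verb", "NP"]],
    }
    LEXICON = {
        "det": ["the", "a"], "adj": ["red", "small"], "conj": ["and"],
        "noun": ["cat", "box", "engine"], "verb": ["sees", "moves"],
    }

    def generate(symbol="S", rng=random.Random(0)):
        if symbol in LEXICON:
            return [rng.choice(LEXICON[symbol])]
        expansion = rng.choice(RULES[symbol])
        return [word for part in expansion for word in generate(part, rng)]

    for _ in range(5):
        print(" ".join(generate()))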
>>> That includes me, and even I have a hard time parsing what you're saying and it doesn't help with the side snipes like URB-E scooters.
I'll take potshots at the Paul Grahams and Steve Jobs of the world every day and not lose any sleep over it. If they take their AirPods out of their ears maybe they'll hear me coming.
>>> But if I'm right and you need more than scale, then we better keep working because I'd rather not have another AI winter.
All I have to say about scaling is that, for language, I hope it's clear by now that more data and more params are not going to improve the situation. I can see how, for vision, that's almost never the case.
Damn it somebody said AI winter again. You aren't going to start talking about cyc and McCarthy for 20 min now are you?
>>> I also think it is quite odd for these companies to not be hedging their bets a little and more strongly funding other avenues.
The formula works.
>>> At this point, all I'm trying to get people around me in ML to understand is how nuance matters. That alone is a difficult battle. I'm just told to throw compute at a problem and data with no concern to the quality of that data. It does not matter how much proof I generate to show that a model is overfit, as long as the validation loss doesn't diverge, they don't believe me.
I'm interested in learning more about what you mean by nuance.
Probably just needs more compute and data. Just throw some synthetic data in there and call it.