I had it starred! It looks like a nice set of tools for building a multi-modal AI app - I'll give it a try when I flesh out a Discord AI gaming app I was working on. Is the multi-modal aspect (image, audio, language) the main focus? Maybe putting that a bit higher in the readme would help it stick - the intro section was a bit too dense and I ended up skimming it.
As for traction, I wonder if there just isn't much interest in AI with JS/TS right now, for whatever reason?
Yes, good point, JS/TS is definitely behind Python. That might explain some of it.
I expect most models to become multi-modal in the future and am building towards that. A lot of the core logic of agents will nevertheless be text-based imo, so that’s a central piece, but I already added text-to-image and speech-to-text, and plan to add text-to-speech next.
https://github.com/lgrammel/modelfusion
It is only getting limited traction, so I’m wondering if I’m missing something fundamental with the approach that I’m taking.