I mean, if your expected use case is "call an API and get an immediate response of the full text in under 200ms so a user interface doesn't have to make a user wait," then yeah, GPT-4 is crazy slow. Personally I'd prefer something more async: let me send a message on some platform and get back to me when you have a good answer, instead of making me sit watching words load one by one like I'm on a 9600 baud modem.
Also it's a text generation algo, not a mob boss. "how powerful it is" foh
People expect to wait a few seconds when calling LLMs. Just make it obvious to users. Our GPT-4-powered app has several thousand paying users, and "slowness" is very rarely a complaint.
"instead of making me sit watching words load one by one"
Huh? That's entirely up to how you implement your application. Streaming mode isn't even on by default.
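To illustrate the point (this is a sketch with a fake token generator, not the real OpenAI client API): a streaming response can be trivially buffered into one blocking call, so word-by-word display is purely an application choice.

```python
from typing import Iterator

def fake_token_stream() -> Iterator[str]:
    # Stand-in for an LLM API's streaming response (illustrative only).
    for tok in ["Hello", ", ", "world", "!"]:
        yield tok

def complete_blocking(stream: Iterator[str]) -> str:
    # Non-streaming UX: buffer every token, return the full text at once.
    return "".join(stream)

def complete_streaming(stream: Iterator[str]) -> None:
    # Streaming UX: render tokens as they arrive, modem-style.
    for tok in stream:
        print(tok, end="", flush=True)
    print()

print(complete_blocking(fake_token_stream()))  # → Hello, world!
```

Whether the user watches tokens trickle in or sees the finished answer is decided by which wrapper you call, not by the model.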