I mean, if your expected use case is "call an API and get an immediate response of the full text in under 200ms so a user interface doesn't have to make a user wait," then yeah, GPT-4 is crazy slow. Personally I'd prefer something more async: let me send a message on some platform and get back to me when you have a good answer, instead of making me sit watching words load one by one like I'm on a 9600 baud modem.
Also it's a text generation algo, not a mob boss. "how powerful it is" foh
People expect to wait a few seconds when calling LLMs. Just make it obvious to users. Our GPT-4-powered app has several thousand paying users, and "slowness" is very rarely a complaint.
"instead of making me sit watching words load one by one"
Huh? That's entirely up to how you implement your application. Streaming mode isn't even on by default.
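To illustrate the point (this is a sketch with a fake token generator, not the real OpenAI client API): a streaming response can be trivially buffered into one blocking call, so word-by-word display is purely an application choice.

```python
from typing import Iterator

def fake_token_stream() -> Iterator[str]:
    # Stand-in for an LLM API's streaming response (illustrative only).
    for tok in ["Hello", ", ", "world", "!"]:
        yield tok

def complete_blocking(stream: Iterator[str]) -> str:
    # Non-streaming UX: buffer every token, return the full text at once.
    return "".join(stream)

def complete_streaming(stream: Iterator[str]) -> None:
    # Streaming UX: render tokens as they arrive, modem-style.
    for tok in stream:
        print(tok, end="", flush=True)
    print()

print(complete_blocking(fake_token_stream()))  # → Hello, world!
```

Whether the user watches tokens trickle in or sees the finished answer is decided by which wrapper you call, not by the model.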