I'm keeping up with this pretty closely, and it's still very far from being practical and controllable enough to get the actual job done. Txt2img is largely irrelevant for real work; generation from text alone isn't going to work well beyond toy and novelty purposes.
I think you’re missing how fast this is improving and what exactly is possible already, especially if you have a powerful graphics card. Stable Diffusion has barely been out for three months, and I’m already training my own custom models on my desktop to generate photos of my friends that would fool their families.
At first, yes, faces and hands weren’t perfect, but with recent models, and especially with model blending, I can now generate highly realistic photos of people with convincing faces, eyes, and hands. What’s possible, and the quality you get, is rocketing forward right now, and I can’t even begin to imagine where it will be in a year.
I would also add that advances in prompt engineering, inpainting, and image-to-image generation mean getting exactly the result you want, with the composition you want, is very possible. If you’ve only generated a few images, you really have no idea of all the tools and options available. Over the last few months I’ve probably made 5,000–10,000 images, and my own skill at getting what I want has gone up massively.