This seems like a really hard problem to me. I actually wish image formats like PNG could carry annotations saying "there is a 45px tall, 34px wide letter 'a' at position ...", generated automatically by image editing software and screenshot apps. That would make it possible to select text inside an image and would let screen readers work better.
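
For what it's worth, PNG already allows arbitrary text chunks, so a rough version of this can be sketched today. The Python below uses only the standard library and stores a JSON list of glyph boxes in a `tEXt` chunk. The keyword `glyph-boxes`, the JSON field names, and the sample coordinates are all made up for illustration; as far as I know, no editor, screenshot tool, or screen reader reads or writes anything like this.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KEYWORD = b"glyph-boxes"  # hypothetical keyword, not part of any standard


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk after the PNG signature."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        yield chunk_type, data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_glyph_annotations(png: bytes, glyphs: list) -> bytes:
    """Return a copy of the PNG with a tEXt chunk of glyph boxes before IEND."""
    # json.dumps escapes non-ASCII by default, so the text is valid Latin-1.
    payload = KEYWORD + b"\x00" + json.dumps(glyphs).encode("ascii")
    out = PNG_SIGNATURE
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"IEND":
            out += make_chunk(b"tEXt", payload)
        out += make_chunk(chunk_type, data)
    return out


def read_glyph_annotations(png: bytes):
    """Return the glyph list stored under KEYWORD, or None if absent."""
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            if keyword == KEYWORD:
                return json.loads(text.decode("ascii"))
    return None


def tiny_png() -> bytes:
    """Build a 1x1 grayscale PNG, only so there is something to annotate."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


# Illustrative values only; x and y are placeholders.
sample = [{"char": "a", "x": 10, "y": 20, "width": 34, "height": 45}]
annotated = add_glyph_annotations(tiny_png(), sample)
print(read_glyph_annotations(annotated))
```

Decoders skip ancillary chunks they don't recognize, so an image annotated this way should still open normally everywhere. The hard part remains what the comment is really asking for: getting the tools that render text to emit the boxes in the first place, and agreeing on one format for them.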