Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Are there some well-known examples of success in it?


Vision transformers effectively encode a grid of pixel patches. It’s ultimately a matter of ensuring the position encoding incorporates both X and Y and position.

For LLMs we only have one axis of position and - more importantly - the vast majority of training data only is oriented in this way.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: