Since researchers from Tsinghua University released the "CogAgent" visual language model at the end of 2023, more and more general-purpose GUI agents have been showing up, including some from Microsoft. Check out
https://scholar.google.de/scholar?cites=11749002511260467707 for example, or current publications by these authors.
I think I saw a framework designed to make RPA easier for LLMs. It labels every UI element or field with a number and lets the model enter data by referring to that number, or something like that. I can't remember what it was called.
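The numbering idea itself is simple. Here is a rough, hypothetical sketch of it. This is not that framework's code, and the names `UIElement`, `number_elements`, `render_for_prompt`, `parse_action` and the "type 1 Alice" command format are all made up for illustration. Each element gets a number, the numbered list goes into the prompt, and the model's reply is mapped back to the actual element.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    role: str   # hypothetical element type, e.g. "textbox" or "button"
    label: str  # visible text or accessible name

def number_elements(elements):
    """Assign 1-based numeric IDs so the model can refer to elements by number."""
    return {i: el for i, el in enumerate(elements, start=1)}

def render_for_prompt(numbered):
    """Build a compact listing such as '[1] textbox: Name' for the LLM prompt."""
    return "\n".join(f"[{i}] {el.role}: {el.label}" for i, el in numbered.items())

def parse_action(command, numbered):
    """Map a hypothetical reply like 'type 1 Alice' or 'click 2' to an element."""
    parts = command.split(maxsplit=2)
    verb, ref = parts[0], int(parts[1])
    if ref not in numbered:
        raise ValueError(f"unknown element number: {ref}")
    text = parts[2] if len(parts) > 2 else None
    return verb, numbered[ref], text

elements = [UIElement("textbox", "Name"), UIElement("button", "Submit")]
numbered = number_elements(elements)
print(render_for_prompt(numbered))
# [1] textbox: Name
# [2] button: Submit
```

The advantage is that the model only has to output a short number instead of pixel coordinates or selectors. The RPA side then does the actual clicking or typing on the element it looked up.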