I agree that this is one possible approach. The main difficulty with generated data is its quality and structure: namely, how closely the artificial data corresponds, both quantitatively and qualitatively, to the real data.
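Random data can give misleading results when optimizing a query, because its value distribution (and therefore the selectivity of each predicate) may not match that of the real data.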
With something like this: https://www.getsynth.com/docs/blog/2021/03/09/postgres-data-... (disclaimer: I have no affiliation with them and I haven't used their product, but it appears to be fully open source)
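To illustrate the concern about random data, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `status` column, its four values, and the 90% skew are hypothetical, not taken from any real dataset or from the linked tool. It only compares how selective the same predicate is on skewed "real-like" data versus uniformly random data.

```python
import random

STATUSES = ["active", "banned", "pending", "closed"]  # hypothetical column values


def make_skewed(n, rng):
    """Hypothetical 'real-like' column: about 90% of rows are 'active'."""
    rare = STATUSES[1:]
    return ["active" if rng.random() < 0.9 else rng.choice(rare) for _ in range(n)]


def make_uniform(n, rng):
    """Naive random column: every value is equally likely."""
    return [rng.choice(STATUSES) for _ in range(n)]


def selectivity(values, target):
    """Fraction of rows that a predicate `column = target` would match."""
    return sum(1 for v in values if v == target) / len(values)


rng = random.Random(42)  # fixed seed so the run is repeatable
real_like = make_skewed(20_000, rng)
synthetic = make_uniform(20_000, rng)

real_sel = selectivity(real_like, "active")
synth_sel = selectivity(synthetic, "active")

print(f"status = 'active' matches {real_sel:.1%} of real-like rows")
print(f"status = 'active' matches {synth_sel:.1%} of uniform random rows")
```

The same predicate matches roughly nine in ten rows in one dataset and roughly one in four in the other, so a planner would estimate very different row counts, and a query tuned against the uniform data may not behave the same way against the real data.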