Data selection depends on the use case. Two contrasting use cases I see are:
- Emulation
- Advisor
In the case of MTG player emulation, for example, I think it makes sense to group the data by some rankable criterion such as win rate, and then train rank-specific models that each mimic players of that rank.
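As a rough sketch of that grouping step, the snippet below buckets game records into tiers by win rate, so that each tier's games could feed its own model. The tier names, the cutoffs, and the record layout (`player`, `winrate`, `game`) are all assumptions made up for illustration, not taken from any real MTG dataset or ladder.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical tiers; the cutoffs are illustrative only.
TIER_NAMES = ["low", "mid", "high"]
TIER_CUTOFFS = [0.45, 0.55]  # winrate < 0.45 -> low, < 0.55 -> mid, else high


def tier_for(winrate):
    """Map a win rate in [0, 1] to a tier name using the cutoffs above."""
    return TIER_NAMES[bisect_right(TIER_CUTOFFS, winrate)]


def group_by_tier(records):
    """Group games by the tier of the player who produced them.

    Each record is assumed to be a dict with 'player', 'winrate', and 'game' keys.
    """
    groups = defaultdict(list)
    for record in records:
        groups[tier_for(record["winrate"])].append(record["game"])
    return dict(groups)


# Toy records standing in for real game logs.
records = [
    {"player": "a", "winrate": 0.40, "game": "g1"},
    {"player": "b", "winrate": 0.50, "game": "g2"},
    {"player": "c", "winrate": 0.62, "game": "g3"},
    {"player": "a", "winrate": 0.40, "game": "g4"},
]
datasets = group_by_tier(records)
```

Each entry in `datasets` would then be the training set for one rank-specific model. Fixed cutoffs are just the simplest choice; quantile-based tiers would keep the groups closer in size.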