Yes, you are probably right: a load that finds a larger value is equivalent to a max. Since the max wouldn't store anything in that case, it wouldn't introduce any synchronization edge either.
A load that finds a smaller value is trickier to analyze, but I think you are free to ignore it and simply proceed with the atomic max. An LL/SC loop underlying a max operation might spuriously fail anyway.
Edit: here is another argument in favour: if your only atomic RMW is a CAS, you would implement X.atomic_max(new) as:
1: expected <- X.load()
2: if new <= expected: done
3: else if X.cas(expected, new): done
   else goto 2  # a failed cas implicitly refreshes expected
A CAS loop therefore naturally implements the same optimization (unless it starts from an arbitrary expected value rather than a load), so the race is benign.
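For concreteness, the steps above could be sketched in C++ roughly like this. `atomic_max` and `atomic_max_example` are hypothetical names of mine, not standard library functions, and the relaxed ordering on the initial load and on the CAS failure path is an assumption; use whatever ordering your algorithm actually needs.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: raise `x` to at least `value` using only a load and
// compare-exchange. Returns the value that was observed before any store.
inline int atomic_max(std::atomic<int>& x, int value,
                      std::memory_order order = std::memory_order_seq_cst) {
    // Step 1: plain load (relaxed here is an assumption, not a requirement).
    int expected = x.load(std::memory_order_relaxed);
    // Step 2: if the stored value is already >= value, there is nothing to store.
    while (expected < value) {
        // Step 3: try to publish `value`. On failure (including a spurious
        // failure of the weak CAS) `expected` is refreshed with the current
        // contents of `x`, and the loop re-checks step 2.
        if (x.compare_exchange_weak(expected, value, order,
                                    std::memory_order_relaxed)) {
            break;
        }
    }
    return expected;
}

// Single-threaded walk through the cases discussed above.
inline void atomic_max_example() {
    std::atomic<int> x{5};
    assert(atomic_max(x, 3) == 5 && x.load() == 5);  // larger value found: no store
    assert(atomic_max(x, 9) == 5 && x.load() == 9);  // smaller value found: CAS stores 9
    assert(atomic_max(x, 9) == 9 && x.load() == 9);  // equal value found: no store
}
```

Note that the early exit in step 2 performs no store at all, which is the same "load found a larger value" shortcut being discussed.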