Yes, you are probably right: a load that finds a larger value is equivalent to a max. Since the max wouldn't store anything in this case, it wouldn't introduce any synchronization edge either.

A load that finds a smaller value is trickier to analyze, but I think you are free to ignore it and simply proceed with the atomic max. An underlying LL/SC loop implementing the max operation might fail spuriously anyway.

Edit: here is another argument in favour: if your only atomic RMW is a CAS, to implement X.atomic_max(new) you would:

  1: expected <- X
  2: if new < expected: done
  3: else if X.cas(expected, new): done
     else goto 2 # on failure, cas refreshes expected with X's current value

So a CAS loop naturally implements the same optimization (unless it starts with an arbitrary `expected` instead of a load of X), which is why the race is benign.
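For concreteness, here is a minimal C++ sketch of the CAS loop above, assuming `std::atomic` with the default sequentially consistent ordering. `atomic_max` is a hypothetical helper name, not a standard function. It mirrors steps 1-3, including the early exit when the stored value is already larger, so that path performs only a load and never writes.

```cpp
#include <atomic>

// Hypothetical helper: atomically sets x to max(x, desired) using only CAS,
// following steps 1-3 of the pseudocode. Returns the value seen before the
// update (like a fetch-style RMW). Uses seq_cst ordering throughout.
template <typename T>
T atomic_max(std::atomic<T>& x, T desired) {
    // Step 1: read the current value.
    T expected = x.load();
    // Step 2: if desired < expected we are done, without any store
    // (and therefore without creating a release/synchronization edge).
    while (expected <= desired) {
        // Step 3: try to install `desired`. If this fails, because another
        // thread wrote X or because the weak form failed spuriously (e.g. an
        // LL/SC reservation was lost), `expected` is refreshed with X's
        // current value and we go back to the step 2 check.
        if (x.compare_exchange_weak(expected, desired)) {
            break;  // on success `expected` still holds the old value
        }
    }
    return expected;
}
```

Using `compare_exchange_weak` is fine here: a spurious failure just sends the loop back to step 2 with a freshly observed value, which is the same situation as the LL/SC case mentioned earlier.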