There's a fascinating example from computational social science (Schelling's segregation model) that partly explains this: if people from different groups move around on a lattice, even a very slight preference (51%/49%) for having neighbours from one's own group ends up producing a self-segregated pattern.
EDIT: see e.g. http://ccl.northwestern.edu/netlogo/models/Segregation
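
For anyone who wants to poke at this without NetLogo, here is a rough sketch of a Schelling-style simulation in plain Python. It is not the NetLogo model's code. The grid size, the share of empty cells, the 0.4 threshold, and all function names are assumptions I picked for illustration. Each agent checks what fraction of its occupied neighbours belong to its own group and jumps to a random empty cell if that fraction is below the threshold.

```python
import random


def make_grid(size, empty_frac, rng):
    """Random square grid: 0 = empty cell, 1 and 2 = the two groups."""
    return [
        [0 if rng.random() < empty_frac else rng.choice((1, 2)) for _ in range(size)]
        for _ in range(size)
    ]


def same_fraction(grid, r, c):
    """Fraction of occupied neighbours (8 cells, wrapping at the edges)
    in the same group as the agent at (r, c); None if it has no neighbours."""
    size = len(grid)
    me = grid[r][c]
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            other = grid[(r + dr) % size][(c + dc) % size]
            if other != 0:
                total += 1
                same += other == me
    return same / total if total else None


def mean_similarity(grid):
    """Average same-group neighbour fraction over all agents with neighbours."""
    size = len(grid)
    vals = []
    for r in range(size):
        for c in range(size):
            if grid[r][c] != 0:
                frac = same_fraction(grid, r, c)
                if frac is not None:
                    vals.append(frac)
    return sum(vals) / len(vals) if vals else 0.0


def step(grid, threshold, rng):
    """Visit agents in random order; move each unhappy one to a random
    empty cell. Returns the number of agents that moved."""
    size = len(grid)
    agents = [(r, c) for r in range(size) for c in range(size) if grid[r][c] != 0]
    empties = [(r, c) for r in range(size) for c in range(size) if grid[r][c] == 0]
    rng.shuffle(agents)
    moves = 0
    for r, c in agents:
        frac = same_fraction(grid, r, c)
        if frac is None or frac >= threshold or not empties:
            continue  # content with its neighbourhood, or nowhere to go
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec] = grid[r][c]
        grid[r][c] = 0
        empties[i] = (r, c)  # the vacated cell becomes available
        moves += 1
    return moves


def simulate(size=30, empty_frac=0.1, threshold=0.4, max_steps=50, seed=0):
    """Run until nobody moves (or max_steps); return similarity before and after."""
    rng = random.Random(seed)
    grid = make_grid(size, empty_frac, rng)
    before = mean_similarity(grid)
    for _ in range(max_steps):
        if step(grid, threshold, rng) == 0:
            break
    return before, mean_similarity(grid)


if __name__ == "__main__":
    before, after = simulate()
    print(f"mean same-group neighbour fraction: {before:.2f} -> {after:.2f}")
```

Running it prints the average same-group neighbour fraction before and after. Try changing `threshold` to see how mild a preference still pushes the grid towards clusters.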