We were doing some early testing on this distributed system, and process A kept backing up well under its nominal load. A had no allowance for shedding excess load (it was broadcasting high-frequency safety-critical data), and the network buffer backed up under the shitty messaging layer. It turned out that process B[0..n] couldn't pull the messages off the wire quickly enough because process C was blasting some other data to B at about 1000 times the nominal loading, filling up B's VM and kicking off the (improperly tuned) garbage collector for 2-second intervals, which ate the processor time B needed to handle the load. Total death spiral.

Needless to say, we ended up with more robust load management code and tuned the output of some processes.