I think you can obtain the same variance benefits if you delay the backwards shift until you insert into a tombstone bucket. In that case you backwards shift until the "new" element has a higher DIB than all of the "promoted" ones, or an empty bucket is reached, or a DIB zero bucket is reached. Of course you're not really saving much actual computation this way (you do save a little, since you avoid situations where an element is back-shifted and immediately front-shifted), just moving it around.
(I'm not sure why fast deletions are necessary in the first place?)
(I'm not sure why fast deletions are necessary in the first place?)