There's also a chapter in Matters Computational (http://www.jjj.de/fxt/#fxtbook), "CPU instructions often missed", that mentions them:
> Primitives for permutations of bits, see section 1.29.2 on page 81. A bit-gather and a bit-scatter instruction for sub-words of all sizes a power of 2 would allow for arbitrary permutations (see [FXT: bits/bitgather.h] and [FXT: bits/bitseparate.h] for versions working on complete words).
The document is pretty easy to navigate if you know which instructions you want to look up.
PDEP and PEXT are described on page 488. The names stand for Parallel Bits Deposit and Parallel Bits Extract. They are essentially scatter/gather instructions for bits: PDEP scatters and PEXT gathers.
PDEP uses a mask in the second source operand (the third operand) to transfer/scatter contiguous low order bits in the first source operand (the second operand) into the destination (the first operand).
PEXT uses a mask in the second source operand (the third operand) to transfer either contiguous or non-contiguous bits in the first source operand (the second operand) to contiguous low order bit positions in the destination (the first operand).
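To make the two descriptions concrete, here is a rough software model of both operations in portable C. This is only a sketch of the behaviour described above, not how the hardware does it; the names `pdep_soft`, `pext_soft` and `check_examples` are made up for illustration. On a CPU with BMI2 you would normally use the `_pdep_u64` / `_pext_u64` intrinsics from `<immintrin.h>` instead.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of PDEP: take the contiguous low-order bits of src
   and scatter them to the positions of the set bits in mask. */
uint64_t pdep_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & bit)
            result |= lowest;                 /* deposit at the mask position */
        mask &= mask - 1;                     /* clear that mask bit */
    }
    return result;
}

/* Software model of PEXT: gather the bits of src selected by mask
   and pack them into the contiguous low-order bits of the result. */
uint64_t pext_soft(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & lowest)
            result |= bit;                    /* pack into the next low bit */
        mask &= mask - 1;                     /* clear that mask bit */
    }
    return result;
}

/* Small worked examples (values chosen for illustration). */
void check_examples(void) {
    /* The mask selects nibbles 0 and 2 of 0xABCD, i.e. D and B. */
    assert(pext_soft(0xABCD, 0x0F0F) == 0xBD);
    /* PDEP with the same mask puts them back where they came from. */
    assert(pdep_soft(0xBD, 0x0F0F) == 0x0B0D);
}
```

With the same mask, PDEP undoes PEXT for the bits the mask selects, which is why the pair is often described as bit scatter and bit gather.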