I had Chapter 6 open while working on this post :) I should probably cite it too. As far as I could see, that book doesn't talk about kernels arising from gradient descent, but it does say many other things about GP kernels. Certainly it has the main things I used.
It's amusing how facts that look surprising and mysterious to the rest of the world are just table stakes at the right sort of math department. As a researcher I feel the pressure to make things that are "my own", but there's so much that already exists just waiting to be grokked and plugged in!
Feels the same in linear algebra, where you may think you're building something new (e.g. a specific trick to fit your specific kind of complex-covariance Toeplitz-matrix parallel, vectorized batch-inversion problem with very little fast shared memory), but it turns out all the parts are already explored in hard-to-parse textbooks somewhere.