"In recent months, we’ve seen further improvements to the state of the art in RNN language modeling. The current state of the art results are held by two recent papers by Melis et al. and Merity et al.. These models make use of most, if not all, of the methods shown above, and extend them by using better optimizations techniques, new regularization methods, and by finding better hyperparameters for existing models. Some of these methods will be presented in part two of this guide."
Am I right in saying that the recently publicised Google Transformer [1] Neural Network is actually the state of the art now, over RNNs?
The Transformer network solves a different problem: translating a given sentence into another sentence with the same meaning.
The problem discussed here is predicting the next word in a partial sentence, where AFAIK some variety of RNN is still best. It might be possible to adapt the Transformer architecture to that task, but that would make it a different model.
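For anyone unsure what "predicting the next word in a partial sentence" looks like in practice, here is a rough sketch of an RNN language model, assuming PyTorch; the layer sizes, names, and the toy prefix are purely illustrative and not taken from the article or the papers above.

    import torch
    import torch.nn as nn

    class RNNLanguageModel(nn.Module):
        # Illustrative RNN language model: embed tokens, run an LSTM,
        # and project each hidden state to logits over the vocabulary.
        def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) integer word ids of the partial sentence
            hidden_states, _ = self.lstm(self.embed(token_ids))
            return self.proj(hidden_states)  # logits for the next word at each position

    model = RNNLanguageModel()
    prefix = torch.randint(0, 10000, (1, 5))      # a made-up 5-word prefix
    logits = model(prefix)
    next_word_id = logits[0, -1].argmax().item()  # most likely next word after the prefix

An encoder-decoder Transformer as described in the paper maps a whole source sentence to a target sentence, which is a different setup from this left-to-right next-word prediction.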
[1] https://research.googleblog.com/2017/08/transformer-novel-ne...