Transformers are exciting because they seem to work across all modalities, including vision [1]. It makes me wonder whether the transformer module captures some essence of the minicolumn structure found all over the neocortex that Jeff Hawkins raves about, citing Vernon Mountcastle. Hawkins et al. talk about grid cells and location these days; maybe attention over context is the generalization of that notion.
The hierarchical transformer variants are uncovering some possible optimizations that resemble the ideas of Thousand Brains - https://arxiv.org/abs/2110.13711
Attention mechanisms, in conjunction with autoencoding, roughly approximate what grid cells accomplish, but transformers are still a feedforward architecture. Thanks to Moore's law, we can keep scaling up inputs to reach human-like performance, but until someone untangles the structure and devises a way of including recurrence, transformers won't be able to perform all of the functions Hawkins describes.
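To make the "feedforward" point concrete, here's a minimal sketch of single-head scaled dot-product self-attention in plain NumPy (the weight matrices are made-up placeholders, not from any particular model). The whole block is a pure function of its input: nothing carries over from one call to the next.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(x, W_q, W_k, W_v):
        # Single-head scaled dot-product attention over a (seq_len, d_model) input.
        # A pure function of its inputs: no state survives the call, which is
        # the sense in which a transformer block is feedforward.
        Q, K, V = x @ W_q, x @ W_k, x @ W_v      # project to queries/keys/values
        scores = Q @ K.T / np.sqrt(Q.shape[-1])  # all pairwise interactions in one matmul
        return softmax(scores) @ V               # context-weighted mix of values

    rng = np.random.default_rng(0)
    d_model, seq_len = 16, 8
    x = rng.normal(size=(seq_len, d_model))
    W_q, W_k, W_v = (0.1 * rng.normal(size=(d_model, d_model)) for _ in range(3))
    print(self_attention(x, W_q, W_k, W_v).shape)  # (8, 16) -- same x in, same out

Stacking these blocks adds depth, not state: run the same x through twice and you get the same output twice.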
There are interesting LSTM variations on transformers, but nothing public yet that really performs at the level of the straight feedforward models. Combinatorial explosion is a bitch, and LSTMs blow up the size and compute requirements. Hierarchical structures could constrain the requirements to something achievable.
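A rough illustration of where the cost comes from (a sketch, not any published hybrid): the attention above computes every token interaction in one batched matmul, while an LSTM has to walk the sequence one step at a time, because each state depends on the previous one.

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def lstm_cell(x_t, h, c, W, U, b):
        # One LSTM step: gates depend on the current input AND the previous state,
        # so step t cannot start until step t-1 has finished.
        z = x_t @ W + h @ U + b                       # stacked gate pre-activations, shape (4*d,)
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # new cell state built from the old one
        h = sigmoid(o) * np.tanh(c)
        return h, c

    rng = np.random.default_rng(0)
    d, seq_len = 16, 8
    W, U = (0.1 * rng.normal(size=(d, 4 * d)) for _ in range(2))
    b = np.zeros(4 * d)
    x = rng.normal(size=(seq_len, d))
    h = c = np.zeros(d)
    for x_t in x:                 # strictly sequential: no parallelism over time steps
        h, c = lstm_cell(x_t, h, c, W, U, b)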
With recurrence, you can begin to train models to perform things like discrete mathematics, as opposed to the relatively shallow semantic graphs in GPT-3-like models. Today's models don't have anything stateful that could be called memory, but with recurrence, model states become dynamic encodings that can be processed over many cycles.
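A sketch of what "processed over many cycles" could look like, using a bare Elman-style recurrence (purely illustrative, not any deployed architecture): the hidden state h persists across repeated passes over the input, so later cycles operate on an encoding of everything that came before.

    import numpy as np

    rng = np.random.default_rng(0)
    d, seq_len = 16, 8
    W_in, W_h = (0.1 * rng.normal(size=(d, d)) for _ in range(2))
    x = rng.normal(size=(seq_len, d))

    h = np.zeros(d)                           # persistent state: the closest thing to memory here
    for cycle in range(5):                    # revisit the same input many times
        for x_t in x:
            h = np.tanh(x_t @ W_in + h @ W_h) # h becomes a dynamic encoding of history
    # h now reflects everything seen across all five cycles; a stateless
    # GPT-style forward pass has no analogue of it.
    print(h[:4])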
[1] https://arxiv.org/abs/2105.15203