We basically expect deep neural networks to acquire language without a body that it can use to interact with an environment. I believe you cannot separate language from the body or the environment. A corpus of text is simply not holistic enough to capture the meaning behind the symbols.