The Alexa device itself is rather cheap, and running a relatively large model on an 'edge' device is still fairly expensive and remains an open problem. Again, this comes down to economics: it makes sense to adopt a layered model in this case, deferring the more expensive processing to the server side.
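
To make the layered idea concrete, here is a minimal sketch in Python. It is not how Alexa is actually implemented; every name in it (`LayeredPipeline`, `toy_wake_word_detector`, `toy_server_model`) is hypothetical, and the "models" are stand-in functions. The only point it illustrates is the gating: a cheap check runs on every input on the device, and the expensive stage is invoked only when that check passes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LayeredPipeline:
    """Run a cheap local check first; call the expensive stage only if it passes."""

    cheap_detector: Callable[[str], bool]    # stand-in for a small on-device model
    expensive_backend: Callable[[str], str]  # stand-in for a large server-side model
    backend_calls: int = 0                   # counts how often the costly stage ran

    def handle(self, utterance: str) -> Optional[str]:
        # Layer 1: cheap filter that runs on every input, on the device.
        if not self.cheap_detector(utterance):
            return None
        # Layer 2: expensive processing, deferred to the server.
        self.backend_calls += 1
        return self.expensive_backend(utterance)


def toy_wake_word_detector(utterance: str) -> bool:
    # Assumption: a simple prefix match stands in for a small keyword-spotting model.
    return utterance.lower().startswith("alexa")


def toy_server_model(utterance: str) -> str:
    # Assumption: a string echo stands in for a remote call to a large model.
    return f"server handled: {utterance}"


pipeline = LayeredPipeline(toy_wake_word_detector, toy_server_model)
inputs = ["what a nice day", "Alexa, play music", "pass the salt"]
results = [pipeline.handle(u) for u in inputs]
print(results)
print("expensive calls:", pipeline.backend_calls)
```

Of the three inputs, only the one that passes the cheap check reaches the expensive stage, which is where the cost saving comes from: the cheap hardware pays for the filter, and the server pays only for traffic that is worth processing.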