For text adventures, an important kind of reasoning is Inferring Authorial Intent. Or maybe Seeing Chekhov's Gun. Or Learning The Metagame.
The game is deliberately solvable, and its elements are introduced to that end. Inferring that is important to any solution. By using minimal scaffolding you are testing things like "does the LLM understand the patterns of text adventures, and can it infer the metagame?" If you tested different kinds of scaffolding, I think you could tease apart these different kinds of reasoning. That is, distinguish between (a) does it understand text adventures, and (b) given that understanding, can it actually solve them?
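To make the idea concrete, here is a minimal sketch of what varying the scaffolding might look like: the same agent loop driven by either a bare prompt or one that spells out the metagame (deliberate solvability, Chekhov's gun). The prompt texts, `call_llm` stub, and `run_agent` helper are all hypothetical illustrations, not code from the linked repo.

```python
# Two scaffolding variants: minimal vs. genre-aware (hypothetical examples).
MINIMAL = "You are playing a text adventure. Respond with one command."
GENRE_AWARE = (
    "You are playing a text adventure. These games are deliberately "
    "solvable: every object mentioned likely matters (Chekhov's gun), "
    "and standard verbs (take, open, examine) usually work. "
    "Respond with one command."
)


def call_llm(system_prompt: str, transcript: list[str]) -> str:
    """Stub standing in for a real model call.

    A real agent would send system_prompt plus the transcript to an LLM
    and return its next command; here we return a fixed command so the
    loop is runnable.
    """
    return "examine lamp"


def run_agent(system_prompt: str, game_step, max_turns: int = 50) -> bool:
    """Drive the game with one scaffolding variant; True if the agent wins."""
    transcript = [game_step(None)]  # None requests the opening description
    for _ in range(max_turns):
        command = call_llm(system_prompt, transcript)
        observation = game_step(command)
        transcript.append(observation)
        if "*** You have won ***" in observation:
            return True
    return False
```

Comparing win rates of `run_agent(MINIMAL, game)` against `run_agent(GENRE_AWARE, game)` across a suite of games would separate "knows the genre" from "can exploit that knowledge."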
I did play around with more prompting and some statefulness: https://github.com/ianb/tale-suite/blob/main/agents/llm_prom...
It wasn't that successful, but I think it could do much better; I just had to stop myself from working on it more because of other priorities.