Except that the participants were thrown into the tasks cold, seemingly without even the most basic prep you would (and should) do before pointing an AI at a legacy codebase (sometimes called "LLM grounding" or "LLM context bootstrapping"). If the participants started without something like this, the study was either conducted incorrectly or designed to support a predetermined conclusion.
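For the curious, here's a minimal sketch of what that kind of bootstrapping could look like: a throwaway script that gathers a repo outline, the README, and the conventions a model can't guess into one file you feed into the session before the first task. Everything here is illustrative, not from the study; the file name `CONTEXT.md` and the skip list are assumptions you'd adapt to your own repo.

```python
#!/usr/bin/env python3
"""Sketch of LLM context bootstrapping: collect a repo overview into one
file to paste into (or load at the start of) an LLM session."""

from pathlib import Path

REPO = Path(".")               # repo root (run from the checkout)
OUT = REPO / "CONTEXT.md"      # hypothetical name; pick whatever your tool reads
SKIP = {".git", "node_modules", "__pycache__", ".venv"}  # noise dirs (assumed)

def tree(root: Path, depth: int = 2, indent: str = "") -> list[str]:
    """Directory outline, a couple of levels deep, skipping noise dirs."""
    lines = []
    for p in sorted(root.iterdir()):
        if p.name in SKIP or p.name.startswith("."):
            continue
        lines.append(f"{indent}- {p.name}{'/' if p.is_dir() else ''}")
        if p.is_dir() and depth > 1:
            lines.extend(tree(p, depth - 1, indent + "  "))
    return lines

sections = ["# Project context (hand-check before use)"]
readme = REPO / "README.md"
if readme.exists():
    sections.append("## README\n" + readme.read_text(errors="replace"))
sections.append("## Layout\n" + "\n".join(tree(REPO)))
# The part no script can infer: build/test commands, style rules, known pitfalls.
sections.append("## Conventions\n- build: ...\n- test: ...\n- gotchas: ...")

OUT.write_text("\n\n".join(sections))
print(f"wrote {OUT} ({OUT.stat().st_size} bytes)")
```

Even ten minutes spent filling in the "Conventions" section by hand is the kind of prep the study apparently skipped.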
By the time all of this is written, I'm familiar enough with the code to fly over it (Hello, Emacs and Vim). But by then, your tasks are small, targeted fixes, because any new feature requires so much planning and stakeholder discussion that you can't just go and work on it.