I've also noticed Sonnet starting to degrade. It's developing some of the behaviours that put me off the competition in the first place. Needless explanations, filler in responses, wanting to put everything in lists, even increased sycophancy.
Major AI companies are not doing nearly enough to address the sycophancy problem.
I get that it's not an easy problem to solve, but how is Anthropic supposed to solve the actual alignment problem if they can't even stop their production LLMs from glazing the user all the time? And OpenAI is somehow even worse.
I feel like this is just related to my projects getting bigger. Claude Code is trying to keep up with my project evolving from 2k lines of code to 100k lines. Of course it’s going to feel worse.
I think it's more about how our expectations of the latest model change over time.
I expect to be completely blown away by GPT-5 in the first few days, and then over time I will figure out the limitations of the model. Then I will be less impressed, because at first you don't know what it can't do.
Other than it starting out trying to produce a full and complete web app (or whatever) for my daily yak shaving session instead of the normal "let's talk about and work through this thing", the new Opus 4.1 seems to 'get it' a lot quicker than the old daffy robot did. It asked pertinent questions to understand the system we are working on and accomplished the goal of updating the design document so I don't have to keep explaining details at the start of every chat session. That, by the way, is something it always failed to do before, so I had to explain everything each and every time before any forward progress could be made.
I do agree it hit the token limit a lot quicker than before, when I could chat for hours without worrying about it.
Either way, I still have one last yak to shave for this project, so we'll see how efficient it is with that. If it accomplishes the task before burning through all the tokens, then win-win, I suppose.
At least Sonnet 4 is still usable, but I'll be honest, it's been producing worse and worse slop all day.
I've basically wasted the morning on Claude Code when I should've just been doing it all myself.