I haven't been following this stuff too closely, but have there been any more findings on what "went wrong" with Sydney initially? Like, I thought it was just a wrapper on GPT (was it 3.5?), but maybe Microsoft took the "raw" GPT weights and did their own alignment? Or why did Sydney seem so creepy sometimes compared to ChatGPT?
I think what happened is Microsoft got the raw GPT3.5 weights, based on the training set. However for ChatGPT OpenAI had done a lot of additional training to create the 'assistant' personality, using a combination of human and model based response evaluation training.
Microsoft wanted to catch up quickly so instead of training the LLM itself, they relied on prompt engineering. This involved pre-loading each session with a few dozen rules about it's behaviour as 'secret' prefaces to the user prompt text. We know this because some users managed to get it to tell them the prompt text.