GPT-4 being a mixture of experts is irrelevant, imo. We don't care how many layers a network has, how wide those layers are, or which activation functions it uses; all that matters is whether we can run it on specific hardware, and the results.
WizardCoder 34B and Phind 34B are the only models that are remotely comparable, and they are still slightly worse than GPT-3.5 (let alone GPT-4).