I didn't go into the various ways in which the AioContext lock was replaced in the article. You're right, sometimes new fine-grained locks weren't necessary.
When there is really only one thread accessing some data, locking isn't needed. That's what was done for the SCSI emulation layer, where request processing only happens in one thread. Here is a new function that was introduced to schedule work on the thread that runs SCSI emulation (scheduling is rare and not performance-critical, and keeping everything on that one thread lets the rest of the code avoid locks):
https://gitlab.com/qemu-project/qemu/-/blob/master/hw/scsi/s...
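To make the pattern concrete, here is a minimal sketch in plain pthreads. This is not QEMU's actual API and the names are made up for illustration: one thread owns the request state, so that state needs no lock; other threads never touch it directly, they just hand the owner a callback through a small mailbox. In QEMU the handoff goes through the owning thread's event loop rather than a hand-rolled mailbox, but the ownership idea is the same.

  /*
   * Sketch only (not QEMU code): the owner thread is the only one that
   * touches requests_in_flight, so no lock protects that data. Other
   * threads defer work to the owner instead of accessing it directly.
   */
  #include <pthread.h>
  #include <stdio.h>

  typedef void (*work_fn)(void *opaque);

  /* One-slot mailbox used to hand work to the owning thread. */
  static pthread_mutex_t mbox_lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t mbox_cond = PTHREAD_COND_INITIALIZER;
  static work_fn pending_fn;
  static void *pending_opaque;

  /* Data owned exclusively by the worker thread: no lock needed. */
  static int requests_in_flight;

  static void cancel_requests(void *opaque)
  {
      (void)opaque;
      /* Runs on the owning thread, so plain access is safe. */
      printf("cancelling %d requests\n", requests_in_flight);
      requests_in_flight = 0;
  }

  /* Called from any thread: defer fn(opaque) to the owning thread. */
  static void schedule_on_owner(work_fn fn, void *opaque)
  {
      pthread_mutex_lock(&mbox_lock);
      pending_fn = fn;
      pending_opaque = opaque;
      pthread_cond_signal(&mbox_cond);
      pthread_mutex_unlock(&mbox_lock);
  }

  static void *owner_thread(void *arg)
  {
      (void)arg;
      /* Normal request processing: the owner touches its data freely. */
      requests_in_flight = 3;

      pthread_mutex_lock(&mbox_lock);
      while (!pending_fn) {
          pthread_cond_wait(&mbox_cond, &mbox_lock);
      }
      work_fn fn = pending_fn;
      void *opaque = pending_opaque;
      pthread_mutex_unlock(&mbox_lock);

      fn(opaque); /* e.g. cancel_requests() on the owning thread */
      return NULL;
  }

  int main(void)
  {
      pthread_t owner;
      pthread_create(&owner, NULL, owner_thread, NULL);

      /* Another thread asks for work to run on the owner. */
      schedule_on_owner(cancel_requests, NULL);

      pthread_join(owner, NULL);
      return 0;
  }

The point is that only the tiny handoff needs synchronization; the request state itself stays single-threaded.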
QEMU's IOThreads allow the user to configure the threads and get something similar to a thread-per-core architecture. But if one thread becomes a bottleneck, then some form of thread synchronization is needed again, even with thread per core. Some problems can be parallelized, and those work well with thread per core.
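For reference, this is roughly how users wire that up today (option names from memory, so check them against your QEMU version): create an IOThread object and attach a device to it, so that device's request processing runs in the dedicated thread instead of the main loop:

  qemu-system-x86_64 \
      -object iothread,id=iothread0 \
      -drive if=none,id=drive0,file=disk.img,format=raw \
      -device virtio-blk-pci,drive=drive0,iothread=iothread0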