I'm not gonna say it with 100% certainty, but having started using C# a few years ago i think that much of the smaller latency is simply because C# probably produces a magnitude less garbage than Java in practice (probably less than Go as well). The C# GC's generally resemble the older Java GC's more than G1 or especially the Zgc collector(that includes software based read-barriers whilst most other collectors only use write-barriers).
Small things like tuples (combining multiple values in outputs) and out parameters relaxes the burden on the runtime since programmers don't need to create objects just to send several things back out from a function.
But the real kicker probably comes since the lowlevel components, be it the Http server with ValueTasks and the C# dictionary type getting memory savings just by having struct types and proper generics. I remember reading some article from years ago about re-writing the C# dictionary class that they reduced memory allocations by something like 90%.
When you say "C# GC's generally resemble the older Java GC's more than G1 or especially the Zgc", what do you have in mind? I'm skimming through G1 description once again and it looks quite similar to .NET's GC implementation. As for Zgc, introducing read barriers is a very strong no in .NET because it introduces additional performance overhead to all the paths that were previously free.
On allocation traffic, I doubt average code in C# allocates less than Go - the latter puts quite a lot of emphasis on plain structs, and because Go has very poor GC throughput, the only way to explain tolerable performance in the common case is that Go still allocates less. Of course this will change now that more teams adopt Go and start classic interface spam and write abstractions that box structs into interfaces to cope with inexpressive and repetition-heavy nature of Go.
Otherwise, both .NET and Java GC implementations are throughput-focused, even the ones that target few-core smaller applications, while Go GC focuses on low to moderate allocation traffic on smaller hosts with consistent performance, and regresses severely when its capacity to reclaim memory in time is exceeded. You can expect from ~4 up to ~16-32x and more (SRV GC scales linearly with cores) difference in maximum allocation throughput between Go and .NET: https://gist.github.com/neon-sunset/c6c35230e75c89a8f6592cac...
Small things like tuples (combining multiple values in outputs) and out parameters relaxes the burden on the runtime since programmers don't need to create objects just to send several things back out from a function.
But the real kicker probably comes since the lowlevel components, be it the Http server with ValueTasks and the C# dictionary type getting memory savings just by having struct types and proper generics. I remember reading some article from years ago about re-writing the C# dictionary class that they reduced memory allocations by something like 90%.