I did some non-scientific testing with this last week and, at least for my problem (brute-forcing RC4 keys with a relatively small number of long-lived threads), threads and goroutines were approximately equal in performance, with goroutines very slightly faster (around 5% or so).
I'm running another test with the same workload but with more dynamically created threads (n keys dispatched to a pool of threads/goroutines until the key space is exhausted), but I haven't finished the threaded version for comparison yet.
Wouldn't you have exactly one thread per CPU for this sort of embarrassingly parallel, CPU-intensive brute-force computation? You should context switch approximately never, which makes coroutine vs. thread moot.
The point was more to see what the overhead is for CPU-bound tasks that do need to switch, not that this is the best approach for this particular task. I just happened to have the threaded version available, so I thought it made a good point of comparison.