Analyzing a larger dataset would be neat indeed, though somewhat more challenging, especially getting the layout algorithm to produce something nice in a reasonable amount of time.
There are ~11k repos with >= 100 stars, compared to the 825 I had here (fewer after filtering for the giant component).
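For anyone unfamiliar with the giant-component step, here is a rough sketch of the idea in plain Python: build an undirected graph from the relationships and keep only the largest connected group of repositories. This is not the code used for the visualization; the `giant_component` helper, the edge-list format, and the `repo-*` names are all hypothetical and only there for illustration.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Return the set of nodes in the largest connected component.

    `edges` is an iterable of (node, node) pairs, treated as undirected.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    largest = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(largest):
            largest = component
    return largest


# Hypothetical toy data: two disconnected clusters of repositories.
edges = [("repo-a", "repo-b"), ("repo-b", "repo-c"), ("repo-d", "repo-e")]
kept = giant_component(edges)
```

Repositories outside the returned set (here the smaller `repo-d`/`repo-e` cluster) would be dropped before layout, which is why the final count ends up below the starting number.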
Fair. I'd say you might just want a searchable db of relationships instead of the visualization (or maybe something Google Maps-esque, where as you zoom in you load only the relationships you can see...).
Yeah, per the footer, color is the primary programming language as identified by GitHub. A key felt like a bit too much clutter; you can always click on a repository to open its page and note the language listed.