
OK, as far as I can tell, the real story is that Engine Yard relies on a GFS/SAN setup that doesn't scale in the unique way that GitHub needs.

If you think about it, GitHub is one of the few sites that uses the filesystem directly and heavily. Everyone else hits scaling issues on the DB first.

  The sad thing of all of this is it's not really a matter  
  of scaling, and it never has been. Our bottleneck has 
  always been the file system. GFS just... sucks. I'm sorry, 
  but I have to say it. Case in point, your graph. The first 
  rebuild I ran timed out because of GFS. The second one ran 
  fine, took maybe a minute to process, if that. GFS impacts 
  everything... gem build failures due to cloning... GFS. 
  Network graphs taking long time to build... GFS. Caching 
  jobs not completing... GFS. I think you see where I'm 
  going here. There's no plans to deploy the new code to the 
  live servers, and I think the reason is that we're afraid 
  it'll make GFS performance worse, not better. But on the 
  new servers where we don't have to fight GFS, it's 
  amazing.


Funny thing is that we told GitHub over a year ago that GFS would not scale for them, and we also outlined how to move to a shared-nothing chunk server architecture. They didn't take our advice, so it was mostly their own architecture decisions that held them back with regard to GFS.

Anyway, there seems to be plenty of armchair quarterbacking on this one. The real story is that we can't afford to host them for free anymore.


FWIW - thanks for hosting them for so long. GH is a wonderful service, and I'm sure EY contributed greatly to its success.


Is it so hard to imagine that they weren't jumping at the chance to write their own custom non-filesystem storage backend?



