1GB isn't exactly "Big Data". I'd expect most truly Big Data tasks to be more I/O bound than computation bound -- at least if your "computation" consists of text parsing and hash table lookups.
Depends. If you do a naive Ruby implementation, then you'll be CPU-bound quite quickly.
#!/usr/bin/env ruby
# Print the first whitespace-separated field of each input line
while line = STDIN.gets
  puts line.split(/\s+/).first
end
This pegs a CPU core while processing only about 2 MB/s, well below the I/O capabilities of any modern system. I guess the tool you're using matters, which I think was the original point.
I agree, 1GB is small. I do similar processing on tasks that are more in the 100GB range.
But I should say, this is a large enough dataset that loading it all into memory is sometimes infeasible, at least in interactive interpreted environments like Python. That's an important boundary point.
That said, it's interesting that mawk is so fast.
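For reference, the roughly equivalent awk program is a one-liner; I'm assuming something like the following is what people actually benchmark when they say mawk is fast (input.txt is just a placeholder):

# Same task: print the first whitespace-separated field of each line
mawk '{ print $1 }' < input.txt

Which is really the earlier point again: for plain field extraction, the choice of tool dominates the runtime.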