1GB isn't exactly "Big Data". I'd expect most truly Big Data tasks to be more I/O-bound than computation-bound -- at least if your "computation" consists of text parsing and hash table lookups.

That said, it's interesting that mawk is so fast.



Depends. If you do a naive Ruby implementation, then you'll be CPU-bound quite quickly.

  #!/usr/bin/env ruby
  # Print the first whitespace-separated field of every stdin line
  # (roughly awk's '{ print $1 }').
  while line = STDIN.gets
    puts line.split(/\s+/).first
  end
This pegs my CPU while pushing only 2 MB/s, well below the I/O capabilities of any modern system. I guess the tool you're using matters, which I think was the original point.
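
For comparison, a minimal sketch of a less naive variant (untested here): Ruby's split(" ") treats a single-space separator as awk-style whitespace splitting, and limiting it to two fields stops the scan after the first token, so the per-line regex goes away.

  #!/usr/bin/env ruby
  # Same task as above: print the first whitespace-separated field of each line.
  # split(" ", 2) uses awk-style field splitting and stops after the first field,
  # so no regex is matched per line.
  while line = STDIN.gets
    puts line.split(" ", 2).first
  end

How much that buys you will vary by Ruby version and input, but it illustrates the point: the same task can be I/O-bound or CPU-bound depending on how the parsing is done.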


I agree, 1GB is small. I do similar processing on tasks that are more in the 100GB range.

But I should say, this is a large enough dataset that loading it all into memory is sometimes infeasible, at least in interactive interpreted environments like Python. That's an important boundary point.
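
A streaming pass gets around that boundary, though: if the computation is the text-parsing-plus-hash-table kind mentioned upthread, you only need the aggregate in memory, not the data. A minimal Ruby sketch (counting occurrences of the first field is just an illustrative task, not the one from the article):

  #!/usr/bin/env ruby
  # Streaming sketch: count occurrences of the first whitespace-separated field
  # without holding the whole file in memory. Memory use is bounded by the
  # number of distinct keys, not by the input size.
  counts = Hash.new(0)
  while line = STDIN.gets
    key = line.split(" ", 2).first
    counts[key] += 1 if key
  end
  counts.each { |key, count| puts "#{key}\t#{count}" }

The same pattern applies in Python or awk; the fits-in-memory boundary mostly matters when the task really needs random access to the whole dataset.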



