1GB isn't exactly "Big Data". I'd expect most truly Big Data tasks to be more I/O bound than computation bound -- at least if your "computation" consists of text parsing and hash table lookups.
Depends. If you do a naive Ruby implementation, then you'll be CPU-bound quite quickly.
#!/usr/bin/env ruby
# Print the first whitespace-separated field of each input line
while line = STDIN.gets
  puts line.split(/\s+/).first
end
This pegs a CPU core while processing only about 2 MB/s, well below the I/O capabilities of any modern system. I guess the tool you're using matters, which I think was the original point.
I agree, 1GB is small. I do similar processing on tasks that are more in the 100GB range.
But I should say, this is a large enough dataset that loading it all into memory is sometimes infeasible, at least in interactive interpreted environments like Python. That's an important boundary point.
That said, it's interesting that mawk is so fast.
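For reference, the roughly equivalent awk program is a one-liner; I'm assuming something like the following is what people actually benchmark when they say mawk is fast (input.txt is just a placeholder):

# Same task: print the first whitespace-separated field of each line
mawk '{ print $1 }' < input.txt

Which is really the earlier point again: for plain field extraction, the choice of tool dominates the runtime.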