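Perl was written partly as a replacement for awk, and as such it has command-line switches that make it better suited to this kind of job than it might appear: `-n` loops over the input lines, `-a` splits each line into `@F`, and `-i~` edits each file in place, keeping the original as a `~` backup. You could get very similar behaviour with a much shorter implementation using `perl -nai~`, something like: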
```perl
BEGIN { open(VOCAB, ">vocab"); }
if (!$imap{$ARGV}{$F[0]}) {
    $imap{$ARGV}{$F[0]} = ++$I{$ARGV};
}
if (!$jmap{$F[1]}) {
    $jmap{$F[1]} = ++$J;
    print VOCAB $F[1] . "\n";
}
print "$imap{$ARGV}{$F[0]} $jmap{$F[1]} $F[2]\n"
```
Apart from the BEGIN line, that's almost a direct translation of the awk. It's a lot uglier, but for one-off jobs that isn't much of a problem.
(And if you want to claim awk has a three-line implementation, this is four lines.)
Admittedly, it's not quite the same: instead of putting the output for file1 in file1n, it renames file1 to file1~ and writes its output back to file1. To change that, you'd have to add your own file-handling code, though that would only be a few lines. And it's probably never going to be as fast as mawk.
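For illustration, here is a rough sketch of those few lines, written as plain Perl without the switches so that the loop `-n` adds and the split `-a` performs are visible. The name `renumber_files`, the `<file>n` output naming and the skipping of blank lines are assumptions for this sketch; it mirrors the one-liner's logic above rather than any particular awk script.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the -nai~ one-liner, but writing each
# file's output to "<file>n" (file1 -> file1n) instead of editing in place.
sub renumber_files {
    my @files = @_;
    my (%imap, %I, %jmap);
    my $J = 0;

    open(my $vocab, '>', 'vocab') or die "vocab: $!";
    for my $file (@files) {
        open(my $in,  '<', $file)      or die "$file: $!";
        open(my $out, '>', "${file}n") or die "${file}n: $!";

        # The loop -n would add, plus the whitespace split -a does into @F.
        while (my $line = <$in>) {
            my @F = split ' ', $line;
            next unless @F;    # skip blank lines (an assumption; the one-liner doesn't)

            # Per-file ID for column 1, numbered from 1 in order of first appearance.
            if (!$imap{$file}{$F[0]}) {
                $imap{$file}{$F[0]} = ++$I{$file};
            }
            # Global ID for column 2; each new word is also appended to vocab.
            if (!$jmap{$F[1]}) {
                $jmap{$F[1]} = ++$J;
                print {$vocab} "$F[1]\n";
            }
            print {$out} "$imap{$file}{$F[0]} $jmap{$F[1]} $F[2]\n";
        }
        close($out) or die "${file}n: $!";
        close($in);
    }
    close($vocab) or die "vocab: $!";
    return;
}

renumber_files(@ARGV) if @ARGV;
```

Run it as `perl renumber.pl file1 file2` (the script name is hypothetical). Unlike `-i~`, it leaves the input files untouched and creates no `~` backups.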
There are other cases where I suspect perl would beat awk but might get beaten by sed. Not to rain on awk's parade or anything; it's still cool. Just not that much cooler than perl. :)
Aha, very nice! I was wondering how to get the awk-style structure in perl; it was unfair of me not to research it.
Maybe it's just me, but I find it much harder to read than the awk syntax, mostly because of the dollar signs, and it's pretty crowded as a four-liner. Awk's condition-action syntax helps a little here too.
```perl
BEGIN { open(VOCAB, ">vocab"); }
if (!$imap{$ARGV}{$F[0]}) { $imap{$ARGV}{$F[0]} = ++$I{$ARGV}; }
if (!$jmap{$F[1]}) { $jmap{$F[1]} = ++$J; print VOCAB $F[1] . "\n"; }
print "$imap{$ARGV}{$F[0]} $jmap{$F[1]} $F[2]\n"
```
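If much of the crowding comes from the `if (!...) { ... = ++... }` pattern, Perl's `||=` operator may thin it out: it only evaluates its right-hand side when the left side is false, so a counter is bumped once per new key. A minimal sketch of the idiom on its own, where `first_seen_ids` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: give each word an ID (1, 2, 3, ...) in order of
# first appearance. ||= runs ++$next only when $id{$w} is false, i.e. the
# first time a word is seen; this works because IDs start at 1, never 0.
sub first_seen_ids {
    my %id;
    my $next = 0;
    my @ids;
    for my $w (@_) {
        $id{$w} ||= ++$next;
        push @ids, $id{$w};
    }
    return @ids;
}

print join(' ', first_seen_ids(qw(x y x z y))), "\n";    # prints "1 2 1 3 2"
```

Applied to the body above, the two `if` lines could become `$imap{$ARGV}{$F[0]} ||= ++$I{$ARGV};` and `$jmap{$F[1]} ||= do { print VOCAB "$F[1]\n"; ++$J };`. Like the original, this relies on a stored ID never being false.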