I had a little project at work, and I turned to Perl to get it done. The project was simple enough: take a comma-separated file, look at a few fields to see what data they contain, create a “weight” based on that data, and append the weight to the end of each record’s line.
The Perl code I used is as follows (note that WordPress loves to parse tags, even inside <code> tags, and it also lowercases them, so a Perl file handle such as <IN> may show up in lowercase):
    #!/usr/bin/perl -w

    open(IN, "UNIVERSE.csv");
    open(OUT, ">UNIVERSE_WEIGHTED.csv");
    $counter = 0;

    while(<IN>)
    {
        @fields = split(/,/,$_,183);
        $fields[182] =~ s/\n//;
        s/\n//;
        $counter++;
        $weight = 0;

        if ($counter == 1)
        {
            print OUT $_ . ",\"weight\"\n";
        }
        else
        {
            $vtr_ppp00 = $fields[154];
            $vtr_ppp04 = $fields[155];
            $vtr_pri01 = $fields[161];
            $vtr_pri03 = $fields[163];
            $vtr_pri05 = $fields[165];
            $vtr_pri99 = $fields[182];

            $weight += .5 if ($vtr_ppp00 eq "\"R\"");

            $weight -= 1 if ($vtr_ppp04 eq "\"D\"");
            $weight -= 1 if ($vtr_pri01 eq "\"D\"");
            $weight -= 1 if ($vtr_pri03 eq "\"D\"");
            $weight -= 1 if ($vtr_pri05 eq "\"D\"");
            $weight -= 1 if ($vtr_pri99 eq "\"D\"");

            $weight += 1 if ($vtr_ppp04 eq "\"R\"");
            $weight += 1 if ($vtr_pri01 eq "\"R\"");
            $weight += 1 if ($vtr_pri03 eq "\"R\"");
            $weight += 1 if ($vtr_pri05 eq "\"R\"");
            $weight += 1 if ($vtr_pri99 eq "\"R\"");

            print OUT $_ . ",\"" . $weight . "\"\n";
        }
        print "Processed: " . $counter . "\r";
    }
    print "\n";
    close(IN);
    close(OUT);
Just for fun, a buddy of mine suggested I code it in Ruby and compare the results. Seeing as I had never written a Ruby script in my life, I was a bit worried. However, it wasn’t too bad. I just had to cure my itch to put a $ in front of all my variables. Anyway, here’s the Ruby code, following the Perl code as closely as possible:
    #!/usr/bin/ruby

    counter = 0
    outfile = File.open("UNIVERSE_RUBY.csv","w")

    IO.foreach("UNIVERSE.csv") do |line|
        counter += 1
        weight = 0

        if (counter == 1)
            outfile << line.chop + ",\"weight\"\n"
        else
            fields = line.chop.split(',')

            vtr_ppp00 = fields[154]
            vtr_ppp04 = fields[155]
            vtr_pri01 = fields[161]
            vtr_pri03 = fields[163]
            vtr_pri05 = fields[165]
            vtr_pri99 = fields[182]

            weight += 0.5 if (vtr_ppp00 == "\"R\"")

            weight -= 1 if (vtr_ppp04 == "\"D\"")
            weight -= 1 if (vtr_pri01 == "\"D\"")
            weight -= 1 if (vtr_pri03 == "\"D\"")
            weight -= 1 if (vtr_pri05 == "\"D\"")
            weight -= 1 if (vtr_pri99 == "\"D\"")

            weight += 1 if (vtr_ppp04 == "\"R\"")
            weight += 1 if (vtr_pri01 == "\"R\"")
            weight += 1 if (vtr_pri03 == "\"R\"")
            weight += 1 if (vtr_pri05 == "\"R\"")
            weight += 1 if (vtr_pri99 == "\"R\"")

            outfile << line.chop + ",\"" + weight.to_s + "\"\n"
        end
        print "Processed: " + counter.to_s + "\r"
    end
    print "\n"
As you can see, the code is fairly similar. The algorithm is the same. Each script takes a mere second or two to run, and the file comes out correct. However, I was curious about execution speed, so I decided to pit one script against the other, time them, and see what happens. Here are my results:
    aaron@hercules:~/Desktop$ time perl weight.pl
    Processed: 5394

    real 0m1.386s
    user 0m1.304s
    sys 0m0.048s
    aaron@hercules:~/Desktop$ time ruby weight.rb
    Processed: 5394

    real 0m2.180s
    user 0m1.992s
    sys 0m0.124s
Am I reading this correctly? Ruby takes almost 60% longer than Perl to run this code (2.180s versus 1.386s of real time)? I thought Ruby was supposed to have exceptional file handling. Better than Perl, even. However, I have also heard that the Ruby devs are more concerned with functionality than speed, which should be expected. Still, that’s a serious speed difference. If I were worried about speed here, Perl, in this case, would win out.
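To make the comparison concrete, here is the arithmetic on the two "real" times from the `time` output above, as a small Ruby sketch. The variable names are made up for illustration, and this is a single run on one machine, so treat the percentages as rough.

```ruby
# Wall-clock ("real") times copied from the time output above.
perl_real = 1.386
ruby_real = 2.180

# How much longer Ruby took, relative to Perl's time.
ruby_slower_pct = (ruby_real / perl_real - 1) * 100

# How much less time Perl took, relative to Ruby's time.
perl_saving_pct = (1 - perl_real / ruby_real) * 100

puts format("Ruby took %.1f%% longer than Perl", ruby_slower_pct)
puts format("Perl took %.1f%% less time than Ruby", perl_saving_pct)
```

The two numbers answer different questions: "how much slower is Ruby" and "how much time does Perl save" are measured against different baselines, so they do not match.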
At any rate, this was a fun little exercise to stretch my scripting muscles, and to learn a bit of Ruby. I’m curious if I can make the scripts more efficient. If you know how, comment below, or contact me.
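One possible tidy-up of the Ruby version, offered as a sketch only: keep the field positions in a list and look the party code up in a small table instead of writing ten separate `if` lines. The names `weight_for`, `PARTY_FIELDS`, and `SCORES` are invented for this example, the sample row is fabricated, and it has not been timed, so it is not a claim about speed.

```ruby
# Field positions taken from the scripts above.
PPP00 = 154
PARTY_FIELDS = [155, 161, 163, 165, 182].freeze

# Score contributed by each quoted party code found in PARTY_FIELDS.
SCORES = { '"R"' => 1, '"D"' => -1 }.freeze

# Hypothetical helper: returns the weight for one CSV line.
def weight_for(line)
  fields = line.chomp.split(',')
  # ppp00 only ever contributes 0.5, and only for "R".
  weight = fields[PPP00] == '"R"' ? 0.5 : 0
  # Every other field adds 1 for "R", subtracts 1 for "D", else 0.
  PARTY_FIELDS.each { |i| weight += SCORES.fetch(fields[i], 0) }
  weight
end

# Made-up sample row with 183 quoted, empty fields.
row = Array.new(183, '""')
row[154] = '"R"'
row[155] = '"R"'
row[161] = '"D"'
row[182] = '"R"'
line = row.join(',') + "\n"

puts weight_for(line) # 0.5 + 1 - 1 + 1 = 1.5
```

Like the original scripts, this splits on bare commas, so it assumes no field contains a comma inside its quotes.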

{ 6 } Comments
    #!/usr/bin/perl -w
    use strict;

    open my $IN, "UNIVERSE.csv";
    open my $OUT, ">UNIVERSE_WEIGHTED.csv";
    my $counter = 0;
    my @names = qw( ppp00 ppp04 pri01 pri03 pri05 pri99 );
    my @fields = qw( 154 155 161 163 165 182 );

    while ( <$IN> ) {
        chomp; # This strips the newline from the line less expensively
               # than s/\n//
        $counter++;
        if ( $counter == 1 ) {
            print $OUT "$_,\"weight\"\n";
            next;
        }
        my %vtr;
        @vtr{ @names } = ( split /,/, $_, $maxfields )[@fields];
        my $weight = 0;
        $weight += .5 if $vtr{ 'ppp00' } eq '"R"';
        # ppp00 is handled above, so only the remaining names are scored here
        for ( @names[ 1 .. $#names ] ) {
            $weight += $vtr{ $_ } eq '"D"' ? -1 :
                       $vtr{ $_ } eq '"R"' ?  1 :
                                              0;
        }
        print $OUT "$_,\"$weight\"\n";
        print "Processed: $counter\r";
    }
    print "\n";
Oops … get rid of the ‘, $maxfields’ in the split command. Since you’re stripping a newline from the end of the line and from field 182, your last field, it’s apparent that you only have that many fields in your record.
The Ruby version reads the file in a line at a time, whereas it looks like the Perl version reads the whole file into memory at the beginning. That may explain some of the slowness of the Ruby version.
No, the construct that’s being used in the perl examples is reading a line at a time.
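For reference, the two reading styles being discussed look roughly like this in Ruby. This is a minimal sketch using a throwaway temp file whose name and contents are made up for illustration.

```ruby
require 'tempfile'

# Throwaway sample file; its contents are made up for illustration.
sample = Tempfile.new(['sample', '.csv'])
sample.write("\"a\",\"b\"\n\"c\",\"d\"\n\"e\",\"f\"\n")
sample.close

# Line at a time: only the current line is held in memory,
# like IO.foreach in the Ruby script or while (<IN>) in Perl.
streamed = 0
File.foreach(sample.path) { |_line| streamed += 1 }

# Whole file at once: every line is loaded into an array first.
slurped = File.readlines(sample.path).length

puts "streamed #{streamed} lines, slurped #{slurped} lines"

sample.unlink
```

Both approaches see the same lines; the difference is only in how much of the file sits in memory at once.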
phoenyx- The Perl script is reading the file line by line.
harleypig- Whatever happened to your blog, eh?
My server went south. I’m on a Linode and the last image they had was 2005.0 … they’ve since upgraded to 2006.1 but with all the changes the installation wasn’t quite right.
I’ve been working on a HowTo for Gentoo and Linode and having to re-install a bunch of times to get everything working.
It should be back up in a couple of weeks (I can’t spend more than a few hours a week on it).
BTW, a preview button would be nice. :]