Perl Versus Ruby on CSV

OK. I had a little project at work, and I turned to Perl to accomplish the task. The project was simple enough: take a comma-separated file and look at a few fields to see what data they contain. Based on that data, compute a "weight" and append it to the end of each record's line.

The Perl code I used to accomplish the task is as follows:

#!/usr/bin/perl -w

open(IN, "UNIVERSE.csv");
open(OUT, ">UNIVERSE_WEIGHTED.csv");

$counter = 0;

while (<IN>)
{
    @fields = split(/,/,$_,183);
    $fields[182] =~ s/\n//;
    s/\n//;
    $counter++;
    $weight = 0;

    if ($counter == 1)
    {
        print OUT $_ . ",\"weight\"\n";
    }
    else
    {
        $vtr_ppp00 = $fields[154];
        $vtr_ppp04 = $fields[155];
        $vtr_pri01 = $fields[161];
        $vtr_pri03 = $fields[163];
        $vtr_pri05 = $fields[165];
        $vtr_pri99 = $fields[182];

        $weight +=.5 if ($vtr_ppp00 eq "\"R\"");
        $weight -= 1 if ($vtr_ppp04 eq "\"D\"");
        $weight -= 1 if ($vtr_pri01 eq "\"D\"");
        $weight -= 1 if ($vtr_pri03 eq "\"D\"");
        $weight -= 1 if ($vtr_pri05 eq "\"D\"");
        $weight -= 1 if ($vtr_pri99 eq "\"D\"");
        $weight += 1 if ($vtr_ppp04 eq "\"R\"");
        $weight += 1 if ($vtr_pri01 eq "\"R\"");
        $weight += 1 if ($vtr_pri03 eq "\"R\"");
        $weight += 1 if ($vtr_pri05 eq "\"R\"");
        $weight += 1 if ($vtr_pri99 eq "\"R\"");
   
        print OUT $_ . ",\"" . $weight . "\"\n";
    }
   
    print "Processed: " . $counter . "\r";
}

print "\n";

close(IN);
close(OUT);

Just for fun, a buddy of mine suggested coding it in Ruby and comparing the results. Seeing as I had never written a Ruby script in my life, I was a bit worried. However, it wasn't too bad. I just had to cure my itch to put a $ in front of all my variables. Anyway, here's the Ruby code, following the Perl code as closely as possible:

#!/usr/bin/ruby

counter = 0

outfile = File.open("UNIVERSE_RUBY.csv","w")
IO.foreach("UNIVERSE.csv") do |line|
    counter += 1
    weight = 0
   
    if (counter == 1)
        outfile << line.chop + ",\"weight\"\n"
    else    
        fields = line.chop.split(',')
        vtr_ppp00 = fields[154]
        vtr_ppp04 = fields[155]
        vtr_pri01 = fields[161]
        vtr_pri03 = fields[163]
        vtr_pri05 = fields[165]
        vtr_pri99 = fields[182]
       
        weight += 0.5 if (vtr_ppp00 == "\"R\"")
        weight -= 1   if (vtr_ppp04 == "\"D\"")
        weight -= 1   if (vtr_pri01 == "\"D\"")
        weight -= 1   if (vtr_pri03 == "\"D\"")
        weight -= 1   if (vtr_pri05 == "\"D\"")
        weight -= 1   if (vtr_pri99 == "\"D\"")
        weight += 1   if (vtr_ppp04 == "\"R\"")
        weight += 1   if (vtr_pri01 == "\"R\"")
        weight += 1   if (vtr_pri03 == "\"R\"")
        weight += 1   if (vtr_pri05 == "\"R\"")
        weight += 1   if (vtr_pri99 == "\"R\"")
       
        outfile << line.chop + ",\"" + weight.to_s + "\"\n"
    end
    print "Processed: " + counter.to_s + "\r"
end

print "\n"

outfile.close

OK. As you can see, the code is fairly similar, and the algorithm is the same. Running either script takes a mere second or two, and the output file comes out correct. However, I was curious about execution speed, so I decided to pit one script against the other, time them, and see what happens. Here are my results:

aaron@hercules:~/Desktop$ time perl weight.pl
Processed: 5394

real    0m1.386s
user    0m1.304s
sys     0m0.048s
aaron@hercules:~/Desktop$ time ruby weight.rb
Processed: 5394

real    0m2.180s
user    0m1.992s
sys     0m0.124s

Am I reading this correctly? Perl is almost 60% faster at execution with this code than Ruby? (Ruby's 2.180s of real time is about 1.57 times Perl's 1.386s.) I thought Ruby was supposed to have exceptional file handling, better than Perl's, even. However, I have also heard that the Ruby devs are more concerned about functionality than speed, which is to be expected. Still, that's a serious speed difference. If I were worried about speed here, Perl would win out in this case.
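To put numbers on that comparison, here is a quick sketch of the arithmetic in Ruby, using only the "real" times from the two runs above. The constant and method names are my own and are just for illustration.

```ruby
# Wall-clock ("real") times from the two runs above, in seconds.
PERL_REAL = 1.386
RUBY_REAL = 2.180

# Percentage by which the slower time exceeds the faster time.
def percent_longer(slow, fast)
  (slow / fast - 1) * 100
end

# Percentage of the slower time that the faster run saves.
def percent_saved(slow, fast)
  (1 - fast / slow) * 100
end

puts format('Ruby took %.1f%% longer than Perl', percent_longer(RUBY_REAL, PERL_REAL))
puts format('Perl took %.1f%% less time than Ruby', percent_saved(RUBY_REAL, PERL_REAL))
```

This prints 57.3% for the first line and 36.4% for the second. Both describe the same pair of measurements; which figure you quote depends on which run you treat as the baseline. Keep in mind this is a single run of each script, so the figures are only a rough indication.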

At any rate, this was a fun little exercise to stretch my scripting muscles and to learn a bit of Ruby. I'm curious whether I can make the scripts more efficient. If you know how, comment below or contact me.
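One direction the Ruby script could take, sketched here under a few assumptions: the per-column rules move into a lookup table, and each line is stripped once with `chomp` instead of calling `chop` twice. The names `WEIGHTS`, `weight_for`, and `weighted_line` are hypothetical helpers that are not in the script above, and I have not timed this version, so treat it as a tidier layout rather than a proven speedup.

```ruby
# Hypothetical helpers, not part of the original scripts.
# Column index => points for an "R" or "D" value in that column
# (indices and points copied from the scripts above).
WEIGHTS = {
  154 => { '"R"' => 0.5 },
  155 => { '"R"' => 1, '"D"' => -1 },
  161 => { '"R"' => 1, '"D"' => -1 },
  163 => { '"R"' => 1, '"D"' => -1 },
  165 => { '"R"' => 1, '"D"' => -1 },
  182 => { '"R"' => 1, '"D"' => -1 }
}.freeze

# Add up the points for one record, given its already-split fields.
# A missing or unrecognized value contributes 0.
def weight_for(fields)
  WEIGHTS.sum { |index, points| points.fetch(fields[index], 0) }
end

# Return the output line (without a trailing newline) for one input line.
def weighted_line(line, header: false)
  record = line.chomp # strips only a trailing line ending, unlike chop
  return %(#{record},"weight") if header

  %(#{record},"#{weight_for(record.split(','))}")
end

# Small in-memory demo with 183 quoted, empty fields.
fields = Array.new(183, '""')
fields[154] = '"R"' # +0.5
fields[155] = '"D"' # -1
fields[182] = '"R"' # +1
puts weight_for(fields) # prints 0.5
```

Wiring this into the file loop would look much like the script above: call `weighted_line(line, header: counter == 1)` inside `IO.foreach` and write the result with `outfile.puts`. Like the original, it splits on bare commas, so it assumes no field contains a comma inside its quotes.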

{ 6 } Comments

  1. Harley Pig using BonEcho 2.0.0.1 on GNU/Linux 64 bits | January 2, 2007 at 8:11 am | Permalink

    #!/usr/bin/perl -w

    use strict;

    open my $IN, "UNIVERSE.csv";
    open my $OUT, ">UNIVERSE_WEIGHTED.csv";

    my $counter = 0;

    my @names = qw( ppp00 ppp04 pri01 pri03 pri05 pri99 );
    my @fields = qw( 154 155 161 163 165 182 );

    while ( <$IN> ) {

        chomp; # This strips the newline from the line less expensively
               # than s/\n//

        $counter++;

        if ( $counter == 1 ) {

            print $OUT "$_,\"weight\"\n";
            next;

        }

        my %vtr;
        @vtr{ @names } = ( split /,/, $_, $maxfields )[@fields];

        my $weight = 0;

        $weight += .5 if $vtr{ 'ppp00' } eq '"R"';

        # ppp00 only earns the half point above, so skip it in the loop
        for my $name ( @names[ 1 .. $#names ] ) {

            $weight += $vtr{ $name } eq '"D"' ? -1 :
                       $vtr{ $name } eq '"R"' ?  1 :
                                                 0;

        }

        print $OUT "$_,\"$weight\"\n";
        print "Processed: $counter\r";

    }

    print "\n";

  2. Harley Pig using BonEcho 2.0.0.1 on GNU/Linux 64 bits | January 2, 2007 at 8:15 am | Permalink

    Oops ... get rid of the ', $maxfields' in the split command. Since you're getting rid of a newline at the end of the line and field 182--your last field--it's apparent that you only have that many fields in your record.

  3. phoenyx using Firefox 2.0 on Ubuntu | January 2, 2007 at 2:59 pm | Permalink

    The Ruby version is reading the file in a line at a time, whereas it looks like the Perl version reads the whole file into memory at the beginning. That may explain some of the slowness of the Ruby version.

  4. Harley Pig using BonEcho 2.0.0.1 on GNU/Linux 64 bits | January 3, 2007 at 7:32 am | Permalink

    No, the construct that's being used in the perl examples is reading a line at a time.

  5. Aaron using Firefox 2.0.0.1 on Ubuntu | January 3, 2007 at 8:19 am | Permalink

    phoenyx- The Perl script is reading the file line by line.

    harleypig- Whatever happened to your blog, eh?

  6. Harley Pig using BonEcho 2.0.0.1 on GNU/Linux 64 bits | January 4, 2007 at 7:45 am | Permalink

    My server went south. I'm on a Linode and the last image they had was 2005.0 ... they've since upgraded to 2006.1 but with all the changes the installation wasn't quite right.

    I've been working on a HowTo for Gentoo and Linode and having to re-install a bunch of times to get everything working.

    It should be back up in a couple of weeks (I can't spend more than a few hours a week on it).

    BTW, a preview button would be nice. :]
