Perl Versus Ruby on CSV

I had a little project at work and turned to Perl to get it done. The project was simple enough: take a comma-separated file and look at a few fields to see what data they contain. Based on that data, compute a “weight” and append it to the end of each record's line.

The Perl code I used to accomplish the task is as follows:

#!/usr/bin/perl -w

open(IN, "UNIVERSE.csv") or die "Cannot open UNIVERSE.csv: $!";
open(OUT, ">UNIVERSE_WEIGHTED.csv") or die "Cannot open UNIVERSE_WEIGHTED.csv: $!";

$counter = 0;

while(<IN>)
{
	@fields = split(/,/,$_,183);
	$fields[182] =~ s/\n//;
	s/\n//;
	$counter++;
	$weight = 0;
 
	if ($counter == 1)
	{
		print OUT $_ . ",\"weight\"\n";
	}
	else
	{
		$vtr_ppp00 = $fields[154];
		$vtr_ppp04 = $fields[155];
		$vtr_pri01 = $fields[161];
		$vtr_pri03 = $fields[163];
		$vtr_pri05 = $fields[165];
		$vtr_pri99 = $fields[182];
 
		$weight +=.5 if ($vtr_ppp00 eq "\"R\"");
		$weight -= 1 if ($vtr_ppp04 eq "\"D\"");
		$weight -= 1 if ($vtr_pri01 eq "\"D\"");
		$weight -= 1 if ($vtr_pri03 eq "\"D\"");
		$weight -= 1 if ($vtr_pri05 eq "\"D\"");
		$weight -= 1 if ($vtr_pri99 eq "\"D\"");
		$weight += 1 if ($vtr_ppp04 eq "\"R\"");
		$weight += 1 if ($vtr_pri01 eq "\"R\"");
		$weight += 1 if ($vtr_pri03 eq "\"R\"");
		$weight += 1 if ($vtr_pri05 eq "\"R\"");
		$weight += 1 if ($vtr_pri99 eq "\"R\"");
	
		print OUT $_ . ",\"" . $weight . "\"\n";
	}
	
	print "Processed: " . $counter . "\r";
}
 
print "\n";
 
close(IN);
close(OUT);

Just for fun, a buddy of mine suggested coding it in Ruby and comparing the results. Seeing as I had never written a Ruby script in my life, I was a bit worried. However, it wasn’t too bad. I just had to cure my itch to put a $ in front of all my variables. Anyway, here’s the Ruby code, following the Perl code as closely as possible:

#!/usr/bin/ruby
 
counter = 0
 
outfile = File.open("UNIVERSE_RUBY.csv","w")
IO.foreach("UNIVERSE.csv") do |line|
	counter += 1
	weight = 0
    
	if (counter == 1)
		outfile << line.chomp + ",\"weight\"\n"
	else
		fields = line.chomp.split(',')
		vtr_ppp00 = fields[154]
		vtr_ppp04 = fields[155]
		vtr_pri01 = fields[161]
		vtr_pri03 = fields[163]
		vtr_pri05 = fields[165]
		vtr_pri99 = fields[182]
		
		weight += 0.5 if (vtr_ppp00 == "\"R\"")
		weight -= 1   if (vtr_ppp04 == "\"D\"")
		weight -= 1   if (vtr_pri01 == "\"D\"")
		weight -= 1   if (vtr_pri03 == "\"D\"")
		weight -= 1   if (vtr_pri05 == "\"D\"")
		weight -= 1   if (vtr_pri99 == "\"D\"")
		weight += 1   if (vtr_ppp04 == "\"R\"")
		weight += 1   if (vtr_pri01 == "\"R\"")
		weight += 1   if (vtr_pri03 == "\"R\"")
		weight += 1   if (vtr_pri05 == "\"R\"")
		weight += 1   if (vtr_pri99 == "\"R\"")
		
		outfile << line.chomp + ",\"" + weight.to_s + "\"\n"
	end
	print "Processed: " + counter.to_s + "\r"
end
 
print "\n"
outfile.close

As you can see, the code is fairly similar, and the algorithm is the same. Either script takes a mere second or two to run, and the file comes out correct. However, I was curious about execution speed, so I decided to pit one script against the other, time them, and see what happens. Here are my results:

aaron@hercules:~/Desktop$ time perl weight.pl
Processed: 5394

real    0m1.386s
user    0m1.304s
sys     0m0.048s
aaron@hercules:~/Desktop$ time ruby weight.rb
Processed: 5394

real    0m2.180s
user    0m1.992s
sys     0m0.124s

Am I reading this correctly? Going by real time, Ruby took almost 60% longer than Perl to run this code (2.180s versus 1.386s). I thought Ruby was supposed to have exceptional file handling, better than Perl's, even. I have also heard that the Ruby devs are more concerned about functionality than speed, so some gap is to be expected. Still, that’s a serious difference. If I were worried about speed here, Perl would win out.
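
Which percentage you get depends on the direction of the comparison. As a rough sketch using only the real times from the runs above (the variable names here are made up for illustration):

```ruby
# Wall-clock ("real") times from the two runs above, in seconds.
perl_real = 1.386
ruby_real = 2.180

# How much longer Ruby took, relative to Perl's time.
ruby_slower_pct = ((ruby_real / perl_real) - 1) * 100

# How much less time Perl took, relative to Ruby's time.
perl_less_time_pct = (1 - (perl_real / ruby_real)) * 100

puts format("Ruby took %.1f%% longer than Perl", ruby_slower_pct)
puts format("Perl took %.1f%% less time than Ruby", perl_less_time_pct)
```

By this arithmetic Ruby took roughly 57% longer, which is the same thing as Perl needing roughly 36% less time. A single run of each script is a small sample, so the exact figures should be taken loosely.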

At any rate, this was a fun little exercise to stretch my scripting muscles, and to learn a bit of Ruby. I’m curious if I can make the scripts more efficient. If you know how, comment below, or contact me.
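
One possible way to tighten the Ruby version, offered as a minimal sketch rather than a measured improvement: strip the line ending once per line, look the points up in a hash instead of running ten separate comparisons, and skip the per-line progress print. The helper names `weight_for` and `add_weights` and the constants are hypothetical; only the column numbers and the scoring rules come from the scripts above. Whether this actually runs faster would have to be timed.

```ruby
# Column positions taken from the scripts above.
HALF_R_COLUMN = 154                       # vtr_ppp00: "R" adds 0.5
FULL_COLUMNS  = [155, 161, 163, 165, 182] # "R" adds 1, "D" subtracts 1

# Points for each quoted value found in the full-weight columns.
POINTS = { '"R"' => 1, '"D"' => -1 }.freeze

# Hypothetical helper: compute the weight for one already-split record.
def weight_for(fields)
  weight = fields[HALF_R_COLUMN] == '"R"' ? 0.5 : 0
  FULL_COLUMNS.each { |col| weight += POINTS.fetch(fields[col], 0) }
  weight
end

# Hypothetical helper: stream the input once, writing each line plus its weight.
def add_weights(in_path, out_path)
  File.open(out_path, "w") do |out|
    File.foreach(in_path).with_index do |line, index|
      line = line.chomp # strips the line ending once, and only if present
      if index.zero?
        out << line << ",\"weight\"\n"
      else
        out << line << ",\"" << weight_for(line.split(",", -1)).to_s << "\"\n"
      end
    end
  end
end

# Only runs when the input file from the post is present.
add_weights("UNIVERSE.csv", "UNIVERSE_RUBY.csv") if File.exist?("UNIVERSE.csv")
```

Like both original scripts, this splits on every comma, so it assumes no quoted field contains a comma of its own.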

6 Comments

  1. Harley Pig | January 2, 2007 at 8:11 am

    #!/usr/bin/perl -w

    use strict;

    open my $IN, "UNIVERSE.csv";
    open my $OUT, ">UNIVERSE_WEIGHTED.csv";

    my $counter = 0;
    my $maxfields = 183;

    my @names = qw( ppp00 ppp04 pri01 pri03 pri05 pri99 );
    my @fields = qw( 154 155 161 163 165 182 );

    while ( <$IN> ) {

        chomp; # This strips the newline from the line less expensively
               # than s/\n//

        $counter++;

        if ( $counter == 1 ) {

            print $OUT "$_,\"weight\"\n";
            next;

        }

        my %vtr;
        @vtr{ @names } = ( split /,/, $_, $maxfields )[@fields];

        my $weight = 0;

        $weight += .5 if $vtr{ 'ppp00' } eq '"R"';

        # ppp00 is handled above, so loop over the remaining names only
        for my $name ( @names[ 1 .. $#names ] ) {

            $weight += $vtr{ $name } eq '"D"' ? -1 :
                       $vtr{ $name } eq '"R"' ?  1 :
                                                 0;

        }

        print $OUT "$_,\"$weight\"\n";
        print "Processed: $counter\r";

    }

    print "\n";

  2. Harley Pig | January 2, 2007 at 8:15 am

    Oops … get rid of the ‘, $maxfields’ in the split command. Since you’re getting rid of a newline at the end of the line and field 182–your last field–it’s apparent that you only have that many fields in your record.

  3. phoenyx | January 2, 2007 at 2:59 pm

    The Ruby version is reading the file in a line at a time, whereas it looks like the Perl version reads the whole file into memory at the beginning. That may explain some of the slowness of the Ruby version.

  4. Harley Pig | January 3, 2007 at 7:32 am

    No, the while ( <$IN> ) construct being used in the Perl examples reads a line at a time.

  5. Aaron | January 3, 2007 at 8:19 am

    phoenyx- The Perl script is reading the file line by line.

    harleypig- Whatever happened to your blog, eh?

  6. Harley Pig | January 4, 2007 at 7:45 am

    My server went south. I’m on a Linode and the last image they had was 2005.0 … they’ve since upgraded to 2006.1 but with all the changes the installation wasn’t quite right.

    I’ve been working on a HowTo for Gentoo and Linode and having to re-install a bunch of times to get everything working.

    It should be back up in a couple of weeks (I can’t spend more than a few hours a week on it).

    BTW, a preview button would be nice. :]
