Average CEL Perl Script

Today I wrote a Perl script to average out Affymetrix .cel files. In the middle of my Perl hacking I ran into some issues concerning Affymetrix’s file format.

In a .cel file, the section marked [INTENSITY] stores one probe intensity per row. Each row holds the x-coordinate, y-coordinate, mean (aka intensity), standard deviation, and number of pixels.

Originally, I tried using the split function to break up each line of the intensity section. What I tried was as follows:

($dummy, $dummy, $dummy, $total_intensity[$i], $dummy, $dummy) = split(/\s+/, $line);

Notice the three dummy variables ahead of the actual intensity capture. I did this to account for the whitespace before the coordinate values: when a line starts with whitespace, split(/\s+/, $line) returns an empty first field, so the line splits into six fields instead of the five mentioned above. This caused me a problem, because once you get to coordinates three digits long (e.g. y=100) that leading space is no longer there and every field shifts left by one. What I ended up doing was writing a regular expression instead, as follows:

if ( $line =~ m/^\s*(\d+)\s+(\d+)\s+(\d+\.\d+)/ ) {
    $total_intensity[$i] = $3;
}

I know this is a “hack”, but it works. I’m sure there is a way to get rid of the leading spaces, but none came to mind. I tried the chomp function, but all it does is remove a trailing newline. Does anyone have an idea how to get rid of these leading spaces?
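
One possible answer, offered as a sketch rather than the definitive fix: Perl's split treats a pattern that is a string holding a single space (' ') as a special case. It skips any leading whitespace and then splits on runs of whitespace, so the mean should land in the third field whether or not the line starts with a space. The helper name intensity_from_line and the sample rows below are made up for illustration; real .cel rows may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper: return the mean (third field) of one [INTENSITY] row.
# split ' ' (a string holding one space, not the regex / /) drops leading
# whitespace and then splits on runs of whitespace.
sub intensity_from_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return $fields[2];
}

# Made-up sample rows: one with leading spaces, one without.
my @samples = (
    "  0\t  5\t132.5\t20.1\t 16\n",
    "100\t  5\t98.3\t11.7\t 16\n",
);

for my $line (@samples) {
    print intensity_from_line($line), "\n";
}
```

Because the leading whitespace never produces an empty first field, no dummy variables are needed and the index of the mean stays the same for every row.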

9 Responses to “Average CEL Perl Script”


  1. johnny

    hyr,
    Dunno if this will help or not, this will take a string & chop the leading whitespace (tab, space, newline, etc.) off of it, returning the thus-beheaded string:

    sub rm_ws_head {
    my $str = shift;
    $str =~ s/^\s+//;
    return $str;
    }

    This will get rid of trailing whitespace:

    sub rm_ws_tail {
    my $str = shift;
    $str =~ s/\s+$//;
    return $str;
    }

    And this will get ‘em all:

    sub rm_ws_all {
    my $str = shift;
    $str =~ s/\s+//g;
    return $str;
    }

    Here’s code to test it:

    sub main {
    # use the subs we’ve defined:
    $str = " Oh, woe is me! ";
    print "Now str == ...\n";
    $str = &rm_ws_head($str);
    print "Now str == ...\n";
    $str = &rm_ws_tail($str);
    print "Now str == ...\n";
    $str = &rm_ws_all($str);
    print "Now str == ...\n";

    print "Now gonna use splits to do it...\n";

    # do it brute force using splits:
    $str = " Oh, woe is me! ";
    print "Now str == ...\n";
    my @wrds = split(/\s/, $str);
    my $newstr = "";
    foreach $w (@wrds) {
    $newstr .= $w;
    }
    print "Now str == ...\n";
    }

    &main();

    Err... I also snuck a bit of array madness in there at the end. You can split your line on whitespace if you want, then just fish out whatever fields you like; I actually do this a lot in my Perl-ing. For instance, if you know each such line produces an array with 5 “wrds” in it, and you know you want the 3rd wrd, or $wrds[2], then that can be a lot nicer than dealing over & over with the whole string.
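
    A small sketch of that field-fishing idea, assuming a made-up five-word line: a list slice on the result of split pulls out just the words you want. It uses split ' ' rather than split(/\s/, ...), because the regex form returns empty words for leading whitespace, which would shift every index.

    ```perl
    use strict;
    use warnings;

    # Made-up line with five whitespace-separated words and a leading space.
    my $line = " 12  7  250.0  31.4  16";

    # List slice: keep only the third word (index 2).
    my $third = (split ' ', $line)[2];

    # Or name several fields at once.
    my ($x, $y, $mean) = (split ' ', $line)[0 .. 2];

    print "third=$third x=$x y=$y mean=$mean\n";
    ```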

    Hope that makes some sort of sense & helps in some way. Take it easy & good luck with your Perl attacks.
    johnny

  2. johnny

    Uh…. I guess I shoulda used spaces instead of tabs. My formatting got ate. I’ll send you another copy of this babble if you need it.
    johnny

  3. johnny

    jye again,
    Crud, that main() got all messed up, it makes no sense. OK, I’m gonna try main() again with no tabs, and no lesser than - greater than symbols (I guess it got interpreted as HTML???):

    sub main {
    # use the subs we’ve defined:
    $str = " Oh, woe is me! ";
    print "Now str == ($str)...\n";
    $str = &rm_ws_head($str);
    print "Now str == ($str)...\n";
    $str = &rm_ws_tail($str);
    print "Now str == ($str)...\n";
    $str = &rm_ws_all($str);
    print "Now str == ($str)...\n";

    print "Now gonna use splits to do it...\n";

    # do it brute force using splits:
    $str = " Oh, woe is me! ";
    print "Now str == ($str)...\n";
    my @wrds = split(/\s/, $str);
    my $newstr = "";
    foreach $w (@wrds) {
    $newstr .= $w;
    }
    print "Now str == ($newstr)...\n";
    }

    &main();

  4. johnny

    OK I give up. If you want whitespace removed, run it through this “Leave a Reply” box, it does a great job! If you can’t read the above code due to lack of indents, I’ll be happy to send you a copy.

  5. Ryan Castillo

    Johnny - you’re a coding genius. Well, as far as C++, C and Perl are concerned. But your HTML knowledge seems to be a bit off. You actually can’t include tabs in HTML. It just doesn’t work; I’ve fooled with it. Thanks for the help, bro!

  6. johnny

    hya,
    lOl yah I sux at html, I’m a complete n00b I’ll confess. By the way why the heck Perl? (*suppressed shudders*). Anyway good luck with all that stuff.

  7. GugaRedy

    Hi.
    Very useful resource. THE BEST. I liked your site.

    Thanks.

    Sincerely,

  8. Matt

    In answer to ‘why heck Perl?’, because sometimes that is the only language available other than DOS bat files.

    And given the choice, Perl wins.

    Cheers,

    Matt

  9. RAJASHEKAR

    Please can you help me with programming in Perl?
    I am just a beginner. Please help me with some examples for bioinformatics using Perl.
    Awaiting your reply,
    raj
