
parsing script removing some lines help please
Hi,
I am lost in my script and need some basic help, please.
I have a tab-separated file whose first column contains a chromosome
number, followed by several other columns with different information.
Basically, I am trying to write a script that takes a file (see
example), parses it line by line, and, when the first column is any of
the chromosomes I don't want (6, 8, 14, 16, 18, Y), skips to the next
line; if the line does not start with one of the bad chromosomes, it
prints the whole line to a new output file.
The script below just reprints the original file unchanged :(
Thanks for any clues,
Nat
#!/software/bin/perl
use warnings;
use strict;
open(IN, "<example.txt") or die( $! );
open(OUT, ">>removed.txt") or die( $! );
my @bad_chromosome=(6,8,14,16,18,Y);
while(<IN>){
  chomp;
  my @column=split /\t/;
  foreach my $chr_no(@bad_chromosome){
    if ($column[0]==$chr_no){
      next;
    }
  }
  print OUT
    $column[0],"\t",$column[1],"\t",$column[2],"/",$column[3],"\t",$column[4],"\t",$column[5],"\t",$column[6],"\t",$column[7],"\t",$column[8],"\t",$column[9],"\t",$column[10],"\t",$column[11],"\t",$column[12],"\t",$column[13],"\t",$column[14],"\n";
}
close IN; close OUT;
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: parsing script removing some lines help please
On Fri, Sep 30, 2011 at 09:37, Nathalie Conte <nac [at] sanger.ac.uk> wrote:
> thanks for any clues
It's a simple one, really.. 8^)
> #!/software/bin/perl
> use warnings;
> use strict;
> open(IN, "<example.txt") or die( $! );
> open(OUT, ">>removed.txt") or die( $! );
ObCorrectness: you should say something more like
open( my $IN , '<' , 'example.txt' ) or die( $! );
open( my $OUT , '>>' , 'removed.txt' ) or die( $! );
and then change the filehandles correspondingly -- but that's not your
problem.
> my @bad_chromosome=(6,8,14,16,18,Y);
> while(<IN>){
>   chomp;
>   my @column=split /\t/;
>       foreach my $chr_no(@bad_chromosome){
>           if ($column[0]==$chr_no){
>           next;
here's your problem -- next always applies to the innermost loop -- so
you're jumping to the next $chr_no, not the next $_.
you solve this with a loop label:
LINE: while( <IN> ) {
  chomp;
  my @column = split /\t/;
  foreach my $chr_no ( @bad_chromosome ) {
    if( $column[0] == $chr_no ) {
      next LINE;
and then the rest is all the same.
You _may_ want to switch that comparison to 'eq' instead of '==' --
didn't you have 'Y' as one of the chromosomes to drop? '==' compares
numerically, so a string like 'Y' is treated as 0 and triggers a warning.
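The loop label and the 'eq' comparison can be put together roughly as follows. This is only a sketch: the filter_lines helper and the sample records are made up for illustration and are not part of the original script, which reads from a filehandle instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Chromosomes to drop; qw() quotes them, so 'Y' is a string, not a bareword.
my @bad_chromosome = qw(6 8 14 16 18 Y);

# Hypothetical helper: return only the lines whose first tab-separated
# field is not one of the listed chromosomes.
sub filter_lines {
    my @kept;
    LINE: for my $line (@_) {
        my ($chr) = split /\t/, $line;
        for my $chr_no (@bad_chromosome) {
            # 'next LINE' leaves the inner loop and moves on to the next line;
            # 'eq' compares as strings, so 'X' and 'Y' need no numeric value.
            next LINE if $chr eq $chr_no;
        }
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample records, for illustration only.
my @sample = ( "1\t100\tA", "6\t200\tC", "Y\t300\tG", "X\t400\tT", "16\t500\tA" );
print "$_\n" for filter_lines(@sample);
```

With the sample data above, only the records for chromosomes 1 and X should be printed.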
>           print OUT
> $column[0],"\t",$column[1],"\t",$column[2],"/",$column[3],"\t",$column[4],"\t",$column[5],"\t",$column[6],"\t",$column[7],"\t",$column[8],"\t",$column[9],"\t",$column[10],"\t",$column[11],"\t",$column[12],"\t",$column[13],"\t",$column[14],"\n";
Oh, and this? Try something like
print OUT join( "\t" , @column ), "\n";
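One detail worth checking before switching to join: the original print statement puts a "/" rather than a tab between $column[2] and $column[3], so a plain join on "\t" changes the output slightly. If that "/" is intentional, array slices can keep it. The format_record helper below is a hypothetical name used only for this sketch, and it assumes every record has at least four fields.

```perl
use strict;
use warnings;

# Hypothetical helper: format one record the way the original print does,
# with tabs between fields except for a "/" between fields 2 and 3.
sub format_record {
    my @column = @_;
    return join( "\t",
        @column[ 0, 1 ],                 # first two fields
        "$column[2]/$column[3]",         # third and fourth joined by "/"
        @column[ 4 .. $#column ] ) . "\n";   # everything that follows
}

# Made-up fields, for illustration only.
print format_record(qw(1 12345 A G 0.5 rs1));
```

Note that double quotes are needed around "\t"; in single quotes Perl would print a literal backslash and a 't'.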
chrs,
john.
RE: parsing script removing some lines help please
> From: Nathalie Conte [mailto:nac [at] sanger.ac.uk]
> Sent: Friday, September 30, 2011 9:38 AM
> To: beginners [at] perl.org
> Subject: parsing script removing some lines help please
John has provided good advice on this problem, but I wanted to add a couple
of things.
To avoid explicitly coding the foreach loop over @bad_chromosome, you could
use the 'grep' function.
Also, if you are just reprinting the input line, print $_.
unless ( grep { $column[0] eq $_ } @bad_chromosome ) {
    print OUT "$_\n"; # or print $OUT "$_\n" if declared as John suggested
}
In scalar (boolean) context, the grep call returns the number of times
$column[0] matched an element of @bad_chromosome.
Thus, if there is a match the grep call evaluates to 'true'. Otherwise,
it evaluates to 'false'.
Using grep does have a drawback (but not that much unless you have a lot of
values in @bad_chromosome). It checks all the values of @bad_chromosome for
a match. Using the 'if ... next' stops looking for a match when a match is
found.
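If the list of unwanted values ever grows large, one common alternative is a hash lookup, which tests membership without scanning the list at all. This is a sketch of a swapped-in technique, not something proposed in the thread; the names %is_bad and keep_line are made up for illustration.

```perl
use strict;
use warnings;

# Build the lookup table once; the keys are the chromosome names to drop.
my %is_bad = map { $_ => 1 } qw(6 8 14 16 18 Y);

# Hypothetical helper: true if the line's first tab-separated field
# is not one of the listed chromosomes.
sub keep_line {
    my ($line) = @_;
    my ($chr)  = split /\t/, $line;
    return !$is_bad{$chr};
}

# Made-up sample records, for illustration only.
for my $line ( "1\t100", "18\t200", "Y\t300" ) {
    print "$line\n" if keep_line($line);
}
```

Hash keys are compared as strings, so this behaves like the 'eq' comparison rather than '=='.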
If you wonder about the use of $_ in the grep function - grep localizes $_
and aliases it to each list element in turn, so the $_ that contains the
data read from the file is restored once grep finishes.
If you are using Perl 5.10 or higher, you can use the 'smart match'
operator (~~) instead of grep, though it has been marked experimental
since Perl 5.18.
HTH, Ken
Re: parsing script removing some lines help please
Nathalie Conte wrote:
>
> Hi,
Hello,
#!/software/bin/perl
use warnings;
use strict;
open my $IN, '<', 'example.txt' or die "Cannot open 'example.txt' because: $!";
open my $OUT, '>>', 'removed.txt' or die "Cannot open 'removed.txt' because: $!";
my $bad_chromosomes = qr/^(?:6|8|14|16|18|Y)\t/;
while ( <$IN> ) {
print $OUT $_ if !/$bad_chromosomes/;
}
close $IN;
close $OUT;
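If the list of chromosomes is kept in an array, the same kind of pattern can be assembled from it instead of being typed by hand. The sketch below assumes the array name from the original script; $alternation, $bad_re and is_bad_line are names invented for illustration.

```perl
use strict;
use warnings;

my @bad_chromosome = qw(6 8 14 16 18 Y);

# Join the escaped names into one alternation. The pattern is anchored at
# the start of the line and followed by a tab, so "6" does not match "60".
my $alternation = join '|', map { quotemeta($_) } @bad_chromosome;
my $bad_re      = qr/^(?:$alternation)\t/;

# Hypothetical helper: 1 if the line starts with a listed chromosome, else 0.
sub is_bad_line { return $_[0] =~ $bad_re ? 1 : 0 }

# Made-up sample lines, for illustration only.
for my $line ( "6\t100\n", "60\t200\n", "Y\t300\n", "X\t400\n" ) {
    print $line unless is_bad_line($line);
}
```

Because the lines are never chomped, they can be printed back exactly as they were read.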
John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
Re: parsing script removing some lines help please
Mariano Loza Coll wrote:
> Hi John,
Hello,
> I'm trying to learn a little bit more of Perl everyday, and I was
> intrigued about your earlier suggestion in a thread.
>
>
>> my $bad_chromosomes = qr/^(?:6|8|14|16|18|Y)\t/;
>>
>> while (<$IN> ) {
>> print $OUT $_ if !/$bad_chromosomes/;
>> }
>>
>
> I get the spirit of what you suggested, but I was curious about the use of "?:"
() are capturing parentheses and
(?:) are non-capturing parentheses.
> If I got it right, the "?" will make the search non-greedy. But what
> is the ":" for?
perldoc perlre
John
Re: parsing script removing some lines help please
John W. Krahn wrote:
>Mariano Loza Coll wrote:
>> Hi John,
>
>Hello,
>
>> I'm trying to learn a little bit more of Perl everyday, and I was
>> intrigued about your earlier suggestion in a thread.
>>
>>
>>> my $bad_chromosomes = qr/^(?:6|8|14|16|18|Y)\t/;
>>>
>>> while (<$IN> ) {
>>> print $OUT $_ if !/$bad_chromosomes/;
>>> }
>>>
>>
>> I get the spirit of what you suggested, but I was curious about the use
>> of "?:"
>
>() are capturing parentheses and
>
>(?:) are non-capturing parentheses.
>
>
>> If I got it right, the "?" will make the search non-greedy. But what
>> is the ":" for?
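The (?: ... ) notation groups without capturing; it is the '?:' right after
the opening parenthesis that turns capturing off.
In this context, the '?' is not used to control greediness; that meaning
only applies when '?' follows a quantifier such as '*' or '+'.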
In perlre, look for the section 'Extended Patterns', then down a page look
for:
(?:pattern)
(?adluimsx-imsx:pattern)
(?^aluimsx:pattern)
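The difference is easy to see in a small experiment. This is only an illustrative sketch with a made-up sample line; the variable names are not from the thread.

```perl
use strict;
use warnings;

# Made-up sample line, for illustration only.
my $line = "14\t2000\tA";

# In list context a successful match returns its captures, or the single
# value 1 when the pattern has no capturing groups.
my @captured = $line =~ /^(6|8|14|16|18|Y)\t/;     # () stores the match
my @grouped  = $line =~ /^(?:6|8|14|16|18|Y)\t/;   # (?:) only groups

print "with ():   @captured\n";
print "with (?:): @grouped\n";

# A '?' placed after a quantifier is the unrelated non-greedy modifier.
my ($greedy) = "a\tb\tc" =~ /^(.*)\t/;    # as much as possible before a tab
my ($lazy)   = "a\tb\tc" =~ /^(.*?)\t/;   # as little as possible before a tab
print "greedy and lazy differ\n" if $greedy ne $lazy;
```

Both patterns match the same lines; the non-capturing form simply skips the bookkeeping of filling $1.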
Chris