trouble matching words with parentheses using grep
Hello everyone,
Here's what I need to do: I need to take each of the words in one
List1, search for their presence (or not) in a second List2 and make a
new list with all the words in both. These are lists of gene names,
which can often include numbers and symbols in addition to letters.
The very newbie code that I wrote performs well, but it's missing the
words that have parentheses in them (for instance "Su(var)2-5").
Below are examples of the lists that I'm working with, the code that
I'm using, and the output.
I realize that there may be a number of ways to do what I need to do
(most of them better, I bet), and I'd love to learn about them. But
now I'm mostly curious about why grep// cannot "see" words with
parentheses in either (or both lists). I suspect the trick may be
somehow escaping the (), but I tried a number of ways of doing that to
no avail.
Any help will be greatly appreciated! And as usual, if you ever have
questions about molecular biology and genetics, fire away - I'd love
to pay the favor back.
Thanks in advance,
Mariano
Example of List 1
numb
Dl
cad99C
ham
esg
Stat92E
Hh
l(2)Lg
neur
CG32150
sox15N
Su(var)2-5
E(spl)-m4
ci
brd
vvl
Example of List 2
Src42A
cad99C
ham
Hh
l(2)Lg
neur
sox15N
numb
ubx
Su(var)2-5
esg
E(spl)-m4
ci
ttk
egfr
brd
ocho
vvl
CG32150
############## SCRIPT BEGINS
#!usr/bin/perl
use warnings;
print "\nEnter path to List 1\n\n";
$path1 = <STDIN>; chomp $path1;
print "\nEnter path to List 2\n\n";
$path2 = <STDIN>; chomp $path2;
open (INPUT1, "$path1"); open (INPUT2, "$path2"); open (INPUT3, "$path3");
@array1 = <INPUT1>;
@array2 = <INPUT2>;
my ($array1,$array2,$in1and2);
my $ctr1 = 0;
while ($ctr1 < @array1){
if(grep(/^$array1[$ctr1]$/, @array2)){
push (@in1and2,$array1[$ctr1]);
}
$ctr1 +=1;
}
print "in 1= ".@array1."\nin 2= ".@array2."in 1 and 2= ".@in1and2."\n\nDone!";
exit;
######
The output is...
in 1= 16
in 2= 19
in 1 and 2= 11
Done!
....when it should have been...
in 1= 16
in 2= 19
in 1 and 2= 14
but it's not "seeing" l(2)Lg, Su(var)2-5, E(spl)-m4 in both lists...
(I know this based on troubleshooting)
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: trouble matching words with parentheses using grep
Hello Mariano,
> I realize that there may be a number of ways to do what I need to do
> (most of them better, I bet), and I'd love to learn about them. But
> now I'm mostly curious about why grep// cannot "see" words with
> parentheses in either (or both lists). I suspect the trick may be
> somehow escaping the (), but I tried a number of ways of doing that to
> no avail.
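Characters that have a special meaning in regular expressions have to be
escaped. Unescaped parentheses form a capturing group, so the pattern
/^Su(var)2-5$/ matches the string "Suvar2-5" rather than "Su(var)2-5".
You could either use the quotemeta function or enclose the interpolated
variable within \Q and \E:
grep /^\Q$array1[$ctr1]\E$/, @array2;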
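A small sketch of the difference, assuming the names have already been
chomped. The variable names (@list_2, $name, $quoted) and the extra entry
'Suvar2-5' are illustrative only and are not part of the original script:

```perl
use strict;
use warnings;

# Illustrative data: 'Suvar2-5' is a made-up entry added to show the false match.
my @list_2 = ( 'ham', 'Su(var)2-5', 'Suvar2-5' );
my $name   = 'Su(var)2-5';

# Unescaped: (var) is treated as a capturing group, so only 'Suvar2-5' matches.
my @unescaped = grep { /^$name$/ } @list_2;

# Escaped with \Q...\E: every metacharacter in $name is taken literally.
my @escaped = grep { /^\Q$name\E$/ } @list_2;

# quotemeta does the same escaping ahead of time.
my $quoted         = quotemeta $name;
my @with_quotemeta = grep { /^$quoted$/ } @list_2;

print "unescaped: @unescaped\n";
print "escaped:   @escaped\n";
print "quotemeta: @with_quotemeta\n";
```

Note that no whitespace is placed between \Q and \E: anything inside
\Q...\E is taken literally, including spaces, even under the /x modifier.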
Some comments about the code:
> #!usr/bin/perl
For increased portability, use the shebang #!/usr/bin/env perl
> use warnings;
Also use strict;.
> $path1 =<STDIN>; chomp $path1;
Declare variables before they are used.
> open (INPUT1, "$path1"); open (INPUT2, "$path2"); open (INPUT3, "$path3");
Try to use the 3-argument version of open.
> my ($array1,$array2,$in1and2);
Avoid naming scalars and arrays (or hashes) the same.
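The comments above can be sketched together as one small helper. This is
only an illustration: read_list is a hypothetical name, and checking the
return value of open is an extra precaution assumed here.

```perl
use strict;
use warnings;

# Hypothetical helper: read one gene name per line from the file at $path.
sub read_list {
    my ($path) = @_;

    # Three-argument open with a lexical filehandle; die if the open fails.
    open my $fh, '<', $path
        or die "Cannot open '$path' because: $!";

    # Read all lines and strip the trailing newline from each.
    chomp( my @names = <$fh> );
    close $fh;

    return @names;
}

# Usage: pass a path on the command line to count the names in that file.
if (@ARGV) {
    my @names = read_list( $ARGV[0] );
    print scalar(@names), " names read\n";
}
```

Declaring every variable with my keeps the script valid under use strict,
and the lexical filehandle is closed automatically if the sub exits early.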
An alternative solution:
=pod code
#!/usr/bin/env perl
use strict;
use warnings;
use File::Slurp qw( slurp );
use List::MoreUtils qw( any each_array );
my %path;
print "Enter path to list 1: ";
chomp( $path{'list_1'} = <STDIN> );
print "Enter path to list 2: ";
chomp( $path{'list_2'} = <STDIN> );
chomp( my @list_1 = slurp( $path{'list_1'} ) );
chomp( my @list_2 = slurp( $path{'list_2'} ) );
my @common_list;
for my $word (@list_1) {
any { $_ eq $word } @list_2 and push @common_list, $word;
}
printf "in 1 = %d
in 2 = %d
in 1 and 2 = %d\n", scalar @list_1, scalar @list_2, scalar @common_list;
=cut
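As a further sketch, separate from the solution above, the same comparison
can be done with a hash lookup. It needs no regex, so parentheses are not
an issue, and list 2 is scanned only once. The sample data and the name
%in_list_2 are illustrative:

```perl
use strict;
use warnings;

# Illustrative data standing in for the contents of the two files.
my @list_1 = ( 'numb', 'Dl', 'l(2)Lg', 'Su(var)2-5' );
my @list_2 = ( 'Src42A', 'l(2)Lg', 'numb', 'Su(var)2-5' );

# Build a lookup table with one key per name in list 2.
my %in_list_2 = map { $_ => 1 } @list_2;

# Keep the names from list 1 that are also keys of the lookup table.
my @common_list = grep { $in_list_2{$_} } @list_1;

printf "in 1 and 2 = %d\n", scalar @common_list;
print "$_\n" for @common_list;
```

The result keeps the order of list 1. Hash keys are compared as exact
strings, so this behaves like the eq test rather than a pattern match.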
> Any help will be greatly appreciated! And as usual, if you ever have
> questions about molecular biology and genetics, fire away - I'd love
> to pay the favor back.
Thank you.
Hope this message helps. :-)
Regards,
Alan Haggai Alavi.
--
The difference makes the difference
Re: trouble matching words with parentheses using grep
Hello Mariano,
> use List::MoreUtils qw( any each_array );
I was experimenting and forgot to take off `each_array` from the import
list. `each_array` is not used in the alternative solution and is hence
not required.
Regards,
Alan Haggai Alavi.
--
The difference makes the difference
Re: trouble matching words with parentheses using grep
Mariano Loza Coll wrote:
> Hello everyone,
Hello,
> ############## SCRIPT BEGINS
> #!usr/bin/perl
> use warnings;
use strict;
> print "\nEnter path to List 1\n\n";
> $path1 =<STDIN>; chomp $path1;
>
> print "\nEnter path to List 2\n\n";
> $path2 =<STDIN>; chomp $path2;
>
> open (INPUT1, "$path1"); open (INPUT2, "$path2"); open (INPUT3, "$path3");
You should *always* verify that a file opened correctly:
open INPUT1, '<', $path1 or die "Cannot open '$path1' because: $!";
open INPUT2, '<', $path2 or die "Cannot open '$path2' because: $!";
open INPUT3, '<', $path3 or die "Cannot open '$path3' because: $!";
> @array1 =<INPUT1>;
> @array2 =<INPUT2>;
>
> my ($array1,$array2,$in1and2);
You never use those variables anywhere.
> my $ctr1 = 0;
> while ($ctr1<@array1){
> if(grep(/^$array1[$ctr1]$/, @array2)){
Change that line to this to get your program to work correctly:
if ( grep $_ eq $array1[ $ctr1 ], @array2 ) {
> push (@in1and2,$array1[$ctr1]);
> }
> $ctr1 +=1;
> }
>
> print "in 1= ".@array1."\nin 2= ".@array2."in 1 and 2= ".@in1and2."\n\nDone!";
>
> exit;
>
> ######
>
> The output is...
>
> in 1= 16
> in 2= 19
> in 1 and 2= 11
>
> Done!
>
> ...when it should have been...
>
> in 1= 16
> in 2= 19
> in 1 and 2= 14
>
> but it's not "seeing" l(2)Lg, Su(var)2-5, E(spl)-m4 in both lists...
> (I know this based on troubleshooting)
John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
Re: trouble matching words with parentheses using grep
On Apr 9, 1:04 pm, alanhag... [at] alanhaggai.org (Alan Haggai Alavi)
wrote:
> > ...
> > #!usr/bin/perl
>
> For increased portability, use the shebang #!/usr/bin/env perl
>
Hm, portable only in limited situations, risky,
and always slower.
From:
http://www.webmasterkb.com/Uwe/Forum.aspx/perl/3968/env-perl-or-simply-perl
Randal Schwartz's response from above thread:
...
Seconded on the "reduced portability". The shebang
path has to be accurate. Throwing "env" into the mix
means that env has to exist at a known location. Some
systems don't have it, some systems have it at /bin/env,
and some systems have it at /usr/bin/env... so you're
only portable within a subset of machines.
Also, you're then also doing a double-launch. First, the
kernel launches env, then env has to wake up and figure
out where Perl is, and then launch that.
And, if that wasn't enough, you risk that your script will be
run by a privately installed Perl when someone else runs
it... which might not have the right version or the right
installed modules. An explicit #! path never depends on
PATH, so that's not an issue.
So, in general, avoid this strategy when you can.
--
Charles DeRykus