
string substitution command question
Hi Perl users, quick question: I have one long string of tab-delimited
values, with the rows separated by newline characters.
Here is a snippet of the string:
chr1 ucsc exon 226488874 226488906 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 226496810 226497198 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 2005086 2005368 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1 ucsc exon 2066701 2066786 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
I am trying to perform a substitution on some values at the end of each row.
For example, I'm trying to replace the above string with the following:
chr1 ucsc exon 226488874 226488906 0.000000
- . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1 ucsc exon 226496810 226497198 0.000000
- . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
chr1 ucsc exon 2005086 2005368 0.000000 + .
gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
chr1 ucsc exon 2066701 2066786 0.000000 + .
gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
Here is the substitution command I am trying to use:
$data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
\"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
$data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
\"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
I don't know why I am not able to substitute at the end of each row in the
string.
Any suggestions folks have are muchly appreciated. Thanks -Rich
Re: string substitution command question
use strict;
use warnings;
while(<DATA>){
chomp;
if ($_ =~ /NM_(\d+)/){
my $found = $1;
$_ =~ s/$found/$found:12345/g;
print "$_\n";
} else {
print "$_\n";
}
}
__DATA__
chr1 ucsc exon 226488874 226488906 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 226496810 226497198 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 2005086 2005368 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1 ucsc exon 2066701 2066786 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
~Parag
Re: string substitution command question
At 12:06 -0800 26/02/2011, Richard Green wrote:
>chr1 ucsc exon 226488874 226488906 0.000000
>- . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
>chr1 ucsc exon 226496810 226497198 0.000000
>- . gene_id "NM_173083:12345"; transcript_id "NM_173083:12345";
>chr1 ucsc exon 2005086 2005368 0.000000 + .
>gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>chr1 ucsc exon 2066701 2066786 0.000000 + .
>gene_id "NM_001033581:12346"; transcript_id "NM_001033581:12346";
>
>Here is the substitution command I am trying to use:
>
>$data_string=~ s/$gene_id\"NM_173083\"\; transcript_id
>\"NM_173083\"\;/\"NM_173083:12345\"\; \"NM_173083:12345\"\;/g;
>
>$data_string=~ s/$gene_id\"NM_001033581\"\; transcript_id
>\"NM_001033581\"\;/\"NM_001033581:12346\"\; \"NM_001033581:12346\"\;/g;
>
>I don't know why I am not able to substitute at the end of each row in the
>string.
What is $gene_id? Are you by any chance using '$' at the beginning
of your search pattern instead of the end?
Why are you escaping the quote marks?
Why is there no space after 'gene_id'?
JD
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
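JD's three questions can be checked with a small test. The following is only a sketch: the one-row sample and the value given to $gene_id are hypothetical, chosen to make the interpolation visible. It shows that $gene_id inside a pattern is treated as a variable, that the literal text gene_id needs no sigil, and that escaping the quote marks changes nothing.

```perl
use strict;
use warnings;

# Hypothetical one-row sample, shortened from the data in the question.
my $row = 'gene_id "NM_173083"; transcript_id "NM_173083";';

# Hypothetical value, chosen only to make the interpolation visible.
my $gene_id = 'XYZ';

# $gene_id is replaced by its value, so this pattern is really /XYZ "NM_173083"/.
my $with_dollar = ($row =~ /$gene_id "NM_173083"/) ? 1 : 0;

# Without the sigil, gene_id is matched as literal text.
my $literal = ($row =~ /gene_id "NM_173083"/) ? 1 : 0;

# Escaping the quote marks is harmless but makes no difference to the match.
my $escaped = ($row =~ /gene_id \"NM_173083\"/) ? 1 : 0;

print "with \$: $with_dollar, literal: $literal, escaped: $escaped\n";
```

What the original script held in $gene_id is not stated in the thread, so this does not claim to reproduce the exact failure, only the interpolation rule behind the question.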
Re: string substitution command question
>>>>> "PK" == Parag Kalra <paragkalra [at] gmail.com> writes:
PK> use strict;
PK> use warnings;
PK> while(<DATA>){
PK> chomp;
why are you chomping here when you add in the \n later?
PK> if ($_ =~ /NM_(\d+)/){
PK> my $found = $1;
PK> $_ =~ s/$found/$found:12345/g;
many issues there. why do you test the match before making the s///? you
can ALWAYS do an s/// as it will just fail if it doesn't match.
why are you doing s/// against $_? by default it does that.
PK> print "$_\n";
PK> } else {
PK> print "$_\n";
PK> }
why are you printing the same thing in each clause? just print AFTER the
change is made?
why do you top post when you have been told to bottom post and edit the
quoted email?
uri
--
Uri Guttman ------ uri [at] stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
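The review points above (no test before the s///, no explicit $_, no chomp, one print) can be sketched roughly as follows. The two sample lines are hypothetical, and the @counts array exists only to show the return value of s///.

```perl
use strict;
use warnings;

# Hypothetical sample lines: one with an ID, one without.
my @lines = (
    qq{gene_id "NM_173083";\n},
    qq{no id on this line\n},
);

my @counts;
for (@lines) {
    # s/// works on $_ by default and returns the number of substitutions
    # made (a false value when nothing matched), so no prior test is needed.
    my $n = s/(NM_\d+)/$1:12345/g;
    push @counts, $n || 0;

    # The newline was never chomped, so a plain print is enough.
    print;
}
```

A line that does not match simply passes through unchanged.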
Re: string substitution command question
On Sat, Feb 26, 2011 at 12:34 PM, Uri Guttman <uri [at] stemsystems.com> wrote:
> >>>>> "PK" == Parag Kalra <paragkalra [at] gmail.com> writes:
>
> PK> use strict;
> PK> use warnings;
> PK> while(<DATA>){
> PK> chomp;
>
> why are you chomping here when you add in the \n later?
>
Agreed and corrected in the example at the bottom.
> PK> if ($_ =~ /NM_(\d+)/){
> PK> my $found = $1;
> PK> $_ =~ s/$found/$found:12345/g;
>
> many issues there. why do you test the match before making the s///? you
> can ALWAYS do an s/// as it will just fail if it doesn't match.
>
Rectified in the example at the bottom.
>
> why are you doing s/// against $_? by default it does that.
>
> PK> print "$_\n";
> PK> } else {
> PK> print "$_\n";
> PK> }
>
> why are you printing the same thing in each clause? just print AFTER the
> change is made?
>
Big mistake. I accept it. Modified in the example at the bottom.
>
>
> why do you top post when you have been told to bottom post and edit the
> quoted email?
>
Sorry. Hope this reply is better and so as the following code:
use strict;
use warnings;
while(<DATA>){
$_ =~ s/NM_(\d+)/$1:12345/g;
print;
}
__DATA__
chr1 ucsc exon 226488874 226488906 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 226496810 226497198 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 2005086 2005368 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1 ucsc exon 2066701 2066786 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
>
> uri
>
Thanks for the review
~Parag
Re: string substitution command question
>>>>> "PK" == Parag Kalra <paragkalra [at] gmail.com> writes:
>> why are you doing s/// against $_? by default it does that.
you didn't rectify this one.
PK> Sorry. Hope this reply is better and so as the following code:
much better.
PK> use strict;
PK> use warnings;
PK> while(<DATA>){
PK> $_ =~ s/NM_(\d+)/$1:12345/g;
i didn't follow the request carefully. that is dropping the NM_ part.
uri
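A small comparison may make the dropped prefix easier to see. This is a sketch on a hypothetical one-row sample; it shows the substitution as posted and two ways to keep the NM_ part.

```perl
use strict;
use warnings;

# Hypothetical one-row sample, shortened from the data in the question.
my $row = 'gene_id "NM_173083"; transcript_id "NM_173083";';

# The match covers "NM_" plus the digits, but only the digits are captured,
# so the replacement loses the prefix.
(my $dropped = $row) =~ s/NM_(\d+)/$1:12345/g;

# Option 1: write the prefix back in the replacement.
(my $rewritten = $row) =~ s/NM_(\d+)/NM_$1:12345/g;

# Option 2: capture the prefix along with the digits.
(my $captured = $row) =~ s/(NM_\d+)/$1:12345/g;

print "$dropped\n$rewritten\n$captured\n";
```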
Re: string substitution command question
> What is $gene_id?
> Are you by any chance using '$' at the beginning of your search pattern instead of the end?
I have $ to designate the end of the row
$gene_id
>
> Why are you escaping the quote marks?
I thought it would be easier to perform substitution without them
>
> Why is there no space after 'gene_id'?
I guess there should be
Re: string substitution command question
On Sat, Feb 26, 2011 at 12:56 PM, Uri Guttman <uri [at] stemsystems.com> wrote:
> >>>>> "PK" == Parag Kalra <paragkalra [at] gmail.com> writes:
>
> >> why are you doing s/// against $_? by default it does that.
>
> you didn't rectify this one.
>
Oops. Missed that.
>
>
> PK> Sorry. Hope this reply is better and so as the following code:
>
> much better.
>
Thanks.
>
> PK> use strict;
> PK> use warnings;
> PK> while(<DATA>){
> PK> $_ =~ s/NM_(\d+)/$1:12345/g;
>
> i didn't follow the request carefully. that is dropping the NM_ part.
>
Good catch.
use strict;
use warnings;
while(<DATA>){
s/NM_(\d+)/NM_$1:12345/g;
print;
}
__DATA__
chr1 ucsc exon 226488874 226488906 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 226496810 226497198 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 2005086 2005368 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1 ucsc exon 2066701 2066786 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
>
> uri
>
>
Thanks once again.
~Parag
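Note that the loop above appends :12345 to every ID, while the desired output in the question uses :12346 for NM_001033581. One possible way to handle that, assuming the ID-to-suffix mapping is known in advance, is a hash lookup in the replacement. The %suffix_for hash and the shortened rows below are hypothetical.

```perl
use strict;
use warnings;

# Assumed mapping, taken from the desired output in the question.
my %suffix_for = (
    NM_173083    => 12345,
    NM_001033581 => 12346,
);

# Hypothetical shortened rows; the real data is tab-delimited.
my @rows = (
    qq{gene_id "NM_173083"; transcript_id "NM_173083";\n},
    qq{gene_id "NM_001033581"; transcript_id "NM_001033581";\n},
    qq{gene_id "NM_999999"; transcript_id "NM_999999";\n},    # not in the mapping
);

for (@rows) {
    # /e evaluates the replacement as code; IDs missing from the mapping
    # are put back unchanged.
    s/(NM_\d+)/exists $suffix_for{$1} ? "$1:$suffix_for{$1}" : $1/ge;
    print;
}
```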
Re: string substitution command question
At 12:57 -0800 26/02/2011, Richard Green wrote:
> > What is $gene_id?
>> Are you by any chance using '$' at the beginning of your search
>>pattern instead of the end?
>I have $ to designate the end of the row
>$gene_id
$gene_id designates $gene_id period.
> > Why are you escaping the quote marks?
>I thought it would be easier to perform substitution without them
What made you think that?
> > Why is there no space after 'gene_id'?
>I guess there should be
You can guess as much as you like but Perl Regular Expressions don't
care what you think or what you guess. Read perlvar and perlretut.
JD
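For reference, here is a sketch of how '$' behaves when it is used as an anchor at the end of a pattern on a multi-row string. The three-row sample is hypothetical and shortened from the data in the question.

```perl
use strict;
use warnings;

# Hypothetical three-row string, shortened from the data in the question.
my $data = join "\n",
    'gene_id "NM_173083";',
    'gene_id "NM_173083";',
    'gene_id "NM_001033581";';

# Without /m, a trailing $ matches only at the end of the whole string.
my @last_only = $data =~ /(NM_\d+)";$/g;

# With /m, $ also matches just before every embedded newline.
my @every_row = $data =~ /(NM_\d+)";$/mg;

print scalar(@last_only), " vs ", scalar(@every_row), "\n";
```

So a pattern meant to hit the end of each row in one long string needs both the '$' at the end of the pattern and the /m modifier.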
Re: string substitution command question
Ok JD thanks
On Feb 26, 2011, at 3:46 PM, John Delacour <johndelacour [at] gmail.com> wrote:
> At 12:57 -0800 26/02/2011, Richard Green wrote:
>
>
>> > What is $gene_id?
>>> Are you by any chance using '$' at the beginning of your search pattern instead of the end?
>> I have $ to designate the end of the row
>> $gene_id
>
> $gene_id designates $gene_id period.
>
>> > Why are you escaping the quote marks?
>> I thought it would be easier to perform substitution without them
>
> What made you think that?
>
>> > Why is there no space after 'gene_id'?
>> I guess there should be
>
> You can guess as much as you like but Perl Regular Expressions don't care what you think or what you guess. Read perlvar and perlretut.
>
> JD
RE: string substitution command question
Hi,
What about this solution:
use warnings;
use strict;
my $str = ' chr1 ucsc exon 226488874 226488906 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 226496810 226497198 0.000000
- . gene_id "NM_173083"; transcript_id "NM_173083";
chr1 ucsc exon 2005086 2005368 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";
chr1 ucsc exon 2066701 2066786 0.000000 + .
gene_id "NM_001033581"; transcript_id "NM_001033581";';
# Collect the first NM_ ID from every line that contains one (IDs repeat).
my @patterns = map {/(NM_\d+)"/; $1} grep(/NM_\d+"/, split(/\n+/, $str));
my $additional = 12345;
foreach (@patterns) {
    # A repeated ID no longer matches once it carries a suffix, so the
    # counter only advances the first time each ID is seen.
    $str =~ s/($_)\"/$1:$additional\"/g and $additional++;
}
print "$str\n";
Regards,
Katya
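The same idea can also be written as a single pass over the string. This is only a sketch under stated assumptions: the rows are shortened and hypothetical, %suffix_for and $next are illustrative names, and suffixes are handed out in the order the IDs first appear, starting from the 12345 used in the question.

```perl
use strict;
use warnings;

# Hypothetical shortened rows; the real string is tab-delimited.
my $str = join "\n",
    'gene_id "NM_173083"; transcript_id "NM_173083";',
    'gene_id "NM_173083"; transcript_id "NM_173083";',
    'gene_id "NM_001033581"; transcript_id "NM_001033581";';

my %suffix_for;       # ID => suffix, filled in as IDs are first seen
my $next = 12345;     # starting value taken from the question

# The lookahead (?=") keeps the closing quote out of the match.
# //= (Perl 5.10 or later) assigns only when the ID has no suffix yet,
# so $next advances once per distinct ID.
$str =~ s{(NM_\d+)(?=")}{
    $suffix_for{$1} //= $next++;
    "$1:$suffix_for{$1}";
}ge;

print "$str\n";
```

Compared with the loop over @patterns, this avoids building the pattern list first, at the cost of requiring the /e modifier.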