
Text Manipulation
Hi, I am trying to modify a LaTeX file, which is plain text.
The file contains lines like the following, each made up of "Article",
a number, and some text:
Article 1 Cats
Article 2 Dogs
Article 3 Fish
Article 4 Ferrets
etc.
I would like to modify the file so that each referenced line is
changed as follows:
\subsection*{Article 1 Cats}
\subsection*{Article 2 Dogs}
\subsection*{Article 3 Fish}
\subsection*{Article 4 Ferrets}
Here's code which was suggested to me, but when I execute it I'm
returned to the command line and nothing happens:
#!/usr/bin/perl
s/^(Article\s+[0-9]+\s+\N*\S)/\\subsection*{$1}/gm
I called this script "Article" and saved it as article.pl
The usage then was: $ perl article.pl oldfile.tex newfile.tex
and I got the following error message:
Missing braces on \N{} at article.pl line 2, within pattern
Nested quantifiers in regex; marked by <-- HERE in m/^(Article\s+[0-9]+\s+* <-- HERE \S)/ at article.pl line 2.
So I added braces as follows: \{N}
I ran the script again; there was no error message but no newfile.tex
either.
I then ran the program as follows:
$ perl article.pl oldfile
And the response is the command prompt and the file hasn't been
modified.
Any suggestions?
--
To unsubscribe, e-mail: beginners-unsubscribe [at] perl.org
For additional commands, e-mail: beginners-help [at] perl.org
http://learn.perl.org/
Re: Text Manipulation
Dear zavierz
probably you can try this
$I=~s/^(Article\s+[0-9]+\s+[\A-Z_a-z]+\S)/\\subsection*{$1}/g;
print "$I";
Re: Text Manipulation
On 2011-02-05 16:11, zavierz wrote:
> s/^(Article\s+[0-9]+\s+\N*\S)/\\subsection*{$1}/gm
Simplified:
s/^( Article \s+ [0-9]+ .* \S )
/\\subsection*{$1}/gmx;
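A minimal sketch of how the simplified substitution behaves, assuming the whole file has been read into a single string (the variable names $text and $out are illustrative, not from the thread):

```perl
use strict;
use warnings;

# Hypothetical sample: the whole file held in one string.
my $text = "Article 1  Cats\nSome body text.\nArticle 2  Dogs  \n";

# /m lets ^ match at the start of every line, /g repeats the
# substitution, and /x allows the whitespace inside the pattern.
(my $out = $text) =~ s/^( Article \s+ [0-9]+ .* \S )/\\subsection*{$1}/gmx;

print $out;
```

Because the pattern ends in \S, trailing blanks after "Dogs" stay outside the closing brace, and lines that do not start with "Article" pass through unchanged.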
--
Ruud
Re: Text Manipulation
On 2011-02-06 16:52, mani kandan wrote:
> $I=~s/^(Article\s+[0-9]+\s+[\A-Z_a-z]+\S)/\\subsection*{$1}/g;
> print "$I";
That is weird advice in many ways:
1. A variable $I, what is it? Why capital?
2. No spaces around '=~', why?
3. You took out the m-modifier, why? (now it will only match (and
substitute) the first)
4. [\A-Z_a-z], why did you backslash the A? If \A ever becomes special,
that could change the meaning. Why is the underscore included?
5. You made the \S superfluous, why didn't you remove it?
6. Why did you quote $I for printing?
All in all quite ugly.
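Point 3 can be checked with a small sketch. It assumes the whole file sits in one string; if the file is read line by line instead, every line is its own string and /m makes no difference. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical two-line input held in a single string.
my $text = "Article 1 Cats\nArticle 2 Dogs\n";

# Without /m, ^ anchors only at the very start of the string,
# so /g finds nothing after the first line.
(my $without_m = $text) =~ s/^(Article\s+[0-9]+.*\S)/\\subsection*{$1}/g;

# With /m, ^ anchors at the start of every line.
(my $with_m = $text) =~ s/^(Article\s+[0-9]+.*\S)/\\subsection*{$1}/gm;

print "without /m:\n$without_m", "with /m:\n$with_m";
```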
--
Ruud
Re: Text Manipulation
On Feb 5, 7:11 am, zavi... [at] gmail.com (zavierz) wrote:
> Hi, I am trying to modify a LaTex file which is plain text.
> The file contains lines similar to the following, but each line is
> followed by text, so that:
>
> Article 1 Cats
> Article 2 Dogs
> Article 3 Fish
> Article 4 Ferrets
>
> etc.
>
> I would like to modify the file so that each referenced line is
> changed as follows:
>
> \subsection*{Article 1 Cats}
> \subsection*{Article 2 Dogs}
> \subsection*{Article 3 Fish}
> \subsection*{Article 4 Ferrets}
>
> Here's code which was suggested to me, but when I execute it I'm
> returned to the command line and nothing happens:
>
> #!/usr/bin/perl
> s/^(Article\s+[0-9]+\s+\N*\S)/\\subsection*{$1}/gm
                            ^^
s/^(Article\s+[0-9]+\s+\N*)/\\subsection*{$1}/gm
I believe the problem is the extraneous \S. The regex
should be:
either: ^(Article\s+[0-9]+\s+\N*)
or: ^(Article\s+[0-9]+\s+\S*)
BTW, \N is a newer regex experimental escape.
From perldoc perlre:
\N Any character but \n (experimental)
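The "Missing braces on \N{}" error in the original post suggests a perl that does not yet know \N as an escape (as far as I know it arrived in 5.12). A sketch of the two alternatives above, with [^\n]* standing in as a portable spelling of \N*; the sample line and variable names are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical line with a two-word title.
my $line = "Article 3  Gold Fish\n";

# [^\n]* matches everything up to (not including) the newline.
(my $to_eol = $line) =~ s/^(Article\s+[0-9]+\s+[^\n]*)/\\subsection*{$1}/;

# \S* stops at the first whitespace, so only the first word is captured.
(my $one_word = $line) =~ s/^(Article\s+[0-9]+\s+\S*)/\\subsection*{$1}/;

print $to_eol, $one_word;
```

Note that [^\n]* (like \N*) also captures any trailing blanks on the line, which is what the final \S in the original pattern guards against.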
> ...
--
Charles DeRykus
Re: Text Manipulation
I would propose an altogether different solution. Take a look at the
program below and see if it helps you.
Rob
use strict;
use warnings;
while (<DATA>) {
    my @data = split;
    print "\\subsection*{@data}\n";
}
__DATA__
Article 1 Cats
Article 2 Dogs
Article 3 Fish
Article 4 Ferrets
**OUTPUT**
\subsection*{Article 1 Cats}
\subsection*{Article 2 Dogs}
\subsection*{Article 3 Fish}
\subsection*{Article 4 Ferrets}
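The program above wraps every input line. A real LaTeX file also contains other text, so a variation on the same split idea might wrap only lines that begin with "Article" and a number. This is a sketch under that assumption; wrap_article_line is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a line only if it starts with "Article <number>".
sub wrap_article_line {
    my ($line) = @_;
    my @data = split ' ', $line;    # split on whitespace, drops the newline
    return $line
        unless @data >= 2 && $data[0] eq 'Article' && $data[1] =~ /^[0-9]+$/;
    return "\\subsection*{@data}\n";    # @data is joined with single spaces
}

print wrap_article_line($_) for "Article 1  Cats\n", "Plain body text.\n";
```

As in the original program, interpolating @data normalises runs of whitespace to single spaces.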
Re: Text Manipulation
At 07:11 -0800 05/02/2011, zavierz wrote:
>Here's code which was suggested to me, but when I execute it I'm
>returned to the command line and nothing happens:
>
>#!/usr/bin/perl
>s/^(Article\s+[0-9]+\s+\N*\S)/\\subsection*{$1}/gm
>
>I called this script "Article" and saved it as article.pl
>The usage then was $perl article.pl oldfile.tex newfile.tex
What was "suggested to you" is simply a substitution; it can't do
anything unless it has something to do it to.
Your command gives two arguments (@ARGV, or, if you like, $ARGV[0],
$ARGV[1]) to your script but the script itself makes no reference to
these arguments and they are completely ignored.
You need to open your in-file (oldfile.tex = $ARGV[0] => $fin) for
reading and open/create your out-file for writing (newfile.tex =
$ARGV[1] => $fout). Each line you read from $fin in the while loop
becomes $_ and you do the substitutions before writing it to $fout.
#!/usr/bin/perl
use strict;
my ($fin, $fout) = @ARGV;
open FIN, $fin;
open FOUT, ">$fout";
while (<FIN>) {
    s/^( Article \s+ [0-9]+ .* \S )
     /\\subsection*{$1}/gmx;
    print FOUT;
    # + if you want to see what you've written
    # displayed in terminal/console:
    print STDOUT;
}
# JD
Re: Text Manipulation
Hi John,
a few comments on your code.
On Monday 07 Feb 2011 13:50:57 John Delacour wrote:
> At 07:11 -0800 05/02/2011, zavierz wrote:
> >Here's code which was suggested to me, but when I execute it I'm
> >returned to the command line and nothing happens:
> >
> >#!/usr/bin/perl
> >s/^(Article\s+[0-9]+\s+\N*\S)/\\subsection*{$1}/gm
> >
> >I called this script "Article" and saved it as article.pl
> >
> >
> >The usage then was $perl article.pl oldfile.tex newfile.tex
>
> What was "suggested to you" is simply a substitution; it can't do
> anything unless it has something to do it to.
>
> Your command gives two arguments (@ARGV, or, if you like, $ARGV[0],
> $ARGV[1]) to your script but the script itself makes no reference to
> these arguments and they are completely ignored.
>
> You need to open your in-file (oldfile.tex = $ARGV[0] => $fin) for
> reading and open/create your out-file for writing (newfile.tex =
> $ARGV[1] => $fout). Each line you read from $fin in the while loop
> becomes $_ and you do the substitutions before writing it to $fout.
>
Actually, see perldoc perlrun - http://perldoc.perl.org/perlrun.html - by
giving -p and -i (untested) you can replace the contents of a file "in-place".
Now for some comments on your code.
>
> #!/usr/bin/perl
> use strict;
Add "use warnings;" too.
> my ($fin, $fout) = @ARGV;
It's great that you unpack @ARGV like that instead of using $ARGV[0], $ARGV[1]
etc. But please say something like $in_fn and $out_fn or something like that.
> open FIN, $fin;
> open FOUT, ">$fout";
1. Don't use bareword filehandles.
2. Use three-args-open.
3. Always append or die.
See:
http://perl-begin.org/tutorials/bad-elements/#open-function-style
> while (<FIN>) {
To avoid $_ getting tampered with, it's a good idea to use an explicit $line
variable.
> s/^( Article \s+ [0-9]+ .* \S )
> /\\subsection*{$1}/gmx;
> print FOUT;
> # + if you want to see what you've written
> # displayed in terminal/console:
> print STDOUT;
> }
>
Regards,
Shlomi Fish
> # JD
--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
Escape from GNU Autohell - http://www.shlomifish.org/open-source/anti/autohell/
Chuck Norris can make the statement "This statement is false" a true one.
Please reply to list if it's a mailing list post - http://shlom.in/reply .
Re: Text Manipulation
At 14:22 +0200 07/02/2011, Shlomi Fish wrote:
>Hi John,
>
>a few comments on your code.
>Actually, see perldoc perlrun - http://perldoc.perl.org/perlrun.html - by
>giving -p and -i (untested) you can replace the contents of a file "in-place".
untested?! Why don't you test it before recommending it to others?
>Now for some comments on your code.
You said that already.
> > #!/usr/bin/perl
>> use strict;
>
>Add "use warnings;" too.
I certainly will if I need them, Shlomi.
> > my ($fin, $fout) = @ARGV;
>
>It's great that you unpack @ARGV like that instead of using $ARGV[0], $ARGV[1]
>etc. But please say something like $in_fn and $out_fn or something like that.
Thanks for the kudos. I fail to see how "something like $in_fn" is
any more or less obscure than $fin.
>
>> open FIN, $fin;
>> open FOUT, ">$fout";
>
>1. Don't use bareword filehandles.
Don't just tell us; give us an example of an unbareword filehandle
and tell us why it's universally preferable.
>2. Use three-args-open.
Ditto. If you want arguments you've come to the right place.
>3. Always append or die.
Fair enough, I normally do even when, as in my example, I am not appending.
> > while (<FIN>) {
>
>To avoid $_ getting tampered with it's a good idea to use an explicit $line
>variable.
You may think so and thousands of others may not. And while you're
at it, get yourself a decent mail program that does format=flowed.
<http://www.ietf.org/rfc/rfc3676.txt>
and since I'm subscribed to the list, I don't need a private copy of
the message either.
JD
Re: Text Manipulation
Hi John,
On Monday 07 Feb 2011 16:18:32 John Delacour wrote:
> At 14:22 +0200 07/02/2011, Shlomi Fish wrote:
> >Hi John,
> >
> >a few comments on your code.
> >
> >
> >Actually, see perldoc perlrun - http://perldoc.perl.org/perlrun.html - by
> >giving -p and -i (untested) you can replace the contents of a file
> >"in-place".
>
> untested?! Why don't you test it before recommending it to others?
>
Well, I apologise for not testing the exact incantation. I overlooked that.
I've used the -p and -i flags for a long time now, having utilised this
knowledge in Perl golf competitions, and found it of use in my command line
work.
Here's a working example, although a bit silly one:
[quote]
shlomif:~$ cat test.txt
Hello World!
shlomif:~$ perl -p -i.bak -e 's/World/Friend/' test.txt
shlomif:~$ cat test.txt
Hello Friend!
[/quote]
cat is a UNIX command, so if you're working on a non-UNIX system, then use the
equivalent command (and single-quotes are not supported by CMD.EXE, etc.).
I should note that it's not the first time I've seen people use ("untested")
when giving code samples. "Beware of bugs in the above code - I have only
proven it correct, not tested it." -- Don Knuth.
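The same in-place behaviour can be had inside a script by setting $^I, the variable behind the -i switch. The following is only a sketch applied to the Article problem from this thread; wrap_articles_in_place is a hypothetical name, and the ".bak" suffix mirrors the -i.bak used above.

```perl
use strict;
use warnings;

# Hypothetical helper: edit the named files in place, keeping .bak backups,
# much as "perl -p -i.bak -e '...' FILE" does from the command line.
sub wrap_articles_in_place {
    my @files = @_;
    return unless @files;    # with an empty @ARGV, <> would read STDIN

    local ($^I, @ARGV) = ('.bak', @files);
    while (<>) {
        s/^(Article\s+[0-9]+.*\S)/\\subsection*{$1}/;
        print;               # with $^I set, print goes to the edited file
    }
    return;
}

# Usage: perl article.pl oldfile.tex
wrap_articles_in_place(@ARGV);
```

Each line arrives separately here, so neither /g nor /m is needed on the substitution.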
> >Now for some comments on your code.
>
> You said that already.
>
I apologise for being redundant.
> > > #!/usr/bin/perl
> >>
> >> use strict;
> >
> >Add "use warnings;" too.
>
> I certainly will if I need them, Shlomi.
Why do you feel they are not needed in this case?
>
> > > my ($fin, $fout) = @ARGV;
> >
> >It's great that you unpack @ARGV like that instead of using $ARGV[0],
> >$ARGV[1] etc. But please say something like $in_fn and $out_fn or
> >something like that.
>
> Thanks for the kudos. I fail to see how "something like $in_fn" is
> any more or less obscure than $fin.
Well, fn is a common abbreviation for filename. On the other hand the f in
"fin" can mean file-handle, file name, file, contents of a file, etc. I
recommend avoiding calling things file:
http://perl-begin.org/tutorials/bad-elements/#calling-variables-file
>
> >> open FIN, $fin;
> >> open FOUT, ">$fout";
> >
> >1. Don't use bareword filehandles.
>
> Don't just tell us; give us an example of an unbareword filehandle
> and tell us why it's universally preferable.
>
> >2. Use three-args-open.
>
> Ditto. If you want arguments you've come to the right place.
>
> >3. Always append or die.
>
> Fair enough, I normally do even when, as in my example, I am not appending.
>
I meant appending «or die ...» after the open call, as in «open my $fh, ">",
$filename or die "Cannot open the file '$filename' for writing";», not
necessarily when opening a file in append mode.
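Put together, the three recommendations might look like the sketch below. convert_file is a hypothetical helper name; the regex is the one suggested earlier in the thread. A lexical filehandle is scoped like any other my variable, and three-argument open keeps the mode separate from the file name, so a name that happens to begin with ">" or "<" cannot change how the file is opened.

```perl
use strict;
use warnings;

# Hypothetical helper; $in_fn and $out_fn are input and output file names.
sub convert_file {
    my ($in_fn, $out_fn) = @_;

    # Lexical filehandles, three-argument open, and "or die" on failure.
    open my $in,  '<', $in_fn  or die "Cannot open '$in_fn' for reading: $!";
    open my $out, '>', $out_fn or die "Cannot open '$out_fn' for writing: $!";

    while (my $line = <$in>) {
        $line =~ s/^(Article\s+[0-9]+.*\S)/\\subsection*{$1}/;
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close '$out_fn': $!";
    return;
}

# Usage: perl article.pl oldfile.tex newfile.tex
convert_file(@ARGV) if @ARGV == 2;
```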
> > > while (<FIN>) {
> >
> >To avoid $_ getting tampered with it's a good idea to use an explicit
> >$line variable.
>
> You may think so and thousands of others may not.
Granted. I agree that in quick and dirty scripts, I sometimes use $_ (but
often end up regretting it), but I've explained why using a lexical variable
is better than using $_ which can be changed implicitly and explicitly in many
ways. $_ is useful when being used in built-in functions such as grep or map,
and in related functions as those provided by List::Util, List::MoreUtils and
List::UtilsBy, but even then sometimes it's a good idea to assign it to a
lexical (especially if you nest such functions.)
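One way $_ gets changed implicitly can be shown with a small sketch. The helper names and sample data are assumptions for illustration: a bare while (<$fh>) assigns to the global $_ without localising it, and inside a foreach loop $_ is an alias for the current array element.

```perl
use strict;
use warnings;

# Hypothetical helper that reads with a bare while (<$fh>), which
# assigns every line to the global $_ without localising it.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $n = 0;
    while (<$fh>) { $n++ }
    return $n;
}

# The same helper with an explicit lexical: $_ is never touched.
sub count_lines_safely {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $n = 0;
    while (my $line = <$fh>) { $n++ }
    return $n;
}

my @titles = ('Cats', 'Dogs');
for (@titles) {    # $_ is an alias for each element in turn
    count_lines("a\nb\n");
}
# The reads wrote through the alias, ending with undef at end of input.
print defined $_ ? "kept: $_\n" : "clobbered\n" for @titles;
```

Swapping in count_lines_safely leaves @titles untouched.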
> And while you're
> at it, get yourself a decent mail program that does format=flowed.
>
> <http://www.ietf.org/rfc/rfc3676.txt>
>
It's the first time I've heard of format=flowed. I've heard very few
complaints about my E-mails even from very netiquette conscious people, since
I'm using KDE's KMail, which is very standards compliant. Nevertheless, I'll
try to investigate about this format=flowed thing and see if KMail can be made
to support it. Part of KMail's appeal to me is that it handles Unicode and
Bidirectionality very well, which is important for me as an Israeli who needs
to communicate in both the Hebrew and Latin alphabets (and may sometimes need
to handle Arabic as well). And I've seen many "crimes" against E-mail
netiquette recently from many people.
> and since I'm subscribed to the list, I don't need a private copy of
> the message either.
>
Well, I apologise for that. I realise that whether to CC one on a reply to his
post to a mailing list is an ongoing debate and that some people prefer it
this way and the others prefer it the other way. The reason I CC people on
their replies is primarily because by default I hit "A" on KMail to reply to
all which is useful in case there are several mailing lists in the recipients,
or it's a private E-mail with several recipients. If I want to reply in
private, I hover over the email address with my mouse, invoke the context
menu, and hit reply to all. One thing that annoys me about the GMail.com
web-interface is that by default it replies only to the original recipient and
not to all recipients, which tends to end up in my inbox, even if it was
intended for public consumption. This is why I have the last line in my
signature, and why when I find a need to reply in private, I normally mention
it in my E-mail so people won't think I did it by accident (and would
recommend everyone to do the same.)
I also actually appreciate a copy arriving at my inbox and many people will
prefer it too, and I don't feel this issue is as clear-cut as most others.
Regards,
Shlomi Fish
--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
"Humanity" - Parody of Modern Life - http://shlom.in/humanity
Chuck Norris can make the statement "This statement is false" a true one.
Please reply to list if it's a mailing list post - http://shlom.in/reply .