Multiline preg_match
According to the manual, the default for preg_match is to treat the subject
string as consisting of a single "line" of characters (even if it actually
contains several newlines).
I want to match the "string" below to extract everything from <strong> to
</div> (not inclusive).
However my attempt at preg_match('/<strong(.*)/',$fc,$match) returns only
<strong>RCR002</strong>
Any suggestions welcome.
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Untitled Document</title>
</head>
<body>
<div align="center"><img
src="../../../images/stories/rhinestone_stock/crowns/RCR002.gif" width="200"
height="200" />
<strong>RCR002</strong>
2mm & 3mm
~5.25”W x 3.25”H</div>
</body>
</html>
Re: Multiline preg_match
On Apr 16, 2:04 pm, "Paul Lautman" <paul.laut... [at] btinternet.com>
wrote:
> According to the manual, the default for preg_match is to treat the subject
> string as consisting of a single "line" of characters (even if it actually
> contains several newlines).
>
> I want to match the "string" below to extract everything from <strong> to
> </div> (not inclusive).
> However my attempt at preg_match('/<strong(.*)/',$fc,$match) returns only
> <strong>RCR002</strong>
>
> Any suggestions welcome.
>
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
> <title>Untitled Document</title>
> </head>
>
> <body>
> <div align="center"><img
> src="../../../images/stories/rhinestone_stock/crowns/RCR002.gif" width="200"
> height="200" />
>
> <strong>RCR002</strong>
> 2mm & 3mm
> ~5.25”W x 3.25”H</div>
> </body>
> </html>
Try,
$arr = explode("\n", $string); // Where $string is your block of text
$matches = array();            // one result array per line
foreach ($arr as $key => $value) {
    preg_match('/pattern/', $value, $matches[$key]); // substitute your own pattern
}
Re: Multiline preg_match
Mike Camden wrote:
> On Apr 16, 2:04 pm, "Paul Lautman" <paul.laut... [at] btinternet.com>
> wrote:
>> According to the manual, the default for preg_match is to treat the
>> subject string as consisting of a single "line" of characters (even
>> if it actually contains several newlines).
>>
>> I want to match the "string" below to extract everything from
>> <strong> to </div> (not inclusive).
>> However my attempt at preg_match('/<strong(.*)/',$fc,$match) returns
>> only <strong>RCR002</strong>
>>
>> Any suggestions welcome.
>>
>> <html xmlns="http://www.w3.org/1999/xhtml">
>> <head>
>> <meta http-equiv="Content-Type" content="text/html;
>> charset=iso-8859-1" /> <title>Untitled Document</title>
>> </head>
>>
>> <body>
>> <div align="center"><img
>> src="../../../images/stories/rhinestone_stock/crowns/RCR002.gif"
>> width="200" height="200" />
>>
>> <strong>RCR002</strong>
>> 2mm & 3mm
>> ~5.25”W x 3.25”H</div>
>> </body>
>> </html>
>
> Try,
>
> $arr = explode("\n", $string); // Where $string is your block of text
> foreach($arr as $key => $value) {
> preg_match('pattern', $value, $matches[$key]);
> }
I have managed to do it using str_replace to change all the newlines to
spaces.
However I'd really like to understand why preg_match does not behave as the
manual suggests.
Re: Multiline preg_match
On Wed, 16 Apr 2008 23:04:14 +0200, Paul Lautman
<paul.lautman [at] btinternet.com> wrote:
> According to the manual, the default for preg_match is to treat the
> subject string as consisting of a single "line" of characters (even if
> it actually contains several newlines).
>
> I want to match the "string" below to extract everything from <strong>
> to </div> (not inclusive).
> However my attempt at preg_match('/<strong(.*)/',$fc,$match) returns
> only <strong>RCR002</strong>
From the preg match portion of the manual:
http://nl2.php.net/manual/en/regexp.reference.php
. = match any character _except_newline_ (by default)
Solution:
http://nl2.php.net/manual/en/reference.pcre.pattern.modifiers.php
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern
matches all characters, including newlines. Without it,
newlines are excluded.
In conclusion (giving you the bonus 'until </div> non-inclusive'):
preg_match('%<strong(.*)(?=</div>)%s',$fc,$match);
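To make the difference concrete, here is a minimal sketch comparing the original pattern with the s-modified one. The $fc value is a cut-down stand-in for the posted file (an assumption for illustration, with a straight quote in place of the typographic one), and $plain / $dotall are just names picked for this example.

```php
<?php
// Cut-down stand-in for the file contents from the original post.
$fc = "<div align=\"center\">\n"
    . "<strong>RCR002</strong>\n"
    . "2mm & 3mm\n"
    . "~5.25\"W x 3.25\"H</div>\n"
    . "</body>";

// Default behaviour: . does not match a newline, so .* stops at the
// end of the line that contains <strong>.
preg_match('%<strong(.*)%', $fc, $plain);

// With the s modifier (PCRE_DOTALL), . also matches newlines. The
// lookahead (?=</div>) ends the match just before </div> without
// including it.
preg_match('%<strong(.*)(?=</div>)%s', $fc, $dotall);

var_dump($plain[0]);
var_dump($dotall[0]);
```

The first dump shows only the <strong> line; the second runs across the newlines up to, but not including, </div>.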
Which leaves me to say that while I'm a fan of regexes, I've given up
using them on HTML, because a parser does a far more reliable, clearer,
and most important more robust job.
--
Rik Wasmus
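For comparison, a rough sketch of what the parser route could look like with PHP's bundled DOM extension (DOMDocument). Nothing in the thread says this is what Rik had in mind; the $html sample is shortened and uses &amp; so the parser gets well-formed input, and $chunk is a name made up for this example.

```php
<?php
// Shortened stand-in for the posted document (assumption for illustration).
$html = '<html><body><div align="center"><img src="x.gif" width="200" height="200" />'
      . "\n<strong>RCR002</strong>\n2mm &amp; 3mm\n5.25 x 3.25</div></body></html>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

// Start at the first <strong> element and collect it plus every sibling
// that follows it inside the same parent (here: the <div>).
$strong = $doc->getElementsByTagName('strong')->item(0);
$chunk = '';
for ($node = $strong; $node !== null; $node = $node->nextSibling) {
    $chunk .= $doc->saveHTML($node);
}

var_dump($chunk);
```

Because the loop stops when the parent's children run out, the closing </div> is never part of the result, whatever the line breaks look like.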
Re: Multiline preg_match
On 17 Apr, 02:03, "Rik Wasmus" <luiheidsgoe... [at] hotmail.com> wrote:
> On Wed, 16 Apr 2008 23:04:14 +0200, Paul Lautman
>
> <paul.laut... [at] btinternet.com> wrote:
> > According to the manual, the default for preg_match is to treat the
> > subject
> > string as consisting of a single "line" of characters (even if it
> > actually
> > contains several newlines).
>
> > I want to match the "string" below to extract everything from <strong> to
> > </div> (not inclusive).
> > However my attempt at preg_match('/<strong(.*)/',$fc,$match) returns only
> > <strong>RCR002</strong>
>
> From the preg match portion of the manual: http://nl2.php.net/manual/en/regexp.reference.php
> . = match any character _except_newline_ (by default)
>
> Solution: http://nl2.php.net/manual/en/reference.pcre.pattern.modifiers.php
> s (PCRE_DOTALL)
> If this modifier is set, a dot metacharacter in the pattern
> matches all characters, including newlines. Without it,
> newlines are excluded.
>
> In conclusion (giving you the bonus 'until </div> non-inclusive'):
> preg_match('%<strong(.*)(?=</div>)%s',$fc,$match);
>
> Which leaves me to say that while I'm a fan of regexes, I've given up
> using them on HTML, because a parser does a far more reliable, clearer,
> and most important more robust job.
> --
> Rik Wasmus
Thanks Rik.
In this case I don't want to parse the HTML. I want to extract a
particular chunk from many files.
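For the many-files case, something along these lines might do. This is only a sketch: extract_chunk() is a hypothetical helper, the temporary directory and the two sample files are made up so the example runs on its own, and the pattern uses a lazy .*? (a small change from the greedy .* above) so each match ends at the first </div> after <strong>.

```php
<?php
// Hypothetical helper: returns everything from "<strong" up to, but not
// including, the next "</div>", or null if the file has no such chunk.
function extract_chunk(string $html): ?string
{
    if (preg_match('%<strong.*?(?=</div>)%s', $html, $m) === 1) {
        return $m[0];
    }
    return null;
}

// Made-up sample files in a temporary directory, for illustration only.
$dir = sys_get_temp_dir() . '/chunk_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.html", "<div><strong>RCR002</strong>\n2mm & 3mm</div>");
file_put_contents("$dir/b.html", "<div><strong>RCR003</strong>\n4mm</div>");

// Collect one chunk per file, keyed by file name.
$chunks = [];
foreach (glob("$dir/*.html") as $file) {
    $chunks[basename($file)] = extract_chunk(file_get_contents($file));
}
print_r($chunks);

// Remove the sample files again.
array_map('unlink', glob("$dir/*.html"));
rmdir($dir);
```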
Re: Multiline preg_match
Paul Lautman wrote:
>>> However my attempt at preg_match('/<strong(.*)/',$fc,$match) returns
>>> only <strong>RCR002</strong>
>>>
.....
>>> <strong>RCR002</strong>
>>> 2mm & 3mm
>>> ~5.25”W x 3.25”H</div>
> I have managed to do it using str_replace to change all the newlines to
> spaces.
> However I'd really like to understand why preg_match does not behave as the
> manual suggests.
Please read the manual about the 's' modifier:
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
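Try '#<strong[^>]*>(.*)</div>#s' to extract this part (# is the delimiter
here, since an unescaped / inside </div> would end a /-delimited pattern early)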
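Two things are worth checking with a pattern of this shape: the delimiter must not appear unescaped inside the pattern, and .* is greedy. A small sketch, assuming a sample $fc with a second (made-up) <div> after the one of interest; $greedy and $lazy are names chosen for this example.

```php
<?php
// Sample with a second, made-up <div> after the one we want.
$fc = "<div><strong>RCR002</strong>\n2mm & 3mm</div>\n<div>footer</div>";

// # is used as the delimiter so the / in </div> needs no escaping.
// Greedy (.*) runs on to the LAST </div> in the subject.
preg_match('#<strong[^>]*>(.*)</div>#s', $fc, $greedy);

// Lazy (.*?) stops at the FIRST </div> after <strong>.
preg_match('#<strong[^>]*>(.*?)</div>#s', $fc, $lazy);

var_dump($greedy[1]);
var_dump($lazy[1]);
```

With only one </div> after <strong>, as in the posted file, both variants capture the same text.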
Re: Multiline preg_match
On 18 Apr, 08:03, Alexey Kulentsov <a... [at] inbox.ru> wrote:
> Paul Lautman wrote:
> >>> However my attempt at preg_match('/<strong(.*)/',$fc,$match) returns
> >>> only <strong>RCR002</strong>
>
> ....
> >>> <strong>RCR002</strong>
> >>> 2mm & 3mm
> >>> ~5.25”W x 3.25”H</div>
> > I have managed to do it using str_replace to change all the newlines to
> > spaces.
> > However I'd really like to understand why preg_match does not behave as the
> > manual suggests.
>
> Please read manual about 's' modifier: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
>
> Try '#<strong[^>]*>(.*)</div>#s' to extract this part
Thanks, Rik pointed out that one. It was the paragraph right below the
one that I read which suggested the opposite!
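For anyone landing here with the same confusion: the "single line" wording in the manual belongs to the m modifier (PCRE_MULTILINE) and concerns where ^ and $ may match, while the s modifier (PCRE_DOTALL) is what controls whether . matches a newline. A short sketch with a made-up two-line $subject; the result variable names are chosen for this example.

```php
<?php
$subject = "first\nsecond";

// Default: ^ matches only at the very start of the subject string.
$defaultCount = preg_match_all('/^\w+/', $subject, $a);

// m (PCRE_MULTILINE): ^ also matches immediately after each newline.
$multilineCount = preg_match_all('/^\w+/m', $subject, $b);

// Default: . does not match the newline between the two words.
$dotDefault = preg_match('/first.second/', $subject);

// s (PCRE_DOTALL): . matches the newline as well.
$dotAll = preg_match('/first.second/s', $subject);

var_dump($defaultCount, $multilineCount, $dotDefault, $dotAll);
```

The two modifiers are independent, so m changes the first pair of results and s changes the second.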