Grep RE for extracting XML attribute value

If I have an XML file like, say:

<test1>
<tagA test="hello" test1="world">
......
</test1>

What grep regular expression should I use to get the value of the
attribute test of tagA?
Note that there could be whitespace variations in the XML formatting:
<tagA test="hello" test1="world">
<tagA test ="hello" test1="world">
<tagA test= "hello" test1="world">
<tagA{one or more tab or spaces}test{multiple tab or spaces}={multiple
tab or spaces}"hello"{one or more tab or spaces} test1="world">

Thanks,
Hemant
hemant.gaur [ Thu, 17 January 2008 14:19 ] [ ID #1910196 ]

Re: Grep RE for extracting XML attribute value

egrep "([ ]+|[^|]+)test([ ]*|[^|])=" my.xml extracts the test
attribute. I don't think there is a more robust way, as the tag itself
might be on a separate line.
Can we achieve the above with plain grep (not the egrep form)?
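
A possible plain-grep form, offered as a sketch rather than a definitive
answer. In the egrep pattern above, [^|] is a bracket expression meaning
"any character except |", so that pattern also accepts names such as
test1= or mytest=. The version below uses a POSIX basic regular
expression and requires whitespace before test. The function name
grep_test_lines is made up for illustration. Like the egrep version, it
prints whole lines and assumes the attribute name and "=" are on one line.

```shell
# Print every line that has an attribute named exactly "test":
# whitespace, then "test", optional whitespace, then "=".
# grep_test_lines is a hypothetical helper name, not part of any tool.
grep_test_lines() {
    grep '[[:space:]]test[[:space:]]*=' "$1"
}
```

Usage would be: grep_test_lines my.xml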
hemant.gaur [ Thu, 17 January 2008 14:34 ] [ ID #1910198 ]

Re: Grep RE for extracting XML attribute value

hemant.gaur [at] gmail.com wrote:

> egrep "([ ]+|[^|]+)test([ ]*|[^|])=" my.xml does the test attribute
> extraction.

This extracts *the whole line* where the test attribute is present, which is
not exactly what you asked in the first place (at least as I understood
it).
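
To print only the value rather than the whole line, one option is sed
with a capture group. This is a minimal sketch, assuming the tag and the
attribute are on a single line and the value is double-quoted; the
function name extract_test_attr is hypothetical.

```shell
# On lines containing <tagA, replace the whole line with the text found
# between the double quotes of the "test" attribute, and print it.
# The whitespace required before "test" keeps test1= from matching.
extract_test_attr() {
    sed -n '/<tagA[[:space:]]/s/.*[[:space:]]test[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

Usage would be: extract_test_attr my.xml
It does not handle a tag that is split across several lines.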
PK [ Thu, 17 January 2008 15:23 ] [ ID #1910200 ]

Re: Grep RE for extracting XML attribute value

On 2008-01-17, hemant.gaur [at] gmail.com <hemant.gaur [at] gmail.com> wrote:
>
>
> If I have an XML file like, say:
>
><test1>
><tagA test="hello" test1="world">
> .....
></test1>
>
> What grep regular expression should I use to get the value of the
> attribute test of tagA?
> Note that there could be whitespace variations in the XML formatting:
><tagA test="hello" test1="world">
><tagA test ="hello" test1="world">
><tagA test= "hello" test1="world">
><tagA{one or more tab or spaces}test{multiple tab or spaces}={multiple
> tab or spaces}"hello"{one or more tab or spaces} test1="world">
>
> Thanks,
> Hemant
>
awk -F'>' '/<tagA/'
Bill Marcum [ Thu, 17 January 2008 15:18 ] [ ID #1910202 ]

Re: Grep RE for extracting XML attribute value

hemant.gaur [at] gmail.com wrote:
> If I have an XML file like, say:
>
> <test1>
> <tagA test="hello" test1="world">
> .....
> </test1>
>
> What grep regular expression should I use to get the value of the
> attribute test of tagA?
> Note that there could be whitespace variations in the XML formatting:
> <tagA test="hello" test1="world">
> <tagA test ="hello" test1="world">
> <tagA test= "hello" test1="world">
> <tagA{one or more tab or spaces}test{multiple tab or spaces}={multiple
> tab or spaces}"hello"{one or more tab or spaces} test1="world">
>
> Thanks,
> Hemant
>

If you have a recent version of Perl installed, you may also
have "xml_grep" (a Perl script) which "greps" for XPath
expressions in XML files. Maybe that will do what you want.

-Wayne
wayne [ Thu, 17 January 2008 18:02 ] [ ID #1910209 ]

Re: Grep RE for extracting XML attribute value

On 1/17/2008 7:19 AM, hemant.gaur [at] gmail.com wrote:
> If I have an XML file like, say:
>
> <test1>
> <tagA test="hello" test1="world">
> .....
> </test1>
>
> What grep regular expression should I use to get the value of the
> attribute test of tagA?
> Note that there could be whitespace variations in the XML formatting:
> <tagA test="hello" test1="world">
> <tagA test ="hello" test1="world">
> <tagA test= "hello" test1="world">
> <tagA{one or more tab or spaces}test{multiple tab or spaces}={multiple
> tab or spaces}"hello"{one or more tab or spaces} test1="world">
>
> Thanks,
> Hemant
>

You might want to take a look at XMLgawk
(http://home.vrweb.de/~juergen.kahrs/gawk/XML/), especially if you foresee
yourself doing other XML processing in the future.

Ed.
Ed Morton [ Thu, 17 January 2008 22:09 ] [ ID #1910221 ]

Re: Grep RE for extracting XML attribute value

Actually, we don't require a lot of XML parsing; this is the only
parse required.
I think we need an awk/grep combination to get "hello" extracted from
the "test" attribute of "tagA":
<test1>
<tagA test="hello" test1="world">
......
</test1>

Any pointers on this?
--hemant
hemant.gaur [ Fri, 18 January 2008 08:10 ] [ ID #1911084 ]

Re: Grep RE for extracting XML attribute value

On 1/18/2008 1:10 AM, hemant.gaur [at] gmail.com wrote:
> Actually, we don't require a lot of XML parsing; this is the only
> parse required.
> I think we need an awk/grep combination to get "hello" extracted from
> the "test" attribute of "tagA":
> <test1>
> <tagA test="hello" test1="world">
> .....
> </test1>
>
> Any pointers on this?
> --hemant

You rarely need grep if you're using awk, since awk can do anything grep can do
(albeit more slowly on very large files). If your awk supports REs as RSs (e.g. GNU
awk), neither "<" nor ">" appears within tagged areas, and you don't have spaces
in the values inside double quotes (e.g. test="hello world" would fail), then
you could try this:

$ cat file
<test1>
<tagA test="hello" test1="world">
<tagA test ="hello" test1="world">
<tagA test= "hello" test1="world">
<tagA test


=

hello test1="world">
</test1>
$ awk -v RS='[<>]' '/^tagA/{ gsub(/[[:space:]]*=[[:space:]]*/,"=");
    for (i=1;i<=NF;i++) if (sub(/^test=/,"",$i)) print $i }' file
"hello"
"hello"
"hello"
hello
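
If values may contain spaces (the test="hello world" case mentioned
above), one possible variation is to search the whole record with
match() instead of looping over whitespace-separated fields. This is
only a sketch: it still assumes an awk that accepts a regular expression
as RS and that "<" and ">" never appear inside a tag, it only handles
double-quoted values, and it strips the quotes from the output. The
function name extract_test_value is made up for illustration, and the
whitespace class is spelled out as [ \t\n] rather than [[:space:]].

```shell
# Print the double-quoted value of the "test" attribute of each tagA,
# even when the value contains spaces or the tag spans several lines.
# extract_test_value is a hypothetical helper name.
extract_test_value() {
    awk -v RS='[<>]' '
        /^tagA[ \t\n]/ {
            # Whitespace is required before "test" so test1= is skipped.
            if (match($0, /[ \t\n]test[ \t\n]*=[ \t\n]*"[^"]*"/)) {
                attr = substr($0, RSTART, RLENGTH)
                sub(/^[^"]*"/, "", attr)   # drop text up to the opening quote
                sub(/"$/, "", attr)        # drop the closing quote
                print attr
            }
        }' "$1"
}
```

Usage would be: extract_test_value file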

Regards,

Ed.
Ed Morton [ Fri, 18 January 2008 15:40 ] [ ID #1911098 ]