Help extracting something from a string

I am having a hard time figuring this one out as the records I am
asked to work with "seem" rather arbitrary.

I have a stream of text and I need to extract a filename in the form
(bash wildcards) "*-*-*-*-*-*.pdf"
including the double-quotes, the characters surrounding it could be
anything at all.

I won't get into how I have tried to do this so far but let's just say
cut isn't cutting it and I am pretty unskilled with sed apparently.

any help is appreciated.
bone [ Mon, 19 November 2007 18:03 ] [ ID #1873838 ]

Re: Help extracting something from a string

On 11/19/2007 11:03 AM, bone wrote:
> I am having a hard time figuring this one out as the records I am
> asked to work with "seem" rather arbitrary.
>
> I have a stream of text and I need to extract a filename in the form
> (bash wildcards) "*-*-*-*-*-*.pdf"
> including the double-quotes, the characters surrounding it could be
> anything at all.
>
> I won't get into how I have tried to do this so far but let's just say
> cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> any help is appreciated.

man grep

If that doesn't do it, post some sample input and expected output.

Ed.
Ed Morton [ Mon, 19 November 2007 23:02 ] [ ID #1873853 ]

Re: Help extracting something from a string

On Nov 19, 10:03 am, bone <dropdeads... [at] gmail.com> wrote:
> I am having a hard time figuring this one out as the records I am
> asked to work with "seem" rather arbitrary.
>
> I have a stream of text and I need to extract a filename in the form
> (bash wildcards) "*-*-*-*-*-*.pdf"
> including the double-quotes, the characters surrounding it could be
> anything at all.

Do you have more than one per line? If so, that pattern will not work.
Consider that the pattern is:
"*-*.pdf"

Then, the whole line will match the pattern:

"a-b.pdf" junk junk junk junk junk junk "b-c.pdf"

> I won't get into how I have tried to do this so far but let's just say
> cut isn't cutting it and I am pretty unskilled with sed apparently.


If you insist on space separation, and disallow spaces in the
filename, the following will work:

while read -r i
do
    case "$i" in
    \"*-*-*-*-*-*.pdf\")
        echo "$i";;
    esac
done

If you want to allow spaces, and you have only one per line, this sed
script will do:
sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'
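To illustrate the point above about more than one name per line, here is a small sketch using the simplified two-part pattern. The sample line and the variable names are made up for the demonstration. It compares an unbounded `.*` with `[^"]*`, which cannot cross a double quote.

```shell
line='"a-b.pdf" junk junk "b-c.pdf"'

# Greedy: .* stretches to the last .pdf" on the line, so the capture
# swallows both names and the junk in between.
greedy=$(printf '%s\n' "$line" | sed -n 's/^\(".*-.*\.pdf"\).*/\1/p')

# Bounded: [^"]* stops at the next double quote, so only the first
# quoted name is captured.
bounded=$(printf '%s\n' "$line" | sed -n 's/^\("[^"]*-[^"]*\.pdf"\).*/\1/p')

printf '%s\n' "$greedy" "$bounded"
```

The same substitution of `[^"]*` for `.*` should carry over to the longer five-hyphen pattern.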
Edward Rosten [ Mon, 19 November 2007 23:06 ] [ ID #1873855 ]

Re: Help extracting something from a string

On Nov 19, 5:06 pm, Edward Rosten <Edward.Ros... [at] gmail.com> wrote:
> On Nov 19, 10:03 am, bone <dropdeads... [at] gmail.com> wrote:
>
> > I am having a hard time figuring this one out as the records I am
> > asked to work with "seem" rather arbitrary.
>
> > I have a stream of text and I need to extract a filename in the form
> > (bash wildcards) "*-*-*-*-*-*.pdf"
> > including the double-quotes, the characters surrounding it could be
> > anything at all.
>
> Do you have more than one per line? If so, that pattern will not work.
> Consider that the pattern is:
> "*-*.pdf"
>
> Then, the whole line will match the pattern:
>
> "a-b.pdf" junk junk junk junk junk junk "b-c.pdf"
>
> > I won't get into how I have tried to do this so far but let's just say
> > cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> If you insist on space separation, and disallow spaces in the
> filename, the following will work:
>
> while read i
> do
> case "$i" in
> \"*-*-*-*-*-*.pdf\")
> echo "$i";;
> esac
> done

I don't control the input, and it generally will not be space-delimited.

>
> If you want to allow spaces, and you have only one per line, this sed
> script will do:
> sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'

this doesn't seem to work:

$ echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg |
  sed -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'

doesn't return anything
bone [ Tue, 20 November 2007 20:10 ] [ ID #1874661 ]

Re: Help extracting something from a string

On Nov 19, 5:02 pm, Ed Morton <mor... [at] lsupcaemnt.com> wrote:
> On 11/19/2007 11:03 AM, bone wrote:
>
> > I am having a hard time figuring this one out as the records I am
> > asked to work with "seem" rather arbitrary.
>
> > I have a stream of text and I need to extract a filename in the form
> > (bash wildcards) "*-*-*-*-*-*.pdf"
> > including the double-quotes, the characters surrounding it could be
> > anything at all.
>
> > I won't get into how I have tried to do this so far but let's just say
> > cut isn't cutting it and I am pretty unskilled with sed apparently.
>
> > any help is appreciated.
>
> man grep
>
> If that doesn't do it, post some sample input and expected output.
>
> Ed.

I don't think grep is what I need at all.

sample input:

ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg


expected output:

ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf

record length is arbitrary, record separator character is nonexistent,
what the "matching filename" is is unknown.
bone [ Tue, 20 November 2007 20:18 ] [ ID #1874664 ]

Re: Help extracting something from a string

bone <dropdeadster [at] gmail.com> wrote:
>
> sample input:
>
> ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg
>
> expected output:
>
> ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
>
> record length is arbitrary, record separator character is nonexistent,
> what the "matching filename" is is unknown.

" is not a record separator?
in case it is:
awk -F\" '{print $2}'

in case it is not a separator, how do you know the file name is
'ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf' and not
'sdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf' ?
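If " is treated as the field separator, one possible extension of the awk one-liner is to check every field that sits between a pair of quotes rather than only $2. This is a rough sketch: the function name and the shortened sample line are made up, and it assumes the quotes on each line are balanced from the start of the line.

```shell
# Hypothetical helper: split each line on double quotes and print the
# quoted fields that contain at least five hyphens and end in .pdf.
extract_with_awk() {
    awk -F'"' '{
        # Even-numbered fields lie between an opening and a closing quote;
        # i < NF ensures the closing quote is actually present.
        for (i = 2; i < NF; i += 2)
            if ($i ~ /-.*-.*-.*-.*-.*\.pdf$/)
                print $i
    }'
}

sample='ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(62":":#2sg'
awk_names=$(printf '%s\n' "$sample" | extract_with_awk)
printf '%s\n' "$awk_names"
```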


--
pgas [at] SDF Public Access UNIX System - http://sdf.lonestar.org
pgas [ Wed, 21 November 2007 08:25 ] [ ID #1875471 ]

Re: Help extracting something from a string

bone <dropdeadster [at] gmail.com> wrote:
> I don't think grep is what I need at all.
>
> sample input:
>
> ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(6262646294626&^*&@^$*4":":#2sg
>
>
> expected output:
>
> ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf
>
> record length is arbitrary, record separator character is nonexistent,
> what the "matching filename" is is unknown.

Actually it's the first thing you should try... like

grep -o -e '"[^"]*\.pdf"'
or
tr '"' '\n' | grep '\.pdf$'
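As a sketch of how the grep approach could be tightened to the five-hyphen form from the original question, the pattern can require the hyphens explicitly and the quotes can be stripped afterwards. The function name and the shortened sample line are made up here, and -o is a GNU/BSD grep extension rather than POSIX, so this assumes one of those greps.

```shell
# Hypothetical helper: print every double-quoted name that has at least
# five hyphens and ends in .pdf, one per line, with the quotes removed.
extract_pdf_names() {
    grep -o '"[^"]*-[^"]*-[^"]*-[^"]*-[^"]*-[^"]*\.pdf"' | tr -d '"'
}

sample='ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"sdg*^&(62":":#2sg'
names=$(printf '%s\n' "$sample" | extract_pdf_names)
printf '%s\n' "$names"
```

Because [^"]* cannot cross a double quote, several names on one line should each come out on their own line.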

--
William Park <opengeometry [at] yahoo.ca>, Toronto, Canada
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
William Park [ Fri, 23 November 2007 00:22 ] [ ID #1876978 ]

Re: Help extracting something from a string

try sed:

echo 'ksdj68248*&^*6862834fglsdfg"ddfd-dfdf-dfdf-dfdf-dfd-dd-ddf.pdf"hjkdlha' |
sed 's/\(.*\)\(\"\)\(.*\)\(\.pdf\)\(\"\)\(.*\)/\3\4/g'

?
thomasriise [ Mon, 26 November 2007 23:37 ] [ ID #1878688 ]

Re: Help extracting something from a string

On Nov 20, 12:10 pm, bone <dropdeads... [at] gmail.com> wrote:
> On Nov 19, 5:06 pm, Edward Rosten <Edward.Ros... [at] gmail.com> wrote:

> $ echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg| sed
> -ne's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/;tp;d;:p;p'

To see why this does not work, type:

echo ksdjfglsdfg"ddfd-dfdf-dfdf-dfdf-dfdf-dfdfd-dfdf.pdf"sdgsg

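The shell is eating your "s: it strips the unquoted double quotes before
echo runs, so sed never sees them. Put the whole argument in single quotes.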
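A short sketch of the difference, using a made-up shorter test string. The sed call is a simplified -n ... p form of the same substitution, not the exact script quoted above.

```shell
# Unquoted: the shell removes the double quotes before echo runs.
unquoted=$(echo junk"a-b-c-d-e-f.pdf"junk)

# Single-quoted: the double quotes reach echo, and the pipeline, intact.
quoted=$(echo 'junk"a-b-c-d-e-f.pdf"junk')

printf '%s\n' "$unquoted" "$quoted"

# With the quotes preserved, the substitution has something to match.
result=$(echo 'junk"a-b-c-d-e-f.pdf"junk' |
    sed -n 's/.*\(".*-.*-.*-.*-.*-.*\.pdf"\).*/\1/p')
printf '%s\n' "$result"
```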

-Ed
Edward Rosten [ Tue, 27 November 2007 22:38 ] [ ID #1879563 ]