Convert literature string via Regular Expressions
Hi all,
I'm having difficulty splitting the following literature strings into
their parts so that they can be inserted into the database.
Here are two example strings:
Hauser, M., Geller-Grimm, F. (1995): Bestimmungsschlüssel für die Weibchen der deutschen Sphegina-Arten (Diptera, Syrphidae). [Key to distinguish the females of the Sphegina species known from Germany (Diptera, Syrphidae).] - Entomology 2(1/2), 3-19. London.
Mazánek, L., Láska, P., Bicik, V. (1999): Two new Palaearctic species of Eupeodes similar to E. bucculatus (Diptera, Syrphidae) [] - Volucella 4, 1-9. Stuttgart.
The pattern is like this:
Author(s) (year): Title in German or English. [If filled, then the former
title was a German one and this one is the English translation.] -
Source issue, pages. City.
Author:
Year:
Title EN or DE:
Title EN:
Source:
Issue:
Pages:
Press City:
I tried something like this:
preg_match ("/^[..something..]+/", $string, $regs);
echo ("Author: ".$regs[1]."<br />");
echo ("Year: ".$regs[2]."<br />");
echo ("Title EN or DE: ".$regs[3]."<br />");
echo ("Title EN: ".$regs[4]."<br />");
echo ("Source: ".$regs[5]."<br />");
echo ("Issue: ".$regs[6]."<br />");
echo ("Pages: ".$regs[7]."<br />");
echo ("Press City: ".$regs[8]."<br />");
But I'm having problems with the spaces and the parentheses, which I
somehow can't use in the matching...
Any idea how to split the string into the appropriate parts?
Many thanks,
Bastiaan
--
Bastiaan Wakkie <bastiaaw [at] dds.nl>
www.syrphidae.com
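
Trouble with spaces and parentheses in a pattern usually comes down to escaping: in PCRE a literal parenthesis is written `\(` or `\)`, an unescaped pair forms a capture group, and `\s` matches whitespace. The following is only a minimal sketch of that idea for the first two fields. It assumes UTF-8 source and input (hence the `u` modifier), and the variable names `$author` and `$year` are illustrative, not from the post.

```php
<?php
// Sketch only: pull the author list and the year off the front of a reference.
$string = 'Mazánek, L., Láska, P., Bicik, V. (1999): Two new Palaearctic species '
        . 'of Eupeodes similar to E. bucculatus (Diptera, Syrphidae) [] - '
        . 'Volucella 4, 1-9. Stuttgart.';

// ^(.+?)   capture group 1: authors (lazy, stops before the first " (dddd):")
// \s\(     literal whitespace and a literal opening parenthesis
// (\d{4})  capture group 2: a four-digit year
// \):      literal closing parenthesis followed by a colon
if (preg_match('/^(.+?)\s\((\d{4})\):/u', $string, $regs)) {
    $author = $regs[1];
    $year   = $regs[2];
    echo "Author: " . $author . "\n";
    echo "Year: " . $year . "\n";
}
```

The remaining fields can be appended to the same pattern one delimiter at a time, testing after each addition.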
Re: Convert literature string via Regular Expressions
Hey,
I figured it out in the end! Thanks anyway.
Here is the code, for anyone who is interested:
<?php
$filename = "Literature.txt";
$handle = fopen($filename, "r");
$line = 0; // line counter for the "matched" message
// Read line by line; fgets() returns false at end of file.
while (($buffer = fgets($handle, 1024)) !== false) {
    $line++;
    echo ("<hr><p>".$buffer."</p>");
    // Groups: 1 Author, 2 Year, 3 Title DE or EN, 4 EN Title, 5 Source, 6 Issue, 7 Pages, 8 Press City
    // Each "-" sits at the end of its character class so it is read as a literal hyphen.
    $match = preg_match("/^([\wÜÖäüßéñóíá,.\s-]*)\(([\d]{4})\):\s([\)\(ÜÖäüßéñóíá,.:&?!\w\s-]*)\[([\)\(ÜÖäüßéñóíá,.:&?!\w\s-]*)\]\s-\s([\s\w]*)\s([\(\/\)\d]*),\s([\d-]*)\.\s([\w]*)/", $buffer, $regs);
    if ($match) {
        echo ("<p>Matched in line $line:<br>------------------------> <i>".$regs[0]."</i></p>");
        echo ("<table border=\"2\"><tr><td>Author:</td><td><i>".$regs[1]."</i></td></tr>");
        echo ("<tr><td>Year:</td><td><i>".$regs[2]."</i></td></tr>");
        echo ("<tr><td>Title EN or DE:</td><td><i>".$regs[3]."</i></td></tr>");
        echo ("<tr><td>Title EN:</td><td><i>".$regs[4]."</i></td></tr>");
        echo ("<tr><td>Source:</td><td><i>".$regs[5]."</i></td></tr>");
        echo ("<tr><td>Issue:</td><td><i>".$regs[6]."</i></td></tr>");
        echo ("<tr><td>Pages:</td><td><i>".$regs[7]."</i></td></tr>");
        echo ("<tr><td>Press City:</td><td><i>".$regs[8]."</i></td></tr></table>");
    }
    else {
        echo "<div style=\"color:red\">String did not match!</div>";
    }
}
fclose($handle);
?>
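
For comparison, the same split can be written with named capture groups, so the fields are read by name rather than by position. This is a hedged sketch and not the code from the post: `parse_reference()` and the group names are hypothetical, it assumes UTF-8 input (the `u` modifier) and PHP 7.1 or later (for the nullable return type), and it has only been checked against the two example strings in this thread.

```php
<?php
// Hypothetical helper: split one reference line into named fields.
// Returns null when the line does not follow the expected layout.
function parse_reference(string $line): ?array
{
    $pattern = '/^(?<author>.+?)\s*\((?<year>\d{4})\):\s*'         // Author(s) (year):
             . '(?<title>.*?)\s*\[(?<title_en>[^\]]*)\]'           // Title [English title]
             . '\s+-\s+(?<source>.+?)\s+(?<issue>[\d()\/]+),'      // - Source issue,
             . '\s*(?<pages>\d+-\d+)\.\s*(?<city>[^.]+)\.?\s*$/u'; // pages. City.

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    // preg_match() stores every group twice (by number and by name);
    // keep only the named entries.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$first = parse_reference(
    'Hauser, M., Geller-Grimm, F. (1995): Bestimmungsschlüssel für die Weibchen '
    . 'der deutschen Sphegina-Arten (Diptera, Syrphidae). [Key to distinguish the '
    . 'females of the Sphegina species known from Germany (Diptera, Syrphidae).] - '
    . 'Entomology 2(1/2), 3-19. London.'
);
$second = parse_reference(
    'Mazánek, L., Láska, P., Bicik, V. (1999): Two new Palaearctic species of '
    . 'Eupeodes similar to E. bucculatus (Diptera, Syrphidae) [] - Volucella 4, 1-9. Stuttgart.'
);
print_r($first);
print_r($second);
```

Lazy quantifiers (`.+?`, `.*?`) stand in for the explicit lists of accented letters, which avoids having to enumerate every character that may occur in a name or title. The trade-off is that a title containing its own `[` would be split at the wrong place.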
Cool, eh? ;-) I'm starting to like regular expressions. Now I can
happily import 1000 new rows without any problem.
bye,
Bastiaan
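
The post does not show the import step itself. One way it might look, sketched with PDO and a prepared statement so that commas, quotes and parentheses in titles need no manual escaping. Everything specific here is an assumption for illustration: the `literature` table, its column names, and the in-memory SQLite database (which requires the `pdo_sqlite` extension). The `$regs` array is filled by hand with the values the second example string would produce.

```php
<?php
// Sketch only: table and column names are invented for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE literature (
    author TEXT, year INTEGER, title TEXT, title_en TEXT,
    source TEXT, issue TEXT, pages TEXT, city TEXT)');

// Prepare once, execute once per matched line.
$insert = $pdo->prepare(
    'INSERT INTO literature (author, year, title, title_en, source, issue, pages, city)
     VALUES (?, ?, ?, ?, ?, ?, ?, ?)'
);

// Shape of $regs after preg_match(): index 0 is the whole match, 1..8 the fields.
$regs = [
    '(whole match)',
    'Mazánek, L., Láska, P., Bicik, V. ',
    '1999',
    'Two new Palaearctic species of Eupeodes similar to E. bucculatus (Diptera, Syrphidae) ',
    '',
    'Volucella',
    '4',
    '1-9',
    'Stuttgart',
];

// Drop index 0 and trim the stray spaces the character classes pick up.
$insert->execute(array_map('trim', array_slice($regs, 1, 8)));

$count = (int) $pdo->query('SELECT COUNT(*) FROM literature')->fetchColumn();
echo "Rows stored: " . $count . "\n";
```

Inside the read loop, the `execute()` call would replace the block of `echo` statements for each line that matches.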
On Mon, 2003-11-10 at 16:50, Bastiaan Wakkie wrote:
> [...]