postgres 9.0 crash when bringing up hot standby


Hello,

OS level = AIX 5.3 ML-8
Postgres version = 9.0 beta-4

I'm testing "hot standby" using "streaming WAL records". On trying to bring up the hot standby, I see the following error in the log:

LOG: database system was interrupted; last known up at 2010-08-05 14:46:36 EDT
LOG: entering standby mode
LOG: restored log file "000000010000000000000007" from archive
LOG: redo starts at 0/7000020
LOG: consistent recovery state reached at 0/8000000
LOG: database system is ready to accept read only connections
cp: /pgarclog/pg1/000000010000000000000008: A file or directory in the path name does not exist.
LOG: WAL receiver process (PID 1073206) was terminated by signal 11
LOG: terminating any other active server processes

There is a core dump. The debugger indicates the crash sequence as follows:

(dbx) where
_alloc_initial_pthread(??) at 0x90000000049567c
__pth_init(??) at 0x900000000493ba4
uload(??, ??, ??, ??, ??, ??, ??, ??) at 0x9fffffff0001954
load_64.load(??, ??, ??) at 0x90000000004686c
loadAndInit() at 0x90000000047bd7c
dlopen(??, ??) at 0x90000000011cc4c
internal_load_library(libname = "/apps/pg_9.0_b4/lib/postgresql/libpqwalreceiver.so"), line 234 in "dfmgr.c"
load_file(filename = "libpqwalreceiver", restricted = '\0'), line 156 in "dfmgr.c"
WalReceiverMain(), line 248 in "walreceiver.c"
AuxiliaryProcessMain(argc = 2, argv = 0x0fffffffffffa8b8), line 428 in "bootstrap.c"
StartChildProcess(type = WalReceiverProcess), line 4405 in "postmaster.c"
sigusr1_handler(postgres_signal_arg = 30), line 4227 in "postmaster.c"
__fd_select(??, ??, ??, ??, ??) at 0x90000000011805c
postmaster.select(__fds = 5, __readlist = 0x0fffffffffffd0a8, __writelist = (nil), __exceptlist = (nil), __timeout = 0x0ffffffffffff0c0), line 229 in "time.h"
unnamed block in ServerLoop(), line 1391 in "postmaster.c"
unnamed block in ServerLoop(), line 1391 in "postmaster.c"
ServerLoop(), line 1391 in "postmaster.c"
PostmasterMain(argc = 1, argv = 0x00000001102aa4b0), line 1092 in "postmaster.c"
main(argc = 1, argv = 0x00000001102aa4b0), line 188 in "main.c"

Any pointers on how to resolve the issue will be much appreciated.

Thanks.

Alanoly Andrews (alanolya [at] invera.com)
Senior Software Engineer
Invera Inc.
Montreal, QC

****************************************************
This e-mail may be privileged and/or confidential, and the sender does not waive any related rights and obligations. Any distribution, use or copying of this e-mail or the information it contains by other than an intended recipient is unauthorized. If you received this e-mail in error, please advise me (by return e-mail or otherwise) immediately.

****************************************************


Alanoly Andrews [ Fri, 06 August 2010 15:10 ] [ ID #2045672 ]

Re: postgres 9.0 crash when bringing up hot standby

On Fri, Aug 6, 2010 at 10:10 PM, Alanoly Andrews <alanolya [at] invera.com> wrote:
> I'm testing "hot standby" using "streaming WAL records". On trying to bring
> up the hot standby, I see the following error in the log:

Thanks for the report!

> LOG:  database system was interrupted; last known up at 2010-08-05 14:46:36
> LOG:  entering standby mode
> LOG:  restored log file "000000010000000000000007" from archive
> LOG:  redo starts at 0/7000020
> LOG:  consistent recovery state reached at 0/8000000
> LOG:  database system is ready to accept read only connections
> cp: /pgarclog/pg1/000000010000000000000008: A file or directory in the path
> name does not exist.
> LOG:  WAL receiver process (PID 1073206) was terminated by signal 11
> LOG:  terminating any other active server processes
>
> There is a core dump. The debugger indicates the crash sequence as follows:
>
> (dbx) where
> _alloc_initial_pthread(??) at 0x90000000049567c
> __pth_init(??) at 0x900000000493ba4
> uload(??, ??, ??, ??, ??, ??, ??, ??) at 0x9fffffff0001954
> load_64.load(??, ??, ??) at 0x90000000004686c
> loadAndInit() at 0x90000000047bd7c
> dlopen(??, ??) at 0x90000000011cc4c
> internal_load_library(libname =
> "/apps/pg_9.0_b4/lib/postgresql/libpqwalreceiver.so"), line 234 in "dfmgr.c"
> load_file(filename = "libpqwalreceiver", restricted = '\0'), line 156 in
> "dfmgr.c"
> WalReceiverMain(), line 248 in "walreceiver.c"
> AuxiliaryProcessMain(argc = 2, argv = 0x0fffffffffffa8b8), line 428 in
> "bootstrap.c"
> StartChildProcess(type = WalReceiverProcess), line 4405 in "postmaster.c"
> sigusr1_handler(postgres_signal_arg = 30), line 4227 in "postmaster.c"
> __fd_select(??, ??, ??, ??, ??) at 0x90000000011805c
> postmaster.select(__fds = 5, __readlist = 0x0fffffffffffd0a8, __writelist =
> (nil), __exceptlist = (nil), __timeout = 0x0ffffffffffff0c0), line 229 in
> "time.h"
> unnamed block in ServerLoop(), line 1391 in "postmaster.c"
> unnamed block in ServerLoop(), line 1391 in "postmaster.c"
> ServerLoop(), line 1391 in "postmaster.c"
> PostmasterMain(argc = 1, argv = 0x00000001102aa4b0), line 1092 in
> "postmaster.c"
> main(argc = 1, argv = 0x00000001102aa4b0), line 188 in "main.c"
>
> Any pointers on how to resolve the issue will be much appreciated.

Sorry, I have no idea what's wrong :(

Is the simple LOAD command successful on your AIX?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Fujii Masao [ Fri, 06 August 2010 16:31 ] [ ID #2045673 ]

Re: postgres 9.0 crash when bringing up hot standby

Thanks. Yes, the LOAD command does work on another database cluster on the same AIX machine.

-----Original Message-----
From: Fujii Masao [mailto:masao.fujii [at] gmail.com]
Sent: Friday, August 06, 2010 10:31 AM
To: Alanoly Andrews
Cc: pgsql-admin [at] postgresql.org; PostgreSQL-development
Subject: Re: [ADMIN] postgres 9.0 crash when bringing up hot standby

Alanoly Andrews [ Fri, 06 August 2010 17:17 ] [ ID #2045674 ]

Re: [HACKERS] postgres 9.0 crash when bringing up hot standby

On 06/08/10 17:31, Fujii Masao wrote:
> On Fri, Aug 6, 2010 at 10:10 PM, Alanoly Andrews <alanolya [at] invera.com> wrote:
>> I'm testing "hot standby" using "streaming WAL records". On trying to bring
>> (dbx) where
>> _alloc_initial_pthread(??) at 0x90000000049567c
>> __pth_init(??) at 0x900000000493ba4
>> uload(??, ??, ??, ??, ??, ??, ??, ??) at 0x9fffffff0001954
>> load_64.load(??, ??, ??) at 0x90000000004686c
>> loadAndInit() at 0x90000000047bd7c
>> dlopen(??, ??) at 0x90000000011cc4c
>> internal_load_library(libname =
>> "/apps/pg_9.0_b4/lib/postgresql/libpqwalreceiver.so"), line 234 in "dfmgr.c"
>> load_file(filename = "libpqwalreceiver", restricted = '\0'), line 156 in
>> "dfmgr.c"
>> WalReceiverMain(), line 248 in "walreceiver.c"
>> AuxiliaryProcessMain(argc = 2, argv = 0x0fffffffffffa8b8), line 428 in
>> "bootstrap.c"
>> StartChildProcess(type = WalReceiverProcess), line 4405 in "postmaster.c"
>> sigusr1_handler(postgres_signal_arg = 30), line 4227 in "postmaster.c"
>> __fd_select(??, ??, ??, ??, ??) at 0x90000000011805c
>> postmaster.select(__fds = 5, __readlist = 0x0fffffffffffd0a8, __writelist =
>> (nil), __exceptlist = (nil), __timeout = 0x0ffffffffffff0c0), line 229 in
>> "time.h"
>> unnamed block in ServerLoop(), line 1391 in "postmaster.c"
>> unnamed block in ServerLoop(), line 1391 in "postmaster.c"
>> ServerLoop(), line 1391 in "postmaster.c"
>> PostmasterMain(argc = 1, argv = 0x00000001102aa4b0), line 1092 in
>> "postmaster.c"
>> main(argc = 1, argv = 0x00000001102aa4b0), line 188 in "main.c"
>>
>> Any pointers on how to resolve the issue will be much appreciated.

So, loading libpqwalreceiver library crashes. It looks like it might be
pthread-related. Perhaps something wrong with our makefiles, causing
libpqwalreceiver to be built with wrong flags? Does contrib/dblink work?
If you look at the build log, what is the command line used to compile
libpqwalreceiver, and what is the command line used to build other
libraries, like contrib/dblink?
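A quick way to do the comparison Heikki asks for is to copy the compile or link line for each library out of the build log and diff the options mechanically. The Python sketch below is an editorial illustration, not something from this thread; `flag_diff` is a hypothetical helper name, and it assumes a Python 3 interpreter is available. It tokenizes each command line with `shlex` and treats every token that starts with `-` as an option, so an option whose argument is a separate token (such as `-o file`) is compared without that argument.

```python
import shlex
import sys
from typing import List, Tuple


def _flags(cmd: str) -> List[str]:
    """Return the option tokens (anything starting with '-') in order, deduplicated."""
    tokens = shlex.split(cmd)
    return list(dict.fromkeys(t for t in tokens if t.startswith("-")))


def flag_diff(cmd_a: str, cmd_b: str) -> Tuple[List[str], List[str]]:
    """Return (only_in_a, only_in_b): options that appear in just one command line."""
    a, b = _flags(cmd_a), _flags(cmd_b)
    only_a = [f for f in a if f not in b]
    only_b = [f for f in b if f not in a]
    return only_a, only_b


if __name__ == "__main__":
    # Usage: python3 flag_diff.py "<libpqwalreceiver line>" "<dblink line>"
    if len(sys.argv) == 3:
        only_a, only_b = flag_diff(sys.argv[1], sys.argv[2])
        print("only in first: ", " ".join(only_a) or "(none)")
        print("only in second:", " ".join(only_b) or "(none)")
    else:
        print("usage: flag_diff.py CMD_A CMD_B", file=sys.stderr)
```

An empty result on both sides would be consistent with the two libraries having been built with the same options, which would shift suspicion from the makefiles to what each library pulls in at load time.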

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Heikki Linnakangas [ Fri, 06 August 2010 21:53 ] [ ID #2045679 ]

Re: [HACKERS] postgres 9.0 crash when bringing up hot standby

On Fri, Aug 6, 2010 at 3:53 PM, Heikki Linnakangas
<heikki.linnakangas [at] enterprisedb.com> wrote:
> So, loading libpqwalreceiver library crashes. It looks like it might be
> pthread-related. Perhaps something wrong with our makefiles, causing
> libpqwalreceiver to be built with wrong flags? Does contrib/dblink work? If
> you look at the build log, what is the command line used to compile
> libpqwalreceiver, and what is the command line used to build other
> libraries, like contrib/dblink?

I haven't seen any response to this from the OP, but it seems
worrisome. Has anyone else tested a Hot Standby configuration -
successfully or otherwise - on AIX?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Robert Haas [ Wed, 11 August 2010 16:12 ] [ ID #2045866 ]

Re: [HACKERS] postgres 9.0 crash when bringing up hot standby

On Wed, Aug 11, 2010 at 10:20 AM, Alanoly Andrews <alanolya [at] invera.com> wrote:
> Ok..in response to the questions from Heikki,
>
> 1. Yes, "contrib/dblink" does work. Here's the output from the command used to "make" dblink:
>      postgres:thimar> /usr/bin/gmake -C contrib/dblink install
>      gmake: Entering directory `/dinabkp/faouzis/postgresql-9.0beta1/contrib/dblink'
>      /bin/sh ../../config/install-sh -c -d '/dinabkp/faouzis/local2/pgsql/lib'
>      /bin/sh ../../config/install-sh -c -d '/dinabkp/faouzis/local2/pgsql/share/contrib'
>      /bin/sh ../../config/install-sh -c -m 755  dblink.so '/dinabkp/faouzis/local2/pgsql/lib/dblink.so'
>      /bin/sh ../../config/install-sh -c -m 644 ./uninstall_dblink.sql '/dinabkp/faouzis/local2/pgsql/share/contrib'
>      /bin/sh ../../config/install-sh -c -m 644 dblink.sql '/dinabkp/faouzis/local2/pgsql/share/contrib'
>      gmake: Leaving directory `/dinabkp/faouzis/postgresql-9.0beta1/contrib/dblink'

Unfortunately that only shows the install, not the link - it must have
been built earlier. Can you do "make clean" in just that one
directory, and then "make install" again?

> 2. I don't have records of the build logs for the regular postgres executables (which contains the libpqwalreceiver) but can do a new compile/make if that is required. But they were compiled and installed using the regular make files supplied along with the postgres source code. The following flags were added during the compilation:
>
>   --without-readline --without-zlib --enable-debug --enable-cassert --enable-thread-safety

It'd be nice to see the whole build log, if it's not too much trouble
to regenerate it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Robert Haas [ Wed, 11 August 2010 16:25 ] [ ID #2045867 ]

Re: [HACKERS] postgres 9.0 crash when bringing up hot standby

Ok..in response to the questions from Heikki,

1. Yes, "contrib/dblink" does work. Here's the output from the command used to "make" dblink:

    postgres:thimar> /usr/bin/gmake -C contrib/dblink install
    gmake: Entering directory `/dinabkp/faouzis/postgresql-9.0beta1/contrib/dblink'
    /bin/sh ../../config/install-sh -c -d '/dinabkp/faouzis/local2/pgsql/lib'
    /bin/sh ../../config/install-sh -c -d '/dinabkp/faouzis/local2/pgsql/share/contrib'
    /bin/sh ../../config/install-sh -c -m 755 dblink.so '/dinabkp/faouzis/local2/pgsql/lib/dblink.so'
    /bin/sh ../../config/install-sh -c -m 644 ./uninstall_dblink.sql '/dinabkp/faouzis/local2/pgsql/share/contrib'
    /bin/sh ../../config/install-sh -c -m 644 dblink.sql '/dinabkp/faouzis/local2/pgsql/share/contrib'
    gmake: Leaving directory `/dinabkp/faouzis/postgresql-9.0beta1/contrib/dblink'

2. I don't have records of the build logs for the regular postgres executables (which contains the libpqwalreceiver) but can do a new compile/make if that is required. But they were compiled and installed using the regular make files supplied along with the postgres source code. The following flags were added during the compilation:

    --without-readline --without-zlib --enable-debug --enable-cassert --enable-thread-safety

Thanks.

Alanoly.

-----Original Message-----
From: Robert Haas [mailto:robertmhaas [at] gmail.com]
Sent: Wednesday, August 11, 2010 10:13 AM
To: Heikki Linnakangas
Cc: Alanoly Andrews; pgsql-admin [at] postgresql.org; PostgreSQL-development
Subject: Re: [HACKERS] [ADMIN] postgres 9.0 crash when bringing up hot standby

On Fri, Aug 6, 2010 at 3:53 PM, Heikki Linnakangas
<heikki.linnakangas [at] enterprisedb.com> wrote:
> So, loading libpqwalreceiver library crashes. It looks like it might be
> pthread-related. Perhaps something wrong with our makefiles, causing
> libpqwalreceiver to be built with wrong flags? Does contrib/dblink work? =
If
> you look at the build log, what is the command line used to compile
> libpqwalreceiver, and what is the command line used to build other
> libraries, like contrib/dblink?

I haven't seen any response to this from the OP, but it seems
worrisome. Has anyone else tested a Hot Standby configuration,
successfully or otherwise, on AIX?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Alanoly Andrews [ Wed, 11 August 2010 16:20 ] [ ID #2045868 ]

Re: [HACKERS] postgres 9.0 crash when bringing up hot standby

On Wed, Aug 11, 2010 at 10:25 AM, Robert Haas <robertmhaas [at] gmail.com> wrote:
> [request for more information]

Per off-list discussion with Alanoly, we've determined the following:

dblink was compiled with the same flags as libpqwalreceiver
dblink works
libpqwalreceiver crashes

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Robert Haas [ Thu, 12 August 2010 22:37 ] [ ID #2045954 ]

Re: [HACKERS] postgres 9.0 crash when bringing up hot standby

Robert Haas <robertmhaas [at] gmail.com> writes:
> Per off-list discussion with Alanoly, we've determined the following:

> dblink was compiled with the same flags as libpqwalreceiver
> dblink works
> libpqwalreceiver crashes

I wonder if the problem is not so much libpqwalreceiver as the
walreceiver process. Maybe an ordinary backend process does some
prerequisite initialization that walreceiver is missing. Hard to
guess what, though ... I can't think of anything dlopen() depends on
that should be under our control.

regards, tom lane

Tom Lane [ Thu, 12 August 2010 23:40 ] [ ID #2045955 ]

Re: [HACKERS] postgres 9.0 crash when bringing up hot standby

I wrote:
> I wonder if the problem is not so much libpqwalreceiver as the
> walreceiver process. Maybe an ordinary backend process does some
> prerequisite initialization that walreceiver is missing. Hard to
> guess what, though ... I can't think of anything dlopen() depends on
> that should be under our control.

Actually, that idea is easily tested: try doing
LOAD 'libpqwalreceiver';
in a regular backend process.

If that still crashes, it might be useful to truss or strace the backend
while it runs the command, and compare that to the trace of
LOAD 'dblink';

regards, tom lane

Tom Lane [ Thu, 12 August 2010 23:54 ] [ ID #2045956 ]
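Tom's test can be pushed one step further, outside the server entirely. A minimal sketch, under stated assumptions: a C helper that does nothing but dlopen() the library and report the result, so that a crash in a plain process (with no backend or walreceiver initialization at all) would point at the library or its linkage rather than at missing setup in walreceiver. The function name try_dlopen is hypothetical, and the RTLD_NOW | RTLD_GLOBAL flags are an assumption about what the server's loader passes; build it with the same compiler and thread flags as the server.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: load a shared library and report the outcome.
 * Assumption: RTLD_NOW | RTLD_GLOBAL approximates the flags the server's
 * loader uses; adjust if your platform's loader differs.
 *
 * Returns 0 if dlopen() succeeds, 1 if it reports an error.
 * A crash inside dlopen() itself (as in the reported backtrace) would
 * kill the process with a signal instead of returning.
 */
int try_dlopen(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);

    if (handle == NULL)
    {
        /* dlerror() describes the most recent dlopen() failure */
        const char *err = dlerror();

        fprintf(stderr, "dlopen(\"%s\") failed: %s\n",
                path, err ? err : "unknown error");
        return 1;
    }

    printf("dlopen(\"%s\") succeeded\n", path);
    dlclose(handle);
    return 0;
}
```

For example, call try_dlopen("/apps/pg_9.0_b4/lib/postgresql/libpqwalreceiver.so") from a small main() and compare it with the same call on dblink.so. A return value of 1 means dlopen() reported an ordinary error, while a signal 11 inside the call would reproduce the reported crash without any PostgreSQL code involved.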

Re: [HACKERS] postgres 9.0 crash when bringing up hot standby

On Thu, Aug 12, 2010 at 5:54 PM, Tom Lane <tgl [at] sss.pgh.pa.us> wrote:
> I wrote:
>> I wonder if the problem is not so much libpqwalreceiver as the
>> walreceiver process. Maybe an ordinary backend process does some
>> prerequisite initialization that walreceiver is missing. Hard to
>> guess what, though ... I can't think of anything dlopen() depends on
>> that should be under our control.
>
> Actually, that idea is easily tested: try doing
>         LOAD 'libpqwalreceiver';
> in a regular backend process.

Alanoly, is this something you can try?

> If that still crashes, it might be useful to truss or strace the backend
> while it runs the command, and compare that to the trace of
>         LOAD 'dblink';

And if necessary, this too?

Thanks for your help debugging this....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Robert Haas [ Fri, 13 August 2010 17:11 ] [ ID #2045996 ]