Re: Avoid duplicated rows when restoring data from pg_dumpall

Hi Tom, thanks for your fast reply,

- As regards the duplicated rows: no, I don't get duplicated rows in all the
tables stored in the database, because some tables have primary-key (and/or
UNIQUE) constraints. These constraints don't allow the restore process to
duplicate rows. In fact, it is a kind of "solution" I've tried (adding an
extra column with a primary-key or unique constraint to these tables), and
it "works": the restore process doesn't generate duplicated rows, because
the constraint does not allow the insertion of duplicate data. Anyway, it
looks like a rather poor solution :-)

- OK, thanks for the info. I thought pg_dumpall would work as I wanted even
on non-empty clusters. Of course, if there is no previous data, the restore
process will never create duplicated rows.

- Yes, the restore process generates errors, because it tries to re-create
data structures that already exist in the database server at that moment.
Even if I delete my own databases, some errors will appear (because I cannot
delete the internal stuff of the server, the 'postgres' database for
example). Those errors can be ignored in most cases, I think, but they
create a kind of "bad feeling" about the result of the restore process, and
they can also hide other, more important errors when you get a huge amount
of output on the screen.

- I forgot to mention it: I'm using PostgreSQL 8.3 on Windows XP.

Thanks a lot for your advice,

Regards, Pablo

2009/8/24 Tom Lane <tgl [at] sss.pgh.pa.us>

> Pablo Alonso-Villaverde Roza <pavroza [at] gmail.com> writes:
> > I'm getting duplicated rows in some of my tables when I try to restore data
> > from a dump file generated with 'pg_dumpall'.
>
> Probably all of them, actually ...
>
> > The only "way" I have found to solve this problem is deleting my database
> > before restoring the data, so everything is 're-created' in the restore
> > process without generating duplicated rows.
>
> A pg_dumpall script expects to be restored into an empty cluster. This
> is not a bug.
>
> > I thought, that the "-c" flag on pg_dumpall would force a DROP of any
> > previous data structures on the server but...it looks like it doesn't work
> > as I expected and ...when I restore data I get duplicated rows.
>
> The -c flag should cause the script to DROP all your databases first.
> But that switch has been known to have bugs in the past, and in any case
> it could fail if there are other sessions connected to those databases.
> Did you pay attention to whether the restore reported any errors?
> What PG version is this?
>
> regards, tom lane
>

Pablo Alonso-Villaver [ Wed, 26 August 2009 09:01 ] [ ID #2013528 ]

Re: Avoid duplicated rows when restoring data from pg_dumpall ??

Hi Pablo

> - As regards the duplicated rows, no, I don't get duplicated rows in
> all the tables stored in the database because
> some tables have primary-keys (and/or UNIQUE) constraints. These
> constraints don't allow the restore process to
> duplicate rows. In fact, it is a kind of "solution" I've tried...(add
> an extra column with a primary-key or unique constraint, to these
> tables), and it "works". The restore process doesn't generate
> duplicated rows, because the constraint does not allow the insertion
> of new duplicated data. Anyway..it looks like a kind of 'poor
> solution' :-)

Yeah, that's one solution. The only trouble is that if the data in the
existing table differs from what's in the restore script (for a record
with the same ID), the existing record won't be updated.

E.g., in this example your restored database will be inconsistent with
the backup:

  your table:     Field1 = 1 (ID), Field2 = A, Field3 = B
  restore script: Field1 = 1 (ID), Field2 = B, Field3 = B
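
To make the two outcomes concrete, here is a minimal sketch of the same
situation. It is only an analogy, not PostgreSQL itself: it uses Python's
built-in sqlite3 module and single-row INSERTs, and the table name "t" and
the helper simulate_restore are made up for the illustration. A real
pg_dumpall script loads data with COPY by default, so in PostgreSQL a key
violation would normally abort the whole COPY for that table rather than
reject one row at a time.

```python
import sqlite3


def simulate_restore(with_primary_key):
    """Replay one 'backup' row into a table that already holds a row with the same ID."""
    conn = sqlite3.connect(":memory:")
    id_type = "INTEGER PRIMARY KEY" if with_primary_key else "INTEGER"
    conn.execute(f"CREATE TABLE t (id {id_type}, field2 TEXT, field3 TEXT)")

    # Row that is already in the table before the restore starts.
    conn.execute("INSERT INTO t VALUES (1, 'A', 'B')")

    # Row coming from the restore script: same ID, different Field2.
    try:
        conn.execute("INSERT INTO t VALUES (1, 'B', 'B')")
    except sqlite3.IntegrityError:
        # The constraint rejects the row; the existing row is left as it was.
        pass

    rows = conn.execute(
        "SELECT id, field2, field3 FROM t ORDER BY field2"
    ).fetchall()
    conn.close()
    return rows


print(simulate_restore(with_primary_key=False))  # both rows present: a duplicate ID
print(simulate_restore(with_primary_key=True))   # only the old row: stale data
```

Without the constraint the table ends up with two rows for ID 1; with it,
the table keeps only the old row (Field2 = A), which no longer matches the
backup.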

>
> - Ok, thanks for the info, I thought pg_dumpall would work as I
> desired even on non-empty clusters.
> Of course...if there is no previous data, the restore process will
> never create duplicated rows.

Exactly. If you're looking for some form of replication (i.e.
master-to-slave) look at Slony - it uses triggers on the master to
capture changes, which are then applied to the slave. It has its
limitations but AFAIK it's a workable solution.
>
> - Yes, the restore process generates errors, because it tries to
> re-generate data structures that exist in the database server
> at that moment. Even if I delete my own databases, some errors will
> appear (because I cannot delete the internal stuff
> of the server -> the 'postgres' database for example) . Those errors,
> could be ignored in most cases I think, but perhaps create a kind of
> "bad feeling" about the result of the restore process, or can "hide"
> other more important errors when you get a huge amount of info on the
> screen either.

You can delete the "postgres" database - it's an empty database that's
created when the server is initialised so you've got something to
connect to. It's safe to delete, as long as you have another database
you can connect to, but there's no real reason to unless it's in your
restore script (e.g. from pg_dumpall.)

http://www.postgresql.org/docs/8.3/static/manage-ag-templatedbs.html
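
On the worry quoted above that the expected errors can bury more important
ones: one possible approach is to save the restore output to a file and
sort the error lines afterwards. The sketch below is an assumption-laden
illustration in Python, not part of any PostgreSQL tool. The helper name
split_restore_errors and the sample log lines are invented; the only thing
it relies on is that error lines contain "ERROR:" followed by the message,
and that errors for objects which are already present end in
"already exists".

```python
import re

# Matches the message part of a line such as:
#   psql:all.sql:14: ERROR:  role "postgres" already exists
ERROR_RE = re.compile(r"ERROR:\s+(?P<msg>.*)")


def split_restore_errors(log_text):
    """Separate 'already exists' errors from every other error in a restore log."""
    expected, unexpected = [], []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match is None:
            continue  # not an error line (e.g. "CREATE DATABASE")
        msg = match.group("msg").strip()
        if msg.endswith("already exists"):
            expected.append(msg)
        else:
            unexpected.append(msg)
    return expected, unexpected


# Invented sample output, for illustration only.
sample_log = """\
psql:all.sql:14: ERROR:  role "postgres" already exists
CREATE DATABASE
psql:all.sql:52: ERROR:  relation "customers" already exists
psql:all.sql:97: ERROR:  duplicate key value violates unique constraint "customers_pkey"
"""

expected, unexpected = split_restore_errors(sample_log)
print(len(expected), "expected errors")
for msg in unexpected:
    print("needs attention:", msg)
```

Whether an "already exists" error is really harmless still depends on the
case: it also means the existing object was kept as it was, not replaced by
the one in the dump.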

Regards,
Andy

--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Andy Shellam [ Wed, 26 August 2009 18:51 ] [ ID #2013530 ]