REINDEX disk space requirements
--000e0cdf97d680ca890477b5efc7
Content-Type: text/plain; charset=ISO-8859-1
Hello,
I'm a developer on a product that includes a built-in PostgreSQL DB;
currently 8.3.5. One of our tables is very active - it can see in the tens
of millions of rows inserted and deleted per day. Generally speaking, new
rows arrive throughout the day, and older rows from previous days are
periodically deleted. A task runs VACUUM ANALYZE after those deletions, to
keep space available. We noticed that the index on this table sometimes
grew larger over time, so we added a REINDEX at a low-activity time of day.
One large installation we're working with is seeing 'out of disk space'
errors when performing the REINDEX. I don't have precise numbers at the
moment, but here's what I know:
- Total DB size is ~100 GB
- Size of the main table is ~60 GB (~1B rows)
- Size of the main table PK index is ~20 GB
- Free space on disk is ~35 GB
- Disk quotas are not an issue
could not extend relation 1663/16384/5881417: wrote only 4096 of 8192 bytes
at block 1631971
HINT: Check free disk space.
'REINDEX TABLE <redacted>'
Could this be caused by anything other than actually running out of space?
If not, is there a way calculate, based on the existing index size, table
size or number of rows, roughly how much space the REINDEX requires, or get
an upper-bound on that value? Thanks,
David
--000e0cdf97d680ca890477b5efc7
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Hello,<div><br></div><div>I'm a developer on a product that includes a =
built-in PostgreSQL DB; currently 8.3.5. =A0One of our tables is very activ=
e - it can see in the tens of millions of rows inserted and deleted per day=
.. =A0Generally speaking, new rows arrive throughout the day, and older rows=
from previous days are periodically deleted. =A0A task runs VACUUM ANALYZE=
after those deletions, to keep space available. =A0We noticed that the ind=
ex on this table sometimes grew larger over time, so we added a REINDEX at =
a low-activity time of day.</div>
<div><br></div><div>One large installation we're working with is seeing=
'out of disk space' errors when performing the REINDEX. =A0I don=
39;t have precise numbers at the moment, but here's what I know:</div>
<div><br></div><div>- Total DB size is ~100 GB</div><div>- Size of the main=
table is ~60 GB (~1B rows)</div><div>- Size of the main table PK index is =
~20 GB</div><div>- Free space on disk is ~35 GB</div><div>- Disk quotas are=
not an issue</div>
<div><br></div><div>could not extend relation 1663/16384/5881417: wrote onl=
y 4096 of 8192 bytes at block 1631971</div><div>HINT: =A0Check free disk sp=
ace.</div><div>=A0'REINDEX TABLE <redacted>'</div><div><br></=
div>
<div>Could this be caused by anything other than actually running out of sp=
ace? =A0If not, is there a way calculate,=A0based on the existing index siz=
e, table size or number of rows,=A0roughly how much space the REINDEX requi=
res, or get an upper-bound on that value? =A0Thanks,</div>
<div><br></div><div>David</div>
--000e0cdf97d680ca890477b5efc7--
Re: REINDEX disk space requirements
David Schnur escribi=F3:
> I'm a developer on a product that includes a built-in PostgreSQL DB;
> currently 8.3.5. One of our tables is very active - it can see in the =
tens
> of millions of rows inserted and deleted per day. Generally speaking, =
new
> rows arrive throughout the day, and older rows from previous days are
> periodically deleted. A task runs VACUUM ANALYZE after those deletions=
, to
> keep space available.
Immediately after the deletions, or is there some delay? Keep in mind
that rows cannot be reclaimed until the oldest transaction that was open
when they were deleted is finished. So if you vacuum too quickly, the
deleted rows may not be deleted. It's better if you insert some delay
between delete and vacuum, the duration of which is dependent on the
duration of your transactions.
--
Alvaro Herrera http://www.CommandPrompt.co=
m/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Re: REINDEX disk space requirements
David Schnur <dnschnur [at] gmail.com> writes:
> One large installation we're working with is seeing 'out of disk space'
> errors when performing the REINDEX. I don't have precise numbers at the
> moment, but here's what I know:
> - Total DB size is ~100 GB
> - Size of the main table is ~60 GB (~1B rows)
> - Size of the main table PK index is ~20 GB
> - Free space on disk is ~35 GB
Out of disk space is 100% guaranteed here, because it'll take about
twice the index size to do a REINDEX --- there's a sort file that's
roughly the size of the index, plus the new index itself, and we
don't risk deleting the old index until the transaction commits.
Possibly you could drop and recreate the index instead of using REINDEX,
if you're going to have the table locked anyway. But it seems to me
that you're likely to need more disk pretty soon, unless this DB is
more static than most. Maybe just spring for more hardware now.
regards, tom lane
--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Re: REINDEX disk space requirements
An alternative to adding more hardware is to partition the table. This
may be your best solution for the long term too. The benefits are
1. Elimination of frequent vacuums
2. Instant space reclamation via partition deletes.
We had a similar situation as yours...we bit the bullet and
implemented partitioning..and that was the best decision we ever made.
On Fri, Nov 6, 2009 at 9:20 AM, Tom Lane <tgl [at] sss.pgh.pa.us> wrote:
> David Schnur <dnschnur [at] gmail.com> writes:
>> One large installation we're working with is seeing 'out of disk space'
>> errors when performing the REINDEX. =A0I don't have precise numbers at t=
he
>> moment, but here's what I know:
>
>> - Total DB size is ~100 GB
>> - Size of the main table is ~60 GB (~1B rows)
>> - Size of the main table PK index is ~20 GB
>> - Free space on disk is ~35 GB
>
> Out of disk space is 100% guaranteed here, because it'll take about
> twice the index size to do a REINDEX --- there's a sort file that's
> roughly the size of the index, plus the new index itself, and we
> don't risk deleting the old index until the transaction commits.
>
> Possibly you could drop and recreate the index instead of using REINDEX,
> if you're going to have the table locked anyway. =A0But it seems to me
> that you're likely to need more disk pretty soon, unless this DB is
> more static than most. =A0Maybe just spring for more hardware now.
>
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0regards, tom lane
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>
--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Re: REINDEX disk space requirements
--000e0ce0b1fab7ee340477b76e8a
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Nov 6, 2009 at 12:11 PM, Alvaro Herrera <alvherre [at] commandprompt.com>
wrote:
> Immediately after the deletions, or is there some delay? Keep in mind
> that rows cannot be reclaimed until the oldest transaction that was open
> when they were deleted is finished.
The VACUUM waits until after the DELETE has been committed. But when you
refer to the oldest transaction, do you mean any transaction at all?
Currently it's guaranteed that no other transaction is running when VACUUM
starts, but we were thinking of changing that behavior. It would then be
possible for an INSERT in a separate transaction to start running ~10
seconds before the DELETE is done, and continue running for ~10 seconds
after the VACUUM starts.
Is that the problem you were referring to? How does VACUUM behave in that
situation? It sounds like it returns without reclaiming anything?
On Fri, Nov 6, 2009 at 12:20 PM, Tom Lane <tgl [at] sss.pgh.pa.us> wrote:
> Out of disk space is 100% guaranteed here, because it'll take about
> twice the index size to do a REINDEX --- there's a sort file that's
> roughly the size of the index, plus the new index itself, and we
> don't risk deleting the old index until the transaction commits.
>
Aha; I assumed it would require free space equal to the size of the index.
If it needs double, that explains it pretty clearly. That's exactly what I
was looking for; thank you!
David
--000e0ce0b1fab7ee340477b76e8a
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
<div><br></div><div>On Fri, Nov 6, 2009 at 12:11 PM, Alvaro Herrera=A0<span=
dir=3D"ltr"><<a href=3D"mailto:alvherre [at] commandprompt.com">alvherre [at] com=
mandprompt.com</a>></span>=A0wrote:<br><blockquote class=3D"gmail_quote"=
style=3D"margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-le=
ft: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); b=
order-left-style: solid; padding-left: 1ex; ">
Immediately after the deletions, or is there some delay? =A0Keep in mind<br=
>that rows cannot be reclaimed until the oldest transaction that was open<b=
r>when they were deleted is finished.</blockquote><div><br></div><div>The V=
ACUUM waits until after the DELETE has been committed. =A0But when you refe=
r to the oldest transaction, do you mean any transaction at all? =A0Current=
ly it's guaranteed that no other transaction is running when VACUUM sta=
rts, but we were thinking of changing that behavior. =A0It would then be po=
ssible for an INSERT in a separate transaction to start running ~10 seconds=
before the DELETE is done, and continue running for ~10 seconds after the =
VACUUM starts.</div>
<div><br></div><div>Is that the problem you were referring to? =A0How does =
VACUUM behave in that situation? =A0It sounds like it returns without recla=
iming anything?</div><div><br></div></div><div class=3D"gmail_quote">On Fri=
, Nov 6, 2009 at 12:20 PM, Tom Lane <span dir=3D"ltr"><<a href=3D"mailto=
:tgl [at] sss.pgh.pa.us">tgl [at] sss.pgh.pa.us</a>></span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex;"><div class=3D"im">Out of disk space is 100%=
guaranteed here, because it'll take about</div>
twice the index size to do a REINDEX --- there's a sort file that's=
<br>
roughly the size of the index, plus the new index itself, and we<br>
don't risk deleting the old index until the transaction commits.<br></b=
lockquote><div><br></div><div>Aha; I assumed it would require free space eq=
ual to the size of the index. =A0If it needs double, that explains it prett=
y clearly. =A0That's exactly what I was looking for; thank you!</div>
<div><br></div><div>David</div></div>
--000e0ce0b1fab7ee340477b76e8a--
Re: REINDEX disk space requirements
David Schnur escribi=F3:
> On Fri, Nov 6, 2009 at 12:11 PM, Alvaro Herrera <alvherre [at] commandprompt=
..com>
> wrote:
>
> > Immediately after the deletions, or is there some delay? Keep in min=
d
> > that rows cannot be reclaimed until the oldest transaction that was o=
pen
> > when they were deleted is finished.
>
> The VACUUM waits until after the DELETE has been committed. But when y=
ou
> refer to the oldest transaction, do you mean any transaction at all?
Yes. But it goes back even further: is there any other transaction
running that was also running when the DELETE started? If there is,
vacuum won't be able to reclaim the rows.
> Currently it's guaranteed that no other transaction is running when VA=
CUUM
> starts, but we were thinking of changing that behavior. It would then =
be
> possible for an INSERT in a separate transaction to start running ~10
> seconds before the DELETE is done, and continue running for ~10 seconds
> after the VACUUM starts.
>
> Is that the problem you were referring to?
Yes. You'd have to wait until the INSERT is finished, and run VACUUM
then.
> How does VACUUM behave in that situation? It sounds like it returns
> without reclaiming anything?
Yes.
--
Alvaro Herrera http://www.CommandPrompt.co=
m/
The PostgreSQL Company - Command Prompt, Inc.
--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin