help tuning query

Hi all;

we've been fighting this query for a few days now. we bumped up the statistict
target for the a.id , c.url_hits_id and the b.id columns below to 250 and ran
an analyze on the relevant tables. we killed it after 8hrs.

Any suggestions?



$ psql -ef expl.sql pwreport
explain
select
a.id,
ident_id,
time,
customer_name,
extract('day' from timezone(e.name, to_timestamp(a.time))) as day,
category_id
from
pwreport.url_hits a left outer join
pwreport.url_hits_category_jt c on (a.id = c.url_hits_id),
pwreport.ident b,
pwreport.timezone e
where
a.ident_id = b.id
and b.timezone_id = e.id
and time >= extract ('epoch' from timestamp '2009-08-12')
and time < extract ('epoch' from timestamp '2009-08-13' )
and direction = 'REQUEST'
;
QUERY
PLAN
------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------ --------------------------------
Merge Right Join (cost=47528508.61..180424544.59 rows=10409251 width=53)
Merge Cond: (c.url_hits_id = a.id)
-> Index Scan using mt_url_hits_category_jt_url_hits_id_index on
url_hits_category_jt c (cost=0.00..122162596.63 rows=4189283233 width=8)
-> Sort (cost=47528508.61..47536931.63 rows=3369210 width=49)
Sort Key: a.id
-> Hash Join (cost=2565.00..47163219.21 rows=3369210 width=49)
Hash Cond: (b.timezone_id = e.id)
-> Hash Join (cost=2553.49..47116881.07 rows=3369210
width=37)
Hash Cond: (a.ident_id = b.id)
-> Seq Scan on url_hits a (cost=0.00..47051154.89
rows=3369210 width=12)
Filter: ((direction =
'REQUEST'::proxy_direction_enum) AND (("time")::double precision >=
1250035200::double precision) AND (("time")::double precision <
1250121600::double precision))
-> Hash (cost 20.44..2020.44 rows=42644 width=29)
-> Seq Scan on ident b (cost=0.00..2020.44
rows=42644 width=29)
-> Hash (cost=6.78..6.78 rows=378 width )
-> Seq Scan on timezone e (cost=0.00..6.78 rows=378
width )
(15 rows)


--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Kevin Kempter [ Mi, 19 August 2009 16:56 ] [ ID #2012778 ]

Re: help tuning query

First one, try to paste explain into http://explain.depesz.com/

> select
> a.id,
> ident_id,
> time,
> customer_name,
> extract('day' from timezone(e.name, to_timestamp(a.time))) as day,
> category_id
> from
> pwreport.url_hits a left outer join
> pwreport.url_hits_category_jt c on (a.id =3D c.url_hits_id),

url_hits_category_jt has an index on url_hits_id?

> pwreport.ident b,
> pwreport.timezone e
> where
> a.ident_id =3D b.id
> and b.timezone_id =3D e.id
> and time >=3D extract ('epoch' from timestamp '2009-08-12')
> and time < extract ('epoch' from timestamp '2009-08-13' )
> and direction =3D 'REQUEST'
> ;
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY
> PLAN
> ------------------------------------------------------------ -------------=
------------------------------------------------------------ ---------------=
------------------------------------------------------------ ----
> =C2=A0Merge Right Join =C2=A0(cost=3D47528508.61..180424544.59 rows=3D104=
09251 width=3D53)
> =C2=A0 Merge Cond: (c.url_hits_id =3D a.id)
> =C2=A0 -> =C2=A0Index Scan using mt_url_hits_category_jt_url_hits_id_inde=
x on
> url_hits_category_jt c =C2=A0(cost=3D0.00..122162596.63 rows=3D4189283233=
width=3D8)
> =C2=A0 -> =C2=A0Sort =C2=A0(cost=3D47528508.61..47536931.63 rows=3D336921=
0 width=3D49)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 Sort Key: a.id
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =C2=A0Hash Join =C2=A0(cost=3D2565.00..471=
63219.21 rows=3D3369210 width=3D49)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Hash Cond: (b.timezone_i=
d =3D e.id)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =C2=A0Hash Join =C2=
=A0(cost=3D2553.49..47116881.07 rows=3D3369210
> width=3D37)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Has=
h Cond: (a.ident_id =3D b.id)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =
=C2=A0Seq Scan on url_hits a =C2=A0(cost=3D0.00..47051154.89
> rows=3D3369210 width=3D12)

for some reason (think absence of index) this part of the query is heavy.

Do you run analyze to your database?
Vacuum and reindex?

> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 Filter: ((direction =3D
> 'REQUEST'::proxy_direction_enum) AND (("time")::double precision >=3D
> 1250035200::double precision) AND (("time")::double precision <
> 1250121600::double precision))
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =
=C2=A0Hash =C2=A0(cost=3D2020.44..2020.44 rows=3D42644 width=3D29)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 -> =C2=A0Seq Scan on ident b =C2=A0(cost=3D0.00..2020.44
> rows=3D42644 width=3D29)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =C2=A0Hash =C2=A0(cos=
t=3D6.78..6.78 rows=3D378 width=3D20)
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =
=C2=A0Seq Scan on timezone e =C2=A0(cost=3D0.00..6.78 rows=3D378
> width=3D20)
> (15 rows)
>
>
> --



--
Emanuel Calvo Franco
Database consultant at:
www.siu.edu.ar
www.emanuelcalvofranco.com.ar

--
Sent via pgsql-admin mailing list (pgsql-admin [at] postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Emanuel Calvo Franco [ Do, 20 August 2009 15:59 ] [ ID #2012781 ]
Datenbanken » gmane.comp.db.postgresql.admin » help tuning query

Vorheriges Thema: Duplicated keys in PITR
Nächstes Thema: trying to alter column