After long semaphore waits MySQL becomes unresponsive.

After long semaphore waits MySQL becomes unresponsive.

am 24.05.2011 13:20:12 von Rik

Hi all,

yesterday the mysql process on a database server became totallly unresponsive
after some long semaphore waits (load/cpu/memory were OK, as were all other
processes). The debug output started with this:

InnoDB: Warning: a long semaphore wait:
--Thread 140054029002496 has waited at /btr/btr0sea.c line 631 for
241.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x78733f8 created in file /btr/btr0sea.c
line 182
a writer (thread id 140054029002496) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file /btr/btr0sea.c line 879
Last time write locked in file /btr/btr0cur.c line 189

.... followed by a bunch of other threads waiting for the same lock locked by
140054029002496

Version is 5.5.11.

I may be wrong here, but I tend to interpret this as '140054029002496' is
trying to get an exclusive lock on 0x78733f8, on which it already has an
exclusive lock, and hence is deadlocked in some manner. Am I right there? How
can this happen?

I certainly cannot reproduce a query which causes this, and I had to kill -9
the process, so nothing no running/long queries were written to the slow-query
log. (On a side note: not even root / superuser could connect to the MySQL
instance, so there was no way to check which queries were actually running) If
not, what should I look for in trying to determine the cause? (Added some
extra monitor output below sig in case it's needed).

Thanks,
--
Rik Wasmus

Some more InnoDB Monitor Output:

=====================================
110523 19:19:05 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 45 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1661111 1_second, 1661102 sleeps, 166100 10_second,
429 background, 413 flush
srv_master_thread log flush and writes: 1662059
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 3952510, signal count 31019262
--Thread 140054029002496 has waited at /btr/btr0sea.c line 631 for
243.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x78733f8 created in file /btr/btr0sea.c
line 182
a writer (thread id 140054029002496) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file /btr/btr0sea.c line 879
Last time write locked in file /btr/btr0cur.c line 1896
--Thread 140054051079936 has waited at /btr/btr0cur.c line 1896 for
243.00 seconds the semaphore:
X-lock on RW-latch at 0x78733f8 created in file /btr/btr0sea.c line 182
a writer (thread id 140054029002496) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file /btr/btr0sea.c line 879
Last time write locked in file /btr/btr0cur.c line 1896
--Thread 140054255666944 has waited at /row/row0sel.c line 3606 for
243.00 seconds the semaphore:
S-lock on RW-latch at 0x78733f8 created in file /btr/btr0sea.c line 182
a writer (thread id 140054029002496) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file /btr/btr0sea.c line 879
Last time write locked in file /btr/btr0cur.c line 1896
--Thread 140054294693632 has waited at /row/row0sel.c line 2902 for
243.00 seconds the semaphore:
S-lock on RW-latch at 0x7f60fa5b7f40 created in file /buf/buf0buf.c line
898
a writer (thread id 140054051079936) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file /row/row0sel.c line 2902
Last time write locked in file /row/row0upd.c line 2131
--Thread 140054005319424 has waited at /btr/btr0sea.c line 1054 for
243.00 seconds the semaphore:
S-lock on RW-latch at 0x78733f8 created in file /btr/btr0sea.c line 182
a writer (thread id 140054029002496) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file /btr/btr0sea.c line 879
Last time write locked in file /btr/btr0cur.c line 1896
--Thread 140054272325376 has waited at /trx/trx0trx.c line 208 for
243.00 seconds the semaphore:
Mutex at 0x1276a68 created file /srv/srv0srv.c line 1038, lock var 1
waiters flag 1



Mutex spin waits 117851637, rounds 89989545, OS waits 249362
RW-shared spins 11452811, rounds 158361139, OS waits 2122101
RW-excl spins 5229822, rounds 128114865, OS waits 1176460
Spin rounds per wait: 0.76 mutex, 13.83 RW-shared, 24.50 RW-excl
FAIL TO OBTAIN KERNEL MUTEX, SKIP LOCK INFO PRINTING
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
9947500 OS file reads, 105696752 OS file writes, 98371237 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 68, free list len 3667, seg size 3736, 997096 merges
merged operations:
insert 1421280, delete mark 897892, delete 86296
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 6375023, node heap has 13494 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 113572832858
Log flushed up to 113572832858
Last checkpoint at 113568853883
0 pending log writes, 0 pending chkp writes
95703706 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 3296722944; in additional pool allocated 0
Dictionary memory allocated 515016
Buffer pool size 196607
Free buffers 0
Database pages 183112
Old database pages 67574
Modified db pages 1548
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 30254129, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 9947464, created 56336, written 8910159
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s
LRU len: 183112, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
26 queries inside InnoDB, 0 queries in queue
3 read views open inside InnoDB
Main thread process no. 11983, id 140054344120064, state: doing background
drop tables
Number of rows inserted 2622442, updated 21677630, deleted 504511, read
297438002948
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 24.05.2011 14:10:35 von Johan De Meersman

----- Original Message -----
> From: rik@grib.nl
>
> I may be wrong here, but I tend to interpret this as '140054029002496' is
> trying to get an exclusive lock on 0x78733f8, on which it already has an
> exclusive lock, and hence is deadlocked in some manner. Am I right there? How
> can this happen?

I'm not too hot on the internals, but yes, that seems likely.

> I certainly cannot reproduce a query which causes this,

You'd need at least two :-p

> and I had to kill -9
> the process, so nothing no running/long queries were written to the slow-query
> log. (On a side note: not even root / superuser could connect to the MySQL

check the max_user_connections setting, and set it a couple of notches lower than the max_connections one. It basically says "only this much non-super users may connect" and leaves the rest for super privileged users - which should only be admins, not applications.

> instance, so there was no way to check which queries were actually running) If
> not, what should I look for in trying to determine the cause? (Added some
> extra monitor output below sig in case it's needed).

Well... Your innodb status, if you can connect :-)

Can't be bothered to write down the reasoning, but the simple way to avoid deadlocks is to always, in all processes, lock all tables in the same order - alphabetically, for instance. that way deadlock gets pre-empted before it can occur.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 24.05.2011 15:55:10 von Rik Wasmus

> > I may be wrong here, but I tend to interpret this as '140054029002496' is
> > trying to get an exclusive lock on 0x78733f8, on which it already has an
> > exclusive lock, and hence is deadlocked in some manner. Am I right
> > there? How can this happen?
>
> I'm not too hot on the internals, but yes, that seems likely.
>
> > I certainly cannot reproduce a query which causes this,
>
> You'd need at least two :-p

2 simultaneous queries with the same thread-id? Is that possible?

> > and I had to kill -9
> > the process, so nothing no running/long queries were written to the
> > slow-query log. (On a side note: not even root / superuser could connect
> > to the MySQL
>
> check the max_user_connections setting, and set it a couple of notches
> lower than the max_connections one. It basically says "only this much
> non-super users may connect" and leaves the rest for super privileged
> users - which should only be admins, not applications.

I always thought (based on http://dev.mysql.com/doc/refman/5.5/en/too-many-
connections.html) I'd get 1 super-connection without question, and I didn't
get any 'too many connections' errors, but timeouts on connecting. But let's
say that in this case (MySQL needing a kill -9) all bets about proper
connecting are sort of off ;) ). Nevertheless, according to the manual,
max_user_connections configures "The maximum number of simultaneous
connections permitted to any given MySQL user account.", which would mean (and
a test shows) it is per-user name, and it would only work if we have only 1
non-admin/non-super-user, while in reality, we have several users with
specific privileges. Not an incredible lot, but about 20 nonetheless, and the
only way I see your suggestion working is by allowing all users a
max_user_connections of ((max_connections/number of users) - 1), which isn't
really what we want (it severely limits the maximum for a specific user far
below the max-connections actually possible).

> > instance, so there was no way to check which queries were actually
> > running) If not, what should I look for in trying to determine the
> > cause? (Added some extra monitor output below sig in case it's needed).
>
> Well... Your innodb status, if you can connect :-)

Ack, I assume this means the 'SHOW ENGINE INNODB STATUS;' at the moment of the
problem, rather then just the output I get now?

> Can't be bothered to write down the reasoning, but the simple way to avoid
> deadlocks is to always, in all processes, lock all tables in the same
> order - alphabetically, for instance. that way deadlock gets pre-empted
> before it can occur.

Hmm, we're never using explicit (table) locks, only implicit by the innodb-
transaction, ans usually even then the transaction is just the single
statement (autocommit is usually on save for a few instances). Does order in
joins matter in terms of setting them alphabetically? That would mean checking
& rewriting a heck of a lot of codebase... Also, taking for instance the
latest detected deadlock from the innodb status, I don't know how to actually
prevent it on an SQL-query level:
------------------------
LATEST DETECTED DEADLOCK
------------------------
110524 1:30:47
*** (1) TRANSACTION:
TRANSACTION CE26925, ACTIVE 0 sec, process no 11546, OS thread id
139920988698368 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 16 lock struct(s), heap size 3112, 43 row lock(s), undo log entries
37
MySQL thread id 1022959, query id 5537985 192.168.20.7 cronscripts Updating
UPDATE connectie SET tracker=2, smsNotice=1 WHERE tracker<>0 and id IN
(5384,6276,5043,6969,5313,6155,6516,2017,6514,2711,9252,2848 ,5226,7178,7227,7228,5627,5224,48331,5250,33201,4650,5618,91 41,9140,7226,5590,5810,22011,8312,5634,5591,9242,9259,50341, 30151,5442,3743,38341,6864,6274,1065,1081,7214,5332,5654,614 1,1613,47433,4183,8042,7971,5143,6859,50001,6131,5388,6097,7 176,7317,9204,19051,2280,6085,8656,3575,7983,9248,2846,3365, 5822,5725,37741,9230,6320,9227,20191,6988,5126,30691,5374,53 78,7250,4458,6026,8107,51651,3904,39982,7126,5318,7181,7940, 8327,4251,4319,9041,4447,7394,5314,6439,5978,5979,4662,38861 ,17121,6403,6004,6088,6987,6019,6078,6759,5813,6964,5570,632 2,9671,7415,7969,6775,21881,38331,6151,6713,7944,6606,6245,6 767,6726,6980,31131,6479,36411,22001,47033,7407,6821,7234,90 27,7945,7398,7421,9118,12791,6903,7229,18071,7395,7169,7422, 7269,35601,14821,887
7,8927,8928,15971,8302,35291,8924,8733,17891,34201,8639,9501 ,8630,8965,16021
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 987 page no 53 n bits 328 index `PRIMARY` of table
`tname`.`connectie` trx id CE26925 lock_mode X locks rec but not gap waiting
Record lock, heap no 47 PHYSICAL RECORD: n_fields 12; compact format; info
bits 0
0: len 4; hex 800016b2; asc ;;
1: len 6; hex 00000c9a84b4; asc ;;
2: len 7; hex 2e000009722a52; asc . r*R;;
3: len 4; hex 80000010; asc ;;
4: len 9; hex 70697a7a6174657374; asc rtttest;;
5: len 9; hex 70697a7a6174657374; asc rtttest;;
6: len 3; hex 333839; asc 389;;
7: len 4; hex 80000003; asc ;;
8: len 4; hex 80000000; asc ;;
9: len 4; hex 80000000; asc ;;
10: len 4; hex 80000000; asc ;;
11: len 11; hex 76657273696f6e20322e30; asc version 2.0;;

*** (2) TRANSACTION:
TRANSACTION CE26923, ACTIVE 0 sec, process no 11546, OS thread id
139921021904640 starting index read, thread declared inside InnoDB 362
mysql tables in use 2, locked 2
81 lock struct(s), heap size 14776, 2663 row lock(s)
MySQL thread id 1022784, query id 5537983 46.144.148.174 utr-fax Sending data
UPDATE
`order` o,
connectie c
SET
o.datum=o.datum,
o.bcDeviceId=11,
o.bcStatus= IF(o.bcStatus = 9, 10, 8),
o.bcInBehandeling='2011-05-24
01:30:47'
WHERE
o.order_statusOrder_status_id=2
AND o.connectieId=c.id
AND (c.bellen=2 OR c.bellen=3)
AND o.bcDeviceId=0
AND ( o.bcStatus=0 OR o.bcStatus=9 )
AND o.bevestigingsId=0
AND o.datum >= '2011-05-23 01:30:47'
AND o.datum < '2011-05-24 01:30:37'
*** (2) HOLDS THE LOCK(S):


*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 987 page no 48 n bits 328 index `PRIMARY` of table
`tname`.`connectie` trx id CE26923 lock mode S locks rec but not gap waiting
Record lock, heap no 241 PHYSICAL RECORD: n_fields 12; compact format; info
bits 0
0: len 4; hex 8000146a; asc j;;
1: len 6; hex 00000ce26925; asc i%;;
2: len 7; hex 5c00001ba11c2a; asc \ *;;
3: len 4; hex 80000010; asc ;;
4: len 9; hex 70697a7a6174657374; asc rtttest;;
5: len 9; hex 70697a7a6174657374; asc rtttest;;
6: len 3; hex 333638; asc 368;;
7: len 4; hex 80000002; asc ;;
8: len 4; hex 80000000; asc ;;
9: len 4; hex 80000000; asc ;;
10: len 4; hex 80000001; asc ;;
11: len 11; hex 76657273696f6e20322e30; asc version 2.0;;

*** WE ROLL BACK TRANSACTION (1)

Thanks,
--
Rik Wasmus





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 24.05.2011 16:30:13 von Johan De Meersman

----- Original Message -----
> From: "Rik Wasmus"
>
> 2 simultaneous queries with the same thread-id? Is that possible?

No, not with the same thread-id. Deadlock occurs when two threads attempt to lock the same resources, but do so in an order which causes both sides to hold part of the resources the other needs.

Say you have tables a and b, both wanted by threads 1 and 2. At roughly the same time, thread 1 locks a while thread 2 locks b. The next obvious step for both is that 1 tries to lock b and 2 tries to lock a; but those resources are both in use, so at this point both threads must wait for the other to free up the resources. Since neither of them can complete and free the held locks for the other to use, deadlock occurs.

In theory, this will last until one of them times out or gives up. In practice, the engine will (mostly...) notice that the second thread is trying to initiate a deadlock and unceremoniously shoot the bastard in the head.

> I always thought (based on
> http://dev.mysql.com/doc/refman/5.5/en/too-many-
> connections.html) I'd get 1 super-connection without question, and I

That should be the case, and I used to understand it that way, too; but experience has taught me not to rely on that :-)

> didn't get any 'too many connections' errors, but timeouts on connecting.
> But let's say that in this case (MySQL needing a kill -9) all bets about proper
> connecting are sort of off ;) ). Nevertheless, according to the
> manual, max_user_connections configures "The maximum number of simultaneous
> connections permitted to any given MySQL user account.", which would

Hmm. you're right. Still, I'd say that it's unlikely that connection swamping happens from more than one account at once, so it should help in most instances (and has saved me before).

> mean (and a test shows) it is per-user name, and it would only work if we have
> only 1 non-admin/non-super-user, while in reality, we have several users
> with specific privileges. Not an incredible lot, but about 20 nonetheless,

'super' refers to a very specific privilege, not a random combination of administrative stuff like create/drop, and 20 users is a whole lot to give that to. You should really consider if that many need it.

> Ack, I assume this means the 'SHOW ENGINE INNODB STATUS;' at the
> moment of the problem, rather then just the output I get now?

Yep. The output is a combination of current data and since-start counters.

> Hmm, we're never using explicit (table) locks, only implicit by the
> innodb-transaction, ans usually even then the transaction is just the single
> statement (autocommit is usually on save for a few instances). Does
> order in joins matter in terms of setting them alphabetically? That would mean

Uhh... Unsure, but it might.

> checking & rewriting a heck of a lot of codebase... Also, taking for instance
> the latest detected deadlock from the innodb status, I don't know how to
> actually prevent it on an SQL-query level:

Strictly speaking, the engine should detect it and kill the thread initiating deadlock - as seen in your output there.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 24.05.2011 18:01:57 von Rik Wasmus

On 2011-05-24 16:30:13 you wrote:
> > 2 simultaneous queries with the same thread-id? Is that possible?
>
> No, not with the same thread-id. Deadlock occurs when two threads attempt
> to lock the same resources, but do so in an order which causes both sides
> to hold part of the resources the other needs.

OK, but that would mean that the answer to the question:

"I may be wrong here, but I tend to interpret this as '140054029002496' is
trying to get an exclusive lock on 0x78733f8, on which it already has an
exclusive lock, and hence is deadlocked in some manner" is 'no there is
another query' (i.e.: it isn't locked on mistakingly acquiring a lock it
already has) rather then 'that seems likely' :)

> Say you have tables a and b, both wanted by threads 1 and 2. At roughly the
> same time, thread 1 locks a while thread 2 locks b. The next obvious step
> for both is that 1 tries to lock b and 2 tries to lock a; but those
> resources are both in use, so at this point both threads must wait for the
> other to free up the resources. Since neither of them can complete and
> free the held locks for the other to use, deadlock occurs.
>
> In theory, this will last until one of them times out or gives up. In
> practice, the engine will (mostly...) notice that the second thread is
> trying to initiate a deadlock and unceremoniously shoot the bastard in the
> head.

And in my case, the server became unusable (kept running into semaphore locks
at 769 seconds before a kill & start was given). Query timeouts / crashes I
can live with, an unresponsive server I cannot...

> > didn't get any 'too many connections' errors, but timeouts on connecting.
> > But let's say that in this case (MySQL needing a kill -9) all bets about
> > proper connecting are sort of off ;) ). Nevertheless, according to the
> > manual, max_user_connections configures "The maximum number of
> > simultaneous connections permitted to any given MySQL user account.",
> > which would
>
> Hmm. you're right. Still, I'd say that it's unlikely that connection
> swamping happens from more than one account at once, so it should help in
> most instances (and has saved me before).

Ack, could be, however, the rest of the users quickly eat away at the safety
margin. If the max_connections is set to 5000, and mainly 1 user eats it away
with for instance 4500 max_user_connections, I still have 19 other users which
can quickly gobble up the remaining 500 between them. A mere 26-27 per user
would already achieve that, and seeing as the 'runaway user' already has 4500
connections, it is very likely something is going on like a load / visitors
spike that other users also may be requiring / asking for more connections.
All in all, I don't really think the max_user_connections would/will help the
current problem.

> > mean (and a test shows) it is per-user name, and it would only work if we
> > have only 1 non-admin/non-super-user, while in reality, we have several
> > users with specific privileges. Not an incredible lot, but about 20
> > nonetheless,
>
> 'super' refers to a very specific privilege, not a random combination of
> administrative stuff like create/drop, and 20 users is a whole lot to give
> that to. You should really consider if that many need it.

I know that, that's why i said NON-super users :). There are but 2 SUPER
users, 1 solely is used by the OS for stuff like 'flush-logs' after
logrotating (may connect and do something maybe 5 times a day max), 1 is
solely used by the likes of me monitoring / debugging / administrating the
server, never by code, and hopefully rarely connects :)

> > Ack, I assume this means the 'SHOW ENGINE INNODB STATUS;' at the
> > moment of the problem, rather then just the output I get now?
>
> Yep. The output is a combination of current data and since-start counters.

OK, let's hope I never get to show that output (i.e: that the problem doesn't
reoccur). Since the server has been restarted since-start counters will
probably be pretty useless.

> > Hmm, we're never using explicit (table) locks, only implicit by the
> > innodb-transaction, ans usually even then the transaction is just the
> > single statement (autocommit is usually on save for a few instances).
> > Does order in joins matter in terms of setting them alphabetically? That
> > would mean
>
> Uhh... Unsure, but it might.
>
> > checking & rewriting a heck of a lot of codebase... Also, taking for
> > instance the latest detected deadlock from the innodb status, I don't
> > know how to
>
> > actually prevent it on an SQL-query level:
> Strictly speaking, the engine should detect it and kill the thread
> initiating deadlock - as seen in your output there.

Yup, right there it did, And that's the way I like it: kill the/a query, which
issues an error somewhere else we know if and how to handle in some
application, rather then letting a database server with a light load grind to
a halt.

My main problem at hand is why the server did nothing but seize up
gracelessly, rather then either dying (a last resort, but something we have
failovers for) or killing queries (which we can handle).
--
Rik Wasmus

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 24.05.2011 18:25:07 von Johan De Meersman

----- Original Message -----
> From: "Rik Wasmus"
>
> OK, but that would mean that the answer to the question:
>
> "I may be wrong here, but I tend to interpret this as
> '140054029002496' is trying to get an exclusive lock on 0x78733f8, on which it already has
> an exclusive lock, and hence is deadlocked in some manner" is 'no there
> is another query' (i.e.: it isn't locked on mistakingly acquiring a lock
> it already has) rather then 'that seems likely' :)

Ah, misread that. Yes, the former behaviour seems more like a bug; which is not entirely impossible of course.

> And in my case, the server became unusable (kept running into
> semaphore locks at 769 seconds before a kill & start was given). Query timeouts /
> crashes I can live with, an unresponsive server I cannot...

Which is what kind of mystifies me - it should detect deadlocks as soon as they happen.

> OK, let's hope I never get to show that output (i.e: that the problem
> doesn't reoccur). Since the server has been restarted since-start counters
> will probably be pretty useless.

Yups. A trending database (munin, cacti or something) may or may not offer much hindsight in this case (mostly a matter of luck at when it last checked); but it's definitely something useful to have at hand for plenty of other purposes.

> Yup, right there it did, And that's the way I like it: kill the/a
> query, which issues an error somewhere else we know if and how to handle in some
> application, rather then letting a database server with a light load
> grind to a halt.
>
> My main problem at hand is why the server did nothing but seize up
> gracelessly, rather then either dying (a last resort, but something
> we have failovers for) or killing queries (which we can handle).

Uhuh. You may want to take this to the mysql-dev mailinglist, the good people there might have a bit more insight about the error runes you posted.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 25.05.2011 10:08:43 von Rik Wasmus

On 2011-05-24 18:25:07 Johan De Meersman wrote:
> > OK, but that would mean that the answer to the question:
> >
> > "I may be wrong here, but I tend to interpret this as
> > '140054029002496' is trying to get an exclusive lock on 0x78733f8, on
> > which it already has an exclusive lock, and hence is deadlocked in some
> > manner" is 'no there is another query' (i.e.: it isn't locked on
> > mistakingly acquiring a lock it already has) rather then 'that seems
> > likely' :)
>
> Ah, misread that. Yes, the former behaviour seems more like a bug; which is
> not entirely impossible of course.

Ack.

> > And in my case, the server became unusable (kept running into
> > semaphore locks at 769 seconds before a kill & start was given). Query
> > timeouts / crashes I can live with, an unresponsive server I cannot...
> Which is what kind of mystifies me - it should detect deadlocks as soon as
> they happen.

Well, usually it does :)

> > OK, let's hope I never get to show that output (i.e: that the problem
> > doesn't reoccur). Since the server has been restarted since-start
> > counters will probably be pretty useless.
>
> Yups. A trending database (munin, cacti or something) may or may not offer
> much hindsight in this case (mostly a matter of luck at when it last
> checked); but it's definitely something useful to have at hand for plenty
> of other purposes.

Cacti does store a lot of things by snmp, that's the way I know memory, CPU
usage & average load never showed a hitch, all's well according to the OS,
only MySQL is slowly dying...

> > Yup, right there it did, And that's the way I like it: kill the/a
> > query, which issues an error somewhere else we know if and how to handle
> > in some application, rather then letting a database server with a light
> > load grind to a halt.
> >
> > My main problem at hand is why the server did nothing but seize up
> > gracelessly, rather then either dying (a last resort, but something
> > we have failovers for) or killing queries (which we can handle).
>
> Uhuh. You may want to take this to the mysql-dev mailinglist, the good
> people there might have a bit more insight about the error runes you
> posted.

OK, will do, thanks for the help, maybe I'll also file a bug, seems something
that should be fixed :)
--
Rik Wasmus

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 25.05.2011 12:24:37 von Johan De Meersman

----- Original Message -----
> From: "Rik Wasmus"
>
> Cacti does store a lot of things by snmp, that's the way I know
> memory, CPU usage & average load never showed a hitch, all's well according to
> the OS, only MySQL is slowly dying...

So what stops you from writing MySQL plugins? Probably plenty out there already, too.


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 25.05.2011 12:44:35 von Rik Wasmus

On 2011-05-25 12:24:37 you wrote:
> > Cacti does store a lot of things by snmp, that's the way I know
> > memory, CPU usage & average load never showed a hitch, all's well
> > according to the OS, only MySQL is slowly dying...
>
> So what stops you from writing MySQL plugins? Probably plenty out there
> already, too.

A plugin what for? Also time, money, effort. I didn't get hired as a MySQL-
plugin writer here, and my list of pet-projects for my personal time is rather
full at the moment :)
--
Rik Wasmus

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: After long semaphore waits MySQL becomes unresponsive.

am 25.05.2011 13:04:35 von Johan De Meersman

----- Original Message -----
> From: "Rik Wasmus"
>
> A plugin what for? Also time, money, effort. I didn't get hired as a
> MySQL-plugin writer here, and my list of pet-projects for my personal time
> is rather full at the moment :)

For keeping track of various interesting variables like query cache hit rate, memory usage, et cetera.

As I said, most things you want are probably already out there. It's part of proper service management :-)


--
Bier met grenadyn
Is als mosterd by den wyn
Sy die't drinkt, is eene kwezel
Hy die't drinkt, is ras een ezel

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org