non-blocking connect and EAGAIN

on 18.09.2007 19:26:28 by Dmitriy MiksIr

Hello!
I am getting a lot of MySQL errors "Can't connect to local MySQL server
through socket '/var/lib/mysql/mysql.sock' (11)".

I traced one of these errors and saw that the non-blocking connect
returns EAGAIN:
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_FILE, path="/var/lib/mysql/mysql.sock"}, 110) = -1 EAGAIN (Resource temporarily unavailable)

MySQL's connect code does not special-case this error:
    if ((res != 0) && (s_err != EINPROGRESS))
    {
      errno= s_err; /* Restore it */
      return(-1);
    }

Is this a kernel bug (Linux 2.6.16-std26-smp-alt1) that returns EAGAIN
instead of EINPROGRESS, or can some other problem force EAGAIN on a Unix
socket connect?
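
For context on why the quoted check lets EINPROGRESS through: with the generic POSIX non-blocking connect pattern, EINPROGRESS means the attempt is still pending, and the caller then waits for writability and reads SO_ERROR. The following is a minimal sketch of that generic pattern, not MySQL's actual code; the helper name wait_for_connect() and its timeout parameter are assumptions made for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: wait for a non-blocking connect() that returned
 * EINPROGRESS to finish. Returns 0 on success, -1 with errno set otherwise.
 */
int wait_for_connect(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;

    /* Restart on signals; note this restarts the full timeout as well. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;      /* nothing happened within timeout_ms */
        return -1;
    }

    /* The socket is writable: fetch the outcome of the connect attempt. */
    int so_error = 0;
    socklen_t len = sizeof(so_error);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
        return -1;
    if (so_error != 0) {
        errno = so_error;
        return -1;
    }
    return 0;
}
```

An EAGAIN result never reaches such a wait step, because the check quoted above returns -1 for every errno other than EINPROGRESS.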


--
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals?unsub=gcdmd-internals@m.gmane.org

Re: non-blocking connect and EAGAIN

on 19.09.2007 07:40:39 by Vladimir Shebordaev

Hi, Dmitriy,

Could you please specify when you get those reconnects?

The Linux connect() system call on non-blocking AF_UNIX sockets
should return immediately with EAGAIN when the peer's backlog
queue is full. On a blocking socket, connect() instead blocks
until there is some room available on the receiving end. The
MySQL client intends to follow that system call literally when
no timeout option is explicitly specified (see the comments in
my_connect() right above the lines you've cited). So, what you
get looks like intended behavior on both the kernel and the
MySQL side.
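
The backlog behavior described above can be observed in isolation. The sketch below assumes Linux: it binds a listener in the abstract socket namespace (a Linux-specific feature, used here only to avoid leaving a socket file behind), never calls accept(), and keeps issuing non-blocking connects until one fails. The function name, the socket name "eagain-demo-<pid>" and the attempt limit are made up for the example. On Linux the reported errno is expected to be EAGAIN; other systems may behave differently.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_ATTEMPTS 16  /* arbitrary bound for the demo */

/*
 * Returns the errno of the first non-blocking connect() that fails against
 * a listener that never calls accept(), 0 if no connect failed, or -1 on a
 * setup error.
 */
int errno_when_backlog_full(void)
{
    struct sockaddr_un addr;
    int clients[MAX_ATTEMPTS];
    int nclients = 0, result = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] == '\0' selects the Linux abstract namespace. */
    snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
             "eagain-demo-%ld", (long)getpid());
    socklen_t len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 +
                                strlen(addr.sun_path + 1));

    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;
    if (bind(srv, (struct sockaddr *)&addr, len) < 0 || listen(srv, 1) < 0) {
        close(srv);
        return -1;
    }

    /* The listener never accepts, so pending connections pile up. */
    for (int i = 0; i < MAX_ATTEMPTS; i++) {
        int c = socket(AF_UNIX, SOCK_STREAM, 0);
        if (c < 0) {
            result = -1;
            break;
        }
        clients[nclients++] = c;
        int flags = fcntl(c, F_GETFL);
        if (flags < 0 || fcntl(c, F_SETFL, flags | O_NONBLOCK) < 0) {
            result = -1;
            break;
        }
        if (connect(c, (struct sockaddr *)&addr, len) < 0) {
            result = errno;  /* remember why the connect was refused */
            break;
        }
    }

    while (nclients > 0)
        close(clients[--nclients]);
    close(srv);
    return result;
}
```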

Please check out the MySQL 5.0 troubleshooting page at
<dev.mysql.com/doc/refman/5.0/en/can-not-connect-to-server.html>.
Your server has probably crashed or stalled due to some real
bug. If so, you should try to reproduce it and file a bug
report. But first of all, please upgrade to a decent MySQL
version.

I hope this helps.

Regards,
Vladimir




Re: non-blocking connect and EAGAIN

on 19.09.2007 16:04:53 by Chad Miller


Hi, Dmitriy, Vladimir!


On 19 Sep 2007, at 07:40, Vladimir Shebordaev wrote:

> Hi, Dmitriy,
>
> would you please specify when you get those reconnects?
>
> The Linux connect() system call on non-blocking AF_UNIX sockets
> should return immediately with EAGAIN when the peer's backlog queue
> is full.

Vladimir's right here. The Linux kernel doesn't normally send errno
EINPROGRESS, but it does send EAGAIN for this case:


    if (skb_queue_len(&other->sk_receive_queue) >
        other->sk_max_ack_backlog) {
        err = -EAGAIN;
        if (!timeo)
            goto out_unlock;

        timeo = unix_wait_for_peer(other, timeo);

        err = sock_intr_errno(timeo);
        if (signal_pending(current))
            goto out;
        sock_put(other);
        goto restart;
    }


Notably, the BSDs don't send EAGAIN, as far as I can tell.

> Otherwise connect() will block until there is some room available
> on receiving end. MySQL client intention is to literally follow
> that system call when there is no timeout option explicitly
> specified (see the comments in my_connect() right above the lines
> you've cited). So, what you get looks like intended behavior from
> both kernel and MySQL side.

Agreed, for the most part. (I don't know that the kernel sends
EAGAIN /only/ for no-timeout/non-blocking connect()ion attempts. I
didn't dig wider than the above.)

The Linux kernel truly couldn't accept the connect() syscall, and
this is a valid problem. The library code behaves correctly because
the library /should/ pass errors from the kernel up to the client.
This specific case isn't one I think we considered, but client code
should handle all errors the OS could generate; the library shouldn't
insulate the client from the kernel, but it should from the server.
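
As an illustration of what handling this error in client code could look like, here is a minimal sketch that retries a non-blocking AF_UNIX connect a bounded number of times when the kernel reports EAGAIN and passes every other error straight up. It is not part of the MySQL client library; the function name, the attempt count and the fixed delay are assumptions chosen for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect a non-blocking AF_UNIX stream socket to
 * `path`, retrying up to `max_attempts` times while connect() fails with
 * EAGAIN. Returns the connected fd, or -1 with errno set.
 */
int connect_unix_with_retry(const char *path, int max_attempts, int delay_ms)
{
    struct sockaddr_un addr;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0) {
        for (int attempt = 1; ; attempt++) {
            if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
                return fd;
            /* Only a full backlog (EAGAIN) is worth retrying here. */
            if (errno != EAGAIN || attempt >= max_attempts)
                break;
            /* poll() with no descriptors acts as a millisecond sleep. */
            poll(NULL, 0, delay_ms);
        }
    }

    int saved = errno;  /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
}
```

A caller might use connect_unix_with_retry("/var/lib/mysql/mysql.sock", 5, 50) and still report the final errno to the user if all attempts fail, so a stalled server remains visible.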

> Please check out the MySQL 5.0 troubleshooting page at
> <dev.mysql.com/doc/refman/5.0/en/can-not-connect-to-server.html>.
> You've probably got your server crashed or stalled due to some real
> bug. If so, you should try to reproduce it and file a bug report.
> But please upgrade to decent MySQL version first of all.

It could be a crashed server that's causing the problem, I suppose.
More likely it's not; in that case, please keep us posted if you
find another bottleneck in connecting.

- chad






--
Chad Miller, Software Developer chad@mysql.com
MySQL Inc., www.mysql.com
Orlando, Florida, USA 13-20z, UTC-0400
Office: +1 408 213 6740 sip:6740@sip.mysql.com




Re: non-blocking connect and EAGAIN

on 20.09.2007 01:34:11 by Vladimir Shebordaev

Hi, Chad!

It's nice to hear from you.

Chad MILLER wrote:
> Hi, Dmitriy, Vladimir!
>
>
> On 19 Sep 2007, at 07:40, Vladimir Shebordaev wrote:
>
>> Hi, Dmitriy,
>>
>> would you please specify when you get those reconnects?
>>
>> The Linux connect() system call on non-blocking AF_UNIX sockets should
>> return immediately with EAGAIN when the peer's backlog queue is full.
>
> Vladimir's right here. The Linux kernel doesn't normally send errno
> EINPROGRESS, but it does send EAGAIN for this case:
>
>
>     if (skb_queue_len(&other->sk_receive_queue) >
>         other->sk_max_ack_backlog) {
>         err = -EAGAIN;
>         if (!timeo)
>             goto out_unlock;
>
>         timeo = unix_wait_for_peer(other, timeo);
>
>         err = sock_intr_errno(timeo);
>         if (signal_pending(current))
>             goto out;
>         sock_put(other);
>         goto restart;
>     }
>
>
> Notably, the BSDs don't send EAGAIN, as far as I can tell.
>

OK. Basically, this Linux kernel behavior is evidently not POSIX
compliant, as of now. All of a sudden :)

The Linux kernel developers say the current POSIX specification
is simply broken; see the comment above the sys_connect()
implementation. The comment in the prologue of
net/unix/af_unix.c states that, for AF_UNIX connect(), the
current behavior is intentional for performance and (very) old
BSD compatibility reasons, and is not to be fixed until checked
against POSIX requirements. I also think a proper change would
break some (very important) existing applications, since the
relevant code (the part that aborts a non-blocking connection
with EAGAIN) has been there since at least 2.2.6.

This behavior evidently differs from the LSB specification,
which is essentially SUSv3. So, I guess it is possible to
convince the Linux kernel developers to comply with their own
specification, but that would take a while, and a fix could not
land earlier than 2.6.23+, if ever. Anyway, it wouldn't be of
any help to the vast majority of MySQL users at the moment.

>> Otherwise connect() will block until there is some room available on
>> receiving end. MySQL client intention is to literally follow that
>> system call when there is no timeout option explicitly specified (see
>> the comments in my_connect() right above the lines you've cited). So,
>> what you get looks like intended behavior from both kernel and MySQL
>> side.
>
> Agreed, for the most part. (I don't know that the kernel sends EAGAIN
> /only/ for no-timeout/non-blocking connect()ion attempts. I didn't dig
> wider than the above.)
>

Well, there is almost nothing to dig into. sys_connect() invokes
unix_stream_connect() directly and returns its result untouched
to user space.

As for this code snippet, by default it sleeps indefinitely on
blocking sockets, because sk->sk_sndtimeo is set to
MAX_SCHEDULE_TIMEOUT (some kind of infinity) in sock_init_data()
and can be changed only with the SO_SNDTIMEO socket option. Btw,
vio_timeout() does set this option. Unfortunately, timeo here is
forced to zero for non-blocking sockets a few lines above.
So, my_connect() would return EINTR when the timeout expires if
it blocked in connect().
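
For reference, the SO_SNDTIMEO option mentioned above is set from user space with setsockopt() and a struct timeval. The sketch below only shows setting and reading back the option; the helper names are hypothetical, and it does not reproduce what vio_timeout() actually does, which the thread does not show.

```c
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: set the send timeout that, per the discussion
 * above, also bounds a blocking AF_UNIX connect() on Linux.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_send_timeout(int fd, long seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Read the whole-second part of the send timeout back, or -1 on error. */
long get_send_timeout(int fd)
{
    struct timeval tv;
    socklen_t len = sizeof(tv);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) < 0)
        return -1;
    return (long)tv.tv_sec;
}
```

A value of zero seconds means no timeout, which matches the "sleeps indefinitely" default described above.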

> The Linux kernel truly couldn't accept the connect() syscall, and this
> is a valid problem. The library code behaves correctly because the
> library /should/ pass errors from the kernel up to the client. This
> specific case isn't one I think we considered, but client code should
> handle all errors the OS could generate; the library shouldn't insulate
> the client from the kernel, but it should from the server.
>

I daresay I could prepare a decent patch... But aren't you about
to deprive MySQL customers of those tons of messages when mysqld
hangs or crashes on Linux? ;)

>> Please check out the MySQL 5.0 troubleshooting page at
>> <dev.mysql.com/doc/refman/5.0/en/can-not-connect-to-server.html>.
>> You've probably got your server crashed or stalled due to some real
>> bug. If so, you should try to reproduce it and file a bug report. But
>> please upgrade to decent MySQL version first of all.
>
> It could be a crashed server that's causing the problem, I suppose.
> More likely, if it's not, please keep us included if there's another
> bottleneck in connecting that you find.
>

Well, I'd better not be that hasty. Probably it's quite a long story.

> - chad
>

Regards,
Vladimir


