Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM

From: Harris, Robert
Date: Wed Nov 13 2019 - 05:15:28 EST




> On 13 Nov 2019, at 09:04, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, 12 Nov 2019, Harris, Robert wrote:
>
>> I am investigating an issue on 4.9.184 in which futex() returns EPERM
>> intermittently for
>>
>> futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)
>>
>> The failure affects an application in an AWS lambda; traditional
>> debugging approaches vary from difficult to impossible. I cannot
>> reproduce the problem at will, instrument the kernel, install a new
>> kernel or get an application core dump.
>>
>> Understanding the circumstances under which EPERM can be returned for
>> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
>> mode. I have spent some time looking through futex.c but have not
>> found anything yet. I would be grateful for a hint from someone more
>> knowledgeable.
>
> sys_futex(FUTEX_WAIT_PRIVATE) does not return -EPERM. Only the PI variants
> do that.
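As a contrast to that point, here is a minimal sketch. It assumes Linux with `<linux/futex.h>` available, and the helper names `raw_futex`, `wait_on_changed_value`, `wait_until_timeout` and `unlock_pi_not_owner` are hypothetical. It provokes the ordinary FUTEX_WAIT_PRIVATE failures: EAGAIN when the word no longer holds the expected value, and ETIMEDOUT when the relative timeout expires. It then issues a PI operation, FUTEX_UNLOCK_PI_PRIVATE on a word the caller does not own, which is one documented source of EPERM:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Raw futex(2) call: 0 on success, the errno value on failure
 * (same convention as the ETHR_FUTEX__ macro quoted later). */
static int raw_futex(uint32_t *uaddr, int op, uint32_t val,
                     const struct timespec *timeout)
{
    assert(uaddr != NULL);
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0) == -1
           ? errno : 0;
}

/* *uaddr != val: the kernel refuses to sleep and reports EAGAIN
 * (EWOULDBLOCK has the same value on Linux). */
int wait_on_changed_value(void)
{
    uint32_t word = 1;
    return raw_futex(&word, FUTEX_WAIT_PRIVATE, 0, NULL);
}

/* *uaddr == val and nobody wakes us: the relative timeout
 * expires and the kernel reports ETIMEDOUT. */
int wait_until_timeout(void)
{
    uint32_t word = 0;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
    return raw_futex(&word, FUTEX_WAIT_PRIVATE, 0, &ts);
}

/* PI variant for contrast: unlocking a PI futex whose owner TID
 * field is not the caller's TID fails with EPERM. */
int unlock_pi_not_owner(void)
{
    uint32_t word = 0; /* owner TID 0, i.e. not this thread */
    return raw_futex(&word, FUTEX_UNLOCK_PI_PRIVATE, 0, NULL);
}
```

None of the FUTEX_WAIT_PRIVATE paths above yields EPERM; only the PI call does.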

In that case I would appreciate a second pair of eyes. The error I see
(intermittently) is

pthread/ethr_event.c:164: Fatal error in wait__(): Operation not permitted (1)

which comes from

https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/lib_src/pthread/ethr_event.c#L152-L164

> res = ETHR_FUTEX__(&e->futex,
>                    ETHR_FUTEX_WAIT__,
>                    ETHR_EVENT_OFF_WAITER__,
>                    tsp);
> switch (res) {
> case EINTR:
> case ETIMEDOUT:
>     return res;
> case 0:
> case EWOULDBLOCK:
>     break;
> default:
>     ETHR_FATAL_ERROR__(res);

where

https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/include/internal/ethread.h#L259-L260

> #define ETHR_FATAL_ERROR__(ERR) \
>     ethr_fatal_error__(__FILE__, __LINE__, __func__, (ERR))

and

https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/lib_src/common/ethr_aux.c#L725-L741

> ETHR_IMPL_NORETURN__ ethr_fatal_error__(const char *file,
>                                         int line,
>                                         const char *func,
>                                         int err)
> {
>     char *errstr;
>     if (err == ENOTSUP)
>         errstr = "Operation not supported";
>     else {
>         errstr = strerror(err);
>         if (!errstr)
>             errstr = "Unknown error";
>     }
>     fprintf(stderr, "%s:%d: Fatal error in %s(): %s (%d)\n",
>             file, line, func, errstr, err);
>     ethr_abort__();
> }

and

https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/include/internal/pthread/ethr_event.h#L38-L58

> #if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
> #  define ETHR_FUTEX_WAIT__ FUTEX_WAIT_PRIVATE
> #  define ETHR_FUTEX_WAKE__ FUTEX_WAKE_PRIVATE
> #else
> #  define ETHR_FUTEX_WAIT__ FUTEX_WAIT
> #  define ETHR_FUTEX_WAKE__ FUTEX_WAKE
> #endif
>
> typedef struct {
>     ethr_atomic32_t futex;
> } ethr_event;
>
> #define ETHR_FUTEX__(FTX, OP, VAL, TIMEOUT)                 \
>     (-1 == syscall(__NR_futex,                              \
>                    (void *) ethr_atomic32_addr((FTX)),      \
>                    (OP),                                    \
>                    (int) (VAL),                             \
>                    (TIMEOUT),                               \
>                    NULL,                                    \
>                    0)                                       \
>      ? errno : 0)
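Stripped of the ethread types, the macro and the switch in wait__() amount to roughly the following sketch. It assumes a plain C11 `_Atomic uint32_t` in place of `ethr_atomic32_t`; `event_word`, `event_futex`, `classify_wait_result` and the `wait_outcome` enum are hypothetical names. It makes explicit that any errno other than EINTR, ETIMEDOUT or EWOULDBLOCK, EPERM included, ends in ETHR_FATAL_ERROR__:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for ethr_atomic32_t. */
typedef _Atomic uint32_t event_word;

/* Function form of ETHR_FUTEX__: 0 on success, errno on failure.
 * errno is read only when syscall() has returned -1. */
static int event_futex(event_word *ftx, int op, uint32_t val,
                       const struct timespec *timeout)
{
    assert(ftx != NULL);
    return syscall(SYS_futex, (void *)ftx, op, (int)val, timeout, NULL, 0) == -1
           ? errno : 0;
}

/* What wait__() does with each result. */
enum wait_outcome { WAIT_RETURN_TO_CALLER, WAIT_CONTINUE, WAIT_FATAL };

static enum wait_outcome classify_wait_result(int res)
{
    switch (res) {
    case EINTR:
    case ETIMEDOUT:
        return WAIT_RETURN_TO_CALLER;   /* "return res;" */
    case 0:
    case EWOULDBLOCK:
        return WAIT_CONTINUE;           /* "break;" */
    default:
        return WAIT_FATAL;              /* ETHR_FATAL_ERROR__(res), e.g. EPERM */
    }
}
```

Since glibc's syscall() sets errno from the failing call before returning -1, the "(1)" in the log is the value that this futex syscall reported, not a leftover from earlier code.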

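To be sure that the call really is FUTEX_WAIT_PRIVATE, here is the call site in the disassembly: %edi holds the syscall number 0xca (202, __NR_futex on x86-64) and %edx holds the futex op 0x80 (FUTEX_WAIT_PRIVATE):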

> 0x0000000000687e65 <+325>: mov $0x80,%edx
> 0x0000000000687e6a <+330>: mov $0xca,%edi
> 0x0000000000687e6f <+335>: callq 0x443ab0 <syscall@plt>
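The two immediates can be checked against the kernel headers. A minimal sketch, assuming x86-64 Linux and a C11 compiler; `futex_op_is_private_wait` is a hypothetical helper for decoding an op value seen in a register:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <sys/syscall.h>

/* FUTEX_WAIT_PRIVATE is FUTEX_WAIT (0) with FUTEX_PRIVATE_FLAG (128) set. */
static_assert(FUTEX_WAIT_PRIVATE == (FUTEX_WAIT | FUTEX_PRIVATE_FLAG),
              "private wait is wait plus the private flag");
static_assert(FUTEX_WAIT_PRIVATE == 0x80, "value loaded into %edx");

#if defined(__x86_64__)
/* syscall(number, uaddr, op, ...) under the System V x86-64 ABI:
 * number -> %rdi, uaddr -> %rsi, op -> %rdx. */
static_assert(SYS_futex == 0xca, "value loaded into %edi");
#endif

/* True if op is a plain private FUTEX_WAIT (no PI, no other command). */
static int futex_op_is_private_wait(int op)
{
    return (op & FUTEX_PRIVATE_FLAG) && (op & FUTEX_CMD_MASK) == FUTEX_WAIT;
}
```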

Thanks,

Robert

