Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM
From: Harris, Robert
Date: Wed Nov 13 2019 - 05:15:28 EST
> On 13 Nov 2019, at 09:04, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, 12 Nov 2019, Harris, Robert wrote:
>
>> I am investigating an issue on 4.9.184 in which futex() returns EPERM
>> intermittently for
>>
>> futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)
>>
>> The failure affects an application in an AWS lambda; traditional
>> debugging approaches vary from difficult to impossible. I cannot
>> reproduce the problem at will, instrument the kernel, install a new
>> kernel or get an application core dump.
>>
>> Understanding the circumstances under which EPERM can be returned for
>> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
>> mode. I have spent some time looking through futex.c but have not
>> found anything yet. I would be grateful for a hint from someone more
>> knowledgeable.
>
> sys_futex(FUTEX_WAIT_PRIVATE) does not return -EPERM. Only the PI variants
> do that.
In that case I would appreciate a second pair of eyes. The error I see
(intermittently) is
pthread/ethr_event.c:164: Fatal error in wait__(): Operation not permitted (1)
which comes from
https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/lib_src/pthread/ethr_event.c#L152-L164
>     res = ETHR_FUTEX__(&e->futex,
>                        ETHR_FUTEX_WAIT__,
>                        ETHR_EVENT_OFF_WAITER__,
>                        tsp);
>     switch (res) {
>     case EINTR:
>     case ETIMEDOUT:
>         return res;
>     case 0:
>     case EWOULDBLOCK:
>         break;
>     default:
>         ETHR_FATAL_ERROR__(res);
where
https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/include/internal/ethread.h#L259-L260
> #define ETHR_FATAL_ERROR__(ERR) \
>     ethr_fatal_error__(__FILE__, __LINE__, __func__, (ERR))
and
https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/lib_src/common/ethr_aux.c#L725-L741
> ETHR_IMPL_NORETURN__ ethr_fatal_error__(const char *file,
>                                         int line,
>                                         const char *func,
>                                         int err)
> {
>     char *errstr;
>     if (err == ENOTSUP)
>         errstr = "Operation not supported";
>     else {
>         errstr = strerror(err);
>         if (!errstr)
>             errstr = "Unknown error";
>     }
>     fprintf(stderr, "%s:%d: Fatal error in %s(): %s (%d)\n",
>             file, line, func, errstr, err);
>     ethr_abort__();
> }
and
https://github.com/erlang/otp/blob/348e328375fb774b3fa919ffd1c4811367406516/erts/include/internal/pthread/ethr_event.h#L38-L58
> #if defined(FUTEX_WAIT_PRIVATE) && defined(FUTEX_WAKE_PRIVATE)
> # define ETHR_FUTEX_WAIT__ FUTEX_WAIT_PRIVATE
> # define ETHR_FUTEX_WAKE__ FUTEX_WAKE_PRIVATE
> #else
> # define ETHR_FUTEX_WAIT__ FUTEX_WAIT
> # define ETHR_FUTEX_WAKE__ FUTEX_WAKE
> #endif
>
> typedef struct {
> ethr_atomic32_t futex;
> } ethr_event;
>
> #define ETHR_FUTEX__(FTX, OP, VAL, TIMEOUT)\
>     (-1 == syscall(__NR_futex,\
>                    (void *) ethr_atomic32_addr((FTX)),\
>                    (OP),\
>                    (int) (VAL),\
>                    (TIMEOUT),\
>                    NULL,\
>                    0)\
>      ? errno : 0)
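For anyone who wants to poke at the same code path outside the Erlang runtime, here is a minimal sketch of what the macro does, written as a plain function. The names futex_wait_private(), demo_mismatch() and demo_timeout() are hypothetical and are not part of OTP. It assumes Linux with the <linux/futex.h> header available.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper mirroring ETHR_FUTEX__: returns 0 on success,
 * otherwise the errno value left by the failed syscall. */
static int futex_wait_private(int32_t *uaddr, int32_t val,
                              const struct timespec *timeout)
{
    long rc = syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, val,
                      timeout, NULL, 0);
    return rc == -1 ? errno : 0;
}

/* Hypothetical demo: *uaddr (1) differs from val (0), so the kernel
 * refuses to sleep and the call fails with EAGAIN (== EWOULDBLOCK on Linux). */
static int demo_mismatch(void)
{
    int32_t word = 1;
    struct timespec ts = { 0, 1000000 }; /* relative timeout of 1 ms */
    return futex_wait_private(&word, 0, &ts);
}

/* Hypothetical demo: *uaddr matches val and nobody issues a wake,
 * so the wait is expected to end with ETIMEDOUT after about 1 ms. */
static int demo_timeout(void)
{
    int32_t word = 0;
    struct timespec ts = { 0, 1000000 };
    return futex_wait_private(&word, 0, &ts);
}
```

Both demos should land in the cases that the switch in wait__() already handles (EWOULDBLOCK and ETIMEDOUT). Any other value would reach ETHR_FATAL_ERROR__, which is the path I am hitting with EPERM.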
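To be sure that the macros expand as expected, here is the disassembly of
the call site. The op loaded into %edx is 0x80 (FUTEX_WAIT_PRIVATE) and the
syscall number loaded into %edi is 0xca (__NR_futex on x86_64):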
> 0x0000000000687e65 <+325>: mov $0x80,%edx
> 0x0000000000687e6a <+330>: mov $0xca,%edi
> 0x0000000000687e6f <+335>: callq 0x443ab0 <syscall@plt>
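The two immediates can be cross-checked against the kernel headers at compile time. The following is only a sketch: the helper names futex_wait_private_op() and futex_syscall_nr() are hypothetical, and the syscall number check is limited to the 64-bit x86 ABI because the number differs on other architectures.

```c
#include <linux/futex.h>
#include <sys/syscall.h>

/* FUTEX_WAIT_PRIVATE is FUTEX_WAIT (0) | FUTEX_PRIVATE_FLAG (128),
 * which is the 0x80 moved into %edx (third argument of syscall()). */
_Static_assert(FUTEX_WAIT_PRIVATE == 0x80, "unexpected FUTEX_WAIT_PRIVATE");

#if defined(__x86_64__) && !defined(__ILP32__)
/* 0xca is 202, the futex syscall number moved into %edi
 * (first argument of syscall()). */
_Static_assert(SYS_futex == 0xca, "unexpected futex syscall number");
#endif

/* Hypothetical helpers so the same values can be inspected at run time. */
static int futex_wait_private_op(void) { return FUTEX_WAIT_PRIVATE; }
static long futex_syscall_nr(void) { return SYS_futex; }
```

If this compiles on the build host, the headers agree with the disassembly, so the call really is futex(uaddr, FUTEX_WAIT_PRIVATE, ...).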
Thanks,
Robert