Re: [PATCH] futex: Fix ZERO_PAGE cause infinite loop

From: KOSAKI Motohiro
Date: Thu Dec 24 2009 - 20:51:40 EST


> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index 8e3c3ff..79b89cc 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -254,6 +254,8 @@ again:
> >
> > page = compound_head(page);
> > lock_page(page);
> > + if (is_zero_pfn(page_to_pfn(page)))
> > + goto anon_key;
> > if (!page->mapping) {
> > unlock_page(page);
> > put_page(page);
> > @@ -268,6 +270,7 @@ again:
> > * the object not the particular process.
> > */
> > if (PageAnon(page)) {
> > + anon_key:
> > key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
> > key->private.mm = mm;
> > key->private.address = address;
>
> Doesn't it make more sense to simply fail the futex op?
>
> What I mean is, all (?) futex ops assume userspace actually wrote
> something to the address in question to begin with, therefore it can
> never be the zero page.
>
> So it being the zero page means someone goofed up, and we should bail,
> no?

Hm. probably we need to discuss more.

Firstly, if we assume current glibc implimentation, you are right,
we can assume userland always initialize the page explicitly before using
futex. then we never seen zero page in futex.

but, I don't think futex itself assume it now. at least man page
doesn't describe such limilation. so, if you prefer bail and man fix,
I'm acceptable maybe.

I don't know any library except glibc use futex directly, or not.
but we don't have any reason to assume futex is used only glibc.
(plus, glibc might change its implementation in future release, perhaps)

following testcase is more practice than last mail's one.
if we add zero page bail logic, this testcase mihgt be fail.

----------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>

void * futex_wait(void *arg)
{
void *addr = arg;
int ret;

fprintf(stderr, "futex wait\n");
ret = syscall( SYS_futex, addr, FUTEX_WAIT, 0, NULL, NULL, NULL);
if (ret != 0 && errno != EWOULDBLOCK) {
perror("futex error.\n");
exit(1);
}
fprintf(stderr, "futex_wait: ret = %d, errno = %d\n", ret, errno);

return NULL;
}

int main(int argc, char **argv)
{
long page_size;
pthread_t thr;
int ret;
void *buf;

page_size = sysconf(_SC_PAGESIZE);

buf = mmap(NULL, page_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
if (buf == (void *)-1) {
perror("mmap error.\n");
exit(1);
}

ret = pthread_create(&thr, NULL, futex_wait, buf);
if (ret < 0) {
fprintf(stderr, "pthread_create error\n");
exit(1);
}

sleep(10);

fprintf(stderr, "futex wake\n");
*((int*)buf) = 1;
ret = syscall( SYS_futex, buf, FUTEX_WAKE, 1, NULL, NULL, NULL);
fprintf(stderr, "futex_wake: ret = %d, errno = %d\n", ret, errno);

fprintf(stderr, "join\n");
pthread_join(thr, NULL);

return 0;
}
--------------------------------------------------------------------------

man page fix has another difficulty. in past days, zero pages was perfectly
transparent from userland. then any man page don't describe "what's zero page".
then some administrator don't know about zero page at all.

So, if your main disliked point is goto statement, following patch is ok
to me.

Finally, I agree this is really corner case issue. I can agree man page fix
too. but I hope to know which point you dislike in my patch.

Thanks.