Re: Using TASK_SIZE for kernel threads
From: Martin Schwidefsky
Date: Mon Feb 27 2017 - 04:43:29 EST
On Sat, 25 Feb 2017 10:19:04 -0800
Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Feb 24, 2017 at 8:15 AM, Martin Schwidefsky
> <schwidefsky@xxxxxxxxxx> wrote:
> >
> > Now I fixed this in the s390 code, the patch is queued and will be
> > included in next week's please-pull. But I am wondering about the use
> > of TASK_SIZE in kernel threads. For x86 copy_mount_options works
> > because the size calculation will give a negative result for 'data'
> > pointing to kernel space. Which is corrected by the size limit:
> >
> > if (size > PAGE_SIZE)
> > size = PAGE_SIZE;
> >
> > Wouldn't it be cleaner to test "get_fs()==KERNEL_DS" and just use
> > size=4096 in this case? The detour via TASK_SIZE does not make much
> > sense to me.
> >
> > To find out how big the problem is, I have added a warning to TASK_SIZE
> > to create a console message if it is called for a task without an mm.
> > The only hit has been copy_mount_options.
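
For reference, the size calculation discussed above can be modelled in
user space. This is only a sketch: demo_mount_copy_size() and the DEMO_*
constants are hypothetical names, and the task size value is an assumed
x86-64-like limit, not taken from any kernel header. It shows why a
kernel pointer in 'data' ends up with size == PAGE_SIZE: the unsigned
subtraction wraps around and the clamp catches it.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; assumptions, not real kernel constants. */
#define DEMO_PAGE_SIZE UINT64_C(4096)
#define DEMO_TASK_SIZE UINT64_C(0x00007ffffffff000)

/*
 * Hypothetical stand-in for the size calculation in copy_mount_options():
 * distance from 'data' to the end of the user address space, clamped to
 * one page.
 */
static uint64_t demo_mount_copy_size(uint64_t task_size, uint64_t data)
{
	/* Unsigned arithmetic: wraps to a huge value if data > task_size. */
	uint64_t size = task_size - data;

	if (size > DEMO_PAGE_SIZE)
		size = DEMO_PAGE_SIZE;

	/* After the clamp the result can never exceed one page. */
	assert(size <= DEMO_PAGE_SIZE);
	return size;
}
```

With these assumed values, a pointer 100 bytes below the limit yields
100, while a pointer far above the limit yields 4096.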
>
> So copy_mount_options() is a horrible hack. It doesn't have a size
> limit, and it can copy binary data, so our good auto-limiting code in
> strncpy_from_user() isn't usable either.
>
> It probably *should* use the same user_addr_max() logic that
> strncpy_from_user() uses, but that wouldn't actually have helped s390,
> because s390 doesn't use the generic strncpy_from_user(), and doesn't
> have that user_addr_max() thing.
I see, set_fs(KERNEL_DS) sets a different address for user_addr_max to
return. That would work but requires that all architectures have the
define.
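
A minimal user-space sketch of that idea, under stated assumptions: the
demo_* names, the struct layout and the two limit values are all
hypothetical and only mimic the pattern of a per-thread address limit
that set_fs() changes and user_addr_max() reads. The bound check follows
the shape used in lib/strncpy_from_user.c (reject if the source address
is not below the limit, otherwise cap the length at the distance to the
limit); the extra one-page cap is borrowed from copy_mount_options.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; assumptions, not real kernel constants. */
#define DEMO_PAGE_SIZE UINT64_C(4096)
#define DEMO_USER_DS   UINT64_C(0x00007ffffffff000)
#define DEMO_KERNEL_DS UINT64_MAX

/* Hypothetical per-thread state holding the current address limit. */
struct demo_thread {
	uint64_t addr_limit;
};

/* Models set_fs(): switch the limit the thread is checked against. */
static void demo_set_fs(struct demo_thread *t, uint64_t limit)
{
	assert(t);
	t->addr_limit = limit;
}

/* Models user_addr_max(): just report the current limit. */
static uint64_t demo_user_addr_max(const struct demo_thread *t)
{
	assert(t);
	return t->addr_limit;
}

/*
 * Number of bytes that may be copied starting at 'src', at most one
 * page. Returns 0 if 'src' is not below the current limit.
 */
static uint64_t demo_copy_limit(const struct demo_thread *t, uint64_t src)
{
	uint64_t max_addr = demo_user_addr_max(t);
	uint64_t max;

	if (src >= max_addr)
		return 0;

	max = max_addr - src;
	return max > DEMO_PAGE_SIZE ? DEMO_PAGE_SIZE : max;
}
```

In this model no subtraction can wrap: an out-of-range pointer is
rejected up front, and after demo_set_fs(t, DEMO_KERNEL_DS) the same
pointer is accepted with a full page.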
> So from everything I see, I think this is actually an s390 bug in every
> way. Your TASK_SIZE_OF() implementation is simply bogus and broken,
> and that's the core problem.
>
> For example, you could have just had
>
> #define user_addr_max() (current_thread_info()->addr_limit.seg)
>
> like some other architectures, and it would have been all good.
The background is that TASK_SIZE on s390 is not a constant; it depends
on the layout of the mm. There are three layouts: 2GB for 31-bit with a
2-level page table, 4TB for a standard 64-bit process with a 3-level
page table, and 8PB with 4 levels for a process that did a really large
mmap. The upgrade from 4TB to 8PB happens at runtime, which is why the
size of the mm is stored in mm->context. It is an attribute of the mm:
if one thread changes it, it changes for all threads.
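
To illustrate, here is a small user-space model. It is a sketch, not the
s390 code: struct demo_mm, struct demo_task, the 'limit' field and the
demo_* functions are hypothetical names. Only the three sizes (2GB, 4TB,
8PB) and the fact that the limit lives in the mm come from the
description above.

```c
#include <assert.h>
#include <stdint.h>

/* The three address space sizes named above, as powers of two. */
#define DEMO_LIMIT_2GB (UINT64_C(1) << 31)	/* 31-bit, 2-level table */
#define DEMO_LIMIT_4TB (UINT64_C(1) << 42)	/* 64-bit, 3-level table */
#define DEMO_LIMIT_8PB (UINT64_C(1) << 53)	/* 64-bit, 4-level table */

/* Hypothetical mm: the size limit is stored per address space. */
struct demo_mm {
	uint64_t limit;
};

/* Hypothetical task: threads of one process share the same mm. */
struct demo_task {
	struct demo_mm *mm;	/* NULL would model a kernel thread */
};

/* Task size is read from the mm, so it needs a task that has one. */
static uint64_t demo_task_size(const struct demo_task *tsk)
{
	assert(tsk && tsk->mm);
	return tsk->mm->limit;
}

/* Models the runtime upgrade from 3 to 4 page table levels. */
static void demo_mm_upgrade(struct demo_mm *mm)
{
	assert(mm);
	if (mm->limit == DEMO_LIMIT_4TB)
		mm->limit = DEMO_LIMIT_8PB;
}
```

If two demo_task instances point at the same demo_mm, an upgrade done
through one of them is visible through the other. A task with mm == NULL
has nothing to read the limit from, which is the case the warning was
added to catch.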
> If somebody is willing to add user_addr_max() to all architectures and
> make copy_mount_options() use the same logic as
> lib/strncpy_from_user.c, then that would certainly be acceptable to
> me. As it is, I think it uses TASK_SIZE in ways that are not pretty,
> but are what they are..
I guess that won't happen anytime soon. I will use the proposed fix
within the arch code. Thanks.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.