Re: [patch] mm: Fix SHM_HUGETLB to work with users inhugetlb_shm_group

From: Mel Gorman
Date: Tue Feb 10 2009 - 06:09:26 EST


On Thu, Feb 05, 2009 at 11:08:51AM -0800, Ravikiran G Thirumalai wrote:
> Thanks for your comments Mel.
>
> On Thu, Feb 05, 2009 at 01:25:29PM +0000, Mel Gorman wrote:
> >On Wed, Feb 04, 2009 at 04:41:57PM -0800, Ravikiran G Thirumalai wrote:
> >
> >========
> >> Fix hugetlb subsystem so that non root users belonging to hugetlb_shm_group
> >> can actually allocate hugetlb backed shm.
> >>
> >> Currently non root users cannot even map one large page using SHM_HUGETLB
> >> when they belong to the gid in /proc/sys/vm/hugetlb_shm_group.
> >> This is because allocation size is verified against RLIMIT_MEMLOCK resource
> >> limit even if the user belongs to hugetlb_shm_group.
> >>
> >> This patch
> >> 1. Fixes hugetlb subsystem so that users with CAP_IPC_LOCK and users
> >> belonging to hugetlb_shm_group don't need to be restricted with
> >> RLIMIT_MEMLOCK resource limits
> >> 2. If a user has sufficient memlock limit he can still allocate the hugetlb
> >> shm segment.
> >>
> >
> >Point 1 I'm happy with, point 2 less so. It alters the semantics of the
> >locked rlimit beyond what is necessary to fix the problem - i.e. a user
> >in the group should be allowed to use hugepages with shmget(). Minimally,
> >there should be two separate patches.
>
> I see your point, and I was initially leaning towards 1. only -- that is not
> validate against memlock rlimit at all. But, I kinda understand Bill's
> comments about still honoring the rlimit because this is the only way to map
> SHM_HUGETLB currently, and seems like all oracle users currently do that.

Yeah, Bill convinced me of same. I'm not happy that applications will basically
be depending on weird behaviour but breaking them because it's "cleaner"
is a bit rude and I wanted to move away from hugepages being a root-only thing.

> This is a compatibility issue and sysadmins will have to change from using
> /etc/security/limits.conf to a gid based sysctl in /etc/sysctl.conf
> (both based on distros) to let users use hugetlb backed shm. I agree this
> still keeps some inconsistency around, so how about letting people still use
> rlimit based checks, but marking it deprecated by adding this to
> feature-removal-schedule.txt?
>

I'm ok with that. Should we print a KERN_INFO message once when someone
is depending on rlimits without being part of the group warning that
this behaviour will not be allowed in a future release?

> >
> >> Signed-off-by: Ravikiran Thirumalai <kiran@xxxxxxxxxxxx>
> >>
> >> ---
> >>
> >> Documentation/vm/hugetlbpage.txt | 11 ++++++-----
> >> fs/hugetlbfs/inode.c | 18 ++++++++++++------
> >> include/linux/mm.h | 2 ++
> >> mm/mlock.c | 11 ++++++++---
> >> 4 files changed, 28 insertions(+), 14 deletions(-)
> >>
> >> Index: linux-2.6-tip/fs/hugetlbfs/inode.c
> >> ===================================================================
> >> --- linux-2.6-tip.orig/fs/hugetlbfs/inode.c 2009-02-04 15:21:45.000000000 -0800
> >> +++ linux-2.6-tip/fs/hugetlbfs/inode.c 2009-02-04 15:23:19.000000000 -0800
> >> @@ -943,8 +943,15 @@ static struct vfsmount *hugetlbfs_vfsmou
> >> static int can_do_hugetlb_shm(void)
> >> {
> >> return likely(capable(CAP_IPC_LOCK) ||
> >> - in_group_p(sysctl_hugetlb_shm_group) ||
> >> - can_do_mlock());
> >> + in_group_p(sysctl_hugetlb_shm_group));
> >> +}
> >> +
> >> +static void acct_huge_shm_lock(size_t size, struct user_struct *user)
> >> +{
> >> + unsigned long pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >> + spin_lock(&shmlock_user_lock);
> >> + acct_shm_lock(pages, user);
> >> + spin_unlock(&shmlock_user_lock);
> >> }
> >
> >This should be split into another patch (i.e. three in all). The first patch
> >allows users in thh shm_group to use huge pages. The second that accounts
> >for locked_shm properly. The third allows users with a high enough locked
> >rlimit to use shmget() with hugepages. However, my feeling right now would
> >be to ack 1, re-reread 2 and nak 3.
>
> I totally agree. In fact yesterday I was thinking of resending this patch
> to not account for shm memory when a user is not validated against rlimits
> (when he has CAP_IPC_LOCK or if he belongs to the sysctl_hugetlb_shm_group).
>

It would make it more consistent with the filesystem which doesn't check
rlimits.

> As I see it there must be two parts:
> 1. Free ticket to CAP_IPC_LOCK and users belonging to sysctl_hugetlb_shm_group
> 2. Patch to have users not having CAP_IPC_LOCK or sysctl_hugetlb_shm_group
> to check against memlock rlimits, and account it. Also mark this
> deprecated in feature-removal-schedule.txt
>
> Would this be OK?
>

Sounds good. Thanks a lot.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/