Re: 2.6.6-mm2
From: Andrew Morton
Date: Thu May 13 2004 - 06:28:18 EST
Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Thu, May 13, 2004 at 03:51:34AM -0700, Andrew Morton wrote:
> > Once I'm convinced that kernel.org kernels will be able to run applications
> > which vendor kernels will run, sure.
> >
> > We're nowhere near that, and your continual whining gets us no closer.
>
> Sorry, but this argumentation is utter bullshit.
Wim explained that any application changes now won't be widely deployed for
another year. During that period the ability to run existing Oracle setups
requires that hugepage allocation be available to unprivileged
applications.
> If $VENDORKERNEL/freebsd/sco/windows2000 runs $APP and we don't, what
> does this mean? Right, exactly nothing.
It means that if people install a kernel.org kernel on their database
server, the database *just won't work*. This is not good for those users,
for the kernel developers or for Linux's reputation in general.
It's worth a very small, extremely easily maintainable patch to fix all
this up.
And this is not just any old application.
> I've talked to three persons at Oracle and none of them likes it at all, in
> fact an Oracle employee is working on doing quota for hugetlbfs which
> fixes this properly.
One year.
> Merging some horrible hacks that completely change
> the authorization model (for a special case, that is)
If you need to exaggerate this much to make your point, it isn't a very
good point.
> in the middle of
> a stable series doesn't get us anywhere, except into a horrible unmaintainable
> mess.
Here's your "horrible unmaintainable mess":
diff -puN fs/hugetlbfs/inode.c~hugetlb_shm_group-sysctl-patch fs/hugetlbfs/inode.c
--- 25/fs/hugetlbfs/inode.c~hugetlb_shm_group-sysctl-patch 2004-05-10 04:48:58.627456560 -0700
+++ 25-akpm/fs/hugetlbfs/inode.c 2004-05-10 04:48:58.640454584 -0700
@@ -43,6 +43,8 @@ static struct backing_dev_info hugetlbfs
.memory_backed = 1, /* Does not contribute to dirty memory */
};
+int sysctl_hugetlb_shm_group;
+
static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
{
struct inode *inode = file->f_dentry->d_inode;
@@ -718,6 +720,12 @@ static unsigned long hugetlbfs_counter(v
return ret;
}
+static int can_do_hugetlb_shm(void)
+{
+ return likely(capable(CAP_IPC_LOCK) ||
+ in_group_p(sysctl_hugetlb_shm_group));
+}
+
struct file *hugetlb_zero_setup(size_t size)
{
int error;
@@ -727,7 +735,7 @@ struct file *hugetlb_zero_setup(size_t s
struct qstr quick_string;
char buf[16];
- if (!capable(CAP_IPC_LOCK))
+ if (!can_do_hugetlb_shm())
return ERR_PTR(-EPERM);
if (!is_hugepage_mem_enough(size))
diff -puN include/linux/hugetlb.h~hugetlb_shm_group-sysctl-patch include/linux/hugetlb.h
--- 25/include/linux/hugetlb.h~hugetlb_shm_group-sysctl-patch 2004-05-10 04:48:58.628456408 -0700
+++ 25-akpm/include/linux/hugetlb.h 2004-05-10 04:48:58.641454432 -0700
@@ -32,6 +32,7 @@ void free_huge_page(struct page *);
extern unsigned long max_huge_pages;
extern const unsigned long hugetlb_zero, hugetlb_infinity;
+extern int sysctl_hugetlb_shm_group;
static inline void
mark_mm_hugetlb(struct mm_struct *mm, struct vm_area_struct *vma)
diff -puN include/linux/sysctl.h~hugetlb_shm_group-sysctl-patch include/linux/sysctl.h
--- 25/include/linux/sysctl.h~hugetlb_shm_group-sysctl-patch 2004-05-10 04:48:58.630456104 -0700
+++ 25-akpm/include/linux/sysctl.h 2004-05-10 04:48:58.643454128 -0700
@@ -163,6 +163,7 @@ enum
VM_MAX_MAP_COUNT=22, /* int: Maximum number of mmaps/address-space */
VM_LAPTOP_MODE=23, /* vm laptop mode */
VM_BLOCK_DUMP=24, /* block dump mode */
+ VM_HUGETLB_GROUP=25, /* permitted hugetlb group */
};
diff -puN kernel/sysctl.c~hugetlb_shm_group-sysctl-patch kernel/sysctl.c
--- 25/kernel/sysctl.c~hugetlb_shm_group-sysctl-patch 2004-05-10 04:48:58.632455800 -0700
+++ 25-akpm/kernel/sysctl.c 2004-05-10 04:48:58.645453824 -0700
@@ -738,6 +738,14 @@ static ctl_table vm_table[] = {
.extra1 = (void *)&hugetlb_zero,
.extra2 = (void *)&hugetlb_infinity,
},
+ {
+ .ctl_name = VM_HUGETLB_GROUP,
+ .procname = "hugetlb_shm_group",
+ .data = &sysctl_hugetlb_shm_group,
+ .maxlen = sizeof(gid_t),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#endif
{
.ctl_name = VM_LOWER_ZONE_PROTECTION,
_
Please, spare me the hyperbole.