Re: [PATCH 5/5] Account certain kmem allocations to memcg

From: Vladimir Davydov
Date: Tue Nov 10 2015 - 03:07:53 EST


On Mon, Nov 09, 2015 at 03:39:55PM +0100, Michal Hocko wrote:
> On Sat 07-11-15 23:07:09, Vladimir Davydov wrote:
> > This patch marks those kmem allocations that are known to be easily
> > triggered from userspace as __GFP_ACCOUNT, which makes them accounted to
> > memcg. For the list, see below:
> >
> > - threadinfo
> > - task_struct
> > - task_delay_info
> > - pid
> > - cred
> > - mm_struct
> > - vm_area_struct and vm_region (nommu)
> > - anon_vma and anon_vma_chain
> > - signal_struct
> > - sighand_struct
> > - fs_struct
> > - files_struct
> > - fdtable and fdtable->full_fds_bits
> > - dentry and external_name
> > - inode for all filesystems. This is the most tedious part, because
> > most filesystems overwrite the alloc_inode method. Looks like using
> > __GFP_ACCOUNT in alloc_inode is going to become a new rule, like
> > passing SLAB_RECLAIM_ACCOUNT on inode cache creation.
>
> I am wondering whether using a helper function to allocate an inode
> cache would help in that regards. It would limit __GFP_ACCOUNT
> penetration into fs code.

I'm afraid that wouldn't free fs code from the need to use
__GFP_ACCOUNT, because there are other things that we might want to
account AFAICS, e.g. ext4_crypt_info_cachep or ext4_es_cachep.

>
> pipe buffers are trivial to abuse (e.g. via fd passing) so we want to

You might also mention allocations caused by select/poll, page tables,
radix_tree_node, etc. They all might be abused, but the primary purpose
of this patch set is not catching abusers, but providing reasonable
level of isolation for most normal workloads. Let's add everything above
that in separate patches.

> cap those as well. The following should do the trick AFAICS.

Actually, no - you only account pipe metadata while anon pipe buffer
pages, which usually constitute most of memory consumed by a pipe, still
go unaccounted. I'm planning to make pipe accountable later.

> ---
> diff --git a/fs/pipe.c b/fs/pipe.c
> index 8865f7963700..c4b7e8c08362 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -590,7 +590,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
>
> pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL);
> if (pipe) {
> - pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * PIPE_DEF_BUFFERS, GFP_KERNEL);
> + pipe->bufs = kzalloc(sizeof(struct pipe_buffer) * PIPE_DEF_BUFFERS, GFP_KERNEL | __GFP_ACCOUNT);

GFP_KERNEL | __GFP_ACCOUNT are used really often, that's why I
introduced GFP_KERNEL_ACCOUNT.

Thanks,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/