Re: [PATCH 0/2] memcgroup: work better with tmpfs
From: Balbir Singh
Date: Thu Dec 20 2007 - 13:14:42 EST
Hugh Dickins wrote:
> On Wed, 19 Dec 2007, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>
>> We always call mem_cgroup_isolate_pages() from shrink_(in)active_pages
>> under spin_lock_irq of the zone's lru lock. That's the reason that we
>> don't explicitly use it in the routine.
>
> Indeed, thanks.
>
>>> 2. There's mem_cgroup_charge and mem_cgroup_cache_charge (wouldn't the
>>> former be better called mem_cgroup_charge_mapped? why does the latter
>> Yes, it would be. After we've refactored the code, the new name makes sense.
>>
>>> test MEM_CGROUP_TYPE_ALL instead of MEM_CGROUP_TYPE_CACHED? I still don't
>>> understand your enums there).
>> We do that to ensure that we charge page cache pages only when the
>> accounting type is set to MEM_CGROUP_TYPE_ALL. If the type is anything
>> else, we ignore cached pages, we did not have MEM_CGROUP_TYPE_CACHED
>> initially when the patches went in.
>
> I think you've given yourself far too many degrees of freedom here.
>
> Please explain to me how the system is supposed to behave when I
> echo -n 2 >/cg/0/memory.control_type
>
> From the name in the enum (MEM_CGROUP_TYPE_CACHED, doesn't even have
> an "= 2", yet this is a part of the API?), I think it's supposed to
> account pages into and out of the page cache, but not as they're
> mapped into and out of userspace. But from the code, no references
> whatever to MEM_CGROUP_TYPE_CACHED or MEM_CGROUP_TYPE_MAPPED,
> it looks as if it'll behave just like MEM_CGROUP_TYPE_MAPPED
> (which we hope = 1).
>
> As you say, you didn't have MEM_CGROUP_TYPE_CACHED originally:
> it looks like it got added to the straightforward case of MAPPED,
> in an apparently flexible but not fully thought out way.
>
Yes, I think your argument makes sense. I had initially thought of
making the enums a mask
MEM_CGROUP_TYPE_MAPPED = 0x1,
MEM_CGROUP_TYPE_CACHED = 0x2,
MEM_CGROUP_TYPE_ALL = 0x3
and check for control_type & MEM_CGROUP_TYPE_CACHED
>> But there's only mem_cgroup_uncharge.
>>> So when, for example, an add_to_page_cache fails, the uncharge may not
>>> balance the charge?
>>>
>> We use mem_cgroup_uncharge() everywhere. The reason being, we might
>> switch control type, we uncharge pages that have a page_cgroup
>> associated with them, hence once we;ve charged, uncharge does not
>> distinguish between charge types.
>
> Ah, so this is the meaning of that
> /*
> * This can handle cases when a page is not charged at all and we
> * are switching between handling the control_type.
> */
> if (!pc)
> return;
>
> if (atomic_dec_and_test(&pc->ref_cnt)) {
> at the beginning of mem_cgroup_uncharge.
>
> Sorry, that doesn't fly. You've no locking between acquiring pc from
> the page->page_cgroup, testing pc here, and decrementing its ref_cnt.
> When that ref_cnt goes down to zero, clear_page_cgroup may NULLify
> page->page_cgroup and kfree the pc.
>
Yes, we need to lock the page group.
> So if you cannot really keep track of the ref_cnt (because of
> changing control_type), that atomic_dec_and_test is in danger
> of corrupting someone else's memory - isn't it?
>
Yes, my bad :(
> I can just about imagine that some admins will want control_type
> MAPPED and others CACHED and others MAPPED+CACHED. But is there
> actually a need for one cgroup to be controlling MAPPED while
> another on the same machine is controlling MAPPED+CACHED? And
> does that make sense - there'll be weirdnesses, won't there?
> And is there actually a need to change a cgroup's control_type
> on the fly while it's alive?
>
> Of course it's nice to be flexible and allow for such possibilities;
> but not if that just opens windows for corruption. My own view is
> that at present you should just account mapped and cached for all,
> and strip out these extra degrees of freedom: add them back in some
> future when the locking is worked out.
>
I am going to rip out the control_type interface for now. The locking
needs fixing irrespective of control_type. I am sending out a patch to
rip out control_type.
> Hugh
Thanks for your detailed comments,
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/