Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting

From: Rasmus Villemoes
Date: Thu Oct 07 2021 - 04:47:49 EST


On 07/10/2021 10.10, Michal Hocko wrote:
> On Wed 06-10-21 11:18:31, Suren Baghdasaryan wrote:
>> On Wed, Oct 6, 2021 at 10:58 AM Pavel Machek <pavel@xxxxxx> wrote:
> [...]
>>> That "central facility" option can be as simple as "mkdir
>>> /somewhere/sanitized_id", using inode numbers for example. You don't
>>> really need IPC.
>>
>> Hmm, so the suggestion is to have some directory which contains files
>> representing IDs, each containing the string name of the associated
>> vma? Then let's say we are creating a new VMA and want to name it. We
>> would have to scan that directory, check all files and see if any of
>> them contain the name we want to reuse the same ID.
>
> I believe Pavel meant something as simple as
> $ YOUR_FILE=$YOUR_IDS_DIR/my_string_name
> $ touch $YOUR_FILE
> $ stat -c %i $YOUR_FILE

So in terms of syscall overhead, that would be open(..., O_CREAT |
O_CLOEXEC), fstat(), close() - or, if the name normally exists already
(which I assume), one could optimistically start with a single lstat()
and only fall back to creating the file when that fails.
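
Roughly something like the sketch below. name_to_id() and its dir
argument are names I made up for illustration, and I'm assuming the ids
directory already exists and is writable by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: use the inode number of dir/name as the id for
 * name. Returns 0 on success, -1 on error with errno set.
 */
static int name_to_id(const char *dir, const char *name, ino_t *id)
{
	char path[4096];
	struct stat st;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* Optimistic fast path: one syscall if the name already exists. */
	if (lstat(path, &st) == 0) {
		*id = st.st_ino;
		return 0;
	}

	/* Slow path. No O_EXCL, so losing a creation race is harmless. */
	fd = open(path, O_CREAT | O_CLOEXEC | O_RDONLY, 0644);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	close(fd);
	*id = st.st_ino;
	return 0;
}
```

Two callers asking for the same name get the same id no matter which
one ends up creating the file.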

As for the consumer, one can't directly map an inode number to a dentry,
but whoever first creates the name->id mapping could also be responsible
for doing

  sprintf(buf, "/id/to/name/%016lx", id); symlink(name, buf);

And if one did the optimistic lstat() successfully, one would know
that someone else created the file and thus the symlink. And since the
operations are idempotent, the obvious races are irrelevant.
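
As a sketch of that symlink step (publish_id_name() and the revdir
argument are hypothetical, standing in for the "/id/to/name" directory,
which I assume already exists):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: publish the id->name mapping as a symlink
 * revdir/<id in hex> whose target is the name string itself. The link
 * is only ever read with readlink(), never followed, so the target
 * does not have to exist as a file.
 */
static int publish_id_name(const char *revdir, ino_t id, const char *name)
{
	char buf[4096];
	int n;

	n = snprintf(buf, sizeof(buf), "%s/%016lx", revdir, (unsigned long)id);
	if (n < 0 || (size_t)n >= sizeof(buf)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* EEXIST means someone else already stored the same mapping. */
	if (symlink(name, buf) < 0 && errno != EEXIST)
		return -1;
	return 0;
}
```

Treating EEXIST as success is what makes the creation race harmless:
both racers are trying to write the same name for the same id.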

Then the consumer would only need to do a readlink() to get the name.
But that would only be for presentation to a human. Internally, the
tool might as well do all its aggregation by type of anon mem in terms
of the integer id.
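
The consumer side could then be as small as this sketch (id_to_name()
and revdir are again made-up names, and I'm assuming the same
"%016lx" link naming as above):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical consumer helper: resolve an id to its name by reading
 * the symlink revdir/<id in hex>. Returns the length of the name, or
 * -1 with errno set. A name longer than len - 1 bytes is silently
 * truncated.
 */
static ssize_t id_to_name(const char *revdir, ino_t id, char *name, size_t len)
{
	char buf[4096];
	ssize_t r;
	int n;

	if (len == 0) {
		errno = EINVAL;
		return -1;
	}

	n = snprintf(buf, sizeof(buf), "%s/%016lx", revdir, (unsigned long)id);
	if (n < 0 || (size_t)n >= sizeof(buf)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* readlink() does not NUL-terminate, so leave room and do it here. */
	r = readlink(buf, name, len - 1);
	if (r < 0)
		return -1;
	name[r] = '\0';
	return r;
}
```

This only needs to run once per distinct id when printing a report;
everything before that can key on the integer.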

> YOUR_IDS_DIR can live on a tmpfs and you can even implement a policy on
> top of that (who can generate new ids, guarantee uniqueness etc...).
>
> The above is certainly not for free of course but if you really need a
> system wide consistency when using names then you need some sort of
> central authority. How you implement that is not all that important
> but I do not think we want to handle that in the kernel.

IDK. If the whole thing could be put behind a CONFIG_ knob, with _zero_
overhead when not enabled (and I'm a bit worried about all the functions
that grow an extra argument that gets passed around), I don't mind the
string interface. But I don't really have a say either way.

Rasmus