RE: [RFC PATCH 0/3] Implement getcpu_cache system call

From: Seymour, Shane M
Date: Mon Jan 11 2016 - 18:16:28 EST


Ignore my email - I'd overlooked one bit of code and misunderstood how it worked.

-----Original Message-----
From: linux-api-owner@xxxxxxxxxxxxxxx [mailto:linux-api-owner@xxxxxxxxxxxxxxx] On Behalf Of Seymour, Shane M
Sent: Tuesday, January 12, 2016 9:38 AM
To: Mathieu Desnoyers; Thomas Gleixner; Paul Turner; Andrew Hunter; Peter Zijlstra
Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-api@xxxxxxxxxxxxxxx; Andy Lutomirski; Andi Kleen; Dave Watson; Chris Lameter; Ingo Molnar; Ben Maurer; Steven Rostedt; Paul E. McKenney; Josh Triplett; Linus Torvalds; Andrew Morton; Russell King; Catalin Marinas; Will Deacon; Michael Kerrisk
Subject: RE: [RFC PATCH 0/3] Implement getcpu_cache system call

Hi Mathieu,

I have some concerns and suggestions for you about this.

What's to stop someone in user space from registering an arbitrarily large number of CPU number cache locations? The kernel needs to allocate memory to track each one, and every time the task migrates to a new CPU it has to update them all. Could that be used to dramatically slow down task switching or the system as a whole? Should there be an rlimit-type value or a sysctl setting to limit the number you're allowed to register per task?
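
A minimal sketch of what such a limit check might look like in the registration path - the sysctl name, the per-task counter and the entry structure below are all hypothetical, not something in the posted patches:

    /* Hypothetical sysctl: maximum getcpu_cache registrations per task. */
    static unsigned int sysctl_getcpu_cache_max = 64;

    struct getcpu_cache_entry {
            struct list_head        list;
            int32_t __user          *uaddr; /* user-space CPU number slot */
    };

    /* Called from the registration path before allocating a new entry.
     * t->getcpu_cache_count is an assumed per-task counter. */
    static int getcpu_cache_check_limit(struct task_struct *t)
    {
            if (t->getcpu_cache_count >= sysctl_getcpu_cache_max)
                    return -ENOMEM;
            t->getcpu_cache_count++;
            return 0;
    }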

If you can just register consecutive addresses each 4 bytes apart, the structure needed to track one entry in the kernel looks to be 20 or 24 bytes depending on whether the kernel is 32- or 64-bit (with kmalloc it should be 32 bytes allocated either way). So if you tie up 1GiB of memory in user space registered as CPU cache locations, you tie up more than 8GiB of memory in the kernel, and you create a huge linked list that takes a significant amount of time to traverse. You could use this as a local denial-of-service attack to soak up memory and cause a kernel OOM, because the kernel needs more memory to track each registration than user space spends creating it. There doesn't currently appear to be any upper bound on the number that can be registered.
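
Spelling out the arithmetic behind that estimate (assuming kmalloc rounds each tracking structure up to 32 bytes):

    1 GiB of user space / 4 bytes per slot  = 268,435,456 registrations
    268,435,456 entries * 32 bytes per entry ~= 8 GiB of kernel memory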

In terms of tracking what it's doing, would you consider some sysfs attribute files (or something in debugfs) that track the following? These would all be updated in the registration path, so they shouldn't be performance sensitive (a rough sketch of the debugfs side follows the list):

1) The largest number of entries any task has created in its list
2) The number of times the upper bound (assuming you implement one) has been hit, so that someone can monitor for tasks running into the limit
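
As a rough illustration only - the counter names, the directory name and the use of global atomics are all assumptions on my part, not something in the patches - the debugfs side could be as simple as:

    #include <linux/debugfs.h>
    #include <linux/atomic.h>
    #include <linux/init.h>

    /* Hypothetical global counters, bumped in the registration path. */
    static atomic_t getcpu_cache_max_entries_seen = ATOMIC_INIT(0);
    static atomic_t getcpu_cache_limit_hits = ATOMIC_INIT(0);

    static struct dentry *getcpu_cache_debugfs_dir;

    static int __init getcpu_cache_debugfs_init(void)
    {
            getcpu_cache_debugfs_dir = debugfs_create_dir("getcpu_cache", NULL);
            debugfs_create_atomic_t("max_entries_seen", 0444,
                                    getcpu_cache_debugfs_dir,
                                    &getcpu_cache_max_entries_seen);
            debugfs_create_atomic_t("limit_hits", 0444,
                                    getcpu_cache_debugfs_dir,
                                    &getcpu_cache_limit_hits);
            return 0;
    }
    late_initcall(getcpu_cache_debugfs_init);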

Assuming that something (e.g. glibc) is willing to register an entry and make it available for the life of the task, consider allowing a flag, for example GETCPU_CACHE_PERSISTENT, with GETCPU_CACHE_CMD_REGISTER, and a new command GETCPU_CACHE_CMD_GET_PERSISTENT that lets someone ask for the user space address of an entry that is guaranteed to remain registered until the task ends. If none exists they can fall back and register a new one - it allows for better reuse of an existing resource.

Log a warning if you're ever forced to remove a persistent entry or if someone attempts to unregister one (which should always fail), since that means someone in user space has broken their promise that it would be there until the task ends; persistent entries should be left to be torn down by the kernel, not unregistered from user space.

If you do this you might need to optimize the lookup so the first persistent entry is the first one found, on the assumption that callers will ask for an existing persistent entry first and only create a new one if none is present. Having this will also tend to limit the number of entries anyone needs to create, since most libraries would ask for a persistent entry first.
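
From the user space side, the pattern I have in mind would look roughly like this - the syscall number, command values, flag name and GET_PERSISTENT semantics are purely illustrative of the proposal above, not part of the posted patches:

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* All of these values are hypothetical, for illustration only. */
    #define __NR_getcpu_cache               326     /* not yet allocated */
    #define GETCPU_CACHE_CMD_REGISTER       0
    #define GETCPU_CACHE_CMD_GET_PERSISTENT 2
    #define GETCPU_CACHE_PERSISTENT         (1 << 0)

    static __thread int32_t my_cpu_cache = -1;

    /* Returns a CPU number slot that stays valid for the life of the task. */
    static int32_t *get_cpu_slot(void)
    {
            int32_t *slot = NULL;

            /* First, ask whether some other library already registered a
             * persistent slot we can share. */
            if (syscall(__NR_getcpu_cache, GETCPU_CACHE_CMD_GET_PERSISTENT,
                        &slot, 0) == 0 && slot)
                    return slot;

            /* Otherwise register our own and promise never to unregister it. */
            if (syscall(__NR_getcpu_cache, GETCPU_CACHE_CMD_REGISTER,
                        &my_cpu_cache, GETCPU_CACHE_PERSISTENT) == 0)
                    return &my_cpu_cache;

            return NULL;    /* fall back to sched_getcpu() or similar */
    }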

Thanks
Shane