Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting


From: Pavel Begunkov

Date: Tue Jan 20 2026 - 16:45:13 EST

On 1/20/26 17:03, Jens Axboe wrote:
> On 1/20/26 5:05 AM, Pavel Begunkov wrote:
>> On 1/20/26 07:05, Yuhao Jiang wrote:
>>> ...
>>>
>>> I've been implementing the xarray-based ref tracking approach for v3.
>>> While working on it, I discovered an issue with buffer cloning.
>>>
>>> If ctx1 has two buffers sharing a huge page, ctx1->hpage_acct[page] = 2.
>>> Clone to ctx2, now both have a refcount of 2. On cleanup both hit zero
>>> and unaccount, so we double-unaccount and user->locked_vm goes negative.
>>>
>>> The per-context xarray can't coordinate across clones - each context
>>> tracks its own refcount independently. I think we either need a global
>>> xarray (shared across all contexts), or just go back to v2. What do
>>> you think?
>>
>> Jens' diff is functionally equivalent to your v1 and has
>> exactly the same problems. Global tracking won't work well.
>
> Why not? My thinking was that we just use xa_lock() for this, with
> a global xarray. It's not like register+unregister is a high frequency
> thing. And if they are, then we've got much bigger problems than the
> single lock as the runtime complexity isn't ideal.

1. There could be quite a lot of entries even for a single ring
with a realistic amount of memory. If lots of threads start up
at the same time taking the lock in a loop, it might become a choke
point for large systems. Should be even more spectacular for
some NUMA setups.

2. Most likely it'll further relax accounting (i.e. a one-way
road), and I don't believe that's the right thing. Could even
be unexpected if consolidated w/o any explicit communication
b/w rings (like buffer cloning).

3. Map keys will need to be {page, user, mm}, so I suspect the
implementation is not going to be exactly trivial either way. Maybe
some nested xarrays + something for counting middle layer entries.
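
For reference, the double-unaccount from the quoted report can be
reproduced with a minimal userspace sketch. This is not io_uring code:
struct user, struct ctx, ctx_register(), ctx_unregister() and
ctx_clone() are hypothetical stand-ins, locked_vm is counted in huge
pages rather than base pages, and the clone is assumed to copy the
per-context refcounts without charging anything, as the report
describes.

```c
#include <assert.h>
#include <string.h>

#define NR_HPAGES 4	/* hypothetical: tiny fixed huge page space */

/* Stand-in for user->locked_vm, counted in huge pages for simplicity. */
struct user {
	long locked_vm;
};

/* Stand-in for a ring context with a per-context refcount table. */
struct ctx {
	struct user *user;
	unsigned int hpage_acct[NR_HPAGES];
};

/* Register one buffer that lives in huge page 'hpage'. */
static void ctx_register(struct ctx *c, int hpage)
{
	if (c->hpage_acct[hpage]++ == 0)
		c->user->locked_vm++;	/* first ref in this ctx: charge */
}

/* Unregister one buffer that lives in huge page 'hpage'. */
static void ctx_unregister(struct ctx *c, int hpage)
{
	if (--c->hpage_acct[hpage] == 0)
		c->user->locked_vm--;	/* last ref in this ctx: uncharge */
}

/* Assumed clone behaviour: copy the refcounts, charge nothing. */
static void ctx_clone(struct ctx *dst, const struct ctx *src)
{
	dst->user = src->user;
	memcpy(dst->hpage_acct, src->hpage_acct, sizeof(dst->hpage_acct));
}

/* Two buffers on one huge page, clone, then tear down both contexts. */
static long demo_double_unaccount(void)
{
	struct user u = { 0 };
	struct ctx ctx1 = { .user = &u }, ctx2;

	ctx_register(&ctx1, 0);		/* charged once */
	ctx_register(&ctx1, 0);		/* same huge page, no new charge */
	ctx_clone(&ctx2, &ctx1);	/* ctx2 starts with refcount 2 */

	ctx_unregister(&ctx1, 0);
	ctx_unregister(&ctx1, 0);	/* ctx1 hits zero: uncharge */
	ctx_unregister(&ctx2, 0);
	ctx_unregister(&ctx2, 0);	/* ctx2 hits zero: uncharge again */

	return u.locked_vm;		/* one charge, two uncharges */
}
```

Each context drops to zero on its own, so the single charge taken at
registration is undone twice. A per-context table has no way to see the
other context's references, which is the coordination gap described in
the report.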

--
Pavel Begunkov