Re: [PATCH v2 0/6] alloc_tag: introduce IOCTL-based filtering for MAP

From: Abhishek Bapat

Date: Thu Jun 04 2026 - 14:28:38 EST


On Wed, Jun 3, 2026 at 12:51 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Mon, May 25, 2026 at 12:33 AM Hao Ge <hao.ge@xxxxxxxxx> wrote:
> >
> > Hi Andrew and Suren
> >
> >
> > On 2026/5/23 04:11, Andrew Morton wrote:
> > > On Fri, 22 May 2026 17:45:32 +0000 Abhishek Bapat <abhishekbapat@xxxxxxxxxx> wrote:
> > >
> > >> Currently, memory allocation profiling data is primarily exposed through
> > >> /proc/allocinfo. While useful for manual inspection, this text-based
> > >> interface poses challenges for production monitoring and large-scale
> > >> analysis:
> > >>
> > >> 1. Userspace must parse large amounts of text to extract specific
> > >> fields.
> > >> 2. To find specific tags, userspace must read the entire dataset,
> > >> requiring many context switches and a large amount of data copying.
> > >> 3. The kernel currently aggregates per-CPU counters for every allocation
> > >> size, even those the user intends to filter out immediately.
> > >>
> > >> This series introduces a new IOCTL-based binary interface for allocinfo
> > >> that supports kernel-side filtering. By allowing the user to specify a
> > >> filter mask, we significantly reduce the work performed in-kernel and
> > >> the amount of data transferred to userspace.
> > >>
> > >> Performance measurements were conducted on an Intel Xeon Platinum 8481C
> > >> (224 CPUs) with caches dropped before each run.
> > >>
> > >> The IOCTL mechanism shows a ~20x performance improvement for
> > >> filtered queries. The kernel avoids the expensive per-CPU counter
> > >> aggregation (alloc_tag_read) for any tags that fail the initial string
> > >> or location filters.
> > >>
> > >> Scenario 1: Specific File Filtering (arch/x86/events/rapl.c)
> > >> 1. Traditional (cat /proc/allocinfo | grep): 22ms (sys)
> > >> 2. IOCTL Interface: 1ms (sys)
> > >>
> > >> Scenario 2: Compound Filtering (Filename + Size)
> > >> 1. Traditional (cat ... | grep | awk): 21ms (sys)
> > >> 2. IOCTL Interface: 1ms (sys)
> > >>
> > >> Scenario 3: Size-Based Filtering (min_size = 1MB)
> > >> 1. Traditional (cat ... | awk): 21ms (sys)
> > >> 2. IOCTL Interface: 14ms (sys)
> > > Yup, textual interfaces aren't fast.
> > >
> > > And ioctl-based interfaces aren't popular. One would prefer to see an
> > > interface which uses read()/lseek(), pread(), etc. It would be
> > > appropriate for this [0/N] to have a discussion of why that approach
> > > was not chosen.
> > >
> > >> .../userspace-api/ioctl/ioctl-number.rst | 2 +
> > >> MAINTAINERS | 2 +
> > >> include/linux/codetag.h | 1 +
> > >> include/uapi/linux/alloc_tag.h | 87 +++
> > >> lib/alloc_tag.c | 303 ++++++++++-
> > >> lib/codetag.c | 11 +
> > >> tools/testing/selftests/alloc_tag/Makefile | 9 +
> > >> .../alloc_tag/allocinfo_ioctl_test.c | 505 ++++++++++++++++++
> > >> 8 files changed, 918 insertions(+), 2 deletions(-)
> > >> create mode 100644 include/uapi/linux/alloc_tag.h
> > >> create mode 100644 tools/testing/selftests/alloc_tag/Makefile
> > >> create mode 100644 tools/testing/selftests/alloc_tag/allocinfo_ioctl_test.c
> > > At some point this should grow user-facing documentation, please.
> > >
> > > And the right time for that is now, because such documentation is
> > > useful for code review - it makes that review both easier and more
> > > useful.
> > >
> > > Sashiko had a few things to say:
> > >
> > > https://sashiko.dev/#/patchset/cover.1779471082.git.abhishekbapat@xxxxxxxxxx
> >
> > I notice that Sashiko has reported a pre-existing issue, as described below:
> >
> >
> > > static void *allocinfo_start(struct seq_file *m, loff_t *pos)
> > This is a pre-existing issue, but can resuming a sequential read on
> > /proc/allocinfo cause a use-after-free if a kernel module is unloaded
> > between read() system calls?
> > The seq_file read operation updates priv->iter.ct during allocinfo_next(),
> > stops iteration, and returns to userspace. If the module containing
> > priv->iter.ct is unloaded while the lock is dropped, the module's codetag
> > memory is freed.
> > On the next read() system call, allocinfo_start() with pos > 0 reacquires
> > the lock but returns priv without validating if priv->iter.ct still belongs
> > to a valid module. Does allocinfo_show() then dereference this dangling
> > pointer?
> > [ ... ]
> >
> > This issue is unrelated to the current patch series and can be resolved
> > by reverting commit 9f44df50fee4.
> >
> > Therefore, I have submitted a separate patch addressing this issue,
> > which is available at the link below:
> >
> > https://lore.kernel.org/all/20260525072117.112779-1-hao.ge@xxxxxxxxx/
>
> Thanks Hao! I commented on your patch, please take a look. I think
> there is a better fix.
>
> >
> > Thanks
> >
> > Best Regards
> >
> > Hao
> >

All, just wanted to acknowledge that I've gone through the comments
and will be sending out a v3 patchset addressing them. Thanks for the
reviews!
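
For context, the "Traditional" path in Scenario 2 of the cover letter
(filename + size) amounts to roughly the following in userspace. This is
only an illustrative sketch, not code from the series: it assumes the
/proc/allocinfo text layout of "<size> <calls> <file>:<line> func:<name>"
per line, and the names TagRecord, parse_allocinfo, and filter_tags, as
well as the sample rows, are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class TagRecord(NamedTuple):
    size: int      # bytes currently allocated at this call site
    calls: int     # number of live allocations
    location: str  # "path/to/file.c:LINE"
    rest: str      # remaining tag info, e.g. "func:name"


def parse_allocinfo(lines: Iterable[str]) -> Iterator[TagRecord]:
    """Parse allocinfo-style text; header lines and malformed lines are skipped."""
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        rest = fields[3].strip() if len(fields) > 3 else ""
        yield TagRecord(int(fields[0]), int(fields[1]), fields[2], rest)


def filter_tags(records: Iterable[TagRecord],
                filename: Optional[str] = None,
                min_size: int = 0) -> Iterator[TagRecord]:
    """Apply the filters only after every record has been read and parsed."""
    for rec in records:
        path = rec.location.rsplit(":", 1)[0]
        if filename is not None and path != filename:
            continue
        if rec.size < min_size:
            continue
        yield rec


# Made-up rows in the assumed layout; real output has thousands of lines.
SAMPLE = """\
allocinfo - version: 1.0
#     <size>  <calls> <tag info>
           0        0 init/main.c:1314 func:do_initcalls
        4096        1 arch/x86/events/rapl.c:681 func:init_rapl_pmus
     2097152        2 mm/slub.c:2492 func:alloc_slab_page
"""

if __name__ == "__main__":
    recs = parse_allocinfo(SAMPLE.splitlines())
    for rec in filter_tags(recs, filename="arch/x86/events/rapl.c"):
        print(rec.size, rec.calls, rec.location, rec.rest)
```

Note that every line is read and parsed before any filter runs, which
is the cost the cover letter attributes to the text interface.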