Re: [RFC v2 PATCH] mm, sl[au]b: Introduce lockless cache
From: Hyeonggon Yoo
Date: Wed Sep 22 2021 - 04:32:37 EST
Hello Matthew,

Some good news: in v3 (work in progress now) I fixed some bugs (I hate kernel panics!), and as a test I made NAPI use it. It works pretty well.
On Tue, Sep 21, 2021 at 05:17:02PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 21, 2021 at 03:42:39PM +0000, Hyeonggon Yoo wrote:
> > > > + /* slowpath */
> > > > + cache->size = kmem_cache_alloc_bulk(s, gfpflags,
> > > > + KMEM_LOCKLESS_CACHE_QUEUE_SIZE, cache->queue);
> > >
> > > Go back to the Bonwick paper and look at the magazine section again.
> > > You have to allocate _half_ the size of the queue, otherwise you get
> > > into pathological situations where you start to free and allocate
> > > every time.
> >
> > I want to ask where the idea of allocating 'half' the size of the queue came from.
> > The paper you sent does not work with a single queue (magazine). Instead,
> > it manages a pool of magazines.
> >
> > And after reading the paper, I see that managing a pool of magazines (where M is
> > a boot parameter) is a valid approach to reduce hitting the slowpath.
>
> Bonwick uses two magazines per cpu; if both are empty, one is replaced
> with a full one. If both are full, one is replaced with an empty one.
> Our slab implementation doesn't provide magazine allocation, but it does
> provide bulk allocation.
> So translating the Bonwick implementation to
> our implementation, we need to bulk-allocate or bulk-free half of the
> array at any time.
Is there a reason the number should be 'half'?
What about something like this:
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 884d3311cd8e..f32736302d53 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -455,12 +455,13 @@ void *kmem_cache_alloc_cached(struct kmem_cache *s, gfp_t gfpflags)
}
cache = get_cpu_ptr(s->cache);
- if (cache->size) /* fastpath without lock */
+ if (cache->size) /* fastpath without lock */
p = cache->queue[--cache->size];
else {
/* slowpath */
- cache->size = kmem_cache_alloc_bulk(s, gfpflags,
- KMEM_LOCKLESS_CACHE_QUEUE_SIZE, cache->queue);
+ cache->size += kmem_cache_alloc_bulk(s, gfpflags,
+ KMEM_LOCKLESS_CACHE_BATCHCOUNT,
+ cache->queue);
if (cache->size)
p = cache->queue[--cache->size];
else
@@ -491,13 +492,14 @@
cache = get_cpu_ptr(s->cache);
if (cache->size < KMEM_LOCKLESS_CACHE_QUEUE_SIZE) {
cache->queue[cache->size++] = p;
- put_cpu_ptr(s->cache);
- return ;
+ } else {
+ kmem_cache_free_bulk(s,
+ KMEM_LOCKLESS_CACHE_BATCHCOUNT,
+ &cache->queue[cache->size - KMEM_LOCKLESS_CACHE_BATCHCOUNT]);
+ cache->size -= KMEM_LOCKLESS_CACHE_BATCHCOUNT;
+ cache->queue[cache->size++] = p;
}
put_cpu_ptr(s->cache);
-
- /* Is there better way to do this? */
- kmem_cache_free(s, p);
}
EXPORT_SYMBOL(kmem_cache_free_cached);
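To convince myself why the batch size matters, I sketched the queue occupancy in userspace. This is not kernel code: sim_cache, sim_alloc, sim_free and thrash_count are made-up names, and the "slowpath" is just a counter standing in for kmem_cache_alloc_bulk()/kmem_cache_free_bulk(). With a free/free/alloc/alloc burst sitting right at the full-queue boundary, refilling/flushing the whole queue takes the slowpath twice per burst, while a half-size batch settles into the fastpath after a single flush:

```c
#define QUEUE_SIZE 16			/* stands in for KMEM_LOCKLESS_CACHE_QUEUE_SIZE */
#define BATCH	   (QUEUE_SIZE / 2)	/* stands in for KMEM_LOCKLESS_CACHE_BATCHCOUNT */

/* Only queue occupancy matters for counting slowpath hits. */
struct sim_cache {
	int size;
	int slowpaths;
};

/* Like kmem_cache_alloc_cached(): pop from the queue, refilling
 * 'refill' objects via the (simulated) slowpath when it is empty. */
static void sim_alloc(struct sim_cache *c, int refill)
{
	if (c->size == 0) {
		c->slowpaths++;
		c->size += refill;
	}
	c->size--;			/* hand out one object */
}

/* Like kmem_cache_free_cached(): push to the queue, flushing
 * 'flush' objects via the slowpath when it is full. */
static void sim_free(struct sim_cache *c, int flush)
{
	if (c->size == QUEUE_SIZE) {
		c->slowpaths++;
		c->size -= flush;
	}
	c->size++;			/* take the object back */
}

/* Repeat a free/free/alloc/alloc burst that straddles the full-queue
 * boundary; return how often the slowpath was taken. */
static int thrash_count(int refill, int flush, int iters)
{
	struct sim_cache c = { .size = QUEUE_SIZE - 1, .slowpaths = 0 };

	for (int i = 0; i < iters; i++) {
		sim_free(&c, flush);
		sim_free(&c, flush);
		sim_alloc(&c, refill);
		sim_alloc(&c, refill);
	}
	return c.slowpaths;
}
```

Running 1000 bursts: thrash_count(QUEUE_SIZE, QUEUE_SIZE, 1000) hits the slowpath 2000 times (every burst crosses both boundaries), while thrash_count(BATCH, BATCH, 1000) hits it exactly once and then stays on the fastpath. So 'half' is not magic per se, but any batch large enough to leave headroom on both sides of the boundary kills the ping-pong, which seems to be Bonwick's point.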