Re: [PATCH 04/16] mm: allow PF_MEMALLOC from softirq context
From: Mel Gorman
Date: Tue Jul 10 2012 - 07:09:34 EST

On Mon, Jul 09, 2012 at 06:57:10PM +0200, Sebastian Andrzej Siewior wrote:
> On Mon, Jul 09, 2012 at 11:04:42AM +0100, Mel Gorman wrote:
> > > - let's assume your allocation happens with kmalloc() without __GFP_MEMALLOC
> > > and current->flags has PF_MEMALLOC ORed in and your SLAB pool is empty. This
> > > forces SLAB to allocate more pages from the buddy allocator, which it will
> > > more likely receive (due to PF_MEMALLOC in current->flags), but SLAB will
> > > drop this extra memory because the page has ->pf_memory (or something like
> > > that) set and the GFP_FLAGS do not have __GFP_MEMALLOC set.
> > >
> >
> > It's recorded if the slab page was allocated from PFMEMALLOC reserves (see
> > patch 2 from the swap over NBD series). slab will use this page for objects
> > but only allocate them to callers that pass a gfp_pfmemalloc_allowed() check.
> > kmalloc() users with either __GFP_MEMALLOC or PF_MEMALLOC will get
> > the pages they need but they will not "leak" to !__GFP_MEMALLOC users as
> > that would potentially deadlock.
>
> Argh, I missed that gfp_to_alloc_flags() is not only called from
> within the buddy allocator but also from slab. So this is fine then :)
>
Good to hear. I appreciate you taking the time to give it a solid review
like this looking for holes.
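
As a rough illustration of the check discussed above, here is a simplified
user-space model in C. It is not the kernel code: the MODEL_* flag values,
struct model_ctx and the model_* function names are made up for this sketch,
and the real gfp_to_alloc_flags() looks at more state than is modelled here.

```c
#include <stdbool.h>

/* Hypothetical values for this model only; the kernel's real bits differ. */
#define MODEL_GFP_MEMALLOC   0x1u /* stands in for __GFP_MEMALLOC */
#define MODEL_GFP_NOMEMALLOC 0x2u /* stands in for __GFP_NOMEMALLOC */
#define MODEL_PF_MEMALLOC    0x1u /* stands in for PF_MEMALLOC */

/* Hypothetical snapshot of the calling context. */
struct model_ctx {
	unsigned int task_flags; /* models current->flags */
	bool in_hardirq;         /* true when called from hard IRQ context */
};

/* Models the decision a gfp_pfmemalloc_allowed()-style check makes. */
static bool model_pfmemalloc_allowed(unsigned int gfp,
				     const struct model_ctx *ctx)
{
	if (gfp & MODEL_GFP_NOMEMALLOC)
		return false; /* caller explicitly refuses the reserves */
	if (gfp & MODEL_GFP_MEMALLOC)
		return true;  /* caller explicitly asks for the reserves */
	if (ctx->in_hardirq)
		return false; /* task flags are not honoured here */
	return (ctx->task_flags & MODEL_PF_MEMALLOC) != 0;
}

/*
 * Models the slab side: a page that came from the reserves may only back
 * objects handed to callers that pass the check above.
 */
static bool model_slab_may_use_page(bool page_pfmemalloc, unsigned int gfp,
				    const struct model_ctx *ctx)
{
	if (!page_pfmemalloc)
		return true;
	return model_pfmemalloc_allowed(gfp, ctx);
}
```

In this model a plain kmalloc() caller with neither flag is refused objects
from a reserve-backed page, which is the "no leak" property described above.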

> One thing:
> You only get current->flags |= PF_MEMALLOC in softirq _if_ the skb, which is
> passed to netif_receive_skb(), was allocated with __GFP_MEMALLOC. That
> means if the NIC's RX allocation did not require an allocation from the
> emergency pool (without ->pfmemalloc set) then you never use this extra
> pool, even if this skb would end up in your swap socket. Also, the other way
> around, where you allocate it from the emergency pool but it is a user
> socket and you could drop it.
>
While there is a possibility that packets may get dropped later like this,
they still get retransmitted and eventually get through. This is not
optimal, but optimised swap-over-network was not the primary goal of the
series; deadlock avoidance was.
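
The two cases raised above can be written down as a small user-space model
in C. All names here (struct model_skb, struct model_sock, the model_*
functions) are hypothetical stand-ins chosen for this sketch; it only mirrors
the behaviour described in this thread, not the actual receive path.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two pieces of state discussed above. */
struct model_skb {
	bool pfmemalloc; /* skb data came from the emergency reserves */
};

struct model_sock {
	bool memalloc;   /* socket was marked via sk_set_memalloc() */
};

enum model_verdict { MODEL_DELIVER, MODEL_DROP };

/*
 * Softirq side: PF_MEMALLOC is only raised when the skb itself was
 * allocated from the reserves, whatever socket it is destined for.
 */
static bool model_rx_raises_pf_memalloc(const struct model_skb *skb)
{
	return skb->pfmemalloc;
}

/*
 * Socket side: a reserve-backed skb is dropped unless the receiving socket
 * is a swap socket. The sender is expected to retransmit.
 */
static enum model_verdict model_sock_filter(const struct model_skb *skb,
					    const struct model_sock *sk)
{
	if (skb->pfmemalloc && !sk->memalloc)
		return MODEL_DROP;
	return MODEL_DELIVER;
}
```

A normally allocated skb headed for a swap socket is delivered but never
raises PF_MEMALLOC in this model, and a reserve-backed skb headed for a user
socket is dropped.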

> What about extending sk_set_memalloc() to record the socket's IPs + ports
> in a separate list so that skb_pfmemalloc_protocol() might use that
> information and decide on a per-protocol basis if the skb is worth
> spending more resources on to deliver it. That means you would enable the
> extra pool if the currently received skb is part of your swap socket and
> not if the skb was allocated from the emergency pool.
>
> That said, there is nothing wrong with the code as of now and this
> optimization could be added later (if at all).
>
I think it is a good idea but it could also be done later, iff a user had
a serious performance problem and this made a measurable difference. The
series is already quite complex and I'd rather not add to that complexity
without strong motivation.
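
For reference, the suggested lookup might look roughly like the following
user-space sketch in C. Nothing like this exists in the series: the
fixed-size table, struct model_endpoint and both model_* functions are
assumptions made purely to illustrate the idea of matching a received packet
against recorded swap-socket addresses and ports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one swap socket's addresses and ports. */
struct model_endpoint {
	uint32_t saddr, daddr; /* IPv4 addresses */
	uint16_t sport, dport; /* TCP/UDP ports */
};

#define MODEL_MAX_ENDPOINTS 8

/* Hypothetical "separate list", kept as a small fixed-size table. */
static struct model_endpoint model_table[MODEL_MAX_ENDPOINTS];
static size_t model_count;

/* What an extended sk_set_memalloc() could do: remember the endpoint. */
static bool model_register_swap_socket(const struct model_endpoint *ep)
{
	if (model_count == MODEL_MAX_ENDPOINTS)
		return false; /* table full */
	model_table[model_count++] = *ep;
	return true;
}

/* What the receive path could ask: does this packet belong to swap? */
static bool model_is_swap_traffic(const struct model_endpoint *pkt)
{
	for (size_t i = 0; i < model_count; i++) {
		const struct model_endpoint *ep = &model_table[i];

		if (ep->saddr == pkt->saddr && ep->daddr == pkt->daddr &&
		    ep->sport == pkt->sport && ep->dport == pkt->dport)
			return true;
	}
	return false;
}
```

A linear scan keeps the sketch short; a real implementation would also need
locking and removal when the socket is closed, which is part of the extra
complexity mentioned above.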
--
Mel Gorman
SUSE Labs