Re: [PATCH 1/4] net: VM deadlock avoidance framework

From: Indan Zupancic
Date: Mon Aug 28 2006 - 19:58:55 EST


On Mon, August 28, 2006 19:32, Peter Zijlstra said:
> Also, I'm really past caring what the thing
> is called ;-) But if ppl object I guess its easy enough to run yet
> another sed command over the patches.

True, same here.

>> >> You can get rid of the memalloc_reserve and vmio_request_queues variables
>> >> if you want, they aren't really needed for anything. If using them reduces
>> >> the total code size I'd keep them though.
>> >
>> > I find my version easier to read, but that might just be the way my
>> > brain works.
>>
>> Maybe true, but I believe my version is more natural in the sense that it makes
>> more clear what the code is doing. Less bookkeeping, more real work, so to speak.
>
> Ok, I'll have another look at it, perhaps my gray matter has shifted ;-)

I don't care either way, just providing an alternative. I'd compile both and see
which one is smaller.


> Ah, no accident there, I'm fully aware that there would need to be a
> spinlock in adjust_memalloc_reserve() if there were another caller.
> (I even had it there for some time) - added comment.

Good that you're aware of it. Thing is, how much sense does the split-up into
adjust_memalloc_reserve() and sk_adjust_memalloc() make at this point? Why not
merge the code of adjust_memalloc_reserve() with sk_adjust_memalloc() and only
add adjust_memalloc_reserve() when it's really needed? It saves an export.

Feedback on the 28-Aug-2006 19:24 version from
programming.kicks-ass.net/kernel-patches/vm_deadlock/current/


> +void setup_per_zone_pages_min(void)
> +{
> + static DEFINE_SPINLOCK(lock);
> + unsigned long flags;
> +
> + spin_lock_irqsave(&lock, flags);
> + __setup_per_zone_pages_min();
> + spin_unlock_irqrestore(&lock, flags);
> +}

Better to put the lock next to min_free_kbytes, both for readability and
cache behaviour. And it satisfies the "lock data, not code" mantra.


> +static inline void * emergency_rx_alloc(size_t size, gfp_t gfp_mask)
> +{
> + void * page = NULL;
> +
> + if (size > PAGE_SIZE)
> + return page;
> +
> + if (atomic_add_unless(&emergency_rx_pages_used, 1, RX_RESERVE_PAGES)) {
> + page = (void *)__get_free_page(gfp_mask);
> + if (!page) {
> + WARN_ON(1);
> + atomic_dec(&emergency_rx_pages_used);
> + }
> + }
> +
> + return page;
> +}

If you prefer to avoid cmpxchg (which is often used in atomic_add_unless
and can be expensive) then you can use something like:

static inline void * emergency_rx_alloc(size_t size, gfp_t gfp_mask)
{
void * page;

if (size > PAGE_SIZE)
return NULL;

if (atomic_inc_return(&emergency_rx_pages_used) == RX_RESERVE_PAGES)
goto out;
page = (void *)__get_free_page(gfp_mask);
if (page)
return page;
WARN_ON(1);
out:
atomic_dec(&emergency_rx_pages_used);
return NULL;
}

The tiny race should be totally harmless. Both versions are a bit big
to inline though.


> @@ -195,6 +196,86 @@ __u32 sysctl_rmem_default = SK_RMEM_MAX;
> /* Maximal space eaten by iovec or ancilliary data plus some space */
> int sysctl_optmem_max = sizeof(unsigned long)*(2*UIO_MAXIOV + 512);
>
> +static DEFINE_SPINLOCK(memalloc_lock);
> +static int memalloc_reserve;
> +static unsigned int vmio_request_queues;
> +
> +atomic_t vmio_socks;
> +atomic_t emergency_rx_pages_used;
> +EXPORT_SYMBOL_GPL(vmio_socks);

Is this export needed? It's only used in net/core/skbuff.c and net/core/sock.c,
which are compiled into one module.

> +EXPORT_SYMBOL_GPL(emergency_rx_pages_used);

Same here. It's only used by code in sock.c and skbuff.c, and no external
code calls emergency_rx_alloc(), nor emergency_rx_free().

--

I think I depleted my usefulness, there isn't much left to say for me.
It's up to the big guys to decide about the merrit of this patch.
If Evgeniy's network allocator fixes all deadlocks and also has other
advantages, then great.

IMHO:

- This patch isn't really a framework, more a minimal fix for one specific,
though important problem. But it's small and doesn't have much impact
(numbers would be nice, e.g. vmlinux/modules size before and after, and
some network benchmark results).

- If Evgeniy's network allocator is as good as it looks, then why can't it
replace the existing one? Just adding private subsystem specific memory
allocators seems wrong. I might be missing the big picture, but it looks
like memory allocator things should be at least synchronized and discussed
with Christoph Lameter and his "modular slab allocator" patch.

All in all it seems it will take a while until Evgeniy's code will be merged,
so I think applying Peter's patch soonish and removing it again the moment it
becomes unnecessary is reasonable.

Greetings,

Indan


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/