Re: [RFC PATCH 0/5] tasklist_lock fairness issues

From: Michel Lespinasse
Date: Sat Mar 09 2013 - 21:37:16 EST


On Sat, Mar 9, 2013 at 10:26 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> Hi Michel,
>
> Well. I can't say I really like this. 4/5 itself looks fine, but other
> complications do not look nice, at least in the long term. Imho, imho,
> I can be wrong.
>
> Everyone seems to agree that tasklist should die, but this series doesn't
> even try to solve the fundamental problems with this global lock.

I admit that this series does not make any progress towards removing
tasklist_lock call sites, which has been the direction so far.

However, I think that patches 1-3 are not very complex: they add
inline wrapper functions around the tasklist_lock locking calls
(sketched below), so the compiled code ends up being the same. And by
flagging which tasklist_lock call sites are taken from interrupt
context, they show exactly which places impose the current
tasklist_lock semantics on us:

- sending signals, via the send_sigio() / send_sigurg() / kill_pgrp()
  call sites;
- posix_cpu_timer_schedule();
- the sysrq debugging features.

These are only 9 call sites, so they should be easier to attack than
the full tasklist_lock removal problem. And if we could eliminate
these few, it would become trivial to make the remaining tasklist_lock
uses fair (and maybe even to remove the unfair rwlock_t
implementation), which would IMO be a significant step forward.
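
To make that concrete, the wrappers are trivial; something like the
following (the names here are mine, for illustration, and may not
match what the patches actually use):

	/*
	 * Illustrative wrappers: after inlining they generate the same
	 * code as the open-coded calls, but splitting the API makes the
	 * interrupt-context call sites easy to grep for.
	 */
	static inline void tasklist_read_lock(void)
	{
		read_lock(&tasklist_lock);	/* process context only */
	}

	/* for the few readers that may run from interrupt context */
	static inline void tasklist_read_lock_any(void)
	{
		read_lock(&tasklist_lock);
	}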

In other words, patch 5 is, in a way, cheating by trying to get some
tasklist_lock fairness benefits right now. There is a bit of
complexity to it, since it replaces tasklist_lock with a pair of
locks, one fair and one unfair. But even if we don't like patch 5,
patches 1-3 still give us a closer intermediate goal: removing much of
the tasklist_lock nastiness by eliminating only those 9 locking sites.
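
For concreteness, here is one plausible shape for that pair of locks
(a sketch in my own notation, not the actual code from patch 5):
process-context readers take the fair lock, interrupt-context readers
keep using the unfair rwlock_t, and writers take both so that they
exclude every reader.

	fair_rwlock_t tasklist_fair_lock;	/* hypothetical fair rwlock */
	rwlock_t tasklist_unfair_lock;		/* plain unfair rwlock_t */

	static inline void tasklist_write_lock_irq(void)
	{
		/* a writer must exclude both classes of readers */
		fair_write_lock(&tasklist_fair_lock);
		write_lock_irq(&tasklist_unfair_lock);
	}

	static inline void tasklist_write_unlock_irq(void)
	{
		write_unlock_irq(&tasklist_unfair_lock);
		fair_write_unlock(&tasklist_fair_lock);
	}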

> However, I agree. tasklist rework can take ages, so probably we need
> to solve at least the write-starvation problem right now. Probably this
> series is correct (I didn't read it except 4/5), but too complex.

Regarding the complexity: do you mean the complexity of implementing a
fair rwlock (I would guess not, since you said 4/5 looks fine), or do
you mean that you don't like having inline wrapper functions around
the tasklist_lock locking sites (which is what patches 1-3 add)?

> Can't we do some simple hack to avoid the user-triggered livelocks?
> Say, can't we add
>
> // Just for illustration, at least we need CONTENDED/etc
>
> void read_lock_fair(rwlock_t *lock)
> {
>         while (arch_writer_active(lock))
>                 cpu_relax();
>         read_lock(lock);
> }
>
> and then turn some of read_lock's (do_wait, setpriority, more) into
> read_lock_fair?
>
> I have to admit, I do not know how rwlock_t actually works, and it seems
> that arch_writer_active() is not trivial even on x86. __write_lock_failed()
> changes the counter back, and afaics you can't detect the writer in the
> window between addl and subl. Perhaps we can change __write_lock_failed()
> to use RW_LOCK_BIAS + HUGE_NUMBER while it spins, but I am not sure...
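
For reference, here is roughly what that x86 slow path does, rendered
in C (the real code is assembly, and I am writing this from memory, so
treat it as a sketch). It shows exactly the window you describe: while
the failed writer spins, the bias has been added back, so the waiting
writer is invisible in the counter.

	/* C-ified sketch of x86 __write_lock_failed */
	void __write_lock_failed(atomic_t *count)
	{
		do {
			/* undo the failed attempt (the "addl") */
			atomic_add(RW_LOCK_BIAS, count);
			/* the waiting writer is invisible while we spin here */
			while (atomic_read(count) != RW_LOCK_BIAS)
				cpu_relax();
			/* retry the acquisition (the "subl") */
		} while (atomic_sub_return(RW_LOCK_BIAS, count) != 0);
	}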

IMO having a separate fair rw spinlock is less trouble than trying to
build fairness on top of rwlock_t, especially if we're planning to
remove the tasklist_lock use of rwlock_t, which is currently the main
justification for rwlock_t's existence. I'm hoping we can eventually
get rid of rwlock_t entirely, so I prefer not to build on top of it
when we can easily avoid doing so.
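
For what it's worth, a fair rw spinlock does not have to be big. A
minimal ticket-based sketch (my illustration only, with memory
barriers and other kernel subtleties omitted; patch 4/5 has the real
thing):

	typedef struct {
		atomic_t next_ticket;	/* next ticket to hand out */
		atomic_t now_serving;	/* ticket allowed to proceed */
		atomic_t readers;	/* readers currently inside */
	} fair_rwlock_t;

	static inline void fair_read_lock(fair_rwlock_t *l)
	{
		int t = atomic_inc_return(&l->next_ticket) - 1;

		while (atomic_read(&l->now_serving) != t)
			cpu_relax();
		atomic_inc(&l->readers);
		/* pass the baton: consecutive readers proceed in parallel */
		atomic_inc(&l->now_serving);
	}

	static inline void fair_read_unlock(fair_rwlock_t *l)
	{
		atomic_dec(&l->readers);
	}

	static inline void fair_write_lock(fair_rwlock_t *l)
	{
		int t = atomic_inc_return(&l->next_ticket) - 1;

		while (atomic_read(&l->now_serving) != t)
			cpu_relax();
		/* our turn: hold the baton and wait for readers to drain */
		while (atomic_read(&l->readers) != 0)
			cpu_relax();
	}

	static inline void fair_write_unlock(fair_rwlock_t *l)
	{
		atomic_inc(&l->now_serving);
	}

Because readers and writers queue on the same ticket counter, a writer
that arrives behind N readers waits for exactly those N readers; later
readers queue behind the writer instead of starving it.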

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.