Re: [lustre-devel] [PATCH 02/16] staging: lustre: replace simple cases of l_wait_event() with wait_event().

From: NeilBrown
Date: Tue Dec 19 2017 - 15:49:51 EST


On Tue, Dec 19 2017, Dilger, Andreas wrote:

> On Dec 18, 2017, at 11:03, Patrick Farrell <paf@xxxxxxxx> wrote:
>>
>> The wait calls in ll_statahead_thread are done in a service thread, and
>> should probably *not* contribute to load.
>>
>> The one in osc_extent_wait is perhaps tough - It is called both from user
>> threads & daemon threads depending on the situation. The effect of adding
>> that to load average could be significant for some activities, even when
>> no user threads are busy. Thoughts from other Lustre people would be
>> welcome here.
>
> The main reasons we started using l_wait_event() were:
> - it is used by the request handling threads, and wait_event() caused the
> load average to always be == number of service threads, which was
> wrong if those threads were idle waiting for requests to arrive.
> That is mostly a server problem, but a couple of request handlers are
> on the client also (DLM lock cancellation threads, etc.) that shouldn't
> contribute to load. It looks like there is a better solution for this
> today with TASK_IDLE.
> - we want the userspace threads to be interruptible if the server is not
> functional, but the client should at least get a chance to complete the
> RPC if the server is just busy. Since Lustre needs to work in systems
> with 10,000+ clients pounding a server, the server response time is not
> necessarily fast. The l_wait_event() behavior is that it blocks signals
> until the RPC timeout, which will normally succeed, but after the timeout
> the signals are unblocked and the user thread can be interrupted if the
> user wants, but it will keep waiting for the RPC to finish if not. This
> is half-way between NFS soft and hard mounts. I don't think there is an
> equivalent wait_event_* for that behaviour.

Thanks for the historical background, it can often help to understand
why the code is the way it is!

Thanks particularly for that second point. I missed it in the code, and
skimmed over the explanatory comment too quickly (I'm afraid I don't
trust comments very much).
I would implement this two-stage wait by first calling
swait_event_idle_timeout(),
then if that times out,
swait_event_interruptible()

rather than trying to combine then both into one super-macro. At least,
that is my thought before I try to write the code - maybe I'll change my
mind.

Anyway, it is clear now that this l_wait_event() series needs to be
rewritten now that I have a better understanding.

Thanks,
NeilBrown


>
> Cheers, Andreas
>
>> Similar issues for osc_object_invalidate.
>>
>> (If no one else speaks up, my vote is no contribution to load for those
>> OSC waits.)
>>
>> Otherwise this one looks good...
>>
>> On 12/18/17, 1:17 AM, "lustre-devel on behalf of NeilBrown"
>> <lustre-devel-bounces@xxxxxxxxxxxxxxxx on behalf of neilb@xxxxxxxx> wrote:
>>
>>> @@ -968,7 +964,6 @@ static int ll_statahead_thread(void *arg)
>>> int first = 0;
>>> int rc = 0;
>>> struct md_op_data *op_data;
>>> - struct l_wait_info lwi = { 0 };
>>> sai = ll_sai_get(dir);
>>> sa_thread = &sai->sai_thread;
>>> @@ -1069,12 +1064,11 @@ static int ll_statahead_thread(void *arg)
>>> /* wait for spare statahead window */
>>> do {
>>> - l_wait_event(sa_thread->t_ctl_waitq,
>>> - !sa_sent_full(sai) ||
>>> - sa_has_callback(sai) ||
>>> - !list_empty(&sai->sai_agls) ||
>>> - !thread_is_running(sa_thread),
>>> - &lwi);
>>> + wait_event(sa_thread->t_ctl_waitq,
>>> + !sa_sent_full(sai) ||
>>> + sa_has_callback(sai) ||
>>> + !list_empty(&sai->sai_agls) ||
>>> + !thread_is_running(sa_thread));
>>> sa_handle_callback(sai);
>>> spin_lock(&lli->lli_agl_lock);
>>> @@ -1128,11 +1122,10 @@ static int ll_statahead_thread(void *arg)
>>> * for file release to stop me.
>>> */
>>> while (thread_is_running(sa_thread)) {
>>> - l_wait_event(sa_thread->t_ctl_waitq,
>>> - sa_has_callback(sai) ||
>>> - !agl_list_empty(sai) ||
>>> - !thread_is_running(sa_thread),
>>> - &lwi);
>>> + wait_event(sa_thread->t_ctl_waitq,
>>> + sa_has_callback(sai) ||
>>> + !agl_list_empty(sai) ||
>>> + !thread_is_running(sa_thread));
>>> sa_handle_callback(sai);
>>> }
>>> @@ -1145,9 +1138,8 @@ static int ll_statahead_thread(void *arg)
>>> CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n",
>>> sai, (unsigned int)agl_thread->t_pid);
>>> - l_wait_event(agl_thread->t_ctl_waitq,
>>> - thread_is_stopped(agl_thread),
>>> - &lwi);
>>> + wait_event(agl_thread->t_ctl_waitq,
>>> + thread_is_stopped(agl_thread));
>>> } else {
>>> /* Set agl_thread flags anyway. */
>>> thread_set_flags(agl_thread, SVC_STOPPED);
>>> @@ -1159,8 +1151,8 @@ static int ll_statahead_thread(void *arg)
>>> */
>>> while (sai->sai_sent != sai->sai_replied) {
>>> /* in case we're not woken up, timeout wait */
>>> - lwi = LWI_TIMEOUT(msecs_to_jiffies(MSEC_PER_SEC >> 3),
>>> - NULL, NULL);
>>> + struct l_wait_info lwi = LWI_TIMEOUT(msecs_to_jiffies(MSEC_PER_SEC >>
>>> 3),
>>> + NULL, NULL);
>>> l_wait_event(sa_thread->t_ctl_waitq,
>>> sai->sai_sent == sai->sai_replied, &lwi);
>>> }
>>
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation

Attachment: signature.asc
Description: PGP signature