Re: [PATCH 7/8] NTB: perf: Fix race condition when run with ntb_test

From: Logan Gunthorpe
Date: Fri Jun 15 2018 - 16:00:50 EST




On 15/06/18 01:51 PM, Serge Semin wrote:
> On Fri, Jun 08, 2018 at 06:08:18PM -0600, Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>> When running ntb_test, the script tries to run the ntb_perf test
>> immediately after probing the modules. Since adding multi-port support,
>> this fails because the new initialization procedure in ntb_perf
>> cannot complete instantly.
>>
>> To fix this we add a completion which is waited on when a test is
>> started. In this way, run can be written any time after the module is
>> loaded and it will wait for the initialization to complete instead of
>> sending an error.
>>
>
> Hmm, this behavior is a feature of the driver, not a bug or race to be
> fixed. The ntb_perf driver returns -ENOLINK until the link is actually
> established and the memory windows are properly initialized, so that the
> test can be performed. What do you think of leaving the algorithm as is
> and instead adding a polling scheme to the ntb_test.sh script that breaks
> the script execution if the link isn't established after some time? At
> least we won't need to wait forever if the peer hangs or crashes while
> the NTB link negotiation is in progress.

I think polling is really ugly and doesn't really solve the issue of
waiting forever. With the completion, it's pretty easy to interrupt out
of the wait, and a blocked write gives a much better clue to what's
going on than an error.

If we want to be more explicit, it would be pretty easy to start a timer
in the bash script and use SIGALRM to exit if the link doesn't come up
after 30 seconds or something.
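
A rough sketch of such a timer, for illustration only: the helper name
run_with_timeout is hypothetical and is not part of ntb_test.sh. It
assumes the wait in the driver is interruptible. Rather than signalling
the whole script, this variant sends SIGALRM to the blocked command and
reports the failure through its exit status.

```shell
#!/bin/bash
# Hypothetical helper (not part of ntb_test.sh): run a command that may
# block, and send it SIGALRM if it has not finished after N seconds.

run_with_timeout() {
    local secs=$1; shift

    # Start the possibly blocking command in the background.
    "$@" &
    local cmd_pid=$!

    # Watchdog: sleep, then deliver SIGALRM to the command. The TERM trap
    # lets us cancel the watchdog without leaving a stray sleep behind.
    (
        trap 'kill "$sleep_pid" 2>/dev/null; exit 0' TERM
        sleep "$secs" &
        sleep_pid=$!
        wait "$sleep_pid"
        kill -ALRM "$cmd_pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    local timer_pid=$!

    # Collect the command's status; a process killed by a signal is
    # reported by bash as 128 + the signal number.
    local status=0
    wait "$cmd_pid" || status=$?

    # Cancel the watchdog if it is still pending.
    kill "$timer_pid" 2>/dev/null
    wait "$timer_pid" 2>/dev/null || true

    return "$status"
}

# Example use, with RUN_FILE as a placeholder for the ntb_perf run file:
#   run_with_timeout 30 sh -c 'echo 0 > "$1"' _ "$RUN_FILE" ||
#       echo "link did not come up in time" >&2
```

If the command is killed by the alarm, the helper returns 128 plus the
SIGALRM number, so the caller can tell a timeout apart from an ordinary
failure of the command itself.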

Logan