Re: [BUG BISECT] NFSv4 client fails on Flush Journal to Persistent Storage

From: Krzysztof Kozlowski
Date: Fri Jun 15 2018 - 10:07:09 EST


On Fri, Jun 15, 2018 at 2:53 PM, Sudeep Holla <sudeep.holla@xxxxxxx> wrote:
> Hi,
>
> On Thu, Jun 7, 2018 at 12:19 PM, Krzysztof Kozlowski <krzk@xxxxxxxxxx> wrote:
>> Hi,
>>
>> When booting my boards under recent linux-next, I see failures of systemd:
>>
>> [FAILED] Failed to start Flush Journal to Persistent Storage.
>> See 'systemctl status systemd-journal-flush.service' for details.
>> Starting Create Volatile Files and Directories...
>> [** ] A start job is running for Create V… [ 223.209289] nfs:
>> server 192.168.1.10 not responding, still trying
>> [ 223.209377] nfs: server 192.168.1.10 not responding, still trying
>>
>> Effectively, the boards fail to boot. An example is here:
>> https://krzk.eu/#/builders/1/builds/2157
>>
>
> I too encountered the same issue.
>
>> This was bisected to:
>> commit 37ac86c3a76c113619b7d9afe0251bbfc04cb80a
>> Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
>> Date: Fri May 4 15:34:53 2018 -0400
>>
>> SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock
>>
>> alloc_slot is a transport-specific op, but initializing an rpc_rqst
>> is common to all transports. In addition, the only part of initial-
>> izing an rpc_rqst that needs serialization is getting a fresh XID.
>>
>> Move rpc_rqst initialization to common code in preparation for
>> adding a transport-specific alloc_slot to xprtrdma.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>> Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
>>
>
> Unfortunately, I spent time bisecting independently without seeing this
> report and got the same culprit.
>
>>
>> Bisect log attached. Full configuration:
>> 1. exynos_defconfig
>> 2. ARMv7, octa-core, Exynos5422 and Exynos4412 (Odroid XU3, U3 and others)
>> 3. NFSv4 client (from Raspberry Pi)
>>
>
> Yes, the issue is seen only with the NFSv4 client and, I think, with the
> latest systemd. My Ubuntu 16.04 (32-bit FS) boots fine while 18.04 has the
> above issue. Passing nfsv3 on the kernel command line makes it work again.

Thanks for the reply!

I tested it on systemd versions 236 and 238... and it fails on both.
However, one board always passes: the Odroid HC1, with the same core
configuration as described before. It probably has a different software
package installed.

>> Let me know if you need any more information.
>>
>
> Also, I was observing this issue with Linus' master branch from
> the time the above patch was merged until today. The issue
> is no longer seen since this morning; however, I just enabled lockdep
> and got these messages.

All recent linux-next trees fail. Today's Linus' tree (4c5e8fc62d6a ("Merge
tag 'linux-kselftest-4.18-rc1-2' of
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest"))
managed to boot on one board but got stuck on a different board with the
same issue.

I am quite surprised that there has been no response from the author of the
commit and that this was just moved from next (while failing) to Linus'
tree... bringing the issue to mainline now.

Best regards,
Krzysztof