Hello,

If it's a single-core system and the kernel is configured with PREEMPT_NONE, I could see that being possible, but I think Debian (assuming you're using Debian, because the /proc/version info you posted in the original e-mail indicated the kernel was a Debian build) builds with PREEMPT_VOLUNTARY by default. I have little knowledge of the internal workings of the kernel's NFS client implementation, so I'm Cc'ing the associated mailing list.
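For what it's worth, you can check which preemption model a kernel was built with from its shipped config; the /boot path below is the Debian convention and is an assumption about your setup:

```shell
# Show which preemption model the running kernel was built with.
# Debian ships the build config under /boot (other distros may expose
# /proc/config.gz instead); fall back gracefully if it is missing.
cfg="/boot/config-$(uname -r)"
[ -r "$cfg" ] && grep -E '^CONFIG_PREEMPT' "$cfg" || echo "no config at $cfg"
```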
Okay, I was wrong about FUSE and NFS; thanks for the hint.
About the problem:
Without digging deep into the kernel sources, your explanation is
more or less what I was thinking is happening.
Anyway, the reason I'm reporting the problem is that during these 120
seconds (until the kernel resolves the issue by killing (?) the
process) the system is unusable.
What I mean by that:
It's not even possible to ssh into the server, even though /root and /home
are local and should not be affected by the slow NFS server.
It also seems that during this period a lot of network connections drop or freeze (?).
You're completely right when you say there's no other way / it's by design
to wait for the NFS response. But from my point of view this 'wait' is
happening at the wrong level. If I'm not mistaken, the current
implementation blocks/hangs tasks in kernel space, or at least blocks
the scheduler during this period.
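If the indefinite wait is the main pain point, a 'soft' mount is the usual (if risky) alternative to 'hard'; the server name and export path below are placeholders, and the timeo/retrans values are just example settings:

```shell
# With 'soft', a non-responding server makes the NFS client return an
# error (EIO) to the application once timeo/retrans is exhausted,
# instead of retrying forever as 'hard' does. This trades the hang for
# possible data loss on interrupted writes, so use it with care.
mount -t nfs -o vers=3,soft,timeo=100,retrans=3,tcp server:/export /dest
```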
2015-10-01 16:24 GMT+02:00 Austin S Hemmelgarn <ahferroin7@xxxxxxxxx>:
On 2015-10-01 09:06, sascha a. wrote:
Hello,
I want to report a bug with NFS / FuseFS.
There's trouble when mounting an NFS filesystem with FuseFS if the NFS
server is responding slowly.
The problem occurs if you mount an NFS FS with the FuseFS driver, for
example with this command:
mount -t nfs -o vers=3,nfsvers=3,hard,intr,tcp server /dest
Working on this NFS mount works like a charm as long as the NFS
server is not under heavy load. If it comes under HEAVY load, from time
to time the kernel hangs (which, in my opinion, should never
happen).
OK, before I start on an explanation of what is happening and why,
I should note that unless you're using some special FUSE driver instead of
the regular NFS tools, you're not using FUSE to mount the NFS share; you're
using a regular kernel driver.
Now, on to the explanation:
This behavior is expected and unavoidable for any network filesystem under
the described conditions. Sync (or any other command that causes access to
the filesystem that isn't served by the local cache) requires sending a
command to the server. Sync in particular is _synchronous_ (and it should
be, otherwise you break the implied data safety from using it), which means
that it will wait until it gets a reply from the server before it returns,
which means that if the server is heavily loaded (or just ridiculously
slow), it will be a while before it returns. On top of this, depending on
how the server is caching data, it may take a long time to return even on a
really fast server with no other load.
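You can see this blocking directly by timing a sync; nothing here is specific to your setup, it is just the generic observation:

```shell
# 'time sync' shows how long the kernel takes to flush all dirty data.
# With an overloaded NFS server holding dirty pages, the real time here
# grows to match however long the server takes to acknowledge the writes.
time sync
```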
The stacktrace you posted indicates simply that the kernel noticed that
'sync' was in an I/O sleep state (the 'D state' it refers to) for more than
120 seconds, which is the default detection timeout for this.
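That 120-second threshold comes from the kernel's hung-task watchdog and is tunable at runtime; note this only changes when the warning is printed, not how long the task actually waits:

```shell
# Read the current hung-task detection timeout (seconds). Writing 0
# disables the warning entirely; changing it requires root, e.g.:
#   echo 300 > /proc/sys/kernel/hung_task_timeout_secs
f=/proc/sys/kernel/hung_task_timeout_secs
[ -r "$f" ] && cat "$f" || echo "hung-task detector not built in"
```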