Re: [RFC] Tux3 for review

From: Daniel Phillips
Date: Thu Jun 19 2014 - 17:58:38 EST


On Thursday, June 19, 2014 2:26:48 AM PDT, LukÃÅ Czerner wrote:
On Thu, 19 Jun 2014, Pavel Machek wrote:

Date: Thu, 19 Jun 2014 10:21:29 +0200
From: Pavel Machek <pavel@xxxxxx>
To: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
Cc: Daniel Phillips <daniel@xxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>,
linux-kernel@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx,
...

Flamed ? really ?

Yes, really. There were valid points and there were also unabashed flames. The latter are not helpful to anybody, even the flamer. But note that there were no counter flames. The boy scout rule applies: always leave your campsite cleaner than you found it.

Dave pointed out some serious coding style problems.
Those should be very easy to fix.

One needs to be careful about the definition of "fix" so that it does not turn into "throw the baby out with the bath water". Our kernel code necessarily has a few __KERNEL__ #ifdefs because the majority of it also runs in user space. This not a feature to disparage, far from it.

Among other benefits, running in user space supports automated unit testing at fine granularity. We run make tests as a habit to catch a wide spectrum of correctness regressions. A successful make tests usually indicates that the heavyweight kernel stress tests are going to pass. Obviously, there are occasional exceptions to this. For example user space does not catch SMP races. In practice, only a handful of those have slipped through and required kernel level bug chasing.

That said, we will will happily merge any concrete suggestion that reduces the frequency of __KERNEL__. But please be realistic. There are 32 __KERNEL__ ifdefs in our 18K line code base. That hardly amounts to a "dog's breakfast".

Let me remind you some more important problems Dave brought up,
including page forking:

"
The hacks around VFS and MM functionality need to have demonstrated
methods for being removed.

We already removed 450 lines of core kernel workarounds from Tux3 with an approach that was literally cut and pasted from one of Dave's emails. Then Dave changed his mind. Now the Tux3 team has been assigned a research project to improve core kernel writeback instead of simply adapting the approach that is already proven to work well enough. That is a rather blatant example of "perfect is the enemy of good enough". Please read the thread.

We're not going to merge that page
forking stuff (like you were told at LSF 2013 more than a year ago:
http://lwn.net/Articles/548091/) without rigorous design review and
a demonstration of the solutions to all the hard corner cases it
has. The current code doesn't solve them (e.g. direct IO doesn't
work in tux3), and there's no clear patch set we can review that
demonstrates how it is all supposed to work. i.e. you need to
separate out all the page forking code into a separate patchset for
review, independent of the tux3 code and applies to the core mm/
code.
"

Direct IO is a spurious issue. To recap: direct IO does not introduce any new page forking issues. All of the page forking issues already exist with normal buffered IO and mmap. We have little interest and scant available time for heading off on a tangent to implement direct IO at this point just as a precondition for merging.

On the other hand, page forking itself has a number of interesting issues. Hirofumi is currently preparing a set of core kernel patches for review. These patches explicitly do not attempt to package page forking up into a nice and easy API that other filesystems could patch in tomorrow. That would be an unreasonable research burden on our small development team. Instead, we show how it works in Tux3, and if other filesystems want to get those benefits, they can make similar changes. If we (the kernel community) are lucky enough to find a pattern in it such that substantial parts of the code can be abstracted into a library, then good. But requiring such a library to be developed as a precondition to merging Tux3 is unreasonable.

"
Then there's all the writeback hacks. You've simply copy-n-pasted
most of fs-writeback.c, including duplicating structures like struct
wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
able to access core structures from kernel module context
(tux3_setup_writeback(), I'm looking at you). This is completely
unacceptible for a merge. Again, you need to separate out all the
writeback changes you need into an independent patchset so that they
can be reviewed independently of the tux3 code that uses it.
"

That was already fixed as noted above, and all the relevant changes were already posted as an independent patch set. After that, some developers weighed in with half formed ideas about how the same thing could be done better, but without concrete suggestions. There is nothing wrong with half formed ideas, except when they turn into a way of blocking forward progress. See "perfect is the enemy of good enough" above.

It is worth noting that we (the kernel community) have been thrashing away at the writeback problem for more than twenty years, and the current solution still leaves much to be desired. It is unfair to expect us, the Tux3 team, to fix that mess in a week or two, just to merge our filesystem. We prefer to adapt the existing infrastructure for now, as expressed in the currently proposed patch set. With that, we allow core to mark our inodes dirty just as it has always done, and we continue to use the usual inode writeback lists for writeback sheduling, which work just fine.

Regards,

Daniel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/