Re: [PATCH 00/16] f2fs: introduce flash-friendly file system

From: Theodore Ts'o
Date: Wed Oct 10 2012 - 04:16:07 EST


On Tue, Oct 09, 2012 at 01:01:24PM +0200, Lukáš Czerner wrote:
> Do not get me wrong, I do not think it is worth to wait for vendors
> to come to their senses, but it is worth constantly reminding that
> we *need* this kind of information and those heuristics are not
> feasible in the long run anyway.

A number of us have been telling flash vendors exactly this. The
technical people do seem to understand. It's management who seem to
be primarily clueless, even though this information can be extracted
by employing timing attacks on the media. I've pointed this out
before, and the technical people agree that trying to keep this
information as a "trade secret" is pointless, stupid, and
counterproductive. Trying to get the pointy-haired bosses to
understand may take quite a while.

That being said, in many cases, it doesn't really matter. For
example, if a manufacturer has a production run of a million Android
mobile devices, (a) all of the eMMC devices will be the same (or at
least come from a handful of suppliers in the worst case), and (b) the
manufacturers *will* be able to get this information under NDA, and so
they can just feed it straight to the mkfs program. There's no need
in many cases to have mkfs burn write cycles carrying out a timing
attack on the flash device that it is formatting.
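
To illustrate how vendor-supplied geometry might be consumed once it
is handed to a formatting tool, here is a minimal sketch. The helper
names, the 4 MiB erase block, and the 4 KiB file system block are all
assumptions made for the example; none of them come from this thread
or from any particular mkfs implementation.

```python
# Hypothetical helpers: the names and the sizes used below are
# illustrative assumptions, not part of any real mkfs tool.

def blocks_per_erase_block(erase_block_bytes, fs_block_bytes=4096):
    """Return how many file system blocks fit in one erase block."""
    if erase_block_bytes <= 0 or fs_block_bytes <= 0:
        raise ValueError("sizes must be positive")
    if erase_block_bytes % fs_block_bytes != 0:
        raise ValueError("erase block is not a multiple of the fs block")
    return erase_block_bytes // fs_block_bytes


def align_up(offset_bytes, erase_block_bytes):
    """Round a byte offset up to the next erase block boundary."""
    if erase_block_bytes <= 0:
        raise ValueError("erase block size must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-offset_bytes // erase_block_bytes) * erase_block_bytes


if __name__ == "__main__":
    erase_block = 4 * 1024 * 1024  # assumed value, supplied by the vendor
    print(blocks_per_erase_block(erase_block))
    print(align_up(1_000_000, erase_block))
```

With numbers like these known up front, the formatter can place its
allocation units on erase block boundaries without probing the media.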


My concern is a different one. We shouldn't just be focusing on
sqlite performance assuming that its characteristics are fixed, to the
point where it drives file system design and benchmarking. Currently
sqlite does a lot of pointless writes at every single transaction
boundary which could be optimized if you relax the design constraint
that the database has to be in a single file --- something which is a
nice-to-have for some applications, but which really doesn't matter in
an embedded/mobile handset use case.
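
As one illustration of trading the strict single-file layout for
fewer synchronous writes, sqlite already ships a write-ahead-log
journal mode that keeps a "-wal" and a "-shm" file next to the
database. This is not necessarily the optimization meant above, only
an existing knob of the same flavor. A minimal sketch using Python's
standard sqlite3 module follows; the helper name open_wal_db is
hypothetical.

```python
import os
import sqlite3
import tempfile


def open_wal_db(path):
    """Open a database in WAL mode; return (connection, journal mode).

    Hypothetical helper for illustration. In WAL mode, commits are
    appended to a separate "-wal" file, and with synchronous=NORMAL
    sqlite does not sync on most transaction commits.
    """
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn, mode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn, mode = open_wal_db(os.path.join(tmp, "example.db"))
        conn.execute("CREATE TABLE t (x INTEGER)")
        with conn:  # commits on success
            conn.execute("INSERT INTO t VALUES (1)")
        print(mode)
        conn.close()
```

The cost is that the database is no longer one self-contained file
while it is open, which is the constraint being relaxed.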

It may very well be that f2fs is still going to be better since it is
trying to minimize the number of erase blocks that are "open" for
writing at one time. And even if eMMC devices become more
intelligent, optimizing for erase blocks is still a good thing
(although it may not result in as spectacular wins on flash devices
with more sophisticated FTLs).

However, it may also be that we'll be able to teach some existing file
system how to be more intelligent about optimizing for erase blocks
that could be made production stable faster. (I have some ideas of
how to do this for ext4.)

But the point I'm trying to drive home here is that we shouldn't
assume that the only thing we can do is to optimize the file system.
Given the amount of time it takes to test, performance tune, and gain
confidence that the file system is sound and stable (look at how long
btrfs has taken to mature), it is likely that both flash technology
and workload characteristics will change before f2fs is fully mature
--- and this is no slight on the good work Jaegeuk and his team have
done.

Long experience with file systems shows us that they are like fine
wine; they take time to mature. Whether you're talking about
ext2/3/4, btrfs, Sun's ZFS, Digital's ADVFS, IBM's JFS or GPFS etc.,
and whether you're talking about file systems developed using open
source or more traditional corporate development processes, it takes a
minimum of 3-5 years and 50-200 PYs of effort to create a fully
production-ready file system from scratch (and some of the people
whom I surveyed for the Next Generation File System task force, some
of whom had decades of experience creating and working with file
systems, thought the 50-75 Person-Year estimate was a lowball --- note
that Sun's ZFS took *seven* years to develop, even with a generously
staffed team.)

As an open source example, the NGFS task force decided to
claim, in its November 2007 report-out, that btrfs would be ready for
community distros in two years, since otherwise the managers and
other folks who control corporate budgets at the companies involved
would be scared off and decide not to fund the project. And yet here
we are in 2012, five years later, and we're just starting to see btrfs
support show up in community distros as a supported option, and I
don't think most people would claim it is ready for production use in
enterprise distros yet.

Given that, we might as well make sure we do what we can to
optimize performance up and down the storage stack --- not just at the
file system level, but also by optimizing sqlite for embedded/handset
use cases.

Regards,

- Ted