[RFC] udftools: steps towards fsck
From: Steve Magnani
Date: Wed Mar 06 2019 - 21:44:59 EST
(Please remove at least LKML when responding. Mailing lists are a
scattershot attempt to reach others who might be interested in this
topic since I'm not aware of any linux-udf mailing list.)
A few months ago I stumbled across an interesting bit of abandonware in
the Sourceforge CVS repo that hosted UDF development through about 2004.
Code that originated here eventually became the modern-day udftools:
    https://sourceforge.net/p/linux-udf/code/
The 'udf' module in that repo contains a program from 1999 named
'chkudf', which appears to have been written by Rob Simms. Being from
the Y2K era, the program has no awareness of anything beyond UDF2.01; in
particular, its comprehension of VAT reflects UDF1.50 and not the
revamped design introduced in UDF2.00. But it does have an ability to
analyze the major UDF data structures and to walk the filesystem.
I've spent quite a bit of time enhancing and fixing bugs in this code,
with a short-term goal of being able to report damage to UDF2.01
filesystems on "hard disk" (magnetic and SSD) media. It's not quite to
the point of being release-ready, but I think the code is on the cusp of
becoming useful to others so I wanted to get some feedback on the approach.
I posted a Git port (via SVN) of the CVS repo here, including all the
changes I've made so far:
    https://github.com/smagnani/chkudf.git
If you're interested in building the code you should be able to just run
'make' within the chkudf folder. On Debian-derived systems you'll need
libblkid-dev installed in order to build.
Some questions for consideration:
* Would a udffsck limited to checking of UDF2.01 and earlier on "hard
disk" media be a sufficiently useful starting point to justify inclusion
in udftools? Obviously a tool with such limitations would have to be
particularly vigilant about ensuring that media-under-test doesn't
exceed its capabilities.
* If so, do you think the chkudf implementation could qualify? It's not
ready yet, but with an investment of some time and energy it could be
made more functionally complete and (maybe more importantly) more
user-friendly.
In part this is a question of whether the chkudf design can support
enhancements to get (eventually) to UDF2.60 and optical media support,
balanced against the many years without an open-source udffsck and not
"letting the perfect become the enemy of the good."
* For any standards-based parser it's important to have examples of as
many variations as possible (both normal and pathological) in order to
ensure that corner cases and less common features are tested properly.
Can anyone point me to any good sources of UDF data for testing? There
are always commercial DVDs and Blu-ray discs, of course, and I've
cobbled together a few special cases by hand (e.g., a filesystem with
directory cycles), but I have no examples with extended attributes or
stream data. If I could find a DVD of Mac software in a resale shop
would that help? [Side note, I've thought of enhancing chkudf to support
a tool that would store all the UDF structures of a filesystem in a
tarball that could be used to reconstitute that filesystem within a
sparse file. Since none of the file contents would be stored the
tarballs would be relatively small even if they represent terabyte-scale
filesystems.]
* Are there versions (or features) of UDF that are less important to
support than others (1.50, Strategy 4096, named streams, etc.)? I know
1.02, 2.01, and 2.50 are in wide use.
* What kinds of repairs are most important to implement? I was thinking
that regeneration of the Logical Volume Integrity Descriptor and the
unallocated space bitmap are both important and hopefully relatively
straightforward. Beyond that...recovering ICBs to "lost+found"?
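To make the bitmap-regeneration idea concrete, here is a minimal sketch in Python (not chkudf's C, and not taken from chkudf). It assumes the filesystem walk has already produced a list of in-use extents as (start_block, length) pairs. The names rebuild_free_bitmap, count_free, total_blocks, and used_extents are hypothetical. It also assumes the usual Space Bitmap convention that a 1 bit marks a free block, with block n stored in bit (n mod 8) of byte (n div 8); check that against the spec before relying on it. Clearing the padding bits in the last byte is a choice made here so the free count comes out right.

```python
def rebuild_free_bitmap(total_blocks, used_extents):
    """Build a free-space bitmap: bit = 1 means the block is free.

    total_blocks: number of logical blocks in the partition (assumed known).
    used_extents: iterable of (start_block, length) pairs found in use
                  while walking the filesystem (hypothetical input).
    """
    nbytes = (total_blocks + 7) // 8
    bitmap = bytearray(b"\xff" * nbytes)  # start with everything free

    def clear(block):
        # Mark one block as allocated (bit = 0), least significant bit first.
        bitmap[block // 8] &= ~(1 << (block % 8)) & 0xFF

    # Padding bits past the end of the partition are not real blocks.
    for block in range(total_blocks, nbytes * 8):
        clear(block)

    for start, length in used_extents:
        if start < 0 or length < 0 or start + length > total_blocks:
            raise ValueError(f"extent ({start}, {length}) is out of range")
        for block in range(start, start + length):
            clear(block)

    return bitmap


def count_free(bitmap):
    """Count the free blocks recorded in the bitmap."""
    return sum(bin(byte).count("1") for byte in bitmap)


if __name__ == "__main__":
    bm = rebuild_free_bitmap(20, [(0, 4), (10, 3)])
    print(bm.hex(), count_free(bm))
```

The free-block count that falls out of this is presumably also what a regenerated Logical Volume Integrity Descriptor would need to report, which is part of why the two repairs seem to go together.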
My 2 cents:
I didn't write this program. There are things I would have done
differently, but to this point I have tried to work within the existing
design and code style. After becoming more aware of differences between
the various UDF standards (in particular, the increase in complexity
since 2.01) and the many errata involved, I have a gut feeling that an
implementation in a language that supports inheritance might be a lot
more manageable over the long term - but it's not something I've spent a
lot of time thinking about. I've only recently become aware of
UDFclient, and haven't had time to look over its design yet. I can also
see the potential for follow-on utilities such as a filesystem resizer,
which might argue for making more of the code library-based and not so
heavy on printed output.
Bottom line...udffsck has to start somewhere, could it start with chkudf?
Thanks for reading.
------------------------------------------------------------------------
 Steven J. Magnani               "I claim this network for MARS!
 www.digidescorp.com              Earthling, return my space modulator!"
 #include <standard.disclaimer>