On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote:
I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops
aggregate) to focus on testing the copy_file_range() changes, but
Darrick's tests are still ongoing and have passed 40 billion ops in
aggregate over the past few days.
The reason we are running these so long is that we've seen fsx data
corruption failures after 12+ hours of runtime and hundreds of
millions of ops. Hence the testing for backported fixes will need to
replicate these test runs across multiple configurations for
multiple days before we have any confidence that we've actually
fixed the data corruptions and not introduced any new ones.
If you pull only a small subset of the fixes, the fsx will still
fail and we have no real way of actually verifying that there have
been no regression introduced by the backport. IOWs, there's a
/massive/ amount of QA needed for ensuring that these backports work
correctly.
Right now the XFS developers don't have the time or resources
available to validate stable backports are correct and regression
fre because we are focussed on ensuring the upstream fixes we've
already made (and are still writing) are solid and reliable.
Ok, that's fine, so users of XFS should wait until the 4.20 release
before relying on it? :)
I understand your reluctance to want to backport anything, but it really
feels like you are not even allowing for fixes that are "obviously
right" to be backported either, even after they pass testing. Which
isn't ok for your users.