Re: (subset) [PATCH 22/32] vfs: inode cache conversion to hash-bl

From: Mateusz Guzik
Date: Fri Oct 27 2023 - 13:15:46 EST


On 10/23/23, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Fri, Oct 20, 2023 at 07:49:18PM +0200, Mateusz Guzik wrote:
>> On 10/20/23, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Thu, Oct 19, 2023 at 05:59:58PM +0200, Mateusz Guzik wrote:
>> >> > To be clear there is no urgency as far as I'm concerned, but I
>> >> > did run into something which is primarily bottlenecked by inode
>> >> > hash lock and looks like the above should sort it out.
>> >> >
>> >> > Looks like the patch was simply forgotten.
>> >> >
>> >> > tl;dr can this land in -next please
>> >>
>> >> In case you can't be arsed, here is something funny which may convince
>> >> you to expedite. ;)
>> >>
>> >> I did some benching by running 20 processes in parallel, each
>> >> doing stat on a tree of 1 million files (one tree per proc, 1000
>> >> dirs x 1000 files, so 20 mln inodes in total). Box had 24 cores
>> >> and 24G RAM.
>> >>
>> >> Best times:
>> >> Linux: 7.60s user 1306.90s system 1863% cpu 1:10.55 total
>> >> FreeBSD: 3.49s user 345.12s system 1983% cpu 17.573 total
>> >> OpenBSD: 5.01s user 6463.66s system 2000% cpu 5:23.42 total
>> >> DragonflyBSD: 11.73s user 1316.76s system 1023% cpu 2:09.78 total
>> >> OmniosCE: 9.17s user 516.53s system 1550% cpu 33.905 total
>> >>
>> >> NetBSD failed to complete the run, OOM-killing workers:
>> >> http://mail-index.netbsd.org/tech-kern/2023/10/19/msg029242.html
>> >> OpenBSD is shafted by a big kernel lock, so no surprise it takes a
>> >> long time.
>> >>
>> >> So what I find funny is that Linux needed more time than OmniosCE (an
>> >> Illumos variant, fork of Solaris).
>> >>
>> >> It also needed more time than FreeBSD, which is not necessarily funny
>> >> but not that great either.
>> >>
>> >> All systems were mostly busy contending on locks and in particular
>> >> Linux was almost exclusively busy waiting on inode hash lock.
>> >
>> > Did you bother to test the patch, or are you just complaining
>> > that nobody has already done the work for you?
>>
>> Why are you giving me attitude?
>
> Look in the mirror, mate.
>
> Starting off with a derogatory statement like:
>
> "In case you can't be arsed, ..."
>
> is a really good way to start a fight.
>
> I don't think anyone working on this stuff couldn't be bothered to
> get their lazy arses off their couches to get it merged. Though you
> may not have intended it that way, that's exactly what "can't be
> arsed" means.
>
> I have not asked for this code to be merged because I'm not ready to
> ask for it to be merged. I'm trying to be careful and cautious about
> changing core kernel code that every linux installation out there
> uses because I care about this code being robust and stable. That's
> the exact opposite of "can't be arsed"....
>
> Further, you have asked for code that is not ready to be merged to
> be merged without reviewing it or even testing it to see if it
> solved your reported problem. This is pretty basic stuff - it you
> want it merged, then *you also need to put effort into getting it
> merged* regardless of who wrote the code. TANSTAAFL.
>
> But you've done neither - you've just made demands and thrown
> hypocritical shade implying busy people working on complex code are
> lazy arses.
>

So I took a few days to look at this with fresh eyes and I see where
the major disconnect is coming from, although I still don't see how it
came to be or why it persists.

As I understand it, your reading is that I am demanding you carry the
hash-bl patch over the finish line, and that I'm rude about it as well.

That is not my position here though.

For starters, my opening e-mail was to Christian, not you. You are
CC'ed as the patch author. It responded to an e-mail which claimed the
patch would land in -next, which, as far as I could tell from poking
around, did not happen (and I checked it's not in master either). Since
there was no other traffic about it that I could find, I figured it was
probably forgotten. You may also notice the e-mail explicitly states:
1. I have a case which runs into inode hash being a problem
2. *there is no urgency*, I'm just asking what's up with the patch not
getting anywhere.

The follow-up, which included the statement about "being arsed", was
once more addressed to Christian, not you, and was meant tongue in
cheek.

As you may know, Illumos is mostly slow, and any serious performance
work stopped there when Oracle closed the codebase over a decade ago.
Or to put it differently, one has to be doing something really bad to
not be faster today. And there was something that bad -- the inode
hash. I found it amusing and decided to share it in addition to asking
about the patch.

So no, Dave, I'm not claiming the patch is not in because anyone is lazy.

Whether the patch is ready for reviews and whatnot is your call to
make as the author.

To repeat from my previous e-mail: the lock causes real problems in a
real-world setting, not just in microbenchmarks, but I'm in no position
to test the patch against the actual workload (only against the part I
carved out into a benchmark, where it does help -- it gets rid of the
nasty back-to-back lock acquire, first to search for the inode and
then to insert a new one).
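
For anyone unfamiliar with the pattern, here is a user-space toy in
Python, not the kernel code. InodeCache, CountingLock and run are
made-up names, and the per-bucket locking is only a rough model of what
a hash-bl style table does. It shows how the two acquisitions per cache
miss pile up on one lock when the whole table shares it, and how
per-bucket locks spread them out. It does not model what the patch
actually changes.

```python
import threading


class CountingLock:
    """A mutex that counts how many times it has been acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # updated while the lock is held
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


class InodeCache:
    """Toy inode hash; nbuckets=1 models a single table-wide lock."""

    def __init__(self, nbuckets=1):
        self.buckets = [({}, CountingLock()) for _ in range(nbuckets)]

    def get(self, ino):
        table, lock = self.buckets[hash(ino) % len(self.buckets)]
        with lock:  # first acquire: search for the inode
            found = table.get(ino)
        if found is not None:
            return found
        fresh = {"ino": ino}  # "allocate" outside the lock
        with lock:  # second acquire: recheck and insert
            return table.setdefault(ino, fresh)

    def busiest_lock(self):
        """Acquisition count of the most heavily used lock."""
        return max(lock.acquisitions for _, lock in self.buckets)


def run(nbuckets, ninodes=1000):
    """Look up ninodes distinct inodes once each, all of them misses."""
    cache = InodeCache(nbuckets)
    for ino in range(ninodes):
        cache.get(ino)
    return cache.busiest_lock()
```

With one lock, run(1) puts all 2000 acquisitions for 1000 misses on the
same lock. With run(64) the busiest lock sees 32.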

If your assessment is that more testing is needed, that makes sense
and is again your call to make. Again, I can't help with this bit
though. And if you don't think the effort is justified at the moment
(or there are other things with higher priority), so be it.

I may stick around in general, and if so I may run into you again.
With this in mind:

> Perhaps you should consider your words more carefully in future?
>

On that front, perhaps you could refrain from assuming someone is
trying to call you names or whatnot. But more importantly, if you
consider an e-mail to be rude, you can call it out instead of
escalating or responding in what you consider to be the same tone.

All that said I'm bailing from this patchset.

Cheers,
--
Mateusz Guzik <mjguzik gmail.com>