Re: [PATCH v12 00/31] Speculative page faults
From: Matthew Wilcox
Date: Tue Apr 23 2019 - 08:42:23 EST
On Tue, Apr 23, 2019 at 12:47:07PM +0200, Michal Hocko wrote:
> On Mon 22-04-19 14:29:16, Michel Lespinasse wrote:
> [...]
> > I want to add a note about mmap_sem. In the past there has been
> > discussions about replacing it with an interval lock, but these never
> > went anywhere because, mostly, of the fact that such mechanisms were
> > too expensive to use in the page fault path. I think adding the spf
> > mechanism would invite us to revisit this issue - interval locks may
> > be a great way to avoid blocking between unrelated mmap_sem writers
> > (for example, do not delay stack creation for new threads while a
> > large mmap or munmap may be going on), and probably also to handle
> > mmap_sem readers that can't easily use the spf mechanism (for example,
> > gup callers which make use of the returned vmas). But again that is a
> > separate topic to explore which doesn't have to get resolved before
> > spf goes in.
>
> Well, I believe we should _really_ re-evaluate the range locking sooner
> rather than later. Why? Because it looks like the most straightforward
> approach to the mmap_sem contention for most usecases I have heard of
> (mostly a mm{unm}ap, mremap standing in the way of page faults).
> On a plus side it also makes us think about the current mmap (ab)users
> which should lead to an overall code improvements and maintainability.
Dave Chinner recently evaluated the range lock for solving a problem
in XFS and didn't like what he saw:
https://lore.kernel.org/linux-fsdevel/20190418031013.GX29573@xxxxxxxxxxxxxxxxxxx/T/#md981b32c12a2557a2dd0f79ad41d6c8df1f6f27c
I think scaling the lock needs to be tied to the actual data structure
and not have a second tree on-the-side to fake-scale the locking. Anyway,
we're going to have a session on this at LSFMM, right?
> SPF sounds like a good idea but it is a really big and intrusive surgery
> to the #PF path. And more importantly without any real world usecase
> numbers which would justify this. That being said I am not opposed to
> this change I just think it is a large hammer while we haven't seen
> attempts to tackle problems in a simpler way.
I don't think the "no real world usecase numbers" criticism is fair. Laurent quoted:
> Ebizzy:
> -------
> The test is counting the number of records per second it can manage, the
> higher is the best. I run it like this 'ebizzy -mTt <nrcpus>'. To get
> consistent result I repeated the test 100 times and measure the average
> result. The number is the record processes per second, the higher is the best.
>
>                   BASE       SPF        delta
> 24 CPUs x86       5492.69    9383.07    70.83%
> 1024 CPUS P8 VM   8476.74    17144.38   102%
and cited a 30% improvement for you-know-what product from an earlier
version of the patch.