Re: [PATCH 17/18] fs: icache remove inode_lock

From: Nick Piggin
Date: Thu Oct 14 2010 - 23:45:00 EST


On Fri, Oct 15, 2010 at 02:30:17PM +1100, Nick Piggin wrote:
> On Fri, Oct 15, 2010 at 02:13:43PM +1100, Dave Chinner wrote:
> > You've shown it can be done, and that's great - it shows
> > us the impact of making those changes, but they need to be analysed
> > separately and treated on their own merits, not lumped with core
> > locking changes necessary for store-free path walking.
>
> Actually I didn't see anyone else object to doing this. Everybody
> else it seems acknowledges that it needs to be done, and it gets
> done naturally as a side effect of fine grained locking.

Let's just get back to this part, which seems to be the one you have
the most issues with.

You're objecting to per-zone locks and per-zone LRUs for inode and
dcache?

Well, I have told you why per-zone LRUs are needed, and I can expand
on any of the reasons if they are unclear. Per-zone locks I think come
naturally at the same time and they will expose some fs bottlenecks,
but that is simply how scalability development works.

So, do you object to per-zone LRUs in particular, or per-zone locks?
(Ie. the potentially changed reclaim pattern, or the increased
parallelism).

When you looked at this initially, you didn't understand how
reclaim works. It will not fill up a zone with inodes and then start
reclaiming all those inodes, leaving other nodes empty (unless that
is how you configure the machine, but it isn't the default). It
fills up inodes from all nodes (same as today) and it will start
reclaiming from all nodes at about the same pressure when there is
a shortage.
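
To illustrate "about the same pressure", here is a minimal user-space
sketch, not kernel code. The names (struct zone_lru, scan_target,
total_scan) and the priority shift are assumptions made up for the
example: every zone is asked to scan the same fraction of its own LRU,
so no single zone is drained while the others are left alone.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-zone inode LRU bookkeeping; illustrative names only. */
struct zone_lru {
    unsigned long nr_inodes;   /* objects currently on this zone's LRU */
};

/*
 * Number of objects to scan in one zone at a given priority.
 * A lower priority value means more pressure. Every zone sees the
 * same fraction (1 / 2^priority) of its own LRU.
 */
unsigned long scan_target(const struct zone_lru *z, unsigned int priority)
{
    /* Shifting by the full word width or more is undefined in C. */
    assert(priority < sizeof(unsigned long) * CHAR_BIT);
    return z->nr_inodes >> priority;
}

/* Total scan work across all zones for one reclaim pass. */
unsigned long total_scan(const struct zone_lru *zones, size_t nr_zones,
                         unsigned int priority)
{
    unsigned long total = 0;

    for (size_t i = 0; i < nr_zones; i++)
        total += scan_target(&zones[i], priority);
    return total;
}
```

With zones holding 4096 and 1024 inodes and a priority of 4, the
targets are 256 and 64 objects: different absolute numbers, the same
relative pressure.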

Reclaim basically approximates LRU by scanning a little from the top
of each LRU. When you have many thousands of objects, and reclaim is
a really fallible and dumb process anyway, then the perturbation of
the reclaim pattern doesn't matter much. Our zone based page reclaim
works exactly the same way.
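
The idea of scanning a little from each LRU, with one lock per zone,
can be sketched in user space as follows. This is an assumption-laden
toy, not the patch itself: struct obj, struct zone_lru, lru_scan and
reclaim_pass are hypothetical names, a pthread mutex stands in for a
spinlock, and the "referenced" second chance is borrowed from how page
reclaim is commonly described.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical cached object (stand-in for an inode or dentry). */
struct obj {
    struct obj *next;    /* link towards the hot (most recent) end */
    int referenced;      /* set when used since the last scan */
};

/* One LRU and one lock per zone, so zones do not contend. */
struct zone_lru {
    pthread_mutex_t lock;
    struct obj *cold;    /* oldest object, scanned first */
    struct obj *hot;     /* newest object */
    unsigned long nr;
};

void lru_init(struct zone_lru *z)
{
    pthread_mutex_init(&z->lock, NULL);
    z->cold = z->hot = NULL;
    z->nr = 0;
}

/* Append at the hot end; caller holds z->lock. */
static void lru_add_locked(struct zone_lru *z, struct obj *o)
{
    o->next = NULL;
    if (z->hot)
        z->hot->next = o;
    else
        z->cold = o;
    z->hot = o;
    z->nr++;
}

void lru_add(struct zone_lru *z, struct obj *o)
{
    pthread_mutex_lock(&z->lock);
    lru_add_locked(z, o);
    pthread_mutex_unlock(&z->lock);
}

/*
 * Look at up to 'batch' objects from the cold end of one zone.
 * Referenced objects get a second chance and move to the hot end;
 * the rest are unlinked and counted as reclaimed.
 */
unsigned long lru_scan(struct zone_lru *z, unsigned long batch)
{
    unsigned long reclaimed = 0;

    pthread_mutex_lock(&z->lock);
    for (; batch && z->cold; batch--) {
        struct obj *o = z->cold;

        z->cold = o->next;
        if (!z->cold)
            z->hot = NULL;
        assert(z->nr > 0);
        z->nr--;

        if (o->referenced) {
            o->referenced = 0;
            lru_add_locked(z, o);
        } else {
            reclaimed++;
        }
    }
    pthread_mutex_unlock(&z->lock);
    return reclaimed;
}

/* One reclaim pass: a small scan of every zone, each under its own lock. */
unsigned long reclaim_pass(struct zone_lru *zones, size_t nr_zones,
                           unsigned long batch)
{
    unsigned long total = 0;

    for (size_t i = 0; i < nr_zones; i++)
        total += lru_scan(&zones[i], batch);
    return total;
}
```

The result only approximates a global LRU, since each zone ages its
own list independently, which is the perturbation discussed above.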

I don't think you can possibly be arguing against more scalable
locking in reclaim, so perhaps you are also worried about increased
parallelism in the filesystem callbacks from reclaim? I really can't
see this being a big problem, any more than any other increased
parallelism on fses or other subsystems caused by scaling vfs.

There might be some interesting issues with different locking
designs being hit in different ways, but really we can't stop
progress and test all loads on all filesystems. The way forward is
to fix the bottleneck in the filesystem or, if the filesystem sucks
so bad it can't handle it, then just put a lock in there and not
penalise others.
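
As a hedged illustration of "just put a lock in there": a filesystem
whose eviction path cannot cope with concurrent callers can serialise
it with a private lock, leaving every other filesystem unaffected. The
sketch below is user-space C with made-up names (struct myfs_info,
myfs_evict); it does not correspond to any real filesystem or kernel
interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical filesystem-private state; not a real kernel structure. */
struct myfs_info {
    pthread_mutex_t evict_lock;   /* serialises eviction in this fs only */
    unsigned long evicted;        /* updated without atomics, needs the lock */
};

void myfs_init(struct myfs_info *fs)
{
    pthread_mutex_init(&fs->evict_lock, NULL);
    fs->evicted = 0;
}

/* The part of eviction that this fs cannot run in parallel. */
static void myfs_evict_unlocked(struct myfs_info *fs)
{
    fs->evicted++;
}

/*
 * Callback invoked by reclaim, possibly from several zones at once.
 * The private lock restores the old one-at-a-time behaviour for this
 * filesystem without adding a global lock for everyone else.
 */
void myfs_evict(struct myfs_info *fs)
{
    assert(fs != NULL);
    pthread_mutex_lock(&fs->evict_lock);
    myfs_evict_unlocked(fs);
    pthread_mutex_unlock(&fs->evict_lock);
}
```

The cost of the serialisation is then paid only by the filesystem that
needs it.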

It's not like I haven't tested it, I've spent the better part of
the past year testing things. The I_FREEING batching stuff is one
example where I found and fixed a small problem exposed by the
reclaim changes.

