Re: [patch 6/12] hold atomic kmaps across generic_file_read

From: Linus Torvalds (torvalds@transmeta.com)
Date: Sat Aug 10 2002 - 12:01:27 EST

Next message: Nilanjan Bhowmik: "Kernel 2.4.19: Tulip network card does not sync with Dumb 10/100 Linksys 8 port Etherfast switch"
Previous message: Linus Torvalds: "Re: [patch 6/12] hold atomic kmaps across generic_file_read"
In reply to: Daniel Phillips: "Re: [patch 6/12] hold atomic kmaps across generic_file_read"
Next in thread: Daniel Phillips: "Re: [patch 6/12] hold atomic kmaps across generic_file_read"
Reply: Daniel Phillips: "Re: [patch 6/12] hold atomic kmaps across generic_file_read"

On Sat, 10 Aug 2002, Daniel Phillips wrote:

> Sorry, this connection is too subtle for me. I see why we want to do
> this, and in fact I've been researching how to do it for the last few
> weeks, but I don't see how it's related to the atomic kmap path. Could
> you please explain, in words of one syllable?

We cannot do that optimization generally. I'll give you two reasons, both
of which are sufficient on their own:

- doing the page table walk is simply slower than doing the memcpy if the
   page is just there. So you have to have a good heuristic on when it
   might be worthwhile to do page table tricks. That heuristic should
   include "is the page directly accessible". Which is exactly what you
   get if you have an "atomic copy_to_user() that returns failure if it
   cannot be done atomically".

- Even if walking the page tables were to be fast (i.e. ignoring #1),
   replacing a page in virtual memory is absolutely not. Especially not on
   SMP, where replacing a page in memory implies doing CPU crosscalls in
   order to invalidate the TLB on other CPUs for the old page. So before
   you do the "clever VM stuff", you had better have a heuristic that says
   "this page isn't mapped, so it doesn't need the expensive cross-calls".

Again: guess what gives you pretty much exactly that heuristic?

See? The fact is, "memcpy()" is damned fast for a lot of cases, because it
natively uses the TLB and existing caches. It's slow for other cases, but
you want to have a good _heuristic_ for when you might want to try to
avoid the slow case without avoiding the fast case. Without that heuristic
you can't do the optimization sanely.

And obviously the heuristic should be a really fast one. The atomic
copy_to_user() is the _perfect_ heuristic, because if it just does the
memcpy there is absolutely zero overhead (it just does it). The overhead
comes in only in the case where we're going to be slowed down by the fault
anyway, _and_ where we want to do the clever tricks.
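
A minimal user-space sketch of that "try the atomic copy first" pattern, for illustration only. Nothing here is kernel code: `struct sim_page`, `sim_copy_to_user_atomic()`, `sim_slow_path()` and `sim_copy()` are hypothetical names, and the `present` flag is an assumed stand-in for "the page is directly accessible". The point is only that the common case costs one memcpy and the expensive path runs solely when the cheap one reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of a destination page; not a kernel type. */
struct sim_page {
    bool present;      /* assumed flag: page is mapped and directly accessible */
    char data[64];
};

/* Counts how often the expensive path ran, for illustration only. */
static int slow_path_calls;

/*
 * Model of an "atomic" copy: it never blocks and never faults the page in.
 * Returns 0 on success, or the number of bytes NOT copied on failure.
 */
static size_t sim_copy_to_user_atomic(struct sim_page *dst, const char *src,
                                      size_t len)
{
    assert(len <= sizeof(dst->data));
    if (!dst->present)
        return len;    /* would have faulted: report failure instead */
    memcpy(dst->data, src, len);
    return 0;
}

/* Stand-in for the expensive work (fault handling or page-table tricks). */
static void sim_slow_path(struct sim_page *dst, const char *src, size_t len)
{
    slow_path_calls++;
    dst->present = true;   /* model making the page accessible */
    memcpy(dst->data, src, len);
}

/* Try the cheap copy first; pay for the slow path only when it fails. */
static void sim_copy(struct sim_page *dst, const char *src, size_t len)
{
    if (sim_copy_to_user_atomic(dst, src, len) != 0)
        sim_slow_path(dst, src, len);
}
```

Calling `sim_copy()` on a page with `present` set never touches `sim_slow_path()`, which is the zero-overhead property described above.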

> While I'm feeling disoriented, what exactly is the deadlock path for a
> write from a mmaped, not uptodate page, to the same page? And why does
> __get_user need to touch the page in *two* places to instantiate it?

It doesn't touch it twice. It touches _both_ of the potential pages that
will be involved in the memcpy - since the copy may well not be
page-aligned in user space.
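
To make the "both pages" point concrete, here is a small sketch of the arithmetic, assuming a 4096-byte page and a copy of at most one page. The names `SIM_PAGE_SIZE`, `sim_pages_spanned()` and `sim_prefault()` are made up for this example and are not the kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE 4096u   /* assumed page size, for illustration */

/* Index of the page containing a given address. */
static uintptr_t sim_page_index(uintptr_t addr)
{
    return addr / SIM_PAGE_SIZE;
}

/*
 * Number of distinct pages covered by [addr, addr + len).
 * For 0 < len <= SIM_PAGE_SIZE this is 1 or 2, so touching the first and
 * the last byte of the buffer touches every page involved in the copy.
 */
static unsigned sim_pages_spanned(uintptr_t addr, size_t len)
{
    assert(len > 0 && len <= SIM_PAGE_SIZE);
    uintptr_t first = sim_page_index(addr);
    uintptr_t last = sim_page_index(addr + len - 1);
    return (unsigned)(last - first + 1);
}

/* Read the first and last byte so that both potential pages get touched. */
static void sim_prefault(const volatile char *buf, size_t len)
{
    assert(len > 0);
    (void)buf[0];
    (void)buf[len - 1];
}
```

A page-sized copy starting at an aligned address spans one page; the same copy starting one byte later spans two, which is why two touches are needed.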

The deadlock is when you do a write of a page into a mapping of the very
same page that isn't yet mapped. What happens is:

- the write has gotten the page lock. Since the write knows that the whole
   page is going to be overwritten, it is _not_ marked uptodate, and the
   old contents (garbage from the allocation) are left alone.

- the copy_from_user() pagefaults and tries to bring in the _same_ page
   into user land.

- that involves reading in the page and making sure it is up-to-date

- but since the write has already locked the page, you now have a
   deadlock. The write cannot continue, since it needs the old contents,
   and the old contents cannot be read in since the write holds the page
   lock.

The "copy_from_user() atomically" solves the problem quite nicely. If the
atomic copy fails, we can afford to do the things that we cannot afford to
do normally (because the thing never triggers under real load, and real
load absolutely _needs_ to not try to get the page up-to-date before the
write).

So with the atomic copy-from-user, we can trap the problem only when it is
a problem, and go full speed normally.
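
A rough user-space model of that control flow, as a sketch under stated assumptions rather than what the patch does. The fallback chosen here (drop the page lock, fault the source in, retry) is just one plausible recovery and is an assumption of this example; `struct sim_cache_page`, `sim_copy_from_user_atomic()`, `sim_fault_in()` and `sim_write()` are all hypothetical names. The assert in `sim_fault_in()` marks the spot where the non-atomic version would deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a page-cache page that may also be mapped by the user. */
struct sim_cache_page {
    bool locked;     /* page lock held */
    bool uptodate;   /* contents are valid */
    bool mapped;     /* accessible from user space without a fault */
    char data[32];
};

static int fallback_count;   /* how often the rare path ran */

/* Atomic copy: fails (returns bytes not copied) instead of faulting. */
static size_t sim_copy_from_user_atomic(struct sim_cache_page *dst,
                                        const struct sim_cache_page *src,
                                        size_t len)
{
    if (!src->mapped)
        return len;
    memmove(dst->data, src->data, len);   /* dst and src may be the same page */
    return 0;
}

/* Faulting a page in needs the page lock, so the caller must not hold it. */
static void sim_fault_in(struct sim_cache_page *p)
{
    assert(!p->locked);   /* with the lock held, this is the deadlock */
    p->locked = true;
    if (!p->uptodate) {
        memset(p->data, 0, sizeof(p->data));   /* model reading from disk */
        p->uptodate = true;
    }
    p->mapped = true;
    p->locked = false;
}

/* Write `len` bytes into `dst` from a user buffer that is a mapping of `src`. */
static void sim_write(struct sim_cache_page *dst, struct sim_cache_page *src,
                      size_t len)
{
    assert(len <= sizeof(dst->data));
    for (;;) {
        dst->locked = true;                       /* write takes the page lock */
        if (sim_copy_from_user_atomic(dst, src, len) == 0) {
            dst->uptodate = true;
            dst->locked = false;
            return;                               /* normal case: full speed */
        }
        dst->locked = false;                      /* rare case: back off */
        fallback_count++;
        sim_fault_in(src);                        /* safe now, lock is dropped */
    }
}
```

With `dst` and `src` pointing at the same unmapped page, the first pass fails atomically, the lock is dropped, and the retry succeeds; with an already mapped source the fallback never runs.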

Linus


