Thank you for the thorough review of my patches. Comments below.

On Thu, 9 Dec 2004, Hugh Dickins wrote:
> Your V12 patches would apply well to 2.6.10-rc3, except that (as noted
> before) your mailer or whatever is eating trailing whitespace: trivial
> patch attached to apply before yours, removing that whitespace so yours
> apply. But what your patches need to apply to would be 2.6.10-mm.

I am still mystified as to why this is an issue at all. The patches apply
just fine to the kernel sources as they are. I have patched kernels numerous
times with this patchset and never ran into any issue. As far as I can tell,
quilt removes trailing whitespace from patches when it generates them.
Patches will be made against mm after Nick's modifications to the 4-level
patches are in.

> probably others (harder to think through). Your 4/7 patch for i386 has
> an unused atomic get_64bit function from Nick; I think you'll have to
> define a get_pte_atomic macro and use get_64bit in its 64-on-32 cases.

That would be a performance issue.

> Hmm, that will only work if you're using atomic set_64bit rather than
> relying on page_table_lock in the complementary places which matter.
> Which I believe you are indeed doing in your 3level set_pte. Shouldn't
> __set_64bit be using LOCK_PREFIX like __get_64bit, instead of lock?
> But by making every set_pte use set_64bit, you are significantly slowing
> down many operations which do not need that atomicity. This is quite
> visible in the fork/exec/shell results from lmbench on i386 PAE (and is
> the only interesting difference, for good or bad, that I noticed with
> your patches in lmbench on 2*HT*P4), which run 5-20% slower. There are
> no faults on dst mm (nor on src mm) while copy_page_range is copying,
> so its set_ptes don't need to be atomic; likewise during zap_pte_range
> (either mmap_sem is held exclusively, or it's in the final exit_mmap).
> Probably revert set_pte and set_pte_atomic to what they were, and use
> set_pte_atomic where it's needed.

Good suggestions. I will see what I can do, but I will need some
assistance: my main platform is ia64, and the hardware and opportunities
for testing on i386 are limited.