Re: CPA boot crash (was: [PATCH] [0/36] Great change_page_attr patch series v3)

From: Ingo Molnar
Date: Tue Jan 22 2008 - 19:00:53 EST



* Andi Kleen <ak@xxxxxxx> wrote:

> > because it interferes/interacts with CPA and the page table code. So
>
> No that is not its main problem I believe. Main problem are all the
> driver and other subsystem interactions (it is a little bit similar to
> power management where you have lots of little bits all over right
> instead of a single big one). [...]

that is (yet another) major misconception on your part. "Drivers" are an
easy-to-blame target (i guess because there's no one out there to defend
against a vague "drivers" accusation), and they are not the problem here
_at all_.

Drivers tell the architecture code which physical pages they'd like to
have access to (or which page range they'd like to see different cache
attributes on) and that's it. They are plain users of the ioremap() and
change_page_attr() APIs. Nothing more, nothing less.

It is the utmost duty of architecture code to make those APIs
fool-proof. Hardware _will_ mess up the physical parameters that get
passed in every possible way - and drivers just try to use what the
hardware tells them to use. So robustness is key and there's just no
"driver reason" why these APIs cannot be robust.
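
To illustrate the kind of robustness meant here, a minimal userspace
sketch of sanitizing a (physical address, size) pair before it is acted
on. This is not the actual arch/x86 code: every sk_* name and the 46-bit
physical address limit are assumptions made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names and constants below are hypothetical, for illustration only. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  ((uint64_t)1 << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))
#define SK_PHYS_BITS  46                          /* assumed address width */
#define SK_PHYS_LIMIT ((uint64_t)1 << SK_PHYS_BITS)

struct sk_range {
    uint64_t first_page;  /* page-aligned start address */
    uint64_t nr_pages;    /* number of pages touched by the request */
};

/*
 * Returns true and fills *out if (phys, size) describes a sane range.
 * Rejects empty ranges, ranges that wrap around, and ranges that reach
 * beyond the assumed physical address limit. Unaligned requests are
 * widened to whole pages instead of being trusted as-is.
 */
static bool sk_sanitize_range(uint64_t phys, uint64_t size,
                              struct sk_range *out)
{
    uint64_t last;

    if (size == 0)
        return false;
    if (phys + size < phys)        /* unsigned wrap-around */
        return false;

    last = phys + size - 1;
    if (last >= SK_PHYS_LIMIT)
        return false;

    out->first_page = phys & SK_PAGE_MASK;
    out->nr_pages = ((last & SK_PAGE_MASK) - out->first_page)
                    / SK_PAGE_SIZE + 1;
    return true;
}
```

The caller (the "driver") only passes in what the hardware reported; all
the checking sits on the API side, which is the point being made above.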

so you are delusional if you think that the c_p_a() problems are "driver
and other subsystem interactions".

And your analogy with power management could not be more mistaken. Power
management, and suspend/resume in particular, is so complex because it is
analogous to a _full bootup and shutdown cycle_, with the following
hard-to-meet expectation from the user: 'this stuff must work all the
time, and must be instantaneous'. Suspend/resume is an _incredibly
complex_ machinery and the user does not realize this complexity (and
does not accept its consequences). It is a codepath that is affected by
tens and tens of thousands of lines of driver and core kernel code. Just
one single mistake and "resume does not work".

ioremap() and change_page_attr(), on the other hand, are a small,
few-hundred-line codebase with a stable and well-defined purpose. There
are no significant "subsystem interactions" whatsoever.

by far the most intense and most high-frequency user of the
change_page_attr() code is CONFIG_DEBUG_PAGEALLOC=y. It does a cpa call
for every single page and slab allocation/freeing. But this debug
feature ... is not enabled on the 64-bit side - why? So unfortunately we
don't have any real robustness track record for the 64-bit side of the
CPA code, and that's exactly the code your clflush and gbpages patches
change.
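
As a rough picture of why DEBUG_PAGEALLOC exercises this path so hard,
here is a toy userspace model, assuming a tiny fixed pool of pages and a
per-page "present" flag standing in for the real page table bit. All
toy_* names are hypothetical and nothing here is kernel code; it only
shows that every allocation and every free triggers one attribute change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; not the kernel's data structures. */
#define TOY_NR_PAGES 8

static bool toy_present[TOY_NR_PAGES];    /* models the "present" bit */
static bool toy_allocated[TOY_NR_PAGES];  /* models allocator state */
static unsigned long toy_cpa_calls;       /* attribute changes so far */

/* Stand-in for a page attribute change: sets or clears "present". */
static void toy_set_present(size_t page, bool present)
{
    toy_present[page] = present;
    toy_cpa_calls++;
}

/* Returns a page index, or -1 if the pool is exhausted. */
static int toy_alloc_page(void)
{
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        if (!toy_allocated[i]) {
            toy_allocated[i] = true;
            toy_set_present(i, true);     /* map on allocation */
            return (int)i;
        }
    }
    return -1;
}

/* Frees a page and unmaps it, so a stale access would be caught. */
static void toy_free_page(int page)
{
    toy_allocated[page] = false;
    toy_set_present((size_t)page, false); /* unmap on free */
}

/* In the real feature a non-present access faults; here we just report. */
static bool toy_access_ok(int page)
{
    return toy_present[page];
}
```

One alloc/free pair costs two attribute changes in this model, which is
why a workload with heavy allocator traffic ends up hammering that code.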

oh, and due to that i'll probably revert these two patches of yours:

Subject: x86: c_p_a(), change kernel_map_pages to not use c_p_a()
Subject: x86: c_p_a(), change 32-bit back to init_mm semaphore locking

as with these changes you've removed _the_ most important stress-tester
for the c_p_a() code: DEBUG_PAGEALLOC.

Ingo