Re: [PATCH] i386: fix vmalloc_sync_all() for Xen

From: Jeremy Fitzhardinge
Date: Thu Jun 19 2008 - 10:46:47 EST


Jan Beulich wrote:
>> Given that the usermode PGDs will never need syncing, I think it would
>> be better to use KERNEL_PGD_PTRS, and define
>>
>> #define sync_index(a) (((a) >> PMD_SHIFT) - KERNEL_PGD_BOUNDARY)
>>
>> for a massive 192 byte saving in bss.

> I was considering that, too, but didn't do so for simplicity's sake. If I
> have to re-spin the patch, I may as well do it.
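
For anyone following along, here is a sketch of where a figure like that
could come from, assuming PAE's 4 PGD entries of 512 pmds each (my
reconstruction for illustration, not declarations from the actual patch):

	/* One "synced" bit per pmd over the whole 4GB address space: */
	static DECLARE_BITMAP(insync, PTRS_PER_PGD * PTRS_PER_PMD);
						/* 2048 bits = 256 bytes */

	/* Kernel mappings only, since user PGDs never need syncing: */
	static DECLARE_BITMAP(insync, KERNEL_PGD_PTRS * PTRS_PER_PMD);
						/* 512 bits = 64 bytes */

	/* 256 - 64 = the "massive" 192 byte bss saving. */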

>>> +	for (address = start; address >= TASK_SIZE; address += PMD_SIZE) {
>>
>> Would it be better - especially for the Xen case - to only iterate from
>> TASK_SIZE to FIXADDR_TOP rather than wrapping around? What will
>> vmalloc_sync_one do on Xen mappings?

> Could be done, but since there will never be any out-of-sync Xen entries,
> it doesn't hurt doing the full pass. I agree it would possibly be more
> correct, though.
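
In diff form, the bounded walk being suggested would look something like
this (my sketch of the suggestion above, not code from the thread):

	-	for (address = start; address >= TASK_SIZE; address += PMD_SIZE) {
	+	/* Stop at the top of the fixmap rather than wrapping past 0: */
	+	for (address = start; address < FIXADDR_TOP; address += PMD_SIZE) {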

>>> +		if (!test_bit(sync_index(address), insync)) {
>>
>> It's probably worth reversing this test and removing a layer of
>> indentation.

> How? There's a second if() following this one, so we can't just
> 'continue;' here.

That second if() block seems completely redundant:

	if (address == start && test_bit(pgd_index(address), insync))
		start = address + PGDIR_SIZE;

All it does is update "start", but start isn't used anywhere else in the
loop.
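
If that second if() really is dead, the reversed test is straightforward;
a sketch of how the loop could then read (hypothetical restructuring, not
the actual respin):

	for (address = start; address >= TASK_SIZE; address += PMD_SIZE) {
		if (test_bit(sync_index(address), insync))
			continue;	/* already synced, nothing to do */

		/* ... take pgd_lock and run vmalloc_sync_one() on every
		 * pagetable on pgd_list, as in the patch ... */
	}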

>>> 	spin_lock_irqsave(&pgd_lock, flags);
>>> +	if (unlikely(list_empty(&pgd_list))) {
>>> +		spin_unlock_irqrestore(&pgd_lock, flags);
>>> +		return;
>>> +	}
>>
>> This seems a bit warty. If the list is empty, then won't the
>> list_for_each_entry() just fall through? Presumably this only applies to
>> boot, since pgd_list won't be empty on a running system with usermode
>> processes. Is there a correctness issue here, or is it just a
>> micro-optimisation?

> No, it isn't. Note the setting to NULL of page, which after the loop gets
> tested for. list_for_each_entry() would never yield a NULL page, even
> if the list is empty.

Does that matter? If pgd_list is empty, then it's in sync by definition. Why does it need special-casing?
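
For readers without the patch in front of them, the shape being debated is
roughly this, inside the loop over addresses (a reconstruction for
illustration; details may differ from Jan's actual code):

	unsigned long flags;
	struct page *page;

	spin_lock_irqsave(&pgd_lock, flags);
	if (unlikely(list_empty(&pgd_list))) {
		spin_unlock_irqrestore(&pgd_lock, flags);
		return;
	}
	list_for_each_entry(page, &pgd_list, lru)
		if (!vmalloc_sync_one(page_address(page), address)) {
			page = NULL;	/* flag the failure */
			break;
		}
	spin_unlock_irqrestore(&pgd_lock, flags);

	if (page)	/* no failure: mark this range as synced */
		set_bit(sync_index(address), insync);

On an empty list, list_for_each_entry() leaves "page" pointing at the list
head rather than NULL, so without the guard the empty case is
indistinguishable from success - which is exactly the property Jeremy is
arguing is harmless.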


>> And

>>> 	list_for_each_entry(page, &pgd_list, lru) {
>>> 		if (!vmalloc_sync_one(page_address(page),
>>> -				      address))
>>> +				      address)) {
>>> +			BUG_ON(list_first_entry(&pgd_list,
>>> +						struct page,
>>> +						lru) != page);
>>
>> What condition is this testing for?

> This is a replacement of the BUG_ON() that an earlier patch from you
> removed: failure of vmalloc_sync_one() must happen on the first
> entry or never, and this is what is being checked for here.

Could you add a comment?
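
Something like this, presumably (comment wording is mine; the "reference
pagetable" rationale is my reading of why the invariant holds):

	if (!vmalloc_sync_one(page_address(page), address)) {
		/*
		 * vmalloc_sync_one() fails only when the reference
		 * pagetable has no entry covering this address.
		 * That is a property of the address, not of the
		 * pagetable being synced, so a failure must show up
		 * on the very first pagetable on the list or not at
		 * all.
		 */
		BUG_ON(list_first_entry(&pgd_list,
					struct page, lru) != page);
		break;
	}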

> No, just like on 32-bit, it's because loaded modules may access
> vmalloc()-ed memory from notifiers that are called in contexts (NMI)
> where taking even simple (propagation) page faults cannot be
> tolerated (since the final IRET would end the NMI handling from the
> CPU's (or hypervisor's) perspective).

Well, 32-bit PAE avoids any syncing by having all pagetables share the
same pmd containing the vmalloc mappings (ignoring the complications Xen
adds here). Couldn't 64-bit do the same thing at the pud level,
preallocating as many puds as needed to cover the vmalloc area?

Uh, I guess that's not practical with 64TB of vmalloc address space
reserved...
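
Back-of-the-envelope, for what it's worth (assuming that 64TB figure and
x86-64's 1GB pud entries, 512 per pud page):

	/*
	 * 64TB / (512 * 1GB per pud page) = 128 pud pages, i.e.
	 * 128 * 4KB = 512KB of pagetables pinned at boot purely so
	 * that every pagetable can share the vmalloc puds.
	 */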

J