Given that the usermode PGDs will never need syncing, I think it would be
better to use KERNEL_PGD_PTRS, and define

	#define sync_index(a) (((a) >> PMD_SHIFT) - KERNEL_PGD_BOUNDARY)

for a massive 192-byte saving in bss.
I was considering that, too, but didn't do so for simplicity's sake. If I
have to re-spin the patch, I may as well do it.
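I.e. something like this (untested sketch; on 32-bit, KERNEL_PGD_BOUNDARY
is pgd_index(PAGE_OFFSET) and KERNEL_PGD_PTRS is PTRS_PER_PGD -
KERNEL_PGD_BOUNDARY):

	#define sync_index(a) (((a) >> PMD_SHIFT) - KERNEL_PGD_BOUNDARY)

	/*
	 * Usermode entries never go out of sync, so the bitmap only
	 * needs to cover the kernel portion of the PGD.
	 */
	static DECLARE_BITMAP(insync, KERNEL_PGD_PTRS);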
+	for (address = start; address >= TASK_SIZE; address += PMD_SIZE) {

Would it be better - especially for the Xen case - to only iterate from
TASK_SIZE to FIXADDR_TOP rather than wrapping around? What will
vmalloc_sync_one do on Xen mappings?
Could be done, but since there will never be any out-of-sync Xen entries,
it doesn't hurt to do the full pass. I agree it would possibly be more
correct, though.
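The bounded variant would be something like this (untested sketch):

	/*
	 * Stop at the fixmap area rather than waiting for address to
	 * wrap around at the top of the address space.
	 */
	for (address = start; address < FIXADDR_TOP; address += PMD_SIZE) {
		/* ... sync this PMD slot as before ... */
	}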
+		if (!test_bit(sync_index(address), insync)) {

It's probably worth reversing this test and removing a layer of
indentation.
How? There's a second if() following this one, so we can't just 'continue;'
here.
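Condensed, the loop has roughly this shape, which is why a bare
'continue;' would be wrong - the second if() must run on every
iteration:

	for (address = start; address >= TASK_SIZE; address += PMD_SIZE) {
		if (!test_bit(sync_index(address), insync)) {
			/* take pgd_lock, walk pgd_list, sync the entry */
		}
		/* second if(): must run even when the bit was already set */
		if (address == start && test_bit(sync_index(address), insync))
			start = address + PMD_SIZE;
	}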
	spin_lock_irqsave(&pgd_lock, flags);
+	if (unlikely(list_empty(&pgd_list))) {
+		spin_unlock_irqrestore(&pgd_lock, flags);
+		return;
+	}

This seems a bit warty. If the list is empty, then won't the
list_for_each_entry() just fall through? Presumably this only applies at
boot, since pgd_list won't be empty on a running system with usermode
processes. Is there a correctness issue here, or is it just a
micro-optimisation?
No, it isn't just a micro-optimisation. Note the setting of page to NULL,
which gets tested for after the loop; list_for_each_entry() would never
yield a NULL page, even if the list is empty.
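Condensed, the logic is along these lines (a sketch pieced together from
the hunks quoted in this thread, not the patch verbatim):

	struct page *page;

	spin_lock_irqsave(&pgd_lock, flags);
	/*
	 * Needed for correctness: on an empty list the loop below would
	 * fall through with page pointing (non-NULL) at the list head,
	 * which the test after the loop would mistake for a successful
	 * sync.
	 */
	if (unlikely(list_empty(&pgd_list))) {
		spin_unlock_irqrestore(&pgd_lock, flags);
		return;
	}
	list_for_each_entry(page, &pgd_list, lru) {
		if (!vmalloc_sync_one(page_address(page), address)) {
			BUG_ON(list_first_entry(&pgd_list,
						struct page, lru) != page);
			page = NULL;	/* flag the failure */
			break;
		}
	}
	spin_unlock_irqrestore(&pgd_lock, flags);
	if (page)	/* all PGDs were synced successfully */
		set_bit(sync_index(address), insync);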
And

	list_for_each_entry(page, &pgd_list, lru) {
		if (!vmalloc_sync_one(page_address(page),
-				      address))
+				      address)) {
+			BUG_ON(list_first_entry(&pgd_list,
+						struct page,
+						lru) != page);

What condition is this testing for?
This is a replacement for the BUG_ON() that an earlier patch from you
removed: failure of vmalloc_sync_one() must happen on the first entry or
not at all (all PGDs on pgd_list are kept in sync with one another, so
either none of them has the entry yet or all of them do), and that is
what is being checked for here.
No. Just like on 32-bit, it's because loaded modules may access
vmalloc()-ed memory from notifiers that are called in contexts (NMI)
where taking even a simple (propagation) page fault cannot be tolerated,
since the fault handler's final IRET would, from the CPU's (or
hypervisor's) perspective, end the NMI handling prematurely.