Re: [patch] mm, oom: fix unnecessary killing of additional processes
From: David Rientjes
Date: Wed Jun 20 2018 - 16:35:00 EST
On Wed, 20 Jun 2018, Michal Hocko wrote:
> On Tue 19-06-18 10:33:16, Michal Hocko wrote:
> [...]
> > As I've said, if you are not willing to work on a proper solution, I
> > will, but my nack holds for this patch until we see no other way around
> > existing and real world problems.
>
> OK, so I gave it a quick try and it doesn't look all that bad to me.
> This is only for blockable mmu notifiers. I didn't really try to
> address all the problems down the road - I mean some of the blocking
> notifiers can check the range in their interval tree without blocking
> locks. It is quite probable that only few ranges will be of interest,
> right?
>
> So this is only to give an idea about the change. It probably even
> doesn't compile. Does that sound sane?
It depends on how invasive we want to make this. It should result in more
memory being freeable if the invalidate callbacks can guarantee that they
won't block, but I think it's much more invasive than the proposed patch.
For the same reason as the mm->mmap_sem backoff, however, this should
retry for longer than HZ (one second). If we can't grab mm->mmap_sem with
the trylock on the first five attempts because of writer queueing, for
example, then only five attempts remain for each blockable mmu notifier
invalidate callback, and for any of the numerous locks such a callback may
need to take before it can declare that it will not block.
Note that this doesn't solve the issue of setting MMF_OOM_SKIP too early
on processes with mm->mmap_sem contention or, now, on processes whose
invalidate callbacks will block; the decision that the mm cannot be reaped
should come much later.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6bcecc325e7e..ac08f5d711be 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7203,8 +7203,9 @@ static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
>  	kvm_x86_ops->load_eoi_exitmap(vcpu, eoi_exit_bitmap);
>  }
>  
> -void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> -		unsigned long start, unsigned long end)
> +int kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
> +		unsigned long start, unsigned long end,
> +		bool blockable)
>  {
>  	unsigned long apic_address;
>  
> @@ -7215,6 +7216,8 @@ void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
>  	apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
>  	if (start <= apic_address && apic_address < end)
>  		kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
> +
> +	return 0;
>  }
>  
>  void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
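Auditing the first change in the patch: it is incorrect, because
kvm_make_all_cpus_request() for KVM_REQ_APIC_PAGE_RELOAD can block in
kvm_kick_many_cpus(), and that happens after kvm_make_request() has
already been done. By the time it might block, the request has already
been made and the callback cannot simply back out, yet the new version
ignores blockable and always returns 0.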