Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

From: Oleg Nesterov
Date: Tue May 07 2019 - 11:32:53 EST


I am not going to comment the intent, but to be honest I am skeptical too.

On 05/06, Sultan Alsawaf wrote:
>
> +static unsigned long find_victims(struct victim_info *varr, int *vindex,
> + int vmaxlen, int min_adj, int max_adj)
> +{
> + unsigned long pages_found = 0;
> + int old_vindex = *vindex;
> + struct task_struct *tsk;
> +
> + for_each_process(tsk) {
> + struct task_struct *vtsk;
> + unsigned long tasksize;
> + short oom_score_adj;
> +
> + /* Make sure there's space left in the victim array */
> + if (*vindex == vmaxlen)
> + break;
> +
> + /* Don't kill current, kthreads, init, or duplicates */
> + if (same_thread_group(tsk, current) ||
> + tsk->flags & PF_KTHREAD ||
> + is_global_init(tsk) ||
> + vtsk_is_duplicate(varr, *vindex, tsk))
> + continue;
> +
> + vtsk = find_lock_task_mm(tsk);

Did you test this patch with lockdep enabled?

If I read the patch correctly, lockdep should complain. vtsk_is_duplicate()
ensures that we do not take the same ->alloc_lock twice or more, but lockdep
can't know this.

> +static void scan_and_kill(unsigned long pages_needed)
> +{
> + static DECLARE_WAIT_QUEUE_HEAD(victim_waitq);
> + struct victim_info victims[MAX_VICTIMS];
> + int i, nr_to_kill = 0, nr_victims = 0;
> + unsigned long pages_found = 0;
> + atomic_t victim_count;
> +
> + /*
> + * Hold the tasklist lock so tasks don't disappear while scanning. This
> + * is preferred to holding an RCU read lock so that the list of tasks
> + * is guaranteed to be up to date. Keep preemption disabled until the
> + * SIGKILLs are sent so the victim kill process isn't interrupted.
> + */
> + read_lock(&tasklist_lock);
> + preempt_disable();

read_lock() disables preemption, every task_lock() too, so this looks
unnecessary.

> + for (i = 1; i < ARRAY_SIZE(adj_prio); i++) {
> + pages_found += find_victims(victims, &nr_victims, MAX_VICTIMS,
> + adj_prio[i], adj_prio[i - 1]);
> + if (pages_found >= pages_needed || nr_victims == MAX_VICTIMS)
> + break;
> + }
> +
> + /*
> + * Calculate the number of tasks that need to be killed and quickly
> + * release the references to those that'll live.
> + */
> + for (i = 0, pages_found = 0; i < nr_victims; i++) {
> + struct victim_info *victim = &victims[i];
> + struct task_struct *vtsk = victim->tsk;
> +
> + /* The victims' mm lock is taken in find_victims; release it */
> + if (pages_found >= pages_needed) {
> + task_unlock(vtsk);
> + continue;
> + }
> +
> + /*
> + * Grab a reference to the victim so it doesn't disappear after
> + * the tasklist lock is released.
> + */
> + get_task_struct(vtsk);

The comment doesn't look correct. the victim can't dissapear until task_unlock()
below, it can't pass exit_mm().

> + pages_found += victim->size;
> + nr_to_kill++;
> + }
> + read_unlock(&tasklist_lock);
> +
> + /* Kill the victims */
> + victim_count = (atomic_t)ATOMIC_INIT(nr_to_kill);
> + for (i = 0; i < nr_to_kill; i++) {
> + struct victim_info *victim = &victims[i];
> + struct task_struct *vtsk = victim->tsk;
> +
> + pr_info("Killing %s with adj %d to free %lu kiB\n", vtsk->comm,
> + vtsk->signal->oom_score_adj,
> + victim->size << (PAGE_SHIFT - 10));
> +
> + /* Configure the victim's mm to notify us when it's freed */
> + vtsk->mm->slmk_waitq = &victim_waitq;
> + vtsk->mm->slmk_counter = &victim_count;
> +
> + /* Accelerate the victim's death by forcing the kill signal */
> + do_send_sig_info(SIGKILL, SIG_INFO_TYPE, vtsk, true);
^^^^
this should be PIDTYPE_TGID

> +
> + /* Finally release the victim's mm lock */
> + task_unlock(vtsk);
> + }
> + preempt_enable_no_resched();

See above. And I don't understand how can _no_resched() really help...

> +
> + /* Try to speed up the death process now that we can schedule again */
> + for (i = 0; i < nr_to_kill; i++) {
> + struct task_struct *vtsk = victims[i].tsk;
> +
> + /* Increase the victim's priority to make it die faster */
> + set_user_nice(vtsk, MIN_NICE);
> +
> + /* Allow the victim to run on any CPU */
> + set_cpus_allowed_ptr(vtsk, cpu_all_mask);
> +
> + /* Finally release the victim reference acquired earlier */
> + put_task_struct(vtsk);
> + }
> +
> + /* Wait until all the victims die */
> + wait_event(victim_waitq, !atomic_read(&victim_count));

Can't we avoid the new slmk_waitq/slmk_counter members in mm_struct?

I mean, can't we export victim_waitq and victim_count and, say, set/test
MMF_OOM_VICTIM. In fact I think you should try to re-use mark_oom_victim()
at least.

Oleg.