Re: [External] Re: [RFC PATCH v2 2/5] mm: Add policy_name to identify OOM policies

From: Chuyi Zhou
Date: Thu Sep 14 2023 - 08:50:56 EST


Hello,

在 2023/9/14 20:02, Bixuan Cui 写道:


在 2023/8/15 4:51, Jonathan Corbet 写道:
  /**
   * dump_tasks - dump current memory state of all system tasks
   * @oc: pointer to struct oom_control
@@ -484,8 +513,8 @@ static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim)
  static void dump_header(struct oom_control *oc, struct task_struct *p)
  {
-    pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, oom_score_adj=%hd\n",
-        current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order,
+    pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, policy_name=%s, oom_score_adj=%hd\n",
+        current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order, oc->policy_name,
...and if the policy name is unterminated, this print will run off the
end of the structure.

Am I missing something here?
Perhaps it is inaccurate to use policy name in this way. For example, some one use BPF_PROG(bpf_oom_evaluate_task, ...) but do not set the policy name through bpf_set_policy_name. In this way, the result is still policy name=default, which ultimately leads to error print in the dump_header.
I think a better way:

+static const char *const policy_select[] = {
+    "OOM_DEFAULT";
+    "BPF_ABORT",
+    "BPF_NEXT",
+    "BPF_SELECT",
+};

struct oom_control {

     /* Used to print the constraint info. */
     enum oom_constraint constraint;
+
+    /* Used to report the policy select. */
+    int policy_select;
 };

static int oom_evaluate_task(struct task_struct *task, void *arg)
{
...

+    switch (bpf_oom_evaluate_task(task, oc)) {
+    case BPF_EVAL_ABORT:
+              oc->policy_select = BPF_EVAL_ABORT;
+        goto abort; /* abort search process */
+    case BPF_EVAL_NEXT:
+              oc->policy_select = BPF_EVAL_NEXT;
+        goto next; /* ignore the task */
+    case BPF_EVAL_SELECT:
+             oc->policy_select = BPF_EVAL_SELECT;
+        goto select; /* select the task */
+    default:
+        break; /* No BPF policy */
+    }

 static void dump_header(struct oom_control *oc, struct task_struct *p)
 {
-    pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, oom_score_adj=%hd\n",
-        current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order,
+    pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, policy_name=%s, oom_score_adj=%hd\n",
+        current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order, policy_select[oc->policy_select],
             current->signal->oom_score_adj);



The policy_name may be different from the previous OOM reporting, even though they are using the same policy.

And all definitions of oc should be added
struct oom_control oc = {
     .select = NO_BPF_POLICY,
}

Delete set_oom_policy_name, it makes no sense for users to set policy names. :-)


There can be multiple OOM policy in the system at the same time.

If we need to apply different OOM policies to different memcgs based on different scenarios, we can use this hook(set_oom_policy_name) to set name to identify which policy in invoked at that time.

Just some thoughts.

Thanks.

Thanks
Bixuan Cui