Re: [RFC PATCH 3/3] misc_cgroup: remove error log to avoid log flood

From: Michal Koutný
Date: Thu Sep 09 2021 - 10:37:39 EST


On Wed, Sep 08, 2021 at 01:24:36PM +0800, brookxu <brookxu.cn@xxxxxxxxx> wrote:
> This log provides less information, we can get more detailed failure
> records through
> misc.events, misc.events.local and misc.failcnt.
> From this, perhaps we can remove it.

I hope: a) it's not used widely, b) no-one relies on parsing the
message so this is an acceptable change.

> @@ -157,13 +157,6 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
> new_usage = atomic_long_add_return(amount, &res->usage);
> if (new_usage > READ_ONCE(res->max) ||
> new_usage > READ_ONCE(misc_res_capacity[type])) {
> - if (!res->failed) {
> - pr_info("cgroup: charge rejected by the misc controller for %s resource in ",
> - misc_res_name[type]);
> - pr_cont_cgroup_path(i->css.cgroup);
> - pr_cont("\n");
> - res->failed = true;
> - }

`i` is the misc_cg whose limit was hit. In a sense, I think the current
implementation of the log message (before your patch) is not as useful
as it could be. The logged message here should not refer to `i` but `cg`
(i.e. the cgroup where the actual chargee resides). It's basically the
idea from [1].

So there could be two type of events (not referring to the v1-specific
failcnt):
- max - number of times the cgroup's misc.max was hit,
- fail - number of times operation failed (rejected) in the cgroup.

The former would tell you which limit is probably set too low, the
latter would capture which cgroup workload is affected. (The difference
would become apparent with >1 level deep hierarchies.)

Regards,
Michal

[1] https://lore.kernel.org/lkml/20191202191100.GF16681@xxxxxxxxxxxxxxxxxxxxxxxxxxx/