> > task->memcg_nr_pages_over_high is not vague, it's a best-effort
> > mechanism to distribute fairness. It's the current task's share of the
> > cgroup's overage, and it allows us in the majority of situations to
> > distribute reclaim work and sleeps in proportion to how much the task
> > is actually at fault.
>
> Agreed. But this stops being the case as soon as the reclaim target has
> been reached and new reclaim attempts are enforced because the memcg is
> still above the high limit. Because then you have a completely different
> reclaim target - get down to the limit. This would be especially visible
> with a large memcg_nr_pages_over_high which could even lead to
> over-reclaim.
We actually over-reclaim even before this patch -- this patch doesn't bring
much new in that regard.

Tracing try_to_free_pages for a cgroup at the memory.high threshold shows
that before this change, we sometimes even reclaim on the order of twice the
number of pages requested. For example, I see cases where we requested 1000
pages to be reclaimed but ended up reclaiming 2000 in a single reclaim
attempt.
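
To make the comparison concrete, here is a minimal sketch of how one might
summarize such trace data. It assumes the trace has already been reduced to
(requested, reclaimed) page counts per reclaim attempt; the function names
and the 1.5x threshold are hypothetical and not part of any kernel tooling.

```python
def over_reclaim_ratio(nr_requested, nr_reclaimed):
    """Return reclaimed/requested for a single reclaim attempt."""
    if nr_requested <= 0:
        raise ValueError("nr_requested must be positive")
    return nr_reclaimed / nr_requested


def summarize(samples, threshold=1.5):
    """Summarize (nr_requested, nr_reclaimed) pairs.

    threshold is an arbitrary, hypothetical cut-off for calling an
    attempt an over-reclaim; pick whatever suits the analysis.
    """
    ratios = [over_reclaim_ratio(req, got) for req, got in samples]
    return {
        "attempts": len(ratios),
        "over_reclaimed": sum(1 for r in ratios if r >= threshold),
        "worst_ratio": max(ratios, default=0.0),
    }


# The case described above: 1000 pages requested, 2000 reclaimed.
print(summarize([(1000, 2000)]))
```

For the single case mentioned above, this reports a ratio of 2.0, i.e. twice
the requested number of pages.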

This is interesting and worth looking into. I am aware that we can
potentially reclaim many more pages during icache reclaim and that
there was a heated discussion without any fix merged in the end, IIRC.
Do you have any details?