Re: How to make warn_alloc() reliable?

From: Michal Hocko
Date: Tue Oct 18 2016 - 08:28:02 EST


On Tue 18-10-16 20:04:20, Tetsuo Handa wrote:
[...]
> @@ -1697,11 +1697,25 @@ static bool inactive_reclaimable_pages(struct lruvec *lruvec,
>  	int file = is_file_lru(lru);
>  	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>  	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
> +	unsigned long wait_start = jiffies;
> +	unsigned int wait_timeout = 10 * HZ;
> +	long last_diff = 0;
> +	long diff;
>
>  	if (!inactive_reclaimable_pages(lruvec, sc, lru))
>  		return 0;
>
> -	while (unlikely(too_many_isolated(pgdat, file, sc))) {
> +	while (unlikely((diff = too_many_isolated(pgdat, file, sc)) > 0)) {
> +		if (diff < last_diff) {
> +			wait_start = jiffies;
> +			wait_timeout = 10 * HZ;
> +		} else if (time_after(jiffies, wait_start + wait_timeout)) {
> +			warn_alloc(sc->gfp_mask,
> +				   "shrink_inactive_list() stalls for %ums",
> +				   jiffies_to_msecs(jiffies - wait_start));
> +			wait_timeout += 10 * HZ;
> +		}
> +		last_diff = diff;
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
>
>  		/* We are about to die and free our memory. Return now. */
> ----------
[...]
> So, how can we make warn_alloc() reliable?

This is not so much about warn_alloc() reliability as about
too_many_isolated() waiting for an unbounded amount of time. That is
what should be fixed, but I do not have a good idea how right now.
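
For illustration only, here is a rough userspace sketch of what the
quoted loop body does on each iteration. It is not kernel code: HZ,
tick_after(), struct stall_state, stall_init() and stall_check() are
hypothetical stand-ins (ticks play the role of jiffies, printf() the
role of warn_alloc()), and diff is assumed to be the excess of isolated
pages returned by the modified too_many_isolated(). Note that it only
reports a stall; nothing in it bounds the wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins: ticks model jiffies, 100 ticks per second. */
#define HZ 100UL
#define STALL_TIMEOUT (10 * HZ)

/* Wraparound-safe "a is later than b", the same idea as time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Per-caller state that the quoted patch keeps in local variables. */
struct stall_state {
	unsigned long wait_start;
	unsigned long wait_timeout;
	long last_diff;
};

static void stall_init(struct stall_state *st, unsigned long now)
{
	assert(st != NULL);
	st->wait_start = now;
	st->wait_timeout = STALL_TIMEOUT;
	st->last_diff = 0;
}

/*
 * One iteration of the wait loop. Returns true if a stall was reported.
 * The caller would still sleep and retry afterwards, so the total wait
 * remains unbounded.
 */
static bool stall_check(struct stall_state *st, unsigned long now, long diff)
{
	bool warned = false;

	assert(st != NULL);
	if (diff < st->last_diff) {
		/* The excess shrank: treat it as progress, restart the clock. */
		st->wait_start = now;
		st->wait_timeout = STALL_TIMEOUT;
	} else if (tick_after(now, st->wait_start + st->wait_timeout)) {
		/* No progress for a whole timeout period: report it. */
		printf("stalls for %lu ticks\n", now - st->wait_start);
		/* Push the next report out by another period. */
		st->wait_timeout += STALL_TIMEOUT;
		warned = true;
	}
	st->last_diff = diff;
	return warned;
}
```

With this sketch, a caller that sees the same diff on every iteration
gets its first report just after 10 * HZ ticks and further reports every
additional 10 * HZ ticks, while any decrease in diff resets both the
start time and the timeout.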
--
Michal Hocko
SUSE Labs