Re: [PATCH 2/9] kstaled: documentation and config option.

From: Michel Lespinasse
Date: Wed Sep 28 2011 - 19:48:55 EST


On Tue, Sep 27, 2011 at 11:53 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Tue, 27 Sep 2011 17:49:00 -0700
> Michel Lespinasse <walken@xxxxxxxxxx> wrote:
>> +* idle_2_clean, idle_2_dirty_file, idle_2_dirty_swap: same definitions as
>> +  above, but for pages that have been untouched for at least two scan cycles.
>> +* these fields repeat up to idle_240_clean, idle_240_dirty_file and
>> +  idle_240_dirty_swap, allowing one to observe idle pages over a variety
>> +  of idle interval lengths. Note that the accounting is cumulative:
>> +  pages counted as idle for a given interval length are also counted
>> +  as idle for smaller interval lengths.
>
> I'm sorry if you've answered already.
>
> Why 240 ? and above means we have idle_xxx_clean/dirty/ xxx is 'seq 2 240' ?
> Isn't it messy ? Anyway, idle_1_clean etc should be provided.

We don't export every value - only those for 1, 2, 5, 15, 30, 60,
120 and 240 idle scan intervals.
In our production setup, the scan interval is set to 120 seconds.
The exported histogram values are chosen so that each is approximately
double the previous one, and so that they align with human units, i.e.
30 scan intervals == 1 hour.
We use one byte per page to track the number of idle cycles, which is
why we don't export anything over 255 scan intervals.
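
To make the accounting concrete, here is a rough userspace sketch in
Python. It is not the kstaled code: the function names are made up for
illustration, and resetting the counter to 0 when a page is touched is
an assumption. Only the bucket list, the 120 second interval and the
one-byte limit come from the description above.

```python
# Illustrative model only; not the kernel implementation.
IDLE_BUCKETS = (1, 2, 5, 15, 30, 60, 120, 240)  # scan intervals, as listed above
SCAN_INTERVAL_SECONDS = 120                     # production setting quoted above
MAX_IDLE_AGE = 255                              # one byte per page


def bump_idle_age(age, was_touched):
    """Advance one page's idle counter by one scan cycle.

    Assumption: a touched page restarts at 0; an untouched page
    increments and saturates at the one-byte maximum.
    """
    if was_touched:
        return 0
    return min(age + 1, MAX_IDLE_AGE)


def cumulative_histogram(page_ages):
    """Count pages idle for at least N scan intervals, for each bucket N.

    Cumulative: a page counted in a larger bucket is also counted
    in every smaller one.
    """
    return {n: sum(1 for age in page_ages if age >= n) for n in IDLE_BUCKETS}


def bucket_seconds(n):
    """Wall-clock length of an N-interval bucket at the assumed scan rate."""
    return n * SCAN_INTERVAL_SECONDS


ages = [0, 1, 3, 30, 250, 255]   # made-up per-page idle ages
hist = cumulative_histogram(ages)
```

With these made-up ages, hist[1] is 5 and hist[240] is 2, and
bucket_seconds(30) is 3600, i.e. the one hour mentioned above.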

> Hmm, I don't like the idea very much...
>
> IIUC, there is no kernel interface which shows histgram rather than load_avg[].
> Is there any other interface and what histgram is provided ?
> And why histgram by kernel is required ?

I don't think exporting per-page statistics is very useful given that
userspace doesn't have a way to select individual pages to reclaim
(and if it did, we would have to expose LRU lists to userspace for it
to make good choices, and I don't think we want to go there). So, we
want to expose summary statistics instead. Histograms are a good way
to do that.

I don't think averages would work well for this application - the
distribution of idle page ages varies a lot between applications and
can't be assumed to be even close to Gaussian.
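
As a toy illustration of the point about averages (the numbers are
invented, not measured), consider two hypothetical jobs whose pages
have the same mean idle age but very different distributions:

```python
from statistics import mean

BUCKETS = (1, 2, 5, 15, 30, 60, 120, 240)  # scan intervals, as listed earlier


def cumulative_histogram(ages):
    """Pages idle for at least N scan intervals, for each bucket N."""
    return [sum(1 for a in ages if a >= n) for n in BUCKETS]


# Hypothetical job A: every page is moderately idle.
job_a = [20] * 100
# Hypothetical job B: a hot working set plus a small, very cold tail.
job_b = [0] * 90 + [200] * 10

mean_a = mean(job_a)
mean_b = mean(job_b)
hist_a = cumulative_histogram(job_a)
hist_b = cumulative_histogram(job_b)
```

Both means come out to 20 scan intervals, but job A has no pages idle
for 30 intervals or more, while job B still has 10 pages in the
120-interval bucket. Only the histogram tells the two apart.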

> BTW, can't this information be exported by /proc/<pid>/smaps or somewhere ?
> I guess per-proc will be wanted finally.

The problem with per-proc is that it only works for things that are
mapped in at the time you look at the report. It does not take into
consideration ephemeral mappings (e.g. if there is this thing you run
every 5 minutes and it needs 1G of memory) or files you access with
read() instead of mmap().

> Hm, do you use params other than idle_clean for your scheduling ?

The management software currently looks at only one bin of the
histogram - for each job, we can configure which bin it will look at.
Humans look at the complete picture when looking into performance
issues, and we're always thinking about teaching the management
software to do that as well :)
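
For what it's worth, picking one bin out of the exported fields could
look roughly like the sketch below. This is not our management
software: the "name value" line format, the sample numbers and the
helper names are all assumptions; only the idle_N_clean,
idle_N_dirty_file and idle_N_dirty_swap field names come from the
patch documentation.

```python
def parse_stats(text):
    """Parse 'name value' lines into a dict (assumed file format)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def bin_fields(stats, n):
    """Return the fields of the idle_<n>_* bin, keyed by their suffix."""
    prefix = f"idle_{n}_"
    return {k[len(prefix):]: v for k, v in stats.items() if k.startswith(prefix)}


# Invented sample values, for illustration only.
sample = """\
idle_2_clean 100
idle_2_dirty_file 7
idle_2_dirty_swap 3
idle_240_clean 40
idle_240_dirty_file 1
idle_240_dirty_swap 0
"""

stats = parse_stats(sample)
job_bin = 240                      # hypothetical per-job configuration
selected = bin_fields(stats, job_bin)
```

Here selected holds the clean, dirty_file and dirty_swap values for
the 240-interval bin, and a different job could be pointed at a
shorter bin by changing job_bin.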

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.