Re: [PATCH V2 1/4] cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()

From: Lukasz Luba
Date: Thu Sep 24 2020 - 07:00:12 EST




On 9/24/20 11:24 AM, Rafael J. Wysocki wrote:
On Thu, Sep 24, 2020 at 11:25 AM Lukasz Luba <lukasz.luba@xxxxxxx> wrote:

Hi Rafael,

On 9/23/20 2:48 PM, Rafael J. Wysocki wrote:
On Wed, Sep 16, 2020 at 8:45 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:

In order to prepare for lock-less stats update, add support to defer any
updates to it until cpufreq_stats_record_transition() is called.

This is a bit devoid of details.

I guess you mean reset in particular, but that's not clear from the above.

Also, it would be useful to describe the design somewhat.

Signed-off-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
---
drivers/cpufreq/cpufreq_stats.c | 75 ++++++++++++++++++++++++---------
1 file changed, 56 insertions(+), 19 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 94d959a8e954..3e7eee29ee86 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -22,17 +22,22 @@ struct cpufreq_stats {
spinlock_t lock;
unsigned int *freq_table;
unsigned int *trans_table;
+
+ /* Deferred reset */
+ unsigned int reset_pending;
+ unsigned long long reset_time;
};

-static void cpufreq_stats_update(struct cpufreq_stats *stats)
+static void cpufreq_stats_update(struct cpufreq_stats *stats,
+ unsigned long long time)
{
unsigned long long cur_time = get_jiffies_64();

- stats->time_in_state[stats->last_index] += cur_time - stats->last_time;
+ stats->time_in_state[stats->last_index] += cur_time - time;
stats->last_time = cur_time;
}

-static void cpufreq_stats_clear_table(struct cpufreq_stats *stats)
+static void cpufreq_stats_reset_table(struct cpufreq_stats *stats)
{
unsigned int count = stats->max_state;

@@ -41,42 +46,67 @@ static void cpufreq_stats_clear_table(struct cpufreq_stats *stats)
memset(stats->trans_table, 0, count * count * sizeof(int));
stats->last_time = get_jiffies_64();
stats->total_trans = 0;
+
+ /* Adjust for the time elapsed since reset was requested */
+ WRITE_ONCE(stats->reset_pending, 0);

What if this runs in parallel with store_reset()?

The latter may update reset_pending to 1 before the below runs.
Conversely, this may clear reset_pending right after store_reset() has
set it to 1, but before it manages to set reset_time. Is that not a
problem?

I wonder if we could just drop the reset feature. Is there a tool
which uses this file? The 'reset' sysfs would probably have to stay
forever, but an empty implementation is not an option?

Well, having an empty sysfs attr would be a bit ugly, but the
implementation of it could be simplified.

The documentation states:
'This can be useful for evaluating system behaviour under different
governors without the need for a reboot.'
With the scenario of fast-switch this resetting complicates the
implementation and the justification of having it just for experiments
avoiding reboot is IMO weak. The real production code would have to pay
extra cycles every time. Also, we would probably not experiment with
cpufreq different governors, since the SchedUtil is considered the best
option.

It would still be good to have a way to test it against the other
available options, though.


Experimenting with different governors would still be possible, just
the user-space would have to take a snapshot of the stats when switching
to a new governor. Then the values presented in the stats would just
need to be calculated in this user tool against the snapshot.

The resetting is also not that bad, since nowadays more components
maintain some kind of local statistics/history (scheduler, thermal).
I would recommend to reset the whole system and repeat the same tests
with different governor, just to be sure that everything starts from
similar state (utilization, temperature, other devfreq devices
frequencies etc).