Re: [PATCH v2 2/3] PM / devfreq: tegra: add devfreq driver for Tegra Activity Monitor
From: Tomeu Vizoso
Date: Tue Dec 02 2014 - 09:49:36 EST
On 26 November 2014 at 11:02, Alexandre Courbot <gnurou@xxxxxxxxx> wrote:
> On Mon, Nov 24, 2014 at 9:28 PM, Tomeu Vizoso
> <tomeu.vizoso@xxxxxxxxxxxxx> wrote:
>> The ACTMON block can monitor several counters, providing averaging and firing
>> interrupts based on watermarking configuration. This implementation monitors
>> the MCALL and MCCPU counters to choose an appropriate frequency for the
>> external memory clock.
>>
>> This patch is based on work by Alex Frid <afrid@xxxxxxxxxx> and Mikko
>> Perttunen <mikko.perttunen@xxxxxxxx>.
>
> Thanks for taking the time to adapt this driver to use devfreq.
> Looking at it, I am more and more convinced that's the correct way to
> do.
>
> I made some comments inline, but I'd like to bring Arto Merilainen
> into this discussion. Arto thought (and might even have some code)
> about adding watermarks support to devfreq's core and using a generic
> "watermark" governor which I believe would greatly benefit this patch
> set.
Do you know of any other SoC family that would be able to use that code?
> Arto, do you have some concrete code you could submit here? If
> you lack the time for doing so, some guidance so we could implement
> this support ourselves would be great.
>
>>
>> Signed-off-by: Tomeu Vizoso <tomeu.vizoso@xxxxxxxxxxxxx>
>>
>> ---
>>
>> v2: * Use devfreq
>> ---
>> drivers/devfreq/Kconfig | 10 +
>> drivers/devfreq/Makefile | 1 +
>> drivers/devfreq/tegra-devfreq.c | 718 ++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 729 insertions(+)
>> create mode 100644 drivers/devfreq/tegra-devfreq.c
>>
>> diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
>> index faf4e70..4aab799 100644
>> --- a/drivers/devfreq/Kconfig
>> +++ b/drivers/devfreq/Kconfig
>> @@ -87,4 +87,14 @@ config ARM_EXYNOS5_BUS_DEVFREQ
>> It reads PPMU counters of memory controllers and adjusts the
>> operating frequencies and voltages with OPP support.
>>
>> +config ARM_TEGRA_DEVFREQ
>> + tristate "Tegra DEVFREQ Driver"
>> + depends on ARCH_TEGRA_124_SOC
>> + select DEVFREQ_GOV_SIMPLE_ONDEMAND
>> + select PM_OPP
>> + help
>> + This adds the DEVFREQ driver for the Tegra family of SoCs.
>> + It reads ACTMON counters of memory controllers and adjusts the
>> + operating frequencies and voltages with OPP support.
>> +
>> endif # PM_DEVFREQ
>> diff --git a/drivers/devfreq/Makefile b/drivers/devfreq/Makefile
>> index 16138c9..0ea991f 100644
>> --- a/drivers/devfreq/Makefile
>> +++ b/drivers/devfreq/Makefile
>> @@ -7,3 +7,4 @@ obj-$(CONFIG_DEVFREQ_GOV_USERSPACE) += governor_userspace.o
>> # DEVFREQ Drivers
>> obj-$(CONFIG_ARM_EXYNOS4_BUS_DEVFREQ) += exynos/
>> obj-$(CONFIG_ARM_EXYNOS5_BUS_DEVFREQ) += exynos/
>> +obj-$(CONFIG_ARM_TEGRA_DEVFREQ) += tegra-devfreq.o
>> diff --git a/drivers/devfreq/tegra-devfreq.c b/drivers/devfreq/tegra-devfreq.c
>> new file mode 100644
>> index 0000000..3479096
>> --- /dev/null
>> +++ b/drivers/devfreq/tegra-devfreq.c
>
> This file should probably be named tegra-actmon-devfreq.c, for nothing
> guarantees that we will not have more devfreq devices for Tegra in the
> future.
Good point.
>> @@ -0,0 +1,718 @@
>> +/*
>> + * A devfreq driver for NVIDIA Tegra SoCs
>> + *
>> + * Copyright (c) 2014 NVIDIA CORPORATION. All rights reserved.
>> + * Copyright (C) 2014 Google, Inc
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + *
>> + */
>> +
>> +#include <linux/clk.h>
>> +#include <linux/cpufreq.h>
>> +#include <linux/devfreq.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/io.h>
>> +#include <linux/module.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/pm_opp.h>
>> +#include <linux/reset.h>
>> +
>> +#include "governor.h"
>> +
>> +#define ACTMON_GLB_STATUS 0x0
>> +#define ACTMON_GLB_PERIOD_CTRL 0x4
>> +
>> +#define ACTMON_DEV_CTRL 0x0
>> +#define ACTMON_DEV_CTRL_K_VAL_SHIFT 10
>> +#define ACTMON_DEV_CTRL_ENB_PERIODIC BIT(18)
>> +#define ACTMON_DEV_CTRL_AVG_BELOW_WMARK_EN BIT(20)
>> +#define ACTMON_DEV_CTRL_AVG_ABOVE_WMARK_EN BIT(21)
>> +#define ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_NUM_SHIFT 23
>> +#define ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_NUM_SHIFT 26
>> +#define ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN BIT(29)
>> +#define ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN BIT(30)
>> +#define ACTMON_DEV_CTRL_ENB BIT(31)
>> +
>> +#define ACTMON_DEV_UPPER_WMARK 0x4
>> +#define ACTMON_DEV_LOWER_WMARK 0x8
>> +#define ACTMON_DEV_INIT_AVG 0xc
>> +#define ACTMON_DEV_AVG_UPPER_WMARK 0x10
>> +#define ACTMON_DEV_AVG_LOWER_WMARK 0x14
>> +#define ACTMON_DEV_COUNT_WEIGHT 0x18
>> +#define ACTMON_DEV_AVG_COUNT 0x20
>> +#define ACTMON_DEV_INTR_STATUS 0x24
>> +
>> +#define ACTMON_INTR_STATUS_CLEAR 0xffffffff
>> +
>> +#define ACTMON_DEV_INTR_CONSECUTIVE_UPPER BIT(31)
>> +#define ACTMON_DEV_INTR_CONSECUTIVE_LOWER BIT(30)
>> +
>> +#define ACTMON_ABOVE_WMARK_WINDOW 1
>> +#define ACTMON_BELOW_WMARK_WINDOW 3
>> +#define ACTMON_BOOST_FREQ_STEP 16000
>> +
>> +/* activity counter is incremented every 256 memory transactions, and each
>> + * transaction takes 4 EMC clocks for Tegra124; So the COUNT_WEIGHT is
>> + * 4 * 256 = 1024.
>> + */
>> +#define ACTMON_COUNT_WEIGHT 0x400
>> +
>> +/*
>> + * ACTMON_AVERAGE_WINDOW_LOG2: default value for @DEV_CTRL_K_VAL, which
>> + * translates to 2 ^ (K_VAL + 1). ex: 2 ^ (6 + 1) = 128
>> + */
>> +#define ACTMON_AVERAGE_WINDOW_LOG2 6
>> +#define ACTMON_SAMPLING_PERIOD 12 /* ms */
>> +#define ACTMON_DEFAULT_AVG_BAND 6 /* 1/10 of % */
>> +
>> +#define KHZ 1000
>> +
>> +/* Assume that the bus is saturated if the utilization is 25% */
>> +#define BUS_SATURATION_RATIO 25
>> +
>> +/**
>> + * struct tegra_devfreq_device_config - configuration specific to an ACTMON
>> + * device
>> + *
>> + * Coefficients and thresholds are in %
>
> Short comments about what the members are for would be perfect (same
> for other structures). It is not immediatly obvious what e.g.
> avg_dependency_threshold is for.
Fair enough.
>> + */
>> +struct tegra_devfreq_device_config {
>> + u32 offset;
>> + u32 irq_mask;
>> +
>> + unsigned int boost_up_coeff;
>> + unsigned int boost_down_coeff;
>> + unsigned int boost_up_threshold;
>> + unsigned int boost_down_threshold;
>> + u32 avg_dependency_threshold;
>> +};
>> +
>> +enum tegra_actmon_device {
>> + MCALL = 0,
>> + MCCPU,
>> +};
>> +
>> +static struct tegra_devfreq_device_config actmon_device_configs[] = {
>> + {
>> + /* MCALL */
>> + .offset = 0x1c0,
>> + .irq_mask = 1 << 26,
>> + .boost_up_coeff = 200,
>> + .boost_down_coeff = 50,
>> + .boost_up_threshold = 60,
>> + .boost_down_threshold = 40,
>> + },
>> + {
>> + /* MCCPU */
>> + .offset = 0x200,
>> + .irq_mask = 1 << 25,
>> + .boost_up_coeff = 800,
>> + .boost_down_coeff = 90,
>> + .boost_up_threshold = 27,
>> + .boost_down_threshold = 10,
>> + .avg_dependency_threshold = 50000,
>
> The boost_* and avg_dependency_threshold parameters are calling to be
> exported through debugfs in some future patch. :)
>
> Just for my education, why do we need to monitor both MCALL and MCCPU,
> since MCCPU accesses are included in MCALL? Do we want to boost things
> differently in case of CPU activity?
Yeah, I started by using only MCALL and found that performance
decreased substantially in scenarios with high CPU loads. Once I took
into account CPU frequency and the MCCPU counters as per downstream,
performance improved greatly.
I cannot tell why MCALL isn't enough though, maybe you can find a
fellow NVIDIAn that knows about it? Couldn't find anything in the
public TRM about it.
>> + },
>> +};
>> +
>> +/**
>> + * struct tegra_devfreq_device - state specific to an ACTMON device
>> + *
>> + * Frequencies are in kHz.
>> + */
>> +struct tegra_devfreq_device {
>> + const struct tegra_devfreq_device_config *config;
>> +
>> + void __iomem *regs;
>> + u32 avg_band_freq;
>> + u32 avg_count;
>> +
>> + unsigned long target_freq;
>> + unsigned long boost_freq;
>> +};
>> +
>> +struct tegra_devfreq {
>> + struct devfreq *devfreq;
>> +
>> + struct platform_device *pdev;
>> + struct reset_control *reset;
>> + struct clk *clock;
>> + void __iomem *regs;
>> +
>> + spinlock_t lock;
>> +
>> + struct clk *emc_clock;
>> + unsigned long max_freq;
>> + unsigned long cur_freq;
>> + struct notifier_block rate_change_nb;
>> +
>> + struct tegra_devfreq_device devices[ARRAY_SIZE(actmon_device_configs)];
>> +};
>> +
>> +struct tegra_actmon_emc_ratio {
>> + unsigned long cpu_freq;
>> + unsigned long emc_freq;
>> +};
>> +
>> +static struct tegra_actmon_emc_ratio actmon_emc_ratios[] = {
>> + { 1400000, ULONG_MAX },
>> + { 1200000, 750000 },
>> + { 1100000, 600000 },
>> + { 1000000, 500000 },
>> + { 800000, 375000 },
>> + { 500000, 200000 },
>> + { 250000, 100000 },
>
> Aren't these numbers board-specific? Should they be moved to the DT maybe?
I would expect it to be common to all boards with the same SoC.
>> +};
>> +
>> +static unsigned long do_percent(unsigned long val, unsigned int pct)
>> +{
>> + return val * pct / 100;
>> +}
>> +
>> +static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq_device *dev)
>> +{
>> + u32 avg = dev->avg_count;
>> + u32 band = dev->avg_band_freq * ACTMON_SAMPLING_PERIOD;
>> +
>> + writel(avg + band, dev->regs + ACTMON_DEV_AVG_UPPER_WMARK);
>
> Having actmon_readl() and actmon_writel() inline functions that make
> the "dev->regs +" unnecessary would be nicer to read and less
> error-prone IMHO.
Will try that.
>> + avg = max(avg, band);
>> + writel(avg - band, dev->regs + ACTMON_DEV_AVG_LOWER_WMARK);
>> +}
>> +
>> +static void tegra_devfreq_update_wmark(struct tegra_devfreq *tegra,
>> + struct tegra_devfreq_device *dev)
>> +{
>> + u32 val = tegra->cur_freq * ACTMON_SAMPLING_PERIOD;
>> +
>> + writel(do_percent(val, dev->config->boost_up_threshold),
>> + dev->regs + ACTMON_DEV_UPPER_WMARK);
>> +
>> + writel(do_percent(val, dev->config->boost_down_threshold),
>> + dev->regs + ACTMON_DEV_LOWER_WMARK);
>> +}
>> +
>> +static void actmon_write_barrier(struct tegra_devfreq *tegra)
>> +{
>> + /* ensure the update has reached the ACTMON */
>> + wmb();
>> + readl(tegra->regs + ACTMON_GLB_STATUS);
>> +}
>> +
>> +static irqreturn_t actmon_isr(int irq, void *data)
>> +{
>> + struct tegra_devfreq *tegra = data;
>> + struct tegra_devfreq_device *dev = NULL;
>> + unsigned long flags;
>> + u32 val;
>> + unsigned int i;
>> +
>> + val = readl(tegra->regs + ACTMON_GLB_STATUS);
>> +
>> + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
>> + if (val & tegra->devices[i].config->irq_mask) {
>> + dev = tegra->devices + i;
>> + break;
>> + }
>> + }
>> +
>> + if (!dev)
>> + return IRQ_NONE;
>> +
>> + spin_lock_irqsave(&tegra->lock, flags);
>> +
>> + dev->avg_count = readl(dev->regs + ACTMON_DEV_AVG_COUNT);
>> + tegra_devfreq_update_avg_wmark(dev);
>> +
>> + val = readl(dev->regs + ACTMON_DEV_INTR_STATUS);
>> + if (val & ACTMON_DEV_INTR_CONSECUTIVE_UPPER) {
>> + val = readl(dev->regs + ACTMON_DEV_CTRL) |
>> + ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN |
>> + ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
>> +
>> + /*
>> + * new_boost = min(old_boost * up_coef + step, max_freq)
>> + */
>> + dev->boost_freq = do_percent(dev->boost_freq,
>> + dev->config->boost_up_coeff);
>> + dev->boost_freq += ACTMON_BOOST_FREQ_STEP;
>> + if (dev->boost_freq >= tegra->max_freq) {
>> + dev->boost_freq = tegra->max_freq;
>> + val &= ~ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN;
>
> Instead of setting this flag to possibly remove it later, maybe set it
> only if this condition is not met?
That sounds better indeed, thanks.
>> + }
>> + writel(val, dev->regs + ACTMON_DEV_CTRL);
>> + } else if (val & ACTMON_DEV_INTR_CONSECUTIVE_LOWER) {
>> + val = readl(dev->regs + ACTMON_DEV_CTRL) |
>> + ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN |
>> + ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
>> +
>> + /*
>> + * new_boost = old_boost * down_coef
>> + * or 0 if (old_boost * down_coef < step / 2)
>> + */
>> + dev->boost_freq = do_percent(dev->boost_freq,
>> + dev->config->boost_down_coeff);
>> + if (dev->boost_freq < (ACTMON_BOOST_FREQ_STEP >> 1)) {
>> + dev->boost_freq = 0;
>> + val &= ~ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
>
> Same here.
>
>> + }
>> + writel(val, dev->regs + ACTMON_DEV_CTRL);
>> + }
>> +
>> + if (dev->config->avg_dependency_threshold) {
>> + val = readl(dev->regs + ACTMON_DEV_CTRL);
>> + if (dev->avg_count >= dev->config->avg_dependency_threshold)
>> + val |= ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
>> + else if (dev->boost_freq == 0)
>> + val &= ~ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN;
>> + writel(val, dev->regs + ACTMON_DEV_CTRL);
>> + }
>> +
>> + writel(ACTMON_INTR_STATUS_CLEAR, dev->regs + ACTMON_DEV_INTR_STATUS);
>> +
>> + actmon_write_barrier(tegra);
>> +
>> + spin_unlock_irqrestore(&tegra->lock, flags);
>> +
>> + return IRQ_WAKE_THREAD;
>> +}
>> +
>> +static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq *tegra,
>> + unsigned long cpu_freq)
>> +{
>> + unsigned int i;
>> + struct tegra_actmon_emc_ratio *ratio = actmon_emc_ratios;
>> +
>> + for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++, ratio++) {
>> + if (cpu_freq >= ratio->cpu_freq) {
>> + if (ratio->emc_freq >= tegra->max_freq)
>> + return tegra->max_freq;
>> + else
>> + return ratio->emc_freq;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void actmon_update_target(struct tegra_devfreq *tegra,
>> + struct tegra_devfreq_device *dev)
>> +{
>> + unsigned long cpu_freq = 0;
>> + unsigned long static_cpu_emc_freq = 0;
>> + unsigned int avg_sustain_coef;
>> + unsigned long flags;
>> +
>> + if (dev->config->avg_dependency_threshold) {
>> + cpu_freq = cpufreq_get(0);
>> + static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra, cpu_freq);
>> + }
>> +
>> + spin_lock_irqsave(&tegra->lock, flags);
>> +
>> + dev->target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD;
>> + avg_sustain_coef = 100 * 100 / dev->config->boost_up_threshold;
>> + dev->target_freq = do_percent(dev->target_freq, avg_sustain_coef);
>> + dev->target_freq += dev->boost_freq;
>> +
>> + if (dev->avg_count >= dev->config->avg_dependency_threshold)
>> + dev->target_freq = max(dev->target_freq, static_cpu_emc_freq);
>> +
>> + spin_unlock_irqrestore(&tegra->lock, flags);
>> +}
>> +
>> +static irqreturn_t actmon_thread_isr(int irq, void *data)
>> +{
>> + struct tegra_devfreq *tegra = data;
>> +
>> + mutex_lock(&tegra->devfreq->lock);
>> + update_devfreq(tegra->devfreq);
>> + mutex_unlock(&tegra->devfreq->lock);
>> +
>> + return IRQ_HANDLED;
>> +}
>> +
>> +static int tegra_actmon_rate_notify_cb(struct notifier_block *nb,
>> + unsigned long action, void *ptr)
>> +{
>> + struct clk_notifier_data *data = ptr;
>> + struct tegra_devfreq *tegra = container_of(nb, struct tegra_devfreq,
>> + rate_change_nb);
>> + unsigned int i;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&tegra->lock, flags);
>> +
>> + switch (action) {
>> + case POST_RATE_CHANGE:
>> + tegra->cur_freq = data->new_rate / KHZ;
>> +
>> + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++)
>> + tegra_devfreq_update_wmark(tegra, tegra->devices + i);
>> +
>> + actmon_write_barrier(tegra);
>> + break;
>> + case PRE_RATE_CHANGE:
>> + /* fall through */
>> + case ABORT_RATE_CHANGE:
>
> Can't you use "default:" here instead?
Yeah, this needs simplifying.
>> + break;
>> + };
>> +
>> + spin_unlock_irqrestore(&tegra->lock, flags);
>> +
>> + return NOTIFY_OK;
>> +}
>> +
>> +static void tegra_actmon_configure_device(struct tegra_devfreq *tegra,
>> + struct tegra_devfreq_device *dev)
>> +{
>> + u32 val;
>> +
>> + dev->avg_band_freq = tegra->max_freq * ACTMON_DEFAULT_AVG_BAND / KHZ;
>> + dev->target_freq = tegra->cur_freq;
>> +
>> + dev->avg_count = tegra->cur_freq * ACTMON_SAMPLING_PERIOD;
>> + writel(dev->avg_count, dev->regs + ACTMON_DEV_INIT_AVG);
>> +
>> + tegra_devfreq_update_avg_wmark(dev);
>> + tegra_devfreq_update_wmark(tegra, dev);
>> +
>> + writel(ACTMON_COUNT_WEIGHT, dev->regs + ACTMON_DEV_COUNT_WEIGHT);
>> + writel(ACTMON_INTR_STATUS_CLEAR, dev->regs + ACTMON_DEV_INTR_STATUS);
>> +
>> + val = 0;
>> + val |= ACTMON_DEV_CTRL_ENB_PERIODIC |
>> + ACTMON_DEV_CTRL_AVG_ABOVE_WMARK_EN |
>> + ACTMON_DEV_CTRL_AVG_BELOW_WMARK_EN;
>> + val |= (ACTMON_AVERAGE_WINDOW_LOG2 - 1)
>> + << ACTMON_DEV_CTRL_K_VAL_SHIFT;
>> + val |= (ACTMON_BELOW_WMARK_WINDOW - 1)
>> + << ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_NUM_SHIFT;
>> + val |= (ACTMON_ABOVE_WMARK_WINDOW - 1)
>> + << ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_NUM_SHIFT;
>> + val |= ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN |
>> + ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN;
>> +
>> + writel(val, dev->regs + ACTMON_DEV_CTRL);
>> +
>> + actmon_write_barrier(tegra);
>> +
>> + val = readl(dev->regs + ACTMON_DEV_CTRL);
>> + val |= ACTMON_DEV_CTRL_ENB;
>> + writel(val, dev->regs + ACTMON_DEV_CTRL);
>> +
>> + actmon_write_barrier(tegra);
>> +}
>> +
>> +static int tegra_devfreq_suspend(struct device *dev)
>> +{
>> + struct platform_device *pdev;
>> + struct tegra_devfreq *tegra;
>> + struct tegra_devfreq_device *actmon_dev;
>> + unsigned int i;
>> + u32 val;
>> +
>> + pdev = container_of(dev, struct platform_device, dev);
>> + tegra = platform_get_drvdata(pdev);
>> +
>> + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
>> + actmon_dev = &tegra->devices[i];
>> +
>> + val = readl(actmon_dev->regs + ACTMON_DEV_CTRL);
>> + val &= ~ACTMON_DEV_CTRL_ENB;
>> + writel(val, actmon_dev->regs + ACTMON_DEV_CTRL);
>> +
>> + writel(ACTMON_INTR_STATUS_CLEAR,
>> + actmon_dev->regs + ACTMON_DEV_INTR_STATUS);
>> +
>> + actmon_write_barrier(tegra);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int tegra_devfreq_resume(struct device *dev)
>> +{
>> + struct platform_device *pdev;
>> + struct tegra_devfreq *tegra;
>> + struct tegra_devfreq_device *actmon_dev;
>> + unsigned int i;
>> +
>> + pdev = container_of(dev, struct platform_device, dev);
>> + tegra = platform_get_drvdata(pdev);
>> +
>> + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
>> + actmon_dev = &tegra->devices[i];
>> +
>> + tegra_actmon_configure_device(tegra, actmon_dev);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int tegra_devfreq_target(struct device *dev, unsigned long *freq,
>> + u32 flags)
>> +{
>> + struct platform_device *pdev;
>> + struct tegra_devfreq *tegra;
>> + struct dev_pm_opp *opp;
>> + unsigned long rate = *freq * KHZ;
>> +
>> + pdev = container_of(dev, struct platform_device, dev);
>> + tegra = platform_get_drvdata(pdev);
>> +
>> + rcu_read_lock();
>> + opp = devfreq_recommended_opp(dev, &rate, flags);
>> + if (IS_ERR(opp)) {
>> + rcu_read_unlock();
>> + dev_err(dev, "Failed to find opp for %lu KHz\n", *freq);
>> + return PTR_ERR(opp);
>> + }
>> + rate = dev_pm_opp_get_freq(opp);
>> + rcu_read_unlock();
>> +
>> + /* TODO: Once we have per-user clk constraints, set a floor */
>> + clk_set_rate(tegra->emc_clock, rate);
>> +
>> + /* TODO: Set voltage as well */
>> +
>> + return 0;
>> +}
>> +
>> +static int tegra_devfreq_get_dev_status(struct device *dev,
>> + struct devfreq_dev_status *stat)
>> +{
>> + struct platform_device *pdev;
>> + struct tegra_devfreq *tegra;
>> + struct tegra_devfreq_device *actmon_dev;
>> +
>> + pdev = container_of(dev, struct platform_device, dev);
>> + tegra = platform_get_drvdata(pdev);
>> +
>> + stat->current_frequency = tegra->cur_freq;
>> +
>> + /* To be used by the tegra governor */
>> + stat->private_data = tegra;
>> +
>> + /* The below are to be used by the other governors */
>> +
>> + actmon_dev = &tegra->devices[MCALL];
>> +
>> + /* Number of cycles spent on memory access */
>> + stat->busy_time = actmon_dev->avg_count;
>> +
>> + /* The bus can be considered to be saturated way before 100% */
>> + stat->busy_time *= 100 / BUS_SATURATION_RATIO;
>> +
>> + /* Number of cycles in a sampling period */
>> + stat->total_time = ACTMON_SAMPLING_PERIOD * tegra->cur_freq;
>> +
>> + return 0;
>> +}
>> +
>> +static int tegra_devfreq_get_target(struct devfreq *devfreq,
>> + unsigned long *freq)
>> +{
>> + struct devfreq_dev_status stat;
>> + struct tegra_devfreq *tegra;
>> + struct tegra_devfreq_device *dev;
>> + unsigned long target_freq = 0;
>> + unsigned int i;
>> + int err;
>> +
>> + err = devfreq->profile->get_dev_status(devfreq->dev.parent, &stat);
>> + if (err)
>> + return err;
>> +
>> + tegra = stat.private_data;
>> +
>> + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) {
>> + dev = &tegra->devices[i];
>> +
>> + actmon_update_target(tegra, dev);
>> +
>> + target_freq = max(target_freq, dev->target_freq);
>> + }
>
> I *think* you should be able to remove the dependency on
> tegra_devfreq_device from this function (ideally, governors should be
> device-agnostic). Since you call get_dev_status() from this function
> anyway, why not moving the actmon_update_target() calls into
> tegra_devfreq_get_dev_status(), and write busy_time/total_time values
> that will make tegra_devfreq_get_target() infer the right frequency?
>
>> +
>> + *freq = target_freq;
>> +
>> + return 0;
>> +}
>> +
>> +static int tegra_devfreq_event_handler(struct devfreq *devfreq,
>> + unsigned int event, void *data)
>> +{
>> + return 0;
>> +}
>
> I think you will need an implementation of this. Upon
> DEVFREQ_GOV_START/DEVFREQ_GOV_RESUME, enable actmon interrupts. On
> DEVFREQ_GOV_STOP/DEVFREQ_GOV_SUSPEND, disable interrupts. I know you
> are doing it somewhere, but it should be done here as this is what
> devfreq expects (e.g. to correctly stop interrupts if you switch from
> the "tegra" to the "performance" governor).
>
> Sadly, this adds the exact same dependency on tegra_devfreq that I
> tried to remove from the previous function, which is unfortunate. This
> is where watermarking support in devfreq would be useful - when
> starting the governor, we could ask the device to monitor some low and
> high marks through new hooks and handle all this nicely.
Wonder if the devfreq maintainers have an opinion on this.
>> +
>> +static struct devfreq_governor tegra_devfreq_governor = {
>> + .name = "tegra",
>
> This governor could certainly get a more specific name. :P
Fair.
>> + .get_target_freq = tegra_devfreq_get_target,
>> + .event_handler = tegra_devfreq_event_handler,
>> +};
>> +
>> +static struct devfreq_dev_profile tegra_devfreq_profile = {
>> + .polling_ms = 0,
>> + .target = tegra_devfreq_target,
>> + .get_dev_status = tegra_devfreq_get_dev_status,
>> +};
>
> Maybe move this declaration right after the declaration of
> tegra_devfreq_get_dev_status() to make things easier to put together
> in the reader's mind.
Agreed.
>> +
>> +static int tegra_devfreq_probe(struct platform_device *pdev)
>> +{
>> + struct tegra_devfreq *tegra;
>> + struct tegra_devfreq_device *dev;
>> + struct resource *res;
>> + unsigned long max_freq;
>> + unsigned int i;
>> + int irq;
>> + int err;
>> +
>> + tegra = devm_kzalloc(&pdev->dev, sizeof(*tegra), GFP_KERNEL);
>> + if (!tegra)
>> + return -ENOMEM;
>> +
>> + spin_lock_init(&tegra->lock);
>> +
>> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> + if (!res) {
>> + dev_err(&pdev->dev, "Failed to get regs resource\n");
>> + return -ENODEV;
>> + }
>> +
>> + tegra->regs = devm_ioremap_resource(&pdev->dev, res);
>> + if (IS_ERR(tegra->regs)) {
>> + dev_err(&pdev->dev, "Failed to get IO memory\n");
>> + return PTR_ERR(tegra->regs);
>> + }
>> +
>> + tegra->reset = devm_reset_control_get(&pdev->dev, "actmon");
>> + if (IS_ERR(tegra->reset)) {
>> + dev_err(&pdev->dev, "Failed to get reset\n");
>> + return PTR_ERR(tegra->reset);
>> + }
>> +
>> + tegra->clock = devm_clk_get(&pdev->dev, "actmon");
>> + if (IS_ERR(tegra->clock)) {
>> + dev_err(&pdev->dev, "Failed to get actmon clock\n");
>> + return PTR_ERR(tegra->clock);
>> + }
>> +
>> + tegra->emc_clock = devm_clk_get(&pdev->dev, "emc");
>> + if (IS_ERR(tegra->emc_clock)) {
>> + dev_err(&pdev->dev, "Failed to get emc clock\n");
>> + return PTR_ERR(tegra->emc_clock);
>> + }
>> +
>> + err = of_init_opp_table(&pdev->dev);
>> + if (err) {
>> + dev_err(&pdev->dev, "Failed to init operating point table\n");
>> + return err;
>> + }
>> +
>> + tegra->rate_change_nb.notifier_call = tegra_actmon_rate_notify_cb;
>> + err = clk_notifier_register(tegra->emc_clock, &tegra->rate_change_nb);
>> + if (err) {
>> + dev_err(&pdev->dev,
>> + "Failed to register rate change notifier\n");
>> + return err;
>> + }
>> +
>> + reset_control_assert(tegra->reset);
>> +
>> + err = clk_prepare_enable(tegra->clock);
>> + if (err) {
>> + reset_control_deassert(tegra->reset);
>> + return err;
>> + }
>> +
>> + reset_control_deassert(tegra->reset);
>> +
>> + max_freq = clk_round_rate(tegra->emc_clock, ULONG_MAX);
>> + tegra->max_freq = max_freq / KHZ;
>> +
>> + clk_set_rate(tegra->emc_clock, max_freq);
>> +
>> + tegra->cur_freq = clk_get_rate(tegra->emc_clock) / KHZ;
>> +
>> + writel(ACTMON_SAMPLING_PERIOD - 1,
>> + tegra->regs + ACTMON_GLB_PERIOD_CTRL);
>> +
>> + for (i = 0; i < ARRAY_SIZE(actmon_device_configs); i++) {
>> + dev = tegra->devices + i;
>> + dev->config = actmon_device_configs + i;
>> + dev->regs = tegra->regs + dev->config->offset;
>> +
>> + tegra_actmon_configure_device(tegra, tegra->devices + i);
>> + }
>> +
>> + err = devfreq_add_governor(&tegra_devfreq_governor);
>
> This should probably be done in an initcall (that would be executed
> before probe() is called). We can in theory have several instances of
> ACTMON, but the governor should only be added once in all cases. I
> understand that this is unlikely to happen in practice though.
Will look at this.
>> + if (err) {
>> + dev_err(&pdev->dev, "Failed to add governor\n");
>> + return err;
>> + }
>> +
>> + tegra_devfreq_profile.initial_freq = clk_get_rate(tegra->emc_clock);
>> + tegra->devfreq = devm_devfreq_add_device(&pdev->dev,
>> + &tegra_devfreq_profile,
>> + "tegra",
>
> I think "tegra_actmon" would look better here. :)
Agreed.
>> + NULL);
>> +
>> + irq = platform_get_irq(pdev, 0);
>> + err = devm_request_threaded_irq(&pdev->dev, irq, actmon_isr,
>> + actmon_thread_isr, IRQF_SHARED,
>> + "tegra-devfreq", tegra);
>
> You probably ought to do this *before* adding the devfreq device,
> otherwise you are at risk of receiving interrupts nobody will handle.
I don't get this one. The IRQ is requested as soon as everything is in
place to handle it.
>> + if (err) {
>> + dev_err(&pdev->dev, "Interrupt request failed\n");
>> + return err;
>> + }
>> +
>> + platform_set_drvdata(pdev, tegra);
>
> This should also be done before the call to devfreq_add_device().
Shouldn't this be done between devfreq_add_device and
devm_request_threaded_irq? Guess I'm missing something.
>> +
>> + return 0;
>> +}
>> +
>> +static int tegra_devfreq_remove(struct platform_device *pdev)
>> +{
>> + struct tegra_devfreq *tegra = platform_get_drvdata(pdev);
>> +
>> + clk_notifier_unregister(tegra->emc_clock, &tegra->rate_change_nb);
>> +
>> + clk_disable_unprepare(tegra->clock);
>
> There seems to be a lot missing in this function, like removing the
> devfreq device?
Those resources are managed, so devm should take care of releasing them.
Thanks,
Tomeu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/