Re: [RFC 0/6] mmc: Add clock scaling support for mmc driver
From: Ulf Hansson
Date: Wed Dec 18 2019 - 10:11:50 EST
On Fri, 29 Nov 2019 at 08:34, <rampraka@xxxxxxxxxxxxxx> wrote:
>
> Hi Ulf,
>
> Seems some setting issue with my thunderbird application.
> Sorry for spams, please ignore my last responses as unsupported
> characters got added.
>
> Typing my response again from browser and re-sending.
>
> Thanks,
> Ram
>
> On 2019-10-22 14:10, Ulf Hansson wrote:
> > On Mon, 21 Oct 2019 at 16:30, Ram Prakash Gupta
> > <rampraka@xxxxxxxxxxxxxx> wrote:
> >>
> >> This change adds the use of devfreq based clock scaling to MMC.
> >> This applicable for eMMC and SDCard.
> >> For some workloads, such as video playback, it isn't necessary
> >> for these cards to run at high speed. Running at lower
> >> frequency, in such cases can still meet the deadlines for data
> >> transfers.
> >>
> >> Scaling down the clock frequency dynamically has power savings
> >> not only because the bus is running at lower frequency but also
> >> has an advantage of scaling down the system core voltage, if
> >> supported. Provide an ondemand clock scaling support similar
> >> to the cpufreq ondemand governor having two thresholds,
> >> up_threshold and down_threshold to decide whether to increase
> >> the frequency or scale it down respectively as per load.
> >
> > This sounds simple, but what the series is doing is far more
> > complicated but scaling the bus clock, as it also re-negotiates the
> > bus speed mode.
> >
> > Each time the triggering point for scaling up/down is hit, then a
> > series of commands needs to be sent to the card, including running the
> > tuning procedure. The point is, for sure, this doesn't come for free,
> > both from a latency point of view, but also from an energy cost point
> > of view. So, whether this really improves the behaviour, seems like
> > very use case sensitive, right!?
>
> Switching modes would incur some latency for sending commands to switch
> modes, but tuning is not needed as most of the emmc devices used now a
> days are with enhanced strobe support, so tuning would not add up any
> latency as it is not required in hs400 enhanced strobe mode.
>
> This feature is implemented for video playback case, where data transfer
> request is less, where this feature helps with saving power consumption.
>
> And when there is burst of data transfer request, load will remain
> _high_
> so there won't be any switching and hence it won't affect any existing
> use cases from latency point of view.
>
> Also if hw supports to switch clk frequency without changing mode. I
> will
> make change in code. For this I have seek input from hw team.
>
> From collected data, I see this feature is helping in saving power
> consumption. And no energy penalty is observed. Please share if I am
> missing any specific. Power saving using this feature is quite good
> and considerable. Please find the data below.
>
> Use Case Delta at Battery Power Impact
> 30 fps at HD 1080p decode 20Mbps 10 mA 11%
> 30 fps at UHD 8b H.264 42 Mbps 20.93 mA 19%
>
> >
> > Overall, when it comes to use cases, we have very limited knowledge
> > about them at the mmc block layer level. I think it should remain like
> > that. If at any place at all, this information is better maintained by
> > the generic block layer and potentially the configured I/O scheduler.
>
> I think, generic block layer do not have knowledge of use case for data
> transfer request. And devfreq framework have been used to implement this
> feature, which should be same in any layer.
Just to be clear, my main concern here is not using the devfreq framework.
It's rather whether switching speed modes is really worth it, which
Doug pointed out as well.
>
> Also mobile platforms comes mostly with emmc and ufs as storage media.
> And clock scaling is already implemented in upstream ufs driver using
> devfreq framework. On similar line, this feature is implemented for mmc
> driver. So I believe, clk scaling implementation is better placed in mmc
> driver rather in generic block layer.
At some point you want to trigger the clock to be scaled up/down,
probably because of reaching some bandwidth threshold. This part seems
like a generic block thing and not an mmc specific thing.
Exactly how that would affect this approach is hard to say, but my
point is, that I don't want to sprinkle the mmc subsystem with code
that may belong in upper common layers.
>
> >
> > This brings me to a question about the tests you have you run. Can you
> > share some information and data about that?
>
> Test case used are 1080p and 4k video playback use case. As this feature
> is implemented specifically for video playback use case.
Right.
It seems to me, that you are optimizing for only one particular use
case. How do you make sure to not increase energy consumption, for
other use cases that would gain from running at the highest speed mode
and "race to idle"?
I think you need to take a step back and think more general about this
approach. More importantly, you need to provide more data to prove
your approach, also according to suggestions from Dough.
> >
> >>
> >>
> >> Ram Prakash Gupta (6):
> >> mmc: core: Parse clk scaling dt entries
> >> mmc: core: Add core scaling support in driver
> >> mmc: core: Initialize clk scaling for mmc and SDCard
> >> mmc: core: Add debugfs entries for scaling support
> >> mmc: sdhci-msm: Add capability in platfrom host
> >> dt-bindings: mmc: sdhci-msm: Add clk scaling dt parameters
> >>
> >> .../devicetree/bindings/mmc/sdhci-msm.txt | 19 +
> >
> > I noticed that the DT patch was put last in the series, but the
> > parsing is implemented in the first patch. Please flip this around. If
> > you want to implement DT parsing of new bindings, please make sure to
> > discuss the new bindings first.
>
> I will update in next post.
>
> >
> >> drivers/mmc/core/block.c | 19 +-
> >> drivers/mmc/core/core.c | 777
> >> +++++++++++++++++++++
> >> drivers/mmc/core/core.h | 17 +
> >> drivers/mmc/core/debugfs.c | 90 +++
> >> drivers/mmc/core/host.c | 226 ++++++
> >> drivers/mmc/core/mmc.c | 246 ++++++-
> >> drivers/mmc/core/queue.c | 2 +
> >> drivers/mmc/core/sd.c | 84 ++-
> >> drivers/mmc/host/sdhci-msm.c | 2 +
> >> include/linux/mmc/card.h | 7 +
> >> include/linux/mmc/host.h | 66 ++
> >> 12 files changed, 1550 insertions(+), 5 deletions(-)
> >
> > This is a lot of new code in the mmc core, which I would then need to
> > maintain, of course. I have to admit, I am a bit concerned about that,
> > so you have to convince me that there are good reasons for me to apply
> > this.
> >
> > As I stated above, I think the approach looks quite questionable in
> > general. And even if you can share measurement, that it improves the
> > behaviour, I suspect (without a deeper code review) that some of the
> > code better belongs in common block device layer.
>
> From the collected power data, I see this as good reason to have this
> feature in mmc driver as number is quite considerable.
>
> For approach, it would be helpful if you share your inputs regarding
> this
> approach. And as you have stated, this can be further discussed after a
> review from you.
Besides my comments above, Doug has provided you with valuable
feedback. Please follow up on that as well, if you still intend to
pursue with this approach.
Kind regards
Uffe