Re: [PATCH v5] dma-buf: Add DmaBufTotal counter in meminfo

From: Peter.Enderborg
Date: Thu Apr 22 2021 - 10:09:43 EST


On 4/22/21 10:06 AM, Mike Rapoport wrote:
> On Wed, Apr 21, 2021 at 05:35:57PM +0000, Peter.Enderborg@xxxxxxxx wrote:
>> On 4/21/21 5:31 PM, Mike Rapoport wrote:
>>> On Wed, Apr 21, 2021 at 10:37:11AM +0000, Peter.Enderborg@xxxxxxxx wrote:
>>>> On 4/21/21 11:15 AM, Daniel Vetter wrote:
>>>>> We need to understand what the "correct" value is. Not in terms of kernel
>>>>> code, but in terms of semantics. Like if userspace allocates a GL texture,
>>>>> is this supposed to show up in your metric or not. Stuff like that.
>>>> That is like having only one pointer type. You need to know what
>>>> you are pointing at to know what it is. It might be a hardware pointer
>>>> or some other pointer.
>>>> If there is a limitation on your pointers, it is a good metric to count
>>>> them even if you don't know what they are. The same goes for dma-bufs:
>>>> they are generic, but they consume some resources that are counted in pages.
>>>>
>>>> It would be very good if there were a subdivision where you could measure
>>>> all possible types separately. We have the details in debugfs, but nothing
>>>> for the user. A summary in meminfo seems to be the best place for such a
>>>> metric.
>>>
>>> Let me try to summarize my understanding of the problem, maybe it'll help
>>> others as well.
>> Thanks!
>>
>>
>>> A device driver allocates memory and exports this memory via dma-buf so
>>> that this memory will be accessible for userspace via a file descriptor.
>>>
>>> The allocated memory can be either allocated with alloc_page() from system
>>> RAM or by other means from dedicated VRAM (that is not managed by Linux mm)
>>> or even from on-device memory.
>>>
>>> The dma-buf driver tracks the amount of the memory it was requested to
>>> export and the size it sees is available at debugfs and fdinfo.
>>>
>>> Debugfs is not available to the user and may be entirely disabled in
>>> production systems.
>>>
>>> There could be quite a few open dma-bufs, so there is no overall summary,
>>> plus the fdinfo you refer to is also unavailable to the user in production
>>> systems because of selinux policy.
>>>
>>> And there are a few details that are not clear to me:
>>>
>>> * Since DRM device drivers seem to be the major user of dma-buf exports,
>>> why cannot we add information about their memory consumption to, say,
>>> /sys/class/graphics/drm/cardX/memory-usage?
>> Android is using it for binder, which connects more or less everything
>> internally.
>
> Ok, then it rules out /sys/class/graphics indeed.
>
>>> * How exactly does the user generate reports that would include the new
>>> counters? From my (mostly outdated) experience Android users won't open
>>> a terminal and type 'cat /proc/meminfo' there. I'd presume there is a
>>> vendor agent that collects the data and sends it for analysis. In this
>>> case what is the reason the vendor is unable to adjust selinux policy so
>>> that the agent will be able to access fdinfo?
>> When you turn on developer mode on Android you can use
>> USB with a program called adb. And there you get a normal shell.
>>
>> (not root though)
>>
>> There are applications that non-developers can use to get
>> information. It is very limited though, and there are APIs
>> that provide it.
>>
>>
>>> * And, as others already mentioned, it is not clear what are the problems
>>> that can be detected by examining DmaBufTotal except saying "oh, there is
>>> too much/too little memory exported via dma-buf". What would be user
>>> visible effects of these problems? What are the next steps to investigate
>>> them? What other data will be probably required to identify root cause?
>>>
>> When you debug thousands of devices it is quite nice to have
>> ways to classify what the problem is not. The normal user does not
>> see anything of this. However, they can generate bug reports that
>> collect as much information as they can. Then the user has
>> to provide this bug report to the manufacturer or, mostly, the
>> application developer. And when the problem is
>> system related we need to reproduce the issue on a full
>> debug enabled unit.
> So the flow is like this:
>
> * a user has a problem and reports it to an application developer; at best
> the user runs a simple and limited app to collect some data
> * if the application developer considers the issue to be system related
> they can open adb and collect some more information about the system
> using a non-root shell with selinux policy restrictions and send this
> information to the device manufacturer.
> * the manufacturer continues to debug the issue and at this point as much
> information as possible would be useful.
>
> In this flow I still fail to understand why the manufacturer cannot provide
> userspace tools that will be able to collect the required information.
> These tools do not necessarily need to target the end user, they may be only
> intended for the application developers, e.g. policy could allow such tool
> to access some of the system data only when the system is in developer
> mode.
>
The manufacturer is trying to get the tool to work. This is what the
patch is about. Even for an application developer a commercial
phone is locked down. Many vendors allow you to flash
some other software, such as an AOSP build. But that can be very
different. It is like installing Ubuntu on a PC to debug a Fedora issue.

And sure, we can pick up parts of what is using dma-buf. But
we cannot get the total, and be sure that it is the total, without a
proper counter.
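
To illustrate why picking up the parts does not give a reliable total,
here is a minimal sketch that sums dma-buf sizes from whatever fdinfo files
the caller is allowed to read. It assumes that dma-buf descriptors expose
"size:" and "exp_name:" lines in /proc/<pid>/fdinfo/<fd>; the function names
are hypothetical and not part of the patch. The result is only a lower or
upper guess: processes hidden by selinux policy or permissions are missed,
and a buffer shared between several descriptors is counted once per
descriptor.

```python
import os


def parse_dmabuf_fdinfo(text):
    """Return the dma-buf size in bytes from one fdinfo file's text, or None.

    Assumption: only dma-buf descriptors carry an 'exp_name:' line, so any
    fdinfo text without it is treated as some other kind of file.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if "exp_name" not in fields or "size" not in fields:
        return None
    try:
        return int(fields["size"])
    except ValueError:
        return None


def sum_visible_dmabuf_bytes(proc_root="/proc"):
    """Sum dma-buf sizes over every fdinfo file this process may read.

    Returns (total_bytes, unreadable_entries). Shared buffers are counted
    once per descriptor, so this is not a true system total.
    """
    total = 0
    unreadable = 0
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fdinfo_dir = os.path.join(proc_root, pid, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except OSError:
            # Typically a permission or policy denial, or the process exited.
            unreadable += 1
            continue
        for fd in fds:
            try:
                with open(os.path.join(fdinfo_dir, fd)) as f:
                    size = parse_dmabuf_fdinfo(f.read())
            except OSError:
                unreadable += 1
                continue
            if size is not None:
                total += size
    return total, unreadable


if __name__ == "__main__":
    total, unreadable = sum_visible_dmabuf_bytes()
    print(f"visible dma-buf bytes: {total} (unreadable entries: {unreadable})")
```

From a non-root adb shell the unreadable count would be expected to be
high, which is the gap a single counter in meminfo is meant to close.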