On 12/12/2019 15:47, Robin Murphy wrote:
On 12/12/2019 1:53 pm, Marc Gonzalez wrote:
On 11/12/2019 23:28, Dmitry Torokhov wrote:
On Wed, Dec 11, 2019 at 05:17:28PM +0100, Marc Gonzalez wrote:
What is the rationale for the devm_add_action API?
For one-off and maybe complex unwind actions in drivers that wish to use
devm API (as mixing devm and manual release is verboten). Also is often
used when some core subsystem does not provide enough devm APIs.
Thanks for the insight, Dmitry. Thanks to Robin too.
This is what I understand so far:
devm_add_action() is nice because it hides/factorizes the complexity
of the devres API, but it incurs a small storage overhead of one
pointer per call, which makes it unfit for frequently used actions,
such as clk_get.
Is that correct?
My question is: why not design the API without the small overhead?
Probably because on most architectures, ARCH_KMALLOC_MINALIGN is at
least as big as two pointers anyway, so this "overhead" should mostly be
free in practice. Plus the devres API is almost entirely about being
able to write simple robust code, rather than absolute efficiency - I
mean, struct devres itself is already 5 pointers large at the absolute
minimum ;)
(3 pointers: 1 list_head + 1 function pointer)
I'm confused. The first patch was criticized for potentially adding
an extra pointer for every devm_clk_get (e.g. 800 bytes on a 64-bit
platform with 100 clocks).
Let's see. On arm64, ARCH_KMALLOC_MINALIGN is 128.
So basically, a struct devres looks like this on arm64:
list_head.next
list_head.prev
dr_release_t
.
.
.
104 bytes of padding
.
.
.
data (flexible array)
.
.
.
padding up to 256 bytes
Basically, on arm64, every struct devres occupies 256 bytes, most of it
(typically 104 + 112 = 216) wasted as padding.
Hmmm, given how many devm stuff goes on in a modern platform, there
might be large savings to be had...
Assuming 10,000 calls to devres_alloc_node(), we would be wasting ~2 MB
of RAM. Not sure it's worth trying to save that?
$ git grep '#define ARCH_DMA_MINALIGN'
arch/arc/include/asm/cache.h:#define ARCH_DMA_MINALIGN SMP_CACHE_BYTES
arch/arm/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/arm64/include/asm/cache.h:#define ARCH_DMA_MINALIGN (128)
arch/c6x/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/csky/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/hexagon/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/m68k/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/microblaze/include/asm/page.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/mips/include/asm/mach-generic/kmalloc.h:#define ARCH_DMA_MINALIGN 128
arch/mips/include/asm/mach-ip32/kmalloc.h:#define ARCH_DMA_MINALIGN 32
arch/mips/include/asm/mach-ip32/kmalloc.h:#define ARCH_DMA_MINALIGN 128
arch/mips/include/asm/mach-tx49xx/kmalloc.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/nds32/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/nios2/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/parisc/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/powerpc/include/asm/page_32.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/sh/include/asm/page.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/unicore32/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
arch/xtensa/include/asm/cache.h:#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
Hmmm, how does arch/x86 do it?
Regards.