Re: [PATCH v1 6/8] dmaengine: enhance network subsystem to support DMA device hotplug

From: Jiang Liu
Date: Wed Apr 25 2012 - 11:47:31 EST


Hi Dan,
Thanks for your great comments about the performance penalty issue. I'm trying
to refine the implementation to reduce the penalty caused by the hotplug logic. If the
algorithm works correctly, the optimized hot path code will be:

------------------------------------------------------------------------------
struct dma_chan *dma_find_channel(enum dma_transaction_type tx_type)
{
	struct dma_chan *chan = this_cpu_read(channel_table[tx_type]->chan);

	/* Per-CPU increment: only CPU-local memory is updated. */
	this_cpu_inc(dmaengine_chan_ref_count);
	/* Jump label: patched to a NOP in the common case. */
	if (static_key_false(&dmaengine_quiesce))
		chan = NULL;

	return chan;
}
EXPORT_SYMBOL(dma_find_channel);

struct dma_chan *dma_get_channel(struct dma_chan *chan)
{
	if (static_key_false(&dmaengine_quiesce))
		atomic_inc(&dmaengine_dirty);
	this_cpu_inc(dmaengine_chan_ref_count);

	return chan;
}
EXPORT_SYMBOL(dma_get_channel);

void dma_put_channel(struct dma_chan *chan)
{
	this_cpu_dec(dmaengine_chan_ref_count);
}
EXPORT_SYMBOL(dma_put_channel);
-----------------------------------------------------------------------------

The disassembled code is:
(gdb) disassemble dma_find_channel
Dump of assembler code for function dma_find_channel:
0x0000000000000000 <+0>: push %rbp
0x0000000000000001 <+1>: mov %rsp,%rbp
0x0000000000000004 <+4>: callq 0x9 <dma_find_channel+9>
0x0000000000000009 <+9>: mov %edi,%edi
0x000000000000000b <+11>: mov 0x0(,%rdi,8),%rax
0x0000000000000013 <+19>: mov %gs:(%rax),%rax
0x0000000000000017 <+23>: incq %gs:0x0 //overhead: this_cpu_inc(dmaengine_chan_ref_count)
0x0000000000000020 <+32>: jmpq 0x25 <dma_find_channel+37> //overhead: if (static_key_false(&dmaengine_quiesce)), will be patched to a NOP by jump label
0x0000000000000025 <+37>: pop %rbp
0x0000000000000026 <+38>: retq
0x0000000000000027 <+39>: nopw 0x0(%rax,%rax,1)
0x0000000000000030 <+48>: xor %eax,%eax
0x0000000000000032 <+50>: pop %rbp
0x0000000000000033 <+51>: retq
End of assembler dump.
(gdb) disassemble dma_put_channel // overhead: to decrease channel reference count, 6 instructions
Dump of assembler code for function dma_put_channel:
0x0000000000000070 <+0>: push %rbp
0x0000000000000071 <+1>: mov %rsp,%rbp
0x0000000000000074 <+4>: callq 0x79 <dma_put_channel+9>
0x0000000000000079 <+9>: decq %gs:0x0
0x0000000000000082 <+18>: pop %rbp
0x0000000000000083 <+19>: retq
End of assembler dump.
(gdb) disassemble dma_get_channel
Dump of assembler code for function dma_get_channel:
0x0000000000000040 <+0>: push %rbp
0x0000000000000041 <+1>: mov %rsp,%rbp
0x0000000000000044 <+4>: callq 0x49 <dma_get_channel+9>
0x0000000000000049 <+9>: mov %rdi,%rax
0x000000000000004c <+12>: jmpq 0x51 <dma_get_channel+17>
0x0000000000000051 <+17>: incq %gs:0x0
0x000000000000005a <+26>: pop %rbp
0x000000000000005b <+27>: retq
0x000000000000005c <+28>: nopl 0x0(%rax)
0x0000000000000060 <+32>: lock incl 0x0(%rip) # 0x67 <dma_get_channel+39>
0x0000000000000067 <+39>: jmp 0x51 <dma_get_channel+17>
End of assembler dump.

So for a typical dma_find_channel()/dma_put_channel() pair, the total overhead
is about 10 instructions and two per-CPU (local) memory updates, and there is
no shared cache pollution any more. Is this acceptable if the algorithm
works as expected? I will test the code tomorrow.
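
To illustrate why the hot path only touches local memory, here is a minimal
user-space sketch of the per-CPU counting idea. It is a model under stated
assumptions, not the kernel code: the array indexed by an explicit cpu argument
stands in for a real per-CPU variable, a plain bool stands in for the static
key, and all model_* names and NR_CPUS_MODEL are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one slot per CPU stands in for a per-CPU variable. */
#define NR_CPUS_MODEL 4

struct dma_chan { int id; };

static long chan_ref_count[NR_CPUS_MODEL]; /* models dmaengine_chan_ref_count */
static bool quiesce;                       /* models the dmaengine_quiesce key */

/* Hot path: only the caller's own slot is written. */
static struct dma_chan *model_find_channel(int cpu, struct dma_chan *chan)
{
	/* As in the posted code, the count is bumped before the quiesce check. */
	chan_ref_count[cpu]++;
	if (quiesce)
		return NULL;
	return chan;
}

/* Hot path: again a single local update. */
static void model_put_channel(int cpu)
{
	chan_ref_count[cpu]--;
}

/*
 * Slow path (assumed): a single slot can go negative when a reference is
 * dropped on a different CPU than it was taken on, so only the sum over
 * all slots is meaningful.
 */
static long model_total_refs(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += chan_ref_count[cpu];
	return sum;
}

/* Usage example: take a reference on CPU 1, drop it on CPU 2. */
static void model_example(void)
{
	struct dma_chan chan = { .id = 0 };
	struct dma_chan *c = model_find_channel(1, &chan);

	assert(c == &chan);
	assert(model_total_refs() == 1);
	model_put_channel(2);
	assert(chan_ref_count[1] == 1 && chan_ref_count[2] == -1);
	assert(model_total_refs() == 0);
}
```

The trade-off in this sketch is that reading the total is expensive (it walks
every slot), which only matters on the rare hotplug path.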

For typical systems which don't support DMA device hotplug, the overhead
could be removed completely by conditional compilation.
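
A rough sketch of what that conditional compilation could look like, written as
a user-space model. The option name MODEL_DMA_HOTPLUG and the hp_* helpers are
hypothetical; no config symbol has been chosen in this thread. With the option
off, the helpers are empty inline functions that the compiler can drop
entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build option; set to 1 to model a hotplug-capable build. */
#ifndef MODEL_DMA_HOTPLUG
#define MODEL_DMA_HOTPLUG 0
#endif

struct dma_chan; /* opaque here; only pointers are passed around */

static long hp_ref_count; /* stands in for the per-CPU reference count */

#if MODEL_DMA_HOTPLUG
/* Hotplug build: references are counted. */
static inline void hp_chan_get(struct dma_chan *chan)
{
	(void)chan;
	hp_ref_count++;
}

static inline void hp_chan_put(struct dma_chan *chan)
{
	(void)chan;
	hp_ref_count--;
}
#else
/* Non-hotplug build: empty stubs, so callers pay nothing. */
static inline void hp_chan_get(struct dma_chan *chan)
{
	(void)chan;
}

static inline void hp_chan_put(struct dma_chan *chan)
{
	(void)chan;
}
#endif

/* Usage example: callers look the same in both configurations. */
static void hp_example(void)
{
	hp_chan_get(NULL);
	assert(hp_ref_count == (MODEL_DMA_HOTPLUG ? 1 : 0));
	hp_chan_put(NULL);
	assert(hp_ref_count == 0);
}
```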

Any comments are welcome!

Thanks!
--gerry


On 04/24/2012 11:09 AM, Dan Williams wrote:
>>> If you are going to hotplug the entire IOH, then you are probably ok
