Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem

From: Alexei Starovoitov
Date: Wed Jan 13 2016 - 20:20:04 EST

On Wed, Jan 13, 2016 at 10:56:38PM +0800, Ming Lei wrote:
> On Wed, Jan 13, 2016 at 1:30 PM, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> > On Wed, Jan 13, 2016 at 11:17:23AM +0800, Ming Lei wrote:
> >> On Wed, Jan 13, 2016 at 10:22 AM, Martin KaFai Lau <kafai@xxxxxx> wrote:
> >> > On Wed, Jan 13, 2016 at 08:38:18AM +0800, Ming Lei wrote:
> >> >> > The userspace usually only aggregates value across all cpu every X seconds.
> >> >>
> >> >> That is just in your case, and Alexei worried the issue of data stale.
> >> > I believe we are talking about validity of a value. How to
> >> > make use of a less-stale but invalid data?
> >>
> >> About the 'invalidity' thing, it should be same between using
> >> smp_call(run in IPI irq handler) and simple memcpy().
> >>
> >> When smp_call_function_single() is used to request to lookup element in
> >> the specific CPU, the value of the element may be in updating in that CPU
> >> and not completed yet in eBPF prog, then IPI comes and half updated
> >> data is still returned to syscall.
> >
> > hmm. I'm not following. bpf programs are executing with preempt disabled,
> > so smp_call_function_single suppose to execute when bpf is not running.
> Preempt disabled doesn't mean irq disabled, does it? So when bpf prog is
> running, the IPI irq for smp_call still may come on that CPU.

In the case of kprobes, irqs are disabled, but yeah, for sockets smp_call won't help.
We can probably use schedule_work_on(), but that's too heavy.
I guess we need a bpf_map_lookup_and_delete_elem() syscall command, so we can
delete a single pointer out of the per-cpu hash map and in call_rcu() copy precise
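
To make the lookup_and_delete idea a bit more concrete, here is a rough
userspace sketch in C11. It is not kernel code. All names in it (pcpu_value,
map_slot, prog_update, lookup_and_delete, copy_after_grace_period,
NR_CPUS_SKETCH) are made up for illustration, and the grace-period wait that
call_rcu() would provide in the kernel is assumed, not implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for this sketch */

/* Hypothetical per-cpu counters attached to one hash map element. */
struct pcpu_value {
	uint64_t packets[NR_CPUS_SKETCH];
	uint64_t bytes[NR_CPUS_SKETCH];
};

/* Hypothetical map slot: programs reach the value through this pointer. */
struct map_slot {
	_Atomic(struct pcpu_value *) val;
};

/*
 * Program side: bump this CPU's counters if the element is still linked.
 * Returns -1 once the element has been removed from the slot.
 */
static int prog_update(struct map_slot *slot, int cpu, uint64_t len)
{
	struct pcpu_value *v;

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	v = atomic_load_explicit(&slot->val, memory_order_acquire);
	if (!v)
		return -1;
	v->packets[cpu] += 1;
	v->bytes[cpu] += len;
	return 0;
}

/*
 * Syscall side, step 1: unlink the value with a single pointer swap,
 * so that no new updater can find it.
 */
static struct pcpu_value *lookup_and_delete(struct map_slot *slot)
{
	return atomic_exchange_explicit(&slot->val, NULL,
					memory_order_acq_rel);
}

/*
 * Syscall side, step 2: copy the detached value. This is only safe once
 * every updater that fetched the pointer before the swap has finished,
 * which is the job call_rcu() would do in the kernel. The sketch assumes
 * that wait has already happened.
 */
static void copy_after_grace_period(const struct pcpu_value *v,
				    struct pcpu_value *out)
{
	memcpy(out, v, sizeof(*out));
}
```

The point of the two steps is that the copy never runs concurrently with an
update: after the swap new updaters see NULL, and the old ones are waited for
before the memcpy().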

> Also in current non-percpu hash, the situation exists too between
> lookup elem syscall and updating value of element from bpf prog in
> SMP.

Looks like the regular bpf_map_lookup_elem() syscall will return inaccurate data
even for per-cpu hash. Hmm. We need to brainstorm more on it.