Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu

From: Tim Chen
Date: Mon Jul 14 2014 - 15:51:10 EST

Next message: Borislav Petkov: "Re: [PATCH] hwmon, k10temp: Add support for AMD F15h M60h processor"
Previous message: Sander Eikelenboom: "Re: [Xen-devel] [PATCH v4] PCI back fixes for 3.17."
In reply to: Peter Zijlstra: "Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu"
Next in thread: Peter Zijlstra: "Re: [PATCH v4 6/7] sched: add function nr_running_cpu to expose number of tasks running on cpu"

On Mon, 2014-07-14 at 21:15 +0200, Peter Zijlstra wrote:
> On Mon, Jul 14, 2014 at 12:08:28PM -0700, Tim Chen wrote:
> > On Mon, 2014-07-14 at 20:17 +0200, Peter Zijlstra wrote:
>
> > > Your multi-buffer thing isn't generic either, it seems limited to sha1.
> >
> > We actually have many other multi-buffer crypto algorithms already
> > published for encryption and other IPSec usages. So
> > multi-buffer algorithm is not just limited to SHA1.
> > We hope to port those to the kernel crypto library eventually.
> > http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-multi-buffer-ipsec-implementations-ia-processors-paper.pdf
>
> That's all nice and such; but the code as I've seen in these patches is
> very much sha1 specific. The mb part isn't separated out.

There is a generic multi-buffer infrastructure portion that manages
pulling and queuing jobs on the crypto workqueue; it is separated out
in patch 1 of the patchset. The other portions are algorithm specific:
they define the algorithm-specific data structures and do the crypto
computation for a particular algorithm, mostly in assembly and C glue
code. The infrastructure code is meant to be reused for other similar
multi-buffer algorithms.
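
To illustrate the split described above, here is a minimal userspace
sketch of a generic job queue driving algorithm-specific hooks. It is
not the code from the patchset: struct mb_alg_ops, struct mb_queue,
mb_submit(), mb_flush() and the toy algorithm are hypothetical names
made up for this example, and the "computation" only counts jobs.

```c
#include <stddef.h>

#define MB_MAX_LANES 8  /* illustrative upper bound on SIMD lanes */

/* Hypothetical algorithm-specific hooks (the part that would be
 * assembly plus C glue for a real algorithm). */
struct mb_alg_ops {
	size_t lanes;                               /* jobs per SIMD pass */
	void (*compute)(void **jobs, size_t njobs); /* process a batch */
};

/* Hypothetical generic part: only queues jobs and decides when to
 * hand a batch to the algorithm. It knows nothing about sha1. */
struct mb_queue {
	const struct mb_alg_ops *ops;
	void *pending[MB_MAX_LANES];
	size_t npending;
	size_t passes;                              /* batches computed so far */
};

static void mb_queue_init(struct mb_queue *q, const struct mb_alg_ops *ops)
{
	q->ops = ops;
	q->npending = 0;
	q->passes = 0;
}

/* Compute whatever is queued, even a partial batch.
 * Returns the number of jobs processed. */
static size_t mb_flush(struct mb_queue *q)
{
	size_t n = q->npending;

	if (n == 0)
		return 0;
	q->ops->compute(q->pending, n);
	q->npending = 0;
	q->passes++;
	return n;
}

/* Queue one job; flush automatically once every lane is occupied.
 * Returns the number of jobs processed by this call (0 if none). */
static size_t mb_submit(struct mb_queue *q, void *job)
{
	size_t lanes = q->ops->lanes;

	if (lanes == 0 || lanes > MB_MAX_LANES)
		lanes = MB_MAX_LANES;               /* clamp bad lane counts */
	q->pending[q->npending++] = job;
	if (q->npending >= lanes)
		return mb_flush(q);
	return 0;
}

/* Toy "algorithm" for the example: it only counts the jobs it sees. */
static size_t toy_jobs_done;

static void toy_compute(void **jobs, size_t njobs)
{
	(void)jobs;
	toy_jobs_done += njobs;
}

static const struct mb_alg_ops toy_ops = {
	.lanes = 4,
	.compute = toy_compute,
};
```

Another algorithm would supply its own mb_alg_ops instance and reuse
the queueing code unchanged, which is the kind of reuse intended for
the infrastructure portion.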

>
> > > It does not reuse padata,
> > padata tries to speed things up by parallelizing jobs to *multiple*
> > cpus. Whereas multi-buffer tries to speed things up by using
> > multiple data lanes in a SIMD register on a *single* cpu.
> > These two usages are complementary but not the same.
>
> And if it's single cpu, wth do you need that nr_running thing for another
> cpu for?

We use nr_running_cpu to check whether there are other tasks running on
the *current* cpu (not on another cpu), to decide whether we should flush
and compute the crypto jobs accumulated so far. If nobody else is running,
we can take advantage of the available cycles on the cpu we are running
on to do the computation on the existing jobs in a SIMD manner.
Waiting a bit longer may accumulate more jobs to process in parallel
in a single SIMD instruction, but adds more delay.
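
The flush decision described above can be sketched roughly as follows.
This is a userspace illustration, not the patch itself: the
nr_running_cpu() below is a simulated stand-in with an assumed
signature, mb_should_flush() and the lane count are hypothetical, and
the sketch assumes the task doing the check is itself included in the
run count.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPUS 4

/* Simulated per-cpu run counts; in the kernel this information would
 * come from the scheduler, not from a plain array. */
static unsigned long sim_nr_running[SIM_NR_CPUS];

/* Stand-in for the function the patch adds. The signature is an
 * assumption made for this example. */
static unsigned long nr_running_cpu(int cpu)
{
	return sim_nr_running[cpu];
}

/* Hypothetical helper: should the partially filled batch on this cpu
 * be computed now, or should we keep waiting for more jobs? */
static bool mb_should_flush(int this_cpu, size_t npending, size_t lanes)
{
	if (npending == 0)
		return false;   /* nothing queued */
	if (npending >= lanes)
		return true;    /* every SIMD lane is filled, no reason to wait */
	/*
	 * Assumption: the caller counts as one running task, so a count
	 * of at most 1 means nobody else wants this cpu and the spare
	 * cycles can be used on the jobs already queued.
	 */
	return nr_running_cpu(this_cpu) <= 1;
}
```

With other tasks runnable on the cpu, the helper keeps accumulating
jobs, trading some latency for a fuller SIMD batch; with the cpu
otherwise idle, it computes right away.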

>
> Also, this difference wasn't clear to me.
>
> I still loathe all the async work, because it makes a mockery of
> accounting etc.. but that's a story for another day I suppose :-(
>
>

Thanks.

Tim

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/