Re: [RFC PATCH V3 0/5] Hi:

From: Jason Wang
Date: Tue Jan 08 2019 - 06:42:57 EST



On 2019/1/7 äå10:11, Michael S. Tsirkin wrote:
On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote:
On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote:
On 2019/1/7 äå11:28, Michael S. Tsirkin wrote:
On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote:
On 2019/1/3 äå4:47, Michael S. Tsirkin wrote:
On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
This series tries to access virtqueue metadata through kernel virtual
address instead of copy_user() friends since they had too much
overheads like checks, spec barriers or even hardware feature
toggling.
Will review, thanks!
One questions that comes to mind is whether it's all about bypassing
stac/clac. Could you please include a performance comparison with
nosmap?

On machine without SMAP (Sandy Bridge):

Before: 4.8Mpps

After: 5.2Mpps
OK so would you say it's really unsafe versus safe accesses?
Or would you say it's just a better written code?

It's the effect of removing speculation barrier.

You mean __uaccess_begin_nospec introduced by
commit 304ec1b050310548db33063e567123fae8fd0301
?

So fundamentally we do access_ok checks when supplying
the memory table to the kernel thread, and we should
do the spec barrier there.

Then we can just create and use a variant of uaccess macros that does
not include the barrier?

Or, how about moving the barrier into access_ok?
This way repeated accesses with a single access_ok get a bit faster.
CC Dan Williams on this idea.
It would be interesting to see how expensive re-doing the address
limit check is compared to the speculation barrier. I.e. just switch
vhost_get_user() to use get_user() rather than __get_user(). That will
sanitize the pointer in the speculative path without a barrier.
Hmm it's way cheaper even though IIRC it's measureable.
Jason, would you like to try?


0.5% regression after using get_user()/put_user()/...


Although frankly __get_user being slower than get_user feels very wrong.
Not yet sure what to do exactly but would you agree?


I recall we had a convert access_ok() discussion with this result here:

https://lkml.org/lkml/2018/1/17/929
Sorry let me try to clarify. IIUC speculating access_ok once
is harmless. As Linus said the problem is with "_subsequent_
accesses that can then be used to perturb the cache".

Thus:

1. if (!access_ok)
2. return
3. get_user
4. if (!access_ok)
5. return
6. get_user

Your proposal that Linus nacked was to effectively add a barrier after
lines 2 and 5 (also using the array_index_nospec trick for speed),
right? Unfortunately that needs a big API change.

I am asking about adding barrier_nospec within access_ok.
Thus effectively before lines 1 and 4.
access_ok will be slower but after all the point of access_ok is
to then access the same memory multiple times.
So we should be making __get_user faster and access_ok slower ...


And the barrier_nospec() was completely necessary if you want to do write instead read.


Thanks



...but it sounds like you are proposing a smaller scope fixup for the
vhost use case? Something like barrier_nospec() in the success path
for all vhost access_ok() checks and then a get_user() variant that
disables the barrier.
Maybe we'll have to. Except I hope vhost won't end up being the
only user otherwise it will be hard to maintain.