Re: [net-next rfc V2 7/8] macvtap: add TUNSETQUEUE ioctl
From: Jason Wang
Date: Tue Jun 04 2013 - 23:02:34 EST
On 06/04/2013 03:05 PM, Michael S. Tsirkin wrote:
> On Tue, Jun 04, 2013 at 01:54:56PM +0800, Jason Wang wrote:
>> On 06/03/2013 07:09 PM, Michael S. Tsirkin wrote:
>>> On Mon, Jun 03, 2013 at 01:20:58PM +0800, Jason Wang wrote:
>>>> On 06/02/2013 07:22 PM, Michael S. Tsirkin wrote:
>>>>> On Fri, May 31, 2013 at 05:53:24PM +0800, Jason Wang wrote:
>>>>>> This patch adds TUNSETQUEUE ioctl to let userspace can temporarily disable or
>>>>>> enable a queue of macvtap. This is used to be compatible at API layer of tuntap
>>>>>> to simplify the userspace to manage the queues.
>>>>>>
>>>>>> This is done by split the taps array into three different areas:
>>>>>>
>>>>>> - [0, numvtaps) : enabled taps
>>>>>> - [numvtaps, numvtaps + numdisabled) : disabled taps
>>>>>> - [numvtaps + numdisabled, MAX_MAXVTAP_QUEUES) : unused slots
>>>>>>
>>>>>> When a tap were enabled and disabled, it was moved to another area.
>>>>>>
>>>>>> Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx>
[...]
>>>>> +static int macvtap_disable_queue(struct macvtap_queue *q)
>>>>> +{
>>>>> + struct macvlan_dev *vlan;
>>>>> + int err = -EINVAL;
>>>>> +
>>>>> + spin_lock(&macvtap_lock);
>>>>> + vlan = rcu_dereference_protected(q->vlan,
>>>>> + lockdep_is_held(&macvtap_lock));
>>>>> +
>>>>> + if (vlan) {
>>>>> + int total = vlan->numvtaps + vlan->numdisabled;
>>>>> + int index = q->queue_index;
>>>>> +
>>>>> + BUG_ON(q->queue_index >= total);
>>>>> + if (q->queue_index >= vlan->numvtaps)
>>>>> + goto out;
>>>>> +
>>>>> + err = 0;
>>>>> + macvtap_swap_slot(vlan, index, total - 1);
>>>>> + if (vlan->numdisabled)
>>>>> + /* If there's disabled taps, the above swap will cause
>>>>> + * a disabled tap to be moved to enabled area. So
>>>>> + * another swap is needed to keep the right order.
>>>>> + */
>>>>> + macvtap_swap_slot(vlan, index, vlan->numvtaps - 1);
>>>>> +
>>>>> + /* make sure the pointers were seen before indices */
>>>>> + wmb();
>>>>> Hmm this looks questionable. The code near rmb first
>>>>> checks numvtaps then dereferences the queue.
>>>>> So it might see a new queue but old value of numvtaps.
>>>> Right, barriers here were just best effort to reduce the possibility of
>>>> wrong queue selection when changing the number of queues.
>>> If this is an optimization, I'd say benchmark it and
>>> see if it helps performance.
>>>
>>> I don't expect it to have any effect really.
>>> In fact, the re-ordering of queues that this patch does
>>> is likely to cause packet reorering and hurt performance
>>> more.
>> Yes, so I will remove the barriers.
>>
>> The re-ordering seems to be the easiest way to do fast lookup of active
>> queues. We could use indirection to avoid the re-ordering of queues,
>> it's hard to eliminate OOO packets. If we don't depends on changing the
>> number of queues frequently, we're ok.
>>> I'm guessing the only thing we need for correctness
>>> is ACCESS_ONCE on numvtaps?
>> Did't see how it help.
> For this loop:
> while (unlikely(rxq >= numvtaps))
> rxq -= numvtaps;
>
> you better read numvtaps with ACCESS_ONCE otherwise
> compiler can re-read numvtaps and it would
> change during loop execution.
> rxq can then become negative.
>
I get your meaning, looks like tun should be fixed as well.
Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/