RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous mode control

From: Skidmore, Donald C
Date: Fri Jan 23 2015 - 13:24:22 EST




> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@xxxxxxxxx]
> Sent: Thursday, January 22, 2015 11:18 PM
> To: Hiroshi Shimamoto; Skidmore, Donald C; Bjørn Mork
> Cc: e1000-devel@xxxxxxxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Choi, Sy
> Jong; linux-kernel@xxxxxxxxxxxxxxx; David Laight; Hayato Momma
> Subject: Re: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast promiscuous
> mode control
>
> On 01/22/2015 04:32 PM, Hiroshi Shimamoto wrote:
> >> Subject: RE: [E1000-devel] [PATCH 1/2] if_link: Add VF multicast
> >> promiscuous mode control
> >>> "Skidmore, Donald C" <donald.c.skidmore@xxxxxxxxx> writes:
> >>>
> >>>> My hang up is more related to: without the nob to enable it (off by
> >>>> default) we are letting one VF dictate policy for all the other VFs
> >>>> and the PF. If one VF needs to be in promiscuous multicast so is
> >>>> everyone else. Their stacks now needs to deal with all the extra
> >>>> multicast packets. As you point out this might not be a direct
> >>>> concern for isolation in that the VM could have 'chosen' to join
> >>>> any Multicast group and seen this traffic. My concern over
> >>>> isolation is one VF has chosen that all the other VM now have to
> >>>> see this multicast traffic.
> >>> Apologies if this question is stupid, but I just have to ask about
> >>> stuff I don't understand...
> >>>
> >>> Looking at the proposed implementation, the promiscous multicast
> >>> flag seems to be a per-VF flag:
> >>>
> >>> +int ixgbe_ndo_set_vf_mc_promisc(struct net_device *netdev, int vf,
> >>> +				    bool setting)
> >>> +{
> >>> +	struct ixgbe_adapter *adapter = netdev_priv(netdev);
> >>> +	struct ixgbe_hw *hw = &adapter->hw;
> >>> +	u32 vmolr;
> >>> +
> >>> +	if (vf >= adapter->num_vfs)
> >>> +		return -EINVAL;
> >>> +
> >>> +	adapter->vfinfo[vf].mc_promisc_enabled = setting;
> >>> +
> >>> +	vmolr = IXGBE_READ_REG(hw, IXGBE_VMOLR(vf));
> >>> +	if (setting) {
> >>> +		e_info(drv, "VF %u: enabling multicast promiscuous\n", vf);
> >>> +		vmolr |= IXGBE_VMOLR_MPE;
> >>> +	} else {
> >>> +		e_info(drv, "VF %u: disabling multicast promiscuous\n", vf);
> >>> +		vmolr &= ~IXGBE_VMOLR_MPE;
> >>> +	}
> >>> +
> >>> +	IXGBE_WRITE_REG(hw, IXGBE_VMOLR(vf), vmolr);
> >>> +
> >>> +	return 0;
> >>> +}
> >>> +
> >>>
> >>> I haven't read the data sheet, but I took a quick look at the
> >>> excellent high-level driver docs:
> >>> http://www.intel.com/content/dam/doc/design-guide/82599-sr-iov-driver-companion-guide.pdf
> >>>
> >>> It mentions "Multicast Promiscuous Enable" in its "Thoughts for
> >>> Customization" section:
> >>>
> >>> 7.1 Multicast Promiscuous Enable
> >>>
> >>> The controller has provisions to allow each VF to be put into
> >>> Multicast Promiscuous mode. The Intel reference driver does not
> >>> configure this option.
> >>>
> >>> The capability can be enabled/disabled by manipulating the MPE
> >>> field (bit 28) of the PF VM L2 Control Register (PFVML2FLT, 0x0F000)
> >>>
> >>> and showing a section from the data sheet describing the "PF VM L2
> >>> Control Register - PFVML2FLT[n] (0x0F000 + 4 * n, n=0...63; RW)"
> >>>
> >>> To me it looks like enabling Promiscuous Multicast for a VF won't
> >>> affect any other VF at all. Is this really not the case?
> >>>
> >>>
> >>>
> >>> Bjørn
> >> Clearly not a dumb question at all, and I'm glad you mentioned that.
> >> :) I was going off the assumption (it's been a while since I read the
> >> patch) that the patch was using FCTRL.MPE or MANC.MCST_PASS_L2,
> >> which would turn multicast promiscuous mode on for everyone. Since
> >> the patch is using PFVML2FLT.MPE, this lessens my concern about the
> >> effect on the entire system.
> > I believe the patches make this VF multicast promiscuous mode per-VF.
> >
> >> That said, I would still prefer having a way to override this
> >> behavior on the PF, although I admit my argument is weaker.
> >> I'm still concerned about a VF changing the behavior of the PF
> >> without any way to prevent it. This might be one part philosophical
> >> (the PF sets policy, not the VF), but it could still have a
> >> noticeable effect on the overall system. If any other VFs (or the
> >> PF) are receiving MC packets, these will have to be replicated,
> >> which will be a performance hit. When we use the MC hash this
> >> replication is limited, whereas when anyone is in MC promiscuous
> >> mode every MC packet used by another pool would be replicated. I
> >> could imagine that in some environments (i.e. public clouds) where
> >> you don't trust what is running in your VMs, you might want to
> >> block this from happening.
> > I understand your request, and I'm thinking of submitting patches to
> > 1) Add a new mbox API between ixgbe/ixgbevf to turn MC promiscuous
> >    mode on, enabling it when ixgbevf needs more than 30 MC addresses.
> > 2) Add a policy knob so the PF can prevent enabling it.
> >
> > Does it seem okay?
>
> I would advise that if such a knob is added, it should be disabled by
> default. The main problem with supporting that many multicast addresses
> per VF is that you run the risk of flooding the PCIe interface on the
> network adapter if too many adapters were to go into such a mode.
>

This was my understanding as well. Someone would have to "choose" to allow a VM to enter this mode, meaning it is off by default.
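To make the "off by default" point concrete, here is a minimal sketch of how a PF-side policy gate might work. This is a standalone model, not the actual ixgbe code: the names (vf_info, mc_promisc_allowed, vf_request_mc_promisc) and the shadow-register approach are hypothetical; only the MPE bit position follows the PFVML2FLT description quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define VMOLR_MPE (1u << 28) /* per-VF multicast promiscuous enable bit */

struct vf_info {
	bool mc_promisc_allowed; /* PF policy knob; defaults to false */
	uint32_t vmolr;          /* shadow of the per-VF L2 filter register */
};

/* Called when a VF asks (e.g. via mailbox) to enter MC promiscuous mode.
 * Returns 0 on success, -1 if PF policy forbids the request. */
static int vf_request_mc_promisc(struct vf_info *vf, bool enable)
{
	if (enable && !vf->mc_promisc_allowed)
		return -1; /* PF has not opted in; reject the request */

	if (enable)
		vf->vmolr |= VMOLR_MPE;   /* set MC promiscuous for this VF */
	else
		vf->vmolr &= ~VMOLR_MPE;  /* clear it */
	return 0;
}
```

The key property is that the zero-initialized default rejects the VF's request, so the host administrator has to opt in explicitly before any VF can widen its multicast filtering.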

> > BTW, I'm a bit worried about using an ndo interface for 2), because
> > adding a new hook makes the core code more complicated.
> > Is it really reasonable to do it with an ndo?
> > I haven't found any other suitable method so far, and using an ndo
> > VF hook looks like the standard way to control VF functionality.
> > So I think an ndo hook is the best way to implement this policy.
>
> The ndo is probably the only way to go on this. It is how past controls for the
> VF network functionality have been implemented.
>
> >> In some ways it is almost the mirror image of the issue you brought up:
> >>
> >> Adding a new hook for this seems over-complicated to me. And it
> >> still doesn't solve the real problems that
> >> a) the user has to know about this limit, and
> >> b) has to manually configure the feature
> >>
> >> My reverse argument might be that if this happens automatically, it
> >> might take the VM provider a long time to realize performance has
> >> taken a hit because some VM asked to join 31 multicast groups and
> >> entered MC promiscuous mode, only to find that they have no way to
> >> block such behavior.
> >>
> >> Maybe I wouldn't be as concerned if the patch author could provide
> >> some performance results to show this won't have as negative an
> >> effect as I'm afraid it might.
> > Hm, what kind of test case would you like to have?
> > User A who has 30 MC addresses vs. user B who has 31 MC addresses
> > (which means MC promiscuous mode), and incoming MC packets the user
> > doesn't expect?
> >
> > thanks,
> > Hiroshi
>
> The question I would have is how many of these interfaces you expect to
> have supporting the expanded multicast mode. As it currently stands,
> multicast traffic has the potential to flood the adapter, greatly reduce
> the overall throughput, and add extra workload to the PF and all VFs.
> For example, if several VFs enable this feature and then someone on the
> network sends a stream of multicast traffic, what happens to the CPU
> load on the host system?
>
> Also, how many addresses beyond 30 do you require? An alternative to
> adding multicast promiscuous mode might be to extend the mailbox
> API to support sending more than 30 addresses via something such as a
> multi-part multicast configuration message. The fm10k driver already has
> similar logic, as it adds addresses one at a time through a separate
> Switch interface API between the PF and the Switch. You might be able to
> reuse some of that code to get beyond the 30-address limit.

I think this would be a much safer approach and would probably scale well for small sets. However, I don't think it would work for Hiroshi's use case; if I remember correctly he wanted thousands of MC groups per VM. I imagine there would be considerable overhead in loading up our 4K hash table one address at a time, and with that many addresses it would be pretty much the same as being in multicast promiscuous mode anyway. Still, this might be an interesting approach to support those needing only a few MC groups over the limit.
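For reference, the multi-part mailbox idea Alex describes might look roughly like the following sketch, which splits a large multicast list into messages of at most 30 entries each. The 30-entry limit mirrors the existing ixgbevf mailbox convention; the message layout and all names here (mc_msg, mc_list_to_msgs, the "more" flag) are hypothetical, not an actual API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MBOX_MC_MAX 30 /* addresses per mailbox message */

struct mc_msg {
	uint16_t count;               /* entries in this part */
	uint16_t more;                /* nonzero if another part follows */
	uint8_t addr[MBOX_MC_MAX][6]; /* 48-bit multicast addresses */
};

/* Fill msgs[] from a list of n addresses; returns the number of parts
 * used, or 0 if max_msgs is too small to hold them all. */
static size_t mc_list_to_msgs(const uint8_t (*list)[6], size_t n,
			      struct mc_msg *msgs, size_t max_msgs)
{
	size_t parts = (n + MBOX_MC_MAX - 1) / MBOX_MC_MAX;

	if (parts > max_msgs)
		return 0;

	for (size_t p = 0; p < parts; p++) {
		size_t base = p * MBOX_MC_MAX;
		size_t cnt = n - base < MBOX_MC_MAX ? n - base : MBOX_MC_MAX;

		msgs[p].count = (uint16_t)cnt;
		msgs[p].more = (p + 1 < parts); /* chain to the next part */
		memcpy(msgs[p].addr, list + base, cnt * 6);
	}
	return parts;
}
```

As the paragraph above notes, this scales fine for a few addresses over the limit, but at thousands of groups the per-message overhead adds up, which is why it does not obviously help the original use case.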

Alex's point is worth reiterating: this will affect performance. Have you tested this out under load and still seen results you can live with?
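On the 4K hash table mentioned above: a rough standalone model of a 4096-bit multicast table array, indexed by a 12-bit hash of the address, is sketched below. The exact bit extraction in the hardware depends on the configured filter type; the choice here (the top 12 bits of the address) is illustrative only, as are the names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTA_BITS 4096

struct mta {
	uint32_t table[MTA_BITS / 32]; /* 128 x 32-bit words = 4096 bits */
};

/* 12-bit hash: take 12 bits from the top of the 48-bit address.
 * Real hardware selects the bit window via a filter-type setting. */
static unsigned int mta_hash(const uint8_t addr[6])
{
	return ((unsigned int)addr[0] << 4) | (addr[1] >> 4);
}

/* Set the filter bit for one multicast address. */
static void mta_set(struct mta *m, const uint8_t addr[6])
{
	unsigned int h = mta_hash(addr);

	m->table[h / 32] |= 1u << (h % 32);
}

/* Check whether an incoming address passes the filter. */
static bool mta_test(const struct mta *m, const uint8_t addr[6])
{
	unsigned int h = mta_hash(addr);

	return m->table[h / 32] & (1u << (h % 32));
}
```

This also illustrates the point in the paragraph above: with thousands of groups, so many of the 4096 bits end up set that the filter passes nearly everything, which is not far from multicast promiscuous mode in practice.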

-Don Skidmore <donald.c.skidmore@xxxxxxxxx>

>
> - Alex