Re: Intermittent performance regression related to ipset between 5.10 and 5.15
From: Jakub Kicinski
Date: Fri Jul 29 2022 - 21:41:51 EST
On Fri, 29 Jul 2022 20:21:17 +0000 U'ren, Aaron wrote:
> Jozef / Jakub / Thorsten-
>
> Thanks for all of your help with this issue. I think that we can close this out now.
>
> After continuing to dig into this problem some more, I eventually figured out that the problem was caused because of how our userspace tooling was interacting with ipset save / restore and the new (ish) initval option that is included in saves / restores.
>
> Specifically, kube-router runs an ipset save then processes the saved ipset data, messages it a bit based upon the state from the Kubernetes cluster, and then runs that data back through ipset restore. During this time, we create unique temporary sets based upon unique sets of options and then rotate in the new endpoints into the temporary set and then use swap instructions in order to minimize impact to the data path.
>
> However, because we were only messaging options that were recognized and important to us, initval was left alone and blindly copied into our option strings for new and temporary sets. This caused initval to be used incorrectly (i.e. the same initval ID was used for multiple sets). I'm not 100% sure about all of the consequences of this, but it seems to have objectively caused some performance issues.
>
> Additionally, since initval is intentionally unique between sets, this caused us to create many more temporary sets for swapping than was actually necessary. This caused obvious performance issues as restores now contained more instructions than they needed to.
>
> Reverting the commit removed the issue we saw because it removed the portion of the kernel that generated the initvals which caused ipset save to revert to its previous (5.10 and below) functionality. Additionally, applying your patches also had the same impact because while I believed I was updating our userspace ipset tools in tandem, I found that the headers were actually being copied in from an alternate location and were still using the vanilla headers. This meant that while the kernel was generating initval values, the userspace actually recognized it as IPSET_ATTR_GC values which were then unused.
>
> This was a very long process to come to such a simple recognition about the ipset save / restore format having been changed. I apologize for the noise.
Thanks for working it out and explaining the root cause :)
I'm probably going to get the syntax wrong, but here goes nothing:
#regzbot invalid: user space mis-configuration