Re: regression in 4.14-rc2 caused by apparmor: add base infastructure for socket mediation

From: Thorsten Leemhuis
Date: Thu Oct 26 2017 - 05:12:10 EST


On 24.10.2017 13:31, John Johansen wrote:
> On 10/23/2017 11:39 PM, Thorsten Leemhuis wrote:
>> Lo, your friendly regression tracker here!
>> On 03.10.2017 09:17, John Johansen wrote:
>>> On 10/02/2017 11:48 PM, Vlastimil Babka wrote:
>>>> On 10/03/2017 07:15 AM, James Bottomley wrote:
>>>>> On Mon, 2017-10-02 at 21:11 -0700, John Johansen wrote:
>>>>>> On 10/02/2017 09:02 PM, James Bottomley wrote:
>>>>>>>
>>>>>>> The specific problem is that dnsmasq refuses to start on openSUSE
>>>>>>> Leap 42.2. The specific cause is that an attempt to open a
>>>>>>> PF_LOCAL socket gets EACCES. This means that networking doesn't
>>>>>>> function on a system with a 4.14-rc2 kernel.
>>>>>>> Reverting commit 651e28c5537abb39076d3949fb7618536f1d242e
>>>>>>> (apparmor: add base infastructure for socket mediation) causes the
>>>>>>> system to function again.
>>>>>> This is not a kernel regression,
>>>>> Regression means something that worked in a previous version of the
>>>>> kernel which is broken now. This problem falls within that definition.
>>>> Hm, but if this was because opensuse kernel and apparmor rules relied on
>>>> an out-of-tree patch, then it's not an upstream regression?
>>> While it's true that previous opensuse kernels were relying on an out
>>> of tree patch for doing mediation in this area, the real issue is the
>>> configuration of the userspace on the system is set up to enforce new
>>> policy features advertised by the kernel, regardless of whether policy
>>> has been updated to deal with it.
>> Did anything come out of this discussion? I checked LKML and recent
>> commits, but missed if anything happened. But it seems this problem
>> annoys quite a few people on various distros. It turned out one of
>> the regressions in my last regression report seemed to be due to the
>> changes in apparmor. See:
>
> yes, there has been testing and discussions, and a regression was
> found, just not the "regression" you are encountering. A fix for that
> regression is in testing and I will send a pull request for it soon.

Just out of curiosity: any pointer to the discussion or the fix?

>> https://bugzilla.kernel.org/show_bug.cgi?id=197137#7
> yes, this is the same issue you have encountered

FWIW: I didn't encounter any of this, I'm just doing regression tracking
and have now hit the point where I need to escalate the issue to Linus...

>> That commit links to two bugs filed for Debian and Ubuntu:
>> https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1724450
> this is actually a different issue. Ubuntu hasn't SRUed the most
> recent maintenance releases or even just cherry-picked a specific
> patch into their userspace packaging.
>
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=877581
> this is largely the same issue as ubuntu.

Well, afaics it boils down to this: Things stop working on two (or even
three?) mainstream distros if their users update to 4.14 without updating
their userland. Is that the case even after the fix you mentioned gets
merged? Then this definitely is a regression.
>> The stuff even made the news:
>> https://www.phoronix.com/scan.php?page=news_item&px=AppArmor-Linux-4.14
>> It's obviously Linus to decide in the end, but from my understanding of
>> the whole "no regressions" rule this looks quite a lot like a regression
>> to me.
> I understand your pov, it's breaking you so it is a regression. However
> this is not a regression in the kernel nor the apparmor interfaces
> between userspace and the kernel. It is a userspace configuration
> issue.
>
> It is a userspace configuration issue. Your userspace is set up to
> basically do policy development. Atm this is the default configuration
> that all distros are using, however the debian maintainer is planning
> to use feature abi pinning for stable releases.
>
> However if you are doing things like using kernels that run ahead of
> the distro's apparmor policy, that also means you need to either do
> some policy revision, pin the feature abi (userspace configuration),
> or disable apparmor.

All that afaics doesn't matter. If a new kernel breaks things for people
(that especially includes people that do *not* update their userland)
then it's a kernel regression, even if the root of the problem is in
userland. Linus (CCed) said that often enough (I really should sit down
and collect his mails on this from the web and put them in one
document). He for example recently said in
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004746.html
that people should "feel safe in always upgrading to any higher
version". And that's not the case afaics -- or am I missing something?
See also this discussion, where the problem was quite similar iirc:
https://lkml.org/lkml/2012/12/23/75 "If a change results in user
programs breaking, it's a bug in the kernel. We never EVER blame the
user programs. How hard can this be to understand? [â]"

Ciao, Thorsten

>>> Distros should be pinning the feature set supported because as you
>>> note below, policy will not get updated for unsupported kernels and you
>>> will end up in an unsupported state where regressions like this can
>>> happen.
>>>
>>> There are reasons why distros don't, largely because certain packages
>>> would like to take advantage of new features, or only want to support
>>> a single policy version across multiple releases and are relying on
>>> the userspace tools to properly compile the policy to different
>>> kernels.
>>>
>>> The current pinning support doesn't allow for mixing policy versions
>>> which can make supporting updated packages difficult atm, but there is
>>> work (that hasn't landed yet) to allow for policy of different versions
>>> by putting the requirements within the individual profiles and will
>>> completely avoid the problems encountered here.
>>>
>>>
>>>>>> it is because opensuse dnsmasq is starting with policy that
>>>>>> doesn't allow access to PF_LOCAL socket
>>>>>
>>>>> Because there was no co-ordination between their version of the patch
>>>>> and yours. If you're sending in patches that you know might break
>>>>> systems because they need a co-ordinated rollout of something in
>>>>> userspace then it would be nice if you could co-ordinate it ...
>>>>>
>>>>> Doing it in the merge window and not in -rc2 would also be helpful
>>>>> because I have more expectation of a userspace mismatch from stuff in
>>>>> the merge window.
>>>>
>>>> Agree, but with rc2 there's still plenty of time, and running rcX means
>>>> some issues can be expected...
>>>>
>>>>>> Christian Boltz the opensuse apparmor maintainer has been working
>>>>>> on a policy update for opensuse see bug
>>>>>>
>>>>>> https://bugzilla.opensuse.org/show_bug.cgi?id=1061195
>>>>>
>>>>> Well, that looks really encouraging: The line about "To give you an
>>>>> impression what "lots of" means - I had to adjust 40 profiles on my
>>>>> laptop". The upshot being apart from a bandaid, openSUSE still has no
>>>>> co-ordinated fix for this.
>>>>
>>>> Note that the openSUSE Leap 42.2 kernel is 4.4, so running 4.14 means
>>>> you are unsupported from the distro POV and you can't expect that the
>>>> 42.2 apparmor profiles will ever be updated. I reported the bug above
>>>> for the Tumbleweed rolling distro, which gets new kernels after the
>>>> final version is released and passes QA. rcX kernels are packaged for
>>>> testing, but you have to add the repo explicitly. So there's still
>>>> enough time to co-ordinate a fix of profiles and final 4.14 even for
>>>> Tumbleweed.
>>>>> James