Re: [GIT PULL] x86/mm changes for v4.4
From: Andy Lutomirski
Date: Fri Nov 06 2015 - 02:05:59 EST
On Thu, Nov 5, 2015 at 10:55 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>> On Wed, Nov 4, 2015 at 6:17 PM, Dave Jones <davej@xxxxxxxxxxxxxxxxx> wrote:
>> > On Wed, Nov 04, 2015 at 05:31:59PM -0800, Linus Torvalds wrote:
>> > >
>> > > I don't have that later debug output at all. Presumably some config difference.
>> >
>> > CONFIG_X86_PTDUMP_CORE iirc.
>>
>> No, I have that. I suspect CONFIG_EFI_PGT_DUMP instead.
>>
>> Anyway, as it stands now, I think the CONFIG_DEBUG_WX option should
>> not default to 'y' unless it is made more useful if it actually
>> triggers. Ingo?
>
> Yeah, agreed absolutely.
>
> So this is a bit sad because RWX pages are a real problem in practice, especially
> since the EFI addresses are well predictable, but generating a warning without
> being able to fix it quickly is counterproductive as well, as it only annoys
> people and makes them turn off the option. (Which we could do as well to begin
> with, without the annoyance factor...)
>
> So the plan would be:
>
> 1) Make it default-n.
>
> 2) We should try to further improve the messages to make it easier to determine
> what's wrong. We _do_ try to output symbolic information in the warning, to
> make it easier to find buggy mappings, but these are not standard kernel
> mappings. So I think we need an e820 mappings based semi-symbolic printout of
> bad addresses - maybe even correlate it with the MMIO resource tree.
>
> 3) We should fix the EFI permission problem without relying on the firmware: it
> appears we could just mark everything R-X optimistically, and if a write fault
> happens (it's pretty rare in fact, only triggers when we write to an EFI
> variable and so), we can mark the faulting page RW- on the fly, because it
> appears that writable EFI sections, while not enumerated very well in 'old'
> firmware, are still supposed to be page granular. (Even 'new' firmware I
> wouldn't automatically trust to get the enumeration right...)

I think it was Borislav who pointed out that this idea, which might
have been mine, is a bit silly. Why not just skip mapping the EFI
stuff in the init_pgd entirely and only map it in the EFI pgd?

We'll have RWX stuff in the EFI pgd, but so what? If we're exposing
anything that runs with the EFI pgd loaded to untrusted input, I think
we've already lost.

Admittedly, we might need to use a certain amount of care to avoid
interesting conflicts with the vmap mechanism. We might need to vmap
all of the EFI stuff, and possibly even all the top-level entries that
contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*)
as a blank not-present region to avoid overlaps, but that's not a big
deal.
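
To make the idea concrete, here is a toy userspace model, not kernel
code. It assumes a "page table" is just a flat array of per-page
permission bits. Every name in it (toy_pgd, toy_map, toy_count_wx,
toy_setup, the TOY_* bits, the page numbers) is made up for
illustration and does not correspond to any real kernel interface. It
only shows that a W+X scan of the init table finds nothing when the
EFI range is left as a not-present hole there, while the RWX pages
live solely in the separate EFI table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy permission bits; NOT the real x86 PTE encoding. */
enum { TOY_R = 1u << 0, TOY_W = 1u << 1, TOY_X = 1u << 2 };

#define TOY_NPAGES 16

/* Hypothetical stand-in for a page table: one permission byte per page. */
struct toy_pgd {
    uint8_t perm[TOY_NPAGES];   /* 0 means not present */
};

/* Count pages that are both writable and executable in this table. */
static size_t toy_count_wx(const struct toy_pgd *pgd)
{
    size_t n = 0;

    for (size_t i = 0; i < TOY_NPAGES; i++)
        if ((pgd->perm[i] & TOY_W) && (pgd->perm[i] & TOY_X))
            n++;
    return n;
}

/* Map pages [start, start + len); returns false if the range does not fit. */
static bool toy_map(struct toy_pgd *pgd, size_t start, size_t len, uint8_t perm)
{
    if (start > TOY_NPAGES || len > TOY_NPAGES - start)
        return false;
    for (size_t i = start; i < start + len; i++)
        pgd->perm[i] = perm;
    return true;
}

/* Kernel pages go in both tables; the EFI range goes only in efi_pgd. */
static void toy_setup(struct toy_pgd *init_pgd, struct toy_pgd *efi_pgd)
{
    *init_pgd = (struct toy_pgd){ {0} };
    toy_map(init_pgd, 0, 4, TOY_R | TOY_X);         /* "kernel text" */
    toy_map(init_pgd, 4, 4, TOY_R | TOY_W);         /* "kernel data" */
    /* Pages 8..11 stay not-present in init_pgd: the reserved hole. */

    *efi_pgd = *init_pgd;                           /* share kernel mappings */
    toy_map(efi_pgd, 8, 4, TOY_R | TOY_W | TOY_X);  /* "EFI runtime", RWX */
}
```

After toy_setup(), toy_count_wx() reports no W+X pages for the init
table and four for the EFI table, and the EFI page range stays
not-present in the init table.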
>
> If that 'supposed to be' turns out to be 'not true' (not unheard of in
> firmware land), then plan B would be to mark pages that generate write faults
> RWX as well, to not break functionality. (This 'mark it RWX' is not something
> that exploits would have easy access to, and we could also generate a warning
> [after the EFI call has finished] if it ever triggers.)
>
> Admittedly this approach might not be without its own complications, but it
> looks reasonably simple (I don't think we need per EFI call page tables,
> etc.), and does not assume much about the firmware being able to enumerate its
> permissions properly. Were we to merge EFI support today I'd have insisted on
> trying such an approach from day 1 on.

I think we have separate EFI page tables already for other reasons. I
could be wrong -- I've never really understood the EFI mapping layout
very well.

--Andy