Re: + elf-loader-crash-while-zero-filling-bss.patch added to -mm tree
From: Abel Bernabeu
Date: Thu Feb 14 2008 - 05:04:10 EST
2008/2/13, Daniel Jacobowitz <dan@xxxxxxxxxx>:
> On Wed, Feb 13, 2008 at 12:15:06AM -0800, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> > Subject: Elf loader crash while zero-filling .bss
> > From: "Abel Bernabeu" <abelbg@xxxxxxxxx>
> >
> > I've finally found a solution for the crash in load_binary_elf I
> > reported last week:
> >
> > http://lkml.org/lkml/2008/1/30/171
> >
> > The attached patch solves my problem.
> >
> > set_brk(start, end) allocs just page aligned regions (by "collapsing" both
> > extremes to the start of the page in which they lay)... That means than
> > even if both pointers are not equal there are still some chances that
> > set_brk has allocated no space at all because ELF_PAGEALIGN(elf_bss) ==
> > ELF_PAGEALIGN(elf_brk).
> >
> > So the condition was not correct.
>
> This patch is wrong.
>
> ELF_PAGEALIGN rounds up to the end of the page, not down to the start
> of the page. If elf_bss is in the middle of a page, set_brk allocates
> any additional pages after the one already allocated. elf_bss is the
> start of the area that needs to be zero initialized, elf_brk is its
> end. So if elf_bss != elf_brk then there's garbage mapped in BSS
> from the file and if you don't clear it some of your zero-initialized
> variables won't be zero initialized at all.
>
> In the linked message, set_brk is passed elf_bss so its actual
> arguments are set_brk (0xa3801, 0x000a4ec8). It should map one
> page. 0xa3801 should be an already mapped page, and clear_user should
> succeed in clearing it.
The bad news... Dan is rigth and I've made you loss your time because
of my initial missunderstanding of that piece of code.
Having your boss and the client continously asking if does Linux
already run on our board doesn't help to keep your mind clearness X)
The call to clear_user produced a page fault (it's true), but it was a
benign one (the some sort of page fault with permits load-on-demand of
the binary).
Then Rusell King's page fault handler
(arch/arm/mm/fault.c:do_DataAbort) manages to finally call the correct
non-architecture specific handler for load-on-demand.
This non-architecture specific handler allocs a physical page in which
clear_user can retry to write at a second try when the program
execution is restarted.
The "funny" thing is that now I am almost sure that I was playing with
Rusell's code and I accidentally left commented some of the code that
finally calls this non architecture specific code.
The good news for "the community" is that now I am very sure I can
refactor the overpatched (but correct) portion of code I
missunderstood and send it back to you when is already tested. There
are some easy clean ups I can do in order to make Adrian Bunk happy :D
Yours, Abel.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/