Hi Zach and thanks for you really great review!
Zachary Amsden wrote:Overall, this looks like a great cleanup. But I think there are some details with the patch that need fixing. It is extremely hard to follow the entry.S changes based on patches alone - which tree does this apply cleanly to?This should cleanly apply to 2.6.18-rc2-mm1.
I would like to see the final result in entry.S, there areI'll send it to you privately.
still some details (dealing with eliminating the push/pop of %eax as a temporary for the segment comparison) which I could not follow exactly in patch format.The reason for removal was that I also removed the xorl %eax,%eax, decl %eax
stuff from error_code (replaced with $-1, as you've seen), and so preserving
%eax became unnecessary I think.
Because I simply broke my head over that code. If you say sti shouldiret_exc:Why did you remove the sti here?
- TRACE_IRQS_ON
- sti
pushl $0 # no error code
pushl $do_iret_error
jmp error_code
stay - I beleive you. But please, explain me how it works, before I
went crazy! :)
I read the iret pseudo-code in the Intel manuals carefully. It first
pops the iret frame into temporaries, then checks the values for
validity, and then either faults or writes the values to the real
registers. Now, if we have a bad user's iret frame, the iret will
pop it, raise an exception and push a kernel's iret frame, while
the popped user's frame at that point should be completely lost,
as far as I can understand. Now we go into GPF handler, then to a
fixup, push $0 as an error code, but what iret frame is above that
error code? Where does it come from? Doesn't the iret loses the
user's iret frame completely? Certainly I am missing something obvious
here, but I just can't follow that magic...
Did you test this with deliberately faulting IRETs? I think this is a bug.Yes, I tested that and it works (I have a rather extensive test-suite
for that problem - faulting iret is always tested). But since I
don't understand the magic, I can't say why it works.
Bigger problem! (*!) It is _unsafe_ to call C-code here.Ow, holly shit!!! What a disaster... But you are right. Well,
this will make my new patch really big...
You did not introduce this bug, I just spotted it now, and the old code has the same problem.You are wrong here - I introduced that bug as the old code is
also mine. Linus attributed it to someone else though, probably
by mistake (or did he just wanted to rescue me from the rant about
that code? :). But it was mine.
There are two options I see to fixing this problem. One is to move the GDT fixup code into assembler. The other is to compile the GDT fixup code in a separate .o file with exactly specified CFLAGS to make sure -fomit-frame-pointer is not passed to it.Well, unsafe is unsafe. If we can't explicitly tell gcc to prefix the
%ebp accesses with %ds, then I'd better go for an asm version.
Since, when calculating the base, I do &-THREAD_SIZE, I guess the minimal- .quad 0x0000920000000000 /* 0xd0 - ESPFIX 16-bit SS */Seems a bit dangerous to allow access to full 4GB through this. Can you tighten the limit any? I suppose not, because the high bits in %esp really could be anything. But it might be nice to try setting the limit to regs->esp + THREAD_SIZE. Of course, this is not strictly necessary, just an extra paranoid protection mechanism.
+ .quad 0x00cf92000000ffff /* 0xd0 - ESPFIX SS */
safe limit is regs->esp + THREAD_SIZE*2... Well, may just I not do that please? :)
For what, btw? There are no such a things for __KERNEL_DS or anything, so
I just don't see the necessity.