Re: Kernel prctl feature for syscall interception and emulation

From: Paul Gofman
Date: Thu Nov 19 2020 - 16:19:41 EST


On 11/19/20 23:54, Paul Gofman wrote:
> On 11/19/20 20:57, David Laight wrote:
>>>> The Windows code is not completely loaded at initialization time. It
>>>> also has dynamic libraries loaded later. yes, wine knows the memory
>>>> regions, but there is no guarantee there is a small number of segments
>>>> or that the full picture is known at any given moment.
>>> Yes, I didn't mean it was known statically at init time (although
>>> maybe it can be; see below) just that all the code doing the loading
>>> is under Wine's control (vs having system dynamic linker doing stuff
>>> it can't reliably see, which is the case with host libraries).
>> Since wine must itself make the mmap() system calls that make memory
>> executable can't it arrange for windows code and linux code to be
>> above/below some critical address?
>>
>> IIRC 32bit windows has the user/kernel split at 2G, so all the
>> linux code could be shoe-horned into the top 1GB.
>>
>> A similar boundary could be picked for 64bit code.
>>
>> This would probably require flags to mmap() to map above/below
>> the specified address (is there a flag for the 2G boundary
>> these days - wine used to do very horrid things).
>> It might also need a special elf interpreter to load the
>> wine code itself high.
>>
> Wine does not control the loading of native libraries (which are subject
> to ASLR and thus do not necessarily exactly follow mmap's top down
> order). Wine is also not free to choose where to load the Windows
> libraries. Some Windows libraries are relocatable, some are not. Even
> the relocatable ones are often still assumed to be loaded at the base
> address specified in the PE header, with that assumption made either by
> the library itself or by DRM, sandboxing, hotpatching or interception
> code around it.
>
> Also, it is very common for DRM to unpack the encrypted code to a newly
> allocated segment (which gives no clue at the moment of allocation
> whether it is going to be executable later), and then make it
> executable. There are a lot of tricks about that and such code sometimes
> assumes very specific (and Windows implementation dependent) things, in
> particular, about the memory layout. Windows VirtualAlloc[Ex] provides a
> way to request top-down or bottom-up allocation order, as well as a
> specific allocation address. The latter is of course not guaranteed to
> succeed, just like on Linux, but if specific (high) address ranges
> always have some space available on Windows, then there are apps in the
> wild which depend on that, in our experience.
>
> If we were given an mmap flag for specifying a memory allocation
> boundary, and also a sort of process-wide dlopen() config option for
> specifying that boundary for every host shared library load, the address
> space separation could probably work... until we hit a tricky case where
> the app wants to get memory specifically in a high address range. I
> think we can't do that cleanly, as both Windows and Linux currently have
> the same 128TB limit for user address space on x64 and we have no spare
> space to safely put native code without potential interference with
> Windows code.
>
Maybe it is also interesting to mention that the initial version of
Gabriel's patches introduced the emulation trigger by specifying a flag
for a memory region through mprotect(), so we could mark the regions
from which calls should be trapped. That would probably be the easiest
possible solution in terms of using it in Wine (as no memory allocated
by Wine itself is supposed to contain native host syscalls), but that
idea was not accepted, mainly because, as I understand it, such
functionality does not belong in VM management.
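
For illustration, the allocate-then-make-executable pattern mentioned in
the quoted text above looks roughly like this from the Linux side. This
is only a minimal sketch: alloc_rw() and make_executable() are
hypothetical helper names, not Wine functions, and the correspondence to
VirtualAlloc / VirtualProtect noted in the comments is an assumption
about how such calls might be translated.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate anonymous memory that is writable but not
 * executable. Assumed to be roughly what a read/write VirtualAlloc()
 * request ends up as. Nothing here hints that code will live in it later.
 */
static void *alloc_rw(size_t size)
{
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: after the unpacker has written its code, switch the
 * region to read + execute. Assumed to be roughly what a VirtualProtect()
 * call to an executable protection ends up as. Returns 0 on success.
 */
static int make_executable(void *p, size_t size)
{
    return mprotect(p, size, PROT_READ | PROT_EXEC);
}
```

At mmap() time the region cannot be told apart from an ordinary data
allocation; only the later mprotect() call shows that it holds code.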