Re: [PATCH 08/11] x86: document X86_INTEL_MID as 64-bit-only

From: Arnd Bergmann
Date: Fri Dec 06 2024 - 09:27:50 EST


On Fri, Dec 6, 2024, at 12:23, Ferry Toth wrote:
> Op 04-12-2024 om 19:55 schreef Andy Shevchenko:
>>
>> It's all other way around (from SW point of view). For unknown reasons
>> Intel decided to release only 32-bit SW and it became the only thing
>> that was heavily tested (despite misunderstanding by some developers
>> that pointed finger to the HW without researching the issue that
>> appears to be purely software in a few cases) _that_ time. Starting
>> ca. 2017 I enabled 64-bit for Merrifield and from then it's being used
>> by both 32- and 64-bit builds.
>>
>> I'm totally fine to drop 32-bit defaults for Merrifield/Moorefield,
>> but let's hear Ferry who might/may still have a use case for that.
>
> Do to the design of SLM if found (and it is also documented in Intel's
> HW documentation)
>
> that there is a penalty introduced when executing certain instructions
> in 64b mode. The one I found
>
> is crc32di, running slower than 2 crc32si in series. Then there are
> other instructions seem to runs faster in 64b mode.
>
> And there is of course the usual limited memory space than could benefit
> for 32b mode. I never tried the mixed (x86_32?)
>
> mode. But I am building and testing both i686 and x86_64 for each Edison
> image.

Hi Ferry,

Thanks a lot for the detailed reply, this is exactly the kind of
information I was hoping to get out of my series, in particular
since we have a lot of the same tradeoffs on low-end 64-bit
Arm platforms, and I've been trying to push users toward running
64-bit kernels on those.

I generally think that it makes a lot of sense to run 32-bit
userspace on memory limited devices, in particular with less
than 512MB, but it's often still useful on devices with 1GB.

Running a 32-bit kernel is usually not worth it if you can
avoid it, and with 1GB of RAM you definitely run into limits
either from using HIGHMEM (with CONFIG_VMSPLIT_3G) or in
user addressing (with any other VMPLIT_*), in addition to the
32-bit kernels just being less well maintained and missing
security features.

Using a 64-bit kernel with CONFIG_COMPAT for 32-bit userspace
tends to be the best combination for a large number of
embedded workloads. As a rough estimate on Arm hardware,
I found that a 64-bit kernel tends to use close to twice
the amount of RAM for itself (vmlinux, slab caches, page
tables, mem_map[]) compared to a 32-bit kernel, but this
should be no more than 10-20% of the total RAM for sensible
workloads as all the interesting bits happen in userland.
I expect the numbers to be similar for x86, but have not
looked in detail.

In userspace there is more variation depending on the type
of application: the base system has a similar 2x ratio, but
once you get into data intensive tasks (file server,
networking, image/video processing, ...) the overhead of
64-bit userspace is lower because the size of the actual
data is the same on both.

For the specific case of the crc32di instruction, I
suspect the in-kernel version of this can be trivially
changed like

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 52c5d47ef5a1..60b9b3cab679 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -60,10 +60,10 @@ static u32 __pure crc32c_intel_le_hw(u32 crc, unsigned char const *p, size_t len
{
unsigned int iquotient = len / SCALE_F;
unsigned int iremainder = len % SCALE_F;
- unsigned long *ptmp = (unsigned long *)p;
+ unsigned int *ptmp = (unsigned int *)p;

while (iquotient--) {
- asm(CRC32_INST
+ asm("crc32l %1, %0"
: "+r" (crc) : "rm" (*ptmp));
ptmp++;
}

to get you the faster version, plus some form of
configurability to make sure other CPUs still get the
crc32q version by default.

> I think that should at minimum be useful to catch 32b errors in the
> kernel in certain areas (shared with other 32b
> archs. So, I would prefer 32b support for this platform to continue.

I can certainly see this both ways, on the one hand I do
care a lot about 32-bit Arm platforms and appreciate the help
in finding issues on 32-bit kernels. On the other hand I
really don't want anyone to waste time testing something that
should never be used in practice and keeping a feature in
the kernel only for the purpose of regression testing that
feature.

The platform is also special enough that I don't see
testing it in 32-bit mode as particularly helpful to
others, and it's unlikely to catch bugs that testing in
KVM won't.

Testing your 32-bit userland with a 64-bit kernel would be
helpful of course to ensure it keeps working for anyone
that had been using 32-bit kernel+userspace if we drop
32-bit kernel support for it.

One related idea that I've discussed before is to have
32-bit kernels refuse to boot on 64-bit hardware and
instead print the URL of a wiki page to explain all of
the above. There would probably have to be whitelist
of platforms that are buggy in 64-bit mode, and a command
line option to revert back to the previous behavior
to allow testing.

Arnd