Re: [PATCH V10 09/22] LoongArch: Add boot and setup routines

From: WANG Xuerui
Date: Sun May 15 2022 - 22:41:53 EST


Hi,

On 5/15/22 20:38, Huacai Chen wrote:
diff --git a/arch/loongarch/kernel/head.S b/arch/loongarch/kernel/head.S
new file mode 100644
index 000000000000..f0b3e76bb762
--- /dev/null
+++ b/arch/loongarch/kernel/head.S
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
+ */
+#include <linux/init.h>
+#include <linux/threads.h>
+
+#include <asm/addrspace.h>
+#include <asm/asm.h>
+#include <asm/asmmacro.h>
+#include <asm/regdef.h>
+#include <asm/loongarch.h>
+#include <asm/stackframe.h>
+#include <generated/compile.h>
+#include <generated/utsrelease.h>
+
+#ifdef CONFIG_EFI_STUB
+
+#include "efi-header.S"
+
+ __HEAD
+
+_head:
+ .word MZ_MAGIC /* "MZ", MS-DOS header */
+ .org 0x28
+ .ascii "Loongson\0" /* Magic number for BootLoader */
If you must use a magic number, "Loongson" is not recommended, because
this string lacks uniqueness in the Loongson/LoongArch world. Too many
things are called "Loongson foo" right now, and the string is so
ordinary that people don't immediately think of it as "magic".

I recommended some other interesting text (and encoding) for the
magic number in a different communication venue, but I think you
ignored that proposal without any explanation whatsoever. For now
I'll just repeat myself:

For an interesting magic number related to Loongson/LoongArch/Loong
(like dragons but not exactly the same, let's not expand on that front)
in general, it's perhaps better to use GB18030-encoded four-character
dragon-related idioms. I chose GB18030 because each of these Chinese
characters takes 2 bytes in this encoding, and since it's not UTF-8,
user input is unlikely to resemble it by accident. So we get 8 bytes
that appear as huge negative numbers if read as a C long, and that are
random enough that collisions are highly unlikely.

For example, I chose 4 famous dragon-related phrases from the I Ching,
in both simplified and traditional characters:

潜龙勿用: 0xc7b1c1facef0d3c3
见龙在田: 0xbcfbc1fad4daccef
飞龙在天: 0xb7c9c1fad4daccec
亢龙有悔: 0xbfbac1fad3d0bbda
潛龍勿用: 0x9d93fd88cef0d3c3
見龍在田: 0xd28afd88d4daccef
飛龍在天: 0xef77fd88d4daccec
亢龍有悔: 0xbfbafd88d3d0bbda

and I think each of them is better than "Loongson".
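To back the two claims above with something checkable, here is a small user-space C sketch (helper names are made up for illustration). It assumes each hex constant in the list is the GB18030 byte sequence written most-significant byte first, lays it out in memory in that order, and then checks that (a) the bytes are not well-formed UTF-8, unlike plain ASCII "Loongson", and (b) the value is negative both as written and when loaded little-endian, as a 64-bit load on LoongArch would see it. Note that converting an out-of-range `uint64_t` to `int64_t` is implementation-defined in ISO C; GCC and Clang wrap it modulo 2^64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The eight candidates listed above, exactly as written in the list. */
static const uint64_t dragon_magics[] = {
	0xc7b1c1facef0d3c3ULL, /* 潜龙勿用 */
	0xbcfbc1fad4daccefULL, /* 见龙在田 */
	0xb7c9c1fad4daccecULL, /* 飞龙在天 */
	0xbfbac1fad3d0bbdaULL, /* 亢龙有悔 */
	0x9d93fd88cef0d3c3ULL, /* 潛龍勿用 */
	0xd28afd88d4daccefULL, /* 見龍在田 */
	0xef77fd88d4daccecULL, /* 飛龍在天 */
	0xbfbafd88d3d0bbdaULL, /* 亢龍有悔 */
};

static_assert(sizeof(dragon_magics) / sizeof(dragon_magics[0]) == 8,
	      "four idioms, simplified and traditional");

/* Store the constant most-significant byte first, i.e. in the order of
 * the encoded text (assumption: that is how the list was written). */
static void magic_to_bytes(uint64_t magic, unsigned char out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(magic >> (56 - 8 * i));
}

/* Read 8 bytes back as a little-endian 64-bit value, as a LoongArch
 * load would, and reinterpret it as signed (a C long on LP64). */
static int64_t load_le64(const unsigned char b[8])
{
	uint64_t v = 0;
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];
	return (int64_t)v;
}

/* Minimal UTF-8 well-formedness check: rejects stray continuation bytes,
 * the overlong 2-byte leads 0xC0/0xC1, leads above 0xF4 and truncated
 * sequences. It does not catch overlong 3/4-byte forms or surrogates,
 * which is enough to show these strings are not UTF-8. */
static bool is_utf8(const unsigned char *s, size_t n)
{
	size_t i = 0;

	while (i < n) {
		unsigned char c = s[i];
		size_t len;

		if (c < 0x80)
			len = 1;
		else if (c >= 0xc2 && c <= 0xdf)
			len = 2;
		else if (c >= 0xe0 && c <= 0xef)
			len = 3;
		else if (c >= 0xf0 && c <= 0xf4)
			len = 4;
		else
			return false;
		if (i + len > n)
			return false;
		for (size_t k = 1; k < len; k++)
			if ((s[i + k] & 0xc0) != 0x80)
				return false;
		i += len;
	}
	return true;
}
```

Every candidate starts with a GB18030 lead byte (0x81 to 0xFE), so it is negative as written; the little-endian reading is negative here too, because each idiom happens to end in a trail byte of 0x80 or above.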
ARM64_IMAGE_MAGIC is "ARM64", RISCV_IMAGE_MAGIC is "RISCV", so I think
using "Loongson" as the magic is just OK.

Actually, you made a good point here, one that I failed to check for myself earlier.

Looking at the arm64 and riscv image header code more closely, it seems loongarch is following the now-deprecated riscv-specific practice of using an 8-byte magic (deprecated as of commit 474efecb65dce ("riscv: modify the Image header to improve compatibility with the ARM64 header")). In doing so, it also moves the magic to a different offset: on riscv the 8-byte magic is at 0x30, while here it's at 0x28 (riscv's "res2" field). This is exactly the kind of "proliferation of image header formats" that we want to avoid.
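To make those offsets concrete, here is a user-space mirror of the riscv header layout, modelled on `struct riscv_image_header` in arch/riscv/include/asm/image.h, with fixed-width types substituted for the kernel's (please double-check the field list against the real header before relying on it). The compile-time checks pin down the offsets mentioned above. `V10_LOONGARCH_MAGIC_OFFSET` is a hypothetical name for the `.org 0x28` in the quoted head.S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the riscv Image header; all fields are little-endian on disk
 * and naturally aligned, so there is no padding. */
struct riscv_image_header {
	uint32_t code0;
	uint32_t code1;
	uint64_t text_offset;
	uint64_t image_size;
	uint64_t flags;
	uint32_t version;
	uint32_t res1;
	uint64_t res2;   /* reserved on riscv */
	uint64_t magic;  /* deprecated 8-byte magic */
	uint32_t magic2; /* current 4-byte magic, "RSC\x05" */
	uint32_t res3;
};

/* Hypothetical name: where V10's head.S places "Loongson\0" (.org 0x28). */
#define V10_LOONGARCH_MAGIC_OFFSET 0x28

static_assert(offsetof(struct riscv_image_header, res2) ==
	      V10_LOONGARCH_MAGIC_OFFSET,
	      "V10 puts its magic where riscv has res2");
static_assert(offsetof(struct riscv_image_header, magic) == 0x30,
	      "deprecated 8-byte riscv magic lives at 0x30");
static_assert(offsetof(struct riscv_image_header, magic2) == 0x38,
	      "4-byte riscv magic lives at 0x38, same as arm64");
static_assert(sizeof(struct riscv_image_header) == 64,
	      "the header is 64 bytes");
```

In other words, a tool that recognizes riscv or arm64 images by the 4-byte field at 0x38 would not find V10's magic at all, since it sits two fields earlier.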

Now for some additional but important bikeshedding...

The current arm64 and riscv magic numbers are both 4 bytes long, at offset 0x38, and they are cute little strings identifying their origin: "ARM\x64" and "RSC\x05" respectively. Thus, for loongarch, we probably want to do the same, i.e. a nice little 4-byte string with a hint of LoongArch/Loong. Considering that UTF-8 uses 3 bytes for most Chinese characters and 4 bytes for characters outside the BMP, we could use a little bit of creativity here:

- "LA64", the "dullest" version with only ASCII characters, but I don't know if future LA32 systems will want to use the same image header format;
- "\xe9\xbe\x99\x64" ("龙\x64") or "\xe9\xbe\x8d\x64" ("龍\x64") -- 龙/龍 means "loong/dragon", hence a variant of the above;
- "\xf0\x9f\x90\xb2" ("🐲") or "\xf0\x9f\x90\x89" ("🐉") -- the loong/dragon emoji, taking full advantage of the 4 bytes available while not mentioning bitness.
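A quick way to double-check the byte sequences in the list, and to see what "4-byte magic at 0x38" means for a consumer, is the sketch below. The arm64 struct is a user-space mirror modelled on arch/arm64/include/asm/image.h; the candidate array names, `utf8_encode()` and `header_has_magic()` are hypothetical helpers written for this example, not existing kernel or library APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the arm64 Image header, only to locate its 4-byte magic. */
struct arm64_image_header {
	uint32_t code0;
	uint32_t code1;
	uint64_t text_offset;
	uint64_t image_size;
	uint64_t flags;
	uint64_t res2;
	uint64_t res3;
	uint64_t res4;
	uint32_t magic; /* "ARM\x64" */
	uint32_t res5;
};

#define IMAGE_MAGIC_OFFSET 0x38
static_assert(offsetof(struct arm64_image_header, magic) == IMAGE_MAGIC_OFFSET,
	      "arm64 keeps its 4-byte magic at 0x38");

/* Candidate magics from the list above (hypothetical names). */
static const unsigned char magic_la64[4]    = { 'L', 'A', '6', '4' };
static const unsigned char magic_loong_s[4] = { 0xe9, 0xbe, 0x99, 0x64 }; /* 龙\x64 */
static const unsigned char magic_loong_t[4] = { 0xe9, 0xbe, 0x8d, 0x64 }; /* 龍\x64 */
static const unsigned char magic_face[4]    = { 0xf0, 0x9f, 0x90, 0xb2 }; /* U+1F432 */
static const unsigned char magic_dragon[4]  = { 0xf0, 0x9f, 0x90, 0x89 }; /* U+1F409 */

/* Encode one code point as UTF-8. Returns the byte count (1 to 4), or 0
 * for surrogates and values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xc0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return 2;
	}
	if (cp >= 0xd800 && cp <= 0xdfff)
		return 0;
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xe0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		out[2] = (unsigned char)(0x80 | (cp & 0x3f));
		return 3;
	}
	if (cp <= 0x10ffff) {
		out[0] = (unsigned char)(0xf0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
		out[3] = (unsigned char)(0x80 | (cp & 0x3f));
		return 4;
	}
	return 0;
}

/* Nonzero if a header buffer of len bytes carries magic at offset 0x38. */
static int header_has_magic(const unsigned char *hdr, size_t len,
			    const unsigned char magic[4])
{
	return len >= IMAGE_MAGIC_OFFSET + 4 &&
	       memcmp(hdr + IMAGE_MAGIC_OFFSET, magic, 4) == 0;
}
```

Whichever string is picked, the check a bootloader or `file`-style tool needs stays the same 4-byte compare at 0x38; only the constant differs per architecture.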

A case might be made for pure-ASCII magic numbers, in that they're easier to inspect with the naked eye, but (1) this is already not the case for the new riscv magic ("\x05" is not a printable character), and (2) given that all other interesting fields are binary, hex editors are already necessary for any task more complex than mere identification.

So, I think the bottom line is: don't use the 8-byte magic at offset 0x28; switch to a 4-byte magic at offset 0x38 to stay consistent with everyone else. I don't have a strong preference among the candidates above, but personally I'd like to see some freshness in low-level land, if that doesn't hamper people's workflows. ;-)