Re: [PATCH 1/2] gpu: nova-core: add initial driver stub

From: John Hubbard
Date: Sat Feb 01 2025 - 14:44:12 EST

Next message: Andy Shevchenko: "Re: [PATCH 00/13] gpiolib: add gpiods_set_array_value_cansleep"
Previous message: Ferass El Hafidi: "Re: [PATCH 0/2] Add support for Xiaomi Mi TV Stick"
In reply to: Danilo Krummrich: "Re: [PATCH 1/2] gpu: nova-core: add initial driver stub"
Next in thread: Joel Fernandes: "Re: [PATCH 1/2] gpu: nova-core: add initial driver stub"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2/1/25 5:52 AM, Danilo Krummrich wrote:

On Fri, Jan 31, 2025 at 08:01:00PM -0800, John Hubbard wrote:

On 1/31/25 2:04 PM, Danilo Krummrich wrote:

...

+/// Enum representation of the GPU generation.
+#[derive(Debug)]
+pub(crate) enum CardType {
+ /// Turing
+ TU100 = 0x160,
+ /// Ampere
+ GA100 = 0x170,
+ /// Ada Lovelace
+ AD100 = 0x190,
+}
+
+/// Structure holding the metadata of the GPU.
+#[allow(dead_code)]
+pub(crate) struct GpuSpec {
+ /// Contents of the boot0 register.
+ boot0: u64,

It is redundant to store boot0, when all of the following fields
are deduced from boot0.

Yes, I think we can probably remove it, I only use it to print it in Gpu::new()
as a sign of life and because I don't know if boot0 contains any other useful
information than chipset and chiprev.

But maybe you can help me out here? :) That is, share the register layout and
field names. This way I could also get rid of those magic numbers, and put in
proper naming for fields, masks and shifts.

We've open sourced those, starting around 2017, and for example the Ampere
fields for boot0 are here:

https://github.com/NVIDIA/open-gpu-doc/blob/master/manuals/ampere/ga100/dev_boot.ref.txt

If more information is needed in this area, let me know and we might be able
to open source it, depending of course on what it is. It's just explaining what
the chip is, after all.

...

+// TODO replace with something like derive(FromPrimitive)
+impl CardType {
+ fn from_u32(value: u32) -> Option<CardType> {
+ match value {
+ 0x160 => Some(CardType::TU100),
+ 0x170 => Some(CardType::GA100),
+ 0x190 => Some(CardType::AD100),

Is this how nouveau does it too? I mean, classifying cards as GA100,
and variants as TU102. It feels wrong to me, because we have for example
GA100 GPUs. I mean, GA100 is the same kind of thing as a GA102: each is
a GPU.

Yes, that's what nouveau came up with and it's meant as e.g. 'GA1xx'. But yes,
I agree it's a bit confusing.

OOC, what about the first digit in this example? For Blackwell it would seem to
be 'GB2xx'. Can you shed some light on this?

I'll take a homework assignment to go dig that up. After GSP and Open RM came
out, we stopped explicitly publishing to https://github.com/NVIDIA/open-gpu-doc ,
because the same information was being published in
https://github.com/NVIDIA/open-gpu-kernel-modules .

If I were naming card types, I'd calling them by their architecture names:
Turing, Ampere, Ada.

Yeah, that is probably less cryptic. Plus, we should probably name it something
around the term "architecture" rather than "CardType".

Yes, "architecture" is accurate.

+ _ => None,
+ }
+ }
+}
+
+impl GpuSpec {
+ fn new(bar: &Devres<Bar0>) -> Result<GpuSpec> {
+ let bar = bar.try_access().ok_or(ENXIO)?;
+ let boot0 = u64::from_le(bar.readq(0));
+ let chip = ((boot0 & 0x1ff00000) >> 20) as u32;
+
+ if boot0 & 0x1f000000 == 0 {
+ return Err(ENODEV);
+ }
+
+ let Some(chipset) = Chipset::from_u32(chip) else {
+ return Err(ENODEV);
+ };
+
+ let Some(card_type) = CardType::from_u32(chip & 0x1f0) else {
+ return Err(ENODEV);
+ };
+
+ Ok(Self {
+ boot0,
+ card_type,
+ chipset,
+ chiprev: (boot0 & 0xff) as u8,
+ })
+ }
+}
+
+impl Firmware {
+ fn new(dev: &device::Device, spec: &GpuSpec, ver: &str) -> Result<Firmware> {
+ let mut chip_name = CString::try_from_fmt(fmt!("{:?}", spec.chipset))?;
+ chip_name.make_ascii_lowercase();
+
+ let fw_booter_load_path =
+ CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_load-{}.bin", &*chip_name, ver))?;
+ let fw_booter_unload_path =
+ CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_unload-{}.bin", &*chip_name, ver))?;
+ let fw_gsp_path =
+ CString::try_from_fmt(fmt!("nvidia/{}/gsp/gsp-{}.bin", &*chip_name, ver))?;
+
+ let booter_load = firmware::Firmware::request(&fw_booter_load_path, dev)?;
+ let booter_unload = firmware::Firmware::request(&fw_booter_unload_path, dev)?;
+ let gsp = firmware::Firmware::request(&fw_gsp_path, dev)?;
+
+ Ok(Firmware {
+ booter_load,
+ booter_unload,
+ gsp,
+ })
+ }
+}
+
+impl Gpu {
+ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<Self>> {
+ let spec = GpuSpec::new(&bar)?;
+ let fw = Firmware::new(pdev.as_ref(), &spec, "535.113.01")?;

lol there it is: our one, "stable" set of GSP firmware. Maybe a one line comment
above might be appropriate, to mention that this is hardcoded, but new firmware
versions will not be. On the other hand, that's obvious. :)

Well, I guess we'll have to probe what the distribution provides us with and see
if that's supported and sufficient for the chipset we try to initialize.

Oh, I think we can leave it alone for a moment. We're working hard on a
firmware plan to make things better.

thanks,
--
John Hubbard

Next message: Andy Shevchenko: "Re: [PATCH 00/13] gpiolib: add gpiods_set_array_value_cansleep"
Previous message: Ferass El Hafidi: "Re: [PATCH 0/2] Add support for Xiaomi Mi TV Stick"
In reply to: Danilo Krummrich: "Re: [PATCH 1/2] gpu: nova-core: add initial driver stub"
Next in thread: Joel Fernandes: "Re: [PATCH 1/2] gpu: nova-core: add initial driver stub"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]