On Fri, Jan 31, 2025 at 08:01:00PM -0800, John Hubbard wrote:...
On 1/31/25 2:04 PM, Danilo Krummrich wrote:
+/// Enum representation of the GPU generation.
+#[derive(Debug)]
+pub(crate) enum CardType {
+ /// Turing
+ TU100 = 0x160,
+ /// Ampere
+ GA100 = 0x170,
+ /// Ada Lovelace
+ AD100 = 0x190,
+}
+
+/// Structure holding the metadata of the GPU.
+#[allow(dead_code)]
+pub(crate) struct GpuSpec {
+ /// Contents of the boot0 register.
+ boot0: u64,
It is redundant to store boot0, when all of the following fields
are deduced from boot0.
Yes, I think we can probably remove it, I only use it to print it in Gpu::new()
as a sign of life and because I don't know if boot0 contains any other useful
information than chipset and chiprev.
But maybe you can help me out here? :) That is, share the register layout and
field names. This way I could also get rid of those magic numbers, and put in
proper naming for fields, masks and shifts.
+// TODO replace with something like derive(FromPrimitive)
+impl CardType {
+ fn from_u32(value: u32) -> Option<CardType> {
+ match value {
+ 0x160 => Some(CardType::TU100),
+ 0x170 => Some(CardType::GA100),
+ 0x190 => Some(CardType::AD100),
Is this how nouveau does it too? I mean, classifying cards as GA100,
and variants as TU102. It feels wrong to me, because we have for example
GA100 GPUs. I mean, GA100 is the same kind of thing as a GA102: each is
a GPU.
Yes, that's what nouveau came up with and it's meant as e.g. 'GA1xx'. But yes,
I agree it's a bit confusing.
OOC, what about the first digit in this example? For Blackwell it would seem to
be 'GB2xx'. Can you shed some light on this?
If I were naming card types, I'd calling them by their architecture names:
Turing, Ampere, Ada.
Yeah, that is probably less cryptic. Plus, we should probably name it something
around the term "architecture" rather than "CardType".
+ _ => None,
+ }
+ }
+}
+
+impl GpuSpec {
+ fn new(bar: &Devres<Bar0>) -> Result<GpuSpec> {
+ let bar = bar.try_access().ok_or(ENXIO)?;
+ let boot0 = u64::from_le(bar.readq(0));
+ let chip = ((boot0 & 0x1ff00000) >> 20) as u32;
+
+ if boot0 & 0x1f000000 == 0 {
+ return Err(ENODEV);
+ }
+
+ let Some(chipset) = Chipset::from_u32(chip) else {
+ return Err(ENODEV);
+ };
+
+ let Some(card_type) = CardType::from_u32(chip & 0x1f0) else {
+ return Err(ENODEV);
+ };
+
+ Ok(Self {
+ boot0,
+ card_type,
+ chipset,
+ chiprev: (boot0 & 0xff) as u8,
+ })
+ }
+}
+
+impl Firmware {
+ fn new(dev: &device::Device, spec: &GpuSpec, ver: &str) -> Result<Firmware> {
+ let mut chip_name = CString::try_from_fmt(fmt!("{:?}", spec.chipset))?;
+ chip_name.make_ascii_lowercase();
+
+ let fw_booter_load_path =
+ CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_load-{}.bin", &*chip_name, ver))?;
+ let fw_booter_unload_path =
+ CString::try_from_fmt(fmt!("nvidia/{}/gsp/booter_unload-{}.bin", &*chip_name, ver))?;
+ let fw_gsp_path =
+ CString::try_from_fmt(fmt!("nvidia/{}/gsp/gsp-{}.bin", &*chip_name, ver))?;
+
+ let booter_load = firmware::Firmware::request(&fw_booter_load_path, dev)?;
+ let booter_unload = firmware::Firmware::request(&fw_booter_unload_path, dev)?;
+ let gsp = firmware::Firmware::request(&fw_gsp_path, dev)?;
+
+ Ok(Firmware {
+ booter_load,
+ booter_unload,
+ gsp,
+ })
+ }
+}
+
+impl Gpu {
+ pub(crate) fn new(pdev: &pci::Device, bar: Devres<Bar0>) -> Result<impl PinInit<Self>> {
+ let spec = GpuSpec::new(&bar)?;
+ let fw = Firmware::new(pdev.as_ref(), &spec, "535.113.01")?;
lol there it is: our one, "stable" set of GSP firmware. Maybe a one line comment
above might be appropriate, to mention that this is hardcoded, but new firmware
versions will not be. On the other hand, that's obvious. :)
Well, I guess we'll have to probe what the distribution provides us with and see
if that's supported and sufficient for the chipset we try to initialize.