RE: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver

From: Shiju Jose
Date: Mon Mar 10 2025 - 10:38:15 EST


>-----Original Message-----
>From: Shiju Jose
>Sent: 10 March 2025 11:12
>To: 'Daniel Ferguson' <danielf@xxxxxxxxxxxxxxxxxxxxxx>; linux-
>edac@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; bp@xxxxxxxxx;
>tony.luck@xxxxxxxxx; rafael@xxxxxxxxxx; lenb@xxxxxxxxxx;
>mchehab@xxxxxxxxxx; leo.duran@xxxxxxx; Yazen.Ghannam@xxxxxxx
>Cc: linux-cxl@xxxxxxxxxxxxxxx; dan.j.williams@xxxxxxxxx; dave@xxxxxxxxxxxx;
>Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>; dave.jiang@xxxxxxxxx;
>alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>david@xxxxxxxxxx; Vilas.Sridharan@xxxxxxx; linux-mm@xxxxxxxxx; linux-
>kernel@xxxxxxxxxxxxxxx; rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx;
>Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
>naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx;
>somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx;
>duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx;
>wschwartz@xxxxxxxxxxxxxxxxxxx; dferguson@xxxxxxxxxxxxxxxxxxx;
>wbs@xxxxxxxxxxxxxxxxxxxxxx; nifan.cxl@xxxxxxxxx; tanxiaofei
><tanxiaofei@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; Roberto
>Sassu <roberto.sassu@xxxxxxxxxx>; kangkang.shen@xxxxxxxxxxxxx;
>wanghuiqiang <wanghuiqiang@xxxxxxxxxx>; Linuxarm
><linuxarm@xxxxxxxxxx>
>Subject: RE: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
>
>>-----Original Message-----
>>From: Daniel Ferguson <danielf@xxxxxxxxxxxxxxxxxxxxxx>
>>Sent: 07 March 2025 21:52
>>To: Shiju Jose <shiju.jose@xxxxxxxxxx>; linux-edac@xxxxxxxxxxxxxxx;
>>linux- acpi@xxxxxxxxxxxxxxx; bp@xxxxxxxxx; tony.luck@xxxxxxxxx;
>>rafael@xxxxxxxxxx; lenb@xxxxxxxxxx; mchehab@xxxxxxxxxx;
>>leo.duran@xxxxxxx; Yazen.Ghannam@xxxxxxx
>>Cc: linux-cxl@xxxxxxxxxxxxxxx; dan.j.williams@xxxxxxxxx;
>>dave@xxxxxxxxxxxx; Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>;
>>dave.jiang@xxxxxxxxx; alison.schofield@xxxxxxxxx;
>>vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx; david@xxxxxxxxxx;
>>Vilas.Sridharan@xxxxxxx; linux-mm@xxxxxxxxx; linux-
>>kernel@xxxxxxxxxxxxxxx; rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx;
>>Jon.Grimm@xxxxxxx; dave.hansen@xxxxxxxxxxxxxxx;
>>naoya.horiguchi@xxxxxxx; james.morse@xxxxxxx; jthoughton@xxxxxxxxxx;
>>somasundaram.a@xxxxxxx; erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx;
>>duenwen@xxxxxxxxxx; gthelen@xxxxxxxxxx;
>wschwartz@xxxxxxxxxxxxxxxxxxx;
>>dferguson@xxxxxxxxxxxxxxxxxxx; wbs@xxxxxxxxxxxxxxxxxxxxxx;
>>nifan.cxl@xxxxxxxxx; tanxiaofei <tanxiaofei@xxxxxxxxxx>; Zengtao (B)
>><prime.zeng@xxxxxxxxxxxxx>; Roberto Sassu <roberto.sassu@xxxxxxxxxx>;
>>kangkang.shen@xxxxxxxxxxxxx; wanghuiqiang <wanghuiqiang@xxxxxxxxxx>;
>>Linuxarm <linuxarm@xxxxxxxxxx>
>>Subject: Re: [PATCH v2 3/3] ras: mem: Add memory ACPI RAS2 driver
>>
>>
>>> +static int ras2_hw_scrub_read_size(struct device *dev, void
>>> +*drv_data, u64 *size) {
>>> + struct ras2_mem_ctx *ras2_ctx = drv_data;
>>> + int ret;
>>> +
>>> + if (ras2_ctx->bg_scrub)
>>> + return -EBUSY;
>>> +
>>> + ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + *size = ras2_ctx->size;
>>> +
>>> + return 0;
>>> +}
>>
>>Calling ras2_update_patrol_scrub_params_cache here is problematic.
>>
>>Imagine:
>> echo 0x1000 > size
>> cat size
>> echo 0x2000000000 > addr
>>
>>What happens here? What happens is the scrub range is not what you
>>expect it to be. Once you cat size, you reset the size from what you initially set
>it to.
>>I don't think that is what anyone will expect. It certainly caused us
>>to stumble while testing.
>
>This is an expected behavior and this extra call was added here when changed
>using attribute 'addr' to start the on-demand scrub operation instead of
>previous separate attribute ' enable_on_demand' to start the on-demand scrub
>operation, according to Borislav's suggestion in v13.
>
> Please see the following comment in the ras2_hw_scrub_read_addr() fnction,
>"Userspace will get the status of the demand scrubbing through the address
>range read from the firmware. When the demand scrubbing is finished
>firmware must reset actual address range to 0. Otherwise userspace assumes
>demand scrubbing is in progress."
>
>Here sysfs attributes 'addr' and 'size' is reading the field: Actual Address Range
>of Table 5.87: Parameter Block Structure for PATROL_SCRUB, written by the
>firmware.
>
>In my opinion, reading back the address range size set in the sysfs before
>actually writing the address range to the firmware and starting the on-demand
>scrub operation doesn't hold much significance?

After further discussion, I will add a fix for this case to return the 'size' which the user set in the sysfs
until the scrubbing is started.

Thanks,
Shiju
>