Re: [PATCH 0/8] dax/kmem: atomic whole-device hotplug via sysfs

From: David Hildenbrand (Arm)

Date: Mon Apr 13 2026 - 15:41:41 EST


On 4/13/26 21:38, Gregory Price wrote:
> On Mon, Apr 13, 2026 at 05:47:01PM +0200, David Hildenbrand (Arm) wrote:
>>
>> So, really only user space can try offlining the memory after requested
>> onlining succeeded.
>>
>> I don't think any udev rules do that? They usually only request
>> onlining, which should be fine.
>>
>
> In the offline case the block will cease to exist after offline/remove
> returns, so what should happen is the race on offline just fails and
> the stale object cleans itself up on the way out after failure.
>
> Userland temporarily sees a stale memory block but can't do anything
> with it because it is synchronized on the hotplug lock.

Exactly.

>
>> So if a user does that manually, good for him. We just have to make sure
>> that stuff keeps working as expected.
>>
>
> Yeah the only catch is if a user does something dumb like
>
> cat block/state -> online_movable
> echo offline > block/state
> echo online > block/state
>
> But like... don't do that :]

Yes :)

>
> Udev won't ever offline-online-race like this, so it's not a real issue.
>
>> Or am I missing a case?
>>
>
> So yeah, I'm fairly confident this just works.
>
>>
>> I'll note that offline_and_remove_memory() can take a long time/forever
>> to succeed. User space can abort it by sending a fatal signal.
>>
>> For example, if you do
>>
>> $ echo "unplugged" > magic_device_file
>>
>> And it hangs, user space can kill the "echo" command, sending a fatal
>> signal and making offline_and_remove_memory() fail.
>>
>> The question is whether you want to do your best to revert the other offline
>> operations and try re-adding/onlining what you already offlined.
>>
>> offline_and_remove_memory() handles that much nicer internally, as it
>> tries to revert offlining, and only removes once everything was offlined.
>>
>> I think I raised it previously, but you could add an
>> offline_and_remove_memory_ranges() that consumes multiple ranges, and
>> would do this for you under a single lock_device_hotplug().
>>
>
> I don't think this is a very large lift, just a slightly larger hotplug
> locking scope. But then - the per-range thing in this set should just
> work, so let me know if it's worth the extra churn.

Well, you can rather cleanly undo the operation if any offlining fails.
If that's not a requirement, then no need to change it!
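The undo semantics discussed above (offline every range under a single lock, re-online whatever was already offlined if any step fails, and remove only once everything is offline) can be sketched roughly as below. This is a userspace simulation under stated assumptions, not kernel code: struct mem_range, offline_range(), online_range(), remove_range(), lock_hotplug() and offline_and_remove_ranges() are hypothetical stand-ins, and the flag-based lock only models the scope of lock_device_hotplug().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated memory range; NOT the kernel's data structures. */
struct mem_range {
	uint64_t start;
	uint64_t size;
	bool online;   /* simulated block state; assumed online on entry */
	bool busy;     /* simulates an offlining failure (e.g. a fatal signal) */
	bool present;  /* simulated: still added to the system */
};

/* Hypothetical stand-in for lock_device_hotplug()/unlock_device_hotplug(). */
static bool hotplug_locked;
static void lock_hotplug(void) { hotplug_locked = true; }
static void unlock_hotplug(void) { hotplug_locked = false; }

static int offline_range(struct mem_range *r)
{
	if (r->busy)
		return -EBUSY;
	r->online = false;
	return 0;
}

static void online_range(struct mem_range *r) { r->online = true; }
static void remove_range(struct mem_range *r) { r->present = false; }

/*
 * Offline all ranges under one lock. If any offlining fails, re-online the
 * ranges this call already offlined and remove nothing. Only after every
 * range is offline are the ranges removed.
 */
static int offline_and_remove_ranges(struct mem_range *ranges, size_t n)
{
	size_t i;
	int rc = 0;

	lock_hotplug();
	for (i = 0; i < n; i++) {
		rc = offline_range(&ranges[i]);
		if (rc)
			break;
	}
	if (rc) {
		/* Roll back: ranges [0, i) were offlined by us. */
		while (i-- > 0)
			online_range(&ranges[i]);
		goto out;
	}
	for (i = 0; i < n; i++)
		remove_range(&ranges[i]);
out:
	unlock_hotplug();
	return rc;
}
```

Doing the offlining pass and the removal pass under one lock is what makes the rollback clean: nothing is removed until every range is known to be offline, so a failure in the middle leaves only "re-online" work, never "re-add" work.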

--
Cheers,

David