Re: [PATCH v1] proc/vmcore: don't fake reading zeroes on surprise vmcore_cb unregistration
From: Baoquan He
Date: Thu Nov 11 2021 - 22:31:22 EST
On 11/11/21 at 08:22pm, David Hildenbrand wrote:
> In commit cc5f2704c934 ("proc/vmcore: convert oldmem_pfn_is_ram callback
> to more generic vmcore callbacks"), we added detection of surprise
> vmcore_cb unregistration after the vmcore was already opened. Once
> detected, we warn the user and simulate reading zeroes from that point on
> when accessing the vmcore.
>
> The basic reason was that unexpected unregistration, for example, by
> manually unbinding a driver from a device after opening the
> vmcore, is not supported and could result in reading oldmem the
> vmcore_cb would have actually prohibited while registered. However,
> something like that can similarly be trigger by a user that's really
> looking for trouble simply by unbinding the relevant driver before opening
> the vmcore -- or by disallowing loading the driver in the first place.
> So it's actually of limited help.
Yes, this is the change what I would like to see in the original patch
"proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks".
I am happy with this patch appended to commit cc5f2704c934.
>
> Currently, unregistration can only be triggered via virtio-mem when
> manually unbinding the driver from the device inside the VM; there is no
> way to trigger it from the hypervisor, as hypervisors don't allow for
> unplugging virtio-mem devices -- ripping out system RAM from a VM without
> coordination with the guest is usually not a good idea.
>
> The important part is that unbinding the driver and unregistering the
> vmcore_cb while concurrently reading the vmcore won't crash the system,
> and that is handled by the rwsem.
>
> To make the mechanism more future proof, let's remove the "read zero"
> part, but leave the warning in place. For example, we could have a future
> driver (like virtio-balloon) that will contact the hypervisor to figure out
> if we already populated a page for a given PFN. Hotunplugging such a device
> and consequently unregistering the vmcore_cb could be triggered from the
> hypervisor without harming the system even while kdump is running. In that
I am a little confused, could you tell more about "contact the hypervisor to
figure out if we already populated a page for a given PFN."? I think
vmcore dumping relies on the eflcorehdr which is created beforehand, and
relies on vmcore_cb registered in advance too, virtio-balloon could take
another way to interact with kdump to make sure the dumpable ram?
Nevertheless, this patch looks good to me, thanks.
Acked-by: Baoquan He <bhe@xxxxxxxxxx>
> case, we don't want to silently end up with a vmcore that contains wrong
> data, because the user inside the VM might be unaware of the hypervisor
> action and might easily miss the warning in the log.
>
> Cc: Dave Young <dyoung@xxxxxxxxxx>
> Cc: Baoquan He <bhe@xxxxxxxxxx>
> Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Philipp Rudo <prudo@xxxxxxxxxx>
> Cc: kexec@xxxxxxxxxxxxxxxxxxx
> Cc: linux-mm@xxxxxxxxx
> Cc: linux-fsdevel@xxxxxxxxxxxxxxx
> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> ---
> fs/proc/vmcore.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 30a3b66f475a..948691cf4a1a 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -65,8 +65,6 @@ static size_t vmcoredd_orig_sz;
> static DECLARE_RWSEM(vmcore_cb_rwsem);
> /* List of registered vmcore callbacks. */
> static LIST_HEAD(vmcore_cb_list);
> -/* Whether we had a surprise unregistration of a callback. */
> -static bool vmcore_cb_unstable;
> /* Whether the vmcore has been opened once. */
> static bool vmcore_opened;
>
> @@ -94,10 +92,8 @@ void unregister_vmcore_cb(struct vmcore_cb *cb)
> * very unusual (e.g., forced driver removal), but we cannot stop
> * unregistering.
> */
> - if (vmcore_opened) {
> + if (vmcore_opened)
> pr_warn_once("Unexpected vmcore callback unregistration\n");
> - vmcore_cb_unstable = true;
> - }
> up_write(&vmcore_cb_rwsem);
> }
> EXPORT_SYMBOL_GPL(unregister_vmcore_cb);
> @@ -108,8 +104,6 @@ static bool pfn_is_ram(unsigned long pfn)
> bool ret = true;
>
> lockdep_assert_held_read(&vmcore_cb_rwsem);
> - if (unlikely(vmcore_cb_unstable))
> - return false;
>
> list_for_each_entry(cb, &vmcore_cb_list, next) {
> if (unlikely(!cb->pfn_is_ram))
> @@ -577,7 +571,7 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
> * looping over all pages without a reason.
> */
> down_read(&vmcore_cb_rwsem);
> - if (!list_empty(&vmcore_cb_list) || vmcore_cb_unstable)
> + if (!list_empty(&vmcore_cb_list))
> ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
> else
> ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);
>
> base-commit: debe436e77c72fcee804fb867f275e6d31aa999c
> --
> 2.31.1
>