[PATCH 06/10] mm/oom_debug: Add Select Vmalloc Entries Print

From: Edward Chron
Date: Mon Aug 26 2019 - 15:37:04 EST


Add OOM Debug code to allow select vmalloc entries to be printed
at the time of an OOM event. Listing a subset of the larger vmalloc
entries has proven useful in tracking memory usage during an OOM event,
so that the root cause of the event can be determined.

Configuring this OOM Debug Option (DEBUG_OOM_VMALLOC_SELECT_PRINT)
------------------------------------------------------------------
To configure this option, select it in the OOM Debugging configuration
menu. The DEBUG_OOM_VMALLOC_SELECT_PRINT config entry that configures
this option is found under: Kernel hacking, Memory Debugging, OOM
Debugging.

Two dynamic OOM debug settings for this option: enable, tenthpercent
--------------------------------------------------------------------
The oom debugfs base directory is found at: /sys/kernel/debug/oom.
The debugfs file prefix for this option is: vmalloc_select_print_
Like the other select options, it has two debugfs files: the enabled
file and the tenthpercent file.
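The two file names are the option prefix with a suffix appended. The
sketch below composes them in C; the helper name oom_debugfs_path() and
its macros are hypothetical and only illustrate the naming scheme
described above.

```c
#include <stdio.h>
#include <string.h>

/* Base directory and option prefix as described above. */
#define OOM_DEBUGFS_DIR    "/sys/kernel/debug/oom"
#define VMALLOC_OPT_PREFIX "vmalloc_select_print_"

/*
 * Hypothetical helper: build the full path of one of the option's
 * debugfs files from the prefix and a suffix ("enabled" or
 * "tenthpercent"). Returns 0 on success, -1 on an empty suffix or
 * if buf is too small to hold the whole path.
 */
int oom_debugfs_path(char *buf, size_t len, const char *suffix)
{
	int n;

	if (!suffix || !*suffix)
		return -1;
	n = snprintf(buf, len, "%s/%s%s", OOM_DEBUGFS_DIR,
		     VMALLOC_OPT_PREFIX, suffix);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```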

Dynamically disable or re-enable this OOM Debug option
------------------------------------------------------
This option may be disabled or re-enabled at run time through its
debugfs entry: /sys/kernel/debug/oom/vmalloc_select_print_enabled
The value of this file determines whether the facility is enabled or
disabled. A value of 1 is enabled (default) and a value of 0 is disabled.
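From a shell this is simply writing 0 or 1 to the file as root. A
minimal user-space C equivalent is sketched below, assuming debugfs is
mounted at /sys/kernel/debug and the kernel was built with this option;
oom_debug_set_enabled() is a hypothetical helper, and the path is a
parameter so it can be exercised against an ordinary file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write 1 (enable) or 0 (disable) to an OOM debug
 * "enabled" file such as
 * /sys/kernel/debug/oom/vmalloc_select_print_enabled.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int oom_debug_set_enabled(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", enable ? 1 : 0) < 0)
		rc = -1;
	/* fclose() flushes the buffer, so a failed write may surface here. */
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```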

Specifying the minimum entry size (0-1000) in the tenthpercent file
-------------------------------------------------------------------
The number of vmalloc entries printed by DEBUG_OOM_VMALLOC_SELECT_PRINT
can also be adjusted. By default, when the config option is enabled,
only entries that use 1% or more of memory are printed. The threshold
can be lowered to 0% of memory or raised to 100% of memory; at 100%
only the summary line is printed, since no vmalloc entry could possibly
use all of memory.
Adjustments are made through the debugfs file found at:
/sys/kernel/debug/oom/vmalloc_select_print_tenthpercent
Valid values are 0 through 1000, in tenths of a percent, representing
0% to 100% of memory. Even if the minimum entry size is set to 0, only
entries using at least one page of memory are printed; entries with
zero pages have no memory assigned.
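The OOM hook in this patch turns the tenthpercent value into a minimum
size in kB as (total RAM in kB * tenthpercent) / 1000. The sketch below
reproduces that arithmetic in user space, assuming 4 KiB pages
(SKETCH_PAGE_SHIFT is an assumption, not a kernel symbol) and adding a
range check that mirrors the 0 to 1000 limits above.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
/* Pages to kB, like the K() macro used by the OOM code. */
#define K(x) ((uint64_t)(x) << (SKETCH_PAGE_SHIFT - 10))

/*
 * Convert a tenthpercent setting into the minimum entry size in kB.
 * Returns -1 if the setting is outside the valid 0..1000 range.
 */
int64_t tenthpercent_to_min_kb(uint64_t totalram_pages, unsigned int tenthpercent)
{
	if (tenthpercent > 1000)
		return -1;
	return (int64_t)(K(totalram_pages) * tenthpercent / 1000);
}
```

For example, with 16 GiB of RAM (4194304 pages of 4 KiB, or 16777216 kB)
the default of 10 yields a minimum of 167772 kB, roughly 164 MiB, while
a value of 1 yields 16777 kB.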

Content of Vmalloc entry records and Vmalloc summary record
-----------------------------------------------------------
The output lists vmalloc entry information, limited to entries equal
to or larger than the minimum size. Unused vmallocs (no pages assigned
to the vmalloc) are never printed. The vmalloc entry information
includes:
- Size (in bytes)
- Pages (number of pages in use)
- Caller information to identify the request

A summary line is printed at the end of the output. This summary
information includes:
- Number of vmalloc entries examined
- Number of vmalloc entries printed
- Minimum entry size for selection (in kB)
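The selection rule and the summary counters can be modelled in user
space as below. This is a sketch, not the kernel code: struct vm_entry is
a hypothetical stand-in for the struct vm_struct fields that are printed,
every array element counts as examined (the kernel only counts areas
flagged VM_VM_AREA), and page_kb is the page size in kB (4 on most
systems, an assumption).

```c
#include <stdio.h>

/* Hypothetical stand-in for the printed fields of struct vm_struct. */
struct vm_entry {
	unsigned long size;	/* bytes */
	unsigned int nr_pages;	/* pages in use */
	const char *caller;	/* symbolized caller */
};

struct vm_summary {
	unsigned int examined;
	unsigned int printed;
};

/*
 * Print entries that have pages and whose page usage in kB is at least
 * min_kb, then print the summary line. Returns the counters.
 */
struct vm_summary select_vmalloc_entries(const struct vm_entry *e, size_t n,
					 unsigned long min_kb,
					 unsigned long page_kb)
{
	struct vm_summary s = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		s.examined++;
		/* Zero-page entries are skipped even when min_kb is 0. */
		if (e[i].nr_pages > 0 &&
		    (unsigned long)e[i].nr_pages * page_kb >= min_kb) {
			printf("Vmalloc size=%lu pages=%u caller=%s\n",
			       e[i].size, e[i].nr_pages, e[i].caller);
			s.printed++;
		}
	}
	printf("Summary: Vmalloc entries examined:%u printed:%u minsize:%lukB\n",
	       s.examined, s.printed, min_kb);
	return s;
}
```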

Sample Output
-------------
The output consists of a section header line, one line for each
vmalloc entry that is equal to or larger than the minimum entry size
set by the tenthpercent value (0% to 100.0% of memory), and a final
summary line.

Sample Vmalloc entries section header:

Aug 19 19:27:01 coronado kernel: Vmalloc Info:

Sample per entry selected print line output:

Jul 22 20:16:09 yoursystem kernel: Vmalloc size=2625536 pages=640 caller=__do_sys_swapon+0x78e/0x1130

Sample summary print line output:

Jul 22 19:03:26 yoursystem kernel: Summary: Vmalloc entries examined:1070 printed:989 minsize:0kB
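Since the summary line has a fixed format, log tooling can pick out its
counters. A minimal C sketch follows; parse_vmalloc_summary() is a
hypothetical helper that skips any syslog prefix (timestamp, host,
"kernel:") before matching the format used by the summary line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the counters from a vmalloc summary line.
 * Returns 0 on success, -1 if the line is not a summary line.
 */
int parse_vmalloc_summary(const char *line, unsigned int *examined,
			  unsigned int *printed, unsigned long *minsize_kb)
{
	const char *p = strstr(line, "Summary: Vmalloc entries examined:");

	if (!p)
		return -1;
	if (sscanf(p, "Summary: Vmalloc entries examined:%u printed:%u minsize:%lukB",
		   examined, printed, minsize_kb) != 3)
		return -1;
	return 0;
}
```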


Signed-off-by: Edward Chron <echron@xxxxxxxxxx>
---
include/linux/vmalloc.h | 12 ++++++++++++
mm/Kconfig.debug | 28 +++++++++++++++++++++++++++
mm/oom_kill_debug.c | 21 ++++++++++++++++++++
mm/vmalloc.c | 43 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 104 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 9b21d0047710..09e3257fc382 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -227,4 +227,16 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
int register_vmap_purge_notifier(struct notifier_block *nb);
int unregister_vmap_purge_notifier(struct notifier_block *nb);

+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+/**
+ * vmallocinfo_oom_print() - print select vmalloc entries on an OOM event
+ * @min_kb: minimum memory use, in kB, of the entries to print
+ *
+ * Identifies sizeable entries that may significantly affect kernel memory
+ * utilization. Output goes to dmesg along with the other OOM messages. The
+ * option may be dynamically enabled or disabled and the selection size is
+ */
+extern void vmallocinfo_oom_print(unsigned long min_kb);
+#endif /* CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT */
+
#endif /* _LINUX_VMALLOC_H */
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index c7d53ca95d32..ea3465343286 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -219,3 +219,31 @@ config DEBUG_OOM_SLAB_SELECT_PRINT
print limit value of 10 or 1% of memory.

If unsure, say N.
+
+config DEBUG_OOM_VMALLOC_SELECT_PRINT
+ bool "Debug OOM Select Vmallocs Print"
+ depends on DEBUG_OOM
+ help
+ When enabled, allows the number of vmalloc entries printed
+ to be limited based on the amount of memory each vmalloc
+ entry is consuming.
+
+ If the option is configured it is enabled/disabled by setting
+ the value of the file entry in the debugfs OOM interface at:
+ /sys/kernel/debug/oom/vmalloc_select_print_enabled
+ A value of 1 is enabled (default) and a value of 0 is disabled.
+
+ When enabled, entries are print limited by the amount of memory
+ they consume. The setting defines the minimum memory consumed
+ and is expressed in tenths of a percent.
+ Values supported are 0 to 1000, where 0 allows all entries with
+ pages to be printed, 1 allows entries using 0.1% or more, and
+ 10 allows entries using 1% or more of memory to be printed.
+
+ If configured and enabled, the minimum memory percentage
+ is specified by setting a value in the debugfs OOM interface at:
+ /sys/kernel/debug/oom/vmalloc_select_print_tenthpercent
+ If configured, the defaults are enabled with a
+ print limit value of 10, or 1% of memory.
+
+ If unsure, say N.
diff --git a/mm/oom_kill_debug.c b/mm/oom_kill_debug.c
index 2b5245e1134d..d5e37f8508e6 100644
--- a/mm/oom_kill_debug.c
+++ b/mm/oom_kill_debug.c
@@ -168,6 +168,9 @@
#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
#include "slab.h"
#endif
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+#include <linux/vmalloc.h>
+#endif

#define OOMD_MAX_FNAME 48
#define OOMD_MAX_OPTNAME 32
@@ -223,6 +226,12 @@ static struct oom_debug_option oom_debug_options_table[] = {
.option_name = "slab_select_print_",
.support_tpercent = true,
},
+#endif
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+ {
+ .option_name = "vmalloc_select_print_",
+ .support_tpercent = true,
+ },
#endif
{}
};
@@ -243,6 +252,9 @@ enum oom_debug_options_index {
#endif
#ifdef CONFIG_DEBUG_OOM_SLAB_SELECT_PRINT
SELECT_SLABS_STATE,
+#endif
+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+ SELECT_VMALLOC_STATE,
#endif
OUT_OF_BOUNDS
};
@@ -431,6 +443,15 @@ u32 oom_kill_debug_oom_event_is(void)
neightbl_print_stats("nd_tbl", &nd_tbl);
#endif

+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+ if (oom_kill_debug_enabled(SELECT_VMALLOC_STATE)) {
+ u16 ptenth = oom_kill_debug_tenthpercent(SELECT_VMALLOC_STATE);
+ unsigned long minkb = mult_frac(K(totalram_pages()), ptenth, 1000);
+
+ vmallocinfo_oom_print(minkb);
+ }
+#endif
+
#ifdef CONFIG_DEBUG_OOM_TASKS_SUMMARY
if (oom_kill_debug_enabled(TASKS_STATE))
oom_kill_debug_tasks_summary_print();
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7ba11e12a11f..2cdc0f0cd0af 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3523,4 +3523,47 @@ static int __init proc_vmalloc_init(void)
}
module_init(proc_vmalloc_init);

+#ifdef CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT
+#define K(x) ((x) << (PAGE_SHIFT-10))
+/*
+ * Routine used to print select vmalloc entries on an OOM condition so
+ * we can identify sizeable entries that may have a significant effect on
+ * kernel memory utilization. Output goes to dmesg along with all the OOM
+ * related messages when the config option DEBUG_OOM_VMALLOC_SELECT_PRINT
+ * is set to yes. Both enable / disable and size selection value are
+ * dynamically configurable.
+ */
+void vmallocinfo_oom_print(unsigned long min_kb)
+{
+ struct vmap_area *vap;
+ struct vm_struct *vsp;
+ u32 entries = 0;
+ u32 printed = 0;
+
+ if (!spin_trylock(&vmap_area_lock)) {
+ pr_info("Vmalloc Info: Skipped, vmap_area_lock not available\n");
+ return;
+ }
+
+ pr_info("Vmalloc Info:\n");
+ list_for_each_entry(vap, &vmap_area_list, list) {
+ if (!(vap->flags & VM_VM_AREA))
+ continue;
+ ++entries;
+ vsp = vap->vm;
+ if ((vsp->nr_pages > 0) && (K(vsp->nr_pages) >= min_kb)) {
+ pr_info("Vmalloc size=%lu pages=%u caller=%pS\n",
+ vsp->size, vsp->nr_pages, vsp->caller);
+ ++printed;
+ }
+ }
+
+ spin_unlock(&vmap_area_lock);
+
+ pr_info("Summary: Vmalloc entries examined:%u printed:%u minsize:%lukB\n",
+ entries, printed, min_kb);
+}
+EXPORT_SYMBOL(vmallocinfo_oom_print);
+#endif /* CONFIG_DEBUG_OOM_VMALLOC_SELECT_PRINT */
+
#endif
--
2.20.1