Re: [PATCH v2] acpi: fix acpi_get_table() leak / acpi-sysfs denial of service

From: Dan Williams
Date: Thu Apr 27 2017 - 10:48:48 EST


On Wed, Apr 26, 2017 at 11:49 PM, Zheng, Lv <lv.zheng@xxxxxxxxx> wrote:
> Hi, Rafael
>
>> From: linux-acpi-owner@xxxxxxxxxxxxxxx [mailto:linux-acpi-owner@xxxxxxxxxxxxxxx] On Behalf Of Rafael J.
>> Wysocki
>> Subject: Re: [PATCH v2] acpi: fix acpi_get_table() leak / acpi-sysfs denial of service
>>
>> On Tue, Apr 25, 2017 at 9:58 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>> > Reading an ACPI table through the /sys/firmware/acpi/tables interface
>> > more than 65,536 times leads to the following log message:
>> >
>> > ACPI Error: Table ffff88033595eaa8, Validation count is zero after increment
>> > (20170119/tbutils-423)
>> >
>> > ...and the table being unavailable until the next reboot. Add the
>> > missing acpi_put_table() so the table ->validation_count is decremented
>> > after each read.
>> >
>> > Cc: <stable@xxxxxxxxxxxxxxx>
>> > Cc: Zhang Rui <rui.zhang@xxxxxxxxx>
>> > Cc: Rafael Wysocki <rafael.j.wysocki@xxxxxxxxx>
>> > Cc: Kristin Jacque <kristin.jacque@xxxxxxxxx>
>> > Cc: Tiffany Kasanicky <tiffany.j.kasanicky@xxxxxxxxx>
>> > Cc: Ryon Jensen <ryon.jensen@xxxxxxxxx>
>> > Reported-by: Anush Seetharaman <anush.seetharaman@xxxxxxxxx>
>> > Fixes: 1c8fce27e275 ("ACPI: introduce drivers/acpi/sysfs.c")
>> > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>>
>> I'm going to apply this, but your Fixes tag is not correct.
>
> Even we want to skip the so-called "short period".
> This fix is not 100% correct.
> It's written in a very casual way.
> I would think it was just a material to mention me to work on this issue.
> See, there are several acpi_get_table clones invoked in sysfs.c, it just fixed one of them.
> So it only fixes very limited cases.
> I'll post one for fixing sysfs calls later.

That's the point, the other problem areas can be fixed up incrementally.

>
> There are also several cases needing urgent care, for example, FACS table used between suspend/resume process.
> Which can also increase the count by 1 for each suspend/resume cycle.
> I'll post an urgent fix for this along with the sysfs one.
>
>>
>> validation_count was added to struct acpi_table_desc by commit
>>
>> commit 174cc7187e6f088942c8e74daa7baff7b44b33c9
>> Author: Lv Zheng <lv.zheng@xxxxxxxxx>
>> Date: Wed Dec 14 15:04:25 2016 +0800
>>
>> ACPICA: Tables: Back port acpi_get_table_with_size() and
>> early_acpi_os_unmap_memory()
>> from Linux kernel
>>
>> from the 4.10 time frame, so IMO it should be
>>
>> Fixes: 174cc7187e6f (ACPICA: Tables: Back port
>> acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux
>> kernel)
>
> And TBH, IMO, my posted patch is a regression fix fixing this tag.
> https://patchwork.kernel.org/patch/9700175/
> It fixes my damaged brain of trying to enable this mechanism.
> It's actually a mistake, the planned way is not that.
>
> I don't think fixing ~50 users could be easily achieved in just 1 cycle without triggering any other issues.
> Especially among them, there are design changes.
> In order to upstream this mechanism to ACPICA, I've already changed the original design.
> Originally it's done in a different way:
> A acpi_get_validate_table() API is prepared for those wants to pin the ACPI table in memory.
> And it is not reference counting based.
> Now all solution need to be re-considered due to the design change.
> So I don't think it will be a short period.
>
> On the contrary, if there is no one executing a script to read/write sysfs more than 60000 times,
> the error won't flood logs.

That's not true. Here is the log flood is how we discovered the
problem in the first place:

https://gist.github.com/djbw/ac03ac15bde6db0301a73a33bea70521

...this was not a deliberate loop. Now, we did find that this
application was reading the tables more often than it should, and
hopefully there aren't too many more applications like this out in the
wild.

> So I was actually trying to do the right things for Linux kernel.
> While just laying down a simple hammer without caring about user experiences could be destructive.
> It's a disaster if some servers need to be rebooted due to this issue.

It's not a disaster in that most systems should not be re-reading the
tables at a high enough frequency for this to matter. The fact that
this bug has shipped in Fedora and other places without issue helps
point out that this is hard to hit in practice.