Re: [PATCH 0/5] kasan: add workqueue and timer stack for generic KASAN

From: Qian Cai
Date: Mon Aug 10 2020 - 10:51:13 EST


On Mon, Aug 10, 2020 at 10:31:22PM +0800, Walter Wu wrote:
> On Mon, 2020-08-10 at 08:44 -0400, Qian Cai wrote:
> > On Mon, Aug 10, 2020 at 07:50:57PM +0800, Walter Wu wrote:
> > > On Mon, 2020-08-10 at 07:19 -0400, Qian Cai wrote:
> > > >
> > > > > On Aug 10, 2020, at 3:21 AM, Walter Wu <walter-zh.wu@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > Syzbot reports many UAF issues for workqueue or timer, see [1] and [2].
> > > > > > In some of these, the access/allocation happened in process_one_work(),
> > > > > > so the free stack in the KASAN report is useless; it doesn't help
> > > > > > programmers solve a UAF on a workqueue. The same may hold for timers.
> > > > >
> > > > > > This patchset improves KASAN reports by adding the workqueue
> > > > > > queueing stack and timer queueing stack information. It is useful for
> > > > > > programmers to solve use-after-free or double-free memory issues.
> > > > >
> > > > > > Generic KASAN will record the last two workqueue and timer stacks and
> > > > > > print them in the KASAN report. It is only suitable for generic KASAN.
> > > > >
> > > > > > In order to print the last two workqueue and timer stacks, we add
> > > > > > new members to struct kasan_alloc_meta:
> > > > > - two workqueue queueing work stacks, total size is 8 bytes.
> > > > > - two timer queueing stacks, total size is 8 bytes.
> > > > >
> > > > > > The original struct kasan_alloc_meta size is 16 bytes. After adding
> > > > > > the new members, the struct kasan_alloc_meta total size is 32 bytes,
> > > > > > which is a good number for alignment and for memory consumption.
> > > >
> > > > Making a debugging tool complicated is surely the best way to kill it. I would argue that it only makes sense to complicate it if that is useful most of the time, which I have never felt or heard to be the case. This reminds me of your recent call_rcu() stacks, which most of the time just make parsing the report cumbersome. Thus, I urge that this exercise of over-engineering for special cases stop entirely.
> > > >
> > >
> > > A good debug tool should have complete information in order to solve
> > > the issue. We should focus on whether KASAN reports always show this
> > > debug information, or create an option to decide whether to show it.
> > > This feature is Dmitry's suggestion, see [1], so I think it needs to be
> > > implemented. Maybe we can wait for his response.
> > >
> > > [1]https://lkml.org/lkml/2020/6/23/256
> >
> > I don't know if it is Dmitry's pipe dream that every KASAN report would enable
> > developers to fix the bug without reproducing it. It is always an ongoing
> > struggle between making the kernel easier to debug and keeping things less
> > cumbersome.
> >
> > On the other hand, Dmitry's suggestion makes sense only if the price we are
> > going to pay is fair. With the current diffstat, and my recent experience as a
> > heavy KASAN user of call_rcu() stacks "wasting" screen space, I can't really
> > get excited about pushing the limit again.
> >
>
> If you are concerned that the report is long, maybe we can create an
> option for the user to decide whether to print them (including call_rcu).
> Would this satisfy everyone?

Adding kernel config options is just another way to add complication with real
cost. The only other way I can think of right now is to create some kind of
plugin system for KASAN that can run eBPF scripts (for example) to deal with
those special cases.