Re: [PATCH v2] fault-inject: support systematic fault injection
From: Dmitry Vyukov
Date: Sat Apr 08 2017 - 13:40:47 EST
On Fri, Apr 7, 2017 at 6:47 PM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
> 2017-04-07 3:33 GMT+09:00 Michal Hocko <mhocko@xxxxxxxxxx>:
>> [Let's add linux-api - please always cc this list when adding/modifying
>> user visible interfaces]
>>
>> On Tue 28-03-17 15:01:28, Dmitry Vyukov wrote:
>>> Add /proc/self/task/<current-tid>/fail-nth file that allows failing
>>> 0-th, 1-st, 2-nd and so on calls systematically.
>>> Excerpt from the added documentation:
>>
>> I didn't really get to read through details here but it just feels wrong
>> to add this debugging only feature into proc. It also smells like one
>> off thing as well.
>
> We have 'sched' (CONFIG_SCHED_DEBUG), 'latency' (CONFIG_LATENCYTOP)
> and 'make-it-fail' as debugging per-process proc files. So it doesn't
> look very wrong to me. But I would like to avoid per-process proc
> directory becoming messy. Do you think introducing /proc/<pid>/debug/
> directory for debugging stuff makes sense?
>
> Side note: 'fail-nth' was originally a single debugfs file
> /sys/kernel/debug/fail_once. But it actually read/write current task's
> fail_nth field, so I suggested changing per process procfs file.i
> This change enables to inject N-th fail to kernel threads, too.
/sys/kernel/debug/fail_once (or fail_nth) looks more appropriate to me
for a optional testing feature. The fact that it currently
reads/writes a task_struct field is merely an implementation detail.
I would also prefer ioctl's. Then we don't need to preserve "symmetry"
for no useful reason and deal with nonsensical uses like setting it
for a non-current task and running cat on it.
>>> ===
>>> Write to this file of integer N makes N-th call in the current task fail
>>> (N is 0-based). Read from this file returns a single char 'Y' or 'N'
>>> that says if the fault setup with a previous write to this file was
>>> injected or not, and disables the fault if it wasn't yet injected.
>>> Note that this file enables all types of faults (slab, futex, etc).
>>> This setting takes precedence over all other generic settings like
>>> probability, interval, times, etc. But per-capability settings
>>> (e.g. fail_futex/ignore-private) take precedence over it.
>>> This feature is intended for systematic testing of faults in a single
>>> system call. See an example below.
>>> ===
>>>
>>> Why adding new setting:
>>> 1. Existing settings are global rather than per-task.
>>> So parallel testing is not possible.
>>> 2. attr->interval is close but it depends on attr->count
>>> which is non reset to 0, so interval does not work as expected.
>>> 3. Trying to model this with existing settings requires manipulations
>>> of all of probability, interval, times, space, task-filter and
>>> unexposed count and per-task make-it-fail files.
>>> 4. Existing settings are per-failure-type, and the set of failure
>>> types is potentially expanding.
>>> 5. make-it-fail can't be changed by unprivileged user and aggressive
>>> stress testing better be done from an unprivileged user.
>>> Similarly, this would require opening the debugfs files to the
>>> unprivileged user, as he would need to reopen at least times file
>>> (not possible to pre-open before dropping privs).
>>>
>>> The proposed interface solves all of the above (see the example).
>>>
>>> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
>>> Cc: Akinobu Mita <akinobu.mita@xxxxxxxxx>
>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>> Cc: linux-mm@xxxxxxxxx
>>>
>>> ---
>>> We want to integrate this into syzkaller fuzzer.
>>> A prototype has found 10 bugs in kernel in first day of usage:
>>> https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance
>>>
>>> Changes since v1:
>>> - change file name from /sys/kernel/debug/fail_once
>>> to /proc/self/task/<current-tid>/fail-nth as per
>>> Akinobu suggestion
>>>
>>> ---
>>> Documentation/fault-injection/fault-injection.txt | 78 +++++++++++++++++++++++
>>> fs/proc/base.c | 52 +++++++++++++++
>>> include/linux/sched.h | 1 +
>>> kernel/fork.c | 4 ++
>>> lib/fault-inject.c | 7 ++
>>> 5 files changed, 142 insertions(+)
>>>
>>> diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
>>> index 415484f3d59a..192d8cbcc5f9 100644
>>> --- a/Documentation/fault-injection/fault-injection.txt
>>> +++ b/Documentation/fault-injection/fault-injection.txt
>>> @@ -134,6 +134,22 @@ use the boot option:
>>> fail_futex=
>>> mmc_core.fail_request=<interval>,<probability>,<space>,<times>
>>>
>>> +o proc entries
>>> +
>>> +- /proc/self/task/<current-tid>/fail-nth:
>>> +
>>> + Write to this file of integer N makes N-th call in the current task fail
>>> + (N is 0-based). Read from this file returns a single char 'Y' or 'N'
>>> + that says if the fault setup with a previous write to this file was
>>> + injected or not, and disables the fault if it wasn't yet injected.
>>> + Note that this file enables all types of faults (slab, futex, etc).
>>> + This setting takes precedence over all other generic debugfs settings
>>> + like probability, interval, times, etc. But per-capability settings
>>> + (e.g. fail_futex/ignore-private) take precedence over it.
>>> +
>>> + This feature is intended for systematic testing of faults in a single
>>> + system call. See an example below.
>>> +
>>> How to add new fault injection capability
>>> -----------------------------------------
>>>
>>> @@ -278,3 +294,65 @@ allocation failure.
>>> # env FAILCMD_TYPE=fail_page_alloc \
>>> ./tools/testing/fault-injection/failcmd.sh --times=100 \
>>> -- make -C tools/testing/selftests/ run_tests
>>> +
>>> +Systematic faults using fail-nth
>>> +---------------------------------
>>> +
>>> +The following code systematically faults 0-th, 1-st, 2-nd and so on
>>> +capabilities in the socketpair() system call.
>>> +
>>> +#include <sys/types.h>
>>> +#include <sys/stat.h>
>>> +#include <sys/socket.h>
>>> +#include <sys/syscall.h>
>>> +#include <fcntl.h>
>>> +#include <unistd.h>
>>> +#include <string.h>
>>> +#include <stdlib.h>
>>> +#include <stdio.h>
>>> +#include <errno.h>
>>> +
>>> +int main()
>>> +{
>>> + int i, err, res, fail_nth, fds[2];
>>> + char buf[128];
>>> +
>>> + system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
>>> + sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
>>> + fail_nth = open(buf, O_RDWR);
>>> + for (i = 0;; i++) {
>>> + sprintf(buf, "%d", i);
>>> + write(fail_nth, buf, strlen(buf));
>>> + res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
>>> + err = errno;
>>> + read(fail_nth, buf, 1);
>>> + if (res == 0) {
>>> + close(fds[0]);
>>> + close(fds[1]);
>>> + }
>>> + printf("%d-th fault %c: res=%d/%d\n", i, buf[0], res, err);
>>> + if (buf[0] != 'Y')
>>> + break;
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> +An example output:
>>> +
>>> +0-th fault Y: res=-1/23
>>> +1-th fault Y: res=-1/23
>>> +2-th fault Y: res=-1/23
>>> +3-th fault Y: res=-1/12
>>> +4-th fault Y: res=-1/12
>>> +5-th fault Y: res=-1/23
>>> +6-th fault Y: res=-1/23
>>> +7-th fault Y: res=-1/23
>>> +8-th fault Y: res=-1/12
>>> +9-th fault Y: res=-1/12
>>> +10-th fault Y: res=-1/12
>>> +11-th fault Y: res=-1/12
>>> +12-th fault Y: res=-1/12
>>> +13-th fault Y: res=-1/12
>>> +14-th fault Y: res=-1/12
>>> +15-th fault Y: res=-1/12
>>> +16-th fault N: res=0/12
>>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>>> index 6e8655845830..66001172249b 100644
>>> --- a/fs/proc/base.c
>>> +++ b/fs/proc/base.c
>>> @@ -1353,6 +1353,53 @@ static const struct file_operations proc_fault_inject_operations = {
>>> .write = proc_fault_inject_write,
>>> .llseek = generic_file_llseek,
>>> };
>>> +
>>> +static ssize_t proc_fail_nth_write(struct file *file, const char __user *buf,
>>> + size_t count, loff_t *ppos)
>>> +{
>>> + struct task_struct *task;
>>> + int err, n;
>>> +
>>> + task = get_proc_task(file_inode(file));
>>> + if (!task)
>>> + return -ESRCH;
>>> + put_task_struct(task);
>>> + if (task != current)
>>> + return -EPERM;
>>> + err = kstrtoint_from_user(buf, count, 10, &n);
>>> + if (err)
>>> + return err;
>>> + if (n < 0 || n == INT_MAX)
>>> + return -EINVAL;
>>> + current->fail_nth = n + 1;
>>> + return len;
>>> +}
>>> +
>>> +static ssize_t proc_fail_nth_read(struct file *file, char __user *buf,
>>> + size_t count, loff_t *ppos)
>>> +{
>>> + struct task_struct *task;
>>> + int err;
>>> +
>>> + task = get_proc_task(file_inode(file));
>>> + if (!task)
>>> + return -ESRCH;
>>> + put_task_struct(task);
>>> + if (task != current)
>>> + return -EPERM;
>>> + if (count < 1)
>>> + return -EINVAL;
>>> + err = put_user((char)(current->fail_nth ? 'N' : 'Y'), buf);
>>> + if (err)
>>> + return err;
>>> + current->fail_nth = 0;
>>> + return 1;
>>> +}
>>> +
>>> +static const struct file_operations proc_fail_nth_operations = {
>>> + .read = proc_fail_nth_read,
>>> + .write = proc_fail_nth_write,
>>> +};
>>> #endif
>>>
>>>
>>> @@ -3296,6 +3343,11 @@ static const struct pid_entry tid_base_stuff[] = {
>>> #endif
>>> #ifdef CONFIG_FAULT_INJECTION
>>> REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
>>> + /*
>>> + * Operations on the file check that the task is current,
>>> + * so we create it with 0666 to support testing under unprivileged user.
>>> + */
>>> + REG("fail-nth", 0666, proc_fail_nth_operations),
>>> #endif
>>> #ifdef CONFIG_TASK_IO_ACCOUNTING
>>> ONE("io", S_IRUSR, proc_tid_io_accounting),
>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>> index 543e0ea82684..7b50221fea51 100644
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -1897,6 +1897,7 @@ struct task_struct {
>>> #endif
>>> #ifdef CONFIG_FAULT_INJECTION
>>> int make_it_fail;
>>> + int fail_nth;
>>> #endif
>>> /*
>>> * when (nr_dirtied >= nr_dirtied_pause), it's time to call
>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>> index 61284d8122fa..869c97a0a930 100644
>>> --- a/kernel/fork.c
>>> +++ b/kernel/fork.c
>>> @@ -545,6 +545,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>>>
>>> kcov_task_init(tsk);
>>>
>>> +#ifdef CONFIG_FAULT_INJECTION
>>> + tsk->fail_nth = 0;
>>> +#endif
>>> +
>>> return tsk;
>>>
>>> free_stack:
>>> diff --git a/lib/fault-inject.c b/lib/fault-inject.c
>>> index 6a823a53e357..d6516ba64d33 100644
>>> --- a/lib/fault-inject.c
>>> +++ b/lib/fault-inject.c
>>> @@ -107,6 +107,12 @@ static inline bool fail_stacktrace(struct fault_attr *attr)
>>>
>>> bool should_fail(struct fault_attr *attr, ssize_t size)
>>> {
>>> + if (in_task() && current->fail_nth) {
>>> + if (--current->fail_nth == 0)
>>> + goto fail;
>>> + return false;
>>> + }
>>> +
>>> /* No need to check any other properties if the probability is 0 */
>>> if (attr->probability == 0)
>>> return false;
>>> @@ -134,6 +140,7 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
>>> if (!fail_stacktrace(attr))
>>> return false;
>>>
>>> +fail:
>>> fail_dump(attr);
>>>
>>> if (atomic_read(&attr->times) != -1)
>>> --
>>> 2.12.2.564.g063fe858b8-goog
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>
>>
>> --
>> Michal Hocko
>> SUSE Labs