Re: [PATCH] fault-inject: support systematic fault injection

From: Dmitry Vyukov
Date: Tue Mar 28 2017 - 09:03:29 EST


On Sat, Mar 25, 2017 at 10:54 AM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
> 2017-03-25 5:08 GMT+09:00 Dmitry Vyukov <dvyukov@xxxxxxxxxx>:
>> Add a /sys/kernel/debug/fail_once file that allows failing the 0-th, 1-st,
>> 2-nd, and so on, call systematically. Excerpt from the added documentation:
>>
>> ===
>> Writing an integer N to this file makes the N-th call in the current task
>> fail (N is 0-based). Reading from this file returns a single char 'Y' or 'N'
>> that says whether the fault set up by a previous write to this file was
>> injected, and disables the fault if it wasn't yet injected.
>> Note that this file enables all types of faults (slab, futex, etc).
>> This setting takes precedence over all other generic settings like
>> probability, interval, times, etc. But per-capability settings
>> (e.g. fail_futex/ignore-private) take precedence over it.
>> This feature is intended for systematic testing of faults in a single
>> system call. See an example below.
>> ===
>
> The "/sys/kernel/debug/fail_once" contains per-task data.
>
> Should we introduce new per-task file like "/proc/<pid>/fail-nth"
> instead of adding a single global debugfs file?

I mailed a v2 that uses /proc/self/task/<tid>/fail-nth.


>> Why add a new setting:
>> 1. Existing settings are global rather than per-task.
>> So parallel testing is not possible.
>> 2. attr->interval is close, but it depends on attr->count,
>> which is never reset to 0, so interval does not work as expected.
>> 3. Trying to model this with existing settings requires manipulating
>> all of probability, interval, times, space, task-filter, the
>> unexposed count, and the per-task make-it-fail files.
>> 4. Existing settings are per-failure-type, and the set of failure
>> types is potentially expanding.
>> 5. make-it-fail can't be changed by an unprivileged user, and aggressive
>> stress testing is better done as an unprivileged user.
>> Similarly, this would require opening the debugfs files to the
>> unprivileged user, who would need to reopen at least the times file
>> (it is not possible to pre-open it before dropping privileges).
>>
>> The proposed interface solves all of the above (see the example).
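
For illustration only, the systematic loop described in the excerpt might be
sketched as follows. This is not the example from the patch. It assumes the
semantics quoted above (write a 0-based N, read back 'Y' or 'N') and the
per-task path from v2. The names FileInjector, per_task_path, and
fail_each_site are hypothetical, and the interface that is eventually merged
may differ.

```python
import threading


class FileInjector:
    """Arms a fault through a control file (semantics assumed from the excerpt)."""

    def __init__(self, path):
        self.path = path

    def arm(self, n):
        # Write N: the N-th (0-based) call in this task should fail.
        with open(self.path, "w") as f:
            f.write(str(n))

    def fired(self):
        # 'Y' means the armed fault was injected; per the excerpt, the read
        # also disables the fault if it was not injected yet.
        with open(self.path) as f:
            return f.read(1) == "Y"


def per_task_path():
    # Hypothetical helper: control file of the calling thread, v2 naming.
    return "/proc/self/task/%d/fail-nth" % threading.get_native_id()


def fail_each_site(injector, operation, limit=1000):
    """Fail call 0, 1, 2, ... of operation in turn; return how many were failed."""
    for n in range(limit):
        injector.arm(n)
        try:
            operation()
        except OSError:
            pass  # assumed: an injected fault surfaces as a failed system call
        if not injector.fired():
            return n  # call N was never reached, so every site has been tried
    return limit
```

A caller on a kernel with the patch applied would pass
FileInjector(per_task_path()) and a function that issues the system call
under test. Keeping the control file behind a small object lets the loop be
exercised without fault injection enabled.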