Re: [REGRESSION] Two issues that prevent process accounting (taskstats) from working correctly
From: Richard Genoud
Date: Wed Mar 08 2017 - 06:23:44 EST
Cc += Al Viro
2017-02-12 16:44 GMT+01:00 Dmitry Romanov <rdimanos@xxxxxxxxx>:
> On Mon, Dec 19, 2016 at 01:06:00PM +0100, Martin Steigerwald wrote:
>>
>> 1) Sometimes process accounting does not work at all.
>>
>> The acct() system call (to activate process accounting) return value 0,
>> which means that process accounting is activated successfully.
>> However, no process accounting records are written whatsoever. This
>> situation can be reproduced with the program 'acctdemo.c'
>> that you can find as attachment. When this program gives the message
>> "found a process accounting record!", the situation is okay
>> and process accounting works fine to the file '/tmp/mypacct'. When the
>> message 'No process accounting record yet....' is repeatedly given,
>> process accounting does not work and will not work at all. It might be
>> that you have to start this program several times before you get
>> this situation (preferably start/finish lots of processes in the mean time).
>> This problem is probably caused by a new mechanism introduced in the
>> kernel code (..../linux/kernel/acct.c) that is called 'slow accounting'
>> and has to be solved in the kernel code.
>>
>> I experience this problem on Debian8 with a 4.8 kernel and on CentOS7
>> with a 4.8 kernel.
>
> I reported same problem on bugzilla as:
>
> Bug 180841 - Process accounting sometimes do not append records for terminated
> processes
> https://bugzilla.kernel.org/show_bug.cgi?id=180841
>
> I think I found cause of this problem and can suggest patch which correct this
> problem.
>
> Problem arise in such situation:
>
> Problem arise if between process accounting activation with call acct(...) and
> first termination of process pass not less than one jiffy. (Note, acct() return
> successfully, with zero.) In such situation records for terminated processes
> do nod append to accounting file, until process accounting is restarted.
>
> I wrote test program test.c for illustration described problem for kernel
> version 3.17-rc1 and later. This program create empty file for accounting,
> call system call acct() with this file, sleep for not less than one jiffy,
> create new process and exit this process. Then, program test size of accounting
> file. If size of file remain zero, it seems problem and program display message
> "Accounting file size = 0, process accounting did not add record to accounting
> file!".
> On my system problem reproduce almost always by this test.c.
>
> ----------
> test.c
> ----------
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <stdlib.h>
>
> /* Accounting file name : */
> #define ACCTFILENAME "/var/log/pacct"
>
> int main()
> {
> int fd;
> int pid;
> struct stat sb;
>
> /* Turn off accounting */
> if ( acct(NULL) < 0 ) {
> perror("Process accounting off");
> return -1;
> }
>
> /* Create empty accounting file */
> fd = creat(ACCTFILENAME, S_IRUSR | S_IWUSR);
> if ( fd == -1 ) {
> perror("Accounting file creation");
> return -1;
> }
> if ( close(fd) == -1) {
> perror("Accounting file closing");
> return -1;
> }
>
> /* Switch accounting on */
> if ( acct(ACCTFILENAME) < 0 ) {
> perror("Process accounting on");
> return -1;
> }
>
> /* Delay must be at least 1 jiffy.
> For reproducing bug, some process exit must not happen during first jiffy.
> If HZ >= 100, need delay minimum 10 ms. */
> usleep(10000);
>
> /* Create and exit child process. The record for this process must be append
> by activated process accounting. */
> pid = fork();
> if (pid < 0) {
> perror("Child process creating");
> return -1;
> }
> /* Exit child process */
> if (pid == 0) {
> exit(0);
> }
> /* Wait for child process termination */
> wait(NULL);
>
> /* Get size of accounting file. */
> if ( stat(ACCTFILENAME, &sb) == -1 ) {
> perror("Getting accounting file status");
> return -1;
> }
>
> if ( sb.st_size == 0 ) {
> printf("Accounting file size = 0, process accounting did not add record to accounting file!\n");
> }
> else {
> printf("Accounting file size > 0, expected behaviour.\n");
> }
>
> /* Turn off accounting and remove accounting file*/
> if ( acct(NULL) < 0 ) {
> perror("Process accounting off");
> return -1;
> }
> if ( remove(ACCTFILENAME) == -1 ) {
> perror("Removing accounting file");
> return -1;
> }
>
> return 0;
>
> }
> ----------
>
> I suppose that this problem may be solve in kernel versions 3.17-rc1 and
> later by following patch:
>
> Signed-off-by: Dmitry Romanov <rdimanos@xxxxxxxxx>
> ---
> kernel/acct.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 74963d1..37f1dc6 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -99,7 +99,7 @@ static int check_free_space(struct bsd_acct_struct *acct)
> {
> struct kstatfs sbuf;
>
> - if (time_is_before_jiffies(acct->needcheck))
> + if (time_is_after_jiffies(acct->needcheck))
> goto out;
>
> /* May block */
> --
> 1.9.1
>
> Line "if (time_is_before_jiffies(acct->needcheck))" use for check whether the
> time for check free space. It seems function "time_is_after_jiffies" need here.
>
> ----------
> In kernel versions 3.3-rc1 - 3.16 activation of process accounting implemented
> differently, so delay between call acct(filename) and first process termination
> do not produce problem, and program test.c do not detect problem. But, it seems,
> using function time_is_before_jiffies is not right similarly.
> Another problem arise, if during work of process accounting happen that current
> jiffies is greater than acct->needcheck (for example, if between two consecutive
> process terminations happen interval greater than ACCT_TIMEOUT seconds).
> Then in lines:
>
> if (!file || time_is_before_jiffies(acct->needcheck))
> goto out;
>
> always will branch with "goto" and acct->needcheck will not change. So free
> space will not check more, until process accounting is restarted.
> Note, that in version 3.17-rc1 and later this problem is also present.
>
> Therefore I suppose that this problem for kernel version 3.3-rc1 - 3.16 may be
> solve by following patch:
>
> Signed-off-by: Dmitry Romanov <rdimanos@xxxxxxxxx>
> ---
> kernel/acct.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/acct.c b/kernel/acct.c
> index 808a86f..591bdcd 100644
> --- a/kernel/acct.c
> +++ b/kernel/acct.c
> @@ -107,7 +107,7 @@ static int check_free_space(struct bsd_acct_struct *acct, struct file *file)
>
> spin_lock(&acct_lock);
> res = acct->active;
> - if (!file || time_is_before_jiffies(acct->needcheck))
> + if (!file || time_is_after_jiffies(acct->needcheck))
> goto out;
> spin_unlock(&acct_lock);
>
> --
> 1.9.1
>
>
> In kernel 3.2 another method used for define time to check
> free space (by timer). So I did not find in this version such problem.
>
> Dmitry
Playing with bootchart, I also had some problems with BSD process
acounting V3 on v4.11-rc1.
The file kernel_pacct created by bootchartd was always empty.
After a git bisect, the culprit was :
commit 795a2f22a8ea ("acct() should honour the limits from the very beginning")
If I revert this commit, the accounting starts working again
test platform: ARM at91sam9g35
test software : busybox bootchartd
Richard.