RE: [PATCH v4 4/5] selftests/resctrl: Cleanup properly when an error occurs in CAT test

From: Shaopeng Tan (Fujitsu)
Date: Wed Nov 30 2022 - 03:35:15 EST


Hi Reinette,

> On 11/24/2022 12:17 AM, Shaopeng Tan (Fujitsu) wrote:
> > Hi Reinette,
> >
> >> On 11/16/2022 5:05 PM, Shaopeng Tan wrote:
> >>> After creating a child process with fork() in CAT test, if there is
> >>> an error occurs or such as a SIGINT signal is received, the parent
> >>> process will be terminated immediately, but the child process will
> >>> not be killed and also umount_resctrlfs() will not be called.
> >>>
> >>> Add a signal handler like other tests to kill child process, umount
> >>> resctrlfs, cleanup result files, etc. if an error occurs in parent
> >>> process. To use ctrlc_handler() of other tests to kill child
> >>> process(bm_pid), using global bm_pid instead of local bm_pid.
> >>>
> >>> Wait for child process to be killed if an error occurs in child process.
> >>>
> >>> Reviewed-by: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
> >>> Signed-off-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
> >>> ---
> >>> tools/testing/selftests/resctrl/cat_test.c | 30
> >> ++++++++++++++--------
> >>> 1 file changed, 20 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/tools/testing/selftests/resctrl/cat_test.c
> >> b/tools/testing/selftests/resctrl/cat_test.c
> >>> index 6a8306b0a109..1f8f5cf94e95 100644
> >>> --- a/tools/testing/selftests/resctrl/cat_test.c
> >>> +++ b/tools/testing/selftests/resctrl/cat_test.c
> >>> @@ -100,10 +100,10 @@ void cat_test_cleanup(void)
> >>>
> >>> int cat_perf_miss_val(int cpu_no, int n, char *cache_type) {
> >>> + struct sigaction sigact;
> >>> unsigned long l_mask, l_mask_1;
> >>> int ret, pipefd[2], sibling_cpu_no;
> >>> char pipe_message;
> >>> - pid_t bm_pid;
> >>>
> >>> cache_size = 0;
> >>>
> >>> @@ -181,17 +181,25 @@ int cat_perf_miss_val(int cpu_no, int n, char
> >> *cache_type)
> >>> strcpy(param.filename, RESULT_FILE_NAME1);
> >>> param.num_of_runs = 0;
> >>> param.cpu_no = sibling_cpu_no;
> >>> + } else {
> >>> + /*
> >>> + * Register CTRL-C handler for parent, as it has to kill
> >>> + * child process before exiting
> >>> + */
> >>> + sigact.sa_sigaction = ctrlc_handler;
> >>> + sigemptyset(&sigact.sa_mask);
> >>> + sigact.sa_flags = SA_SIGINFO;
> >>> + if (sigaction(SIGINT, &sigact, NULL) ||
> >>> + sigaction(SIGTERM, &sigact, NULL) ||
> >>> + sigaction(SIGHUP, &sigact, NULL))
> >>> + perror("# sigaction");
> >>
> >> Why keep going at this point?
> >>
> >> I tried this change but I was not able to trigger ctrlc_handler(). It
> >
> > I tested this change using kselftest framework, In this case, the
> > timeout command sent a SIGTERM signal, and then ctrlc_handler() was
> > triggered.
> > Since the handling of SIGINT and SIGHUP signals is overridden in
> > run_fill_buf(),
> > ctrl_handler() may be called if ctrl-c is received.
>
> I tested this by running "resctrl_tests -t cat" and when doing so this change
> does not behave as described.

Yes, the fix of v4 cannot satisfy "resctrl_tests -t cat"".
I will add new fix in next version.

> >> seems that the handler is changed soon after (see
> >> cat_val()->run_fill_buf()) and ctrl_handler() (note the subtle name
> >> difference) is run instead when a SIGINT is received. The value of
> >> ctrl_handler() is not clear to me though, and it could even be argued
> >> that it is broken because it starts with
> >> free(startptr) and startptr would likely already be free'd at this
> >> point without resetting its value to NULL.
> >>
> >> From what I understand ctrl_handler() and its installation from
> >> run_fill_buf() could be removed so that it does not override what is
> >> being done with this change. Otherwise, from what I can tell, this
> >> new handler will never run.
> >
> > I think removing ctrl_handler() is fine.
> > In CAT test, it overrides ctrlc_handler().
> > In other tests(CMT,MBA,MBM), the child process used ctrl_handler() to
> > cleanup itself,
>
> Is that explicit cleanup required? All I can see is free(startptr) and that pointer
> would usually be invalid as I mentioned earlier. If the process crashes while
> running fill_cache() and thus not get a chance to run free(startptr) then the OS
> would release the memory, no?

Sorry, my explanation was not clear.
I agree with you, I think removing ctrl_handler() is OK.

> > but the parent process cleanup the child process with ctrlc_handler()
> properly.
> > Also, avoid using signal().
> > fill_buf.c:run_fill_buf()
> > 201 /* set up ctrl-c handler */
> > 202 if (signal(SIGINT, ctrl_handler) == SIG_ERR)
> > 203 printf("Failed to catch SIGINT!\n");
> >
> > I removed ctrl_handler() and ran resctrl_tests with and without the kselftest
> framework.
> > There is no problem.
>
> Thank you. I only used the CAT test when I found the issue.

Removing ctrl_handler() is only part of the fix in the next version(v5).
All fixes as follows.

--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -98,12 +98,17 @@ void cat_test_cleanup(void)
remove(RESULT_FILE_NAME2);
}

+static void ctrlc_handler_child(int signum, siginfo_t *info, void *ptr)
+{
+ exit(EXIT_SUCCESS);
+}
+
int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
{
+ struct sigaction sigact;
unsigned long l_mask, l_mask_1;
int ret, pipefd[2], sibling_cpu_no;
char pipe_message;
- pid_t bm_pid;

cache_size = 0;

@@ -181,17 +186,33 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
strcpy(param.filename, RESULT_FILE_NAME1);
param.num_of_runs = 0;
param.cpu_no = sibling_cpu_no;
+
+ sigfillset(&sigact.sa_mask);
+ sigact.sa_sigaction = ctrlc_handler_child;
+ sigact.sa_flags = SA_SIGINFO;
+ if (sigaction(SIGINT, &sigact, NULL) ||
+ sigaction(SIGTERM, &sigact, NULL) ||
+ sigaction(SIGHUP, &sigact, NULL))
+ perror("# sigaction");
+ } else {
+ /*
+ * Register CTRL-C handler for parent, as it has to kill
+ * child process before exiting
+ */
+ sigact.sa_sigaction = ctrlc_handler;
+ sigemptyset(&sigact.sa_mask);
+ sigact.sa_flags = SA_SIGINFO;
+ if (sigaction(SIGINT, &sigact, NULL) ||
+ sigaction(SIGTERM, &sigact, NULL) ||
+ sigaction(SIGHUP, &sigact, NULL))
+ perror("# sigaction");
}

remove(param.filename);

ret = cat_val(&param);
- if (ret)
- return ret;
-
- ret = check_results(&param);
- if (ret)
- return ret;
+ if (ret == 0)
+ ret = check_results(&param);

if (bm_pid == 0) {
/* Tell parent that child is ready */
@@ -199,9 +220,11 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
pipe_message = 1;
if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
sizeof(pipe_message)) {
- close(pipefd[1]);
+ /*
+ * Just print the error message.
+ * Let while(1) run and wait for itself to be killed.
+ */
perror("# failed signaling parent process");
- return errno;
}

close(pipefd[1]);
@@ -226,5 +249,5 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
if (bm_pid)
umount_resctrlfs();

- return 0;
+ return ret;
}
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 56ccbeae0638..322c6812e15c 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -33,14 +33,6 @@ static void sb(void)
#endif
}

-static void ctrl_handler(int signo)
-{
- free(startptr);
- printf("\nEnding\n");
- sb();
- exit(EXIT_SUCCESS);
-}
-
static void cl_flush(void *p)
{
#if defined(__i386) || defined(__x86_64)
@@ -198,12 +190,6 @@ int run_fill_buf(unsigned long span, int malloc_and_init_memory,
unsigned long long cache_size = span;
int ret;

- /* set up ctrl-c handler */
- if (signal(SIGINT, ctrl_handler) == SIG_ERR)
- printf("Failed to catch SIGINT!\n");
- if (signal(SIGHUP, ctrl_handler) == SIG_ERR)
- printf("Failed to catch SIGHUP!\n");
-
ret = fill_cache(cache_size, malloc_and_init_memory, memflush, op,
resctrl_val);
if (ret) {


Best regards,
Shaopeng Tan