Re: [PATCH] irqbalance, powerpc: add IRQs without settable SMPaffinity to banned list
From: Neil Horman
Date: Fri Sep 24 2010 - 06:40:37 EST
On Fri, Sep 24, 2010 at 04:56:34PM +1000, Michael Neuling wrote:
>
> > > > size_t size =3D 0;
> > > > FILE *file;
> > > > sprintf(buf, "/proc/irq/%i/smp_affinity", number);
> > > > - file =3D fopen(buf, "r");
> > > > + file =3D fopen(buf, "r+");
> > > > if (!file)
> > > > continue;
> > > > if (getline(&line, &size, file)=3D=3D0) {
> > > > @@ -89,7 +89,14 @@
> > > > continue;
> > > > }
> > > > cpumask_parse_user(line, strlen(line), irq->mask);
> > > > - fclose(file);
> > > > + /*
> > > > + * Check that we can write the affinity, if
> > > > + * not take it out of the list.
> > > > + */
> > > > + if (fputs(line, file) =3D=3D EOF)
> > > > + can_set =3D 0;
> >
> > > This is maybe a nit, but writing to the affinity file can fail for a few
> > > different reasons, some of them permanent, some transient. For instance,=
> > if
> > > we're in a memory constrained condition temporarily irq_affinity_proc_wri=
> > te
> > > might return -ENOMEM. =20
> >
> > Yeah true, usually followed shortly by your kernel going so far into
> > swap you never get it back, or OOMing, but I guess it's possible.
> >
> > > Might it be better to modify this code so that, instead
> > > of using fputs to merge the various errors into an EOF, we use some other=
> > write
> > > method that lets us better determine the error and selectively ban the in=
> > terrupt
> > > only for those errors which we consider permanent?
> >
> > Yep. It seems fputs() gives you know way to get the actual error from
> > write(), so it looks we'll need to switch to open/write, but that's
> > probably not so terrible.
>
> fclose inherits the error from fputs and it sets errno correctly. Below
> uses this to catch only EIO errors and mark them for the banned list.
>
> Mikey
>
> irqbalance, powerpc: add IRQs without settable SMP affinity to banned list
>
> On pseries powerpc, IPIs are registered with an IRQ number so
> /proc/interrupts looks like this on a 2 core/2 thread machine:
>
> CPU0 CPU1 CPU2 CPU3
> 16: 3164282 3290514 1138794 983121 XICS Level IPI
> 18: 2605674 0 304994 0 XICS Level lan0
> 30: 400057 0 169209 0 XICS Level ibmvscsi
> LOC: 133734 77250 106425 91951 Local timer interrupts
> SPU: 0 0 0 0 Spurious interrupts
> CNT: 0 0 0 0 Performance monitoring interrupts
> MCE: 0 0 0 0 Machine check exceptions
>
> Unfortunately this means irqbalance attempts to set the affinity of IPIs
> which is not possible. So in the above case, when irqbalance is in
> performance mode due to heavy IPI, lan0 and ibmvscsi activity, it
> sometimes attempts to put the IPIs on one core (CPU0&1) and lan0 and
> ibmvscsi on the other core (CPU2&3). This is suboptimal as we want lan0
> and ibmvscsi to be on separate cores and IPIs to be ignored.
>
> When irqblance attempts writes to the IPI smp_affinity (ie.
> /proc/irq/16/smp_affinity in the above example) it fails with an EIO but
> irqbalance currently ignores this.
>
> This patch catches these write fails and in this case adds that IRQ
> number to the banned IRQ list. This will catch the above IPI case and
> any other IRQ where the SMP affinity can't be set.
>
> Tested on POWER6, POWER7 and x86.
>
> Signed-off-by: Michael Neuling <mikey@xxxxxxxxxxx>
>
> Index: irqbalance/irqlist.c
> ===================================================================
> --- irqbalance.orig/irqlist.c
> +++ irqbalance/irqlist.c
> @@ -28,6 +28,7 @@
> #include <unistd.h>
> #include <sys/types.h>
> #include <dirent.h>
> +#include <errno.h>
>
> #include "types.h"
> #include "irqbalance.h"
> @@ -67,7 +68,7 @@
> DIR *dir;
> struct dirent *entry;
> char *c, *c2;
> - int nr , count = 0;
> + int nr , count = 0, can_set = 1;
> char buf[PATH_MAX];
> sprintf(buf, "/proc/irq/%i", number);
> dir = opendir(buf);
> @@ -80,7 +81,7 @@
> size_t size = 0;
> FILE *file;
> sprintf(buf, "/proc/irq/%i/smp_affinity", number);
> - file = fopen(buf, "r");
> + file = fopen(buf, "r+");
> if (!file)
> continue;
> if (getline(&line, &size, file)==0) {
> @@ -89,7 +90,13 @@
> continue;
> }
> cpumask_parse_user(line, strlen(line), irq->mask);
> - fclose(file);
> + /*
> + * Check that we can write the affinity, if
> + * not take it out of the list.
> + */
> + fputs(line, file);
> + if (fclose(file) && errno == EIO)
> + can_set = 0;
> free(line);
> } else if (strcmp(entry->d_name,"allowed_affinity")==0) {
> char *line = NULL;
> @@ -122,7 +129,7 @@
> count++;
>
> /* if there is no choice in the allowed mask, don't bother to balance */
> - if (count<2)
> + if ((count<2) || (can_set == 0))
> irq->balance_level = BALANCE_NONE;
>
>
>
Thank you, this looks good to me, I'll integrate this shortly.
Neil
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/