> Each cpu can throttle at different times and have different counts, so
The counts should be per CPU, but everything else (timestamp, enabled)
etc. can be just global. You just don't want the logs flooded with events,
but if the rate limiting is global or local doesn't make much difference
That would give much simpler code and I believe
is sufficient for basically everybody
Currently your patch is a bit too large for the relatively simple things
it tries to attempt, so such ways to simplify it and slim it down are needed.
If you have other ideas to make it simpler that would be appreciated too.
> Then the caller has to maintain the thermal counter, which is what I
> don't want. I wanted the managmente code to be separate. I can make
> this an inline in the header if you want to get rid of the function
> call.
The function call doesn't matter, it is just that we don't want overabstracted
kernel in the Linux kernel and your patch currently definitely has too many
functions doing very simple things. That makes it harder to read etc.
> > I'm not sure what you need the timer for. Can't just just drive
> > that from the interrupt handlers and check time stamps to do
> > rate limiting? Then another timer wouldn't be needed.
>
> Problem is that the timestamps are in jiffies. You could potentially
> have very long periods of time in between thermal events, assuming the
> box has a long uptime. On 32bit systems, if the time delta in jiffies
> between 2 events is >= 0x80000000, the time elapsed will be negative,
> and will prevent you from logging events until jiffies wrap all the
> way around (the counter will still increment though). The timer takes
> care of clearing the flag after 5 minutes.
But Jiffies wrap is several years. Even if we get a bogus overheat event
after two years it isn't really a issue. Please just get rid of the timer
We have a 64bit jiffies for that now on 32bit too.
> > Instead of having this evinfo structure you could just directly
> > fill in struct mces in the caller.
>
> But the caller won't know what to fill the struct mce with since this
> function does the logic of figuring out the thermal event info. I
> can't have this function take "struct mce *" since that doesn't exist
> on i386. I could have it accept pointers to values as arguments, but
> that's messy.
Then either define struct mce for i386 or use two different functions
for i386/x86-64.
> > > + struct mce m;
> > > +
> > > + memset(&m, 0, sizeof(m));
> > > + m.cpu = evinfo->cpu;
> > > + m.bank = MCE_THERMAL_BANK;
> >
> > So how should user space distingush that event from other thermal events?
>
> What do you mean? This is already what x86_64 code does today for
> thermal throttle events.
It should be just clear where it came from since you added a new way
to generate them.