advice sought: practicality of SMP cache coherency implemented in assembler (and a hardware detect line)

From: Luke Kenneth Casson Leighton
Date: Fri Mar 25 2011 - 17:52:32 EST


folks, hi,

i've hit an unusual situation where i'm responsible for evaluating and
setting the general specification of a new multi-core processor, but
it's based around a RISC core that cannot be significantly changed
without going through some very expensive verification procedures. in
the discussions, the engineer responsible for it said that modifying
the cache is prohibitively expensive and time-consuming, but that one
possible workaround would be to have a hardware detection mechanism for
cache-write conflicts, which generates a software interrupt; the handler
would then simply run some assembly code to flush the 1st-level cache line.
the hardware detection mechanism could be tacked on, would be very
quick and easy to implement, and would generate interrupts to the
specific processor whose data required flushing.
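
to make the proposed mechanism concrete, here is a minimal sketch in C of
how i picture it, as a toy software model rather than real kernel or
assembly code. everything in it is an assumption for illustration: the
core count, the tiny direct-mapped cache, the write-through policy (so
that "flush" reduces to invalidating the line; a write-back cache would
also need the handler to write dirty data back), and all of the names
(detect_write_conflict, flush_handler, core_read, core_write) are
hypothetical. the 8-byte line size is the figure i believe applies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCORES     2   /* hypothetical core count */
#define LINE_BYTES 8   /* L1 line size as believed in this message */
#define NLINES     4   /* tiny direct-mapped L1, illustrative only */
#define MEM_BYTES  64  /* toy main memory */

struct line { int valid; uint32_t base; uint8_t data[LINE_BYTES]; };

static uint8_t main_mem[MEM_BYTES];
static struct line l1[NCORES][NLINES];
static unsigned irq_count[NCORES];   /* interrupts delivered per core */

static uint32_t line_base(uint32_t addr)
{
    return addr & ~(uint32_t)(LINE_BYTES - 1);
}

static struct line *slot(int core, uint32_t addr)
{
    return &l1[core][(addr / LINE_BYTES) % NLINES];
}

static int holds(int core, uint32_t addr)
{
    struct line *l = slot(core, addr);
    return l->valid && l->base == line_base(addr);
}

/* stand-in for the assembly routine run from the interrupt:
 * with a write-through cache, flushing is just invalidating */
static void flush_handler(int core, uint32_t addr)
{
    if (holds(core, addr))
        slot(core, addr)->valid = 0;
}

/* stand-in for the tacked-on hardware: interrupt every *other*
 * core that currently caches the line being written */
static void detect_write_conflict(int writer, uint32_t addr)
{
    for (int c = 0; c < NCORES; c++) {
        if (c != writer && holds(c, addr)) {
            irq_count[c]++;
            flush_handler(c, addr);
        }
    }
}

static uint8_t core_read(int core, uint32_t addr)
{
    struct line *l = slot(core, addr);
    assert(addr < MEM_BYTES);
    if (!holds(core, addr)) {            /* miss: refill from memory */
        l->base = line_base(addr);
        memcpy(l->data, &main_mem[l->base], LINE_BYTES);
        l->valid = 1;
    }
    return l->data[addr % LINE_BYTES];
}

static void core_write(int core, uint32_t addr, uint8_t val)
{
    assert(addr < MEM_BYTES);
    main_mem[addr] = val;                /* write-through to memory */
    if (holds(core, addr))               /* update own copy, no write-allocate */
        slot(core, addr)->data[addr % LINE_BYTES] = val;
    detect_write_conflict(core, addr);
}
```

in this model, if core 1 reads an address and core 0 then writes to it,
core 1 takes exactly one interrupt, drops its copy, and its next read
refills from memory and sees the new value. the cost i'm worried about
is visible too: every conflicting write costs the other core a full
interrupt entry and exit, not just a line invalidate.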

now, whilst it tickles my hardware hacker fancy like anything, because
i feel that this could be used for many other purposes such as
implementing spin-locks, i have some concerns about the performance
implications, and i'm not qualified or experienced enough to say one
way or the other whether it's a stonking good idea or just outright mad.

so, bearing in mind that sensible answers will likely result in offers
of a consulting contract to actually *implement* the software /
assembly code for the linux kernel modifications required (yes, linux
is already available for this RISC processor type - but only in
single-core), i would greatly appreciate some help in getting answers
to these questions:

* is this even a good idea? does it "fly"?

* if it does work, at what point does the number of cores involved just
make it... completely impractical? over 2? over 4? 8? 16?

* i believe the cache lines in the 1st level data cache are 8 bytes
(and the AMBA / AXI bus on each is 64-bit wide) - is that reasonable?

* does anyone know of any other processors that have actually
implemented software-driven cache coherency, esp. ones with linux
kernel running on them, and if so, how well do they do?

considerate and informative answers would be much appreciated - i must
apologise that i will be immediately unsubscribing from linux-kernel
list, and re-subscribing again in the near future, but will be
watching responses via web-based list archives: the number of messages
on lkml is too high to do otherwise. also for those of you who
remember it: whilst it was fun in a scary kind of way, it would be
nice if this didn't turn into the free-for-all whopper-thread that
occurred back in 2005 or so - this multi-core processor is going to be
based around an existing proven 20-year-old well-established RISC core
that has been running linux for over a decade, it just has never been
put into an SMP arrangement before and we're on rather short
timescales to get it done.

l.