Re: Avoid allocating during interleave from almost full nodes

From: Christoph Lameter
Date: Mon Nov 06 2006 - 15:59:20 EST


On Mon, 6 Nov 2006, Andrew Morton wrote:

> > Interleave is used for data accessed from many nodes otherwise one would
> > prefer to allocate from the current zone. The shared data may be very
> > frequently accessed from multiple nodes and one would like different NUMA
> > nodes to respond to these requests.
>
> But what is "shared data"?? You're using a new but very general term
> without defining it.

Data that is shared by applications or by the kernel. User space
programs may allocate shared data with an interleave policy. For certain data
the kernel may use interleave allocations as well, for example page cache
pages in a cpuset configured for memory spreading.

It depends on what the application or the kernel designates as shared
data.
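
To make that concrete, here is a minimal user space sketch (not part of the
original mail; it assumes libnuma is installed and the buffer size is
arbitrary) of how an application can request interleaved placement for data
it expects to be read from many nodes:

/* Sketch: allocate a buffer whose pages are spread round-robin over all
 * NUMA nodes allowed to this task. Uses libnuma; build with -lnuma. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        size_t size = 64UL << 20;       /* 64 MB of "shared" data, arbitrary size */
        char *buf;

        if (numa_available() < 0) {
                fprintf(stderr, "no NUMA support on this system\n");
                return 1;
        }

        /* Interleaved allocation instead of node-local allocation. */
        buf = numa_alloc_interleaved(size);
        if (!buf)
                return 1;

        memset(buf, 0, size);           /* touch the pages so they are really allocated */

        numa_free(buf, size);
        return 0;
}

Accesses to such a buffer from many processors are then spread over all
nodes instead of all landing on the node the allocating thread happened to
run on.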

> OK, but if two nodes have a lot of free pages and the rest don't then
> interleave will consume those free pages without performing any reclaim
> from all the other nodes. Hence hotspots or imbalances.
>
> Whatever they are. Why does it matter?

Hotspots create lots of requests going to the same NUMA node. A node has
only a limited capacity to service cacheline requests, and the bandwidth
of the interconnect is also limited. If too many processors request
data from the same remote node, then performance will drop.

There are different kinds of data in a NUMA system:

Node-local data is accessed only by the local processor. For node-local
data we have no such concerns, since the interconnect is not used. Quite
a lot of kernel data is allocated per node or per cpu and is therefore not
a problem.

For shared data that is known to be performance critical, and that we
know is accessed from multiple nodes, we need to balance the data across
multiple nodes to avoid overloading any single node and to keep the system
running at optimal speed. That is where interleave becomes important.
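
As an illustration (again only a sketch, not from the original mail; the
node numbers and the size are made up), user space can spread a shared
mapping across specific nodes with mbind() and MPOL_INTERLEAVE before the
pages are first touched:

/* Sketch: interleave an anonymous shared mapping across nodes 0 and 1
 * (example node numbers). Uses the mbind() syscall wrapper from libnuma;
 * build with -lnuma. */
#define _GNU_SOURCE
#include <numaif.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
        size_t len = 16UL << 20;        /* 16 MB, arbitrary */
        unsigned long nodemask = (1UL << 0) | (1UL << 1);   /* nodes 0 and 1 */
        void *p;

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return 1;

        /* Apply the policy before faulting the pages in; later faults then
         * take pages round-robin from the nodes in the mask. */
        if (mbind(p, len, MPOL_INTERLEAVE, &nodemask,
                  sizeof(nodemask) * 8, 0)) {
                perror("mbind");
                return 1;
        }
        return 0;
}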

