It makes sense to me to make the number of pools configurable and not tie
them directly to the number of nodes in a NUMA system.
In particular allow memory pools (i.e. instance of pg_dat_t) to be smaller
than a node size.
The smart things that I see has to happen is to allow a set of processes to
be attached to a set of memory pools and the OS basically enforcing
allocation in those constraints. I brought this up before and I think
Andrea proposed something similar. Allocation should take place in those
pools along the allocation levels based on GFP_MASK, so first allocate on
HIGH along all specified pools and if unsuccessful, then fallback on a
previous level.
With each pool we should associate a kswapd.
Making the size of the pools configurable allows to control the velocity at
which we can swap out. Standard Queuing theory: if we can't get the desired
througput, then increase the number of servers, here kswapd.
Comments...
-- Hubertus
Andrea Arcangeli <andrea@suse.de>@kvack.org on 05/02/2000 12:20:41 PM
Sent by: owner-linux-mm@kvack.org
To: riel@nl.linux.org
cc: Roger Larsson <roger.larsson@norran.net>,
linux-kernel@vger.rutgers.edu, linux-mm@kvack.org
Subject: Re: kswapd @ 60-80% CPU during heavy HD i/o.
On Tue, 2 May 2000, Rik van Riel wrote:
>That's a very bad idea.
However the lru_cache have definitely to be per-node and not global as now
in 2.3.99-pre6 and pre7-1 or you won't be able to do the smart things I
was mentining some day ago in linux-mm with NUMA.
My current tree looks like this:
#define LRU_SWAP_CACHE 0
#define LRU_NORMAL_CACHE 1
#define NR_LRU_CACHE 2
typedef struct lru_cache_s {
struct list_head heads[NR_LRU_CACHE];
unsigned long nr_cache_pages; /* pages in the lrus */
unsigned long nr_map_pages; /* pages temporarly out of the lru */
/* keep lock in a separate cacheline to avoid ping pong in SMP */
spinlock_t lock ____cacheline_aligned_in_smp;
} lru_cache_t;
struct bootmem_data;
typedef struct pglist_data {
int nr_zones;
zone_t node_zones[MAX_NR_ZONES];
gfpmask_zone_t node_gfpmask_zone[NR_GFPINDEX];
lru_cache_t lru_cache;
struct page *node_mem_map;
unsigned long *valid_addr_bitmap;
struct bootmem_data *bdata;
unsigned long node_start_paddr;
unsigned long node_start_mapnr;
unsigned long node_size;
int node_id;
struct pglist_data *node_next;
spinlock_t freelist_lock ____cacheline_aligned_in_smp;
} pg_data_t;
Stay tuned...
Andrea
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux.eu.org/Linux-MM/- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sun May 07 2000 - 21:00:10 EST