When booting a 16TB system, unix_create1() fails due to integer overflow.

From: Robin Holt
Date: Thu Sep 23 2010 - 08:17:14 EST



I do not know which direction to take, but here is a summary of the
problem.

We recently started trying to boot two new customer machines, each
configured with 384GB less than 16TB of memory.

We were seeing a failure that prevented boot: the kernel was unable to
create either a named pipe or a unix domain socket. This comes down to a
common kernel function, unix_create1(), which does:

atomic_inc(&unix_nr_socks);
if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
	goto out;

The function get_max_files() simply returns files_stat.max_files, which
is a signed int computed in files_init() in fs/file_table.c:

n = (mempages * (PAGE_SIZE / 1024)) / 10;
files_stat.max_files = n;

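In our case, mempages (total_ram_pages) is approximately 3,758,096,384
(0xe0000000). With PAGE_SIZE / 1024 equal to 4, that leaves max_files at
approximately 1,503,238,553. That value still fits in an int, but
2 * get_max_files() is about 3,006,477,106, which overflows INT_MAX
(2,147,483,647). In practice the product wraps to a negative value, so
the comparison in unix_create1() is always true and the socket is never
created.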

We came up with a few possible solutions:

Our first response was to limit max_files to INT_MAX / 2. This at least
got us past the problem and seemed reasonable.

We could also have changed 2 * get_max_files() to 2UL * get_max_files()
and gotten past this point in boot. This was not tested.

We could also have changed the definition of max_files to (at least) an
unsigned int instead of an int and gotten past the problem, but again,
this was not tested.


Any suggestions for a direction would be appreciated.

Thank you,
Robin Holt