Re: [PATCH 1/7] mm: introduce simple_malloc()/simple_free()

From: Balbir Singh
Date: Sun Nov 16 2008 - 23:53:39 EST


Arjan van de Ven wrote:
> On Mon, 17 Nov 2008 07:39:55 +1000
> "Dave Airlie" <airlied@xxxxxxxxx> wrote:
>
>> On Mon, Nov 17, 2008 at 4:57 AM, Arjan van de Ven
>> <arjan@xxxxxxxxxxxxx> wrote:
>>> On Sun, 16 Nov 2008 00:19:26 -0800 (PST)
>>> David Miller <davem@xxxxxxxxxxxxx> wrote:
>>>
>>>> From: Arjan van de Ven <arjan@xxxxxxxxxxxxx>
>>>> Date: Sat, 15 Nov 2008 20:52:29 -0800
>>>>
>>>>> On Sun, 16 Nov 2008 12:33:15 +0800
>>>>> Lai Jiangshan <laijs@xxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Some subsystems need vmalloc() when the required memory is large,
>>>>>> but the current kernel has no API for this requirement.
>>>>>> This patch introduces simple_malloc() and simple_free().
>>>>> I kinda really don't like this approach. vmalloc() (and
>>>>> especially, vfree()) is a really expensive operation, and
>>>>> vmalloc()'d memory is also slower (due to tlb pressure).
>>>>> Realistically, people should try hard to use smaller data
>>>>> structures instead....
>>>> This is happening in many places, already, for good reason.
>>>>
>>>> There are lots of places where we can't (core hash tables, etc.)
>>>> and we want NUMA spreading and reliable allocation, and thus
>>>> vmalloc it is.
>>> vmalloc() isn't 100% evil; for truly long term stuff it's
>>> sometimes a quite reasonable solution.
>>>
>>> There are some issues with it still: the vmalloc() space is shared
>>> with ioremap, modules and others and it's not all that big on 32
>>> bit; on x86 you could well end up with only 64Mb total (after
>>> taking out the various ioremap's etc).
>>>
>>> Yes there's places where it's then totally fine to dip into this
>>> space at boot/init time. You mention a few very good users.
>>> (There's still the tlb miss cost on use but on modern cpus a tlb
>>> miss is actually quite cheap)
>>>
>>> But this doesn't make vmalloc() the magic bullet that solves the "oh
>>> Linux can't allocate large chunks of memory" problem. Specifically
>>> in driver space for things that get ported from other OSes.
>> So we keep the duplicated code? Or do we just audit new callers? I
>> think this patch makes it easier to spot new callers doing something
>> stupid. As davem said, we duplicate this code all over the place, so
>> for that reason alone a simple wrapper makes things a lot easier, and
>> also possibly a lot easier to change in the future to a new non-sucky
>> API.
>>
>> So I'm all for it maybe with a non simple name.
>>
>
> I would go further than this.
>
> Make the code just use vmalloc(). Period.
>

But vmalloc() always allocates in whole pages, which is not always desirable.
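
To put a rough number on the page-granularity point: the sketch below is a
userspace model, not kernel code. It assumes 4 KiB pages (the kernel's real
constant is PAGE_SIZE), and the helper names are hypothetical, made up for
illustration only.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86. The kernel uses PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096UL

/* Bytes a page-granular allocator consumes for a request of `size` bytes. */
size_t model_page_rounded(size_t size)
{
	return (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE * MODEL_PAGE_SIZE;
}

/* Bytes left unused at the tail of the last page. */
size_t model_page_waste(size_t size)
{
	return model_page_rounded(size) - size;
}
```

Under that assumption a 100-byte request still occupies one full page, and a
request one byte over a page boundary occupies two.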

> But then make vmalloc() smart and try to do a direct mapping allocation
> first, before falling back to a virtual mapping. (and based on size it
> wouldn't even try it for just big things)

If only slab/slub could do vmalloc()-based caches. But vmalloc() is not the
common case, so it is not worth optimizing for.
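
For reference, the "try a direct mapping first, then fall back to a virtual
mapping" idea from the quoted text could be modelled roughly as below. This is
a userspace sketch only: malloc() stands in for both kmalloc() and vmalloc(),
and the size cutoff, the failure knob, and every model_* name are assumptions
made up for illustration, not anything from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cutoff: above this, skip the direct path entirely. */
#define MODEL_DIRECT_LIMIT (4UL * 4096UL)

enum model_kind { MODEL_NONE, MODEL_DIRECT, MODEL_VIRTUAL };

struct model_buf {
	void *ptr;
	enum model_kind kind;	/* records which path served the request */
};

/* Test knob: set nonzero to simulate the direct allocator failing. */
int model_direct_fails;

/* Stand-in for kmalloc(): contiguous, may fail under fragmentation. */
void *model_direct_alloc(size_t size)
{
	return model_direct_fails ? NULL : malloc(size);
}

/* Stand-in for vmalloc(). */
void *model_virtual_alloc(size_t size)
{
	return malloc(size);
}

struct model_buf model_alloc(size_t size)
{
	struct model_buf buf = { NULL, MODEL_NONE };

	/* Small enough: try the direct mapping first. */
	if (size <= MODEL_DIRECT_LIMIT) {
		buf.ptr = model_direct_alloc(size);
		if (buf.ptr) {
			buf.kind = MODEL_DIRECT;
			return buf;
		}
	}

	/* Too big, or the direct path failed: fall back. */
	buf.ptr = model_virtual_alloc(size);
	if (buf.ptr)
		buf.kind = MODEL_VIRTUAL;
	return buf;
}

void model_free(struct model_buf *buf)
{
	/* In the kernel, kind would select between kfree() and vfree(). */
	free(buf->ptr);
	buf->ptr = NULL;
	buf->kind = MODEL_NONE;
}
```

The caller has to remember which path was taken so that the matching free
routine can be used later; the model keeps that in the kind field.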

--
Balbir