[PATCH 0/4] zswap: Optimize compressed pool memory utilization

From: Srividya Desireddy
Date: Thu Feb 16 2017 - 08:05:31 EST



Could you please review this patch series and let me know if any corrections are needed?

-Srividya

On Fri, Aug 19, 2016 at 11:04 AM, Srividya Desireddy wrote:
> On 17 August 2016 at 18:08, Pekka Enberg wrote:
>> On Wed, Aug 17, 2016 at 1:03 PM, Srividya Desireddy
>> wrote:
>>> This series of patches optimizes the memory utilized by zswap for storing
>>> the swapped out pages.
>>>
>>> Zswap is a cache which compresses the pages that are being swapped out
>>> and stores them into a dynamically allocated RAM-based memory pool.
>>> Experiments have shown that around 10-15% of pages stored in zswap are
>>> duplicates which results in 10-12% more RAM required to store these
>>> duplicate compressed pages. Around 10-20% of pages stored in zswap
>>> are zero-filled pages, but these pages are handled as normal pages by
>>> compressing and allocating memory in the pool.
>>>
>>> The following patch-set optimizes memory utilized by zswap by avoiding the
>>> storage of duplicate pages and zero-filled pages in zswap compressed memory
>>> pool.
>>>
>>> Patch 1/4: zswap: Share zpool memory of duplicate pages
>>> This patch shares compressed pool memory of the duplicate pages. When a new
>>> page is requested for swap-out to zswap, it searches for an identical page
>>> among the pages already stored in zswap. If an identical page is found, the
>>> compressed data of that page is shared with the new page. This
>>> avoids allocation of memory in the compressed pool for a duplicate page.
>>> This feature was tested on devices with 1GB, 2GB and 3GB RAM by running a
>>> performance test under low-memory conditions. Around 15-20% of the pages
>>> swapped are duplicates of pages already in zswap, resulting in a 15%
>>> saving of zswap memory pool when compared to the baseline version.
>>>
>>> Test Parameters         Baseline    With patch    Improvement
>>> Total RAM               955MB       955MB
>>> Available RAM           254MB       269MB         15MB
>>> Avg. App entry time     2.469sec    2.207sec      7%
>>> Avg. App close time     1.151sec    1.085sec      6%
>>> Apps launched in 1sec   5           12            7
>>>
>>> There is a small overhead in the zswap store function due to the search
>>> operation for finding duplicate pages. However, if a duplicate page is
>>> found, it saves the compression and allocation time of the page. The average
>>> overhead per zswap_frontswap_store() function call on the experimental
>>> device is 9us. There is no overhead for the zswap_frontswap_load()
>>> operation.
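
The cover letter does not say how the duplicate search is implemented, so the
following is only a minimal user-space sketch of one plausible approach: hash
the incoming page, then confirm a match with memcmp() before sharing. The names
dedup_pool, dedup_entry, dedup_store(), PAGE_SZ and MAX_ENTRIES are all
hypothetical, the linear scan stands in for whatever lookup structure the patch
uses, and the raw page copy stands in for the compressed data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ     4096 /* assumed page size */
#define MAX_ENTRIES 64   /* toy capacity, hypothetical */

struct dedup_entry {
	uint32_t hash;               /* checksum of the page contents */
	unsigned int refcount;       /* number of swap slots sharing this entry */
	unsigned char data[PAGE_SZ]; /* stand-in for the compressed copy */
};

struct dedup_pool {
	struct dedup_entry entries[MAX_ENTRIES];
	size_t count;
};

/* FNV-1a hash over the whole page (an assumed choice of checksum). */
static uint32_t page_hash(const unsigned char *page)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < PAGE_SZ; i++) {
		h ^= page[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Store a page. Returns the index of the entry backing it, or -1 if the
 * pool is full. *shared is set to 1 when an existing entry was reused.
 */
static int dedup_store(struct dedup_pool *pool, const unsigned char *page,
		       int *shared)
{
	assert(pool != NULL && page != NULL && shared != NULL);

	uint32_t h = page_hash(page);

	for (size_t i = 0; i < pool->count; i++) {
		struct dedup_entry *e = &pool->entries[i];

		/* The hash only narrows the search; memcmp() confirms it. */
		if (e->hash == h && memcmp(e->data, page, PAGE_SZ) == 0) {
			e->refcount++;
			*shared = 1;
			return (int)i;
		}
	}

	*shared = 0;
	if (pool->count == MAX_ENTRIES)
		return -1;

	struct dedup_entry *e = &pool->entries[pool->count];

	e->hash = h;
	e->refcount = 1;
	memcpy(e->data, page, PAGE_SZ);
	pool->count++;
	return (int)(pool->count - 1);
}
```

In this sketch the extra cost on the store path is the hash plus the scan, and
a hit skips the allocation entirely, which is the trade-off described above.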
>>>
>>> Patch 2/4: zswap: Enable/disable sharing of duplicate pages at runtime
>>> This patch adds a module parameter to enable or disable the sharing of
>>> duplicate zswap pages at runtime.
>>>
>>> Patch 3/4: zswap: Zero-filled pages handling
>>> This patch checks if a page to be stored in zswap is a zero-filled page
>>> (i.e. contents of the page are all zeros). If such a page is found,
>>> compression and allocation of memory for the compressed page is avoided
>>> and instead the page is just marked as a zero-filled page.
>>> Although the compressed size of a zero-filled page using the LZO compressor
>>> is very small (52 bytes including zswap_header), this patch saves compression
>>> and allocation time during store operation and decompression time during
>>> zswap load operation for zero-filled pages. Experiments have shown that
>>> around 10-20% of pages stored in zswap are zero-filled.
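
As an illustration of the zero-filled check described above, here is a minimal
user-space sketch, not the code from patch 3/4. It scans the page one machine
word at a time and stops at the first non-zero word. The function name
page_is_zero_filled() and the PAGE_SZ constant are assumptions made for this
example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096 /* assumed page size */

/*
 * Return 1 if every byte of the page is zero, 0 otherwise.
 * The buffer must be PAGE_SZ bytes long and aligned for unsigned long.
 */
static int page_is_zero_filled(const void *page)
{
	assert(page != NULL);

	const unsigned long *word = page;

	for (size_t i = 0; i < PAGE_SZ / sizeof(*word); i++) {
		if (word[i] != 0)
			return 0; /* early exit on the first non-zero word */
	}
	return 1;
}
```

The early exit means a page with data near its start is rejected after a few
comparisons, so only pages that really are zero-filled pay for a full scan.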
>>
>> Aren't zero-filled pages already handled by patch 1/4 as their
>> contents match? So the overall memory saving is 52 bytes?
>>
>> - Pekka
>
> Thanks for the quick reply.
>
> Zero-filled pages can also be handled by patch 1/4. It searches for a
> duplicate page among the pages already stored in zswap.
> It has been observed that the average search time to identify duplicate
> zero-filled pages (using patch 1/4) is almost three times that of checking
> every page for being zero-filled.
>
> Also, with patch 1/4, the zswap_frontswap_load() operation requires
> the compressed zero-filled page to be decompressed. The zswap_frontswap_load()
> function in patch 3/4 just fills the page with zeros while loading a
> zero-filled page, which is faster than decompression.
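
To make the load-path difference concrete, the following is a rough user-space
sketch under stated assumptions, not the patch itself. The struct toy_entry,
its zero_filled flag, and toy_load() are hypothetical names, and the memcpy()
in the normal path is only a placeholder for real decompression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096 /* assumed page size */

struct toy_entry {
	int zero_filled;              /* hypothetical flag set at store time */
	const unsigned char *payload; /* stand-in for compressed data; unused
				       * when zero_filled is set */
};

/* Fill dst with the page contents. Returns 0 on success, -1 on error. */
static int toy_load(const struct toy_entry *e, unsigned char *dst)
{
	assert(e != NULL && dst != NULL);

	if (e->zero_filled) {
		/* No pool lookup and no decompression: just clear the page. */
		memset(dst, 0, PAGE_SZ);
		return 0;
	}

	if (e->payload == NULL)
		return -1;

	/* Placeholder for decompressing the stored data into dst. */
	memcpy(dst, e->payload, PAGE_SZ);
	return 0;
}
```

A zero-filled entry in this sketch needs no payload at all, so it costs neither
pool memory at store time nor decompression at load time.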
>
> - Srividya