Re: [PATCH 0/5] Volatile Ranges (v12) & LSF-MM discussion fodder

From: John Stultz
Date: Mon Apr 07 2014 - 23:38:23 EST


On 04/07/2014 09:32 PM, Kevin Easton wrote:
> On Wed, Apr 02, 2014 at 10:40:16AM -0700, John Stultz wrote:
>> On Wed, Apr 2, 2014 at 9:36 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>>> I'm just dying to hear a "normal" use case then. :)
>> So the more "normal" use cause would be marking objects volatile and
>> then non-volatile w/o accessing them in-between. In this case the
>> zero-fill vs SIGBUS semantics don't really matter, its really just a
>> trade off in how we handle applications deviating (intentionally or
>> not) from this use case.
>>
>> So to maybe flesh out the context here for folks who are following
>> along (but weren't in the hallway at LSF :), Johannes made a fairly
>> interesting proposal (Johannes: Please correct me here where I'm maybe
>> slightly off here) to use only the dirty bits of the ptes to mark a
>> page as volatile. Then the kernel could reclaim these clean pages as
>> it needed, and when we marked the range as non-volatile, the pages
>> would be re-dirtied and if any of the pages were missing, we could
>> return a flag with the purged state. This had some different
>> semantics then what I've been working with for awhile (for example,
>> any writes to pages would implicitly clear volatility), so I wasn't
>> completely comfortable with it, but figured I'd think about it to see
>> if it could be done. Particularly since it would in some ways simplify
>> tmpfs/shm shared volatility that I'd eventually like to do.
> ...
>> Now, while for the case I'm personally most interested in (ashmem),
>> zero-fill would technically be ok, since that's what Android does.
>> Even so, I don't think its the best approach for the interface, since
>> applications may end up quite surprised by the results when they
>> accidentally don't follow the "don't touch volatile pages" rule.
>>
>> That point beside, I think the other problem with the page-cleaning
>> volatility approach is that there are other awkward side effects. For
>> example: Say an application marks a range as volatile. One page in the
>> range is then purged. The application, due to a bug or otherwise,
>> reads the volatile range. This causes the page to be zero-filled in,
>> and the application silently uses the corrupted data (which isn't
>> great). More problematic though, is that by faulting the page in,
>> they've in effect lost the purge state for that page. When the
>> application then goes to mark the range as non-volatile, all pages are
>> present, so we'd return that no pages were purged. From an
>> application perspective this is pretty ugly.
> The write-implicitly-clears-volatile semantics would actually be
> an advantage for some use cases. If you have a volatile cache of
> many sub-page-size objects, the application can just include at
> the start of each page "int present, in_use;". "present" is set
> to non-zero before marking volatile, and when the application wants
> unmark as volatile it writes to "in_use" and tests the value of
> "present". No need for a syscall at all, although it does take a
> minor fault.
>
> The syscall would be better for the case of large objects, though.
>
> Or is that fatally flawed?

Well, as you note, each object would then have to be page size or
smaller, which limits some of the potential use cases.

However, these semantics would match better to the MADV_FREE proposal
Minchan is pushing. So this method would work fine there.

thanks
-john


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/