Re: install_special_mapping && vm_pgoff (Was: vvar, gup && coredump)

From: Andy Lutomirski
Date: Tue Mar 17 2015 - 21:45:08 EST


On Tue, Mar 17, 2015 at 6:43 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> On 03/16, Oleg Nesterov wrote:
>>
>> On 03/16, Andy Lutomirski wrote:
>> >
>> > Ick, you're probably right. For what it's worth, the vdso *seems* to
>> > be okay (on 64-bit only, and only if you don't poke at it too hard) if
>> > you mremap it in one piece. CRIU does that.
>>
>> I need to run away till tomorrow, but looking at this code, even the "one piece"
>> case doesn't look right if it was cow'ed. I'll verify tomorrow.
>
> And I am still not sure this all is 100% correct, but I got lost in this code.
> Probably this is fine...
>
> But at least the bug exposed by the test-case looks clear:
>
> do_linear_fault:
>
> 	vmf->pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
> 			+ vma->vm_pgoff;
> 	...
>
> special_mapping_fault:
>
> 	pgoff = vmf->pgoff - vma->vm_pgoff;
>
>
> So special_mapping_fault() can only work if this mapping starts from the
> first page in ->pages[].
>
> So perhaps we need _something like_ the (wrong/incomplete) patch below...
>
> Or, really, perhaps we can create vdso_mapping ? So that map_vdso() could
> simply mmap the anon_inode file...

That's slightly tricky, I think, because it could start showing up in
/proc/PID/map_files or whatever it's called, and I don't think we want
that. I also don't want to commit to all special mappings everywhere
being semantically identical (there are already two kinds on both x86
and arm64, and I'd eventually like to have them vary per-process as
well). None of that precludes using non-null vm_file, but it's a
complication.

Your patch does look like a considerable improvement, though. Let me
see if I can find some time to fold it in with the rest of my special
mapping rework over the next few days.
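
For anyone following along, here is a rough userspace model of the
index arithmetic under discussion. It is only a sketch: everything
named toy_* (and split_tail) is made up for illustration, and the
starting value vm_pgoff == vm_start >> PAGE_SHIFT for an unpatched
special mapping is an assumption about what insert_vm_struct() does
for a vma without vm_file, not something stated in this thread.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical stand-in for the two vm_area_struct fields that matter. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

/* Mirrors the do_linear_fault() expression quoted above. */
static unsigned long linear_pgoff(const struct toy_vma *vma,
				  unsigned long address)
{
	return (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
		+ vma->vm_pgoff;
}

/* Unpatched special_mapping_fault(): index relative to vm_start. */
static unsigned long index_current(const struct toy_vma *vma,
				   unsigned long address)
{
	return linear_pgoff(vma, address) - vma->vm_pgoff;
}

/* Patched special_mapping_fault(): index is vmf->pgoff itself. */
static unsigned long index_patched(const struct toy_vma *vma,
				   unsigned long address)
{
	return linear_pgoff(vma, address);
}

/* Model of a split: the tail starts npages later and its pgoff advances. */
static struct toy_vma split_tail(const struct toy_vma *vma,
				 unsigned long npages)
{
	struct toy_vma tail = {
		.vm_start = vma->vm_start + npages * PAGE_SIZE,
		.vm_pgoff = vma->vm_pgoff + npages,
	};
	return tail;
}

/* Walks through the intact and split cases for the second page. */
static void toy_selfcheck(void)
{
	unsigned long base = 0x10000000UL;
	unsigned long addr = base + PAGE_SIZE;	/* second page of the mapping */

	/* Assumed unpatched install: vm_pgoff follows vm_start. */
	struct toy_vma cur = { base, base >> PAGE_SHIFT };
	/* Patched install: vm_pgoff = 0. */
	struct toy_vma pat = { base, 0 };

	/* Intact mapping: both variants pick pages[1]. */
	assert(index_current(&cur, addr) == 1);
	assert(index_patched(&pat, addr) == 1);

	/* After the first page is split off, the same address should
	 * still resolve to pages[1]. */
	struct toy_vma cur_tail = split_tail(&cur, 1);
	struct toy_vma pat_tail = split_tail(&pat, 1);

	assert(index_current(&cur_tail, addr) == 0);	/* wrong page */
	assert(index_patched(&pat_tail, addr) == 1);	/* right page */
}
```

In this model the unpatched computation always yields an index
relative to the current vm_start, so it is only right while the vma
still begins at pages[0]. With vm_pgoff starting at 0 and the
subtraction dropped, the index survives a split. It also shows why
copy_vma() must not rewrite pgoff to addr >> PAGE_SHIFT for a special
vma once the fault handler uses vmf->pgoff directly.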

--Andy

>
> Oleg.
>
> --- x/mm/mmap.c
> +++ x/mm/mmap.c
> @@ -2832,6 +2832,8 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
>  	return 0;
>  }
>
> +bool is_special_vma(struct vm_area_struct *vma);
> +
>  /*
>   * Copy the vma structure to a new location in the same mm,
>   * prior to moving page table entries, to effect an mremap move.
> @@ -2851,7 +2853,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>  	 * If anonymous vma has not yet been faulted, update new pgoff
>  	 * to match new location, to increase its chance of merging.
>  	 */
> -	if (unlikely(!vma->vm_file && !vma->anon_vma)) {
> +	if (unlikely(!vma->vm_file && !is_special_vma(vma) && !vma->anon_vma)) {
>  		pgoff = addr >> PAGE_SHIFT;
>  		faulted_in_anon_vma = false;
>  	}
> @@ -2953,6 +2955,11 @@ static const struct vm_operations_struct legacy_special_mapping_vmops = {
>  	.fault = special_mapping_fault,
>  };
>
> +bool is_special_vma(struct vm_area_struct *vma)
> +{
> +	return vma->vm_ops == &special_mapping_vmops;
> +}
> +
>  static int special_mapping_fault(struct vm_area_struct *vma,
>  				struct vm_fault *vmf)
>  {
> @@ -2965,7 +2972,7 @@ static int special_mapping_fault(struct vm_area_struct *vma,
>  	 * We are allowed to do this because we are the mm; do not copy
>  	 * this code into drivers!
>  	 */
> -	pgoff = vmf->pgoff - vma->vm_pgoff;
> +	pgoff = vmf->pgoff;
>
>  	if (vma->vm_ops == &legacy_special_mapping_vmops)
>  		pages = vma->vm_private_data;
> @@ -3014,6 +3021,7 @@ static struct vm_area_struct *__install_special_mapping(
>  	if (ret)
>  		goto out;
>
> +	vma->vm_pgoff = 0;
>  	mm->total_vm += len >> PAGE_SHIFT;
>
>  	perf_event_mmap(vma);
>



--
Andy Lutomirski
AMA Capital Management, LLC