Re: [RFC PATCH] xfs: support for non-mmu architectures

From: Brian Foster
Date: Fri Nov 20 2015 - 10:11:13 EST


On Thu, Nov 19, 2015 at 10:54:34PM +0200, Octavian Purdila wrote:
> On Thu, Nov 19, 2015 at 5:55 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> > On Wed, Nov 18, 2015 at 12:46:21AM +0200, Octavian Purdila wrote:
> >> Naive implementation for non-mmu architectures: allocate physically
> >> contiguous xfs buffers with alloc_pages. Terribly inefficient with
> >> memory and fragmentation on high I/O loads but it may be good enough
> >> for basic usage (which most non-mmu architectures will need).
> >>
> >> This patch was tested with lklfuse [1] and basic operations seems to
> >> work even with 16MB allocated for LKL.
> >>
> >> [1] https://github.com/lkl/linux
> >>
> >> Signed-off-by: Octavian Purdila <octavian.purdila@xxxxxxxxx>
> >> ---
> >
> > Interesting, though this makes me wonder why we couldn't have a new
> > _XBF_VMEM (for example) buffer type that uses vmalloc(). I'm not
> > familiar with mmu-less context, but I see that mm/nommu.c has a
> > __vmalloc() interface that looks like it ultimately translates into an
> > alloc_pages() call. Would that accomplish what this patch is currently
> > trying to do?
> >
>
> Hi Brian,
>
> Ah, that sounds nice! We could get rid of the #ifdefs and use a common
> path in that case. vmalloc should work on non-mmu and it seems that
> there is a vmalloc_to_page() we can use to get the physical pages.
> I'll give it a try.
>
> Is there a central place where we could enable the new _XBF_VMEM to be
> the default for non-mmu arches?
>

I think a generic mechanism is the appropriate short term goal as
opposed to a common codepath. IOWs, add the ability to support a new
XBF_VMEM buffer type and enforce use of that type when !CONFIG_MMU. So
you'd still have an ifdef somewhere, but it wouldn't be shoehorned into
the current separate alloc/map implementation.

I don't think we want to change the existing buffer mechanism straight
away to use this based on some of the things Dave calls out. My hope is
more that we can do the following for the generic (CONFIG_MMU) case:

1.) Enable this selectively/randomly when CONFIG_XFS_DEBUG is enabled to
provide some test coverage of the XFS buffer type without the need for a
nommu setup (not a replacement for nommu testing, of course).

2.) Over time, move the current buffer mechanism over to this type if we
can eventually clean up some of the warts that pin us to our current
implementation.

I'd suggest to let some of the parallel discussions play out before
going too far down this path, however...

Brian

> > I ask because it seems like that would help clean up the code a bit, for
> > one. It might also facilitate some degree of testing of the XFS bits
> > (even if utilized sparingly in DEBUG mode if it weren't suitable enough
> > for generic/mmu use). We currently allocate and map the buffer pages
> > separately and I'm not sure if there's any particular reasons for doing
> > that outside of some congestion handling in the allocation code and
> > XBF_UNMAPPED buffers, the latter probably being irrelevant for nommu.
> > Any other thoughts on that?
> >
> >> fs/xfs/xfs_buf.c | 27 +++++++++++++++++++++++++++
> >> 1 file changed, 27 insertions(+)
> >>
> >> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> >> index 8ecffb3..50b5246 100644
> >> --- a/fs/xfs/xfs_buf.c
> >> +++ b/fs/xfs/xfs_buf.c
> > ...
> >> @@ -816,11 +835,19 @@ xfs_buf_get_uncached(
> >> if (error)
> >> goto fail_free_buf;
> >>
> >> +#ifdef CONFIG_MMU
> >> for (i = 0; i < page_count; i++) {
> >> bp->b_pages[i] = alloc_page(xb_to_gfp(flags));
> >> if (!bp->b_pages[i])
> >> goto fail_free_mem;
> >> }
> >> +#else
> >> + bp->b_pages[0] = alloc_pages(flags, order_base_2(page_count));
> >> + if (!bp->b_pages[0])
> >> + goto fail_free_buf;
> >> + for (i = 1; i < page_count; i++)
> >> + bp->b_pages[i] = bp->b_pages[i-1] + 1;
> >> +#endif
> >
> > We still have a path into __free_page() further down in this function if
> > _xfs_buf_map_pages() fails. Granted, _xfs_buf_map_pages() should
> > probably never fail in this case, but it still looks like a land mine at
> > the very least.
> >
>
> OK. Adding a i = 1; right before the #endif should take care of that,
> if the vmalloc approach does not work.
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/