Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()

From: Cong Wang
Date: Tue Apr 16 2019 - 22:31:49 EST


On Tue, Apr 16, 2019 at 6:53 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> On Tue, Apr 16, 2019 at 04:47:55PM -0700, Cong Wang wrote:
> > 229 static void del_elem(struct ce_array *ca, int idx)
> > 230 {
> > 231 /* Save us a function call when deleting the last element. */
> > 232 if (ca->n - (idx + 1))
> > 233 memmove((void *)&ca->array[idx],
> > 234 (void *)&ca->array[idx + 1],
> > 235 (ca->n - (idx + 1)) * sizeof(u64));
> > 236
> > 237 ca->n--;
> > 238 }
> >
> > idx is ca->n and ca->n is MAX_ELEMS-1, then the above if statement
> > becomes true, therefore idx+1 is MAX_ELEMS which is just beyond
> > the valid range.
>
> Is that really the memmove() where we die? It looks like
> it has a special case for dealing with the last element.

Yes it is, I have a stacktrace in production which clearly shows
del_elem.isra.1+0x34/0x40, unlike the one I triggered via fake
PFN's. I can show you if you want, it is on 4.14, so very unlikely
it is interesting to anyone here.



>
> But this:
>
> 296 ret = find_elem(ca, pfn, &to);
> 297 if (ret < 0) {
> 298 /*
> 299 * Shift range [to-end] to make room for one more element.
> 300 */
> 301 memmove((void *)&ca->array[to + 1],
> 302 (void *)&ca->array[to],
> 303 (ca->n - to) * sizeof(u64));
> 304
>
> looks like it also needs a special case for when "to == MAX_ELEMS-1"
> (we don't need to memmove).

In the specific I talked about, find_elem() returns non-negative, so it won't
even hit this branch. Remember how it passed the if check
(this_pfn == pfn)? ;)


Thanks.