Re: What protection does sysfs_readdir have with SMP/Preemption?

From: Maneesh Soni
Date: Wed Nov 23 2005 - 09:01:18 EST

On Wed, Nov 23, 2005 at 07:56:39AM -0500, Steven Rostedt wrote:
> The bug that I've been fighting in my own kernel is a memory leak. So I
> started looking into this at what would happen in verious places if an
> allocation didn't work.
> In create_dir:
> error = sysfs_make_dirent(p->d_fsdata, *d, k, mode, SYSFS_DIR);
> // Above is where the entry is added to the parent link list.
> if (!error) {
> error = sysfs_create(*d, mode, init_dir);
> // If sysfs_create fails to allocate an inode, when below
> // does the element get removed from the parent?
> if (!error) {
> p->d_inode->i_nlink++;
> (*d)->d_op = &sysfs_dentry_ops;
> d_rehash(*d);
> }
> }
> if (error && (error != -EEXIST)) {
> sysfs_put((*d)->d_fsdata);
> // sysfs_put only seems to handle the kobject portion
> d_drop(*d);
> // d_drop handles the unhash
> }
> dput(*d);
> So I'm not sure an error from sysfs_create will remove the object from the
> link list. In fact, it might be worst since now there's an object on the
> link list that may no long even be an object.
> I'll test this by forcing a failure at sysfs_create.

hmm looks like we got some situation which is not desirable and could lead
to bogus sysfs_dirent in the parent list. It may not be the exact problem
in this case though, but needs fixing IMO.

After sysfs_make_dirent(), the ref count for sysfs dirent will be 2.
(one from allocation, and after linking the new dentry to it). On
error from sysfs_create(), we do sysfs_put() once, decrementing the
ref count to 1. And again when the new dentry for which we couldn't
allocate the d_inode, is d_drop()'ed. In sysfs_d_iput() we again
sysfs_put(), and decrement the sysfs dirent's ref count to 0, which will
be the final sysfs_put(), and it will free the sysfs_dirent but never
unlinks it from the parent list. So, parent list could still will having
links to the freed sysfs_dirent in its s_children list.

so basically list_del_init(&sd->s_sibling) should be done in error path
in create_dir().

Could you also put the appended patch in your trial runs..


o Following patch corrects the buggy create_dir() error path. This bug could
end up in having a bogus sysfs_dirent in the parent list. Now the newly
allocated and linked sysfs_dirent is also un-linked in case of error
resulting from sysfs_create()

o Many thanks to Steven Rostedt and Ingo Molnar for pointing this out.

Signed-off-by: Maneesh Soni <maneesh@xxxxxxxxxx>

linux-2.6.15-rc2-mm1-maneesh/fs/sysfs/dir.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletion(-)

diff -puN fs/sysfs/dir.c~fix-create_dir-error-path fs/sysfs/dir.c
--- linux-2.6.15-rc2-mm1/fs/sysfs/dir.c~fix-create_dir-error-path 2005-11-23 18:59:36.072449992 +0530
+++ linux-2.6.15-rc2-mm1-maneesh/fs/sysfs/dir.c 2005-11-23 19:07:53.475833184 +0530
@@ -112,7 +112,9 @@ static int create_dir(struct kobject * k
if (error && (error != -EEXIST)) {
- sysfs_put((*d)->d_fsdata);
+ struct sysfs_dirent *sd = (*d)->d_fsdata;
+ list_del_init(&sd->s_sibling);
+ sysfs_put(sd);

Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: maneesh@xxxxxxxxxx
Phone: 91-80-25044990
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at