[PATCH] pktcdvd: added sysfs interface + bio write queue handling fix

From: Thomas Maier
Date: Sun Sep 03 2006 - 14:22:10 EST


Hello,

This is a patch for the packet writing driver pktcdvd.
It adds a sysfs interface to the driver and "congestion"
handling for the bio write queue.

The patch modifies the following files of the Linux 2.6.17.11
source tree:
Documentation/cdrom/packet-writing.txt
include/linux/pktcdvd.h
drivers/block/pktcdvd.c
drivers/block/Kconfig
block/genhd.c

(genhd.c must be changed to export the block_subsys
symbol)

The bio write queue changes are in pktcdvd.c in functions:
pkt_make_request()
pkt_bio_finished()

Any comments and improvements are welcome ;)


Why this patch?
===============
This driver uses an internal bio write queue to store
write requests from the block layer, which are passed to the
driver through its own make_request function.
I am using Linux 2.6.17 on an Athlon 64 X2 with 2 GB of RAM. While
writing huge files (>200M) to a DVD-RAM using the pktcdvd driver,
the bio write queue grew to more than 200000 entries! This led to
kernel out-of-memory failures (oom-killer), e.g.:

----------------------------------------------------------
Aug 14 17:42:26 master vmunix: pktcdvd: 4473408kB available on disc
Aug 14 17:42:54 master vmunix: pktcdvd: write speed 4155kB/s
Aug 14 17:54:24 master vmunix: oom-killer: gfp_mask=0xd0, order=1
Aug 14 17:54:24 master vmunix: <c014346f> out_of_memory+0x12f/0x150 <c01452d0> __alloc_pages+0x280/0x2e0
Aug 14 17:54:24 master vmunix: <c015a52a> cache_alloc_refill+0x2ea/0x500 <c015a7a1> __kmalloc+0x61/0x70
Aug 14 17:54:24 master vmunix: <c039c0b3> __alloc_skb+0x53/0x110 <c03985b6> sock_alloc_send_skb+0x176/0x1c0
Aug 14 17:54:24 master vmunix: <c0399c5b> sock_def_readable+0x7b/0x80 <c041262b> unix_stream_sendmsg+0x1cb/0x310
Aug 14 17:54:24 master vmunix: <c039502b> do_sock_write+0xab/0xc0 <c0395720> sock_aio_write+0x80/0x90
Aug 14 17:54:24 master vmunix: <c011a609> __wake_up_common+0x39/0x60 <c015d984> do_sync_write+0xc4/0x100
Aug 14 17:54:47 master vmunix: printk: 10 messages suppressed.
Aug 14 17:54:47 master vmunix: oom-killer: gfp_mask=0xd0, order=0
Aug 14 17:54:47 master vmunix: <c014346f> out_of_memory+0x12f/0x150 <c01452d0> __alloc_pages+0x280/0x2e0
Aug 14 17:54:47 master vmunix: <c0258de2> __next_cpu+0x12/0x30 <c015a52a> cache_alloc_refill+0x2ea/0x500
Aug 14 17:54:47 master vmunix: <c015a23a> kmem_cache_alloc+0x4a/0x50 <c03987ea> sk_alloc+0x2a/0x150
Aug 14 17:54:47 master vmunix: <c03e3f8d> inet_create+0xed/0x320 <c03950a2> sock_alloc_inode+0x12/0x70
Aug 14 17:54:47 master vmunix: <c017790e> alloc_inode+0xce/0x180 <c03966f3> __sock_create+0x123/0x2f0
Aug 14 17:54:49 master vmunix: Total swap = 2152668kB
Aug 14 17:54:49 master vmunix: Free swap: 2152436kB
Aug 14 17:54:49 master vmunix: 524272 pages of RAM
Aug 14 17:54:49 master vmunix: 294896 pages of HIGHMEM
Aug 14 17:54:49 master vmunix: 5767 reserved pages
Aug 14 17:54:49 master vmunix: 238277 pages shared
Aug 14 17:54:49 master vmunix: 35 pages swap cached
Aug 14 17:54:49 master vmunix: 47682 pages dirty
Aug 14 17:54:49 master vmunix: 157861 pages writeback
Aug 14 17:54:49 master vmunix: 17359 pages mapped
Aug 14 17:54:49 master vmunix: 23835 pages slab
Aug 14 17:54:49 master vmunix: 176 pages pagetables
Aug 14 17:54:59 master vmunix: <c0145355> __get_free_pages+0x25/0x40
Aug 14 17:55:19 master vmunix: 294896 pages of HIGHM<6>5767 reserved pages
------------------------------------------------------------

I don't know exactly what is wrong in the kernel, but
it seems to be related to the kernel's memory handling.

To be able to use the pktcdvd driver now, I created this patch.
It simply limits the size of the driver's bio write queue
to save kernel memory. It does not cure the "kernel bug", only the
symptom ;)
If a new bio write request would raise the bio queue size
above a high limit (congestion on), the make_request function
waits until the worker thread has lowered the queue size below
the "congestion off" mark.
The wait is similar to the wait in get_request_wait(),
called by the "normal" request function __make_request().
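
The two marks form a simple hysteresis. The following is only a
userspace sketch of that idea, not the code from the patch: the
struct wq_model and the functions wq_init(), wq_submit() and
wq_complete() are hypothetical names made up for illustration, and
where the real driver would sleep in its make_request function, the
model just returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the bio write queue limits. */
struct wq_model {
    int size;            /* bios currently queued */
    int congestion_on;   /* high mark: stop accepting at this size */
    int congestion_off;  /* low mark: resume accepting below this size */
    bool congested;      /* true while submitters would have to wait */
};

static void wq_init(struct wq_model *q, int off, int on)
{
    assert(off >= 0 && off < on);   /* assumption: off mark lies below on mark */
    q->size = 0;
    q->congestion_on = on;
    q->congestion_off = off;
    q->congested = false;
}

/* Submit one write bio. Returns false where the driver would wait. */
static bool wq_submit(struct wq_model *q)
{
    if (q->congested)
        return false;
    if (q->size >= q->congestion_on) {
        /* one more bio would push the queue over the high mark */
        q->congested = true;
        return false;
    }
    q->size++;
    return true;
}

/* The worker finished one bio; congestion ends below the off mark. */
static void wq_complete(struct wq_model *q)
{
    if (q->size > 0)
        q->size--;
    if (q->congested && q->size < q->congestion_off)
        q->congested = false;
}
```

With marks off=2 and on=4, the model accepts four bios, refuses the
fifth, and accepts again only after the queue has drained to one
entry. The gap between the marks keeps submitters from being woken
and blocked again on every single completion.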

Peter Osterlund suggested using the pair
clear_queue_congested()
blk_congestion_wait()
here, but I am not sure if this is the right way to do
it.


There is also a sysfs interface for the driver now, and the
procfs interface can be switched off by a kernel config
option.

Here is more information about the new features of the driver,
which this patch adds to packet-writing.txt:


Using the pktcdvd sysfs interface
---------------------------------

The pktcdvd module has a sysfs interface and can be controlled
by the tool "pktcdvd", which uses sysfs.

"pktcdvd" works similarly to "pktsetup", e.g.:

# pktcdvd -a dev_name /dev/hdc
# mkudffs /dev/pktcdvd/dev_name
# mount -t udf -o rw,noatime /dev/pktcdvd/dev_name /dvdram
# cp files /dvdram
# umount /dvdram
# pktcdvd -r dev_name


The pktcdvd module exports these files in sysfs:
( <pktdevname> is one of pktcdvd0..pktcdvd7 )
( <devid> is in the format major:minor )

/sys/block/pktcdvd/
    add                   (w)   Write a block device id to create a
                                new pktcdvd device and map it to the
                                block device.

    remove                (w)   Write the pktcdvd device id or the
                                mapped block device id to it to
                                remove the pktcdvd device.

    device_map            (r)   Shows the device mapping in the format:
                                <pktdevname> <pktdevid> <blkdevid>

    packet_buffers        (rw)  Number of concurrent packets per
                                pktcdvd device. Used for newly created
                                devices.


/sys/block/pktcdvd/<pktdevname>/packet/
    stat                  (r)   Shows the device status.

    reset_stat            (w)   Write any value to it to reset some
                                pktcdvd device stat values, like
                                bytes read/written.

    write_congestion_off  (rw)  If the bio write queue size is below
                                this mark, accept new bio requests
                                from the block layer.

    write_congestion_on   (rw)  If the bio write queue size is higher
                                than this mark, no longer accept
                                bio write requests from the block
                                layer and wait until the pktcdvd
                                device has processed enough bios
                                that the bio write queue size is
                                below the congestion off mark.

    mapped_to                   Symbolic link to the mapped block
                                device in the sysfs tree.




To use the pktcdvd sysfs interface directly, you can do:

# create a new pktcdvd device mapped to /dev/hdc
echo "22:0" >/sys/block/pktcdvd/add
cat /sys/block/pktcdvd/device_map
# assuming device pktcdvd0 was created, look at its stats
cat /sys/block/pktcdvd/pktcdvd0/packet/stat
# print the device id of the mapped block device
cat /sys/block/pktcdvd/pktcdvd0/packet/mapped_to/dev
# similar to
fgrep pktcdvd0 /sys/block/pktcdvd/device_map
# remove device, using pktcdvd0 device id 253:0
echo "253:0" >/sys/block/pktcdvd/remove
# same as using the mapped block device id 22:0
echo "22:0" >/sys/block/pktcdvd/remove
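
A tool that reads device_map has to split each line into the three
fields described above. The following is only a sketch under the
assumption that a line looks exactly like
"<pktdevname> <pktdevid> <blkdevid>" with both ids in major:minor
form. The struct pkt_mapping and the function
parse_device_map_line() are hypothetical names and are not part of
the patch or of the "pktcdvd" tool.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one line of device_map. */
struct pkt_mapping {
    char name[32];                  /* e.g. "pktcdvd0" */
    unsigned pkt_major, pkt_minor;  /* id of the pktcdvd device */
    unsigned blk_major, blk_minor;  /* id of the mapped block device */
};

/*
 * Parse "<pktdevname> <major:minor> <major:minor>".
 * Returns 0 on success and -1 if the line does not match.
 */
static int parse_device_map_line(const char *line, struct pkt_mapping *m)
{
    assert(line != NULL && m != NULL);

    /* %31s leaves room for the terminating NUL in name[32] */
    int n = sscanf(line, "%31s %u:%u %u:%u",
                   m->name,
                   &m->pkt_major, &m->pkt_minor,
                   &m->blk_major, &m->blk_minor);

    return n == 5 ? 0 : -1;
}
```

For the ids used in the commands above, the line
"pktcdvd0 253:0 22:0" would yield the name pktcdvd0, the pktcdvd
device id 253:0 and the block device id 22:0. A line with a missing
field is rejected.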


Bio write queue congestion marks
--------------------------------
The pktcdvd driver now allows the behaviour of the internal
bio write queue to be adjusted.
This can be done with the two write_congestion_[on|off] marks.
The driver accepts at most write_congestion_on bio write
requests from the I/O block layer, then waits until the
requests have been processed by the mapped block device and
the queue size is below the write_congestion_off mark.
In previous versions of pktcdvd, the driver accepted all
incoming bio write requests. This sometimes led to kernel
out-of-memory failures (maybe some bugs in the Linux kernel ;)
CAUTION: use these options only if you know what you are doing!
The default settings for the congestion marks should be OK
for everyone.



-Thomas Maier

Attachment: pktcdvd-patch-2.6.17.11.bz2
Description: application/bzip2

Attachment: pktcdvd.bz2
Description: application/bzip2