How to get started in kernel hacking

Cameron MacKinnon (mackin@interlog.com)
Thu, 03 Apr 1997 02:16:24 -0500


Brian Smith wrote:

> Maybe a lot of the people wish to actually do kernel development, but
> haven't a clue how.. I'd love to work at it, and I have some ideas (stupid
> as they may be), and a rather decent knowledge of C.. however, I haven't
> the first clue how to get started in kernel hacking. I've been watching
> the list for about four months and reading code, etc., and just not
> getting anywhere... Problem is, I can't afford to go out and get a book or
> anything. I'm on an incredibly limited budget, and $40 books just aren't on
> the list of things I can get. So, what this boils down to is, are there
> any free (i.e. Internet, whatever) resources I can get to help me learn?

Brian's got a point, but if he combs his hair right, nobody will notice
8-)

I used to be like you, Brian. Now I'm mucking around in the SCSI drivers
working on SCSI networking. I'm not a pro, but I generally know what's
going on at least part of the time. Here's what I did:

I bought books. Here are my reviews:

LINUX Kernel Internals, Beck et al, Addison Wesley, 0-201-87741-4. I
read about a third of it. It's dated (1.2 kernels) and doesn't have
anything about SCSI in it, but it's the only Linux kernel book out
there. There's a new version out for 2.0 kernels, but only in the
original German.

"The Design and Implementation of the 4.4BSD Operating System",
McKusick et al, Addison Wesley, 0-201-54979-4. A much more readable
book, IMHO. It talks about the BSD design in general, why things changed
over time, why and how specific performance tradeoffs were made, and so
on.

"The Magic Garden Explained" or something like that, borrowed, pub. and
ISBN unknown. This book covers the design of System V Release 4 (SVR4)
very thoroughly, but it's not as easy to read as the BSD book.

Bottom line: Beg, borrow, check out or steal one book, any book, on the
design of the UNIX operating system. Sit in a library or a bookstore
reading it, if you haven't got the money. You need to understand how
schedulers, pagers, swappers, top and bottom halves, wait queues,
inodes, ttys, the boot process, init and some other stuff work. Most of
this stuff will be applicable to Linux at the concept level, regardless
of the book (ignore anything on SysV STREAMS). Unless you're extremely
gifted, the concepts won't reveal themselves to you from kernel source
code. LEARN THE CONCEPTS. The Linux community is not a good place to do
this - this list assumes that if you're here, you already know them. If
you're one of those truly unlucky people with no access to such a book,
try to find this info on the net. I've never really looked. If all else
fails, proceed to step two:

I read Michael Johnson's Kernel Hackers' Guide. It wasn't perfect when I
read it, but that was a while ago. 1) It's probably perfect by now. 2)
It's free. You can get it anywhere, including here:
http://www.redhat.com:8080/HyperNews/get/khg.html
It does a good job of mapping the concepts you just learned to actual
kernel function calls and processes in Linux. Also, many kernel
functions have man pages, though they're horribly out of date.

I subscribed to mailing lists. Initially I was all over: gcc, kernel, a
few scsi lists, security... Now I've got it down to a core of kernel,
two SCSI driver lists, DIALD, security and SMP. Don't be afraid to
subscribe to a lot of lists (read-only!) for a few weeks to see what
interests you. You can always unsubscribe later. Some people prefer
reading the lists via news, but I'd recommend mail: You SAVE the mail on
your hard disk. It becomes your personal reference library (N.B. UNIX
has some really great text search and processing tools). You read all
the mail. This gives you a feel for what's being worked on and what's
not, who knows what they're talking about and who doesn't, and what
snags are troubling other users. This is important so you can ask senior
developers PRIVATELY when you have questions relating to The Code -
unless you genuinely believe that a lot of list subscribers also want
the answer. Also, some of the news gateways appear to be brutally
broken, randomly mixing messages from different linux lists like a
cypherpunk remailer gone mad. I recommend going straight to the source:
send "help" to majordomo@vger.rutgers.edu
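
To make the "personal reference library" idea concrete, here's a rough
sketch of searching saved list mail with grep. It assumes each message
is saved as its own file under one directory. The function names and
the ~/Mail/linux-kernel path in the comments are made up for
illustration, and grep -r is a GNU/BSD extension rather than strict
POSIX.

```shell
#!/bin/sh
# Hypothetical helpers for a directory of saved list mail,
# one message per file (that layout is an assumption).

# List the messages that mention a term, ignoring case.
#   search_mail TERM DIR
search_mail() {
    grep -r -i -l -- "$1" "$2"
}

# Count messages per sender, busiest first, using From: header lines.
#   top_posters DIR
top_posters() {
    grep -r -h '^From: ' "$1" | sort | uniq -c | sort -rn
}

# Example use (the path is hypothetical):
#   search_mail aha1542 ~/Mail/linux-kernel
#   top_posters ~/Mail/linux-kernel | head
```

The From: count is crude (it also catches quoted headers that happen to
start a line), but it's enough to get a feel for who posts a lot.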

I quickly got over the idea that I could learn everything about the
kernel. Last time I looked, it was over 600,000 lines of source. I can
muck around with SCSI and network device drivers, I understand the mid
level SCSI code, and I've got a reasonably good handle on the scheduler.
That leaves high level networking, filesystems, the buffer cache and
memory management, to name a few, ABOUT WHICH I HAVEN'T A CLUE. Pick an
area you want to diddle with, and concentrate on that. If you don't
believe me, grab a dictionary and look up "hubris".

I read most (some?) of the important stuff in Documentation/ (you should
read it all) and then: I dove into the code, wholeheartedly, for nights
(days?) at a time. Pick drivers. Concentrate on the simple ones - you
want concepts, not nasty workarounds for buggy hardware. Try 'wc
*.c|sort' in your favourite directory. Pick ones that look well
formatted and well commented, and see how they're written and how they
interact with the higher level stuff. Go into each subdirectory in the
whole linux/ tree, and learn what lives there. You should be able to
identify what's what from the stuff you read in those books. Note
especially mm/ and kernel/, along with their counterparts under arch/.
Here lie most of the important functions for juggling memory,
interrupts, processes and so on. Learn to use grep, find and xargs
effectively. If you have a strong constitution, look in the scripts/
directory and the Makefiles everywhere to see how the kernel actually
gets built. If you're a bit twiddler at heart, look at the low level
stuff for your favourite architecture under arch/.
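
The 'wc *.c|sort' trick and the grep/find/xargs advice above might look
something like this in practice. Treat it as a sketch: the function
names are made up, drivers/char and request_irq are just an example
directory and an example symbol to hunt for, and -print0 / -0 are
GNU/BSD extensions rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helpers for poking around a source tree.

# Line counts for the .c files in one directory, smallest first.
# sort -n compares the counts numerically; the "total" line ends up last.
#   count_sources DIR
count_sources() {
    wc -l "$1"/*.c | sort -n
}

# List the .c and .h files under a tree that mention a symbol.
# find hands the file names to grep through xargs; -print0 and -0
# keep file names with odd characters in one piece.
#   find_symbol SYMBOL DIR
find_symbol() {
    find "$2" -name '*.[ch]' -print0 | xargs -0 grep -l -- "$1"
}

# Example use, assuming the current directory is the top of a kernel tree:
#   count_sources drivers/char
#   find_symbol request_irq drivers
```

The short files at the top of the first listing are good candidates for
a first read. The second command tells you where else to look once you
trip over a function you don't recognise.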

If you've still got the lust for knowledge at this point, you will
probably have found "that special something" that interests you in the
kernel. You will know generally how things work from the source, and you
will know the right people to ask from the source and the mailing lists.
If you have a question, go ahead and ask it. I've found developers to be
very helpful when asked questions by someone who's obviously studied the
sources. Play around. Recompile. Benchmark. Test.

One thing that's probably overlooked by a lot of Linux people: BSD, "the
other free UNIX". I can't even tell you the difference between FreeBSD
and NetBSD, but for my purposes, I don't care. They're available free on
the net or a CD, just like Linux (ftp://ftp.freebsd.org and
http://www.freebsd.org). If you're stumped by something in Linux, seeing
how BSD does it is often helpful, especially for device drivers. Also
(ahem) BSD code sometimes seems to be commented and formatted somewhat
better. I don't run it, I just look at the source.

At this stage your hats will no longer fit, and your dog will have run
off with your girlfriend. No matter, because you'll be able to ask, and
sometimes answer, intelligent questions about kernel design, in your
particular specialty areas. You'll be fixing insidious bugs, improving
performance, and posting things like "this patch is from memory and
untested, but it will solve your problem on 2.1.87: [proper patch
syntax]"
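
For what it's worth, "proper patch syntax" on the list usually means a
unified diff between a pristine tree and your modified one. A minimal
sketch, assuming the two trees sit side by side as linux.orig and linux
(those directory names and the function name are my own choices, and -N
is a GNU/BSD diff extension):

```shell
#!/bin/sh
# Hypothetical helper: print a unified diff between two source trees.
# -u unified format, -r recurse into subdirectories,
# -N treat files missing on one side as empty.
#   make_patch ORIG_DIR NEW_DIR
make_patch() {
    # diff exits 1 when the trees differ, which is the normal case here,
    # so only an exit status above 1 is passed on as a failure.
    diff -urN "$1" "$2" || [ $? -eq 1 ]
}

# Example use, run from the directory that holds both trees:
#   make_patch linux.orig linux > my-fix.patch
```

Someone receiving a patch made this way can apply it with patch -p1
from inside their own tree.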

I'm not at this stage yet, and I've been working at it for a while.
That's why I usually post answers to questions like "where do I begin"
rather than "why did it hang". The above is working for me, it might
work for you. May the Source be With You, Always.