This post is intended to be a very straightforward walkthrough of XV6. I will explain what XV6 is, how to navigate it, and how to do some very simple tasks in the kernel code. Hopefully, this will prevent XV6 from seeming so overwhelming.
While XV6 is the first interaction with a “large” codebase for a lot of students, the actual codebase is relatively small compared to other operating systems. It is also relatively well documented for this kind of codebase, although it may not be obvious how to navigate it. Thankfully, we do not need to understand everything up front. In this post, we will discuss how it is organized but only go into detail about the parts that we need.
Part One: Getting the Code
The code is mirrored at my github: https://github.com/tallendev/xv6-cuspring2019
To get the code, let’s clone it from GitHub by running git clone followed by the repository URL above.
This will replicate the repository in our directory.
Optional: Set Up A Git Repo
Now that you have the source code, it would be good to start tracking your own changes. With github, our code will be available anywhere and you will have many layers of backups in case you need to roll back your changes. I will walk you through how to upload to your own git repository. I will be using Github for this.
- Create a Github account and log in (it’s free!): https://github.com/join
- Create a new repository: https://github.com/new
- Make sure it is private so that we can implement our projects here.
- Back on the command line, enter the xv6-cuspring2019 directory.
- cd xv6-cuspring2019
- Change our remote (where on the internet we will save our changes)
- git remote set-url origin [yourgitrepo]
- Finally, let’s push our repo “upstream” to our online repo.
- git push
- Note that I am using the ssh client. To do this, you will need to generate a key, which will take some more steps. It is up to you. The https version will just work, but you will have to enter your username and password every time you want to push updates. You can follow the ssh guide here: https://help.github.com/en/articles/connecting-to-github-with-ssh
- Now, if we return to our GitHub page (https://github.com/[yourusername]/xv6 or similar), we see that all our code is there! Later, I will show you how we can use this to back up our changes.
Part Two: First Look at XV6
So, what is XV6? XV6 is an operating system. Other operating systems you may be
familiar with are Linux, Windows, and MacOS (which is secretly Unix). Now, if
you are reading this, it is probably because you are coding in XV6. The objective
of your project likely involves some “kernel hacking,” and, of course, the XV6 source
code includes the kernel source code. It also includes some basic user library
functionality so that we can write userspace programs for XV6. These are the
two main categories for files in XV6: user, and kernel. It is important to
understand this separation because, as you know, user-mode programs cannot
directly use kernel functionality. User programs must interact with the kernel
through “syscalls,” special functions that ask the kernel to execute some kernel
code with kernel privileges. First, let’s take a look at XV6.
One of the first things people notice about XV6 is that it has a lot of code.
If we do a simple ls…
That’s a lot of files! Surely we don’t have to navigate this ourselves.
Let’s turn to an important resource: The manual. The manual is composed of two parts: The manual proper and the code. We will look at the code first. The code is available here:
The first page of the code tells us some important things. First, it tells us how the files are organized:
Notice that things are grouped by their relevance. “Basic Headers” lays out a lot of constants and conventions for XV6. We don’t need to look at those right away. “Entering XV6” is the code that starts XV6. Note that XV6, the operating system, is a program just like any other program that you have ever written. It does have some special requirements, though. Since it is the first code that the CPU runs, it needs some special machinery to set up and launch the kernel itself; this is in entry.S. Ignoring that for now, main.c contains the main function for the OS! You could trace it to see how the kernel gets set up, but I will leave that for another time. The “processes” section is also important - these files contain the key machinery for running user programs and enabling OS features like multitasking and multiprogramming. We will not go to a low enough level in this course to investigate the low-level hardware section. Finally, in the user-level section, init.c contains the code for the first user process. init is always running until the OS is shut down. An important file missing from this list is user.h, which contains the declarations for user-level functionality.
If XV6 is an operating system, does that mean I need to install it to replace my Linux/Windows/MacOS machine? No - we will be using QEMU as our system emulator. QEMU is software that acts like a virtual computer - it can run the OS for us, so that we do not have to run XV6 over our main OS. We can also restart and debug without physically rebooting the machine. Let’s try the OS out.
To run XV6, use the command make qemu. This will not work if we are
not present at the physical machine - it is a “graphical” emulator, so running it
remotely requires “X forwarding” support, which I will not discuss here.
Alternatively, try running make qemu-nox, where “nox” stands for “no X”.
After an involved compilation process, we’re in! This is XV6. Below we will find just the qemu portion:
We see QEMU starting up XV6, two emulated CPUs being initialized (configurable in the Makefile), and some printout statements about our file system. Finally, init starts sh, the shell that we will work with. You have certainly seen and used bash before, the shell most commonly used on Linux and MacOS systems. The shell is a very powerful tool; I recommend you try to use it as much as possible when developing. If you don’t have an IDE taking care of basic shell functions for you, and you don’t know the shell, your development process might be very slow. Let’s run a basic ls to see what we have to work with:
These are user programs/files provided by the XV6 implementation. Many of these are implementations of tools you can also use in bash. These are all the tools your system has to work with; XV6 is very, very simple. You can add files to XV6, but we will discuss that later. For now, let’s back out: press “ctrl+a” then “x” to exit XV6.
Part Three: Let’s Add a Program!
XV6 can run basic user programs. Let’s write a simple one to demonstrate. Primarily, we will be looking at (1) the minor irregularities in the XV6 standard library compared to the standard library that you are used to, and (2) the Makefile.
Here is our simple hello.c in our XV6 directory. Notice a few things. First, printf() has an argument before the string. This is the file descriptor that we want to print to; in this case “1” is synonymous with “stdout.” This is closer to how fprintf() works on systems that you are used to. This is just a quirk of the standard library in XV6. You should define “stdout” as “1” somewhere; probably user.h. Also notice that instead of a return, we have a call to exit() that takes no arguments. Returning from main is a special case that is not supported in XV6, so just remember to always call exit() at the end of main instead of using return. Otherwise, you will get a “trap 14,” which in this case is XV6-speak for “Segmentation Fault.” Finally, notice that our headers are different. There is no #include <stdio.h> as you might be used to. Again, this is because the standard library is different. Check out these headers to see what kind of user utilities you have available.
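To make the file-descriptor-first printf() less mysterious, here is a small sketch that runs on an ordinary Linux/MacOS machine, not inside XV6. The names fd_printf and STDOUT_FD are made up for this illustration, and the XV6 program shown in the comment assumes the header names from stock XV6 (types.h, stat.h, user.h), so check your own tree before copying it.

```c
/*
 * For reference, the XV6 version of hello.c would look roughly like this
 * (header names assumed from stock XV6; verify against your checkout):
 *
 *     #include "types.h"
 *     #include "stat.h"
 *     #include "user.h"
 *
 *     int main(void) {
 *         printf(1, "Hello, world!\n");   // 1 is the fd for stdout
 *         exit();                          // do not return from main in XV6
 *     }
 *
 * Everything below is a host-side imitation of that printf() signature.
 */
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

#define STDOUT_FD 1 /* hypothetical name; XV6 just uses the literal 1 */

/* Format into a local buffer, then write the bytes to the given fd.
 * Returns the number of bytes written, or -1 on error. */
int fd_printf(int fd, const char *fmt, ...) {
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;
    if ((size_t)n >= sizeof buf)
        n = (int)sizeof buf - 1; /* output was truncated to fit the buffer */

    return (int)write(fd, buf, (size_t)n);
}
```

Calling fd_printf(STDOUT_FD, "Hello, world!\n") on the host prints to the terminal, which mirrors what printf(1, ...) does in an XV6 user program.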
To test this, we need to add the file to our Makefile to be built, as well as
include it in the list of files migrated to XV6. First, append the file hello.c
to the Makefile variable EXTRA.
Then, append _hello to the Makefile variable UPROGS (user programs).
Now, we are ready to try it out!
- Run make qemu-nox
- Run hello
And that’s all it takes to make a simple program for our XV6 VM! The next section will be about the same thing, but in kernel space.
Part Four: Let’s Write A Syscall!
Syscalls are our way of interacting with the kernel. Why do we need to interact with the kernel? Well, the kernel plays many roles:
- It provides a common abstraction for interacting with hardware.
- Otherwise we would have to write assembly code specific to each hard drive, network interface, etc…
- It provides safe ways to interact with the scheduler and other kernel features
- Example: sleep relies on the scheduler to wake us up only after a specified amount of time.
- Example: kill lets us send signals to other processes.
- Technically, this is still hardware (CPU) abstraction.
By adding a syscall, you are adding a “function” to the kernel that user code
can call. It’s all about safety. The kernel makes syscalls available because they’re useful,
but doesn’t allow any arbitrary kernel code to be called by any process. Otherwise,
any rogue process could take down your system either maliciously or accidentally.
Let’s write the kernel equivalent of “Hello World.” Where do we start? If we crack
open the XV6 manual to page 9, we will find a list of syscalls.
Let’s see what they did to implement getpid(), a relatively simple syscall that gets the process id of the calling process. We will use one of the most useful tools available on most systems in one form or another: grep. This tool just searches for a pattern that you give it among a list of files. No need to dig through every file individually!
Ignoring the two test calls, it looks like there are six appearances of getpid in the kernel. Most likely, we need to replicate all of them for our hello syscall.
- First, we will look at syscall.h. Every syscall has a number, starting
from one. We need to take the next available number. For me here, it’s 22.
- Next, let’s look at sysproc.c to find our actual syscall. We find
getpid. For now, let’s just add our syscall below it. Notice that all
syscalls have the same header: they return an int and take no parameters.
- Now that our code is in place, we need to add our syscall to the syscall table.
The table is in syscall.c. The syscall table, called syscalls,
is just an array of function pointers. Syscalls are a little different than
normal functions; your code is going to pass an integer to the kernel designating
what syscall it wants. Then, that syscall is looked up from this table. Let’s add ours to the bottom, using the
syscall number that we already selected.
- Still in syscall.c, let’s add the prototype for our function. They use the
keyword extern here to indicate that the actual code is in another file
that will be linked against this one.
- We are almost through this bureaucratic process. The kernel part is done - now
we just need to make our syscall available to users. First, we look in usys.S.
In this file, we find a macro definition, and then this macro is applied to each of
the syscall names. This is how we define our syscall in userspace. At the end
of the file, we append our SYSCALL(hello). Let’s stop for a second and look
at this macro. This function-style macro creates an assembly-level function. Note
the .globl name directive, indicating that our argument (hello) will be used as
a global label (function). When code jumps to this label, it will move the value of
$SYS_hello (our syscall number) into the eax register - passing
an argument. Then it will issue the int instruction with the value $T_SYSCALL.
The instruction int raises a software interrupt. The value indicates
which interrupt - if you check traps.h, you will find the numbers for all traps and
interrupts, and find that T_SYSCALL is 64. traps.c contains
the entry point, which will eventually call the function sys_hello that we
wrote via the syscalls table.
- One last step! Everything is in place, so let’s add our syscall to the
user.h header file so that user programs can see it.
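The steps above can be sketched as a small simulation that runs on a normal machine rather than in XV6. It is only a model of the flow: the fake_ names, the one-field trapframe, and the pid value 3 are invented for illustration, SYS_getpid = 11 is assumed from stock XV6, and the real kernel does considerably more work at each step.

```c
#include <stddef.h>
#include <stdio.h>

#define T_SYSCALL  64 /* trap number for syscalls, as found in traps.h */
#define SYS_getpid 11 /* assumed from stock XV6 syscall.h */
#define SYS_hello  22 /* the next free number we picked */

/* Stand-in for the saved user registers; only eax matters here. */
struct fake_trapframe { int eax; };
static struct fake_trapframe tf;

/* "Kernel" side: every syscall has the same header. */
static int sys_getpid(void) { return 3; } /* made-up pid */
static int sys_hello(void) { puts("Hello from the simulated kernel!"); return 0; }

/* The syscall table: an array of function pointers indexed by number. */
static int (*syscalls[])(void) = {
    [SYS_getpid] = sys_getpid,
    [SYS_hello]  = sys_hello,
};

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

/* Look up the number found in eax and store the return value back in eax. */
static void fake_syscall(void) {
    int num = tf.eax;
    if (num > 0 && (size_t)num < NELEM(syscalls) && syscalls[num] != NULL)
        tf.eax = syscalls[num]();
    else
        tf.eax = -1; /* unknown syscall number */
}

/* Trap entry point: only the syscall trap is modeled. */
static void fake_trap(int trapno) {
    if (trapno == T_SYSCALL)
        fake_syscall();
}

/* "User" side: conceptually what SYSCALL(name) generates in usys.S.
 * Put the number in eax, raise the trap, read the result from eax. */
int fake_user_stub(int sysnum) {
    tf.eax = sysnum;
    fake_trap(T_SYSCALL);
    return tf.eax;
}

int fake_hello(void)  { return fake_user_stub(SYS_hello); }
int fake_getpid(void) { return fake_user_stub(SYS_getpid); }
```

Calling fake_hello() walks the same path as the real thing: stub, trap, table lookup, sys_hello, and back. Passing a number with no table entry returns -1, which is why forgetting the table step leaves a syscall unusable even though its code exists.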
Alright, let’s modify our hello.c code to call our syscall.
You probably need to run a make clean before your make qemu-nox. Then just run our program again as normal:
And that’s it! We made and ran our syscall! I should note that printing in the kernel is usually quite bad. In this case it is the intended behavior, but on real systems kernel printing is reserved for logging information and will not work the same way as regular prints. A real syscall should do something more involved, but this will suffice for now.
A minor note: other syscalls take arguments. You will need to read an example of those in sysproc.c, because passing arguments works differently than in a normal function call. In particular, you must use argint or argptr to fetch them out of the calling process’s memory.
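To see why a helper like argint is needed, here is a rough host-side model of the idea, not XV6 code. It assumes the 32-bit x86 convention where the user stack holds a return address followed by the arguments, and every name in it (fake_proc, fake_fetchint, fake_argint, fake_setup_call) is hypothetical. The kernel function receives no parameters, so it has to read the n-th argument from the user stack itself, after checking that the address is inside the process.

```c
#include <stdint.h>
#include <string.h>

#define FAKE_MEM_SIZE 64

/* Stand-in for a process: a tiny "user address space" plus saved esp. */
struct fake_proc {
    uint8_t  mem[FAKE_MEM_SIZE];
    uint32_t sz;  /* size of the address space in bytes */
    uint32_t esp; /* user stack pointer at the time of the syscall */
};

/* Copy a 32-bit value out of "user memory", refusing bad addresses. */
int fake_fetchint(const struct fake_proc *p, uint32_t addr, int32_t *ip) {
    if (addr >= p->sz || addr + 4 > p->sz)
        return -1;
    memcpy(ip, &p->mem[addr], sizeof *ip);
    return 0;
}

/* Fetch the n-th argument: skip the return address (4 bytes), then
 * step over n earlier 4-byte arguments. */
int fake_argint(const struct fake_proc *p, int n, int32_t *ip) {
    if (n < 0)
        return -1;
    return fake_fetchint(p, p->esp + 4 + 4 * (uint32_t)n, ip);
}

/* Test helper: lay out [return address][arg0][arg1]... at the top of
 * memory, the way a user-side call would have pushed them. */
int fake_setup_call(struct fake_proc *p, const int32_t *args, int nargs) {
    if (nargs < 0)
        return -1;
    uint32_t frame = 4u * ((uint32_t)nargs + 1u);
    if (frame > FAKE_MEM_SIZE)
        return -1;

    memset(p->mem, 0, sizeof p->mem);
    p->sz = FAKE_MEM_SIZE;
    p->esp = p->sz - frame;
    for (int i = 0; i < nargs; i++)
        memcpy(&p->mem[p->esp + 4 + 4 * (uint32_t)i], &args[i], sizeof args[i]);
    return 0;
}
```

The bounds check is the important part: the arguments come from user memory, so the kernel cannot trust the address to be valid. The syscalls in sysproc.c that take arguments follow the same pattern of fetching and validating each argument before using it.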
Optional: Submit our Changes to Github
Now that we have made some changes, let’s add them to Github for safe-keeping.
- git status will show us what git knows about our files.
- All files must be “added” to git. First, let’s add our changed files using git add -u (-u for update).
- Our hello.c is new, so let’s add it manually. git add hello.c
- Now all our changes are “staged” by git. Let’s create a “commit.” A commit is when you “commit” your changes to git, saying “git remember the changes to these files.” Let’s issue a
git commit -m "adding hello syscall and testfile"
The -m indicates the commit message. In git, you are able to review all of your commits over time. The message helps you remember what you changed. Git only records your changes when you commit, so commit often so that you have many versions to roll back to.
- As a last step, let’s do a git push to push our changes to GitHub! Check your GitHub page, and you will see the changes listed.