illumetrics – Tracking the growth of the illumos community

Even though last semester was probably the busiest of my college career so far, I managed to get more involved with the illumos community. I submitted my first bite-size bugfix to illumos-gate and then quickly moved to illumos-core to kill a warlock. In the midst of exploring the internals of illumos and interacting with the community, I started having uneasy doubts about the future of the illumos project.

I would never expect the illumos community to grow as rapidly as the Linux community, but in my eyes it seemed questionable whether the community was growing at all. I expressed my concerns to my good friend Nick Zivkovic, who has been using illumos for years and who introduced me to the whole project. We couldn’t come to a conclusion without actual data, and since I didn’t have much time back then, I asked him if he could quickly hack together a script that we could use to gather data from the commits in illumos-gate.

Nick took my suggestion very seriously and basically built a whole infrastructure that gathers data from commits and activity on git-based projects. He also came up with the name “illumetrics” for this small project and decided to put it on GitHub. Don’t be confused by the name: the project is not illumos-specific and can be extended to work with any project that uses git to manage its codebase. Excited by the new project, Nick created the illumetrics blog, where micro-reports and findings about the growth of the illumos community will be posted.

I will also be volunteering for the project when time allows. My personal hypothesis is that the number of people actually working on the internals of illumos-gate has stayed flat, if not declined, over the years. Hopefully, the data from the gate (or its forks from illumos vendors like Joyent or OmniTI) will prove me wrong. Still, whatever the case, knowing where we stand in terms of manpower can help us decide what outreach is needed in the future.

If you want to volunteer or add a project to the illumetrics list, feel free to email any of its volunteers (Nick and me for now) or submit a pull request to the GitHub repository. Any ideas, comments or feedback are also welcome!

Hopefully 2015 will be a good year for the community. Robert Mustacchi recently released an excellent developer’s guide, and I know that Garrett D’Amore and a couple of others from illumos will be at FOSDEM at the end of this month. What is more, there is a guide for writing device drivers, and the “Solaris Internals” book is still considered a very relevant reference for illumos. Given the above, I don’t see why aspiring kernel developers would not give illumos a try.

I am curious to see what illumetrics will have to say during the course of this year!

Notes on Testing: Nuking Legacy Code at illumos-core

Last Friday, Robert Mustacchi shared a great guide that new developers can reference when working with the illumos build system. As I was reading it, I came across the section on testing changes in the basic workflow chapter. In that section, Mustacchi gives some general guidelines on how a developer should decide how their changes should be tested.

Recently on illumos-core, I was working on an issue whose goal was to remove some legacy code, which required me to delete content from many files (around 293). After I made my edits and did a full nightly build, no errors showed up in my logs and I was able to boot into my new system. Garrett D’Amore reviewed my code to make sure that nothing suspicious was going on.

The problem was that many of the files I had changed belonged to drivers or subsystems that my system didn’t even use (random device drivers or SPARC-specific files). Since I am fairly new to illumos and kernel development in general, I had no idea how to test these specific changes. Luckily, D’Amore pointed me to the wsdiff(1) utility.

As you can see from the man page, if you keep your proto area from before applying your changes, you can use it with wsdiff to find differences in the binaries between “before” and “after” your changes. One thing to remember is that DEBUG builds tend to carry debug macros that emit line numbers (e.g., the __LINE__ macro), so if you run wsdiff between two DEBUG builds to test your changes, be aware that some, if not all, of the differences will come from there. (Check option -d of wsdiff for other relevant info.)
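
To make the workflow concrete, here is a minimal sketch of how this might look (the proto paths are hypothetical and the -r flag comes from my reading of the man page; double-check both against wsdiff(1) and your own workspace layout):

# Build before your change, then stash a copy of the proto area.
# (proto/root_i386 is an example; use your own machine's proto dir.)
cp -rp proto/root_i386 proto.orig

# Apply your changes, rebuild, and compare the two proto areas,
# logging the observed differences to a results file with -r.
wsdiff -r wsdiff.results proto.orig proto/root_i386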

Taking the issue I mentioned above as an example: running plain wsdiff after my changes generated a report with a huge number of differences, all due to line numbers. Thus, I had to do two full non-DEBUG builds to make sure that my binaries stayed the same after my changes. Judging from Example 4 of the wsdiff man page, it is apparently also possible to invoke wsdiff through nightly(1).
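
If I am reading that example correctly, this amounts to adding the w flag to NIGHTLY_OPTIONS in your environment file; the sketch below is my guess at what that looks like (the other flags are just a sample set, so keep whatever your env file already has):

# 'w' asks nightly to report on differences between the previous
# and the current proto area (see Example 4 in wsdiff(1)).
export NIGHTLY_OPTIONS='-wFnCDAlmprt'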

Beyond my case study of removing legacy code, though, I believe that wsdiff has its place in other testing situations as well, such as changes that touch a lot of files or changes that involve libraries. Basically, any change that risks modifying binaries it is not supposed to touch.

EDIT: A big thanks to Nick Zivkovic for pointing out mistakes in the initial version of this post.
EDIT 2: I suggested adding a section about this to the original guide. The above information is now part of the dev-guide.

Scripting GDB

Motivation

Two semesters ago, I was introduced to gdb in my systems programming course. Since this was the first time I had used an actual debugger, I was really impressed by the things it could do. I wanted to see what gdb was capable of, so I tried every single command on simple programs. Unfortunately, by the time I actually needed gdb in my class projects, I had forgotten most of them.

To be honest, I was just using gdb whenever my programs got segmentation faults. I would quickly run to the point where the program segfaulted and print a bunch of variables to see what was wrong. Sometimes I’d also do a backtrace. This is probably more than enough for small and simple programs, but as complexity grows it barely helps with debugging. On many occasions, the bug originates at a point in the code far from the point where its existence becomes apparent. In those cases, just skipping to the end where the error shows up doesn’t really help.

As a tutor for CS classes at my school, I saw a lot of students with buggy programs that misbehaved this way. Since their programs were fairly large for course projects, I decided to encourage them to use gdb more often and to learn more commands. Although that helped maybe 30% of the students, the majority still had problems: they didn’t want to use gdb because it felt really tedious and repetitive.

They were right! Most of their bugs would show up seemingly at random in a function after, say, the 20th time that function executed, and they still couldn’t tell whether the bug had started before that function was even called. Trying to figure out the root cause of the problem by running gdb in interactive mode was a pain. They preferred looking at their source again and again to figure out the problem. Some of them would even rewrite their code using different (and usually more complex) program logic, hoping that the bug would go away.

I decided to have a look at the scripting capabilities of gdb to see if I could help the students more efficiently. I did find some useful information and managed to help a lot of students by showing them how to run gdb scripts. That whole experience motivated me to write this small tutorial on basic gdb scripting. Don’t get me wrong: I secretly like the challenge of debugging. Also, scripting gdb is really easy and doesn’t take much time to teach. On the other hand, I would rather spend my time as a tutor actually teaching people new material than frying my brain trying to understand and debug someone’s code for hours.

Tutorial

By default, gdb executes the file .gdbinit during startup. This is where you write your gdb code. If you want to have many scripts that test different things, you can tell gdb to read a script other than the default one by adding the --command=<filename> argument when running gdb.
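
For example, here is what that might look like (trace.gdb is just a made-up script name):

# Read commands from a specific script instead of the default .gdbinit:
gdb --command=trace.gdb ./a.out

# -x is the short form of --command:
gdb -x trace.gdb ./a.out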

So let’s say that you want a backtrace every time a specific function is called. This is extremely useful when debugging recursive functions! You would write the following code:

set pagination off
set logging file gdb.output
set logging on

# This line is a comment
break function_name
  commands 1
  backtrace
  continue
end

run

set logging off
quit

So what happens here? I turn pagination off so I don’t have to press Enter for every page of gdb output. I declare that I want to log all of gdb’s output to a file called gdb.output and start logging. Then I set a breakpoint in my function and use commands <breakpoint number> (in this case 1, because function_name is our first breakpoint) to provide the commands that I want to be run whenever gdb hits that breakpoint. As you can see, I just print a backtrace and then continue until gdb hits the breakpoint again. Now that everything is set, I run the program, and when it finishes I stop logging the output and quit gdb. That’s it! Now I can run gdb once and look at its output by opening gdb.output.

By the way, if your program takes certain arguments, you can pass them through gdb. For example, run gdb --args ./a.out arg1 arg2 …etc. Another way is to find the line in your script that executes run and change it to run arg1 arg2 …etc.

Most of the time, though, you don’t really need all this output. What if you needed to break in a function only when a certain parameter of that function is passed a specific value? What if you needed to break at a certain line of code the 3rd time it is executed? You can do all of these things, as I am going to show in the example below. So let’s say I want to break in function1 when one of its arguments, param1, is 32. I also want to break at line 142 of file.c when the variable x (which is in the same scope as the statement on line 142) is greater than 4. Finally, I want to set a breakpoint in function2 and have gdb break only the first 3 times this function is executed.

set pagination off
set logging file gdb.output
set logging on

# yes, you can declare variables ...
set $var = 0

break function1 if param1 == 32
  commands 1
  print param2
  print param3->member1
  continue
end

break file.c:142 if x > 4
  commands 2
  print y
  call checker_function()
  continue
end

break function2 if $var++ < 3
  commands 3
  print $var
  backtrace full
  continue
end

run

set logging off
quit

That’s it! This is the end of the tutorial. In my humble opinion, the above is more or less enough for programs of small to medium complexity. You can go ahead and learn more advanced features of gdb, or look into dynamic tracing, which has slowly been coming to Linux (most UNIX platforms have it already). Even if you don’t actively program on your own for the time being, knowing how to run and script a debugger helps a lot in improving open source programs. For example, if your favourite program suddenly crashes and you can send a crash/bug report, attaching a simple backtrace and other gdb output that displays extra information will save a lot of time for the developers who try to fix your problem.

Some Final Notes

Before I finish this post, I just want to give a piece of advice about using gdb. If your program crashes for whatever reason and you have a rough idea of where that happens, just look at the relevant part of the code for a while to see if it makes sense. When we are absent-minded, we often mistype things that still compile but produce off-by-one and other kinds of errors. If that part of the code is complex, go through it using gdb in interactive mode; don’t start writing or editing scripts right away to target a bug you just found. You may actually spend more time writing a script to find a bug than stepping through your program in your head or in a debugger.

I would also like to encourage writing more checker functions in your source code that verify conditions (or print debug info on the program output during development), rather than huge gdb scripts with complicated logic. Checker functions in your source can be called directly from your program AND from your gdb scripts too (see the 2nd example script). Besides, checker functions provide more flexible access to the internals of your program, and they are written in the same language as the rest of it.
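
As an illustration, here is a minimal sketch of what such a checker function might look like (the struct, the names, and the invariant being checked are all made up for the example):

#include <stdio.h>

struct node {
    struct node *next;
    int val;
};

/*
 * Walk a linked list and verify an invariant (here: no negative
 * values). Returns 0 if the list looks sane, -1 otherwise. You can
 * call it from your own code during development, or from a gdb
 * breakpoint's command list with "call check_list(head)".
 */
int
check_list(struct node *head)
{
    int i = 0;

    for (struct node *p = head; p != NULL; p = p->next, i++) {
        if (p->val < 0) {
            fprintf(stderr, "check_list: node %d has negative val %d\n",
                i, p->val);
            return (-1);
        }
    }
    return (0);
}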

EDIT: From a stylistic point of view, you should generally indent (with tabs or spaces) any statements between commands <num> and end, so that they kind of look like functions for breakpoints. I am sorry I wasn’t able to do so, but I have a hard time using tabs with WordPress.

EDIT2: A big thanks to Nick Zivkovic for pointing out some mistakes in the initial version of this post.

EDIT3: Tabs in code sections seem to work (kinda) now that WordPress supports Markdown.