Macromancer's Spellbook: Using Macros to Stay Dry

31 May, 2018

It is sometimes hard to see past the thick haze of caution surrounding the use of C Preprocessor macros in your code. Indeed, the dangers of Macromancy are well-stated in many corners of the internet. You would do well to heed them.

However, there are times where a judicious use of macros can help reduce duplication in your code and make it tidier and easier to reason with. Every time a block of code is duplicated by hand, it is likely to become yet another maintenance burden to contend with. If it just so happens that some of these blocks of code contains a bug… well… you get to go fix it in more than one place.

I know! Why don’t we just extract the duplicated code into a function? Good instinct. In many cases, this may be the perfect solution to your problem. In my case, it wasn’t.

I recently upgraded the scheduling algorithm for my distribution of xv6 (a learning OS). I abstracted away the process table (an array of processes) by building multiple circular-linked state lists on top of it. Now, we have the infinitely preferable solution of not having to iterate over the entire process table when we want to find and manipulate some active processes.

Shortly into the implementation, I met my DRY monster for the project. There are many areas in the scheduling and process-handling routines where I need to traverse these state lists the exact same way, but interact with the processes differently while I’m traversing.

In my situation, the redundant code was the traversal of one or more state lists. However, embedded in those traversals were routine-specific blocks of code that operated on one or more unique local variables to that routine. As you can see, I don’t have a one-size fits all operation that could be neatly extracted into a function. I have a control structure and context-specific behavior to do within.

I could have certainly just copied and pasted my state list traversal logic into the four different routines where I needed it, but it felt so painful. What if I made a mistake? If four system-critical process routines were compromised with some stupid bug it would be was an absolutely hellish nightmare to debug. Trust me.

Here’s what it looked like at first:

struct proc **lists[] = {
  &ptable.plists.ready,
  &ptable.plists.run,
  &ptable.plists.sleep,
  // ... snipped..
};

for (int i = 0; i < NELEM(lists); ++i) {
  struct proc *rear = *(lists[i]);
  if (rear) {
  p = rear->next;
  do {

    // routine specific stuff goes here!

    p = p->next;
  } while (p != rear);
 }
}

It seems simple enough. It takes an array of circular-linked lists for it to traverse, and then inside the inner do-while is the routine-specific logic.

Copying and pasting it four times is too painful. C Preprocessor macros are just token expansions. Let’s have the compiler copy and paste it for us. The control structure for looping over an array and traversing circular-linked lists therein is the same for all four functions.

#define STATELIST_TRAVERSE(lists, cursor, code) for (int i = 0; i < NELEM(lists); ++i) { \
  if (*(lists[i])) { \
    cursor = (*(lists[i])); \
    struct proc *next = cursor->next; \
    do { \
      { \
        cursor = next; \
        next = cursor->next; \
        code \
      } \
    } while (*(lists[i]) && cursor != *(lists[i])); \
  } \
} \

It’s just a little bit hideous at first, but I got used to it as soon as I simply used that macro four times and everything just worked.

Let’s dissect this a little bit.

The backward slashes on the end of each line are so that the C Preprocessor knows that the macro substitution I’m defining will span multiple lines.

The macro takes three arguments, lists, cursor, and code. I designed it this way because I want the contract to be very clear regarding the macro’s usage. Whoever wants to use the macro must provide an array of circular-linked lists for me to traverse, they must provide their pointer by which the lists will be traversed (I call this the “cursor”), and they must supply the code they want executed with each iteration. The macro requires the caller to supply the “cursor” to the macro so that they may interact directly with whichever process the loop is currently visiting through that cursor. Basically, they can use the cursor like a really simple iterator for access to the element.

By doing this, I believe that the macro is transparent and does not obfuscate or try to take control from whoever is using it. It does not heavily pollute the routine with variable declarations within. It is a reliable scaffolding for its one purpose: traversing through a number of circular linked lists while exposing the elements to the programmer for whatever purposes they need.

Let’s see it in action. Here’s a part of the exit() routine for when a process is terminating:

// ...snipped
  struct proc **lists[] = {
    &ptable.plists.ready,
    &ptable.plists.run,
    &ptable.plists.zombie,
    &ptable.plists.sleep,
  };

  // Pass abandoned children to init.
  STATELIST_TRAVERSE(lists, p, {
      if (p->parent == curproc) {
        p->parent = initproc;
        if (p->state == ZOMBIE) {
          wakeup1(initproc);
        }
      }
  });
// ...snipped

Granted, it does look a little weird at first. As you can see, I create an array of circular-linked lists that I want the macro to traverse. Different routines will need to visit different numbers of lists, so this set-up is required for each call-site.

I pass in my pointer that I declared earlier in the routine. This will be used as my “cursor” to traverse the circular-linked list of processes.

The third argument, which I wrap in curly braces, is the routine-specific code block that I want executed at each process in the list. As you can see, I use the pointer that I passed in as the “cursor” so that I may interact directly with the element within the list.

I did it the hard way without the macro and it was painful. I abstracted it away into a macro and it trimmed down the redundant lines of code significantly across my process management routines.

Using a macro shouldn’t be your first choice for everything. If function extraction is not possible, or you are working in an environment that cannot support the overhead of function calls, etc. a macro can certainly help.

Remember to be cautious about what your macros are doing, what variables they introduce, and how they operate on arguments passed in to them.

Bonus story: I also created control commands for printing the contents of my various state lists with different formatting. Guess what really came in handy for that? The macro you just read about.