Debugging Foreign Function Interface Weirdness in a Release Build

Our continuous integration setup over at Enarx has evolved into a full-blown testing matrix which exercises the codebase in both debug and release profiles across all feature configurations. This has helped dredge up a number of bugs and allowed us to clean up some conditional compilation issues that we don’t normally see.

Interestingly enough, one of my recent pull requests was passing tests across the board except for one release build configuration. Its debug counterpart passed without issue. To tell the truth, I never tried the release build when I was developing the pull request in the first place.

That seemed pretty strange, so the obvious starting place was to reproduce it in my development environment. Sure enough, I could reproduce it 100% of the time.

The code in my pull request was based on interfacing with KVM using ioctls on /dev/kvm and I could see from the stack trace that one of my ioctl calls was failing. How weird that it would only fail in a release build but not a debug build!

Regardless, I still had some good information from the stack trace. It looks like when the ioctl fails, the error code I get from the kernel is -ENOMEM. The ioctl that’s failing is KVM_MEMORY_ENCRYPT_REG_REGION. Generally a good place to look when you get an error code from the kernel would be the kernel’s message buffer. A quick peek at that with dmesg reveals:

[1203871.615759] SEV: 34155266049 locked pages exceed the lock limit of 16.

34,155,266,049 pages?! No way. I’m trying to lock only a single page with my ioctl. Alright, well, let’s turn to the age-old printf debug strategy for some quick feedback as to what I’m actually sending into the kernel.

Here’s the struct that the kernel ioctl expects to carry the ioctl arguments:

struct kvm_enc_region {
	__u64 addr;
	__u64 size;
};

I can print out the number of pages I’m sending into this by dividing the size field by the size of a page (4096). My printf debugging confirms that I’m only sending in a single page in both debug and release builds.
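As a rough sketch of that printf debugging, here is the kind of helper I mean. The name describe_region is hypothetical and not from the pull request. It only formats the values as the Rust side sees them, which is not necessarily what the kernel ends up reading.

```rust
/// Page size assumed throughout this post (x86_64, 4 KiB pages).
const PAGE_SIZE: u64 = 4096;

/// Hypothetical debug helper: format a region the way a quick printf would,
/// including the page count derived from the size in bytes.
fn describe_region(addr: u64, size: u64) -> String {
    format!("addr={:#x} size={} pages={}", addr, size, size / PAGE_SIZE)
}
```

Calling describe_region(0x7f0000001000, 4096) reports a single page, which matches what I saw in both build profiles.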

That doesn’t point to the root cause, but it still helps me figure out where to look next. It has to be my code, because I can’t think of a reason why the Linux kernel would behave differently based on whether the calling process’s binary was compiled in release or debug mode.

Let’s take a look at the error case I’m running into inside the kernel. grep shows a few different places that KVM_MEMORY_ENCRYPT_REG_REGION is used.

case KVM_MEMORY_ENCRYPT_REG_REGION: {

That looks pretty promising. Let’s take a look at the call site in arch/x86/kvm/x86.c. Nothing here looks like it could return -ENOMEM very early, so I looked at some of the other functions called by this one.

There’s only one function here that looks like it influences the return value, so let’s jump into svm_register_enc_region. The first couple of error checks aren’t interesting here because they don’t return -ENOMEM.

There is one that returns -ENOMEM, and that’s only if kzalloc fails. Unlikely. And furthermore, I haven’t seen anything so far that would result in this message appearing in the kernel log:

[1203871.615759] SEV: 34155266049 locked pages exceed the lock limit of 16.

Immediately under that though is a call to a function called sev_pin_memory. Aha! This is promising. grep says this is defined in arch/x86/kvm/svm/sev.c. Not far into that function I see this:

	/* Calculate number of pages. */
	first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
	last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
	npages = (last - first + 1);

	locked = sev->pages_locked + npages;
	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
		pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n", locked, lock_limit);
		return ERR_PTR(-ENOMEM);
	}

Nice. That’s the exact message that shows up in my kernel buffer! It looks like this code does a similar calculation to count the number of pages that the calling process wants to pin, adds that to the pages already locked, and then checks whether the total exceeds the lock limit derived from RLIMIT_MEMLOCK (unless the caller has CAP_IPC_LOCK).
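To play with that arithmetic outside the kernel, here is a minimal Rust transliteration of the snippet above. It is only a sketch: the function names are mine, it assumes 4 KiB pages, a non-zero length, and no address overflow, and it leaves out the capable(CAP_IPC_LOCK) escape hatch.

```rust
const PAGE_SHIFT: u64 = 12;
const PAGE_SIZE: u64 = 1 << PAGE_SHIFT;
/// Mask that clears the offset-within-page bits, like the kernel's PAGE_MASK.
const PAGE_MASK: u64 = !(PAGE_SIZE - 1);

/// Number of pages touched by the byte range [uaddr, uaddr + ulen).
/// Assumes ulen > 0 and that uaddr + ulen does not overflow.
fn npages(uaddr: u64, ulen: u64) -> u64 {
    let first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
    let last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
    last - first + 1
}

/// The limit check from the kernel snippet, minus the CAP_IPC_LOCK test.
/// `lock_limit` is already expressed in pages.
fn exceeds_lock_limit(pages_locked: u64, npages: u64, lock_limit: u64) -> bool {
    pages_locked + npages > lock_limit
}
```

A page-aligned address with a length of 4096 yields one page, and the same length starting halfway through a page yields two, because the range straddles a page boundary.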

Something is happening that’s causing npages to be a huge number when it should only be 1. It’s time to look more closely at what I’m sending across the foreign function interface.

The Linux kernel wants to copy a struct from the calling process that looks like this:

struct kvm_enc_region {
	__u64 addr;
	__u64 size;
};

Here’s my corresponding structure in Rust:

#[repr(C)]
struct EncryptedRegion<'a> {
    addr: u64,
    len: u32,
    _phantom: PhantomData<&'a ()>,
}

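The length field is wrong. The kernel expects two 64-bit fields, 16 bytes of data in total, but I am accidentally supplying only 12 bytes of meaningful data: a u64 address and a u32 length.

The kernel copies 16 bytes from my process, of which only 12 are legitimate. The 4 bytes of my len field get clumped together with the 4 bytes that follow them to form the kernel’s 64-bit size. With #[repr(C)] those following bytes are the struct’s trailing padding, whose contents are unspecified, so they can hold whatever garbage was already in memory. That would definitely explain why the kernel is calculating such an enormous number of pages.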
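The clumping can be sketched in a few lines of Rust. The helper size_seen_by_kernel is hypothetical, and the trailing bytes 0x3d 0x7f are not something I captured; they are simply one value that would reproduce the number in the dmesg output, assuming a page-aligned address and no pages locked beforehand.

```rust
use std::marker::PhantomData;

/// The mismatched struct from above. With #[repr(C)] and 8-byte alignment
/// its size is rounded up to 16, so the last 4 bytes are padding.
#[repr(C)]
struct EncryptedRegion<'a> {
    addr: u64,
    len: u32,
    _phantom: PhantomData<&'a ()>,
}

/// What a little-endian kernel reads as the 64-bit `size` field:
/// the 4 bytes of `len` followed by whatever 4 bytes come after it.
fn size_seen_by_kernel(len: u32, trailing: [u8; 4]) -> u64 {
    let mut bytes = [0u8; 8];
    bytes[..4].copy_from_slice(&len.to_le_bytes());
    bytes[4..].copy_from_slice(&trailing);
    u64::from_le_bytes(bytes)
}
```

With zeroed trailing bytes, size_seen_by_kernel(4096, [0, 0, 0, 0]) is 4096, one page. With [0x3d, 0x7f, 0, 0] it becomes 0x7F3D00001000, and dividing that by 4096 gives 34155266049, the page count from the log.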

But why is this only happening in release builds? Shouldn’t the number of pages be wrong in both builds? I don’t know why, exactly. I made a minimal reproducer that I believe represents my theory above and I noticed the same results. I think it’s a reasonable assumption that my stack frame in a debug build could be much different than a stack frame in an optimized release build.

Regardless of my understanding here, updating my struct to correctly mirror the kernel’s definition seemed like low-hanging fruit:

#[repr(C)]
struct EncryptedRegion<'a> {
    addr: u64,
    len: u64,
    _phantom: PhantomData<&'a ()>,
}

After doing that, everything works again on both build profiles! Yay!
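One cheap guard against this class of bug is to check that the struct has no padding at all, so every byte the kernel copies is a byte I actually set. The sketch below is an assumption about how one might do that, not code from the pull request, and has_no_padding is a name I made up.

```rust
use std::marker::PhantomData;
use std::mem::{size_of, size_of_val};

/// The corrected struct, mirroring the kernel's two __u64 fields.
#[repr(C)]
struct EncryptedRegion<'a> {
    addr: u64,
    len: u64,
    _phantom: PhantomData<&'a ()>,
}

// Compile-time check: the build fails if the total size drifts away from
// the 16 bytes the kernel copies.
const _: () = assert!(size_of::<EncryptedRegion<'static>>() == 16);

/// True when the field sizes add up to the size of the whole struct,
/// meaning there are no padding bytes left with unspecified contents.
fn has_no_padding(r: &EncryptedRegion) -> bool {
    size_of_val(&r.addr) + size_of_val(&r.len) + size_of_val(&r._phantom)
        == size_of_val(r)
}
```

The size assertion alone would not have caught my bug, because the u32 version is also 16 bytes once padding is included. The field-sum check is the part that would have failed, since 8 + 4 + 0 is 12 rather than 16.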

What’s the moral of the story? If you’re going to use the foreign function interface (FFI), quadruple-check both sides of that interface and make sure the field types and sizes match up.

Alternatively, base your code on automatically generated bindings to help reduce the possibility of this happening in the first place.
