Jumping the Kernel-Userspace Boundary – Procfs and Ioctl

I recently had a need to have a very fast and scalable way to share moderate chunks of data between my experimental kernel module and the userspace application. Of course, there are many ways already available. Some of them are documented very nicely here. I will be writing in a few blog posts sharing what all mechanisms I have used to transfer data and provide such interfaces.

Procfs

I have used the Procfs before (with the seq_file API) when I needed to read my experimental results back in userspace and perform aggregation and further analysis there only. It usually consisted of a stream of data which I sent to my /proc/foo file. From a userspace perspective, it is essentially a trivial read-only operation in my case,

/* init stuff */
static struct proc_dir_entry *proc_entry;
/* Create procfs entry in module init */
proc_entry = proc_create("foo", 0, NULL, &foo_fops);
/* The operations*/
static const struct file_operations foo_fops = {
    .owner = THIS_MODULE,
    .open = foo_open,
    .read = seq_read,
    .llseek = seq_lseek,
    .release = single_release,
};
/ *Use seq_printf to provide access to some value from module */
static int foo_print(struct seq_file *m, void *v) {
    seq_printf(m, val);
    return 0;
}

static int foo_open(struct inode *inode, struct  file *file) {
    return single_open(file, foo_print, NULL);
}
/* Remove procfs entry in module exit */
remove_proc_entry("foo", NULL);

Ioctl

I also used ioctl before (More importantly, I call them eye-awk-till. *grins*). They are used in situations when the interaction between your userspace applicatoin and the module resembles actual commands on which action from the kernel has to be performed. With each command, the userspace can send a message containing some data which the module can use to take actions. As an example, consider a device driver for a device which measures temperature from 2 sensors in a cold room. The driver can provide certain commands which are executed when the userspace makes ioctls. Each commad is associated with a number called as ioctl number which the device developer chooses. In a smiliar fashion to Procfs interface, file_operations struct can be defined with a new entry and initializations are done in the module,

/* File operations */
static const struct file_operations temp_fops = {  
       .owner = THIS_MODULE,  
       .unlocked_ioctl = temp_ioctl, 
};
/* The ioctl */
int temp_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) {
	switch(cmd) {
	case READ_TEMP_SENSOR_1:	
		copy_to_user((char *)arg, temp_buff, 8);
		break;	
	case READ_TEMP_SENSOR_2:
		copy_to_user((char *)arg, temp_buff, 8);
		break;
        }
}

There are other complexities involved as well, such as using _IO(), _IOR() macros to define safe ioctl numbers. To know more about ioctl() call and how it is used, I suggest you read Chapter 7 from LKMPG. Note that newer kernels have some minor changes in code, hence refer to some device drivers using ioctls inlatest kernel releases. Each ioctl in our case means we have to use copy data from user to kernel or from kernel to user using copy_fom_user() or copy_to_user() functions. There is also no way to avoid the context switch. For small readings done ocassionally, this is an OK mechanism I would say. Consider that in a parallel universe, this sensor system aggregates temperature as well as a high quality thermal image in addition to each measurement. Also, there are thousands of such sensors spread across a lego factory and are being read each second from a common terminal. For such huge chunks of data accessed very frequently this each additional copy is a performance penalty. For such scenarios, I used the mmap() functionalty provided to share a part of memory between the kernel and userspace. I shall discuss more about Mmap in my next post.

Advertisements

Ease your kernel tracing struggle with LTTng Addons

If you are new to Linux tracing and/or LTTng, go no further. Head on to the new and awesome LTTng Docs to know what this stuff is all about. I wrote an article on basics of LTTng and then followed it up with some more stuff a few month back too.

Ok, so now for those who have been using LTTng for sometime and especially, the kernel tracer, I am pretty sure, you must have faced a moment when you would have asked yourself – what if I could just modify those default tracepoints provided by LTTng and maybe add some more functionality to them. As an example, here is an interesting use-case I recently encountered –

Consider the netif_receive_skb event. A tracepoint is present in a function of the same name in the kernel. This function notifies the kernel whenever the data is received in the socket buffer for any of the net devices and when this event in enabled, the tracepoint is hit and skbaddr, data length and device name are recorded by default. For more info on how this works, refer to the TRACE_EVENT macro in the kernel. There are a few articles on LWN explaining how this all works. The LTTng tracepoint definitions are no alien to this mechanism. They basically act as a probe to hook onto these tracepoints in the kernel and are provided in the form of kernel modules. Refer to the internals of lttng-modules package and do have a look at what LTTng Docs have to say about this. Coming back to our use-case, so now consider that I want to just record a netif_receive_skb event only when its a localhost device (skb->dev->name == "lo"). Hmm, interesting.

Now, instead of forcing you to understand the deep internals of how the LTTng’s macros were magically working behind the scenes, Francis Giraldeau did some little sorcery and churned out…*drum roll*… lttng-addons! Checkout the addons branch in the repo. Apart from a massive help in running research experiments rapidly, it can be used for some practical scenarios too. Do have a look at the current modules available in that. I have added a new netif_receive_skb_filter event (provided in addons/lttng-skb-recv.c) to explain the use-case which we were discussing about previously. It can also act as a mini template for adding your own addon modules. Basically the flow is – create your module C file, make entry in Makefile, add the custom TRACE_EVENT entry in instrumentation/events/lttng-module/addons.h for your module, build and install modules, modprobe your new module, fire lttng-sessiond as root and then enable your custom event. Such happiness, much wow!

Once you have built the modules and installed them, restart lttng-sessiond as root and try to see if your newly created events are available:

$ lttng-list -k | grep netif
netif_receive_skb (loglevel: TRACE_EMERG (0)) (type: tracepoint)
netif_rx (loglevel: TRACE_EMERG (0)) (type: tracepoint)
netif_receive_skb_filter (loglevel: TRACE_EMERG (0)) (type: tracepoint)

Do the usual stuff next and have a look at the trace:

$ lttng create
$ lttng enable-event -k netif_receive_skb_filter
$ lttng start
$ ping -c2 localhost
$ ping -c2 suchakra.in
$ lttng stop
$ lttng view
[22:51:28.120319188] (+?.?????????) isengard netif_receive_skb_filter: { cpu_id = 3 }, { skbaddr = 18446612135363067648, len = 84, name = "lo" }
[22:51:28.120347949] (+0.000028761) isengard netif_receive_skb_filter: { cpu_id = 3 }, { skbaddr = 18446612137085857024, len = 84, name = "lo" }
[22:51:29.120071966] (+0.999724017) isengard netif_receive_skb_filter: { cpu_id = 3 }, { skbaddr = 18446612137085856768, len = 84, name = "lo" }
[22:51:29.120102320] (+0.000030354) isengard netif_receive_skb_filter: { cpu_id = 3 }, { skbaddr = 18446612137085857280, len = 84, name = "lo" }

And there you have it, your first filtered kernel trace with your first custom addon module. Happy tracing!