Jumping the Kernel-Userspace Boundary – Procfs and Ioctl

I recently had a need to have a very fast and scalable way to share moderate chunks of data between my experimental kernel module and the userspace application. Of course, there are many ways already available. Some of them are documented very nicely here. I will be writing in a few blog posts sharing what all mechanisms I have used to transfer data and provide such interfaces.

Procfs

I have used the Procfs before (with the seq_file API) when I needed to read my experimental results back in userspace and perform aggregation and further analysis there only. It usually consisted of a stream of data which I sent to my /proc/foo file. From a userspace perspective, it is essentially a trivial read-only operation in my case,

/* init stuff */
static struct proc_dir_entry *proc_entry;
/* Create procfs entry in module init */
proc_entry = proc_create("foo", 0, NULL, &foo_fops);
/* The operations*/
static const struct file_operations foo_fops = {
    .owner = THIS_MODULE,
    .open = foo_open,
    .read = seq_read,
    .llseek = seq_lseek,
    .release = single_release,
};
/ *Use seq_printf to provide access to some value from module */
static int foo_print(struct seq_file *m, void *v) {
    seq_printf(m, val);
    return 0;
}

static int foo_open(struct inode *inode, struct  file *file) {
    return single_open(file, foo_print, NULL);
}
/* Remove procfs entry in module exit */
remove_proc_entry("foo", NULL);

Ioctl

I also used ioctl before (More importantly, I call them eye-awk-till. *grins*). They are used in situations when the interaction between your userspace applicatoin and the module resembles actual commands on which action from the kernel has to be performed. With each command, the userspace can send a message containing some data which the module can use to take actions. As an example, consider a device driver for a device which measures temperature from 2 sensors in a cold room. The driver can provide certain commands which are executed when the userspace makes ioctls. Each commad is associated with a number called as ioctl number which the device developer chooses. In a smiliar fashion to Procfs interface, file_operations struct can be defined with a new entry and initializations are done in the module,

/* File operations */
static const struct file_operations temp_fops = {  
       .owner = THIS_MODULE,  
       .unlocked_ioctl = temp_ioctl, 
};
/* The ioctl */
int temp_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) {
	switch(cmd) {
	case READ_TEMP_SENSOR_1:	
		copy_to_user((char *)arg, temp_buff, 8);
		break;	
	case READ_TEMP_SENSOR_2:
		copy_to_user((char *)arg, temp_buff, 8);
		break;
        }
}

There are other complexities involved as well, such as using _IO(), _IOR() macros to define safe ioctl numbers. To know more about ioctl() call and how it is used, I suggest you read Chapter 7 from LKMPG. Note that newer kernels have some minor changes in code, hence refer to some device drivers using ioctls inlatest kernel releases. Each ioctl in our case means we have to use copy data from user to kernel or from kernel to user using copy_fom_user() or copy_to_user() functions. There is also no way to avoid the context switch. For small readings done ocassionally, this is an OK mechanism I would say. Consider that in a parallel universe, this sensor system aggregates temperature as well as a high quality thermal image in addition to each measurement. Also, there are thousands of such sensors spread across a lego factory and are being read each second from a common terminal. For such huge chunks of data accessed very frequently this each additional copy is a performance penalty. For such scenarios, I used the mmap() functionalty provided to share a part of memory between the kernel and userspace. I shall discuss more about Mmap in my next post.

Advertisements

Custom Kernel on Fedora 20

I recently stared experimenting with eBPF patches by Alexei Starovoitov which apparently are extending and restructuring the Berkeley Packet Filter infrastructure in the kernel. BPF is used for a lot of tasks like syscall filtering, packets filtering etc. and obviously deserves a separate and grand blog entry. Well, we’ll see what can be done about it later but for now, I’ll just explain how to do the trivial task of installing a custom kernel (possibly patched/upstream) on your Fedora machine. Its nothing grand, just those familiar textbook tasks. I am using the master from Alexei’s repo.

Build the kernel

git clone https://kernel.googlesource.com/pub/scm/linux/kernel/git/ast/bpf
cd bpf

I changed the EXTRAVERSION string in the Makefile just to differentiate it as I may have to use multiple versions while testing.

make menuconfig
make -j4

Set it up

sudo make modules_install
sudo cp arch/x86_64/boot/bzImage "/boot/vmlinuz-"`make kernelrelease`
sudo cp System.map "/boot/System.map-"`make kernelrelease`
sudo dracut "" `make kernelrelease`
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Restart your machine and enjoy the new kernel. Once you are done with the experiments, boot into the old kernel and remove the initrd, kernel image, and the custom system map. Then update the grub config once more. Thats it! I’ll post more about my experiments with eBPF soon.

 References

[1] http://fedoraproject.org/wiki/BuildingUpstreamKernel
[2] http://broken.build/2012/02/12/custom-kernel-on-fedora