Memory-mapped operation on devices from user space is a powerful technique for improving the runtime performance of a user-space application. The technique is conceptually simple, yet often hard to implement correctly.
The technical basics are summarized in a Stack Overflow Q&A: Mapping DMA buffers to userspace. The key steps of the different alternatives are copied below:
- LDD chapter 15, page 435, Direct I/O operations. The kernel calls involved:
  - `get_user_pages` to pin the user pages and to get an array of `struct page *`.
  - `dma_map_page()` on each `struct page *` to get the DMA addresses; this also creates an IOMMU mapping. Then tell the device to do the DMA.
  - `dma_sync_single_for_cpu` to flush, etc. Then unmap and unpin the pages.
- Asynchronous I/O may achieve the same result, but without the userspace application having to wait for the read to finish.
- Check the Infiniband drivers, which go to much effort to make zero-copy DMA and RDMA to user space work.
- The same person also commented: doing DMA directly into user-space memory mappings is full of problems; copying the DMA'd data into userspace buffers will save much of the grief.
- Preallocate n buffers with `myAddr[i] = pci_alloc_consistent(blah, size, &pci_addr[i])` until it fails. On a machine with 4G of memory this usually yields about 2.5G of buffers, 4 MiB each (`cat /proc/buddyinfo` to verify). Tell the device to DMA data into a buffer and raise an interrupt telling the driver which buffer has been filled. In user space, mmap the buffers, then wait on read or ioctl until the driver reports which buffer is usable.
- Sometimes none of the normal operations above will work, and the vendor (as on the i.MX51) provides a special API for SDMA instead.
Though the topic here is DMA memory used for device operation, the approach is the same when a memory mapping is shared by multiple processes, each through its own virtual mapping.