The /proc Filesystem & Parsing the /proc/[pid]/maps File
The Linux operating system contains a pseudo file system named the /proc file system. It exists only in memory: the kernel creates it as the system starts up, and it disappears as soon as the system shuts down. It acts as an interface to internal data structures in the kernel. The /proc file system can be used to obtain information about your running Linux system and, at runtime, to change certain Linux kernel parameters.
The proc file system is mounted at a directory named /proc, and is divided into sub-directories and files, most of which read like text files. There is one sub-directory for each process that is currently running on the system, named after its Process ID or PID. So, the sub-directories look something like this: /proc/<pid>/. Each of these per-process directories holds dozens of entries; however, in this article, I will only be discussing a few of them for the purpose of describing how to parse the /proc/<pid>/maps file.
Inside the /proc/<pid> directory, we find many files and sub-directories. The most important ones, at least for the purpose of parsing the maps file, are:
/proc/<pid>/status;
/proc/<pid>/fd;
/proc/<pid>/cmdline;
/proc/<pid>/environ;
/proc/<pid>/mem and
/proc/<pid>/maps.
The /proc/<pid>/status file contains several important pieces of information, including the name of the command being run by the process, the PID and PPID (parent PID) of the process, and the UID of the user who owns the process.
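To make that concrete, here is a minimal sketch of reading those fields from Python. The helper names parse_status and read_status are my own for illustration; the only thing assumed about the file is that it is made of "Key: value" lines.

```python
def parse_status(text):
    """Turn the 'Key:<tab>value' lines of a status file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip anything that is not a 'Key: value' line
            fields[key.strip()] = value.strip()
    return fields


def read_status(pid):
    """Read and parse /proc/<pid>/status (pid may also be the string 'self')."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_status(status_file.read())
```

Calling read_status("self") returns a dict for the Python process itself, where the "Name", "Pid", "PPid" and "Uid" keys hold the values described above. Note that "Uid" holds several whitespace-separated numbers rather than a single one.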
The /proc/<pid>/fd entry is a directory of symbolic links, one for each file the process has open. These symbolic links are named for their file descriptor, hence the name “fd”. So, 0 is standard input, 1 is standard output, 2 is standard error, and so on.
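As a rough sketch of how you might walk those links from Python (list_fds is a name I made up, not part of any library):

```python
import os


def list_fds(pid):
    """Map each open file descriptor number of `pid` to the target of its link."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

For example, list_fds(os.getpid()) lists the descriptors of the running Python process. Reading another user's process this way generally needs elevated privileges.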
The /proc/<pid>/cmdline file holds the complete command line for the process. Something cool is that this file gets updated after an execve() call. The arguments always show up as a set of strings, each terminated by a null byte.
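Since the arguments are null-terminated rather than space-separated, splitting them takes a little care. Here is a small sketch; parse_cmdline and read_cmdline are illustrative names of my own.

```python
def parse_cmdline(raw):
    """Split the raw bytes of a cmdline file into a list of argument strings."""
    if not raw:
        return []  # some processes (zombies, for example) have an empty cmdline
    # Every argument ends with a null byte, so drop the last one before splitting.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def read_cmdline(pid):
    """Read and split /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as cmdline_file:
        return parse_cmdline(cmdline_file.read())
```

The file is opened in binary mode because the null bytes are the separators we care about.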
The /proc/<pid>/environ file contains the initial environment that was set at the time of running execve(). The list entries are separated by null bytes; however, there may not be a null byte at the end of the list. As a quick example, to print the environment for process 1, you would type into your terminal:
$ cat /proc/1/environ | tr '\000' '\n'
One thing to remember is that this file will NOT be updated if the process later calls something like putenv(), or does something that directly modifies the environ global variable.
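The same thing can be done from Python. This is only a sketch, and parse_environ is a helper name I picked for illustration. It tolerates the missing trailing null byte mentioned above.

```python
def parse_environ(raw):
    """Turn the null-separated NAME=value entries of an environ file into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # empty piece left over after a trailing null byte
        name, sep, value = entry.partition(b"=")
        if sep:
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid):
    """Read and parse /proc/<pid>/environ."""
    with open(f"/proc/{pid}/environ", "rb") as environ_file:
        return parse_environ(environ_file.read())
```

Only the first "=" splits the name from the value, so values that themselves contain "=" survive intact.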
The /proc/<pid>/maps file lists the memory regions mapped into the process, one per line, and each line has six fields. They are address, permissions, offset, device, inode, and pathname. The most useful to us in terms of parsing the maps file are the address field and the perms field. The address field is the range of the process's address space that the mapping occupies, written as start-end in hexadecimal. The perms field is a set of permissions. The permissions are pretty standard: r is read, w is write, x is execute, s is shared, and p is private. The pathname field matters too: for the heap it holds the marker [heap], which will help us determine if we are in the correct area of memory. So, using the address, permissions and pathname fields, it is possible for us to see if we are in the correct area of memory and if we have read / write access to that memory. If we have all of those, we can theoretically overwrite that memory to be whatever we want. The maps file must be parsed in order to know where the various regions sit in the mem file.
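Here is a minimal sketch of splitting one maps line into those six fields. The function name and the dict keys are my own choices, not anything the kernel defines.

```python
def parse_maps_line(line):
    """Split one line of /proc/<pid>/maps into its six fields."""
    # Split at most 5 times so a pathname containing spaces stays in one piece.
    parts = line.split(maxsplit=5)
    address, perms, offset, device, inode = parts[:5]
    # Anonymous mappings have no pathname at all.
    pathname = parts[5].strip() if len(parts) > 5 else ""
    start, end = (int(part, 16) for part in address.split("-"))
    return {
        "start": start,            # first address of the region
        "end": end,                # first address past the region
        "perms": perms,            # e.g. "rw-p"
        "offset": int(offset, 16),
        "device": device,
        "inode": int(inode),
        "pathname": pathname,      # "[heap]" for the heap
    }
```

With this, checking for a writable heap comes down to testing that "pathname" equals "[heap]" and that "perms" starts with "rw".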
The /proc/<pid>/mem entry is a file that can be used to access the process’s memory. We can access it using open(), read(), and lseek(). Offsets in this file correspond to addresses in the process’s own address space, so the addresses listed in the maps file tell us exactly where to seek. That’s how we can make sure we are finding the memory we are looking for. Another thing to remember is that access from a different process is restricted: in practice you need to be root, or otherwise have ptrace permission over the target process, to read this file.
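The open / seek / read sequence might look like this in Python. It is only a sketch under the assumptions above: read_at and read_process_memory are hypothetical helper names, and the address has to fall inside a readable mapping for the read to succeed.

```python
def read_at(mem, address, length):
    """Seek an already-open binary file to `address` and read `length` bytes."""
    mem.seek(address)
    return mem.read(length)


def read_process_memory(pid, address, length):
    """Read `length` bytes at virtual address `address` from /proc/<pid>/mem."""
    # Unbuffered, so Python asks the kernel only for the bytes we request.
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        return read_at(mem, address, length)
```

Seeking to an address that is not mapped is allowed, but the read that follows fails with an OSError, which is why the addresses should come from the maps file.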
The technique I used to parse the maps file was to write a Python script to automate the process. The first thing the script does is take the PID of the desired process. Next, it uses that PID to pick the matching /proc/<pid> directory. That way we know we are in the correct directory and will not overwrite something important. From here, we locate the heap in the process’s maps file and read off the heap’s start and end addresses. Upon validation of the memory location, we can then, in theory, overwrite that part of the mem file with whatever we want.
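To tie those steps together, here is a stripped-down sketch of the idea. It is not the linked script, just an illustration under a few assumptions: the function names are made up, the target string lives in the [heap] region, and the replacement is padded with null bytes so that it never writes past the original string.

```python
def find_heap(maps_lines):
    """Return (start, end, perms) of the [heap] mapping, or None if absent."""
    for line in maps_lines:
        fields = line.split()
        if fields and fields[-1] == "[heap]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end, fields[1]
    return None


def pad_replacement(search, replace):
    """Pad `replace` with null bytes so it covers exactly len(search) bytes."""
    if len(replace) > len(search):
        raise ValueError("replacement is longer than the original string")
    return replace.ljust(len(search), b"\0")


def replace_in_heap(pid, search, replace):
    """Overwrite the first occurrence of `search` in the heap of process `pid`.

    Returns the address that was patched, or None if `search` was not found.
    """
    with open(f"/proc/{pid}/maps") as maps:
        region = find_heap(maps)
    if region is None:
        raise RuntimeError("no [heap] mapping found")
    start, end, perms = region
    if not perms.startswith("rw"):
        raise PermissionError("heap is not mapped read/write")
    payload = pad_replacement(search, replace)
    with open(f"/proc/{pid}/mem", "rb+", buffering=0) as mem:
        mem.seek(start)
        heap = mem.read(end - start)
        offset = heap.find(search)
        if offset == -1:
            return None
        mem.seek(start + offset)
        mem.write(payload)
    return start + offset
```

You would call it as something like replace_in_heap(1234, b"old text", b"new text"), where 1234 stands in for the PID of your looping test program, and you would typically run it as root. Both strings are bytes because the mem file is raw memory.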
Hopefully that makes sense. I wrote a Python script that takes a single looping process and replaces a string in its heap with another string. That code can be found here:
https://github.com/dreeseh/holbertonschool-system_linux/blob/main/0x01-proc_filesystem/read_write_heap.py
Take a peek and run it on your own virtual machine if you’re really into it.
Have a nice day!