
During the process of building Kubernetes containers from scratch, I had some trouble understanding how nsenter works and how to use it effectively. I decided to experiment with nsenter and related tools to gain a better understanding of their behavior and usage patterns. This document summarizes my findings and provides a guide for others who may want to explore similar concepts. The nsenter version used is from alpine-minirootfs-3.20.3.
1. Set Up Minimal Root Filesystems
mkdir -p /root/tung/{pause,a0,a1}
cd /root/tung
# You may want to change to your OS architecture, eg. `x86_64` or `aarch64`
wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/aarch64/alpine-minirootfs-3.20.3-aarch64.tar.gz
tar -xzf alpine-minirootfs-3.20.3-aarch64.tar.gz -C pause
tar -xzf alpine-minirootfs-3.20.3-aarch64.tar.gz -C a0
tar -xzf alpine-minirootfs-3.20.3-aarch64.tar.gz -C a1
Create marker files to identify each root:
touch /a-host
touch /root/tung/pause/a-pause
touch /root/tung/a0/a-a0
touch /root/tung/a1/a-a1
2. Launch a Container in New Namespaces
Start a shell in new namespaces using unshare:
unshare -Cunimpf chroot /root/tung/pause /bin/sh
# in the new shell
mount -t proc proc /proc
ps
# should see
PID USER TIME COMMAND
1 root 0:00 /bin/sh
3 root 0:00 ps
- The `unshare` command creates a new PID namespace and a new mount namespace before the `chroot` and `sh` commands are executed
- The `--mount-proc` flag tells unshare to automatically mount a new, private `procfs` (the `proc` filesystem) at `/root/tung/pause/proc`
- `--mount-proc=/root/tung/pause/proc` doesn't work in my Ubuntu 22.04 with kernel version `5.15.0-141-generic`, but works in Alpine Linux with kernel version `6.1.0-37-arm64`.
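To check which namespaces the new shell really ended up in, the namespace links under `/proc/<pid>/ns` can be printed and compared with those of a host shell. The following is a minimal sketch, assuming a Linux `/proc` filesystem; `show_ns` is a hypothetical helper name, not part of `unshare` or `nsenter`.

```shell
# Print the namespace identifiers of a process (defaults to the current shell).
# show_ns is a made-up helper name; reading another user's process needs root.
show_ns() {
    pid="${1:-$$}"
    # The six namespace types requested by `unshare -Cunimp`
    for ns in cgroup ipc mnt net pid uts; do
        # Each entry is a symlink whose target looks like "pid:[<inode>]"
        printf '%s\t%s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns" 2>/dev/null)"
    done
}
```

Running `show_ns` in a host terminal and `show_ns <pid of the new shell>` from the same terminal should print different identifiers for each namespace that `unshare` created.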
3. Find the Container PID
In another terminal:
# find the PAUSE process's PID, the process that is forked from the unshare command
ps aux | grep /bin/sh
PAUSE_PID=<pid>
# change current dir
cd /root/tung
4. Explore nsenter Usage Patterns
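Every case in this section depends on `$PAUSE_PID` pointing at the right process. As an alternative to reading the `ps aux | grep /bin/sh` output by eye, the sketch below lists the processes whose root directory is a given path, using the `/proc/<pid>/root` link. `pids_with_root` is a hypothetical helper name, and it assumes it is run as root on the host.

```shell
# List PIDs whose root directory is the given path.
# pids_with_root is a made-up helper name; run it as root on the host,
# otherwise /proc/<pid>/root of other users' processes is unreadable.
pids_with_root() {
    want="$1"
    for d in /proc/[0-9]*; do
        # /proc/<pid>/root is a symlink to that process's root directory
        root="$(readlink "$d/root" 2>/dev/null)" || continue
        [ "$root" = "$want" ] && echo "${d#/proc/}"
    done
    return 0
}
```

For example, `pids_with_root /root/tung/pause` should print the PID of the `/bin/sh` started by `unshare`, plus any children it has; the entries are in glob order, not numeric order.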
Case 1: Run nsenter without arguments
nsenter
- Program `${SHELL}` is run by default; the value of `${SHELL}` is taken from the current environment, since no namespace is entered
- `ls` shows files in `/root/tung`
- `pwd` returns `/root/tung`
- `ls /` shows host root, including `/a-host`
- `mount | grep /proc` shows `proc on /proc type proc`
- `ps aux` shows host's processes
Case 2: Change Root with --root
nsenter --root=/root/tung/a0 /bin/sh
- The program `/bin/sh` must exist in the new root
- `--root` changes the filesystem root like `chroot`, eg. `chroot /root/tung/a0`
- `ls` shows files in `/root/tung`, even though the root is `/root/tung/a0`
  - The `--root` flag only changes the root directory for the new process, but it doesn't change the current working directory (CWD)
  - When we run the command above, the new shell's root directory is indeed set to `/root/tung/a0`. However, because `nsenter` doesn't change the CWD, the shell's current directory remains the directory we were in when we executed the nsenter command
- `pwd` returns empty
  - This is because of the combination of a changed root and a CWD that's outside of the new root
  - The `pwd` command attempts to show the absolute path of our current directory. However, since the current working directory `/root/tung` is not under the new root `/root/tung/a0`, `pwd` can't construct a valid path to display
- `ls /` shows files in `/root/tung/a0`, should see `a-a0`
- `mount` shows `mount: no /proc/mounts`
- `ps aux` shows nothing
- `mount -t proc proc /proc` runs successfully
- After that, `ps aux` shows host's processes
- Clean: `umount /proc`
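The mismatch between root and CWD described in this case can be inspected from a host terminal through `/proc/<pid>/root` and `/proc/<pid>/cwd`. This is a small sketch; `root_and_cwd` is a hypothetical helper name, and it assumes it is run as root on the host with the PID of the shell started by `nsenter`.

```shell
# Show the root directory and working directory of a process side by side.
# root_and_cwd is a made-up helper name; it defaults to the current shell.
root_and_cwd() {
    pid="${1:-$$}"
    # Both entries are symlinks maintained by the kernel
    printf 'root=%s\n' "$(readlink "/proc/$pid/root")"
    printf 'cwd=%s\n' "$(readlink "/proc/$pid/cwd")"
}
```

For the shell in this case, the expectation is that `cwd` is not a path underneath `root`, which is the situation in which `pwd` has nothing valid to print.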
Case 3: Enter PAUSE's Namespaces
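nsenter -t $PAUSE_PID
- Same results as Case 1. `-t` only selects the target process; without namespace flags such as `-a`, nsenter does not join any of PAUSE's namespaces.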
Case 4: Set Root to PAUSE's Root (Fails if Shell Missing)
nsenter -t $PAUSE_PID -r
- Program is not given, so `${SHELL}` is run, which is `/bin/bash` in `debian` by default
- `-r` sets the root dir. If no dir is specified, the root dir is set to the root dir of the target process, which is the PAUSE process
- However, this command fails with `nsenter: failed to execute /bin/bash: No such file or directory`, because `/bin/bash` doesn't exist in the root `/` of the PAUSE process
Case 5: Explicitly Run /bin/sh in PAUSE's Root
nsenter -t $PAUSE_PID -r /bin/sh
- Use `-r` to set the root dir to the PAUSE process's root dir, which is `/root/tung/pause`
- `ls` shows files in `/root/tung`, even though the root is `/root/tung/pause`
  - Note: `/proc/$PAUSE_PID/root` is a symbolic link to `/root/tung/pause`
- `pwd` returns empty
- `ls /` shows files in `/root/tung/pause`, should see `a-pause`
- `mount` shows `mount: no /proc/mounts`
  - The `mount` command, when run without arguments, reads the `/proc/mounts` file to list the currently mounted filesystems. Without the `--mount`/`-m` flag, nsenter does not enter the target's mount namespace, so the shell itself still lives in the host's namespaces even though its root now points into PAUSE's filesystem view
  - Hence, `cat /proc/mounts` shows `cat: can't open '/proc/mounts': No such file or directory`, even though `ls /proc` still shows PAUSE's processes. One likely reason: `/proc/mounts` is a symlink to `self/mounts`, and `/proc/self` cannot resolve for a shell that is not a member of the PID namespace this `procfs` was mounted for
- `ps aux` shows PAUSE's processes in PAUSE's PID namespace, but not the current `ps aux` process
  - The `ps aux` command (also `top` or `htop`) relies on the contents of the `/proc` filesystem, not `/proc/mounts`
- `mount -t proc proc /proc` shows error `mount: mounting proc on /proc failed: Invalid argument`
  - This is because we didn't enter the new mount namespace first
  - The shell is still in the host's mount namespace, while the `/proc` directory it sees (`/root/tung/pause/proc`, which already carries PAUSE's `procfs`) belongs to PAUSE's mount namespace. The kernel appears to reject a mount whose target lies outside the caller's mount namespace
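Whether a shell has actually joined PAUSE's mount (or PID) namespace can be checked from a host terminal by comparing the `/proc/<pid>/ns/<type>` links of the two processes. The sketch below assumes it is run as root on the host; `same_ns` is a hypothetical helper name.

```shell
# Succeed if two processes share the given namespace type (mnt, pid, net, ...).
# same_ns is a made-up helper name; usage: same_ns PID1 PID2 TYPE
same_ns() {
    # Exit status 2 means one of the links could not be read
    a="$(readlink "/proc/$1/ns/$3" 2>/dev/null)" || return 2
    b="$(readlink "/proc/$2/ns/$3" 2>/dev/null)" || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_ns <pid of the nsenter shell> "$PAUSE_PID" mnt && echo shared || echo different` should report `different` for the shell started in this case, since only `-r` was passed and no namespace flag.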
Case 6: Change Root to a0 in PAUSE's Namespaces
nsenter -t $PAUSE_PID --root=/root/tung/a0 /bin/sh
- `ls` shows files in `/root/tung`, even though the root is `/root/tung/a0`
- `pwd` returns empty
- `ls /` shows files in `/root/tung/a0`, should see `a-a0`
- `mount` shows `mount: no /proc/mounts`
- `ps aux` shows nothing
- `mount -t proc proc /proc` runs successfully
- After that, `ps aux` shows host's processes
- Clean: `umount /proc`
Case 7: Join All Namespaces of PAUSE and Change Root
nsenter -t $PAUSE_PID -a --root=/root/tung/a0 /bin/sh
- `ls` shows files in host's root `/`, should see `a-host`, even though the root is `/root/tung/a0`
- `pwd` returns empty
- `ls /` shows files in `/root/tung/a0`, should see `a-a0`
- `mount` shows `mount: no /proc/mounts`
  - This is because the `mount` command reads `/proc/mounts`, which here would live under `/root/tung/a0/proc`, a directory that is empty
- `ps aux` shows nothing
  - This is because `ps aux` looks for the processes in `/proc`, which is `/root/tung/a0/proc`, which is empty
- `mount -t proc proc /proc` shows error `mount: mounting proc on /proc failed: Invalid argument`
  - This error is a result of the environment
  - The `mount` command tries to mount a `procfs` at the `/proc` path, which is `/root/tung/a0/proc`
  - A likely explanation: the `--root` path was resolved from the host side, so the `/proc` directory the shell sees is not a valid mount point inside the mount namespace it has joined, and the kernel rejects the mount
  - Confirming this would require looking at `nsenter`'s implementation
Case 8: Join All Namespaces of PAUSE
nsenter -t $PAUSE_PID -a /bin/sh
- `ls` shows files in host's root `/`, should see `a-host`
- `pwd` shows host's root `/`
- `ls /` shows files in host's root `/`, should see `a-host`
- `mount` shows `proc on /proc type proc` and `proc on /root/tung/pause/proc type proc`
- `ps aux` shows host's processes
Case 9: Join All Namespaces and Set Root to PAUSE's Root
nsenter -t $PAUSE_PID -a -r /bin/sh
- `ls` shows files in host's root `/`, should see `a-host`, even though the root is `/root/tung/pause`
- `pwd` returns empty
- `ls /` shows files in `/root/tung/pause`, should see `a-pause`
- `mount` shows `proc on /proc type proc`
- `ps` shows:
  - `1 root 0:00 /bin/sh` belonging to the `/bin/sh` command from the `unshare` command
  - `5 root 0:00 /bin/sh` belonging to the `/bin/sh` command from the current `nsenter` process
  - `7 root 0:00 ps` belonging to the current `ps` command
- This case is close to what we expected, except that `ls` and `pwd` show weird output
  - To resolve this, we can run `nsenter -t $PAUSE_PID -a -r /bin/sh -c "cd /; exec /bin/sh"` instead, but this is quite complicated
Case 10: Change Root to PAUSE's Root Before Entering Namespaces
nsenter -t $PAUSE_PID -a --root=/root/tung/pause /bin/sh
- `ls` shows files in host's root `/`, should see `a-host`, even though the root is `/root/tung/pause`
- `pwd` returns empty
- `ls /` shows files in `/root/tung/pause`, should see `a-pause`
- `mount` shows `mount: no /proc/mounts`
  - Why does `mount` show `proc on /proc type proc` with `nsenter -t $PAUSE_PID -a -r /bin/sh`, but `mount: no /proc/mounts` with `nsenter -t $PAUSE_PID -a --root=/root/tung/pause /bin/sh`?
  - The problem is that an explicit `--root` path is resolved before `nsenter` enters the target's `mount` namespace. When `nsenter` uses `/root/tung/pause` as the root, it does so from the context of the host's filesystem, where nothing is mounted at `/root/tung/pause/proc`
  - The key difference is that `-r` without a path follows the target process's actual root (`/proc/$PAUSE_PID/root`), so it matches the process state. `--root=...` hardcodes a path on the host filesystem that doesn't correspond to the new `mount` namespace's view, so the new root has no `procfs` mounted at `/proc`
- `mount -t proc proc /proc` shows error `mount: mounting proc on /proc failed: Invalid argument`
  - We are in PAUSE's namespaces, where `procfs` is already mounted at `/root/tung/pause/proc`, but the root we were given was resolved from the host side. The mount target therefore likely lies outside the mount namespace the shell has joined, and the kernel rejects it
Case 11: Create Container a0 using chroot in PAUSE's Namespaces
nsenter -t $PAUSE_PID -a chroot /root/tung/a0 /bin/sh
- `ls` shows files in `/root/tung/a0`, should see `a-a0`
- `pwd` returns `/`
- `ls /` shows files in `/root/tung/a0`, should see `a-a0`
- `mount` shows `mount: no /proc/mounts`
- `mount -t proc proc /proc` runs successfully
- `ps` shows:
  - `1 root 0:00 /bin/sh` belonging to the `/bin/sh` command from the `unshare` command
  - `5 root 0:00 /bin/sh` belonging to the `/bin/sh` command from the current `nsenter` process
  - `7 root 0:00 ps` belonging to the current `ps` command
- Clean by running `umount /proc`, otherwise `procfs` stays mounted at `/root/tung/a0/proc` inside the PAUSE's namespaces
  - Note: in host's namespaces, `/proc` is not mounted in `/root/tung/a0/proc`, but in PAUSE's namespaces, `/proc` is mounted in `/root/tung/a0/proc`. We can verify by going to host's namespaces and running `ls /root/tung/a0/proc`, which shows empty
  - If we don't want to clean, we must use `overlayfs`, eg. `mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged`
When combining `nsenter` with `chroot`, we first enter all namespaces of the target process (including the mount namespace), and then run `chroot` inside those namespaces. This means our shell runs in the same mount namespace as the PAUSE process, with the correct permissions and context to perform mounts like `mount -t proc proc /proc`.
| Command | When is root changed? | Which /proc do you see? |
|---|---|---|
| `--root=/root/tung/pause` | Before entering namespaces | Host’s (may be empty/missing) |
| `chroot /root/tung/pause` or `chroot /proc/$PAUSE_PID/root` | After entering namespaces | PAUSE’s (mounted and populated) |
This is exactly what we expected when we want to create container A0 in the same namespaces as the PAUSE container.
Case 12: Use PAUSE Container in Its Own Namespaces
nsenter -t $PAUSE_PID -a chroot /root/tung/pause /bin/sh
- `ls` shows files in `/root/tung/pause`, should see `a-pause`
- `pwd` returns `/`
- `ls /` shows files in `/root/tung/pause`, should see `a-pause`
- `ps` shows:
  - `1 root 0:00 /bin/sh` belonging to the `/bin/sh` command from the `unshare` command
  - `5 root 0:00 /bin/sh` belonging to the `/bin/sh` command from the current `nsenter` process
  - `7 root 0:00 ps` belonging to the current `ps` command
This is exactly what we expected when we want to execute commands in a container.
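The pattern from Cases 11 and 12 (join all namespaces first, then `chroot`) can be wrapped in a small shell function. This is only a sketch: `enter_container` and the `DRY_RUN` variable are hypothetical names introduced here, and the real run path needs root plus a valid target PID.

```shell
# Build (and optionally run) the "nsenter, then chroot" command from Cases 11 and 12.
# enter_container and DRY_RUN are made-up names for this sketch.
# Usage: enter_container PID ROOT [COMMAND...]
enter_container() {
    pid="$1"; root="$2"; shift 2
    # Default to /bin/sh when no command is given
    [ "$#" -gt 0 ] || set -- /bin/sh
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print the command that would be executed
        echo "nsenter -t $pid -a chroot $root $*"
    else
        nsenter -t "$pid" -a chroot "$root" "$@"
    fi
}
```

For example, `enter_container "$PAUSE_PID" /root/tung/a0` corresponds to Case 11, and `DRY_RUN=1 enter_container "$PAUSE_PID" /root/tung/pause ps` prints the command without running it.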
5. Key Learnings
- Use `nsenter` to join namespaces of another process
- Using `chroot` to change the filesystem root is recommended, eg. cases 11, 12
- Using `--root`/`-r` to change the filesystem root is not recommended, as it changes the root before entering the process's namespaces, eg. cases 2, 5, 7, 10