Boot failure: /init: line 256: /sbin/zpool: Text file busy (AMD 7950X3D)
Over the last couple of months I'd noticed a strange message which appeared at boot time:
error: out of memory.
Press any key to continue...
Up until now I hadn't had to do anything except let the system sit for a few seconds, as it would appear to 'self-correct' and boot me into Ubuntu 22.04. Today I hit a different issue: my ZFS root partition would not mount, and I was dropped to a BusyBox shell, left to wonder what went wrong.
Begin: Running /scripts/local-premount ... done.
[ 8.710953] ZFS: Loaded module v2.1.5-1ubuntu6, ZFS pool version 5000, ZFS filesystem version 5
Begin: Importing ZFS root pool 'rpool' ... Begin: Importing pool 'rpool' using defaults ... Failure: 126
Begin: Importing pool 'rpool' using cachefile. ... Failure: 126
Command /sbin/zpool import -c /etc/zfs/zpool.cache -N 'rpool'
Message: /init: line 256: /sbin/zpool: Text file busy
Error: 126
Failed to import pool 'rpool'.
Manually import the pool and exit.
When I re-run the zpool command above directly, I get the same 'Text file busy' error, which is entirely unhelpful...
(initramfs) /sbin/zpool import -c /etc/zfs/zpool.cache -N 'rpool'
Message: /init: line 256: /sbin/zpool: Text file busy
Error: 126
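For anyone puzzled by the numbers: exit status 126 is how the shell reports a command that was found but could not be executed, and 'Text file busy' is the ETXTBSY error that Linux can return from execve when the binary is open for writing somewhere. The sketch below is only an illustration of that error on a normal running system, not a diagnosis of what happened inside my initramfs. The function name and the scratch copy of /bin/true are my own assumptions for the demo; if the kernel does not enforce the restriction, the copy simply runs and the status is 0.

```shell
#!/bin/sh
# Illustration only: provoke "Text file busy" on a scratch copy of /bin/true.
# demo_text_file_busy is a hypothetical helper name, not part of any tool.
demo_text_file_busy() {
    workdir=$(mktemp -d) || return 1
    cp /bin/true "$workdir/demo" || return 1

    # Hold the copy open for appending on file descriptor 3, so the
    # kernel sees an active writer on the file we are about to execute.
    exec 3>>"$workdir/demo"

    rc=0
    "$workdir/demo" || rc=$?

    # Release the writer and clean up the scratch directory.
    exec 3>&-
    rm -rf "$workdir"
    return "$rc"
}

status=0
demo_text_file_busy || status=$?
echo "exit status: $status"
```

On the systems I would expect this to behave like mine, the shell prints a 'Text file busy' message and the status is 126, matching the 'Failure: 126' lines in the boot log.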
Eventually I was able to find the solution to my boot issue.
References
- Comment #17 on Can't boot: "error: out of memory." immediately after the grub menu [bugs.launchpad.net]
- Flickering or constant solid white screen with kernel >=6.1.4 w/ 64GB+ RAM [gitlab.freedesktop.org]
- Ubuntu 22.04 Live USB fails, reports "Out of Memory" with no details, even after working on other machines [askubuntu.com]
- If you cannot boot the new 6.0 "oem" kernel with "out of memory" error right after GRUB [reddit.com/r/tuxedocomputers]
System Configuration
- AMD 7950X3D
- 192GB RAM
- Gigabyte X670 AORUS ELITE AX (rev. 1.0)
- 3440x1440 widescreen display
- Ubuntu 22.04
- Kernel 5.19.0-50
Troubleshooting
I spent a couple of hours investigating why zpool would fail in such a strange way (an ETXTBSY error, reported by the shell as exit status 126), but that yielded essentially no helpful information... With that path blocked, I wondered if the 'out of memory' error I'd been seeing had anything to do with the issue. Upon investigation, I found that even though I have 192GB RAM, there is likely an issue with the 'lower addressable RAM' filling up. This is largely due to loading NVIDIA drivers (as I have both an RTX 4090 and an RTX 6000 Ada installed).
The recommended solution was to set the GRUB gfxmode to 800x600 to 'free up' lower-addressable RAM. This allowed me to boot my system successfully. Interestingly, after rebooting and running 'apt-get upgrade', a new kernel dropped (6.2, an upgrade from the 5.19 base I had been running previously), and this kernel does NOT require the gfxmode work-around to boot correctly. I no longer see any 'out of memory' errors, and the ZFS root volume mounts without trouble.
Unfortunately, this spurred a new issue: any time I clicked on a window or title bar in GNOME, the entire screen would flash white. To address this, I had to adjust the VRAM allocated to the Integrated Graphics Processor (IGP) in the BIOS, which was a bit complicated. Once I made that change, the flashing white screen issue went away.
Work-around Steps
- Boot to grub
- Press e to edit the kernel boot parameters
- On the gfxmode line, change it to this:
- gfxmode=800x600
- Press F10 to boot the system using this kernel option
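The edit above only applies to the current boot. If the work-around needs to survive reboots, one option is to set GRUB_GFXMODE in /etc/default/grub and regenerate the GRUB config with update-grub. I did not end up needing this, so treat the following as a minimal sketch: set_grub_gfxmode is a hypothetical helper name, it assumes a sed that supports -i (GNU or BusyBox), and the demo runs against a scratch file, not the real config.

```shell
#!/bin/sh
# Hypothetical helper: set GRUB_GFXMODE in a GRUB defaults file.
# $1 = path to the defaults file, $2 = mode such as 800x600
set_grub_gfxmode() {
    file=$1
    mode=$2
    if grep -q '^[#[:space:]]*GRUB_GFXMODE=' "$file"; then
        # Replace an existing (possibly commented-out) setting in place.
        sed -i "s|^[#[:space:]]*GRUB_GFXMODE=.*|GRUB_GFXMODE=$mode|" "$file"
    else
        # No setting present yet, so append one.
        printf 'GRUB_GFXMODE=%s\n' "$mode" >> "$file"
    fi
}

# Demo on a scratch file that mimics a stock defaults file.
demo=$(mktemp)
printf '%s\n' 'GRUB_TIMEOUT=5' '#GRUB_GFXMODE=640x480' > "$demo"
set_grub_gfxmode "$demo" 800x600
cat "$demo"

# On a real system (as root, after backing up the file):
#   set_grub_gfxmode /etc/default/grub 800x600
#   update-grub
```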
I performed a system update and reboot at this point, which landed me a new kernel that does not have this issue:
- sudo apt update && sudo apt upgrade -y
- sudo shutdown -r now
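After the reboot, it is worth confirming which kernel is actually running. The sketch below compares the running version against 6.2, the version that booted cleanly for me. The version_at_least helper is a hypothetical name, and it assumes a sort that supports -V (GNU coreutils does). Whether 6.2 is the right threshold on other hardware is an assumption, not something I tested.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
version_at_least() {
    # Version-sort the two strings; if the minimum sorts first
    # (or they are equal), $1 is at least $2.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

running=$(uname -r)
if version_at_least "$running" 6.2; then
    echo "kernel $running is 6.2 or newer"
else
    echo "kernel $running is older than 6.2"
fi
```

Running `zpool status rpool` afterwards is a simple way to confirm that the root pool imported cleanly.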
To address the flashing white screen issue, I had to go to the Gigabyte BIOS. Here's a screencap of the specific settings I had to change:
Forcing the VRAM to 4GB ensures the flashing goes away for now...
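To check that the BIOS change took effect, the amdgpu driver exposes the VRAM size in bytes through a sysfs file named mem_info_vram_total. The sketch below prints it in MiB for every card that has the file. The vram_mib helper is a hypothetical name, and I am assuming the IGP is driven by amdgpu; cards using other drivers (such as the NVIDIA ones) will not have this file and are skipped.

```shell
#!/bin/sh
# Hypothetical helper: print the byte count stored in file $1 as MiB.
vram_mib() {
    bytes=$(cat "$1") || return 1
    echo $((bytes / 1024 / 1024))
}

# Walk every DRM card and report VRAM where amdgpu exposes it.
for f in /sys/class/drm/card*/device/mem_info_vram_total; do
    [ -r "$f" ] || continue
    echo "$f: $(vram_mib "$f") MiB"
done
```

With the VRAM forced to 4GB, the IGP entry should report roughly 4096 MiB.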
Final Thoughts
It's been a while since I've built and operated a 'desktop' system. While the performance is undeniable, it is frustrating to have to go through exercises like this to tune the bugs out of a system. Generally speaking, my Linux laptops have been fairly trouble-free for the last 10 years. Hopefully this is the last of the ghosts to exorcise for now!