DevConf.cz Maintaining the kernel of an enterprise distro is not only hard work, it also involves conflicting goals.
A talk by Red Hat Principal Kernel Engineer Jiří Benc at this year’s DevConf.cz event covered some of the inherent contradictions in keeping an enterprise distro’s kernel on its feet. Or at least someone’s – or something’s – limbs, as its title suggests: “CentOS Frankenkernel: Add Your Limb.”
He focuses on the CentOS Stream kernel, which will eventually become the kernel of the next point-release of RHEL 9 – at the time of writing, that will be RHEL 9.3, but like other versions of RHEL 9, it will have kernel 5.14, which was released on August 29, 2021. How do they manage it?
The goals of any kernel update are simple: stability, obviously. No regressions, and that also means no performance regressions. No API changes, and no internal ABI changes: in fact, no behavior changes. But, at the same time, customers want new features, and support for new hardware, including new drivers; they want updates, at least any outstanding security updates. All without breaking whatever they’re currently using, because that’s what they’re paying for.
It’s a big ask, and the result is bound to be a compromise. The team tries to deliver no functional regressions, and to keep performance regressions to the essential minimum. To not make backwards-incompatible uAPI (userspace API) changes, and to avoid kernel ABI changes for important things. The problem is that people want new features… and new or updated drivers.
So, what the team is creating is a Frankenstein’s monster, pieced together from different codebases. Although the base kernel is still version 5.14, it is full of backports from upstream. It has the XFS filesystem code from kernel 6.0, the USB subsystem – complete with drivers – and BPF subsystem from kernel 6.2, the wireless stack and all drivers from kernel 6.3, and the Multipath TCP code from kernel 6.4 – which at the time of the talk had not yet been released upstream. (It was released last weekend.)
It works thanks to a lot of testing and a very careful release process. Of course, the developer tests the change themselves, but it also undergoes continuous integration testing thanks to tools from the CKI project, as well as network-stack testing using LNST tools. Then, it undergoes preverification, meaning that someone other than the author manually checks the change. Only then is the change merged into the CentOS kernel tree, after which it undergoes integration testing: checks against another 150 or so work-in-progress changes. Then, once all that is passed, it undergoes normal QA testing with the rest of the OS.
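As a rough illustration of that sequence, here is a minimal Python sketch of a change passing through gates in order. The stage names follow the talk as described above; the dictionary keys and check functions are hypothetical stand-ins, not real CKI or LNST interfaces.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a check that returns True when the change passes.
Stage = Tuple[str, Callable[[dict], bool]]


def run_gates(change: dict, stages: List[Stage]) -> Optional[str]:
    """Run the stages in order and return the name of the first one
    that fails, or None if the change passes every stage."""
    for name, check in stages:
        if not check(change):
            return name
    return None


# Hypothetical checks: each one just reads a flag from the change record.
STAGES: List[Stage] = [
    ("developer testing", lambda c: c.get("dev_tested", False)),
    ("CI (CKI, LNST)", lambda c: c.get("ci_passed", False)),
    # Preverification needs a verifier who is not the author.
    ("preverification",
     lambda c: c.get("verifier") not in (None, c.get("author"))),
    ("merge and integration testing",
     lambda c: c.get("integration_passed", False)),
    ("QA with the rest of the OS", lambda c: c.get("qa_passed", False)),
]
```

The point of the ordering is that a change stops at the first gate it fails, so the later and more expensive stages only ever see changes that have already cleared the earlier ones.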
The results can be seen on the CentOS Stream GitLab – Benc is keen to emphasize that this is all happening publicly, and it’s all documented. In fact, anyone can open a request for such a change, by filing a bug in Bugzilla, or opening an issue in JIRA, according to a prescribed format: Product… Version… Components… Subcomponent… Benefit… Trials. Similarly, there is also a very strict format for merge requests (which are GitLab’s equivalent of GitHub’s pull requests), and for commit messages – and they must be followed exactly, because the messages are parsed by machines as well as by humans.
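To see why exactness matters when a machine is doing the reading, consider this minimal Python sketch of a format check. It assumes a simple `Label: value` layout and uses the field labels listed above; the real templates may well look different, and none of this is Red Hat's actual tooling.

```python
import re
from typing import Dict, List

# Field labels as listed in the article; the real template may differ.
REQUIRED_FIELDS = ["Product", "Version", "Components",
                   "Subcomponent", "Benefit", "Trials"]

# Assumed layout: one "Label: value" pair per line.
FIELD_RE = re.compile(r"^(?P<key>[A-Za-z][A-Za-z -]*):\s*(?P<value>.*)$")


def parse_request(text: str) -> Dict[str, str]:
    """Collect every 'Label: value' line into a dictionary."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group("key").strip()] = match.group("value").strip()
    return fields


def missing_fields(text: str) -> List[str]:
    """Return the required labels that are absent or left empty."""
    fields = parse_request(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A request that misspells a label or leaves a value blank simply shows up in the list returned by `missing_fields`, which is the sort of thing an automated checker can flag before a human ever looks at it.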
As long as the format is followed precisely, automation kicks in. It adds multiple labels, checks for subsequent fixes and patches from upstream, tags the various people who should review the change, and more. All discussion is handled in the comments of the MR itself in GitLab – except for dependencies, such as drivers, as GitLab is currently unable to handle these.
If you listen carefully to the YouTube stream of the talk, the first question is from the Reg FOSS desk, asking whether this doesn’t overlap with the long-term support releases from the upstream kernel developers. Benc told us that he thinks Red Hat’s level of testing and quality control exceeds that of the upstream LTS kernels, which don’t deliver the level of stability an enterprise distro needs.
That was a little surprising to us, but it was an undeniably impressive amount of work and level of attention to detail. In light of the ongoing uproar that followed Red Hat’s removal of the RHEL source code from publication, the talk highlighted the enormous amount of work that goes into maintaining a distro, complete with a kernel version, for a life cycle of a full decade. RHEL 9.10, for example, is not due to go out of support until 2032.
This is the work Red Hat wants to get paid for, and the reason it’s still trying to find ways to exclude downstream rebuilds – as it has been doing for a dozen years. ®