What a day. Hopefully no more of these for a while.
So. PHP has a feature where, once it has digested a PHP script into a form it can run faster, it saves that result so later requests can skip the repetitive prep work. (A PHP script is basically a program that gets run on every page view.) Savvy readers know this digested form as “bytecode”.
This feature is called OPcache, and it’s enabled by default in most PHP installations. There’s also an older take on the same idea, called APC, which I’m pretty sure is incompatible with OPcache. But here’s the problem: CentOS packages PHP fairly modularly, so OPcache is an optional install, not a default one.
As it turns out, this meant every page load was slamming the disk to re-read the PHP script (somewhat expensive), turning it into bytecode (very expensive), and then running it (so-so), when ideally only that last step should happen. Caching, which cuts down how many times a page has to be generated at all, therefore wasn’t addressing the root issue: too much CPU burned per page load. The server was buckling under the load.
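To make that read/compile/run split concrete, here is a toy analogy written in Python rather than PHP, since the idea carries over: keep the compiled code in memory and skip straight to running it on repeat requests. This is a minimal sketch under my own assumptions (the names `run_script`, `_cache` and `stats` are made up for illustration), not how OPcache is actually implemented. It re-checks the file’s modification time on each call, which is a cheap stat rather than a full read, so an edited script still gets recompiled.

```python
import os
import tempfile

# Hypothetical toy cache: path -> (mtime when compiled, compiled code object)
_cache = {}
stats = {"compiles": 0}


def run_script(path, namespace=None):
    """Run a script, compiling it only if it is new or has changed on disk."""
    mtime = os.stat(path).st_mtime_ns  # cheap metadata check, no full read
    entry = _cache.get(path)
    if entry is None or entry[0] != mtime:
        # Slow path: read the source from disk and compile it to bytecode.
        with open(path, encoding="utf-8") as f:
            source = f.read()
        code = compile(source, path, "exec")
        _cache[path] = (mtime, code)
        stats["compiles"] += 1
    else:
        # Fast path: reuse the bytecode compiled on an earlier request.
        code = entry[1]
    ns = {} if namespace is None else namespace
    exec(code, ns)  # the only step that should happen on every page view
    return ns


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        page = os.path.join(d, "page.py")
        with open(page, "w", encoding="utf-8") as f:
            f.write("html = '<h1>Hello</h1>'\n")
        for _ in range(1000):  # simulate 1000 page views
            result = run_script(page)
        print(result["html"], "compiled", stats["compiles"], "time(s)")
```

Run as a script, it should report a single compile across all 1000 simulated page views; without the cache, every one of those views would pay for the read and compile steps again.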
Since installing OPcache, the server’s CPU usage has dropped off dramatically: as much as 20x in some cases, though closer to 5x under load. I really, /really/ hope this one sticks!