I rebooted my workstation today.
It was kind of overdue for a reboot, on general principles, having been running continuously for... hmmm... not all that long; after all, I had everything turned off for the Grand Reconfiguration of just about a month ago.
Also, there was a new kernel version installed, and the update manager was nagging me for a reboot.
It did not go smoothly.
In fact, the reboot wouldn't complete. It got smpboot errors about not being able to wake up any of the CPUs, and then it rebooted.
Oh. Crap.
That's the affects-some-processors bug in the Intel microcode update that's meant to help with the Meltdown/Spectre thing, isn't it?
As it turns out, probably not. It looked remarkably like the BIOS settings were getting corrupted (like, completely garbling the mapping for my US-English USB keyboard). And sometimes I could persuade the machine to boot a rescue CD, which would then proceed to work, at least somewhat.
Off to family dinner with the problem still not solved. Possible solution, if it turns out to be something in the Linux boot process: copy everything off the SSD to a USB drive, re-install everything, and copy stuff back. Oog.
Back from dinner. Consider the possibility that it's a hardware problem. Swap around the RAM modules, and wiggle various connections. Re-apply power, and...
... Oops.
Now it doesn't manage to power back up. It turns on a few lights, the CPU and video-card fans spin up, but then it shuts off for a few seconds, then tries again.
Well, maybe it's just a failing power supply...?
I guess in the morning I'll try something radical, like disconnecting everything that can be disconnected and seeing if it powers up. If not, off to - wait - I have a decent-sized ATX power supply around here, I think. Practically new. I can try that. If all else fails, I may be heading off to Central Computers for a new MB/CPU set. Preferably something that's compatible with the 16GB of RAM I have in that thing.
And, while I'm at it, one of the case fans does need replacing.
Update 1: This laptop is not really an acceptable substitute, what with its miserable little 15" screen and low-profile, compact keyboard. Also, while all the important stuff is accessible on, or from, the laptop, my morning-funnies links are on the workstation. I suppose I could retrieve them from the backup, but maybe I'll just find something else to prop against the milk carton this morning.
Also, I'm noticing that some sites render very differently in Chrome-on-the-laptop this morning than they did in Chrome-on-the-workstation yesterday. But, then, Chrome-on-the-phone reconfigured its default start page, again, within the last couple of days, so who knows what's going on? Google gaslighting, or site redesigns?
Update 2: Just after 0630 Pacific, and Chipzilla is up over three bucks from yesterday's close. Apparently yesterday afternoon's earnings report was better than expected. However, my contingency plan from 2 years back for INTC-at-$50 is no longer applicable, for various reasons. Probably. Maybe it could be adapted, though.
Update 3: This update is brought to you by the phrase "Have you tried unplugging all the internal cables and modules and plugging them back in again?" and by the number 137.036-j42.
Ol' workstation is back up. I solved the problem by the Socratic method, i.e., I started asking questions systematically, and persisted until the problem solved itself. Or is that the Columbic method? Ask increasingly irritating questions until the suspect confesses? Only I didn't get a confession, just obedience.
I think maybe I should find some sort of bootable diagnostics thingy to try on this, and maybe run an overnight RAM test or something. But, for now, 'tis working. I should still round up a replacement for the worn-out case fan before I button it back up.
Meanwhile, Chipzilla's share price, oblivious both to my plight and to the revelation that I wouldn't have to run out and buy a new CPU this week, is up even more. Just a couple of days ago, the pundits were proclaiming gloom, doom, and a $43 price target. Hmmm.
Update 4: Failure to boot. Keyboard behavior erratic. Could this have been a Gate A20 issue? (Oops. My age is showing. Maybe I need a nap, now that the machine is working again and the panic is over.)
Update 5: Friday night, I left memtest86 running for 7¾ hours. No errors. So where does that leave us?
Well, clearly I can't blame either Intel or Debian for the initial string of smpboot failures, given that, eventually, things returned to normal with no changes to the software configuration. The cause of the motherboard/CPU frotzment remains a mystery.
The keyboard weirdness also remains a mystery, but it may be a separate one; I'm developing a suspicion that it's something to do with the new KVM switch - perhaps some curious interaction between it and the BIOS.
Whatever. The machine is running again. I have a vague urge to tweak the BIOS settings for CPU speed, to take advantage once again of the unlocked CPU, but I don't think that's really necessary, as I haven't been doing lengthy, CPU-intensive things lately, and speeding up 5-second compiles by a few percent isn't that big a deal.
This leaves a couple of nagging annoyances: KDE tends, on reboot, to forget where some of the more recently added icons belong on my desktop and what size they should be (they end up tiny, and placed randomly, often in a cluttered area), and if I leave the apt entry for Google Earth Pro intact I get frequent pop-ups informing me that the Packages file for the i386 architecture is missing (I'm on amd64, of course, but have multiarch turned on in support of some legacy software that's 32-bit-only).
Oh, here we go. Second irritation solved; sticking [arch=amd64] into the APT source line makes it not look for other architectures.
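For anyone wanting to apply the same fix to more than one source entry, here's a rough sketch in Python of what the edit amounts to. The function name pin_arch and the example.invalid repository URL are made up for illustration; the only real part is the [arch=amd64] option itself, which goes in square brackets between "deb" and the URL. It assumes the classic one-line sources format, not the newer deb822 .sources files.

```python
import re

# One-line-style APT entry: "deb [options] uri suite components..."
# The bracketed options block is optional.
_ENTRY = re.compile(r"^(deb(?:-src)?)\s+(?:\[([^\]]*)\]\s+)?(.+)$")


def pin_arch(line, arch="amd64"):
    """Return the source line restricted to one architecture.

    Hypothetical helper, not part of APT. Comments, blank lines, and
    anything else that is not a deb/deb-src entry come back unchanged.
    """
    match = _ENTRY.match(line.strip())
    if match is None:
        return line
    kind, opts, rest = match.groups()
    options = opts.split() if opts else []
    # Leave an existing arch= setting alone; otherwise add ours first.
    if not any(opt.startswith("arch=") for opt in options):
        options.insert(0, f"arch={arch}")
    return f"{kind} [{' '.join(options)}] {rest}"


if __name__ == "__main__":
    # Placeholder repository, not the real Google Earth Pro entry.
    print(pin_arch("deb http://example.invalid/repo/ stable main"))
```

With the option in place, APT should only fetch the Packages file for amd64 from that repository, even with multiarch enabled, which is what makes the pop-ups go away.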