Can be exciting!
Or, as in this case, merely frustrating. I think the gremlin I've spent the last week chasing was actually a ground loop, of the electrical variety.
See, there's this engineering lash-up, prototyping the various subsystems of a product under development. Being as how it's a lash-up, with each subsystem on its own board, there are wires everywhere.
There are four boards in this lash-up: one of type A, one of type B, and two of type C. Various boards of type D (or D', D'', etc.) may or may not be connected to the type C boards.
There's been a Heisenbug afflicting board C0, and not board C1. Replacing C0 changes the problem, but doesn't actually fix it. So... are all the type C boards faulty, except for the enchanted C1? Or...
The type C board has a power connection (power and ground), and a data connection (USB). There's a quad UART on the board, powered from the USB bus (the separate power connection is for some medium-power drivers).
Certain sorts of activity cause the UART to drop off the bus, and then re-enumerate. This only happens with the power supply at or near rated voltage; running at half voltage, there's no problem.
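To put numbers on "drops off the bus", one option is to count disconnect events per USB port in the kernel log. The sketch below is only that, a sketch: it assumes the kernel logs a line containing "USB disconnect, device number N" when a device drops (the usual wording on the Linux systems I know of, but check your own dmesg output), and the function name count_disconnects is made up for this example.

```python
import re
import sys

# Assumption: the kernel reports a drop with a line like
#   usb 1-1.4: USB disconnect, device number 7
# Verify the wording against your own dmesg output before trusting the counts.
DISCONNECT = re.compile(
    r"usb (?P<port>[\d.-]+): USB disconnect, device number (?P<num>\d+)"
)


def count_disconnects(log_text):
    """Return a dict mapping USB port path -> number of disconnect lines seen."""
    counts = {}
    for line in log_text.splitlines():
        match = DISCONNECT.search(line)
        if match:
            port = match.group("port")
            counts[port] = counts.get(port, 0) + 1
    return counts


if __name__ == "__main__":
    # Read kernel log text from standard input and print one line per port.
    for port, n in sorted(count_disconnects(sys.stdin.read()).items()):
        print(f"{port}: {n} disconnect(s)")
```

Saved under a name of your choosing (say, count_drops.py), it can be fed with `dmesg | python3 count_drops.py` before and after a wiring change, which makes "seems to be behaving itself" a bit less subjective.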
For the past few days, I've been chasing the notion that the UART (or the Linux FTDI driver) freaked out if certain modem-control inputs (in particular, DCD) changed too often; some of the early type C boards had interesting problems that led to, e.g., 1.4 MHz square waves on all four DCD lines once a certain voltage threshold was crossed.
But this hypothesis wasn't consistent with what I was seeing this afternoon. The problem really seemed to be position-related. And... um.
The one thing different about the C0 location? Its power wiring is longer than C1's.
So I added a second, shorter ground wire to C0. Now... it seems to be behaving itself.
Still gotta do more testing. A bunch more. But it looks like certain switching events, with the supply at full voltage, created big enough current spikes to cause disruptive ground bounce on the USB cable, leading to bus errors.
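A back-of-the-envelope check on the ground-bounce theory: the voltage across a ground return during a current step is roughly V = L*di/dt + R*di. The sketch below plugs in numbers that are pure assumptions for illustration (about 1 uH per metre of hookup wire as a rule of thumb, a 1 A step in 100 ns, wire lengths of 0.5 m and 0.1 m); none of them were measured on the lash-up. It also treats the second ground wire as a simple parallel inductor and ignores mutual inductance, so it flatters the improvement somewhat.

```python
def ground_bounce(inductance_h, resistance_ohm, delta_i_a, rise_time_s):
    """Approximate peak voltage across a ground return for a current step.

    Uses V = L * di/dt + R * di, treating the edge as a linear ramp.
    """
    return inductance_h * delta_i_a / rise_time_s + resistance_ohm * delta_i_a


def parallel(a, b):
    """Combine two impedances (or uncoupled inductances) in parallel."""
    return a * b / (a + b)


if __name__ == "__main__":
    # All values below are hypothetical, chosen only to show the scale.
    UH_PER_M = 1e-6          # rule-of-thumb wire inductance, henries per metre
    long_wire = 0.5 * UH_PER_M   # original ground wire, assumed 0.5 m
    short_wire = 0.1 * UH_PER_M  # added ground wire, assumed 0.1 m
    step_a = 1.0             # assumed current step, amps
    edge_s = 100e-9          # assumed edge time, seconds

    before = ground_bounce(long_wire, 0.0, step_a, edge_s)
    after = ground_bounce(parallel(long_wire, short_wire), 0.0, step_a, edge_s)
    print(f"long wire only:       {before:.2f} V")
    print(f"with short wire added: {after:.2f} V")
```

Even with made-up numbers, the point stands: with fast edges the inductance term dominates, so shortening or paralleling the ground return matters far more than its DC resistance.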
In the production item, of course, the USB will all be on the one unified board, running down the center, with a nice quiet digital ground plane under it, and the power for the drivers will be routed around the periphery, with strategic plane cuts to avoid this sort of thing (and to avoid ground offsets when there's a lot of current flowing).
Update: As currently wired, the problem reappears if I crank the voltage up to 15% above nominal. Action plan for Monday involves re-wiring, using heavier wire for the ground, and adding some inductance in the supply paths. The gadget is designed to operate from 60% below, to 25% above, nominal supply, and I'd really like to see the lash-up perform over the full range.
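For planning the test sweep, the range works out to 0.40 to 1.25 times nominal. A small sketch for generating evenly spaced test voltages follows; the 12 V nominal in the example is a placeholder, not the actual supply voltage of the gadget, and the function names are made up here.

```python
def supply_range(nominal_v):
    """Return (low, high) supply voltage: 60% below to 25% above nominal."""
    return (1.0 - 0.60) * nominal_v, (1.0 + 0.25) * nominal_v


def sweep_points(nominal_v, count=8):
    """Return `count` evenly spaced voltages covering the full range, ends included."""
    if count < 2:
        raise ValueError("need at least two sweep points")
    low, high = supply_range(nominal_v)
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


if __name__ == "__main__":
    NOMINAL_V = 12.0  # placeholder value, not the real nominal supply
    for v in sweep_points(NOMINAL_V):
        print(f"{v:.2f} V ({v / NOMINAL_V:.0%} of nominal)")
```

Note that 15% above nominal, where the problem currently reappears, sits well inside this range, so the re-wiring has to buy real margin rather than just nudge the threshold.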
Update 2: Oho! This would also explain why it was sometimes bombing when I connected a ground lead to one of the ground test points.