Sunday, November 26, 2017


Every now and then my propeller beanie comes out.

Here I go again.

Last year, during the week between Christmas and New Year's, I had my annual lunch with two of my geekiest friends.

That was a time to be remembered.

I've already discussed one of the topics we covered.

One of my lunch mates is a longtime employee of a global logistics firm that has multiple data centers, one of them at high altitude, i.e., more than a mile high.

Somehow he got onto the subject of failure rates related to altitude. He attributed these failures to neutrons.

I kid you not.

He had noticed that some equipment seemed to fail more often at the mile-high data center. The vendors of the failing equipment didn't buy the idea that neutron density at altitude was causing the failures.

The logistics company ran a tightly controlled experiment in Memphis and at the mile-high data center.

The results were convincing. Certain equipment from certain vendors failed way more often at the mile-high data center.

You may say, "That doesn't apply to me. My data center is not a mile high."

Don't speak too fast.

When I worked for this global logistics firm, we used to say that the problems we were encountering were going to be everybody else's problems in 5 years.

The same goes for neutrons.

Here's why: Ice Lake.

Read this from AnandTech.

A 10nm processor is coming your way, and soon.

I won't miss this year's lunch for anything.

Here's a reading list on neutrons.

Cosmic rays creating energetic neutrons and protons

Cisco Blamed A Router Bug On 'Cosmic Radiation'
"We did send a system to a POP in Denver (altitude 5000+ ft) and saw on this system a statistically significant increase in recoverable memory ECC errors. When the affected board was returned to San Jose and retested (basically sea level) the errors could not be reproduced. So we returned the hardware back to the Denver POP, and the recoverable ECC errors returned. No amount of swapping memory DIMMs (various vendors) made a difference."

Problem background
"...research has shown that the majority of one-off soft errors in DRAM chips occur as a result of background radiation, chiefly neutrons from cosmic ray secondaries, which may change the contents of one or more memory cells or interfere with the circuitry used to read or write to them. Hence, the error rates increase rapidly with rising altitude; for example, compared to the sea level, the rate of neutron flux is 3.5 times higher at 1.5 km and 300 times higher at 10–12 km (the cruising altitude of commercial airplanes)."

How Cosmic Rays Cause Computer Downtime
"Neutron intensity increases dramatically with altitude."
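
For a rough feel of what the flux multipliers quoted above would mean for a memory error count, here is a minimal Python sketch. The only numbers taken from the excerpt are the 3.5x (1.5 km) and 300x (10–12 km) multipliers. The baseline of 10 errors per year and the function name are made up for illustration, and the code assumes that soft errors scale linearly with neutron flux, which is a simplification: the excerpt only says neutrons cause the majority of them.

```python
# Neutron flux relative to sea level, as quoted in the excerpt above.
FLUX_MULTIPLIER = {
    "sea level": 1.0,
    "1.5 km": 3.5,
    "10-12 km": 300.0,
}


def expected_soft_errors(baseline_per_year: float, site: str) -> float:
    """Scale a sea-level soft-error count by the relative neutron flux.

    Assumption: error count is directly proportional to neutron flux.
    """
    return baseline_per_year * FLUX_MULTIPLIER[site]


if __name__ == "__main__":
    baseline = 10.0  # hypothetical sea-level count, not a measured value
    for site in FLUX_MULTIPLIER:
        errors = expected_soft_errors(baseline, site)
        print(f"{site}: about {errors:.0f} soft errors per year")
```

Under those assumptions, a box that logs a handful of correctable errors a year at sea level would log several times as many at a mile-high site, which is the kind of gap a side-by-side comparison can pick up.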
