<div dir="ltr">><span style="color:rgb(0,0,0);font-size:12.8000001907349px">Both E5645 and X5680 are Westmere-EP CPUs, does it occur with other</span><br style="color:rgb(0,0,0);font-size:12.8000001907349px"><span style="color:rgb(0,0,0);font-size:12.8000001907349px">>microarchitectures as well? </span><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px"><br></span></div><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px">Hi Henrik,</span></div><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px"><br></span></div><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px">I received an off-list email from another person who made the same suggestion. He had personal experience with a bug in the Nehalem microarchitecture that was caused by specific sequences of instructions, including some in the SSE2 family.</span></div><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px"><br></span></div><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px">This matches up with what I am seeing - we have never seen this problem on Sandy Bridge, and we have only seen it when using x264 builds that use SSE2.</span></div><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px"><br></span></div><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px">It's difficult to know what specific bug it is, but we are testing with what I believe is the latest microcode, so it appears Intel has chosen not to fix it.</span></div><div><span style="color:rgb(0,0,0);font-size:12.8000001907349px"><br></span></div><div><font color="#000000"><span style="font-size:12.8000001907349px">All this adds up pretty well. What is truly annoying about it is that neither our MB manufacturer nor Intel has been any help whatsoever in chasing this. When you sell me motherboards, and I can generate Machine Check errors with user mode code, I feel that the onus is on you to figure out what is wrong. 
Our MB vendor was simply unable to do so, with or without Intel's help.</span></font></div><div><font color="#000000"><span style="font-size:12.8000001907349px"><br></span></font></div><div><span style="font-size:12.8000001907349px;color:rgb(0,0,0)">Intel produces a document called "Debugging Machine Check Exceptions on Embedded IA Platforms". It's 17 pages long but boils down to this: try changing things until the problem goes away.</span><br></div><div><font color="#000000"><span style="font-size:12.8000001907349px"><br></span></font></div><div><font color="#000000"><span style="font-size:12.8000001907349px">In a perfect world I would expect that if I said "</span></font>STOP 0x9C" to my vendor they would immediately have a reference from Intel that describes how this can be caused by existing bugs.</div><div><br></div><div>Anyway, that's a load of complaining that is relatively off-topic. Based on hearing from someone else who had a nearly identical problem, I am going to believe that this characterizes the problem.</div><div><br></div><div><font color="#000000"><span style="font-size:12.8000001907349px">It's possible we could fix this by modding x264, but there are two big issues there. First, we don't actually know what code sequence is breaking things - the crashes are not conveniently pointing to x264 code. Second, we are using this as a third-party library in our product, and it would be difficult to devote someone to becoming adept at x264 internals for the sake of fixing this. So instead we work around it by just turning off SSE2.</span></font></div><div><font color="#000000"><span style="font-size:12.8000001907349px"><br></span></font></div><div>One final note. It took a long time to pin down x264 as the source of the problem. One reason was that, despite the fact that we have NEVER seen the problem on a system that wasn't encoding using x264, the machine check did not occur in the x264 code. A typical stack dump is shown below. 
It's almost as though hitting this defect required one core to be encoding while another was calculating MD5s. </div><div><br></div><div><p class="MsoNormal"><span style="color:rgb(31,73,125)">STACK_TEXT: </span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af248 0000ffff</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af24c 00009200</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af250 97908887</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af254 e00398b8</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af258 0000ffff</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af25c 00009200</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af260 3000ffff</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af264 f6409331
Fips!TransformMD5+0x281</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af268 3000ffff</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af26c f6409331
Fips!TransformMD5+0x281</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af270 3000ffff</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af274 f6409331
Fips!TransformMD5+0x281</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af278 e003f120</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af27c 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af280 e003f128</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af284 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af288 e003f130</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af28c 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af290 e003f138</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af294 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af298 e003f140</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af29c 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2a0 e003f148</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2a4 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2a8 e003f150</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2ac 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2b0 e003f158</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2b4 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2b8 e003f160</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2bc 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2c0 e003f168</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f65af2c4 00000000</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)"> </span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">FOLLOWUP_IP: </span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">Fips!TransformMD5+281</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">f6409331 8bc2 mov
eax,edx</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">SYMBOL_NAME:
Fips!TransformMD5+281</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">FOLLOWUP_NAME:
MachineOwner</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">MODULE_NAME: Fips</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">IMAGE_NAME:
Fips.SYS</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">DEBUG_FLR_IMAGE_TIMESTAMP:
480251f7</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">FAILURE_BUCKET_ID:
0x9C_GenuineIntel_Fips!TransformMD5+281</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">BUCKET_ID:
0x9C_GenuineIntel_Fips!TransformMD5+281</span><span style="color:black"></span></p>
<p class="MsoNormal"><span style="color:rgb(31,73,125)">Followup: MachineOwner</span></p></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature"><span style="border-collapse:collapse"><p style="margin:0px"><font face="arial, sans-serif">------------------------------------------------------------------------------</font></p><p style="margin:0px"><font face="arial, sans-serif">Mark Nelson – <a href="mailto:markn@ieee.org" target="_blank">markn@ieee.org</a> - <a href="http://marknelson.us" target="_blank">http://marknelson.us</a></font></p></span></div></div>
<br><div class="gmail_quote">On Wed, Mar 11, 2015 at 5:54 PM, Henrik Gramner <span dir="ltr"><<a href="mailto:henrik@gramner.com" target="_blank">henrik@gramner.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5">On Tue, Mar 10, 2015 at 7:45 PM, Mark Nelson <<a href="mailto:markn@ieee.org">markn@ieee.org</a>> wrote:<br>
> Using recent videolan builds of the x264 windows command line executable,<br>
> (x264-r2491-24e4fed.exe), I have some hardware that experiences BSOD errors<br>
> due to Machine Check 9C. This is seen when using the default auto-detect<br>
> CPU flags.<br>
><br>
> The BSODs are very rare. On a machine that is using close to 100% of its<br>
> cycles on encoding, the average rate of failure is perhaps 1/week.<br>
><br>
> The error has been seen on Xeon E5645 @ 2.4 GHz CPUs running XP, and on Xeon<br>
> X5680 @ 3.33 GHz CPUs running Server 2008 R2. The crash is not associated with<br>
> specific machines; it seems to occur on any machine of a specific model and<br>
> CPU type.<br>
><br>
> On both types of system, running the encoders with --asm 0x1400EE<br>
> eliminates the problem - thousands and thousands of hours with no crashes.<br>
><br>
> Getting to the bottom of Machine Check errors on Intel CPUs seems very<br>
> problematic. It doesn't seem like our MB manufacturer or Intel has a good<br>
> way to actually catch this in the act and explain why it happens. All<br>
> the advice for fixing this error is along the lines of eliminating possible<br>
> problems, mostly by pointing fingers at things that can go bad on the MB,<br>
> faulty memory, bad BIOS settings etc.<br>
><br>
> All of that is fine, but these same machines never experience that BSOD<br>
> error when running other types of software at the same high rates - close to<br>
> 100% CPU utilization. There is something about the default CPU options being<br>
> selected by x264 that is causing the unique event:<br>
><br>
> x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2<br>
><br>
><br>
> I realize it is *way* outside the scope of this mailer to debug CPU, MB, and<br>
> chipset defects, but it would be interesting to know if anyone has ever seen<br>
> this, either in the context of x264 or elsewhere.<br>
><br>
> I don't think there is any way a Machine Check 9C can be generated by user<br>
> mode code, so I have all along been working on the theory that this is a<br>
> result of either a hardware defect or configuration error. To no avail.<br>
><br>
><br>
> ------------------------------------------------------------------------------<br>
><br>
> Mark Nelson – <a href="mailto:markn@ieee.org">markn@ieee.org</a> - <a href="http://marknelson.us" target="_blank">http://marknelson.us</a><br>
><br>
><br>
</div></div><span class="">> _______________________________________________<br>
> x264-devel mailing list<br>
> <a href="mailto:x264-devel@videolan.org">x264-devel@videolan.org</a><br>
> <a href="https://mailman.videolan.org/listinfo/x264-devel" target="_blank">https://mailman.videolan.org/listinfo/x264-devel</a><br>
><br>
<br>
</span>That indeed sounds like a hardware issue since a user space<br>
application shouldn't be able to cause a BSOD.<br>
<br>
Both E5645 and X5680 are Westmere-EP CPUs, does it occur with other<br>
microarchitectures as well? If not it could possibly be a CPU bug<br>
(those exist in a much larger number than you'd expect), see<br>
<a href="http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf" target="_blank">http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf</a><br>
for an errata summary of the 5600 series.<br>
<span class=""><font color="#888888"><br>
<br>
Henrik<br>
</font></span><div class=""><div class="h5">
</div></div></blockquote></div><br></div></div>