AMD x87 patch
Author | Message |
---|---|
Bogie Joined: 21 Dec 06 Posts: 84 Credit: 75,755,114 RAC: 52 |
Saw this at the overclockers forum (http://www.overclockers.com/forums/showthread.php?t=737243). Would this patch help with FX and Phenom CPUs? Thanks. |
cov_route Joined: 13 Sep 12 Posts: 342 Credit: 10,270,618 RAC: 0 |
X87 is mostly deprecated. The apps all use some version of SSE. Maybe there is some x87 code for secondary processing but there can't be much to gain from improving it. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Where practical, SSE or better is used for inner loop calculations of course. However, there are significant sections of 32 bit builds for both SaH v7 and AP v6 which do use x87 code. The patch might make a noticeable speedup, and one of the uses is in the generation of data to implement blanking in AP so it might help the OpenCL AP apps too. FWIW, I think it would definitely improve the Whetstone benchmark in 32 bit BOINC builds. OTOH, nobody knows why AMD has deliberately chosen to operate x87 in Bulldozer and its descendants at less than max capability. My hunch is they found a design glitch which they have not yet been able to fix. If so, using The Stilt's patch may cause bad results under some specific unknown conditions. The patch is not applicable to earlier AMD CPUs. Joe |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Would the described patch be applicable to the Trinity APU? Another possible reason for limiting performance, besides a design flaw, is the thermal design. In other words, the CPU could simply become too hot when operating at full speed. Their deliberate core downclocking in Trinity APUs (and not only there) is very much along the same lines... EDIT: found the answer to my own question in the article: "Parts affected: AMD Barracuda (Zambezi, Vishera), AMD Comal (Trinity, Richland), AMD Virgo (Trinity, Richland)". Also: "Negative effects: TBD, none found yet. The performance in non x87 applications remains the same or improves very slightly. No instability, increased power consumption". Well... worth testing at least! SETI apps news We're not gonna fight them. We're gonna transcend them. |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
"Would the described patch be applicable to the Trinity APU?" I read that as "No increased power consumption". |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, that means no errors (contrary to Joe's assumption) and no increase in heating (contrary to my own). So it is definitely worth trying. I'll try it when I have more time, maybe tomorrow, and will post benchmark results here then. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Here is the promised benchmark of this patch. For benchmarking I chose the ATi AP app (it's an APU after all) running a heavily blanked test task. After reboot, no patch applied: WU : sigind_v5.wu AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -verbose : Elapsed 289.990 secs CPU 125.628 secs. Patch applied: WU : sigind_v5.wu AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -verbose : Elapsed 287.664 secs CPU 126.813 secs. Patch disabled: WU : sigind_v5.wu AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -verbose : Elapsed 289.055 secs CPU 127.359 secs. Enabled again: WU : sigind_v5.wu AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -verbose : Elapsed 287.648 secs CPU 126.985 secs. Pity. No significant difference :/ |
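For readers who want to put a number on "no significant difference", here is a minimal Python sketch that averages the two unpatched and the two patched runs from the post above and reports the relative change. The grouping of runs and the helper names (`mean`, `percent_change`) are this sketch's own; only the timings come from the post. For scale, the two unpatched runs already differ from each other by 0.935 s of elapsed time.

```python
# Timings copied from the benchmark post (seconds), as (elapsed, CPU) pairs.
# Grouping them into "unpatched" and "patched" is this sketch's bookkeeping.
unpatched = [(289.990, 125.628), (289.055, 127.359)]
patched = [(287.664, 126.813), (287.648, 126.985)]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


elapsed_change = percent_change(mean(e for e, _ in unpatched),
                                mean(e for e, _ in patched))
cpu_change = percent_change(mean(c for _, c in unpatched),
                            mean(c for _, c in patched))

print(f"elapsed: {elapsed_change:+.2f}%  CPU: {cpu_change:+.2f}%")
```

With only two runs per configuration, a change of well under one percent is hard to separate from run-to-run variation, which is consistent with the conclusion in the post.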
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I was under the impression that when this issue was discovered, AMD quickly worked with Microsoft to include a patch for their CPUs to help the Windows task scheduler deal with the AMD's architectural differences. If so, that may explain why you see little to no difference with the patch enabled. |
Mike Joined: 17 Feb 01 Posts: 34257 Credit: 79,922,639 RAC: 80 |
"I was under the impression that when this issue was discovered, AMD quickly worked with Microsoft to include a patch for their CPUs to help the Windows task scheduler deal with the AMD's architectural differences. If so, that may explain why you see little to no difference with the patch enabled." Yes, I remember it the same way. With each crime and every kindness we birth our future. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Could Windows Server 2008 SP2 include this fix? |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
I'd doubt it. The release date for Server 2008 SP2 was July 22nd, 2009. I think the AMD patch was a hot-fix. Not sure which one it would be. I'd have to do some research on it. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Here we go: the hotfix was released by Microsoft in KB2645594. This seems to have been released post Server 2008 SP2 as well as post Server 2008 R2 SP1 (and their corresponding client OSes). |
cov_route Joined: 13 Sep 12 Posts: 342 Credit: 10,270,618 RAC: 0 |
"Here we go: the hotfix was released by Microsoft in KB2645594. This seems to have been released post Server 2008 SP2 as well as post Server 2008 R2 SP1 (and their corresponding client OSes)." That was a different issue: the OS scheduling of threads between cores and modules. The x87 patch gets right in and fiddles with the microcode of the CPU itself. I never heard about AMD patching that; it would be in the form of a BIOS update. I suppose they could have issued such changes directly to OEMs without announcing it, but that would be unlikely. Edit: also, in his comments, The Stilt said that it wouldn't work on Zambezi-based processors. |
cov_route Joined: 13 Sep 12 Posts: 342 Credit: 10,270,618 RAC: 0 |
"Where practical, SSE or better is used for inner loop calculations of course. However, there are significant sections of 32 bit builds for both SaH v7 and AP v6 which do use x87 code. The patch might make a noticeable speedup, and one of the uses is in the generation of data to implement blanking in AP so it might help the OpenCL AP apps too." Joseph, some compilers (e.g. gcc) can use the SSE units to process non-SIMD floating-point math. That is, code that historically would have run on the x87 unit is run on the SIMD units in scalar mode. With gcc you use the flag -mfpmath=sse. Is it definite that x87 code is generated and not scalar SIMD code? |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
"Here we go: the hotfix was released by Microsoft in KB2645594. This seems to have been released post Server 2008 SP2 as well as post Server 2008 R2 SP1 (and their corresponding client OSes)." Hotfixes can be used as work-arounds to existing/known issues that aren't fixed in microcode. Are we certain that this isn't the same issue? [Edit] Hmmm.. maybe you're right. I can't seem to confirm that the hotfix directly fixes the x87 issue. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Where practical, SSE or better is used for inner loop calculations of course. However, there are significant sections of 32 bit builds for both SaH v7 and AP v6 which do use x87 code. The patch might make a noticeable speedup, and one of the uses is in the generation of data to implement blanking in AP so it might help the OpenCL AP apps too." There is one main problem with direct mapping of x87 FPU code to scalar SIMD. x87 uses 80-bit intermediates, while SSE onward follows IEEE 754 at the reduced single or double precision, with no extended intermediates. This means that without a lot of attention to the choice of algorithms, the results can turn out quite different. For a simple example, the mean of as few as 4096 floats can differ in the 2nd-3rd decimal places, amplifying problems with the use of absolute thresholds (no hysteresis) for reporting and validation. V7 multibeam received some attention to bring this cross-platform difference in line between GPUs and CPUs, which makes direct AMD64/Intel64 (no x87 allowed) builds feasible from the stock codebase (not workable under V6). It has also meant that near-direct ports should be more portable to other devices without too much variation (e.g. recent ARM/Android). To my knowledge AP hasn't received this attention yet, but looking at the high ratio of AP tasks to WUs waiting to purge, it would need that attention to bring the platforms closer numerically. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
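The point about intermediate precision and absolute thresholds can be illustrated with a small Python sketch. It is only a model, under stated assumptions: Python has no 80-bit type, so binary64 stands in for the "wide" intermediate and binary32 (emulated with `struct`) for the "narrow" one. The data (4096 copies of 0.1) and the `THRESHOLD` value are hypothetical and have nothing to do with real SETI@home or Astropulse data.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mean_narrow(values):
    """Mean with every partial sum rounded to binary32 (narrow accumulator)."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return to_f32(total / len(values))


def mean_wide(values):
    """Mean accumulated in binary64, rounded to binary32 only at the end.

    binary64 is a stand-in for a wider intermediate format here; it is
    not the x87 80-bit format.
    """
    total = 0.0
    for v in values:
        total += v
    return to_f32(total / len(values))


# Hypothetical input: 4096 identical single-precision samples.
values = [to_f32(0.1)] * 4096

narrow = mean_narrow(values)
wide = mean_wide(values)

# Hypothetical absolute threshold with no hysteresis band around it.
THRESHOLD = 0.100002
reported_narrow = narrow > THRESHOLD
reported_wide = wide > THRESHOLD

print(f"narrow mean: {narrow!r}  reported: {reported_narrow}")
print(f"wide mean:   {wide!r}  reported: {reported_wide}")
```

In this model the wide accumulator returns the input value unchanged, while the narrow one drifts upward by a few parts per million, enough to land on the other side of the threshold. How large the drift is on real data depends on the values and their ordering, so the figures here should not be read as typical.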
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Eric Mcintosh of the LHC at Home Classic project has been working on issues of numeric reproducibility. He has recently posted his first notes online: CV and Notes on Floating-Point. x87 is covered in section 3.1.1 of the final piece (The pitfalls of verifying floating-point computation, David Monniaux 2008). I haven't checked the others. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"Where practical, SSE or better is used for inner loop calculations of course. However, there are significant sections of 32 bit builds for both SaH v7 and AP v6 which do use x87 code. The patch might make a noticeable speedup, and one of the uses is in the generation of data to implement blanking in AP so it might help the OpenCL AP apps too." I had experimented with that GCC option more than a year ago and decided to stick with the default usage of x87 for 32 bit builds, so all the AKv8c builds do use x87. But I agree it's time to recheck by making and testing some builds with -mfpmath=sse. I have a Trinity A10-4600M laptop which will be suitable for testing. Josef |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Eric Mcintosh of the LHC at Home Classic project has been working on issues of numeric reproducibility." Very handy, thanks! There are amazingly few solid references in this particular area of Computer Science. |