Amd x87 patch

Bogie
Volunteer tester
Joined: 21 Dec 06
Posts: 84
Credit: 75,755,114
RAC: 52
United States
Message 1415287 - Posted: 13 Sep 2013, 20:42:31 UTC

Saw this at the overclockers forum: http://www.overclockers.com/forums/showthread.php?t=737243. Would this patch help with FX and Phenom CPUs? Thanks.

cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1415374 - Posted: 14 Sep 2013, 0:33:47 UTC

X87 is mostly deprecated. The apps all use some version of SSE. Maybe there is some x87 code for secondary processing but there can't be much to gain from improving it.

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1415425 - Posted: 14 Sep 2013, 3:17:38 UTC

Where practical, SSE or better is used for inner loop calculations of course. However, there are significant sections of 32 bit builds for both SaH v7 and AP v6 which do use x87 code. The patch might make a noticeable speedup, and one of the uses is in the generation of data to implement blanking in AP so it might help the OpenCL AP apps too.

FWIW, I think it would definitely improve the Whetstone benchmark in 32 bit BOINC builds.

OTOH, nobody knows why AMD has deliberately chosen to operate x87 in Bulldozer and its descendants at less than max capability. My hunch is they found a design glitch which they have not yet been able to fix; if so, using The Stilt's patch may cause bad results under some specific unknown conditions. The patch is not applicable to earlier AMD CPUs.
                                                                   Joe

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1415902 - Posted: 15 Sep 2013, 11:02:40 UTC - in response to Message 1415425.  
Last modified: 15 Sep 2013, 11:10:29 UTC

Would the described patch be applicable to a Trinity APU?

Another possible reason for limiting performance, besides a design flaw, is a thermal design problem. In other words, the CPU could become just too hot when it operates at full speed. Their deliberate core downclocking in Trinity APUs (and not only there) is very much in the same line...

EDIT: found the answer to my own question in the article:

Parts affected: AMD Barracuda (Zambezi, Vishera), AMD Comal (Trinity, Richland), AMD Virgo (Trinity, Richland)

Also: Negative effects: TBD, none found yet. The performance in non x87 applications remains the same or improves very slightly. No instability, increased power consumption

Well... worth testing at least!
SETI apps news
We're not gonna fight them. We're gonna transcend them.

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1415997 - Posted: 15 Sep 2013, 15:42:50 UTC - in response to Message 1415902.  

I read "No instability, increased power consumption" as "no instability and no increased power consumption."


Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1416130 - Posted: 15 Sep 2013, 20:38:16 UTC - in response to Message 1415997.  

Yes, that means no errors (contrary to Joe's assumption) and no increase in heating (contrary to my own assumption). So, definitely worth trying. I will try when I have more time, maybe tomorrow.
I will post bench results here then.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1416345 - Posted: 16 Sep 2013, 12:09:36 UTC
Last modified: 16 Sep 2013, 12:10:23 UTC

Here is the promised benchmark of this patch.

For benchmarking I chose the ATi AP app (it's an APU after all) running a heavily blanked test task.

After reboot, no patch applied:

WU : sigind_v5.wu
AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -verbose :
Elapsed 289.990 secs
CPU 125.628 secs

Patch applied:

WU : sigind_v5.wu
AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -verbose :
Elapsed 287.664 secs
CPU 126.813 secs

Patch disabled:

WU : sigind_v5.wu
AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -verbose :
Elapsed 289.055 secs
CPU 127.359 secs

Enabled again:

WU : sigind_v5.wu
AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -verbose :
Elapsed 287.648 secs
CPU 126.985 secs

Pity. No significant difference :/
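
To put a rough number on "no significant difference", the four runs above can be averaged and compared. The sketch below is only illustrative arithmetic on the timings posted in this message; the variable and function names are made up for the example, and two runs per configuration are far too few to support any statistical claim.

```python
# Timings copied from the four runs above (seconds).
unpatched = {"elapsed": [289.990, 289.055], "cpu": [125.628, 127.359]}
patched = {"elapsed": [287.664, 287.648], "cpu": [126.813, 126.985]}


def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = faster)."""
    return 100.0 * (after - before) / before


elapsed_change = percent_change(mean(unpatched["elapsed"]), mean(patched["elapsed"]))
cpu_change = percent_change(mean(unpatched["cpu"]), mean(patched["cpu"]))

print(f"elapsed: {elapsed_change:+.2f}%")
print(f"cpu:     {cpu_change:+.2f}%")
```

Both changes come out well under one percent, and in opposite directions (elapsed time slightly down, CPU time slightly up).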

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1416361 - Posted: 16 Sep 2013, 13:20:38 UTC - in response to Message 1416345.  

I was under the impression that when this issue was discovered, AMD quickly worked with Microsoft to include a patch for their CPUs to help the Windows task scheduler deal with AMD's architectural differences. If so, that may explain why you see little to no difference with the patch enabled.

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34257
Credit: 79,922,639
RAC: 80
Germany
Message 1416371 - Posted: 16 Sep 2013, 13:47:44 UTC - in response to Message 1416361.  

Yes, I remember it the same way.



With each crime and every kindness we birth our future.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1416381 - Posted: 16 Sep 2013, 14:18:58 UTC
Last modified: 16 Sep 2013, 14:19:25 UTC

Could Windows Server 2008 SP2 include this fix?

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1416392 - Posted: 16 Sep 2013, 14:52:20 UTC - in response to Message 1416381.  

I'd doubt it. The release date for Server 2008 SP2 was July 22nd, 2009. I think the AMD patch was a hot-fix. Not sure which one it would be. I'd have to do some research on it.

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1416400 - Posted: 16 Sep 2013, 15:06:54 UTC
Last modified: 16 Sep 2013, 15:07:58 UTC

Here we go: the hotfix was released by Microsoft in KB2645594. This seems to have been released post Server 2008 SP2 as well as post Server 2008 R2 SP1 (and their corresponding client OSes).

cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1416411 - Posted: 16 Sep 2013, 15:30:59 UTC - in response to Message 1416400.  
Last modified: 16 Sep 2013, 15:35:01 UTC

That was a different issue: the OS scheduling of threads between cores and modules. The x87 patch gets right in and fiddles with the microcode of the CPU itself. I never heard about AMD patching that; it would come in the form of a BIOS update.

I suppose they could have issued such changes directly to OEMs without announcing it but that would be unlikely.

Edit: also, in his comments, The Stilt said that it wouldn't work on Zambezi-based processors.

cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1416414 - Posted: 16 Sep 2013, 15:54:10 UTC - in response to Message 1415425.  

Josef, some compilers (e.g. gcc) can use the SSE units to process non-SIMD floating-point math. I.e., code that historically would have run on the x87 unit is run on the SIMD units in scalar mode.

With gcc you use the flag -mfpmath=sse. Is it definite that x87 code is generated and not scalar SIMD code?

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1416430 - Posted: 16 Sep 2013, 16:38:19 UTC - in response to Message 1416411.  
Last modified: 16 Sep 2013, 16:42:06 UTC

Hotfixes can be used as work-arounds to existing/known issues that aren't fixed in microcode. Are we certain that this isn't the same issue?

[Edit] Hmmm.. maybe you're right. I can't seem to confirm that the hotfix directly fixes the x87 issue.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1416449 - Posted: 16 Sep 2013, 17:09:55 UTC - in response to Message 1416414.  
Last modified: 16 Sep 2013, 17:16:42 UTC

There is one main problem with direct mapping of x87 FPU code to scalar SIMD. x87 uses 80-bit intermediates, while SSE onward observes the IEEE-754 standard at reduced (32- or 64-bit) precision. This means that without a lot of attention to the choice of algorithms, the results can turn out quite different. As a simple example, the mean of as few as 4096 floats can differ in the 2nd-3rd decimal places, amplifying problems with the use of absolute thresholds (no hysteresis) for reporting & validation.

V7 multibeam received some attention to bring this cross-platform difference into line between GPUs and CPUs, and makes direct AMD64/Intel64 (no x87 allowed) builds feasible from the stock codebase (not workable under V6). It has also meant that near-direct ports should be more portable to other devices without too much variation (e.g. recent ARM/Android).

To my knowledge AP hasn't received this attention yet, but looking at the high ratio of AP tasks to WUs waiting to purge, it would need that attention to bring the platforms closer numerically.
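
The accumulation effect described above can be reproduced without any x87 hardware. What follows is a minimal sketch, assuming made-up data (4096 copies of 1000.1) and using Python's double precision as a stand-in for the wider x87 intermediates. Python has no 80-bit type, so this is an analogy rather than a measurement of either unit. The names `to_f32`, `mean_narrow` and `mean_wide` are invented for the illustration and do not come from the SaH or AP code.

```python
import struct


def to_f32(x):
    """Round a Python float (IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mean_narrow(values):
    """Mean with the running sum rounded to single precision after every add."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return to_f32(total / len(values))


def mean_wide(values):
    """Mean of the same single-precision inputs, accumulated in double precision
    (the stand-in for a wider intermediate) and rounded once at the end."""
    total = 0.0
    for v in values:
        total += to_f32(v)
    return to_f32(total / len(values))


# Hypothetical data: 4096 identical samples, chosen only to keep the example deterministic.
values = [1000.1] * 4096
narrow = mean_narrow(values)
wide = mean_wide(values)
print(narrow, wide, abs(narrow - wide))
```

With this data the two means should disagree at around the second decimal place, which is enough to put a result on different sides of a fixed threshold depending on the platform.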
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1416461 - Posted: 16 Sep 2013, 17:29:59 UTC - in response to Message 1416449.  

Eric McIntosh of the LHC at Home Classic project has been working on issues of numeric reproducibility.

He has recently posted his first notes online: CV and Notes on Floating-Point. x87 is covered in section 3.1.1 of the final piece (The pitfalls of verifying floating-point computation, David Monniaux 2008). I haven't checked the others.

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1416464 - Posted: 16 Sep 2013, 17:36:12 UTC - in response to Message 1416414.  

I had experimented with that GCC option more than a year ago and decided to stick with the default usage of x87 for 32 bit builds, so all the AKv8c builds do use x87. But I agree it's time to recheck by making and testing some builds with -mfpmath=sse. I have a Trinity A10-4600M laptop which will be suitable for testing.
                                                                   Josef

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1416493 - Posted: 16 Sep 2013, 18:21:45 UTC - in response to Message 1416461.  

Very handy, thanks! There are amazingly few solid references in this particular area of Computer Science.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.