SSE2 or SSE3?

Bert

Joined: 12 Oct 06
Posts: 84
Credit: 813,295
RAC: 0
United States
Message 747092 - Posted: 2 May 2008, 19:32:05 UTC

When Boinc starts up it identifies my processor as being SSE2. Here are the first few lines of the messages:

5/2/2008 3:15:00 PM||Starting BOINC client version 5.10.45 for windows_intelx86
5/2/2008 3:15:00 PM||log flags: task, file_xfer, sched_ops
5/2/2008 3:15:00 PM||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
5/2/2008 3:15:00 PM||Executing as a daemon
5/2/2008 3:15:00 PM||Data directory: C:\Program Files\BOINC
5/2/2008 3:15:00 PM||BOINC is running as a service and as a non-system user.
5/2/2008 3:15:00 PM||No application graphics will be available.
5/2/2008 3:15:00 PM|SETI@home|Found app_info.xml; using anonymous platform
5/2/2008 3:15:00 PM||Processor: 1 AuthenticAMD AMD Athlon(tm) 64 Processor 3500+ [x86 Family 15 Model 47 Stepping 2]
5/2/2008 3:15:00 PM||Processor features: fpu tsc pae nx sse sse2 3dnow mmx
5/2/2008 3:15:00 PM||OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00)
5/2/2008 3:15:00 PM||Memory: 2.00 GB physical, 3.85 GB virtual
5/2/2008 3:15:00 PM||Disk: 111.78 GB total, 70.64 GB free
5/2/2008 3:15:00 PM||Local time is UTC -4 hours

CPUZ identifies the processor as being SSE3 capable. The optimized application I have been using up to now is only SSE2 and I would like to use the AK application which has no version for SSE2.

Who do I believe? BOINC manager or CPUZ? I looked it up in google and I read that this CPU (Toledo) is SSE3 capable.

ID: 747092 · Report as offensive
Profile the silver surfer
Joined: 24 Feb 01
Posts: 131
Credit: 3,739,307
RAC: 0
Austria
Message 747135 - Posted: 2 May 2008, 20:38:32 UTC

I decided to give my "old" P4 "Prescott" (3.8 GHz, Hyperthreading, Intel) a try on the new app.
CPUZ says it is SSE3 capable, and so does the BOINC manager, but the new app knocked it out in seconds!
Is my little baby really too old for V8?

Edit to Bert: Please excuse me joining your thread, but I have almost the same problem!

Happy crunchin'
Kurt

ID: 747135 · Report as offensive
Bert

Joined: 12 Oct 06
Posts: 84
Credit: 813,295
RAC: 0
United States
Message 747175 - Posted: 2 May 2008, 22:11:52 UTC - in response to Message 747135.  
Last modified: 2 May 2008, 22:12:52 UTC

Your joining in is not a problem. It is an open forum after all.

I may have found the answer to my problem at the AMD site. This is what it says about the Athlon processors:

# Sixteen 128-bit SSE/SSE2/SSE3 registers
# AMD Digital Media XPress™ provides support for SSE, SSE2, SSE3 and MMX instructions

So I will assume the chip is SSE3 and try the new application.
ID: 747175 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 747219 - Posted: 3 May 2008, 1:08:07 UTC - in response to Message 747175.  

Your operating system also has to know about the extension set. The OS is in charge of context switches (when one application stops running on the CPU and another starts; this can happen several times per second). During a context switch, ALL of the registers must be saved so that the old application can get its context back when it is switched back onto the CPU.

For example: Application 1 is in the middle of a calculation and leaves the value pi (3.14159...) in a register. The OS does not know about this register. Application 2 comes along and puts the value sqrt(2) (1.414...) in the same register. Now Application 1 comes back, sees sqrt(2) instead of pi, and continues along assuming that the value is pi. (I hope that you can see where this is going.)

BOINC asks the OS what the CPU can do. This is safe: if the OS recognizes the extension, then the OS can handle the context switches that use that register set. CPUZ uses its own code to recognize the hardware. This is dangerous because the OS may NOT be able to do a context switch.
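
The pi and sqrt(2) scenario above can be mimicked with a toy model. The Python sketch below is only an illustration under stated assumptions: `ToyCPU`, `ToyOS`, `run_demo`, and the register names are hypothetical, and no real OS is implemented this way. It shows what the returning application would see if the OS saved only the registers on its own list.

```python
# Toy model of a context switch; names and registers are hypothetical.

class ToyCPU:
    def __init__(self):
        # One register the OS always knows ("eax") and one it may not ("xmm0").
        self.regs = {"eax": 0.0, "xmm0": 0.0}


class ToyOS:
    def __init__(self, known_registers):
        self.known = set(known_registers)
        self.saved = {}  # task name -> saved register values

    def switch(self, cpu, old_task, new_task):
        # Save only the registers this OS knows about.
        self.saved[old_task] = {r: cpu.regs[r] for r in self.known}
        # Restore the incoming task's registers, if it has run before.
        for reg, value in self.saved.get(new_task, {}).items():
            cpu.regs[reg] = value


def run_demo(known_registers):
    cpu = ToyCPU()
    toy_os = ToyOS(known_registers)
    cpu.regs["xmm0"] = 3.14159           # Application 1 leaves pi in xmm0
    toy_os.switch(cpu, "app1", "app2")
    cpu.regs["xmm0"] = 1.41421           # Application 2 stores sqrt(2)
    toy_os.switch(cpu, "app2", "app1")
    return cpu.regs["xmm0"]              # what Application 1 sees on return
```

With `run_demo({"eax"})` the first application comes back to 1.41421; with `run_demo({"eax", "xmm0"})` it gets 3.14159 back.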


BOINC WIKI
ID: 747219 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 747228 - Posted: 3 May 2008, 1:19:51 UTC - in response to Message 747135.  

If the OS says it is SSE3, then the OS knows how to recognize SSE3 and do context switches on that extension. This should be safe.


BOINC WIKI
ID: 747228 · Report as offensive
Profile the silver surfer
Joined: 24 Feb 01
Posts: 131
Credit: 3,739,307
RAC: 0
Austria
Message 747386 - Posted: 3 May 2008, 8:23:10 UTC - in response to Message 747228.  
Last modified: 3 May 2008, 8:24:46 UTC

Thanks for your reply, John,

Call it bad luck, Murphy's Law, kismet or whatever: the fact is, my motherboard broke the very second I started BOINC. This has nothing to do with the new app; it would have broken anyway, as the guy in the store told me this morning. Once it is repaired, I'll give it another try!

Best regards,
Kurt

ID: 747386 · Report as offensive
Bert

Joined: 12 Oct 06
Posts: 84
Credit: 813,295
RAC: 0
United States
Message 747403 - Posted: 3 May 2008, 9:12:01 UTC - in response to Message 747219.  

Thanks for the information. I understand the explanation (BS in Computer Science here) and will do some more checking before jumping in. There has to be a utility out there somewhere that will check it out.

Best regards

Adalberto
ID: 747403 · Report as offensive
Bert

Joined: 12 Oct 06
Posts: 84
Credit: 813,295
RAC: 0
United States
Message 755135 - Posted: 18 May 2008, 15:43:50 UTC - in response to Message 747219.  

I finally tried the AK application. Although BOINC still shows SSE2 and CPU-Z still shows the processor to be SSE3 capable, I guess running the optimized application is the final answer.

Good news: the AK application has been running for 3 days, and there is a very definite decrease in crunch time and no errors in any of the WUs crunched. They are also validating just fine.

I still don't see much of an increase in RAC but that takes time to stabilize.

ID: 755135 · Report as offensive
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 755152 - Posted: 18 May 2008, 16:05:53 UTC - in response to Message 755135.  
Last modified: 18 May 2008, 16:06:16 UTC

The following is taken from one of your completed work units.


Windows optimized S@H Enhanced application by Alex Kan
Version info: SSE3 (AMD/Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE3 Win32 Build 41 , Ported by : Jason G, Raistmer, JDWhale

CPUID: AMD Athlon(tm) 64 Processor 3500+
Speed: 1 x 2200 MHz
Cache: L1=64K L2=512K
Features: MMX SSE SSE2 SSE3


The line with "Version info" indicates that you are using SSE3, AKV8 so that is correct.
The "Features" line indicates that your cpu is capable of SSE3.

Looks like you got the right version of AKV8 and are using it correctly.
ID: 755152 · Report as offensive
Bert

Joined: 12 Oct 06
Posts: 84
Credit: 813,295
RAC: 0
United States
Message 755654 - Posted: 19 May 2008, 17:38:32 UTC - in response to Message 755152.  

Yes it looks that way. The problem was that although cpuz shows the processor to be SSE3 capable, Boinc reports it is SSE2 only. Here are a few lines of the messages when Boinc starts up:

5/17/2008 2:41:52 PM|SETI@home|Found app_info.xml; using anonymous platform
5/17/2008 2:41:53 PM||Processor: 1 AuthenticAMD AMD Athlon(tm) 64 Processor 3500+ [x86 Family 15 Model 47 Stepping 2]
5/17/2008 2:41:53 PM||Processor features: fpu tsc pae nx sse sse2 3dnow mmx
5/17/2008 2:41:53 PM||OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00)
5/17/2008 2:41:53 PM||Memory: 2.00 GB physical, 3.85 GB virtual
5/17/2008 2:41:53 PM||Disk: 111.78 GB total, 69.90 GB free

The line that mentions the processor features only shows sse2, and the difference triggered my original question.
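
A quick way to compare what BOINC reports against what an optimized application needs is to pull the flags out of that startup line. This is a minimal sketch in Python; `parse_features` is a hypothetical helper, not part of BOINC, and it assumes the line format shown in the messages above.

```python
def parse_features(log_line):
    """Return the set of feature flags from a 'Processor features:' line."""
    marker = "Processor features:"
    if marker not in log_line:
        return set()
    # Everything after the marker is a space-separated list of flags.
    return set(log_line.split(marker, 1)[1].split())


line = "5/17/2008 2:41:53 PM||Processor features: fpu tsc pae nx sse sse2 3dnow mmx"
features = parse_features(line)
print("sse2" in features, "sse3" in features)  # prints: True False
```

This only tells you what the BOINC client printed. It says nothing about why a flag is absent from the list.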

I am getting good results with the AK/SSE3 version now. My RAC has climbed to an unprecedented 467+. This is not too bad for a single-processor chip that is not overclocked, or at least not overclocked that I know of. ASUS boards do have a tendency to overclock without the owner trying to, or so I have read.



ID: 755654 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 755659 - Posted: 19 May 2008, 18:03:34 UTC - in response to Message 755654.  

The problem was that although cpuz shows the processor to be SSE3 capable, Boinc reports it is SSE2 only.

BOINC will only tell what options your operating system supports. It won't read them directly from the CPU or BIOS, like CPUZ does.

For instance, under Windows 2000 my P4 2.8GHz was said to use only MMX and SSE. It is SSE2 capable and I was already using the optimized Seti for that. Under Windows XP it shows the CPU is SSE2 compliant as well.

Therefore, always check with CPUZ to see what the capabilities of the CPU are.
ID: 755659 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 755702 - Posted: 19 May 2008, 19:40:30 UTC

One more time.

Using a processor extension that the Operating System does not support is a very bad idea.

The Operating System is responsible for saving the state of the processor extension when the Operating System decides to do a context switch. If you have two processes that are using the extension, you will get wrong answers. This is not a maybe, it is a will. I cannot state this strongly enough. Occasionally, you are just going to get wrong answers, and you will not be able to figure out why. Using a CPU extension not supported by the Operating System is harming the science.

When you are choosing an optimized application, use the ones that BOINC says the operating system supports, not what CPUZ says the chipset supports.


BOINC WIKI
ID: 755702 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 755793 - Posted: 19 May 2008, 22:17:40 UTC - in response to Message 755702.  

Sorry to take issue with you on this, but I feel I must.

I have been crunching SETI under BOINC for almost 12 months using XP Home SP2 on a machine that has progressed from an E6400 to a Q6600 to a Q9450. This is my main desktop machine on which I also do web browsing, eMail, spreadsheeting, Word, etc, etc.

Boinc tells me that it is currently capable of supporting:
19/05/2008 14:29:03||Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad  CPU   Q9450  @ 2.66GHz [x86 Family 6 Model 23 Stepping 7]
19/05/2008 14:29:03||Processor features: fpu tsc pae nx sse sse2 mmx

and I am sure that sse2 was the max for both of the previous processors too.

I have always run at least SSSE3 Optimised Apps on it, and have recently been trying both the latest SSSE3x and SSE4.1 Op Apps since the Q9450 supports SSE4.1 (decided that the SSSE3x is probably faster at the moment).

"you will get wrong answers" - This would cause validation failure so it is not "harming the science". I do not recall ever getting a validation failure that could not be explained. For the past 6 months I have also been logging all my WU's fairly rigorously, and, although I cannot guarantee that I have captured every one, there is not a single "Validate Error" in the log.

I am sure that, if there were a problem with the Op Apps producing "wrong answers" which failed to validate, it would have been raised as an issue in the Number Crunching forum since most posters there follow the recommendation of CPU-Z when choosing which Op App to load.

If your suggestion were widely followed, the speed of the SSE2 Op App compared to the SSSE3x Op App would mean that the project would suffer because I (and many others) could do much less work.

Could you, perhaps, publish some evidence to support your assertion that the science will be harmed?

F.
ID: 755793 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 755817 - Posted: 19 May 2008, 23:12:51 UTC - in response to Message 755793.  

If you never, ever have another application that uses the extension, and BOINC never starts a second S@H task that uses the unsupported extension while the first is running, you are correct. It will also only affect the one calculation set where the context switch occurs, possibly only one peak or triplet or such. I don't know the code quite well enough to know how large the effect is.

Something you have to ask yourself: Is faster, less reliable calculation preferred over slower more reliable?

The way a context switch works:

At the end of a timeslice as determined by the OS:

Step 1: Save all registers, including the processor extensions, from the CPU to RAM for the process that the OS is removing from the CPU. Associate this RAM with the task that is being swapped out.

Step 2: Copy all registers, including processor extensions, from RAM to the CPU for the task that is about to run.

If the OS does not know about a register, then the OS does not know to save that register to RAM. If it is not saved to RAM, then it cannot be counted on to have the same value from one OP in the code to the next.

Note that the OS can do a context switch at any moment. The only thing that is atomic (cannot be broken up) is a single assembly language OP. There are ways to make certain segments of code appear to be atomic with respect to other known programs that share the same variable space.

I am going to reiterate that running an extension that is not supported by the OS is unsafe. Telling people to use extensions that are not supported by the OS is bad advice. Again, I don't know what the range of the error is going to be, but there will be errors introduced if there are two programs that use the unsupported extension.

If you want to use a CPU extension, get an OS that knows what it is, and how to save it and reconstitute it.


BOINC WIKI
ID: 755817 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 755828 - Posted: 19 May 2008, 23:31:22 UTC - in response to Message 755817.  

If you want to use a CPU extension, get an OS that knows what it is, and how to save it and reconstitute it.

Meaning for SSE3 and SSSE3, Windows XP 32bit SP2.
Meaning for SSE4, Windows XP Professional x64 Edition and 64-bit versions of Vista.

Going to be costly. ;)
ID: 755828 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 755840 - Posted: 19 May 2008, 23:56:54 UTC - in response to Message 755828.  

But I AM using XP (Home) 32bit SP2 and Boinc says that supports only SSE2!!

F.
ID: 755840 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 755842 - Posted: 20 May 2008, 0:06:56 UTC - in response to Message 755817.  

At the end of a timeslice as determined by the OS:

Step 1: Save all registers, including the processor extensions, from the CPU to RAM for the process that the OS is removing from the CPU. Associate this RAM with the task that is being swapped out.

Step 2: Copy all registers, including processor extensions, from RAM to the CPU for the task that is about to run.

That is obviously exactly what needs to happen.

But you are making a big leap of intuition if you assume that an OS, Windows for example, achieves that total save/total restore of the registers by explicitly saving/restoring each one separately.

Research in Number Crunching has turned up "the FXSAVE and FXRSTOR instructions, which is the extended pair of instructions which can save all x87 and SSE register states all at once." - sourced to Wikipedia, so treat with the usual bargepole, but it changes the landscape quite a bit. If the OS uses a "save everything you know about" instruction, and the CPU knows about its own registers, then the problem goes away.

And another source from early 1998 says "Indeed, it's expected the instructions [FXSAVE and FXRSTOR] will be exploited by Windows 98 and Windows NT 5.0, two upcoming offerings from Microsoft Corp."

Which squares the circle, at least as far as Windows is concerned.
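
The difference between saving registers one by one and using a bulk save can be sketched with a toy model. This Python snippet is an illustration only: `save_known`, `save_all`, and the register names are hypothetical, and real FXSAVE writes a fixed 512-byte memory image rather than a dictionary. It may also help to keep in mind that SSE3, as far as the instruction set documentation goes, adds instructions that work on the same XMM registers as SSE and SSE2 rather than adding new registers.

```python
# Toy comparison of two save strategies; not real kernel code.

def save_known(cpu_regs, known):
    """OS saves registers one at a time, working from its own list."""
    return {name: cpu_regs[name] for name in known if name in cpu_regs}


def save_all(cpu_regs):
    """Bulk save in the spirit of FXSAVE: the whole register file is copied."""
    return dict(cpu_regs)


cpu_regs = {"st0": 1.0, "xmm0": 3.14159, "xmm1": 2.71828}
os_list = ["st0"]  # hypothetical OS that only lists x87 registers

partial = save_known(cpu_regs, os_list)  # misses xmm0 and xmm1
full = save_all(cpu_regs)                # captures everything
```

In this model the per-register save loses whatever is not on the OS's list, while the bulk save does not depend on that list at all.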
ID: 755842 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 755843 - Posted: 20 May 2008, 0:09:53 UTC - in response to Message 755840.  
Last modified: 20 May 2008, 0:10:33 UTC

I used Pentium's End: Intel Core Exposed for that bit of knowledge.

"EM64T/NX/SSE4. This is Intel's implementation of the AMD64 64-bit extensions for x86 processors, which makes it possible to run 64-bit software (such as Windows XP Professional x64 Edition and 64-bit versions of Vista). In theory, 64-bitness lets the system work with much bigger sets of memory at once, process data in bigger chunks, and run better in general."

And PERFORMANCE ANALYSIS OF INTEL CORE 2 DUO PROCESSOR (PDF file) for the other bit of information:

Page 12: "Other important features involve support for new SIMD instructions called Supplemental Streaming SIMD Extension 3, coupled with better power saving technologies."
Page 13: "All experiments were run on systems with 32 bit Windows XP SP2 operating system and Intel Core 2 Duo processors."

So don't look at me. ;-)
ID: 755843 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 755857 - Posted: 20 May 2008, 0:57:55 UTC - in response to Message 755843.  

If you have one process that is tied to one CPU, then you can get away with using unsupported extensions, as there will not be anything overwriting the information you need. The problem arises when there are two processes using the same unsupported extension: the processes then overwrite each other's data because the OS does not know enough to move it around so that each process has apparently uninterrupted access to the registers.



BOINC WIKI
ID: 755857 · Report as offensive
Profile popandbob
Volunteer tester

Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 755869 - Posted: 20 May 2008, 2:10:01 UTC - in response to Message 755857.  

I have been running optimized apps for sse3+ since they came out. I have not seen one invalid result other than the one in a million that reports an extra signal when it shouldn't. So the question remains... Where are these invalid results that should be here but aren't?

~BoB


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
ID: 755869 · Report as offensive


 