Errors on Cuda Units with new server build

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1063407 - Posted: 4 Jan 2011, 13:21:53 UTC
Last modified: 4 Jan 2011, 14:00:26 UTC

Well I retired the old server and put in some new parts that were left over from my previous main rig.

New Rig Details

However, I'm having some weird errors with my CUDA work units that I can't seem to figure out. It crunched fine for a few hours (the better part of half a day), then it started throwing errors.

At first I thought it may have been due to the GPU overclock I put on one card to bring it up to par with the other card. It may be, but these are the errors I'm receiving:

- exit code -6 (0xfffffffa)
Incorrect function. (0x1) - exit code 1 (0x1)
1 (0x1) "Freaky power spectrum"
etc.

I've got a mix of these for 7 work units. Does anyone have a clue on this one? I've had the power spectrum ones in the past on every machine I've run, and figure it's just a weird deal between the Lunatics apps and something in the WU. Who knows. The other errors I figured may be heat related due to the overclock, and two of them happened at the exact same time as a BSOD. Now I'm beginning to wonder if it's a power supply issue from running two video cards.

Going by the calculators on the internet and by looking up load numbers for all the parts, I would need about 471 watts to power everything, and I'm only using a 580 watt power supply. What do you guys think, maybe under excessive load it's just too much? Or do I have another underlying issue here?

Brand new install of Server 2008 R2, good RAM and drives, good video cards, and good drivers from what I can tell. I'm looking to validate what I'm thinking, because I figured the PSU in this thing wasn't going to be enough. It has dual 36A 12V rails.
Traveling through space at ~67,000mph!

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1063418 - Posted: 4 Jan 2011, 14:09:06 UTC - in response to Message 1063407.  
Last modified: 4 Jan 2011, 14:26:48 UTC

...- exit code -6 (0xfffffffa)
Incorrect function. (0x1) - exit code 1 (0x1)
1 (0x1) "Freaky power spectrum"
etc.


The exit code 1s usually indicate there was a driver crash at some point prior to the indicated failure point, usually resulting from a long-running pulsefind just before an FFT, or appearing at a memory copy. In some cases only a machine reboot can fix the driver (XP), but Server 2008 (and Vista and Win7), assuming it uses WDDM drivers, should be able to restart the driver.

Unfortunately nVidia didn't put any way to detect this condition and recover the Cuda context into the Seti code, so that it could continue normally where possible. That recovery/fail-safe mechanism is something that is being looked at in detail in the coming months. The original driver crashes (likely in the pulsefinds preceding the crash indicators) could have been the kernels running long, or indeed power dropouts losing the device, heat issues, etc.

I don't recall what the -6 was exactly, but probably the same thing in a different form.

The funny part is that when you mentioned the 'freaky powerspectrum' crashing, my heart skipped a beat, because I wrote those kernels... They are tough as old boots and don't do anything 'risky' that could induce a crash themselves ;) I was relieved to see the example indicates 'cufftExecC2C', which is not one of mine but the CUFFT library, and it probably crashed once again from the pulsefind immediately preceding it.

If you use that card as your primary display, you may need to look at disabling the Windows TDR:

http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
http://forums.amd.com/devforum/messageview.cfm?FTVAR_FORUMVIEWTMP=Linear&catid=328&threadid=100142

HTH,
Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1063450 - Posted: 4 Jan 2011, 15:47:45 UTC

Thanks for the help, Jason. After posting that, I took the overclock off. The machine is a mix of a GTS 250 1GB and an 8800GTS 640MB, and it seems that mix of cards doesn't like having one OC'd and one not. I'm still not totally satisfied that I don't have a power issue; however, it's been crunching along happily ever since. The temps on the cards are fine, as both are running sub 60C, and the BSOD minidump debugged out to a driver crash. So it seems the OC didn't agree with the old 8800. I'm going to continue looking into the issue and find out for sure what's going on, but it seems to have resolved itself. I'll check into the TDR, as I didn't even think about it when I built this machine! Thanks again.
Traveling through space at ~67,000mph!

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1063660 - Posted: 5 Jan 2011, 14:45:21 UTC

Man, I thought this was resolved, but it appears today the issue is back, and now I'm getting -177 errors with the information listing "Unhandled Exception Detected...". I tried going back to crunching on only one GPU and the problem is still there. Really getting irritated, especially considering my other machine never gave any issues. Going to start running a memtest to see if it's the RAM.
Traveling through space at ~67,000mph!

SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6653
Credit: 121,090,076
RAC: 0
United States
Message 1063663 - Posted: 5 Jan 2011, 14:51:16 UTC - in response to Message 1063660.  

Man, I thought this was resolved, but it appears today the issue is back, and now I'm getting -177 errors with the information listing "Unhandled Exception Detected...". I tried going back to crunching on only one GPU and the problem is still there. Really getting irritated, especially considering my other machine never gave any issues. Going to start running a memtest to see if it's the RAM.


-177 errors can be fixed with Fred's Rescheduler tool. There is a checkbox that can fix them.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1063664 - Posted: 5 Jan 2011, 14:52:42 UTC - in response to Message 1063663.  
Last modified: 5 Jan 2011, 15:27:32 UTC

Yeah I've used the rescheduler for a bit now. But it isn't fixing these issues. I'm getting all kinds of errors. In the last 2 days or so I've returned 8-10 bad units. I'm still thinking it's a power supply issue but I'm not sure.

*Edit*
I apologize, Steve, it isn't checkmarked on my server machine. Not sure if I want to stop the mem check and start crunching again to find out or not.

*Update on some research*

-177 - Still no clue. Normally caused by the GPU trying to process CPU tasks?

1 (0x1) error (Incorrect function) - Out-of-date drivers are said to be a possible cause, but I'm running 260.99 from Nvidia. Verified both cards were using it as well.

-1073741819 (0xffffffffc0000005) / Access Violation (0xc0000005) - Cannot find any information about this error. This one happened directly after I tried OCing my 8800.

-6 (0xfffffffffffffffa) (Bad Work Unit Header) - This is said to be mainly caused by something on the Seti@Home server side or by issues during transfer. I don't think this was caused by my computer, but I could be wrong. I think this one also may have come after my OC attempts. Not sure though.

Seems most of my errored work units have been the -177 and 1 (0x1) items. This is really smelling of an underpowered set of video cards to me.
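
A side note on the codes listed above: the negative decimal and the hex form are the same number, read once as signed and once as unsigned. Here is a minimal Python sketch of the conversion, assuming 32-bit exit codes; the helper names are made up for illustration and do not come from BOINC.

```python
MASK32 = 0xFFFFFFFF


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    return code & MASK32


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= MASK32
    # If the top bit is set, the value is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


# Exit codes mentioned in this thread.
for code in (-6, 1, -177, -1073741819):
    print(f"{code:>12} -> 0x{to_unsigned32(code):08x}")
```

Read this way, -1073741819 and 0xc0000005 are one and the same error, which is why the task page shows both.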

E8400 @ 3GHz (no OC)
4GB Mushkin Blackline DDR2 800
GTS 250 1GB
8800 GTS 640MB
Assorted hard disks x 4
580 watt PSU

I still don't think the PSU is enough, but like I said I want to try any available routes before I have to spend $100+ on a 750-850 watt unit. And if you were buying a power supply for this machine, what size would you shoot for? I'm thinking a 750 would be enough, but I've never run dual cards. I'm figuring, on a rough estimate, ~200 watts for motherboard, RAM, processor and optical drive, about 170 each for the video cards, and 80 watts or less for the drives. That would put me at about 450. Like I said, I've got a 580 supply, but I'm thinking under load it may be spiking too high for the PSU and causing one of the cards to error out.
Traveling through space at ~67,000mph!

Tim Norton
Volunteer tester
Joined: 2 Jun 99
Posts: 835
Credit: 33,540,164
RAC: 0
United Kingdom
Message 1063680 - Posted: 5 Jan 2011, 16:31:40 UTC - in response to Message 1063664.  

BeNt

a few ideas/suggestions for you

You can fix the -177 errors. As Steve suggested, Fred's tool will fix these for you even if you do not actually reschedule any WUs to or from the GPU: just have the option checked and run the app. It actually tells you if you need to reschedule (test) based on the options selected.

For info: having checked some of your errored WUs, the -177s are "Maximum elapsed time exceeded", i.e. simply taking too long to crunch. I think this is because you have quite different cards paired together; the 8800 takes twice as long as the 250 to crunch a WU. Buried in the client_state file are an estimate of how long each WU should take, made by the server, and a limit at which crunching will stop (so a WU does not crunch forever), which is ten times the estimate.

But as this is a new machine, and I guess you have no flops estimates in your app_info file, the server's initial estimates are inaccurate. In your case the server gave you an estimate that was too short, and this is compounded by the difference between the cards. The flops estimates are not going to work that well with the imbalance in crunching speed between the GPUs anyway, so for now I would leave them for another time.

But Fred's tool will fix this, as it puts in an estimate way bigger than the original and hence prevents the "time out".
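
To make the time-out described above concrete, here is a rough Python sketch. It assumes the limit is simply ten times the estimate, as stated in this post. The function names and the sample run times are invented for illustration and are not taken from the BOINC client or from the actual work units.

```python
LIMIT_FACTOR = 10  # "ten times the estimate", per the post above


def elapsed_limit(estimate_s: float, factor: float = LIMIT_FACTOR) -> float:
    """Longest run time allowed before the task is aborted."""
    return estimate_s * factor


def would_time_out(estimate_s: float, elapsed_s: float) -> bool:
    """True if a task running for elapsed_s would hit the limit."""
    return elapsed_s > elapsed_limit(estimate_s)


# Hypothetical numbers: a server estimate that is far too short,
# and one card that needs twice as long as the other.
estimate = 120.0
run_times = {"faster card": 900.0, "slower card": 1800.0}

for card, seconds in run_times.items():
    verdict = "times out" if would_time_out(estimate, seconds) else "finishes"
    print(f"{card}: {seconds:.0f}s against a {elapsed_limit(estimate):.0f}s limit, {verdict}")
```

With an estimate that low, only the slower card crosses the limit, which fits the idea that the mismatched pair makes the problem worse. Raising the estimate, as the rescheduler is described as doing, raises the limit with it.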

you can find some answers to your error codes at the BoincFaQ

http://boincfaq.mundayweb.com/index.php?language=1

For sanity I would just get the server stable with the 250 card first and let it run for a few days, to check that some or all of the errors you are getting do not come back. (This will also help with the time estimate problem, as the server will get a better idea of what a correct time estimate is.) Then swap the cards and again leave it a couple of days.

If things go OK, pop both cards back in and see if the problems come back. If they do, then a first step would be to try a new PSU. I guess the existing one is not new, and older PSUs are not as efficient as newer models. It may also be that it has enough power overall but you are overloading one of the rails, or it's near the edge and running two cards at once is just too much. Also make sure it's getting good airflow, as they get quite hot when running near capacity: efficiency drops, less power goes to the GPU, oops, failure, etc.

My bet is it's power related, but time will tell.
Tim


Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1063687 - Posted: 5 Jan 2011, 16:43:15 UTC - in response to Message 1063664.  

I'm figuring, on a rough estimate, ~200 watts for motherboard, RAM, processor and optical drive, about 170 each for the video cards, and 80 watts or less for the drives. That would put me at about 450. Like I said, I've got a 580 supply, but I'm thinking under load it may be spiking too high for the PSU and causing one of the cards to error out.


200 + 2 * 170 + 80 = 620, not 450.

So it looks like you ARE overloading your PSU.

And even if 450 were correct, given that the PSU is < 80% efficient, it would draw at least 450/.80 = 562.5 watts, so YES you need a bigger PSU.
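
The sums above are easy to check in a few lines of Python. The component figures are the rough estimates quoted from BeNt's post, not measurements, and the variable names are only for illustration.

```python
# Rough per-component estimates quoted in the thread, in watts.
estimates_w = {
    "motherboard, RAM, CPU, optical drive": 200,
    "video card 1": 170,
    "video card 2": 170,
    "drives": 80,
}
psu_rating_w = 580

total_w = sum(estimates_w.values())
headroom_w = psu_rating_w - total_w

print(f"estimated load: {total_w} W")
print(f"headroom on a {psu_rating_w} W supply: {headroom_w} W")

# The second figure from the post: 450 W divided by 80% efficiency.
draw_at_80_percent_w = 450 / 0.80
print(f"450 W at 80% efficiency: {draw_at_80_percent_w:.1f} W")
```

A negative headroom means the estimated load is already above the rating on the label.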


Area 51
Joined: 31 Jan 04
Posts: 965
Credit: 42,193,520
RAC: 0
United Kingdom
Message 1063704 - Posted: 5 Jan 2011, 17:42:08 UTC - in response to Message 1063664.  

Thermaltake has a PSU sizing tool on their website:

http://www.thermaltake.outervision.com/

Never used it before, but it may be of some use to you......

soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1063746 - Posted: 5 Jan 2011, 20:44:40 UTC - in response to Message 1063704.  

possibly unrelated, but my 9600GT is occasionally generating 0x1 errors like the following:

Stderr output
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce 9600 GT, 499 MiB, regsPerBlock 8192
computeCap 1.1, multiProcs 8
clockRate = 1625000
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce 9600 GT is okay
SETI@home using CUDA accelerated device GeForce 9600 GT
Priority of process raised successfully
Priority of worker thread raised successfully
size 8 fft, is a freaky powerspectrum
size 16 fft, is a cufft plan
size 32 fft, is a cufft plan
size 64 fft, is a cufft plan
size 128 fft, is a cufft plan
size 256 fft, is a freaky powerspectrum
size 512 fft, is a freaky powerspectrum
size 1024 fft, is a freaky powerspectrum
size 2048 fft, is a cufft plan
size 4096 fft, is a cufft plan
size 8192 fft, is a cufft plan
size 16384 fft, is a cufft plan
size 32768 fft, is a cufft plan
size 65536 fft, is a cufft plan
size 131072 fft, is a cufft plan

) _ _ _)_ o _ _
(__ (_( ) ) (_( (_ ( (_ (
not bad for a human... _)

Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is : 0.420956
Cuda error 'cufftExecC2C' in file 'd:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 102 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'd:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'd:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.

</stderr_txt>
]]>
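
When reading a stderr dump like the one above, it helps to pull out the first 'Cuda error' line, since that is where the failure first surfaced and the later lines may only be follow-on errors. Here is a small Python sketch that extracts the failing call, source file and line number. The regular expression is a guess at the format based on this one log, not something taken from the application source, and the function name is made up.

```python
import re

# Pattern inferred from the log lines in this thread (an assumption).
CUDA_ERROR = re.compile(
    r"Cuda error '(?P<call>.+?)' in file '(?P<path>.+?)' "
    r"in line (?P<line>\d+) : (?P<msg>.*)"
)

# Two lines copied from the stderr output above.
SAMPLE = """\
Cuda error 'cufftExecC2C' in file 'd:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 102 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'd:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
"""


def parse_cuda_errors(text):
    """Return (call, source file name, line number) for each error line."""
    errors = []
    for match in CUDA_ERROR.finditer(text):
        source_file = match.group("path").rsplit("/", 1)[-1]
        errors.append((match.group("call"), source_file, int(match.group("line"))))
    return errors


errors = parse_cuda_errors(SAMPLE)
first_call, first_file, first_line = errors[0]
print(f"first failure: {first_call} ({first_file}, line {first_line})")
```

Here the first failing call is cufftExecC2C, the same call jason_gee pointed to earlier in the thread.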

Janice

SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6653
Credit: 121,090,076
RAC: 0
United States
Message 1063750 - Posted: 5 Jan 2011, 20:50:54 UTC - in response to Message 1063746.  

@ S^S,
This may sound strange, but I used to get those errors at a rate of 2 or 3 a day. That can be a memory error, so I was reluctant to overclock my GPU's memory. Finally I just did it, and all the 1 errors went away. Yours may have a different cause, but I am mentioning it as it was just strange.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1063768 - Posted: 5 Jan 2011, 22:10:54 UTC - in response to Message 1063750.  

I am getting about one every 2-3 days out of the GT 9600. Honestly the card is most likely approaching the end of its useful life cycle, as well as the AMD.

I will try a blowout and reseat soon, or if it gets much worse. Beyond that...
Well it might be almost time to start collecting parts for my next system. I do need two computers.
Janice

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1063823 - Posted: 6 Jan 2011, 1:38:00 UTC

Awesome, thanks for the replies guys! Sorry it took me a while to get back; I've got a lot on the bench today. As far as testing each card, I know they are both good because the 250 has been crunching for the last few months without issue, and the 8800 is what it replaced and crunched in the past. I did however take the 8800 out this morning and all the errors seem to have gone away.

Jravin, wow, I can't believe I missed the x2 calculation on the cards! And the 80% efficiency! It's an SLI power supply that's about 2 years old, so I know it has some capacitor aging, but it is 80 Plus Bronze certified. With everything you have brought up considered, I bet under load the system is outstripping my 580 watt PSU and causing issues with the second card! Now things are becoming a bit clearer.

I have taken the time to properly set up Fred's tool on my server and will re-enable the second card after a proper testing period. It seems the last errored unit was at 9:45 UTC today. I think that's about the time I took the other card offline.

As far as mismatching the cards etc. they aren't so different especially considering they are both G92 series cards. The only real difference is one has more memory and a higher clock speed, along with a die shrink so it's using less power, merely a refresh not actually a new architecture(it's a 9800GTX rebranded). But it could be the issue, I'm totally not sure at this time. I assumed with two different cards the time was calculated dependent of the other card? Is there anything besides the rescheduler fix that will fix this without needing to keep the rescheduler working? Also how often should I tell the scheduler to check everything? Right now I have it setup for every 2 hours but should it be sooner?

Thanks for all the tips, suggestions, and ideas guys, I really appreciate it. I've just never had issues like this with my crunchers, and the only thing I have never done before is add a second video card into the mix. When it comes to power supply size, what do you think would be reasonable for a dual video card machine? I'm thinking at minimum, with both cards crunching along with the CPU, I would probably need a 750 at the lowest and an 850 at the highest? As always I appreciate your input on helping me figure this out.
Traveling through space at ~67,000mph!

SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6653
Credit: 121,090,076
RAC: 0
United States
Message 1063825 - Posted: 6 Jan 2011, 1:51:22 UTC - in response to Message 1063823.  

I'm thinking at minimum, with both cards crunching along with the CPU, I would probably need a 750 at the lowest and an 850 at the highest? As always I appreciate your input on helping me figure this out.


Think more than you need at the moment. Overkill will last longer in the long run. When I used to repair electronics, I would always replace defective components with ones stronger than the original. That way, the problems I fixed, were less likely to return.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1063839 - Posted: 6 Jan 2011, 2:12:31 UTC - in response to Message 1063823.  

My first thought was a bare minimum of 750W and I'm not all that sure so if you can go with higher I would. As for the reschedule tool, I would run it at least until the card levels out. It might be good to keep on running it as there are still quite a few of the old unmarked VLARs out there. I've got mine set for every 6 hours but it would depend on how you have your cache set up and how fast you get to new work in line. If we have another three plus day outage I should still have enough in reserve that any new work sent out would be checked by the reschedule tool before I got to them.


PROUD MEMBER OF Team Starfire World BOINC

bill
Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1063841 - Posted: 6 Jan 2011, 2:22:18 UTC - in response to Message 1063823.  

Both of these have given me good service:

http://www.newegg.com/Product/Product.aspx?Item=N82E16817116012

NZXT HALE90-850-M 850W ATX 12V v2.2, EPS 12V v2.91 80 PLUS GOLD Certified Modular Active PFC Power Supply


http://www.newegg.com/Product/Product.aspx?Item=N82E16817151100

Seasonic SS-850HT 850W ATX12V v2.31,EPS12V v2.92 80Plus Silver Certified, Active PFC Power Supply - OEM

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1063851 - Posted: 6 Jan 2011, 3:13:16 UTC
Last modified: 6 Jan 2011, 3:33:29 UTC

Steve, I generally do the same thing on my main rig. But the server is sort of an afterthought machine, built from unused parts left over from previous builds as I grow into a new gaming machine, etc. It's merely a place I use as a cruncher and file storage machine. Occasionally I use it for video transcoding etc., but nothing super important. Hence, if I have to buy parts for it, I like to stay as cheap as possible. Before the 580 went in, it ran off an Antec Neo 430 watt for years. When I got the 8800 it needed a bit more power than my 7800 did, so I went to the 580. My 480 machine is running on a 750 watt PC&C supply, which I love. So needless to say, I have never bought a single part for the server. It's always been a rehash of older hardware that all worked together, until now with the dual video cards.

So now the debate for me looms. Upgrade my gaming machine to a better supply and put the 750 in the server, or simply buy a 750+ psu for the server. I don't anticipate putting any additional hardware in it any time soon so I just want what I need at the moment as money is also a constraint, especially when you start talking about a psu bigger than 850 watts. I guess bottom line is I may end up having to retire the 8800 from service(trusty as it's always been). Guess it's time to get with the better half and work out a deal. ;)

As far as what I'm looking at are these:
Corsair 850TX 80+ Silver - $129.99 5 Year warranty
Seasonic SS-850HT 80+ Silver - $119.99 3 year warranty
What I would get but probably above my price range at this time:
PC&C Silencer 950 80+ Silver - $189.99 7 year warranty

Both the Corsair and Seasonic seem to have a single 12V (70A) rail, which is really good, but the PC&C has an 83.4A single rail, not to mention 88% efficiency at full load! Between them it's hard to pick. I know bill gave the Seasonic a thumbs up; does anyone else have an opinion on these, or another supply to suggest?
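
For comparing those rail figures, the current rating can be turned into watts with P = V x I. Here is a minimal Python sketch using the amp figures quoted above; the function name is made up, and the result is only the ceiling for the 12V rail, not what the whole supply can deliver.

```python
def rail_watts(amps: float, volts: float = 12.0) -> float:
    """Maximum power a rail can deliver at its rated current (P = V * I)."""
    return volts * amps


# Single 12V rail ratings as quoted in the post above.
rails_a = {
    "Corsair 850TX / Seasonic SS-850HT": 70.0,
    "PC&C Silencer 950": 83.4,
}

for name, amps in rails_a.items():
    print(f"{name}: {amps} A x 12 V = {rail_watts(amps):.1f} W")
```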

I generally stay in the Antec / PC Power & Cooling circle of things, but this is a budget limited fix(read not much per the boss), so going with a $200+ supply is out of the question or I would go with a much larger supply. I love my PC&C 750 and can't help but wonder if it would power the server because I can get a new one of those for $129.99.

Obviously I may be hoping beyond my means, but I wish I could run this setup off a 750 psu. All the calculators online put me either at 470 watts or ~700 watts for my setup so I'm lost on the dual card psu debate.

*Edit*
After a bit of reading I'm really leaning towards the Seasonic. I have found out that PC&C outsource the production of their units to Seasonic as their OEM. I never knew that! Apparently they design the supplies and send the build order to them. So if they trust them I'm sure I can possibly.

*Edit #2*
Scratch PC&C off my plate from this point on, at least their MKII line. OCZ (they own PC&C now) is outsourcing that work to Sirfa, who make pretty much all the woefully mediocre power supplies ever built. There are even reports of hand-soldered capacitors on the end of the circuit boards inside to keep 12V ripple in check. Blast, way to ruin a good name.....
Traveling through space at ~67,000mph!

Tim Norton
Volunteer tester
Joined: 2 Jun 99
Posts: 835
Credit: 33,540,164
RAC: 0
United Kingdom
Message 1063852 - Posted: 6 Jan 2011, 3:38:15 UTC - in response to Message 1063851.  

My vote would be the Corsair, as it's only $10 more and you get 2 years extra warranty.

Also it's SLI certified, so designed to run with two cards (4 PCIe connectors).

As your machine boots up now and runs with two cards, even though crunching is error prone, I suggest you are not that far below what you need to run your setup.

You would be giving the system another 270 watts of capacity to work with, which is almost enough to run the cards on their own.

You might get away with a 750 but, as Steve suggested, I would play safe and go higher than you think, to give you a margin, and the PSU will run cooler and more efficiently.

Good luck with the boss OK'ing the spend :)

PS: Just as an aside, you could use the 430 watt alongside the 580 watt with a bit of research.
Tim


SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6653
Credit: 121,090,076
RAC: 0
United States
Message 1063855 - Posted: 6 Jan 2011, 3:46:31 UTC
Last modified: 6 Jan 2011, 3:47:48 UTC

Just for the record.

Piggy:
1250 watt BFG PSU
2200 VA APC Smart UPS
30 Amp twist lock direct line from the mains, 10 Gauge
Overkill.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1063962 - Posted: 6 Jan 2011, 14:16:14 UTC
Last modified: 6 Jan 2011, 14:16:54 UTC

Since 9:45 UTC yesterday (3:45 local) I haven't had any errors. One thing I am noticing happening now however is the gpu will have a work unit that just stops. If I suspend it and it loads a different one in it crunches away happily. One more issue to figure out. But this could also be from the power supply, who knows.

I've decided to go after the Corsair 850TX, seems right now it's the best bang for the buck and it's only going to run about ~$130, and it seems after rebate could be $109. So I need to get that ordered to see if it really cures my problem. Thanks for all the help guys getting this worked out, and especially for the suggestions and comments!

BTW Steve, really? 10 gauge, 30 amp twist lock lines from your mains, really? You don't happen to have a time machine stuffed in there somewhere, do you? Because you have to be producing at least 1.21 jiggawatts with that kind of power. Insane, but I can't bash; if I had some extra funding I would probably build a computer lab onto my house, lol.
Traveling through space at ~67,000mph!