Computation Errors / Completed - Marked as Invalid

Message boards : Number crunching : Computation Errors / Completed - Marked as Invalid

AyalaZero
Volunteer tester
Joined: 14 Aug 05
Posts: 21
Credit: 10,910,119
RAC: 0
United States
Message 1739195 - Posted: 2 Nov 2015, 23:01:37 UTC

Afternoon peeps!! I am getting some computation errors since adding a second GPU. Is there a known cause for this error? I'm thinking maybe the clock rate is set too fast? Any thoughts/ideas?

Thanks
AyalaZero
ID: 1739195
betreger (Project Donor)
Joined: 29 Jun 99
Posts: 11360
Credit: 29,581,041
RAC: 66
United States
Message 1739197 - Posted: 2 Nov 2015, 23:05:56 UTC - in response to Message 1739195.  

If you are OCing try stock speeds and see what happens.
ID: 1739197
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1739200 - Posted: 2 Nov 2015, 23:14:13 UTC - in response to Message 1739195.  
Last modified: 2 Nov 2015, 23:19:00 UTC

I am getting some computation errors since adding a second GPU.

Most likely cause: your power supply isn't up to it.
Remove the new card, see if the errors stop.
If they do, remove the old card, replace it with the new one & see if things are still OK.
If not, it's an issue with the card, or the driver isn't suitable for the new card.
If it's OK, then add the old card back again.
If the errors re-occur, I'd blame the power supply.


As Betreger posted, if you're overclocking, set things back to stock. Could be the card not coping with the OC, could still be the PSU not being up to the extra load.



EDIT
I just looked at your system info: you have a 140W CPU and 2 x 250W GPUs.
Without overclocking, I'd want a 750W minimum PSU, and that's a quality PSU, not some cheap brand.
With just a 600W PSU there, I'd expect problems, even without overclocking. With a cheap PSU, lots of problems.
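As a rough check of those numbers, here is a minimal Python sketch that adds up the rated draws quoted above (one 140W CPU, two 250W GPUs) and compares the total with a PSU rating. The function name is made up for illustration, and the sum ignores the motherboard, drives and fans, so treat it as a lower bound rather than a sizing rule.

```python
def psu_margin(psu_watts, component_watts):
    """Return (total_load, margin) in watts; a negative margin means the PSU is undersized."""
    total = sum(component_watts)
    return total, psu_watts - total


# Rated draws quoted in the post: one 140 W CPU and two 250 W GPUs.
loads = [140, 250, 250]

for psu in (600, 750, 1000):
    total, margin = psu_margin(psu, loads)
    print(f"{psu} W PSU: load {total} W, margin {margin} W")
```

The quoted parts alone come to 640W, so a 600W unit is already 40W short at stock speeds, while 750W leaves about 110W spare.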
Grant
Darwin NT
ID: 1739200
AyalaZero
Volunteer tester
Joined: 14 Aug 05
Posts: 21
Credit: 10,910,119
RAC: 0
United States
Message 1739204 - Posted: 2 Nov 2015, 23:32:34 UTC - in response to Message 1739200.  

I have an EVGA 1000W Platinum PSU, so it's likely the overclocking. I'll set it to stock; hopefully that fixes it. (Once confirmed, I'll OC to an error-free OC speed.)

Thanks guys
AyalaZero
ID: 1739204
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1739205 - Posted: 2 Nov 2015, 23:33:28 UTC - in response to Message 1739195.  
Last modified: 2 Nov 2015, 23:36:00 UTC

Afternoon peeps!! I am getting some computation errors since adding a second GPU. Is there a known cause for this error? I'm thinking maybe the clock rate is set too fast? Any thoughts/ideas?

Thanks
AyalaZero

It seems when you went to Anonymous platform you didn't select the Correct Version of CUDA. Back when you were running Stock you were Not sent CUDA32 'cause they don't work very well on your cards.
http://setiathome.berkeley.edu/results.php?hostid=7798698&offset=2020
Try running the Lunatics installer again and select CUDA50 this time.

You also appear to have a nice stash of Ghosts there...
ID: 1739205
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1739209 - Posted: 2 Nov 2015, 23:47:30 UTC - in response to Message 1739205.  
Last modified: 2 Nov 2015, 23:48:02 UTC

It seems when you went to Anonymous platform you didn't select the Correct Version of CUDA. Back when you were running Stock you were Not sent CUDA32 'cause they don't work very well on your cards.

Ah, yeah. TBar's got it.
You need CUDA50; CUDA32 will screw the pooch seriously with any current hardware.
Grant
Darwin NT
ID: 1739209
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1739280 - Posted: 3 Nov 2015, 6:13:08 UTC

Did anyone look to see what the stderr for any of the errors was? I just looked, and all the OP's errors are "aborted by user" (140 of them), with the most recent being today at ~03:53 UTC.

(The OP is talking about the system with a pair of GTX 980 Ti?)

There is one "error while computing", stderr follows:

Stderr output

<core_client_version>7.6.9</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 194 (0xc2)
</message>
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
  Device 1: GeForce GTX 980 Ti, 4095 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 22 
     pciBusID = 3, pciSlotID = 0
  Device 2: GeForce GTX 980 Ti, 4095 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 22 
     pciBusID = 2, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
   Device 2: GeForce GTX 980 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 980 Ti
mbcuda.cfg, processpriority key detected
mbcuda.cfg, Global pfblockspersm key being used for this device
pulsefind: blocks per SM 16 
mbcuda.cfg, Global pfperiodsperlaunch key being used for this device
pulsefind: periods per launch 400 
Priority of process set to NORMAL successfully
Priority of worker thread set successfully

setiathome enhanced x41zc, Cuda 3.20

Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.422474

Kepler GPU current clockRate = 1463 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Thread call stack limit is: 1k

</stderr_txt>
]]>


I can see two things wrong:
First is the overclock: reset that to standard.
Second is the version of CUDA being used: 3.2 will lead to errors on a GTX 980 Ti. Re-install the optimised apps, this time selecting the CUDA 5.0 application.
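Both items can be read straight out of the stderr text. The following is a minimal Python sketch, not part of any SETI@home or BOINC tool: the function name is hypothetical, the patterns only match the line formats shown in the stderr above, and the 5.0 cut-off simply follows the advice in this thread.

```python
import re


def check_stderr(stderr_text, min_cuda=5.0):
    """Pull the CUDA build, compute capability and clock rate out of a task's stderr."""
    cuda = re.search(r"Cuda (\d+\.\d+)", stderr_text)
    cap = re.search(r"computeCap (\d+\.\d+)", stderr_text)
    clock = re.search(r"clockRate = (\d+) MHz", stderr_text)
    info = {
        "cuda_build": float(cuda.group(1)) if cuda else None,
        "compute_cap": float(cap.group(1)) if cap else None,
        "clock_mhz": int(clock.group(1)) if clock else None,
    }
    # Assumption: builds older than min_cuda are flagged, per the advice in this thread.
    info["cuda_too_old"] = info["cuda_build"] is not None and info["cuda_build"] < min_cuda
    return info


# Lines copied from the stderr output above.
sample = """\
  Device 2: GeForce GTX 980 Ti, 4095 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 22
setiathome enhanced x41zc, Cuda 3.20
Kepler GPU current clockRate = 1463 MHz
"""

report = check_stderr(sample)
print(report)
```

For the lines quoted above, this reports a CUDA 3.2 build on a compute capability 5.2 device running at 1463 MHz, and flags the build as too old.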

(And don't abort tasks. It doesn't help diagnose a problem with one task when those trying to help have to dig through dozens of aborted ones; a simple "suspend" will stop tasks being run while help is being sought.)
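Suspending can also be done from the command line with boinccmd, which accepts `--task URL task_name suspend` and `--project URL suspend`. Here is a minimal Python sketch that builds those commands; it assumes boinccmd is on the PATH, that the client is running locally, and that the project URL matches the one the client is attached to. The helper names and the task name are hypothetical.

```python
import subprocess

# Assumption: this must match the URL the BOINC client is attached to.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def suspend_task_cmd(task_name, project_url=PROJECT_URL):
    """Build the boinccmd argument list that suspends a single task."""
    return ["boinccmd", "--task", project_url, task_name, "suspend"]


def suspend_project_cmd(project_url=PROJECT_URL):
    """Build the boinccmd argument list that suspends all work for a project."""
    return ["boinccmd", "--project", project_url, "suspend"]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical task name, for illustration only.
run(suspend_task_cmd("example_task_name_0"))
run(suspend_project_cmd())
```

By default this only prints the commands; pass `dry_run=False` to actually run them. Replacing `suspend` with `resume` restarts the work once the problem is sorted.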

The OP's other system only has an Intel GPU which is dropping errors.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1739280
AyalaZero
Volunteer tester
Joined: 14 Aug 05
Posts: 21
Credit: 10,910,119
RAC: 0
United States
Message 1739742 - Posted: 4 Nov 2015, 22:59:55 UTC

http://setiathome.berkeley.edu/results.php?hostid=7798698&offset=80&show_names=0&state=5&appid=
Seems like a lot of peeps are getting tasks marked as "Completed - marked as invalid". Maybe it's a bad batch of WUs? If you look at your results, I'm sure we'll all have some recent invalid tasks. Hopefully they fix this soon.

AyalaZero
ID: 1739742
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1739746 - Posted: 4 Nov 2015, 23:09:16 UTC - in response to Message 1739742.  

http://setiathome.berkeley.edu/results.php?hostid=7798698&offset=80&show_names=0&state=5&appid=
Seems like a lot of peeps are getting tasks marked as "Completed - marked as invalid". Maybe it's a bad batch of WUs?

Everyone else's invalid WUs started after the weekly outage (Tues, 3 Nov 2015). There were some changes to the splitter code, and they didn't work as intended.
They're working on it.
Grant
Darwin NT
ID: 1739746
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1739748 - Posted: 4 Nov 2015, 23:28:03 UTC

Yep lots of invalids here this morning, along with some errors (some I even had to abort as they were going nowhere), but I'm also bogged down with backup work being downloaded. :-(

Oh well, it can only get better.

Cheers.
ID: 1739748
GTP
Joined: 5 Jul 99
Posts: 67
Credit: 137,504,906
RAC: 0
United States
Message 1739813 - Posted: 5 Nov 2015, 4:55:19 UTC - in response to Message 1739748.  

Ditto here, nearly 200 jobs marked invalid with CPU times of 30-50 sec.

Example: http://setiathome.berkeley.edu/workunit.php?wuid=1954401815

All the best,
Aaron Lephart

TechVelocity.com
ID: 1739813
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1739819 - Posted: 5 Nov 2015, 5:51:46 UTC

There was a problem with the servers:

http://setiathome.berkeley.edu/forum_thread.php?id=78297&postid=1739673
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1739819
Francis Noel
Joined: 30 Aug 05
Posts: 452
Credit: 142,832,523
RAC: 94
Canada
Message 1739886 - Posted: 5 Nov 2015, 13:19:38 UTC - in response to Message 1739819.  

There was a problem with the servers:

http://setiathome.berkeley.edu/forum_thread.php?id=78297&postid=1739673



Ah, thank you for the heads-up, rob. I had missed that.
This explains why I woke up to a few hundred invalids this morning from all hosts, GPU types and CUDA versions.
mambo
ID: 1739886

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.