Unexplained cpu errors
Author | Message |
---|---|
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
Not sure why this is happening... I can't check the ones in the image because uploads aren't working, but browsing through the machine's tasks I can see errors stating: <core_client_version>6.10.56</core_client_version> <![CDATA[ <message> - exit code -12 (0xfffffff4) </message> <stderr_txt> </stderr_txt> ]]> The machine in question is http://setiathome.berkeley.edu/results.php?hostid=5424775. I'm suspending CPU calculations for now and apologise to my wingmen :-( |
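The hex value printed next to the exit code is the same negative number read as an unsigned 32-bit value (two's complement). A minimal sketch of the conversion in both directions; the helper names are made up for illustration, and the 64-bit line just shows what -177 looks like at that width.

```python
def to_unsigned_hex(code: int, bits: int) -> str:
    """Show a signed exit code as the unsigned two's-complement value of the given width."""
    # Masking wraps the negative number into the range [0, 2**bits).
    return hex(code & ((1 << bits) - 1))


def to_signed(value: int, bits: int) -> int:
    """Recover the signed exit code from its unsigned two's-complement value."""
    value &= (1 << bits) - 1
    # If the top bit is set, the original number was negative.
    return value - (1 << bits) if value >> (bits - 1) else value


print(to_unsigned_hex(-12, 32))    # 0xfffffff4
print(to_unsigned_hex(-177, 64))   # 0xffffffffffffff4f
print(to_signed(0xfffffff4, 32))   # -12
```

Python integers are unbounded, so the width has to be chosen explicitly; the same -12 would print as 0xfffffffffffffff4 at 64 bits.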
Sid Joined: 12 Jun 07 Posts: 16 Credit: 10,968,872 RAC: 0 |
Any heat issues? |
aad Joined: 3 Apr 99 Posts: 101 Credit: 204,131,099 RAC: 26 |
Have you already tried a reboot? Or perhaps you have Windows Update issues (new drivers)? |
geoff Joined: 25 Apr 00 Posts: 123 Credit: 34,100,351 RAC: 18 |
Looking at the error tasks for 5424775, they are all GPU errors, with exit statuses of -177 and -12. |
Miep Joined: 23 Jul 99 Posts: 2412 Credit: 351,996 RAC: 0 |
geoff wrote: "Looking at error tasks for 5424775 they are all GPU errors with exit status of -177 and -12" He said the tasks in question hadn't been reported yet. -177 (elapsed time too long) is a consequence of the introduction of server-side DCF (duration correction factor) corrections and can be prevented by running the new rescheduler to correct the 'maximum allowed elapsed time before abort' values. -12 errors are mostly a GPU-specific problem that is being worked on at Lunatics. Was that the complete stderr text?! It shouldn't be empty... I'd also suggest a reboot and maybe some basic system checks before setting BOINC to run a single task as a test (set No New Tasks (NNT), select all CPU tasks, suspend them, then resume one). Carola ------- I'm multilingual - I can misunderstand people in several languages! |
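Carola's single-task test is described in terms of BOINC Manager. A rough command-line equivalent, assuming the `boinccmd` tool that ships with BOINC and its `--project URL nomorework` and `--task URL task_name suspend|resume` operations, could look like this sketch. It only builds and prints the commands; the master URL is an assumption and the task names are hypothetical placeholders, so take the real ones from BOINC Manager's Projects and Tasks tabs.

```python
import shlex

# Assumed master URL for the project; verify it in BOINC Manager before use.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def single_task_test_commands(cpu_task_names: list[str]) -> list[list[str]]:
    """Build boinccmd calls: No New Tasks, suspend every CPU task, resume the first one."""
    if not cpu_task_names:
        raise ValueError("need at least one CPU task name")
    cmds = [["boinccmd", "--project", PROJECT_URL, "nomorework"]]
    for name in cpu_task_names:
        cmds.append(["boinccmd", "--task", PROJECT_URL, name, "suspend"])
    # Let exactly one task run as the test.
    cmds.append(["boinccmd", "--task", PROJECT_URL, cpu_task_names[0], "resume"])
    return cmds


if __name__ == "__main__":
    # Hypothetical task names, for illustration only.
    tasks = ["example_task_A_0", "example_task_B_1"]
    for cmd in single_task_test_commands(tasks):
        print(shlex.join(cmd))  # print only; pass cmd to subprocess.run to execute
```

Printing instead of executing keeps the sketch safe to try; once the names are real, each list can be handed to `subprocess.run` as-is.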
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
Thanks for the input. I don't think it was heat-related, as the rig is in an air-conditioned room. I have rebooted the box and will see how it goes running one task. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
@Miep (Carola): you posted just ahead of me, but you're right, most of them are -177 errors on the GPU. Name: 23my10aa.8366.12746.14.10.159_1; Workunit: 643374638; Created: 13 Aug 2010 22:49:58 UTC; Sent: 14 Aug 2010 0:33:09 UTC; Received: 16 Aug 2010 8:30:43 UTC; Server state: Over; Outcome: Client error; Client state: Compute error; Exit status: -177 (0xffffffffffffff4f); Computer ID: 5424775; Report deadline: 27 Aug 2010 18:49:49 UTC; Run time: 1,583.59; CPU time: 560.84; Validate state: Invalid; Credit: 0.00; Application version: SETI@home Enhanced Anonymous platform (NVIDIA GPU); Stderr output: <core_client_version>6.10.56</core_client_version> <![CDATA[ <message> Maximum elapsed time exceeded </message> <stderr_txt> s = 'bf54a347' 00533c54 bf54aa46 bf54adc5 bec27b55 bec29295 bec2a9d4 AK_v8b_win_SSE3_AMD!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'bf54a6c6' All my GPU errors were heat-related, so I took some serious action ;-) and laid my host with the ATI HD5770 and 4850 flat, like an 'old-fashioned PC', and it did work! I'm going to run my QX9650 host without a case, because everything got so hot that I burned my fingers taking out a hard disk! Just two cards, let alone more, produce so much heat that the 2 intake and 2 exhaust fans are simply not enough! And overclocking can be a real killer, too. These are just my observations on my own rigs, and I'm reading more and more about CUDA/CAL errors on GPUs on these forums as well. |
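The stderr above reports "Maximum elapsed time exceeded", the -177 case Carola described. As a rough sketch of why an overly optimistic speed estimate triggers it, assume the client's abort limit is the workunit's `rsc_fpops_bound` divided by the client's `flops` estimate for the app version. That formula is an assumption about BOINC's internals, not something stated in this thread; the numbers are hypothetical, and scaling the bound only approximates what the rescheduler changes.

```python
# Illustrative sketch, not BOINC's code. Assumption: the client aborts a task
# with exit status -177 once its elapsed time exceeds rsc_fpops_bound / flops,
# where flops is the client's speed estimate for the app version.
# All numbers below are hypothetical.


def max_elapsed_seconds(rsc_fpops_bound: float, flops: float) -> float:
    """Elapsed-time limit implied by the workunit's FLOP bound and the speed estimate."""
    return rsc_fpops_bound / flops


def would_abort(needed_seconds: float, rsc_fpops_bound: float, flops: float) -> bool:
    """True if a task that really needs this long would hit the limit."""
    return needed_seconds > max_elapsed_seconds(rsc_fpops_bound, flops)


if __name__ == "__main__":
    bound = 8.0e14           # hypothetical rsc_fpops_bound, in FLOPs
    flops_estimate = 1.0e12  # hypothetical estimate, too optimistic for the real card
    needed = 1500.0          # hypothetical real run time, in seconds

    print(max_elapsed_seconds(bound, flops_estimate), would_abort(needed, bound, flops_estimate))
    # -> 800.0 True

    # Rescheduler-style fix (approximation): raise the bound so the limit is generous.
    bigger_bound = bound * 10
    print(max_elapsed_seconds(bigger_bound, flops_estimate), would_abort(needed, bigger_bound, flops_estimate))
    # -> 8000.0 False
```

The point of the sketch is the ratio: if the speed estimate is ten times too high, the limit is ten times too short, so raising the bound (or lowering the estimate) by the same factor restores a sensible limit.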
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.