a few more WUs that caught my attention


Message boards : Number crunching : a few more WUs that caught my attention

N9JFE David S (Project donor)
Volunteer tester
Joined: 4 Oct 99
Posts: 11632
Credit: 14,358,347
RAC: 13,493
United States
Message 1311791 - Posted: 6 Dec 2012, 16:17:24 UTC

1122055164 has had three of us return -12 errors with MAX_TRIPLETS_ABOVE_THRESHOLD. The fourth host shows as pending, but it actually returned a -9 overflow for 31 pulses. Is this just a bad WU?

1122577134 it's minor: the -9 30 spikes came on a GTX 570 instead of the usual culprit, a 560 Ti.

1116855907 it's not the error that I noticed, it's that my successful wingman shows 11,600 seconds of CPU time but 0 seconds run time.

2653530133 had one host return -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS. I went through the stderr and counted: the 'Lunatics not bad for a human' text logo appears 50 times. What can cause this?
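
For readers wondering how the two numbers in that error relate: 0xffffffffffffff1e appears to be the same exit code, -226, viewed as an unsigned 64-bit value in two's complement. A minimal Python sketch of the conversion follows; the helper names `as_unsigned_64` and `as_signed_64` are made up for illustration and are not part of BOINC.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF  # all 64 bits set


def as_unsigned_64(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 64-bit value."""
    return code & MASK_64


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed exit code."""
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    print(hex(as_unsigned_64(-226)))         # 0xffffffffffffff1e
    print(as_signed_64(0xFFFFFFFFFFFFFF1E))  # -226
```

So the hex string carries no extra information; it is just another spelling of -226.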

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


dancer42
Volunteer tester
Joined: 2 Jun 02
Posts: 436
Credit: 1,158,909
RAC: 1,536
United States
Message 1311795 - Posted: 6 Dec 2012, 16:41:40 UTC - in response to Message 1311791.

This sounds like an out-of-range error, where a stack error caused the program to go outside its expected range and execute the wrong piece of code out of order.

This causes unpredictable errors thereafter.

Try restarting the computer; if the error is in memory, this may correct it.

If this fails, you may need to do a clean re-install.

The Lunatics logo means you are running an optimized install.

The bad thing about this is that if you need to do a clean install, unless you saved the Lunatics installer, you will lose your optimized install.

The optimized installer is currently not available due to licensing concerns.

PS: a clean install means uninstall, wipe the BOINC directory, and reinstall.


____________

Josef W. Segur (Project donor)
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4252
Credit: 1,050,380
RAC: 249
United States
Message 1311860 - Posted: 6 Dec 2012, 18:38:55 UTC - in response to Message 1311791.

1122055164 has had three of us return -12 errors with MAX_TRIPLETS_ABOVE_THRESHOLD. The fourth host shows as pending, but it actually returned a -9 overflow for 31 pulses. Is this just a bad WU?

Yes, looks like a WU with bad RFI impact. CUDA apps do all of the Triplet searches for a chirp/fft pair before Pulse searches, so the excess triplet error being seen rather than Pulse overflow isn't surprising.

1122577134 it's minor: the -9 30 spikes came on a GTX 570 instead of the usual culprit, a 560 Ti.

1116855907 it's not the error that I noticed, it's that my successful wingman shows 11,600 seconds of CPU time but 0 seconds run time.

BOINC 6.2.19 is from before GPU crunching, so it doesn't report run time. I don't know whether there's a later version of BOINC for SunOS 5.11 anywhere.

2653530133 had one host return -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS. I went through the stderr and counted: the 'Lunatics not bad for a human' text logo appears 50 times. What can cause this?

The cause of "Cuda error....: all CUDA-capable devices are busy or unavailable." could be many things. BOINC sends back only the last 64K of a stderr.txt that is larger than that, and ERR_TOO_MANY_EXITS happens after 100 early exits. Because those exits were of the temporary_exit type with a 3 minute delay, that had been happening on the host for over 5 hours. The intent is that this should be enough time for the user to notice and try to fix the problem.
Joe
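
The numbers in Joe's reply can be checked with a little arithmetic. The sketch below uses only the figures from the post (100 early exits, a 3 minute delay, a 64K stderr limit); the function names and the 1300 bytes of stderr per run are hypothetical, chosen purely for illustration. It suggests one plausible reading of why 50 logos were counted rather than 100: the earlier runs may simply have been cut off by the 64K limit.

```python
from datetime import timedelta

MAX_EARLY_EXITS = 100              # from the post: error after 100 early exits
EXIT_DELAY = timedelta(minutes=3)  # from the post: 3 minute temporary_exit delay
STDERR_LIMIT = 64 * 1024           # from the post: only the last 64K is returned


def minimum_stall(exits: int = MAX_EARLY_EXITS,
                  delay: timedelta = EXIT_DELAY) -> timedelta:
    """Lower bound on how long the host kept retrying (delays only)."""
    return exits * delay


def reported_tail(stderr: bytes, limit: int = STDERR_LIMIT) -> bytes:
    """Keep only the last `limit` bytes, as described for stderr.txt."""
    return stderr[-limit:]


if __name__ == "__main__":
    print(minimum_stall())  # 5:00:00

    # Hypothetical: each run writes a logo plus filler, 1300 bytes in total.
    one_run = b"LOGO" + b"x" * 1296
    tail = reported_tail(one_run * MAX_EARLY_EXITS)
    print(tail.count(b"LOGO"))
```

The delays alone add up to exactly 5 hours, so with any run time on top the total is "over 5 hours", as stated. With the made-up 1300 bytes per run, only 50 of the 100 logos survive in the returned tail.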

N9JFE David S (Project donor)
Volunteer tester
Joined: 4 Oct 99
Posts: 11632
Credit: 14,358,347
RAC: 13,493
United States
Message 1311919 - Posted: 6 Dec 2012, 21:00:20 UTC - in response to Message 1311860.

2653530133 had one host return -226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS. I went through the stderr and counted: the 'Lunatics not bad for a human' text logo appears 50 times. What can cause this?

The cause of "Cuda error....: all CUDA-capable devices are busy or unavailable." could be many things. BOINC sends back only the last 64K of a stderr.txt that is larger than that, and ERR_TOO_MANY_EXITS happens after 100 early exits. Because those exits were of the temporary_exit type with a 3 minute delay, that had been happening on the host for over 5 hours. The intent is that this should be enough time for the user to notice and try to fix the problem.
Joe

That intent wouldn't work for me. I generally only touch my machines once every four days, when I have to go downstairs to do laundry. Even then, I don't look *that* closely at the tasks in progress.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.



Copyright © 2014 University of California