Astropulse Errors-Optimized version 5


AuthorMessage
Profile Blurf
Volunteer tester

Send message
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 856177 - Posted: 21 Jan 2009, 21:49:17 UTC - in response to Message 856174.  

@Blurf
Please, could you change first post in thread.
"Please, upgrade your optimized AP version to ap_5.00r69 !" should be changed to "Please, upgrade your optimized AP version to ap_5.00r103 !"


Done.


ID: 856177 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Avatar

Send message
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 857490 - Posted: 25 Jan 2009, 4:36:12 UTC - in response to Message 855368.  

I'd keep an eye on it, to see that it validates & gets credit, but most likely nothing to worry about.

[Edit: I'll place an additional flush after the incomplete text block is printed, to ensure the text is written to disk as quickly as possible, reducing the chance of this kind of event.]

[Later:] Hmm, looks like it *could possibly* also be the source of the 'exited with no finished file' warnings. Adding more paranoid code in next release.

wuid=390473712 did validate in the end. Using r103 now.
Thanks for the info.
ID: 857490 · Report as offensive
Stick Project Donor
Volunteer tester

Send message
Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 859668 - Posted: 30 Jan 2009, 9:14:09 UTC
Last modified: 30 Jan 2009, 9:19:26 UTC

My first rev 103 result was a success and validated. But I noticed its claimed credit was about 30 points higher than its wingman's rev 69 result.
ID: 859668 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Avatar

Send message
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 859758 - Posted: 30 Jan 2009, 15:19:27 UTC - in response to Message 859668.  
Last modified: 30 Jan 2009, 15:21:38 UTC

application Astropulse
created 28 Dec 2008 10:36:00 UTC
name ap_15no08ae_B5_P0_00253_20081228_01028.wu
minimum quorum 2
initial replication 4
max # of error/total/success tasks 5, 10, 10
Task ID | Computer | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
1105979118 | 4730870 | 28 Dec 2008 10:37:11 UTC | 4 Jan 2009 8:25:37 UTC | Over | Client error | Compute error | 238,370.00 | 142.54 | ---
1105979119 | 4677151 | 28 Dec 2008 10:37:10 UTC | 3 Jan 2009 1:41:31 UTC | Over | Success | Done | 62,636.84 | 213.12 | 0.00
1112661132 | 1683125 | 4 Jan 2009 8:25:42 UTC | 25 Jan 2009 15:17:26 UTC | Over | Success | Done | 1,813,830.00 | 703.68 | 0.00
1134159972 | 4070955 | 25 Jan 2009 15:17:39 UTC | 26 Jan 2009 20:35:27 UTC | Over | Success | Done | 44,771.58 | 782.78 | 0.00
1135724185 | 4264561 | 26 Jan 2009 20:35:43 UTC | 25 Feb 2009 20:35:43 UTC | In progress | --- | New | --- | --- | ---


Is this one of the reasons AP validators are turned off?
As well as the AP splitters, because this seems like a waste of time, IMHO.
Hope, it will validate after all ;)
ID: 859758 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 859787 - Posted: 30 Jan 2009, 17:04:30 UTC
Last modified: 30 Jan 2009, 17:07:59 UTC

You'll notice that newly issued AstroPulse tasks claim more credit than they used to. This is due to the rising server-side credit multiplier. In all cases where I've looked at it, it was either a recently issued task, or a resend, where the wingmen were issued some months ago, failed to complete successfully and were subsequently reissued.

@Fred, The first three wingmen clearly choked on that task, for whatever reasons. Hopefully your new wingman will be in with an accurate V5 result.

As to the state of the AP pipeline, probably some work is underway related to testing / implementing the new, substantially different, 5.01 AstroPulse application. I've heard whispers that it, when it is ready to come to main, may be treated as a new application name. This would avoid many of the cross validation difficulties experienced with the 4.35->5 transition. There could be other reasons for things being turned off, but I would imagine the reconfiguration involved might take a bit of effort.

[Note: 5.01 AP tasks are also intrinsically larger in terms of amount of processing, so the credit claim will likely rise further for those (when they appear here) ]

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 859787 · Report as offensive
Stick Project Donor
Volunteer tester

Send message
Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 859856 - Posted: 30 Jan 2009, 22:04:23 UTC - in response to Message 859787.  
Last modified: 30 Jan 2009, 22:12:18 UTC

Jason,

Thank you! I didn't realize the credit adjustment was also applied to reissues. (And, obviously, I also have a tendency to jump to the wrong conclusion.)

Stick

You'll notice that newly issued AstroPulse tasks claim more credit than they used to. This is due to the rising server-side credit multiplier. In all cases where I've looked at it, it was either a recently issued task, or a resend, where the wingmen were issued some months ago, failed to complete successfully and were subsequently reissued.

ID: 859856 · Report as offensive
Profile KenZaske

Send message
Joined: 12 Oct 04
Posts: 7
Credit: 456,380
RAC: 0
United States
Message 865824 - Posted: 15 Feb 2009, 18:33:28 UTC

Hello everyone;

I have been using the optimized AP version 5.00r69 for quite a while. Suddenly, almost every work unit I have done this month has gotten a zero score. Dozens of them report a “client error!” See: http://setiathome.berkeley.edu/results.php?hostid=4752984&offset=40 for more details. I went back to the default client, but I want to use the optimized client. Any idea what happened, or which optimized client still works?

Ken

ID: 865824 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 865839 - Posted: 15 Feb 2009, 19:07:42 UTC - in response to Message 865824.  

Hi there,
The one Astropulse task I can see in your list appears on the surface to have processed normally, but was teamed with an outdated v4.36 wingman, so it has been sent out for reissue. It is probably OK, but I can't guarantee that, because all the other tasks, which seem to be multibeam (cuda), appear to be getting 'Compute Errors' on your host.

If you're looking for a newer AstroPulse build, there is one in the optimised apps sticky, but bear in mind that soon a newer incompatible version will be available, possibly in a few days or less. What I would suggest for that machine, as it seems to have some health problems, is to revert to stock in the meantime, give it a good clean, and make sure those cuda drivers & other general machine health indicators are up to scratch.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 865839 · Report as offensive
Profile KenZaske

Send message
Joined: 12 Oct 04
Posts: 7
Credit: 456,380
RAC: 0
United States
Message 865914 - Posted: 15 Feb 2009, 23:36:17 UTC - in response to Message 865839.  

I got those health issues dealt with a few weeks ago. The new motherboard uses an NF4U chip set, which I am not impressed with. I wish I had another ULI MB but alas, I don't. It took six reinstalls of WinXP 64bit to get a stable install. Now if nVidia would just publish a stable driver set I would be happy. Thanks for the quick reply.
ID: 865914 · Report as offensive
Profile KenZaske

Send message
Joined: 12 Oct 04
Posts: 7
Credit: 456,380
RAC: 0
United States
Message 867626 - Posted: 21 Feb 2009, 7:08:21 UTC - in response to Message 865839.  

I have been thinking about it and have one question. Does it require PhysX to be installed?
ID: 867626 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 867630 - Posted: 21 Feb 2009, 7:31:45 UTC - in response to Message 867626.  
Last modified: 21 Feb 2009, 7:34:29 UTC

I have been thinking about it and have one question. Does it require PhysX to be installed?


For AstroPulse: there is no Astropulse GPU / Cuda application at this time (only CPU), so Astropulse applications won't care about any GPU-related installation considerations. One may come at a later date, but the exact way the GPU will be used may differ from the current multibeam Cuda-enabled builds.

For Multibeam (setiathome_enhanced): AFAIK at the moment, it doesn't matter for the Cuda Multibeam builds whether PhysX is installed or not, but when I installed updated drivers on my machine (which is important to do), they installed PhysX and it seems to do no harm.

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 867630 · Report as offensive
Profile KenZaske

Send message
Joined: 12 Oct 04
Posts: 7
Credit: 456,380
RAC: 0
United States
Message 868792 - Posted: 23 Feb 2009, 21:17:14 UTC - in response to Message 867630.  

Thank you. As for PhysX doing no harm, I have noticed a small increase in game performance if I uninstall it. But then again, none of the games I play use it anyway.
ID: 868792 · Report as offensive
Tony Farkas

Send message
Joined: 29 Apr 02
Posts: 1
Credit: 14,151
RAC: 0
Japan
Message 868941 - Posted: 24 Feb 2009, 5:46:45 UTC
Last modified: 24 Feb 2009, 5:59:13 UTC

I've completed 2 AP WUs on 2 different computers and they won't upload. Each WU took around 100 hours to process. How do I get them uploaded? They have been finished for 4 days now.


2/24/2009 2:57:08 PM|SETI@home|Started upload of ap_16ja09ag_B3_P1_00398_20090218_02436.wu_1_0
2/24/2009 2:57:22 PM||Project communication failed: attempting access to reference site
2/24/2009 2:57:22 PM|SETI@home|Temporarily failed upload of ap_16ja09ag_B3_P1_00398_20090218_02436.wu_1_0: HTTP error
2/24/2009 2:57:22 PM|SETI@home|Backing off 3 hr 26 min 47 sec on upload of ap_16ja09ag_B3_P1_00398_20090218_02436.wu_1_0
2/24/2009 2:57:31 PM||Internet access OK - project servers may be temporarily down.
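
For anyone who wants to see how long the client intends to wait before retrying, here is a minimal Python sketch that splits message-log lines like the ones above on the "|" separator and pulls out the backoff interval. The function names are hypothetical, and the regular expression is only an assumption based on the single "Backing off 3 hr 26 min 47 sec" line quoted here; other BOINC versions may word the message differently.

```python
import re
from datetime import timedelta

# Pattern inferred from the one backoff line quoted in this thread (an assumption,
# not an official BOINC format description). Each time unit is treated as optional.
BACKOFF_RE = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(?:(\d+) sec )?on upload of (\S+)"
)

def parse_log_line(line):
    """Split 'timestamp|project|message' into its three fields."""
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message

def upload_backoff(message):
    """Return (filename, timedelta) for a backoff message, or None otherwise."""
    match = BACKOFF_RE.search(message)
    if match is None:
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups()[:3])
    return match.group(4), timedelta(hours=hours, minutes=minutes, seconds=seconds)

line = ("2/24/2009 2:57:22 PM|SETI@home|Backing off 3 hr 26 min 47 sec on upload of "
        "ap_16ja09ag_B3_P1_00398_20090218_02436.wu_1_0")
timestamp, project, message = parse_log_line(line)
print(upload_backoff(message))
```

A line with an empty project field (such as the "Project communication failed" entry) still splits into three parts; its project field is just an empty string.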
ID: 868941 · Report as offensive
Profile Piotr Kunkel
Volunteer tester

Send message
Joined: 7 Apr 00
Posts: 18
Credit: 19,385,083
RAC: 0
Poland
Message 868971 - Posted: 24 Feb 2009, 9:30:08 UTC
Last modified: 24 Feb 2009, 9:30:33 UTC

Simply be patient.
" Then they came for me and there was no one left to speak out for me."

Martin Niemöller
ID: 868971 · Report as offensive
john deneer
Volunteer tester
Avatar

Send message
Joined: 16 Nov 06
Posts: 331
Credit: 20,996,606
RAC: 0
Netherlands
Message 869630 - Posted: 26 Feb 2009, 8:43:53 UTC
Last modified: 26 Feb 2009, 8:46:20 UTC

Maybe this has already been covered, or it is simply a fluke but I decided to post it anyway .... There's a lot of you out there that know more about this kind of stuff than I do.

[edit]I just realized this thread is intended for ap 5.0 and not the newer 5.03, but I don't see a thread for posting errors occurring with 5.03. But if a mod wants to move it to a more appropriate thread, be my guest .... [/edit]

I got a compute error on this unit

Optimized AstroPulse 5.03, on a dual atom running Windows HomeServer. Painfully slow, but it's running all day anyway and the extra power consumption for making it run seti is something like 4 Watts :-)


The unit errored after something like 16 h of processing. A couple of others are still running, I'll keep an eye on those too.

Most obvious problem:

No heartbeat from core client for 30 sec - exiting
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
### Restart at 12.26 percent.
No heartbeat from core client for 30 sec - exiting
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
boinc_graphics_make_shmem failed: 0


What causes this and can it be avoided somehow, or is it just something that happens from time to time?

Regards,
John.



Name ap_07ja09ad_B4_P0_00119_20090223_28491.wu_2
Workunit 417940236
Created 23 Feb 2009 19:43:08 UTC
Sent 23 Feb 2009 20:04:17 UTC
Received 25 Feb 2009 23:06:17 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -202 (0xffffffffffffff36)
Computer ID 4728779
Report deadline 25 Mar 2009 20:04:17 UTC
CPU time 61519.14
stderr out <core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -202 (0xffffff36)
</message>
<stderr_txt>
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
AstroPulse v. 5.03
Non-graphics FFTW USE_CONVERSION_OPT USE_SSE3
Windows x86 rev 112, Don't Panic!, by Raistmer with support of Lunatics.kwsn.net team. SSE3
static fftw lib, built by Jason G.
ffa threshold mod, by Joe Segur.
SSE3 dechirping by JDWhale
CPUID: Intel(R) Atom(TM) CPU 330 @ 1.60GHz

Cache: L1=64K L2=512K
Features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3
No heartbeat from core client for 30 sec - exiting
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
### Restart at 12.26 percent.
No heartbeat from core client for 30 sec - exiting
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
boinc_graphics_make_shmem failed: 0

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 149.103875429963
Granted credit 0
application version 5.03
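
The report above shows the same exit status three ways: -202, 0xffffff36 and 0xffffffffffffff36. These are one value, just printed as a signed number and as 32-bit and 64-bit two's-complement hex. A small Python sketch of the conversion follows; the helper names are made up for illustration, and it says nothing about what -202 actually means inside BOINC.

```python
def as_unsigned_hex(code, bits=32):
    """Render a signed exit code as unsigned two's-complement hex of the given width."""
    return hex(code & ((1 << bits) - 1))

def as_signed(value, bits=32):
    """Interpret an unsigned value as a signed integer of the given width."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

print(as_unsigned_hex(-202))            # 32-bit form, as in the <message> block
print(as_unsigned_hex(-202, bits=64))   # 64-bit form, as in the "Exit status" line
print(as_signed(0xFFFFFF36))            # back to the signed value
```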
ID: 869630 · Report as offensive
Scrooge McDuck
Avatar

Send message
Joined: 26 Nov 99
Posts: 627
Credit: 1,674,173
RAC: 54
Germany
Message 871268 - Posted: 2 Mar 2009, 9:56:58 UTC

I changed to the new optimized ap_5.03r112_SSE3 in the last few days. Using WinXP SP3 32bit, AMD 64X2-3800, NO OCing. Never had any problems with optimized astropulse clients before. Now all AP WUs immediately exit with an error. The German error message in the following reads: "Process creation failed: Access denied (0x5)"


02.03.2009 10:13:00|SETI@home|Starting ap_21ja09aa_B1_P0_00172_20090228_06971.wu_1
02.03.2009 10:13:00|SETI@home|[error] Process creation failed: Zugriff verweigert (0x5)
02.03.2009 10:13:00|SETI@home|[error] Process creation failed: Zugriff verweigert (0x5)
02.03.2009 10:13:01|SETI@home|[error] Process creation failed: Zugriff verweigert (0x5)
02.03.2009 10:13:01|SETI@home|[error] Process creation failed: Zugriff verweigert (0x5)
02.03.2009 10:13:01|SETI@home|[error] Process creation failed: Zugriff verweigert (0x5)
02.03.2009 10:13:03|SETI@home|Computation for task ap_21ja09aa_B1_P0_00172_20090228_06971.wu_1 finished
02.03.2009 10:13:03|SETI@home|Output file ap_21ja09aa_B1_P0_00172_20090228_06971.wu_1_0 for task ap_21ja09aa_B1_P0_00172_20090228_06971.wu_1 absent


I have no idea what this is about. Normal s@h MB WUs are handled without problems. I have observed that it is mainly AMD systems that exit with an error on AP WUs. Maybe some AMD-specific problem with the optimized AP clients?

Some links to my AP WUs that exited with an error, where the wingmen also use AMD systems:

AP WU 419896689
AP WU 419895682
AP WU 418909902

Suggestions?

Regards,
Michael
ID: 871268 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 871275 - Posted: 2 Mar 2009, 10:34:16 UTC - in response to Message 871268.  
Last modified: 2 Mar 2009, 10:54:21 UTC

I changed to the new optimized ap_5.03r112_SSE3 in the last few days. Using WinXP SP3 32bit, AMD 64X2-3800, NO OCing. Never had any problems with optimized astropulse clients before. Now all AP WUs immediately exit with an error. The German error message in the following reads: "Process creation failed: Access denied (0x5)"


I haven't come across this one directly myself, but I suspect the Boinc accounts' permissions have been messed up somehow. I had a similar DLL failure on one machine a while back, which running the repair install via Control Panel seemed to fix (a Boinc reinstall should fix it similarly). I have no idea why the permissions would have been damaged or changed on my system, but if the permissions repair helps you, please let us know here, and we can let the Boinc devs know these may be getting trashed through some unidentified mechanism. The alternative is probably to use the non-service (non-protected application) install type for Boinc (the default), but I didn't try that either.

Jason

[Edit: also could try raising the priority of the Boinc.exe process as discussed in the next post, about 'No Heartbeat' messages.]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 871275 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 871278 - Posted: 2 Mar 2009, 10:49:24 UTC - in response to Message 869630.  


...
No heartbeat from core client for 30 sec - exiting
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
boinc_graphics_make_shmem failed: 0

What causes this and can it be avoided somehow, or is it just something that happens from time to time?

Regards,
John.


For the "no heartbeat" message, first suspect high system load during these times. I'm told it's harmless enough (though annoying ;)). I had that symptom last week on my machine out in the lounge room, and eventually traced it to my mother having secret games of "Railroad Tycoon II" at around 2am, without suspending Boinc (as shown).

In your case I would look for some scheduled tasks or services that run around these times.

The hard crash, though, may be of more concern, and may or may not be connected to the No heartbeat issue directly. I've seen mention that raising the priority of the boinc.exe process above normal can help (using Task Manager, or, perhaps easier, 'Process Lasso', which allows automated priority modification of the program on startup).

If that helps both issues, so indicating they may be connected to system utilisation / boinc priority, then please let us know.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 871278 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 871279 - Posted: 2 Mar 2009, 11:03:12 UTC - in response to Message 871278.  

I'm told it's harmless enough (though annoying ;)).

It's harmless if it only happens once or twice per WU, but if it happens too often (I think 100 times per WU), you eventually reach "Too many normally harmless exits" and the task is abandoned.

As with all BOINC error and warning messages (and indeed Windows messages), it's better to understand and eliminate as many as possible, rather than just ignoring them and hoping they'll go away.
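
One way to keep an eye on this is to count the heartbeat exits in a task's stderr text. The following is a minimal Python sketch, not anything BOINC itself provides: the function names are hypothetical, and the limit of 100 is only the figure guessed at above ("I think 100 times per WU"), not a confirmed BOINC constant.

```python
HEARTBEAT_MSG = "No heartbeat from core client"

# Unconfirmed: taken from the "I think 100 times per WU" remark above.
ASSUMED_EXIT_LIMIT = 100

def count_heartbeat_exits(stderr_text):
    """Count lines in a task's stderr text that report a missed heartbeat."""
    return sum(1 for line in stderr_text.splitlines() if HEARTBEAT_MSG in line)

def nearing_limit(stderr_text, limit=ASSUMED_EXIT_LIMIT, warn_fraction=0.5):
    """True once the exit count reaches the chosen fraction of the assumed limit."""
    return count_heartbeat_exits(stderr_text) >= limit * warn_fraction

# stderr excerpt as posted earlier in this thread
sample = """No heartbeat from core client for 30 sec - exiting
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
### Restart at 12.26 percent.
No heartbeat from core client for 30 sec - exiting
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
boinc_graphics_make_shmem failed: 0
"""
print(count_heartbeat_exits(sample), nearing_limit(sample))
```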
ID: 871279 · Report as offensive
Scrooge McDuck
Avatar

Send message
Joined: 26 Nov 99
Posts: 627
Credit: 1,674,173
RAC: 54
Germany
Message 871302 - Posted: 2 Mar 2009, 12:57:31 UTC - in response to Message 871275.  
Last modified: 2 Mar 2009, 13:07:12 UTC

Oh well, a stupid problem I could have easily identified by myself. Thanks Jason for the hint. I'm running BOINC in protected service mode.

I simply copied the optimized AP binaries to the appropriate folder and modified the app_info.xml accordingly. But I didn't pay attention to the access rights of the binaries (.exe). The s@h MB binary (running without problems) has the following group access rights set:

Administrators: Full
boinc_admins: Full
boinc_projects: Full
boinc_users: Read

So I simply failed to add access rights to the optimized AP binaries for the boinc_admins, boinc_projects and boinc_users groups in WinXP. I simply forgot it.
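
For anyone repeating this fix, here is a rough Python sketch that only builds (and prints) the `cacls` command lines that would grant the group rights listed above; it does not run them. The binary file name is hypothetical, the function name is made up, and the `/E /G group:perm` form is the standard WinXP `cacls` syntax as far as I know, so check `cacls /?` on your own system before using any of it.

```python
# Map the rights shown in the file's security dialog to cacls permission letters.
RIGHTS = {"Full": "F", "Read": "R"}

# Group rights as listed for the working s@h MB binary in this post.
GROUP_ACCESS = [
    ("Administrators", "Full"),
    ("boinc_admins", "Full"),
    ("boinc_projects", "Full"),
    ("boinc_users", "Read"),
]

def cacls_commands(binary_path):
    """Build one 'cacls <file> /E /G <group>:<perm>' argument list per group.

    /E edits the existing ACL instead of replacing it; /G grants the right.
    """
    return [
        ["cacls", binary_path, "/E", "/G", f"{group}:{RIGHTS[level]}"]
        for group, level in GROUP_ACCESS
    ]

# Hypothetical file name for the optimized AP binary.
for command in cacls_commands("ap_5.03r112_SSE3.exe"):
    print(" ".join(command))
```

Printing the commands first makes it easy to review them before pasting them into a command prompt opened in the project folder.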

But maybe my mistake helps others... ;-)
I only have to wait now for next AP WUs to observe the issue again.

Greetings,
Michael
ID: 871302 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.