AstroPulse errors - Reporting

Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 836243 - Posted: 1 Dec 2008, 20:04:52 UTC

I am possibly too dumb to play with computers, but why did Task ID 1062735426 run 38+ hours, exit as "valid", and still give me 0.00 credits (and no joy)?


Link:

http://setiathome.berkeley.edu/result.php?resultid=1062735426



System (3255203) runs AK_v8_win_SSE3.exe without issues.

Thanx,
Oz
Member of the 20 Year Club



Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 836247 - Posted: 1 Dec 2008, 20:12:17 UTC

BTW, the contents of my app_info.xml are:


<app_info>
<app>
<name>setiathome_enhanced</name>
</app>
<file_info>
<name>AK_v8_win_SSE3.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>517</version_num>
<file_ref>
<file_name>AK_v8_win_SSE3.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>527</version_num>
<file_ref>
<file_name>AK_v8_win_SSE3.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>528</version_num>
<file_ref>
<file_name>AK_v8_win_SSE3.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>astropulse</name>
</app>
<file_info>
<name>ap_4.37_SSE3_EPF.exe</name>
<executable/>
</file_info>
<file_info>
<name>ap_5.00r69_SSE3.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-1-1a_upx.dll</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse</app_name>
<version_num>435</version_num>
<file_ref>
<file_name>ap_4.37_SSE3_EPF.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>ap_5.00r69_SSE3.exe</file_name>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-1-1a_upx.dll</file_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse</app_name>
<version_num>436</version_num>
<file_ref>
<file_name>ap_4.37_SSE3_EPF.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>ap_5.00r69_SSE3.exe</file_name>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-1-1a_upx.dll</file_name>
</file_ref>
</app_version>
<app_version>
<app_name>astropulse</app_name>
<version_num>500</version_num>
<file_ref>
<file_name>ap_5.00r69_SSE3.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>ap_4.37_SSE3_EPF.exe</file_name>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-1-1a_upx.dll</file_name>
</file_ref>
</app_version>
</app_info>

Did I need to edit or delete my app_info.sam?

Oz
Member of the 20 Year Club
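
For anyone checking a hand-edited app_info.xml like the one posted above, here is a minimal Python sketch of a consistency check. The three rules it applies are assumptions made for illustration, not BOINC's own validation: every app_name should have a matching app block, every file_ref should name a file declared in a file_info block, and each app_version should mark exactly one file_ref as main_program. The function name check_app_info and the SAMPLE text are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_app_info(xml_text):
    """Return a list of human-readable problems found in an app_info.xml string.

    The rules checked here are illustrative assumptions, not BOINC's own checks.
    """
    root = ET.fromstring(xml_text)
    declared_apps = {app.findtext("name") for app in root.findall("app")}
    declared_files = {fi.findtext("name") for fi in root.findall("file_info")}
    problems = []
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name")
        label = f"{app_name} {version.findtext('version_num')}"
        if app_name not in declared_apps:
            problems.append(f"{label}: app is not declared in an <app> block")
        refs = version.findall("file_ref")
        # Count file_ref entries that carry an empty <main_program/> marker.
        main_count = sum(1 for ref in refs if ref.find("main_program") is not None)
        if main_count != 1:
            problems.append(f"{label}: expected 1 main_program, found {main_count}")
        for ref in refs:
            file_name = ref.findtext("file_name")
            if file_name not in declared_files:
                problems.append(f"{label}: {file_name} has no <file_info> entry")
    return problems

# Hypothetical sample: the DLL is referenced but deliberately not declared.
SAMPLE = """
<app_info>
  <app><name>astropulse</name></app>
  <file_info><name>ap_5.00r69_SSE3.exe</name><executable/></file_info>
  <app_version>
    <app_name>astropulse</app_name>
    <version_num>500</version_num>
    <file_ref><file_name>ap_5.00r69_SSE3.exe</file_name><main_program/></file_ref>
    <file_ref><file_name>libfftw3f-3-1-1a_upx.dll</file_name></file_ref>
  </app_version>
</app_info>
"""

for problem in check_app_info(SAMPLE):
    print(problem)
```

By these three rules the file posted above looks consistent: both apps are declared, all four files have a file_info entry, and every app_version has a single main_program.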



Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 836250 - Posted: 1 Dec 2008, 20:30:32 UTC - in response to Message 836243.  
Last modified: 1 Dec 2008, 20:31:11 UTC

I am possibly too dumb to play with computers, but why did Task ID 1062735426 run 38+ hours, exit as "valid", and still give me 0.00 credits (and no joy)?


Link:

http://setiathome.berkeley.edu/result.php?resultid=1062735426



System (3255203) runs AK_v8_win_SSE3.exe without issues.

Thanx,
Oz

Because of the transition to 5.00, tasks done after (I believe) the 20th are having problems validating. It was posted that people still using the opti app who completed their tasks before the deadline would get credit for them. It may take a while and will probably have to be done by script, but since this task seems to fall into this category, you should get credit in the end.

Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 836251 - Posted: 1 Dec 2008, 20:44:52 UTC - in response to Message 836250.  

Thanks Byron,

I'll just let them run then (and keep an eye out).

The laundry list of things in the task details had me worried, i.e.:

Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
###Restarted at _some_ percent.
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period

and

In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
### App was restarted.

when I know that it's a non graphical app, and

Error reading from foldfile: wanted 262144 bytes, but read only 0 bytes.
Short fold buffer didn't fill up (lol=0, size=65536).
Long fold buffer didn't fill up (lol=0, size=262144).
No heartbeat from core client for 30 sec - exiting

don't know what THAT means at all...

And I'm running Ver. 5 (enhanced app) but I see instead:

Skipping: /fraction_done_update_period
AstroPulse v. 4.35
Non-graphics FFTW USE_CONVERSION_OPT SPLIT_COMPLEX USE_SSE3
Windows x86 rev 24 build 54 by Raistmer with support of Lunatics.kwsn.net team. SSE3


THAT'S why I thought I did something wrong...

Oz
Member of the 20 Year Club



Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 836256 - Posted: 1 Dec 2008, 21:04:53 UTC - in response to Message 836251.  

Oz

No, there were no problems in the task. About the only way I know to stop the first two from happening is to leave the BOINC Manager on longer. The longer you let it run uninterrupted until the end of the task, the shorter that list gets. The third one is the app basically not getting any data for 30 seconds and reacting to that. It's a common occurrence on AMD PCs, and the opti app seems to keep it from causing an actual error. The last one is simply what allowed you to run the 4.36 task.

Leaps-from-Shadows
Volunteer tester
Joined: 11 Aug 08
Posts: 323
Credit: 259,220
RAC: 0
United States
Message 836286 - Posted: 1 Dec 2008, 22:19:43 UTC
Last modified: 1 Dec 2008, 22:22:01 UTC

Oz-

It appears that the task was started with the ap_4.35rev24b54_SSE3 app and reported about 23 hours ago. If you switched to the new package in the middle of crunching this work unit, it most likely finished with the ap_4.37_SSE3_EPF app.

As I understand it, the 'heartbeat' is communication between the BOINC client and the Astropulse or Multibeam app. If BOINC loses communication with the app for 30 seconds, it causes that error.
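
One way to picture the 30-second rule is an app-side watchdog: the app remembers when the client last checked in and treats anything older than the timeout as "client gone". The Python sketch below is only an illustration of that idea. The class name HeartbeatWatch, its methods, and the fake clock are hypothetical and are not part of the BOINC API.

```python
import time

HEARTBEAT_TIMEOUT_SEC = 30.0  # the figure quoted in the stderr message

class HeartbeatWatch:
    """App-side view of a heartbeat: remember when the client last checked in."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT_SEC, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock            # injectable so the demo needs no real waiting
        self.last_beat = clock()

    def beat(self):
        """Record that the client refreshed its heartbeat just now."""
        self.last_beat = self.clock()

    def client_alive(self):
        """True while the last heartbeat is younger than the timeout."""
        return self.clock() - self.last_beat < self.timeout

# Demo with a fake clock (hypothetical values, no real sleeping).
fake_now = [0.0]
watch = HeartbeatWatch(clock=lambda: fake_now[0])
fake_now[0] = 29.0
print(watch.client_alive())   # still inside the 30 s window
fake_now[0] = 31.0
print(watch.client_alive())   # window exceeded: a real app would exit here
```

Injecting the clock keeps the sketch testable; a busy machine that delays the heartbeat by a few seconds too many looks exactly the same to the app as a client that has stopped.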

And you would get fewer 'Unrecognized XML' errors if you updated to the newest BOINC client version.
Cruiser
Gateway GT5692 L-f-S Edition
-Phenom X4 9650 CPU
-4GB 667MHz DDR2 RAM
-500GB SATA HD
-Vista x64 SP1
-BOINC 6.2.19 32-bit client
-SSE3 optimized 32-bit apps

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 836291 - Posted: 1 Dec 2008, 22:30:29 UTC - in response to Message 836251.  

Thanks Byron,

I'll just let them run then (and keep an eye out).

The laundry list of things in the task details had me worried, i.e.:

Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
###Restarted at _some_ percent.
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period


The "Unrecognized XML..." and "Skipping..." stuff is simply because newer core clients pass slightly different information to client applications, and the XML parser was called with a "verbose" flag set. Note that it is complaining about extra information, so it cannot be anything the application needs.
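
A rough Python sketch of that behaviour, assuming a parser that keeps a fixed set of tags it understands and, when called with a verbose flag, reports everything else before dropping it. The tag set KNOWN_TAGS, the function parse_init_data, and the sample input are hypothetical; only the wording of the log lines is taken from the stderr output quoted in this thread.

```python
import re

# Hypothetical set of tags an older application knows how to use.
KNOWN_TAGS = {"app_name", "wu_name"}

def parse_init_data(text, verbose=False):
    """Collect known <tag>value</tag> fields and skip the rest.

    Returns (fields, log). Unknown tags never reach `fields`, so they
    cannot affect the computation; with verbose=True they are only logged.
    """
    fields, log = {}, []
    for match in re.finditer(r"<(\w+)>([^<]*)</\1>", text):
        tag, value = match.group(1), match.group(2)
        if tag in KNOWN_TAGS:
            fields[tag] = value
        elif verbose:
            log.append(f"Unrecognized XML in parse_init_data_file: {tag}")
            log.append(f"Skipping: {value}")
            log.append(f"Skipping: /{tag}")
    return fields, log

sample = (
    "<app_name>astropulse</app_name>\n"
    "<fraction_done_update_period>1.000000</fraction_done_update_period>\n"
)
fields, log = parse_init_data(sample, verbose=True)
print(fields)
print("\n".join(log))
```

The point of the sketch is that the messages come from the extra field being ignored, not from anything the application was missing.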

and

In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
### App was restarted.

when I know that it's a non graphical app, and

Error reading from foldfile: wanted 262144 bytes, but read only 0 bytes.
Short fold buffer didn't fill up (lol=0, size=65536).
Long fold buffer didn't fill up (lol=0, size=262144).
No heartbeat from core client for 30 sec - exiting

don't know what THAT means at all...


The foldfile is some information saved so a restart is possible in the middle of some processing, and the fold buffers are related to the same. It's not a fatal error in any sense; the app just has to go back a little further to do a clean restart. The maximum repeat is 1/14028 of the total crunch.
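
To put the 1/14028 figure in perspective, here is a small worked calculation in Python. It assumes the 38-hour run time mentioned earlier in the thread and that the repeated work scales linearly with run time; the function name max_repeat_seconds is made up for the example.

```python
MAX_REPEAT_FRACTION = 1 / 14028   # worst-case share of the crunch that is redone

def max_repeat_seconds(total_hours):
    """Worst-case repeated work, in seconds, for a task of the given length."""
    return total_hours * 3600 * MAX_REPEAT_FRACTION

# 38 hours is the run time reported for the task discussed in this thread.
print(f"{max_repeat_seconds(38):.1f} seconds")
```

For a 38-hour task that works out to roughly ten seconds of repeated crunching per restart, which is why the message can be ignored.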

"No heartbeat..." indicates that the BOINC core client didn't update a shared memory field in the last 30 seconds, apps are supposed to quit if they get that indication that BOINC isn't running. It can easily be caused by some unrelated high priority task and doesn't signify anything wrong unless you get a lot of those, more than one per hour would make me uncomfortable enough to investigate. With BOINC actually still running, it just restarts the application and there's very little impact.

And I'm running Ver. 5 (enhanced app) but I see instead:

Skipping: /fraction_done_update_period
AstroPulse v. 4.35
Non-graphics FFTW USE_CONVERSION_OPT SPLIT_COMPLEX USE_SSE3
Windows x86 rev 24 build 54 by Raistmer with support of Lunatics.kwsn.net team. SSE3


THAT'S why I thought I did something wrong...

Oz

If Task Manager shows the correct version of the app running on AP tasks, you're doing fine. The application identification information is only added to stderr.txt when starting at the beginning of a WU. Assuming you did the upgrade to the combined package after that WU was started, the 4.37 AP app would have been used to complete the WU but wouldn't have put its identification in. If you have other AP WUs downloaded before the upgrade, they will also be crunched with 4.37, though starting from the beginning will show the expected identification in stderr. New AP WUs downloaded since the upgrade will be crunched with 5.00.
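
If you would rather not read stderr by eye, the following Python sketch summarises a task's stderr text. It assumes the line formats shown in this thread (the "AstroPulse v." banner, the "###" restart notes, and the "No heartbeat" message); the function summarize_stderr and the run_hours parameter are hypothetical. An empty banner list would be consistent with a WU that was resumed rather than started by the current app, and the exits-per-hour figure can be compared with the rough "more than one per hour" threshold mentioned above.

```python
import re

def summarize_stderr(text, run_hours):
    """Summarise app banners, restarts and heartbeat exits found in stderr text."""
    banners = re.findall(r"^AstroPulse v\. (\S+)", text, flags=re.MULTILINE)
    restarts = len(re.findall(r"^###.*[Rr]estarted", text, flags=re.MULTILINE))
    heartbeat_exits = text.count("No heartbeat from core client")
    return {
        "banners": banners,                        # versions that started the WU
        "restarts": restarts,
        "heartbeat_exits": heartbeat_exits,
        "heartbeat_exits_per_hour": heartbeat_exits / run_hours,
    }

# Excerpt assembled from the lines quoted in this thread.
sample = """Skipping: /fraction_done_update_period
AstroPulse v. 4.35
Non-graphics FFTW USE_CONVERSION_OPT SPLIT_COMPLEX USE_SSE3
No heartbeat from core client for 30 sec - exiting
### App was restarted.
###Restarted at _some_ percent.
"""

print(summarize_stderr(sample, run_hours=38))
```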

The complexity of all this is unfortunate; the project's transition plan didn't work out as they expected and we've done the best we can to adapt. Things should settle down to a large extent fairly soon.
                                                                Joe

Dave
Volunteer tester
Joined: 7 Jun 08
Posts: 2
Credit: 427,602
RAC: 0
United States
Message 836830 - Posted: 4 Dec 2008, 1:28:48 UTC

I've noticed in some of my recent Astropulse results that after the wingman returns their result, a third task is created and sent out a few seconds later, yet we are both given credit. Here is one of them: http://setiathome.berkeley.edu/workunit.php?wuid=369847524 I use an optimized version, and it looks like the other cruncher didn't. I don't know if that's an issue, but I wouldn't think so. It just seems like a waste of resources to send out another AP task for something that has been validated. Is this normal?

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 836852 - Posted: 4 Dec 2008, 3:02:49 UTC - in response to Message 836830.  

I've noticed in some of my recent Astropulse results that after the wingman returns their result, a third task is created and sent out a few seconds later, yet we are both given credit. Here is one of them: http://setiathome.berkeley.edu/workunit.php?wuid=369847524 I use an optimized version, and it looks like the other cruncher didn't. I don't know if that's an issue, but I wouldn't think so. It just seems like a waste of resources to send out another AP task for something that has been validated. Is this normal?

It has NOT been validated; no canonical result has been chosen. The AP validator unfortunately makes results look valid before they should be.
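
As a simplified picture of why a third task goes out, here is a Python sketch of quorum-style validation: a result becomes canonical only when enough returned results agree with it, and until then another copy is needed. This is a generic illustration under stated assumptions, not the project's validator. The functions choose_canonical and needs_another_task, the numeric results, and the tolerance are all hypothetical.

```python
def choose_canonical(results, agree, quorum=2):
    """Return the first result that at least `quorum` results agree with, else None.

    A result always agrees with itself, so quorum=2 means one other match is needed.
    """
    for candidate in results:
        matches = sum(1 for other in results if agree(candidate, other))
        if matches >= quorum:
            return candidate
    return None

def needs_another_task(results, agree, quorum=2):
    """True while no canonical result can be chosen from what has been returned."""
    return choose_canonical(results, agree, quorum) is None

# Hypothetical comparison: two numeric results agree if they are within a tolerance.
def close_enough(a, b):
    return abs(a - b) < 0.01

returned = [1.00, 1.50]                              # the two results disagree
print(needs_another_task(returned, close_enough))    # a third copy is needed
returned.append(1.00)                                # third result matches the first
print(choose_canonical(returned, close_enough))
```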

Your optimized version is out of date, please update to version 5.00, see the Optimised AP v5.00 - initial release thread. You may get credit for that WU if Eric runs a script which grants credit indiscriminately, but you shouldn't count on his doing so indefinitely.
                                                                Joe

Dave
Volunteer tester
Joined: 7 Jun 08
Posts: 2
Credit: 427,602
RAC: 0
United States
Message 836858 - Posted: 4 Dec 2008, 3:38:38 UTC - in response to Message 836852.  

I just recently updated to 5.00; the WU I was asking about was one of the last ones to run on the old version. I haven't had anyone return the other half of my new results yet, so I couldn't tell whether the update stops a third copy from being sent out. Thanks.

Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 837035 - Posted: 4 Dec 2008, 21:43:31 UTC - in response to Message 836858.  

Please continue this discussion in the New Astropulse errors reporting thread.

Thanks


©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.