AstroPulse errors - Reporting

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 796122 - Posted: 11 Aug 2008, 1:28:16 UTC
Last modified: 11 Aug 2008, 1:32:48 UTC

Thanks Pappa,
Managed to get them running OK last night. I had problems with a corrupted WU file. The debug message switches will help. I'll look at that when I get home from school.

Cheers
Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 796122
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 796395 - Posted: 11 Aug 2008, 18:20:41 UTC

Just found out this morning that my AP app had not completely downloaded, and I lost an AP unit. Got the full app downloaded now though.

ID: 796395
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 796497 - Posted: 11 Aug 2008, 21:32:47 UTC

wuid=313778905

Another case of AP being sent to an "underspec" host... in this case an 850 MHz PIII.
This host received 2 AP WUs on 10 Aug (yesterday). Fortunately the host owner realized the WUs would not complete before the deadline and cancelled them.

What happened to the "Minimum CPU: 1.6 GHz" requirement?

IMO, more work on the scheduler is still needed.
ID: 796497
Profile Adrian Taylor
Volunteer tester
Joined: 22 Apr 01
Posts: 95
Credit: 10,933,449
RAC: 0
United Kingdom
Message 796566 - Posted: 12 Aug 2008, 0:45:55 UTC

Hi there,

I'm running an 8-core Mac Pro.

So far all my AP WUs have given a computing error:

<core_client_version>6.2.15</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
In ap_client_main.cpp: in mainloop(): at dm_chunk_large 896
In ap_client_main.cpp: in mainloop(): at dm_chunk_large 1024
......
In ap_client_main.cpp: in mainloop(): at dm_chunk_large 4864
In ap_client_main.cpp: in mainloop(): at dm_chunk_large 4992
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
SIGABRT: abort called

Crashed executable name: astropulse_4.35_i686-apple-darwin
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.5.4 build 9E17
Sun Aug 10 18:25:12 2008

atos cannot load symbols for the file astropulse_4.35_i686-apple-darwin.
0 0x0004e931 1 0x00042f12 2 0x95e1609b 3 0xffffffff 4 0x95e8eec2 5 0x95e9e47f 6 0x95006005 7 0x9500410c 8 0x9500414b 9 0x95004261 10 0x950045d8 11 0x00026fe3 12 0x00028837 13 0x0002bd55 14 0x0002cf49 15 0x000185bf 16 0x0003a9f4 17 0x00037e76 18 0x00031421 19 0x00034768 20 0x00002736
Thread 0 crashed with X86 Thread State (32-bit):
eax: 0xffffffe1 ebx: 0x95ddde62 ecx: 0xbfffd2bc edx: 0x95da94a6
edi: 0x00000000 esi: 0x00000000 ebp: 0xbfffd2f8 esp: 0xbfffd2bc
ss: 0x0000001f efl: 0x00000206 eip: 0x95da94a6 cs: 0x00000007
ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037

Binary Images Description:
0x1000 - 0xeafff /Library/Application Support/BOINC Data/slots/4/../../projects/setiathome.berkeley.edu/astropulse_4.35_i686-apple-darwin
0x92101000 - 0x92108fff /usr/lib/libgcc_s.1.dylib
0x94fbe000 - 0x9501bfff /usr/lib/libstdc++.6.dylib
0x9541f000 - 0x95423fff /usr/lib/system/libmathCommon.A.dylib
0x95da8000 - 0x95f08fff /usr/lib/libSystem.B.dylib

Any ideas? I'm running the AP worker from here:
http://www.dotsch.de/boinc/SETI@home%20applications.html

with a suitable app_info.xml file

cheers :-)
63. (1) (b) "music" includes sounds wholly or predominantly characterised by the emission of a succession of repetitive beats
ID: 796566
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 796589 - Posted: 12 Aug 2008, 1:47:20 UTC - in response to Message 794493.  

I would bet that they will be going back through and fixing these types of errors, as there is something wrong with the AP validators.


My one AP unit that I have returned just got the big fat 0 granted, so I think they are still working on the problem.

ID: 796589
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 796591 - Posted: 12 Aug 2008, 1:57:28 UTC

A message was sent to Eric and Josh.
We will see what they can sort out with the validator.


Please consider a Donation to the Seti Project.

ID: 796591
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 796617 - Posted: 12 Aug 2008, 3:06:51 UTC - in response to Message 796497.  

wuid=313778905

Another case of AP being sent to an "underspec" host... in this case an 850 MHz PIII.
This host received 2 AP WUs on 10 Aug (yesterday). Fortunately the host owner realized the WUs would not complete before the deadline and cancelled them.

What happened to the "Minimum CPU: 1.6 GHz" requirement?

IMO, more work on the scheduler is still needed.

In actuality there is no "Minimum CPU: 1.6 GHz" check; CPU speed is factored in using the Whetstone benchmark. Josh is not very familiar with BOINC, so he doesn't really understand how much variation there is in the BOINC benchmarks. Because PIII systems run the benchmark efficiently, their speed is overestimated.

If those who abort the WUs do so after running long enough that the abort was mathematically justified, the CPU time and "... at dm_chunk_large xxxx" values in stderr.txt are sufficient to demonstrate that the host would not have finished the work within the deadline. Unfortunately, stderr.txt information isn't preserved, so unless the project sets up some special monitoring they won't know how many impossible tasks the Scheduler is sending. In addition, many probably simply refuse to do work which is estimated to take a long time; there's no way to handle that better until the preference option is available.
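
To make that extrapolation concrete, here is a minimal Python sketch. It assumes progress is roughly linear in the dm_chunk_large counter, which is only an assumption. `total_chunks` is a hypothetical parameter: the final counter value for a complete WU is not stated in this thread, so the caller has to supply it. The function names are illustrative and not part of BOINC or the AP application.

```python
import re

# Matches progress lines such as:
#   In ap_client_main.cpp: in mainloop(): at dm_chunk_large 4992
CHUNK_RE = re.compile(r"at dm_chunk_large (\d+)")


def last_chunk(stderr_text):
    """Return the highest dm_chunk_large value reported, or None if absent."""
    values = [int(m.group(1)) for m in CHUNK_RE.finditer(stderr_text)]
    return max(values) if values else None


def projected_cpu_hours(stderr_text, cpu_hours_so_far, total_chunks):
    """Linearly extrapolate total CPU hours from the progress so far.

    total_chunks is hypothetical: the counter value at completion
    is not given in the thread.
    """
    done = last_chunk(stderr_text)
    if not done:
        return None
    return cpu_hours_so_far * total_chunks / done


def would_miss_deadline(stderr_text, cpu_hours_so_far, total_chunks, hours_left):
    """True if the projected remaining CPU hours exceed the hours left."""
    total = projected_cpu_hours(stderr_text, cpu_hours_so_far, total_chunks)
    if total is None:
        return None
    return (total - cpu_hours_so_far) > hours_left
```

This treats one CPU hour as one wall-clock hour, so a host that does not crunch 24/7 would be even further from finishing than the sketch suggests.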

Part of the change is to multiply the estimated wall time a host will take for AP work by 1.3 because the work is "hard". That factor is hard coded, which I consider a mistake; it should have been a project config.xml parameter which could be adjusted to really only send the hard work to hosts almost certain to complete it with ample margin. I estimate that AP work is much harder than that factor currently recognizes, and believe a healthy boost would be better.
Joe
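
To illustrate the point about making the factor adjustable, here is a minimal Python sketch. It is not the actual BOINC scheduler code: the function names and the idea of passing the factor as an argument are assumptions for illustration, and the hour figures are made up. Only the 1.3 multiplier comes from the description above.

```python
# Hypothetical sketch of a "can this host finish in time?" check.
# Only the 1.3 multiplier comes from the post; everything else is illustrative.

DEFAULT_HARD_WORK_FACTOR = 1.3  # the value described as hard coded


def estimated_wall_hours(raw_estimate_hours,
                         difficulty_factor=DEFAULT_HARD_WORK_FACTOR):
    """Scale the host's raw wall-time estimate because the work is "hard"."""
    return raw_estimate_hours * difficulty_factor


def should_send(raw_estimate_hours, deadline_hours,
                difficulty_factor=DEFAULT_HARD_WORK_FACTOR):
    """Send the task only if the scaled estimate fits inside the deadline."""
    scaled = estimated_wall_hours(raw_estimate_hours, difficulty_factor)
    return scaled <= deadline_hours


# Made-up example: a 720 hour deadline and a raw estimate of 500 hours.
print(should_send(500, 720))                         # scaled to about 650 hours
print(should_send(500, 720, difficulty_factor=2.0))  # scaled to 1000 hours
```

With a larger factor read from configuration instead of a constant, the same borderline host would be skipped, which is the kind of tuning a config.xml parameter would allow.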
ID: 796617
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 796632 - Posted: 12 Aug 2008, 3:52:34 UTC - in response to Message 796617.  
Last modified: 12 Aug 2008, 4:03:51 UTC

wuid=313778905

Another case of AP being sent to an "underspec" host... in this case an 850 MHz PIII.
This host received 2 AP WUs on 10 Aug (yesterday). Fortunately the host owner realized the WUs would not complete before the deadline and cancelled them.

What happened to the "Minimum CPU: 1.6 GHz" requirement?

IMO, more work on the scheduler is still needed.

In actuality there is no "Minimum CPU: 1.6 GHz" check; CPU speed is factored in using the Whetstone benchmark. Josh is not very familiar with BOINC, so he doesn't really understand how much variation there is in the BOINC benchmarks. Because PIII systems run the benchmark efficiently, their speed is overestimated.
Joe

Then maybe the AP FAQ should be less cryptic.
From the Astropulse FAQ...
==========================================
What are the minimum requirements for my computer to run Astropulse?

  • Minimum CPU: 1.6 GHz
  • Minimum RAM: 256 MB
  • Minimum disk space: 128 MB


If your computer doesn't meet these requirements, our server probably won't send you Astropulse workunits. The RAM and disk space requirements are overestimates; Astropulse actually uses significantly less.
==========================================

IMO, a 1.5 GHz P4 probably should not receive AP work... Definitely an 850 MHz PIII or slower should not be receiving AP work! I have a wingman running a 500 MHz PIII... wuid=311362368, with an estimated CPU time of 1200 CPU hours (50 days) against a 30-day deadline. This should be fixed!

ID: 796632
rigasrigas1980
Joined: 2 Aug 08
Posts: 6
Credit: 814
RAC: 0
Greece
Message 796687 - Posted: 12 Aug 2008, 6:21:39 UTC

Hello,
Don't you think that Astropulse's deadlines should be longer? I run a 1.73 GHz Celeron, and I am running my first AP WU, and it is so slow.
ID: 796687
Profile JSabin
Joined: 20 Aug 07
Posts: 40
Credit: 978,691
RAC: 0
United States
Message 796757 - Posted: 12 Aug 2008, 11:09:39 UTC

Astropulse projects hang up on my system after spending tens of hours on them. I've had this happen three times now.

In the future, if I see them in my queue, I'll just delete them.

Anyone else having this issue?
ID: 796757
Toni
Joined: 1 Sep 99
Posts: 2
Credit: 200,988
RAC: 0
United States
Message 796815 - Posted: 12 Aug 2008, 14:52:51 UTC

My computer is running two Astropulse units concurrently. The first time I noticed them, they both reported about eight hours elapsed and 96 hours remaining (an estimated completion time of 104 hours). Today both report about 90 hours elapsed and 53 hours remaining (an estimated completion time of 143 hours). Units are due September 5. I will let them finish -- assuming Zeno's paradox is not operating here. :-)

This is just annoying. I will abort any Astropulse units downloaded in the future.


Toni
ID: 796815
web03
Volunteer tester
Joined: 13 Feb 01
Posts: 355
Credit: 719,156
RAC: 0
United States
Message 796816 - Posted: 12 Aug 2008, 15:00:29 UTC

Toni -

What you are seeing is normal. Because you have been running MB units, your DCF is set really low. AP units seem to have a decent estimate of time if your DCF is around 0.4 or so. It's not unusual for them to take about 140 hours or so. IMHO, this is really a good thing: as more users start running AP units, it will lessen the load on the servers.

Please reconsider as AP is doing valid research as well.
Wendy
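
For anyone curious how the DCF feeds into the time estimate: as far as I understand it, the BOINC client scales a raw estimate (the work unit's estimated floating-point operations divided by the host's benchmarked speed) by the DCF. Here is a minimal Python sketch of that relationship. The function and parameter names are illustrative rather than BOINC's actual code, and the numbers are made up so that a DCF of 0.4 gives about 140 hours.

```python
def estimated_hours(fpops_estimate, host_flops, dcf):
    """Raw estimate (operations / speed) scaled by the duration correction factor."""
    raw_seconds = fpops_estimate / host_flops
    return raw_seconds * dcf / 3600.0


def dcf_implied_by(actual_hours, fpops_estimate, host_flops):
    """The DCF that would have made the estimate match the actual run time."""
    raw_hours = fpops_estimate / host_flops / 3600.0
    return actual_hours / raw_hours


# Made-up host: 2e9 floating-point operations per second,
# made-up work unit: 2.52e15 estimated operations (350 raw hours).
print(estimated_hours(2.52e15, 2e9, 0.4))  # DCF near 0.4: about 140 hours
print(estimated_hours(2.52e15, 2e9, 0.2))  # a lower DCF halves the estimate
```

A DCF pulled low by earlier work therefore makes the first AP estimate too short, and the estimate grows as the task runs longer than predicted.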



ID: 796816
Profile JSabin
Joined: 20 Aug 07
Posts: 40
Credit: 978,691
RAC: 0
United States
Message 796849 - Posted: 12 Aug 2008, 20:45:57 UTC - in response to Message 796816.  

Toni -

What you are seeing is normal. Because you have been running MB units, your DCF is set really low. AP units seem to have a decent estimate of time if your DCF is around 0.4 or so. It's not unusual for them to take about 140 hours or so. IMHO, this is really a good thing: as more users start running AP units, it will lessen the load on the servers.

Please reconsider as AP is doing valid research as well.

Wendy,

The issue I see is that they don't seem to complete on my system at all. They just run for many hours, then seem to stop or make no progress as the clock continues to tick, while the standard work units click away on the other processor. Worse yet, because they are so much work, they also fill the queue, and there are no other work units available for the second processor when it finishes up its current WUs. Right now my one machine has nothing to do and hasn't had anything for many hours.

It would be nice if we could choose to ignore them somehow. Right now I've "wasted" 100+ hours with the AP units.

~Joe
ID: 796849
James Nelson
Volunteer tester
Joined: 23 Mar 02
Posts: 381
Credit: 4,806,382
RAC: 0
United States
Message 796857 - Posted: 12 Aug 2008, 20:56:01 UTC

got this after my first AP_WU

<core_client_version>5.2.13</core_client_version>
<message>app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>astropulse_4.35_COPYRIGHT</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

I'm not fussed, because one other host already sent back an error and it has been sent to two others, but I thought it might be worth noting.
ID: 796857
Profile JSabin
Joined: 20 Aug 07
Posts: 40
Credit: 978,691
RAC: 0
United States
Message 796923 - Posted: 12 Aug 2008, 23:08:45 UTC

Thanks to the AP WUs, both my computers are idle, have been for hours.

Perhaps it's time to take a break from SETI and go to another project.
ID: 796923
Toni
Joined: 1 Sep 99
Posts: 2
Credit: 200,988
RAC: 0
United States
Message 796954 - Posted: 12 Aug 2008, 23:52:21 UTC - in response to Message 796816.  


What you are seeing is normal. Because you have been running MB units, your DCF is set really low. AP units seem to have a decent estimate of time if your DCF is around 0.4 or so. It's not unusual for them to take about 140 hours or so. IMHO, this is really a good thing: as more users start running AP units, it will lessen the load on the servers.

Please reconsider as AP is doing valid research as well.



Hi, Wendy,
I navigate the Internet okay but I'm no computer geek [no offense intended to anyone here]. What's a DCF and how do I tell what mine is? My computer is running S@H 24/7, is not overclocked or tweaked in any way; I wouldn't know how.

Lessening the load on servers is a good thing for the community as a whole, so we (dual processors and I) will soldier on. In the meantime, I'm curious about what new information is gained from Astropulse units as opposed to regular units.

Toni
ID: 796954
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 796965 - Posted: 13 Aug 2008, 0:01:49 UTC - in response to Message 796857.  

James

In BOINC Manager, I would reset the project if you have no work running; that will re-download any files that are missing or corrupt.


got this after my first AP_WU

<core_client_version>5.2.13</core_client_version>
<message>app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>astropulse_4.35_COPYRIGHT</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

I'm not fussed, because one other host already sent back an error and it has been sent to two others, but I thought it might be worth noting.


Please consider a Donation to the Seti Project.

ID: 796965
web03
Volunteer tester
Joined: 13 Feb 01
Posts: 355
Credit: 719,156
RAC: 0
United States
Message 797024 - Posted: 13 Aug 2008, 1:46:22 UTC - in response to Message 796954.  
Last modified: 13 Aug 2008, 1:46:55 UTC


What you are seeing is normal. Because you have been running MB units, your DCF is set really low. AP units seem to have a decent estimate of time if your DCF is around 0.4 or so. It's not unusual for them to take about 140 hours or so. IMHO, this is really a good thing: as more users start running AP units, it will lessen the load on the servers.

Please reconsider as AP is doing valid research as well.



Hi, Wendy,
I navigate the Internet okay but I'm no computer geek [no offense intended to anyone here]. What's a DCF and how do I tell what mine is? My computer is running S@H 24/7, is not overclocked or tweaked in any way; I wouldn't know how.

Lessening the load on servers is a good thing for the community as a whole, so we (dual processors and I) will soldier on. In the meantime, I'm curious about what new information is gained from Astropulse units as opposed to regular units.

Toni


Toni -

If you go to your computer summary page, you should see a line that states...

Task duration correction factor

DCF = Duration Correction Factor

In regards to Astropulse, I would look at the following link for more information. Astropulse FAQ
{edit - fixed typo}
Wendy



ID: 797024
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 797090 - Posted: 13 Aug 2008, 4:44:15 UTC - in response to Message 796566.  

Hi there,

I'm running an 8-core Mac Pro.

So far all my AP WUs have given a computing error:

<core_client_version>6.2.15</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
In ap_client_main.cpp: in mainloop(): at dm_chunk_large 896
In ap_client_main.cpp: in mainloop(): at dm_chunk_large 1024
......
In ap_client_main.cpp: in mainloop(): at dm_chunk_large 4864
In ap_client_main.cpp: in mainloop(): at dm_chunk_large 4992
In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
SIGABRT: abort called

Crashed executable name: astropulse_4.35_i686-apple-darwin
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.5.4 build 9E17
Sun Aug 10 18:25:12 2008

atos cannot load symbols for the file astropulse_4.35_i686-apple-darwin.
0 0x0004e931 1 0x00042f12 2 0x95e1609b 3 0xffffffff 4 0x95e8eec2 5 0x95e9e47f 6 0x95006005 7 0x9500410c 8 0x9500414b 9 0x95004261 10 0x950045d8 11 0x00026fe3 12 0x00028837 13 0x0002bd55 14 0x0002cf49 15 0x000185bf 16 0x0003a9f4 17 0x00037e76 18 0x00031421 19 0x00034768 20 0x00002736
Thread 0 crashed with X86 Thread State (32-bit):
eax: 0xffffffe1 ebx: 0x95ddde62 ecx: 0xbfffd2bc edx: 0x95da94a6
edi: 0x00000000 esi: 0x00000000 ebp: 0xbfffd2f8 esp: 0xbfffd2bc
ss: 0x0000001f efl: 0x00000206 eip: 0x95da94a6 cs: 0x00000007
ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037

Binary Images Description:
0x1000 - 0xeafff /Library/Application Support/BOINC Data/slots/4/../../projects/setiathome.berkeley.edu/astropulse_4.35_i686-apple-darwin
0x92101000 - 0x92108fff /usr/lib/libgcc_s.1.dylib
0x94fbe000 - 0x9501bfff /usr/lib/libstdc++.6.dylib
0x9541f000 - 0x95423fff /usr/lib/system/libmathCommon.A.dylib
0x95da8000 - 0x95f08fff /usr/lib/libSystem.B.dylib

Any ideas? I'm running the AP worker from here:
http://www.dotsch.de/boinc/SETI@home%20applications.html

with a suitable app_info.xml file

cheers :-)


In ap_gfx_main.cpp: in ap_graphics_init(): Starting client.
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
SIGABRT: abort called

Crashed executable name: astropulse_4.35_i686-apple-darwin
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.5.4 build 9E17
Tue Aug 12 19:28:22 2008

sh: /usr/bin/atos: No such file or directory
0 0x0004e931 1 0x00042f12 2 0x93e1e09b 3 0xffffffff 4 0x93e96ec2 5 0x93ea647f 6 0x918ad005 7 0x918ab10c 8 0x918ab14b 9 0x918ab261 10 0x918ab5d8 11 0x00026fe3 12 0x00028837 13 0x0002bd55 14 0x0002cf49 15 0x000185bf 16 0x0003a9f4 17 0x00037e76 18 0x00031421 19 0x00034768 20 0x00002736
Thread 0 crashed with X86 Thread State (32-bit):
eax: 0xffffffe1 ebx: 0x93de5e62 ecx: 0xbfffd28c edx: 0x93db14a6
edi: 0x00000000 esi: 0x00000000 ebp: 0xbfffd2c8 esp: 0xbfffd28c
ss: 0x0000001f efl: 0x00000206 eip: 0x93db14a6 cs: 0x00000007
ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037

Binary Images Description:
0x1000 - 0xeafff /Library/Application Support/BOINC Data/slots/1/../../projects/setiathome.berkeley.edu/astropulse_4.35_i686-apple-darwin
0x91865000 - 0x918c2fff /usr/lib/libstdc++.6.dylib
0x92243000 - 0x92247fff /usr/lib/system/libmathCommon.A.dylib
0x93db0000 - 0x93f10fff /usr/lib/libSystem.B.dylib
0x948ab000 - 0x948b2fff /usr/lib/libgcc_s.1.dylib


Exiting...

Got the same error as Adrian on my iMac. This happened after crunching on the unit was stopped while it crunched a bunch of shorties in EDF mode. It crashed almost immediately after restarting the unit.

Tue Aug 12 19:28:22 2008|SETI@home|Restarting task ap_29fe08af_B4_P0_00330_20080805_09904.wu_1 using astropulse version 435
Tue Aug 12 19:28:23 2008|SETI@home|Computation for task ap_29fe08af_B4_P0_00330_20080805_09904.wu_1 finished


There is a good possibility that the Mac version of the AP app does not like to be paused.

Will see what happens when it starts the next AP unit in a couple of days.

ID: 797090
Profile magpie2005
Joined: 2 Dec 05
Posts: 9
Credit: 464,062
RAC: 0
United Kingdom
Message 797248 - Posted: 13 Aug 2008, 14:22:30 UTC - in response to Message 794571.  

I noticed that Astropulse was downloaded and started automatically at the beginning of the week with one work unit that should be taking approx 111 hrs.

However, after a few days we have CPU time of 94.5 hrs but still 60 hrs to go!!!! Now I'm no rainman but my math tells me something just don't add up here... 111 - 94.5 should be around... say... oh... let me see now... 16.5... which is way, way different to the 60 hrs still to go.

At this rate not only will I never make the report deadline and therefore not get any credit, I just don't think this will ever end...

Anybody else having this problem or has any idea what is going on and why??????


Welcome to the forums...

A quick answer to your problem is in the Astropulse FAQ:

How long does an Astropulse workunit take to run?
The run times compared to SETI@home enhanced are long (sometimes a week or more), but you should receive the same number of credits per second for astropulse as for seti@home. credits/time should be in line with those using the default enhanced MB application.


The overclocked Q6600s are doing them in 40-80 hours, so your run time is not out of the ordinary. Let it crunch, and see what happens...


Well, I let it crunch... and crunch... and it kept on crunching... started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...
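
For the record, here is the arithmetic on those figures as a small Python sketch. The numbers are the ones quoted in this post; the function names are just illustrative. The point is that the "remaining" figure is not the first estimate minus elapsed time, because the client keeps revising its estimate of the total as the task runs.

```python
def naive_remaining(initial_estimate_hours, elapsed_hours):
    """What you expect if the first estimate never changed."""
    return initial_estimate_hours - elapsed_hours


def revised_total(elapsed_hours, remaining_shown_hours):
    """The total run time implied by what the client showed at that moment."""
    return elapsed_hours + remaining_shown_hours


def overrun_ratio(actual_hours, initial_estimate_hours):
    """How far the real CPU time ended up above the first estimate."""
    return actual_hours / initial_estimate_hours


print(naive_remaining(111.0, 94.5))  # the 16.5 hours expected
print(revised_total(94.5, 60.0))     # 154.5 hours implied by "60 hrs to go"
print(overrun_ratio(164.6, 111.0))   # roughly 1.48 times the first estimate
```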



What the ................
Is that really ..........
It can't be .............
no... NO... NO... NOOOOOO
aaaaaAAAAAARRRRGGGGHHHHHHHHH
ID: 797248


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.