AstroPulse errors - Reporting

W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 794596 - Posted: 8 Aug 2008, 13:32:35 UTC - in response to Message 794564.  

I noticed that Astropulse was downloaded and started automatically at the beginning of the week with one work unit that should be taking approx 111hrs.

However, after a few days we have CPU time of 94.5 hrs but still 60 hrs to go!!!! Now I'm no rainman but my math tells me something just don't add up here... 111 - 94.5 should be around... say... oh... let me see now... 16.5... which is way, way different to the 60 hrs still to go.

At this rate not only will I never make the report deadline and therefore not get any credit, I just don't think this will ever end...

Anybody else having this problem or has any idea what is going on and why??????
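
One way to read the numbers in the post above, as a minimal sketch. It assumes (the thread does not say this) that the "to go" figure is extrapolated from the elapsed CPU time and the task's reported fraction done, rather than subtracted from the original 111-hour estimate. Under that assumption, 94.5 hours elapsed with 60 hours remaining corresponds to a task that is roughly 61% complete. The function names are hypothetical.

```python
def implied_fraction_done(elapsed_hours: float, remaining_hours: float) -> float:
    """Fraction done implied by a pure extrapolation.

    Assumption: remaining = elapsed * (1 - f) / f, solved here for f.
    """
    return elapsed_hours / (elapsed_hours + remaining_hours)


def extrapolated_remaining(elapsed_hours: float, fraction_done: float) -> float:
    """Remaining time if the task keeps progressing at its average rate so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours * (1.0 - fraction_done) / fraction_done


# Figures taken from the post: 111 h initial estimate, 94.5 h elapsed, 60 h to go.
naive_remaining = 111.0 - 94.5
fraction = implied_fraction_done(94.5, 60.0)

print(f"initial estimate minus elapsed: {naive_remaining:.1f} h")
print(f"fraction done implied by 60 h to go: {fraction:.3f}")
print(f"extrapolated remaining: {extrapolated_remaining(94.5, fraction):.1f} h")
```

The two estimates disagree whenever the initial 111-hour guess was too low for the host, which would be consistent with the slow P4 HT results mentioned in the reply below.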

This post on the Beta site, by Wendy (web03) should give you some idea of AP on P4 HT 2.8.
AP 4.35 - Taking forever
ID: 794596
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 794710 - Posted: 8 Aug 2008, 20:06:43 UTC - in response to Message 794558.  

Well, I've got to disable AstroPulse on my main box. So far, every AstroPulse WU has failed out after 2-8 hours of work. (See 946111958, 945705757, 943747957 for examples.) ...

I've seen that you use WinXP x64 on your host. Is the 32-bit environment on that host OK, as AstroPulse is a 32-bit application? Do you have a way to check your host's 32-bit capabilities?

Lowering the OC is a good idea in such cases, but if the errors still happen afterwards, the OC is not the cause. You could always run memtest86+ for a few hours if you think the RAM is the problem.

Sorry if I'm guessing in the wrong direction; it's just a start.
_\|/_
U r s
ID: 794710
Mumps [MM]
Volunteer tester
Joined: 11 Feb 08
Posts: 4454
Credit: 100,893,853
RAC: 30
United States
Message 794747 - Posted: 8 Aug 2008, 22:00:25 UTC - in response to Message 794710.  
Last modified: 8 Aug 2008, 22:02:00 UTC

Well, I've got to disable AstroPulse on my main box. So far, every AstroPulse WU has failed out after 2-8 hours of work. (See 946111958, 945705757, 943747957 for examples.) ...

I've seen that you use WinXP x64 on your host. Is the 32-bit environment on that host OK, as AstroPulse is a 32-bit application? Do you have a way to check your host's 32-bit capabilities?

Hmm. Good thought. I'll have to look into that. I just built this machine end of May, so it's pretty much been crunching as long as it's existed. Brand new fresh O/S install at that time, but you never can be too sure about corruption sneaking in. That's why I'm asking for ideas. ;-)

Lowering the OC is a good idea in such cases, but if the errors still happen afterwards, the OC is not the cause. You could always run memtest86+ for a few hours if you think the RAM is the problem.

Yeah, when I first built it, and when I got it to the overclock it's been running at, it was happy with the testing I did then... But, it's going to test again this weekend if I see another bad AP WU at the slightly lowered OC.

Sorry if I'm guessing in the wrong direction; it's just a start.

No need to worry. I'm looking for suggestions and you've made a couple for me to investigate. Thanks Urs.
ID: 794747
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 794949 - Posted: 9 Aug 2008, 6:38:07 UTC
Last modified: 9 Aug 2008, 6:59:09 UTC

Here's my contribution to the errors list. This happened on all four AP units I've received so far.

8/8/2008 11:30:01 PM|SETI@home|Starting ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0
8/8/2008 11:30:01 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:02 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:02 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:03 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:04 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:04 PM|SETI@home|Computation for task ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0 finished
8/8/2008 11:30:04 PM|SETI@home|Output file ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0_0 for task ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0 absent

This is an exit code 185 as discussed by Richard earlier, but all three of the files are installed.
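
For anyone collecting similar reports, the pipe-separated message lines above can be tallied mechanically. This is a minimal sketch, assuming every line has the form `timestamp|project|message` as in the excerpt; the function names are hypothetical and other BOINC versions may format their messages differently.

```python
from collections import Counter

# Message lines copied from the post above.
SAMPLE_LOG = """\
8/8/2008 11:30:01 PM|SETI@home|Starting ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0
8/8/2008 11:30:01 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:02 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:02 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:03 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:04 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:04 PM|SETI@home|Computation for task ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0 finished
8/8/2008 11:30:04 PM|SETI@home|Output file ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0_0 for task ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0 absent
"""


def parse_line(line: str):
    """Split one message line into (timestamp, project, message)."""
    timestamp, project, message = line.split("|", 2)
    return timestamp.strip(), project.strip(), message.strip()


def count_errors(log_text: str) -> Counter:
    """Count messages that start with the '[error]' tag, grouped by message text."""
    counts = Counter()
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue  # skip anything that is not a three-field message line
        _, _, message = parse_line(line)
        if message.startswith("[error]"):
            counts[message] += 1
    return counts


for message, n in count_errors(SAMPLE_LOG).items():
    print(f"{n} x {message}")
```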

/edit/

The main app shows a size of 280kb, rather than 452 as shown on the file list. I downloaded it three times just to make sure and each time, it's 280kb. The other two file sizes match up correctly.
ID: 794949
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 794950 - Posted: 9 Aug 2008, 6:43:46 UTC
Last modified: 9 Aug 2008, 6:47:04 UTC

Unable to run standalone tests so far with in.dat & fftw dll present, and with or without init_data.xml. E8400 (Wolfdale) on XP w/SP3. Error is 'C++ Out of Memory exception' on Stock, custom and experimental PGO optimised builds.

Awaiting Boinc 'live' runs [with stock] to see if the same happens.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 794950
Dave
Volunteer tester
Joined: 3 Aug 08
Posts: 16
Credit: 622,564
RAC: 0
United States
Message 794956 - Posted: 9 Aug 2008, 7:04:46 UTC - in response to Message 794564.  

I crunched my first Astropulse this week which gave an initial estimate of 94 hours. As the week went on the hours to complete kept estimating more and more hours. At about 60 completed hours the estimate put it at about 140 hours. When I checked an hour later the project was complete for a massive credit. Still waiting on a final credit though. If yours works like mine, it may be done sooner than you think. I am running Vista on a 2007 HP laptop with an AMD Turion 64 X2 processor.

I noticed that Astropulse was downloaded and started automatically at the beginning of the week with one work unit that should be taking approx 111hrs.

However, after a few days we have CPU time of 94.5 hrs but still 60 hrs to go!!!! Now I'm no rainman but my math tells me something just don't add up here... 111 - 94.5 should be around... say... oh... let me see now... 16.5... which is way, way different to the 60 hrs still to go.

At this rate not only will I never make the report deadline and therefore not get any credit, I just don't think this will ever end...

Anybody else having this problem or has any idea what is going on and why??????

ID: 794956
chassell
Joined: 23 Jul 00
Posts: 3
Credit: 3,285,099
RAC: 0
United States
Message 794967 - Posted: 9 Aug 2008, 7:33:18 UTC - in response to Message 794949.  

Here's my contribution to the errors list. This happened on all four AP units I've received so far.

8/8/2008 11:30:01 PM|SETI@home|Starting ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0
8/8/2008 11:30:01 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:02 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:02 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:03 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:04 PM|SETI@home|[error] Process creation failed:
8/8/2008 11:30:04 PM|SETI@home|Computation for task ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0 finished
8/8/2008 11:30:04 PM|SETI@home|Output file ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0_0 for task ap_01mr08ag_B0_P1_00010_20080808_07152.wu_0 absent

This is an exit code 185 as discussed by Richard earlier, but all three of the files are installed.

/edit/

The main app shows a size of 280kb, rather than 452 as shown on the file list. I downloaded it three times just to make sure and each time, it's 280kb. The other two file sizes match up correctly.


The file size is the key to your problem. I had the same issue when I compared the file sizes to those Richard listed. I actually found the files with the correct size somewhere in the Astropulse boards. I am now crunching my first Astropulse WU.
ID: 794967
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 794988 - Posted: 9 Aug 2008, 8:55:39 UTC - in response to Message 794967.  

The file size is the key to your problem. I had the same issue when I compared the file sizes to those Richard listed. I actually found the files with the correct size somewhere in the Astropulse boards. I am now crunching my first Astropulse WU.


I found the files here. http://setiathome.berkeley.edu/forum_thread.php?id=48327

I looked at the file's size in my folder (luckily before the AP WU started), and it showed up as only 342KB. It also took a couple of download attempts before the file went from 342KB to the correct size, but it's at 452KB now.
ID: 794988
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 794990 - Posted: 9 Aug 2008, 8:59:20 UTC - in response to Message 794988.  

The file size is the key to your problem. I had the same issue when I compared the file sizes to those Richard listed. I actually found the files with the correct size somewhere in the Astropulse boards. I am now crunching my first Astropulse WU.


I found the files here. http://setiathome.berkeley.edu/forum_thread.php?id=48327

I looked at the file's size in my folder (luckily before the AP WU started), and it showed up as only 342KB. It also took a couple of download attempts before the file went from 342KB to the correct size, but it's at 452KB now.



That was the problem. Amazed that I didn't catch it sooner. Interestingly, it took four tries to get a "full" download of the file. All is well in Astroland now.....
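
The size comparison described in these posts can be scripted so that a truncated download is caught before a work unit starts. This is a minimal sketch: the 452 KB figure comes from the thread, while the function names, the 1 KB tolerance, and the assumption that 1 KB means 1024 bytes are illustrative choices. The demonstration uses throwaway temporary files, not the real application file.

```python
import os
import tempfile


def size_in_kb(path: str) -> float:
    """File size in KB, assuming 1 KB = 1024 bytes."""
    return os.path.getsize(path) / 1024


def looks_complete(path: str, expected_kb: float, tolerance_kb: float = 1.0) -> bool:
    """True if the file size is within tolerance of the expected size."""
    return abs(size_in_kb(path) - expected_kb) <= tolerance_kb


# Demonstration with dummy files standing in for a full and a truncated download.
with tempfile.TemporaryDirectory() as folder:
    full = os.path.join(folder, "full_download.bin")
    partial = os.path.join(folder, "partial_download.bin")
    with open(full, "wb") as f:
        f.write(b"\0" * (452 * 1024))
    with open(partial, "wb") as f:
        f.write(b"\0" * (280 * 1024))

    for path in (full, partial):
        status = "ok" if looks_complete(path, expected_kb=452) else "incomplete, download again"
        print(f"{os.path.basename(path)}: {size_in_kb(path):.0f} KB -> {status}")
```

A size check only catches truncation. A file of the right size with damaged contents would still pass.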
ID: 794990
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 794992 - Posted: 9 Aug 2008, 9:00:24 UTC - in response to Message 794950.  
Last modified: 9 Aug 2008, 9:01:01 UTC

Unable to run standalone tests so far with in.dat & fftw dll present, and with or without init_data.xml. E8400 (Wolfdale) on XP w/SP3. Error is 'C++ Out of Memory exception' on Stock, custom and experimental PGO optimised builds.

Awaiting Boinc 'live' runs [with stock] to see if the same happens.


Update: Stock AP build 4.35 will not run standalone, Custom ICC PGO optimised build will with some appropriate pressure applied to the right areas.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 794992
dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 795189 - Posted: 9 Aug 2008, 18:02:11 UTC - in response to Message 794956.  

I crunched my first Astropulse this week which gave an initial estimate of 94 hours. As the week went on the hours to complete kept estimating more and more hours. At about 60 completed hours the estimate put it at about 140 hours. When I checked an hour later the project was complete for a massive credit.



That's because you hit the over-flow limit on the WU:
Found 30 single pulses and 30 repeating pulses, exiting.
called boinc_finish

If it hadn't over-flowed, it probably would have taken the full amount of time.

-Dave
ID: 795189
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 795199 - Posted: 9 Aug 2008, 18:24:39 UTC

Msg from Eric, on beta site.
Josh is aware of this problem and is working on it.

They have all finished early with:
Found 30 single pulses and 30 repeating pulses, exiting.
called boinc_finish
But there is no error called, i.e. the exit status is 0 (0x0)

And therefore on those with more than one task returned there appears to be no units Validated.
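
Because the exit status is 0, an overflowed task can only be told apart from a normal finish by its output text. This is a minimal sketch that searches a task's stderr text for the message quoted above. The wording of the pattern is taken from these posts, and whether other AstroPulse versions print exactly the same line is an assumption; the function name is hypothetical.

```python
import re

# Pattern built from the message quoted in the thread.
OVERFLOW_RE = re.compile(
    r"Found (\d+) single pulses and (\d+) repeating pulses, exiting\."
)


def find_overflow(stderr_text: str):
    """Return (single_pulses, repeating_pulses) if the overflow message is present, else None."""
    match = OVERFLOW_RE.search(stderr_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "Found 30 single pulses and 30 repeating pulses, exiting.\n"
    "called boinc_finish\n"
)
print(find_overflow(sample))
print(find_overflow("called boinc_finish\n"))
```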

ID: 795199
Dave
Volunteer tester
Joined: 3 Aug 08
Posts: 16
Credit: 622,564
RAC: 0
United States
Message 795417 - Posted: 10 Aug 2008, 2:06:25 UTC - in response to Message 795189.  

I crunched my first Astropulse this week which gave an initial estimate of 94 hours. As the week went on the hours to complete kept estimating more and more hours. At about 60 completed hours the estimate put it at about 140 hours. When I checked an hour later the project was complete for a massive credit.



That's because you hit the over-flow limit on the WU:
Found 30 single pulses and 30 repeating pulses, exiting.
called boinc_finish

If it hadn't over-flowed, it probably would have taken the full amount of time.

-Dave


Is this over-flow a problem with my machine or an Astropulse issue?
I have dual processors only running at 50% to control overheating and have not OC'd.

Dave
ID: 795417
dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 795437 - Posted: 10 Aug 2008, 2:46:13 UTC - in response to Message 795417.  



Is this over-flow a problem with my machine or an Astropulse issue?
I have dual processors only running at 50% to control overheating and have not OC'd.

Dave


If Astropulse is like Multibeam, which it probably is to some extent, then it's a normal thing and some small percentage of WUs will just be too noisy. I think I remember something in the 1 - 2 percent range for Multibeam, though my mind could just be making that memory up...

-Dave
ID: 795437
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 795449 - Posted: 10 Aug 2008, 3:18:55 UTC - in response to Message 795417.  

Dave

This is interesting; yes, AP is processor intensive.

Josh has the word on the overflow condition for pulses. As near as I can figure it may be a splitter issue.


I crunched my first Astropulse this week which gave an initial estimate of 94 hours. As the week went on the hours to complete kept estimating more and more hours. At about 60 completed hours the estimate put it at about 140 hours. When I checked an hour later the project was complete for a massive credit.



That's because you hit the over-flow limit on the WU:
Found 30 single pulses and 30 repeating pulses, exiting.
called boinc_finish

If it hadn't over-flowed, it probably would have taken the full amount of time.

-Dave


Is this over-flow a problem with my machine or an Astropulse issue?
I have dual processors only running at 50% to control overheating and have not OC'd.

Dave


Please consider a Donation to the Seti Project.

ID: 795449
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 795592 - Posted: 10 Aug 2008, 9:15:52 UTC - in response to Message 795417.  

Is this over-flow a problem with my machine or an Astropulse issue?
I have dual processors only running at 50% to control overheating and have not OC'd.

Dave

So, to answer the question, it is an Astropulse issue - I wouldn't even call it a problem, as it's part of the basic design of Astropulse. It's just being called into use more often than we might have expected, that's all. As Pappa says, a bit of fine-tuning of the work supply is all that's needed.

Not a problem with your computer at all.
ID: 795592
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 795610 - Posted: 10 Aug 2008, 10:37:45 UTC - in response to Message 794992.  

Unable to run standalone tests so far with in.dat & fftw dll present, and with or without init_data.xml. E8400 (Wolfdale) on XP w/SP3. Error is 'C++ Out of Memory exception' on Stock, custom and experimental PGO optimised builds.

Awaiting Boinc 'live' runs [with stock] to see if the same happens.


Update: Stock AP build 4.35 will not run standalone, Custom ICC PGO optimised build will with some appropriate pressure applied to the right areas.


Hopefully have finally figured this one out :D, it seems the client is sensitive to data corruption of the WUs during transmission. Hopefully *shouldn't* be an issue with normal Boinc driven runs, but for future reference, at this time the symptoms seem to include horrible violent death of the client, claiming the machine is out of memory and a crash dump (instead of a "Bad Workunit" message or some such).
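
A common way to catch a file damaged in transit is to compare a checksum before handing the file to the client. This is a minimal sketch using MD5 from the Python standard library. The expected digest would have to come from a trusted copy of the workunit, and nothing in this thread says that AstroPulse or the standalone test setup performs such a check; the function names are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex MD5 digest of a file, read in chunks so large workunits fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """True if the file's digest equals the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For a standalone test, `matches_expected("in.dat", known_digest)` would be called before starting the client, with `known_digest` taken from a copy known to be good.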

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 795610
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 795624 - Posted: 10 Aug 2008, 11:17:41 UTC - in response to Message 795610.  
Last modified: 10 Aug 2008, 11:19:44 UTC

Unable to run standalone tests so far with in.dat & fftw dll present, and with or without init_data.xml. E8400 (Wolfdale) on XP w/SP3. Error is 'C++ Out of Memory exception' on Stock, custom and experimental PGO optimised builds.

Awaiting Boinc 'live' runs [with stock] to see if the same happens.


Update: Stock AP build 4.35 will not run standalone, Custom ICC PGO optimised build will with some appropriate pressure applied to the right areas.


Hopefully have finally figured this one out :D, it seems the client is sensitive to data corruption of the WUs during transmission. Hopefully *shouldn't* be an issue with normal Boinc driven runs, but for future reference, at this time the symptoms seem to include horrible violent death of the client, claiming the machine is out of memory and a crash dump (instead of a "Bad Workunit" message or some such).

Maybe but I did discover a whole batch, as reported on beta for quicker response, that all give the overflow message. It was to this message that Eric replied that Josh was on the case. Since then others have been reported, but I do not know if they are part of batch(s).
If the overflow msg affects batches, it would indicate a common problem, RFI, faulty splitter, or if a transmission problem then it would have to be in equipment at the Berkeley end before connection to the web.

[edit] That original was a block of 104 AP units. [/edit]
ID: 795624
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 795628 - Posted: 10 Aug 2008, 11:33:35 UTC - in response to Message 795624.  

Maybe but I did discover a whole batch, as reported on beta for quicker response, that all give the overflow message. It was to this message that Eric replied that Josh was on the case. Since then others have been reported, but I do not know if they are part of batch(s).
If the overflow msg affects batches, it would indicate a common problem, RFI, faulty splitter, or if a transmission problem then it would have to be in equipment at the Berkeley end before connection to the web.

[edit] That original was a block of 104 AP units. [/edit]


Yep, unrelated issues as far as I can tell, as Boinc should be handling transmission integrity I would have thought. No misunderstanding there, just a potential pitfall in certain special situations that most people aren't likely to come across (except developers).

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 795628
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 795753 - Posted: 10 Aug 2008, 16:11:11 UTC - in response to Message 794992.  
Last modified: 10 Aug 2008, 16:12:13 UTC

Jason

I am not sure if you figured out how to get AP to run in the standalone mode.
At one point we did some standalone tests and I created a basic set of instructions. This was amplified on a web page that I need to get Josh to resurrect.

In your test folder
Rename your workunit to in.dat 
For Standalone Testing he added information about two switches for "piping" more information to the stderr.txt file

When you start the ap_client.exe you can start it with:

ap_client -debug_msg
ap_client -debug_loop_msg
ap_client -debug_msg -debug_loop_msg


The basic standalone AP also has the debug switches, which may still be fully enabled.

I just tested with the stock AP 4.35 and it is running with the debug switches
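
The standalone steps above can be sketched as a small helper. The file name `in.dat`, the executable name `ap_client`, and the two switches are taken from this post; the function names and folder layout are hypothetical. The sketch copies the workunit instead of renaming it, so the original file is left untouched.

```python
import shutil
from pathlib import Path


def prepare_test_folder(workunit: Path, test_folder: Path) -> Path:
    """Place a copy of the workunit in the test folder under the name in.dat."""
    test_folder.mkdir(parents=True, exist_ok=True)
    target = test_folder / "in.dat"
    shutil.copyfile(workunit, target)
    return target


def build_command(executable: str = "ap_client",
                  debug_msg: bool = False,
                  debug_loop_msg: bool = False) -> list:
    """Build the argument list for one of the invocations listed in the post."""
    command = [executable]
    if debug_msg:
        command.append("-debug_msg")
    if debug_loop_msg:
        command.append("-debug_loop_msg")
    return command


# The three invocations from the post, as argument lists.
print(build_command(debug_msg=True))
print(build_command(debug_loop_msg=True))
print(build_command(debug_msg=True, debug_loop_msg=True))
```

The resulting list could be passed to `subprocess.run(command, cwd=test_folder)` on a machine where the client executable and its DLL sit in the test folder.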



Unable to run standalone tests so far with in.dat & fftw dll present, and with or without init_data.xml. E8400 (Wolfdale) on XP w/SP3. Error is 'C++ Out of Memory exception' on Stock, custom and experimental PGO optimised builds.

Awaiting Boinc 'live' runs [with stock] to see if the same happens.


Update: Stock AP build 4.35 will not run standalone, Custom ICC PGO optimised build will with some appropriate pressure applied to the right areas.

Please consider a Donation to the Seti Project.

ID: 795753

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.