Work Unit problem

[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 619583 - Posted: 15 Aug 2007, 14:51:52 UTC

Should I abort this work unit? Nearly two hours' work and only 0.05% done.
ID: 619583
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 619586 - Posted: 15 Aug 2007, 14:55:17 UTC


see here . . .

seriously . . . NOT bEin' a smart-alik eithEr . . .

ID: 619586
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 619587 - Posted: 15 Aug 2007, 14:55:39 UTC - in response to Message 619583.  

Should I abort this work unit? Nearly two hours' work and only 0.05% done.

No. Stop BOINC, and then start BOINC. Report what happens.
ID: 619587
Ebola Bob
Joined: 5 Jun 05
Posts: 23
Credit: 79,925
RAC: 0
United Kingdom
Message 619592 - Posted: 15 Aug 2007, 15:06:14 UTC - in response to Message 619586.  


see here . . .

seriously . . . NOT bEin' a smart-alik eithEr . . .



Dude, is your Shift key a bit temperamental?
ID: 619592
bounty.hunter
Volunteer tester
Joined: 22 Mar 04
Posts: 442
Credit: 459,063
RAC: 0
India
Message 619595 - Posted: 15 Aug 2007, 15:07:20 UTC

I've had a similar workunit which did get up to .05 or so after 2 hours and the time kept going up... Despite a number of exits and restarts I finally had to abort it.

The surprising thing was that I'm using Chicken 2.4 and I didn't expect it to get stuck.

Have never had this problem, either here or on Beta, that I can remember.

Will be interesting to see what the other PCs on it make of it. Probably a -9...
ID: 619595
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 619596 - Posted: 15 Aug 2007, 15:11:32 UTC - in response to Message 619592.  


see here . . .

seriously . . . NOT bEin' a smart-alik eithEr . . .



Dude, is your Shift key a bit temperamental?


nOpE - it's IntEntiOnal SIR! ;>


BOINC Wiki . . .

Science Status Page . . .
ID: 619596
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 619606 - Posted: 15 Aug 2007, 15:27:11 UTC - in response to Message 619595.  

I've had a similar workunit which did get up to .05 or so after 2 hours and the time kept going up... Despite a number of exits and restarts I finally had to abort it.

The surprising thing was that I'm using Chicken 2.4 and I didn't expect it to get stuck.

Have never had this problem, either here or on Beta, that I can remember.

Will be interesting to see what the other PCs on it make of it. Probably a -9...

Funnily enough, I'm currently working on result 590854298, which is doing the same thing.

I was running stock 5.27 at the time, and noticed it 'stuck' at 0.002% after 15 minutes or so. I assumed it was just a badly-behaved -9, as usual, and after advising madmac to restart BOINC, I followed my own advice. That didn't clear it, so I re-installed the opti app - 2.2B in this case, because it's on my Vista box - and I'm letting it run for a while to see what happens. Just clicked up to 0.009% after almost 20 minutes - that's slower than Astropulse, almost up to CPDN speed!

I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer.
ID: 619606
bounty.hunter
Volunteer tester
Joined: 22 Mar 04
Posts: 442
Credit: 459,063
RAC: 0
India
Message 619608 - Posted: 15 Aug 2007, 15:33:28 UTC - in response to Message 619606.  

... that's slower than Astropulse, almost up to CPDN speed!

I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer.


Ha! Astropulse did come to mind when I looked at the progress...Yes it might be a good idea to get some input from Beta and/or Joe...
ID: 619608
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 619610 - Posted: 15 Aug 2007, 15:47:53 UTC - in response to Message 619608.  
Last modified: 15 Aug 2007, 15:50:13 UTC

... that's slower than Astropulse, almost up to CPDN speed!

I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer.


Ha! Astropulse did come to mind when I looked at the progress...Yes it might be a good idea to get some input from Beta and/or Joe...

Posted and PM'd.

Now at 0.017% after 41 minutes - all other WUs running at normal speed, task manager shows 100% on all cores.
ID: 619610
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 619611 - Posted: 15 Aug 2007, 15:52:58 UTC - in response to Message 619606.  

I've had a similar workunit which did get up to .05 or so after 2 hours and the time kept going up... Despite a number of exits and restarts I finally had to abort it.

The surprising thing was that I'm using Chicken 2.4 and I didn't expect it to get stuck.

Have never had this problem, either here or on Beta, that I can remember.

Will be interesting to see what the other PCs on it make of it. Probably a -9...

Funnily enough, I'm currently working on result 590854298, which is doing the same thing.

I was running stock 5.27 at the time, and noticed it 'stuck' at 0.002% after 15 minutes or so. I assumed it was just a badly-behaved -9, as usual, and after advising madmac to restart BOINC, I followed my own advice. That didn't clear it, so I re-installed the opti app - 2.2B in this case, because it's on my Vista box - and I'm letting it run for a while to see what happens. Just clicked up to 0.009% after almost 20 minutes - that's slower than Astropulse, almost up to CPDN speed!

I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer.

There was something similar when setiathome_enhanced was new here; it turned out to be the splitter putting bad values in the workunit header. You might compare the header portion of a good result file to the header in the slow result. I'd also like to have a look, you could send me an archive of the WU and any related files: jsegur at westelcom dot com.
                                                             Joe
ID: 619611
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 619614 - Posted: 15 Aug 2007, 15:59:20 UTC - in response to Message 619587.  

Should I abort this work unit? Nearly two hours' work and only 0.05% done.

No. Stop BOINC, and then start BOINC. Report what happens.

I did, that's why I got 0.05%. I left it to have something to eat, and half an hour later it had uploaded, so it looks like the percentage got stuck.
ID: 619614
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 619617 - Posted: 15 Aug 2007, 16:00:31 UTC - in response to Message 619611.  

I'd also like to have a look, you could send me an archive of the WU and any related files: jsegur at westelcom dot com.
                                                             Joe

Done - you should have mail.
ID: 619617
Mr. Onderdijk, Bert
Joined: 9 Aug 99
Posts: 1
Credit: 444,483
RAC: 0
Netherlands
Message 619623 - Posted: 15 Aug 2007, 16:15:06 UTC - in response to Message 619617.  

Got the same problem. My PC takes two hours to reach 0.020%. Restarting BOINC makes it start from 0.000 again.
ID: 619623
sic
Joined: 16 Dec 06
Posts: 1
Credit: 140,480
RAC: 0
Hungary
Message 619625 - Posted: 15 Aug 2007, 16:17:28 UTC

Fine... here is another WU with the problem.
http://setiathome.berkeley.edu//result.php?resultid=590581942
The workunit finished just before I clicked to abort.

(Sorry, my English is limited.)
ID: 619625
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 619629 - Posted: 15 Aug 2007, 16:25:38 UTC - in response to Message 619617.  
Last modified: 15 Aug 2007, 16:32:03 UTC

I'd also like to have a look, you could send me an archive of the WU and any related files: jsegur at westelcom dot com.
                                                             Joe

Done - you should have mail.

Got it, and the problem is:

<triplet_thresh>-2.06835318</triplet_thresh>

Because all data points are positive, that puts the Triplet finding into a very slow exhaustive comparison. But since there are no points below threshold it can't possibly find a triplet, hence it goes on doing that on each successive array. It's simply a situation which wasn't expected when triplet finding was designed.
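
For anyone who wants to check queued workunits for this condition, here is a minimal Python sketch. It only assumes that the header text contains a `<triplet_thresh>` tag like the one quoted above; the function names `triplet_threshold` and `looks_slow` are made up for illustration and are not part of BOINC or the SETI@home application.

```python
import re

# Assumption: the threshold appears in the header text exactly as quoted
# in this thread, e.g. <triplet_thresh>-2.06835318</triplet_thresh>.
TRIPLET_RE = re.compile(
    r"<triplet_thresh>\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*</triplet_thresh>"
)


def triplet_threshold(header_text):
    """Return the triplet threshold as a float, or None if the tag is absent."""
    match = TRIPLET_RE.search(header_text)
    return float(match.group(1)) if match else None


def looks_slow(header_text):
    """True when the threshold is negative, the case described in this post."""
    thresh = triplet_threshold(header_text)
    return thresh is not None and thresh < 0


if __name__ == "__main__":
    sample = "<triplet_thresh>-2.06835318</triplet_thresh>"
    print(triplet_threshold(sample), looks_slow(sample))
```

A workunit flagged this way is a candidate for suspending rather than running. The sketch deliberately does not flag low positive thresholds, since those overflow quickly instead of crawling.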

I did finally get a WU from the 04mr07ab set. It also had a low triplet threshold but not negative, so it simply overflowed very quickly.

The pattern looks exactly like what was seen in May 2006, but Murphy probably will make the cause different.
                                                                 Joe


Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler.
ID: 619629
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 619631 - Posted: 15 Aug 2007, 16:28:56 UTC



Grid software is supposed to only farm work out to nodes that meet the proper software requirements . . .
ID: 619631
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 619634 - Posted: 15 Aug 2007, 16:35:07 UTC - in response to Message 619629.  

I'd also like to have a look, you could send me an archive of the WU and any related files: jsegur at westelcom dot com.
                                                             Joe

Done - you should have mail.

Got it, and the problem is:

<triplet_thresh>-2.06835318</triplet_thresh>

Because all data points are positive, that puts the Triplet finding into a very slow exhaustive comparison. But since there are no points below threshold it can't possibly find a triplet, hence it goes on doing that on each successive array. It's simply a situation which wasn't expected when triplet finding was designed.

I did finally get a WU from the 04mr07ab set. It also had a low triplet threshold but not negative, so it simply overflowed very quickly.

The pattern looks exactly like what was seen in May 2006, but Murphy probably will make the cause different.
                                                                 Joe

I have another from the same batch (04mr07ab.32128.5798.15.4.43) queued up next to run, and it also has <triplet_thresh>-2.06835318</triplet_thresh>. Your advice please? Is there anything to be gained from running these, or should we just abort and move on?

And can we rely on your good offices to alert whoever's manning the splitters in Eric's absence, to get them to put a sanity check on the threshold?
ID: 619634
bounty.hunter
Volunteer tester
Joined: 22 Mar 04
Posts: 442
Credit: 459,063
RAC: 0
India
Message 619637 - Posted: 15 Aug 2007, 16:37:31 UTC - in response to Message 619629.  

Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler.


Thanks Joe, that is a good suggestion.
I would have probably aborted any others too, if you hadn't mentioned this...Ha!


ID: 619637
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 619638 - Posted: 15 Aug 2007, 16:38:06 UTC - in response to Message 619629.  

Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler.

Good thinking. Suspended it is.
ID: 619638
Browny
Volunteer tester
Joined: 22 Sep 99
Posts: 33
Credit: 814,772
RAC: 0
Netherlands
Message 619644 - Posted: 15 Aug 2007, 16:53:45 UTC

Hi All,

I have the same problem over here; a restart of BOINC didn't help. In the Messages tab I can read the following, and that is pretty clear, but what can I do about it: let it go, or something else?

15-8-2007 18:46:35|SETI@home|Restarting task 04mr07ab.32128.6616.15.4.75_1 using setiathome_enhanced version 527
15-8-2007 18:47:12|SETI@home|app reporting negative CPU: -0.281250
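
A small Python sketch for pulling such lines out of a saved copy of the message log. It assumes the pipe-delimited `timestamp|project|message` layout of the two lines above and nothing more; the function name `negative_cpu_reports` is made up for illustration.

```python
MARKER = "app reporting negative CPU:"


def negative_cpu_reports(log_text):
    """Yield (timestamp, project, value) for each 'negative CPU' log line.

    Assumes each line looks like: timestamp|project|message
    """
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) == 3 and MARKER in parts[2]:
            # The number follows the marker text, e.g. "-0.281250".
            value = float(parts[2].split(MARKER, 1)[1])
            yield parts[0], parts[1], value


if __name__ == "__main__":
    log = (
        "15-8-2007 18:46:35|SETI@home|Restarting task "
        "04mr07ab.32128.6616.15.4.75_1 using setiathome_enhanced version 527\n"
        "15-8-2007 18:47:12|SETI@home|app reporting negative CPU: -0.281250\n"
    )
    for report in negative_cpu_reports(log):
        print(report)
```

Lines that do not match the assumed layout are skipped rather than raising an error, so the function can be run over a whole log.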

Regards, Browny
ID: 619644