Work Unit problem

Author	Message
[B^S] madmac Volunteer tester Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0	Message 619583 - Posted: 15 Aug 2007, 14:51:52 UTC
Should I abort this work unit? Nearly two hours' work and only 0.05% done.

Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0	Message 619586 - Posted: 15 Aug 2007, 14:55:17 UTC
see here . . . seriously . . . NOT bEin' a smart-alik eithEr . . .

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 619587 - Posted: 15 Aug 2007, 14:55:39 UTC - in response to Message 619583.
Quote: Should I abort this work unit? Nearly two hours' work and only 0.05% done.
No. Stop BOINC, and then start BOINC. Report what happens.

Ebola Bob Joined: 5 Jun 05 Posts: 23 Credit: 79,925 RAC: 0	Message 619592 - Posted: 15 Aug 2007, 15:06:14 UTC - in response to Message 619586.
Quote: see here . . . seriously . . . NOT bEin' a smart-alik eithEr . . .
Dude, is your Shift key a bit temperamental?

bounty.hunter Volunteer tester Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0	Message 619595 - Posted: 15 Aug 2007, 15:07:20 UTC
I've had a similar workunit which got up to 0.05 or so after 2 hours while the time kept going up... Despite a number of exits and restarts I finally had to abort it. The surprising thing was that I'm using Chicken 2.4 and didn't expect it to get stuck. I have never had this problem either here or on Beta, that I can remember. It will be interesting to see what the other PCs on it make of it. Probably a -9...

Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0	Message 619596 - Posted: 15 Aug 2007, 15:11:32 UTC - in response to Message 619592.
Quote: see here . . . seriously . . . NOT bEin' a smart-alik eithEr . . .
Quote: Dude, is your Shift key a bit temperamental?
nOpE - it's IntEntiOnal SIR! ;>

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 619606 - Posted: 15 Aug 2007, 15:27:11 UTC - in response to Message 619595.
Quote: I've had a similar workunit which got up to 0.05 or so after 2 hours while the time kept going up... Despite a number of exits and restarts I finally had to abort it. The surprising thing was that I'm using Chicken 2.4 and didn't expect it to get stuck. I have never had this problem either here or on Beta, that I can remember. It will be interesting to see what the other PCs on it make of it. Probably a -9...
Funnily enough, I'm currently working on result 590854298, which is doing the same thing. I was running stock 5.27 at the time, and noticed it 'stuck' at 0.002% after 15 minutes or so. I assumed it was just a badly-behaved -9, as usual, and after advising madmac to restart BOINC, I followed my own advice. That didn't clear it, so I re-installed the opti app - 2.2B in this case, because it's on my Vista box - and I'm letting it run for a while to see what happens.
Just clicked up to 0.009% after almost 20 minutes - that's slower than Astropulse, almost up to CPDN speed! I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer.

bounty.hunter Volunteer tester Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0	Message 619608 - Posted: 15 Aug 2007, 15:33:28 UTC - in response to Message 619606.
Quote: ... that's slower than Astropulse, almost up to CPDN speed! I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer.
Ha! Astropulse did come to mind when I looked at the progress... Yes, it might be a good idea to get some input from Beta and/or Joe...

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 619610 - Posted: 15 Aug 2007, 15:47:53 UTC - in response to Message 619608. Last modified: 15 Aug 2007, 15:50:13 UTC
Quote: ... that's slower than Astropulse, almost up to CPDN speed! I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer.
Quote: Ha! Astropulse did come to mind when I looked at the progress... Yes, it might be a good idea to get some input from Beta and/or Joe...
Posted and PM'd. Now at 0.017% after 41 minutes - all other WUs running at normal speed, task manager shows 100% on all cores.

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 619611 - Posted: 15 Aug 2007, 15:52:58 UTC - in response to Message 619606.
Quote: I've had a similar workunit which got up to 0.05 or so after 2 hours while the time kept going up... Despite a number of exits and restarts I finally had to abort it. The surprising thing was that I'm using Chicken 2.4 and didn't expect it to get stuck. I have never had this problem either here or on Beta, that I can remember. It will be interesting to see what the other PCs on it make of it. Probably a -9...
Quote: Funnily enough, I'm currently working on result 590854298, which is doing the same thing. I was running stock 5.27 at the time, and noticed it 'stuck' at 0.002% after 15 minutes or so. I assumed it was just a badly-behaved -9, as usual, and after advising madmac to restart BOINC, I followed my own advice. That didn't clear it, so I re-installed the opti app - 2.2B in this case, because it's on my Vista box - and I'm letting it run for a while to see what happens. Just clicked up to 0.009% after almost 20 minutes - that's slower than Astropulse, almost up to CPDN speed! I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer.
There was something similar when setiathome_enhanced was new here; it turned out to be the splitter putting bad values in the workunit header. You might compare the header portion of a good result file to the header in the slow result. I'd also like to have a look; you could send me an archive of the WU and any related files: jsegur at westelcom dot com.
Joe
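
The header comparison Joe suggests could be done by eye, but a small script makes differences easier to spot. The following is only a rough sketch in Python, not a project tool: it assumes the header is made of simple one-line `<tag>value</tag>` elements like the `<triplet_thresh>` element quoted later in this thread, and the function names `header_fields` and `diff_headers` are hypothetical.

```python
import re

# Assumption: header fields look like <tag>value</tag> on one line,
# as with the <triplet_thresh> element quoted in this thread.
FIELD_RE = re.compile(r"<(\w+)>([^<>]*)</\1>")


def header_fields(header_text):
    """Return {tag: value} for every simple <tag>value</tag> element."""
    return {tag: value.strip() for tag, value in FIELD_RE.findall(header_text)}


def diff_headers(good_text, slow_text):
    """Return {tag: (good_value, slow_value)} for fields that differ.

    A field missing from one header is reported with None on that side.
    """
    good = header_fields(good_text)
    slow = header_fields(slow_text)
    diffs = {}
    for tag in sorted(set(good) | set(slow)):
        if good.get(tag) != slow.get(tag):
            diffs[tag] = (good.get(tag), slow.get(tag))
    return diffs
```

To use it, read the header portion of a normal workunit and of the slow one into two strings and pass them to `diff_headers`; any field whose value differs, or that is present in only one header, shows up in the returned dictionary.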

[B^S] madmac Volunteer tester Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0	Message 619614 - Posted: 15 Aug 2007, 15:59:20 UTC - in response to Message 619587.
Quote: Should I abort this work unit? Nearly two hours' work and only 0.05% done.
Quote: No. Stop BOINC, and then start BOINC. Report what happens.
I did, that's why I got 0.05%. I left it to have something to eat, and half an hour later it had uploaded, so it looks like the percentage got stuck.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 619617 - Posted: 15 Aug 2007, 16:00:31 UTC - in response to Message 619611.
Quote: I'd also like to have a look; you could send me an archive of the WU and any related files: jsegur at westelcom dot com. Joe
Done - you should have mail.

Mr. Onderdijk, Bert Joined: 9 Aug 99 Posts: 1 Credit: 444,483 RAC: 0	Message 619623 - Posted: 15 Aug 2007, 16:15:06 UTC - in response to Message 619617.
Got the same problem. My PC takes two hours to reach 0.020%. Restarting BOINC just makes it start from 0.000 again.

sic Joined: 16 Dec 06 Posts: 1 Credit: 140,480 RAC: 0	Message 619625 - Posted: 15 Aug 2007, 16:17:28 UTC
Fine... here is another WU with the problem. http://setiathome.berkeley.edu//result.php?resultid=590581942 The workunit finished just before I clicked to abort. (Sorry, my English is limited.)

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 619629 - Posted: 15 Aug 2007, 16:25:38 UTC - in response to Message 619617. Last modified: 15 Aug 2007, 16:32:03 UTC
Quote: I'd also like to have a look; you could send me an archive of the WU and any related files: jsegur at westelcom dot com. Joe
Quote: Done - you should have mail.
Got it, and the problem is:
<triplet_thresh>-2.06835318</triplet_thresh>
Because all data points are positive, that puts the Triplet finding into a very slow exhaustive comparison. But since there are no points below threshold it can't possibly find a triplet, hence it goes on doing that on each successive array. It's simply a situation which wasn't expected when triplet finding was designed.
I did finally get a WU from the 04mr07ab set. It also had a low triplet threshold but not negative, so it simply overflowed very quickly. The pattern looks exactly like what was seen in May 2006, but Murphy probably will make the cause different.
Joe
Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler.

Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0	Message 619631 - Posted: 15 Aug 2007, 16:28:56 UTC
Grid software is supposed to only farm work out to nodes that meet the proper software requirements . . .

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 619634 - Posted: 15 Aug 2007, 16:35:07 UTC - in response to Message 619629.
Quote: I'd also like to have a look; you could send me an archive of the WU and any related files: jsegur at westelcom dot com. Joe
Quote: Done - you should have mail.
Quote: Got it, and the problem is: <triplet_thresh>-2.06835318</triplet_thresh> Because all data points are positive, that puts the Triplet finding into a very slow exhaustive comparison. But since there are no points below threshold it can't possibly find a triplet, hence it goes on doing that on each successive array. It's simply a situation which wasn't expected when triplet finding was designed. I did finally get a WU from the 04mr07ab set. It also had a low triplet threshold but not negative, so it simply overflowed very quickly. The pattern looks exactly like what was seen in May 2006, but Murphy probably will make the cause different. Joe
I have another from the same batch (04mr07ab.32128.5798.15.4.43) queued up next to run, and it also has <triplet_thresh>-2.06835318</triplet_thresh>.
Your advice please? Is there anything to be gained from running these, or should we just abort and move on? And can we rely on your good offices to alert whoever's manning the splitters in Eric's absence, to get them to put a sanity check on the threshold?
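
The kind of sanity check asked for here might look something like the Python sketch below. It is not the project's splitter code. It assumes the threshold appears in the header as a `<triplet_thresh>` element (as quoted above) and that any missing, unparsable, zero or negative value should be rejected. That rule is an assumption for illustration, as are the function names `triplet_threshold` and `threshold_is_sane`.

```python
import re

# Assumption: the threshold is stored as <triplet_thresh>value</triplet_thresh>.
TRIPLET_RE = re.compile(r"<triplet_thresh>\s*([^<\s]+)\s*</triplet_thresh>")


def triplet_threshold(header_text):
    """Return the triplet threshold as a float, or None if absent or unparsable."""
    match = TRIPLET_RE.search(header_text)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except ValueError:
        return None


def threshold_is_sane(header_text):
    """Hypothetical rule: accept only a threshold that is present and positive."""
    value = triplet_threshold(header_text)
    return value is not None and value > 0.0
```

With the value reported in this thread, `threshold_is_sane("<triplet_thresh>-2.06835318</triplet_thresh>")` returns `False`. Note that a simple positivity test would not catch the "low but not negative" thresholds Joe mentions, which overflowed quickly; a real check would presumably need a lower bound chosen by the project.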

bounty.hunter Volunteer tester Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0	Message 619637 - Posted: 15 Aug 2007, 16:37:31 UTC - in response to Message 619629.
Quote: Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler.
Thanks Joe, that is a good suggestion. I would probably have aborted any others too, if you hadn't mentioned this... Ha!

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14661 Credit: 200,643,578 RAC: 874	Message 619638 - Posted: 15 Aug 2007, 16:38:06 UTC - in response to Message 619629.
Quote: Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler.
Good thinking. Suspended it is.

Browny Volunteer tester Joined: 22 Sep 99 Posts: 33 Credit: 814,772 RAC: 0	Message 619644 - Posted: 15 Aug 2007, 16:53:45 UTC
Hi All, I have the same problem over here; a restart of BOINC didn't help. In the Messages tab I can read the following, and that is pretty clear, but what can I do about it: let it go, or something else?
15-8-2007 18:46:35|SETI@home|Restarting task 04mr07ab.32128.6616.15.4.75_1 using setiathome_enhanced version 527
15-8-2007 18:47:12|SETI@home|app reporting negative CPU: -0.281250
Regards, Browny

