Work Unit problem
Author | Message |
---|---|
[B^S] madmac Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Should I abort this work unit nearly two hours work and only 0.05% done." No. Stop BOINC, and then start BOINC. Report what happens. |
Ebola Bob Joined: 5 Jun 05 Posts: 23 Credit: 79,925 RAC: 0 |
Dude, is your Shift key a bit temperamental? |
bounty.hunter Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0 |
I've had a similar workunit which did get up to .05 or so after 2 hours and the time kept going up... Despite a number of exits and restarts I finally had to abort it. The surprising thing was that I'm using Chicken 2.4 and I didn't expect it to get stuck. I have never had this problem, either here or on Beta, that I can remember. It will be interesting to see what the other PCs on it make of it. Probably a -9... |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
nOpE - it's IntEntiOnal SIR! ;> |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"I've had a similar workunit which did get up to .05 or so after 2 hours and the time kept going up...Despite a number of exits and restarts I finally had to abort it." Funnily enough, I'm currently working on result 590854298, which is doing the same thing. I was running stock 5.27 at the time, and noticed it 'stuck' at 0.002% after 15 minutes or so. I assumed it was just a badly-behaved -9, as usual, and after advising madmac to restart BOINC, I followed my own advice. That didn't clear it, so I re-installed the opti app - 2.2B in this case, because it's on my Vista box - and I'm letting it run for a while to see what happens. Just clicked up to 0.009% after almost 20 minutes - that's slower than Astropulse, almost up to CPDN speed! I think I'll cross-refer this thread to Beta, and see if we can rustle up a programmer. |
bounty.hunter Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0 |
"... that's slower than Astropulse, almost up to CPDN speed!" Ha! Astropulse did come to mind when I looked at the progress... Yes, it might be a good idea to get some input from Beta and/or Joe... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"... that's slower than Astropulse, almost up to CPDN speed!" Posted and PM'd. Now at 0.017% after 41 minutes - all other WUs running at normal speed, task manager shows 100% on all cores. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"I've had a similar workunit which did get up to .05 or so after 2 hours and the time kept going up...Despite a number of exits and restarts I finally had to abort it." There was something similar when setiathome_enhanced was new here; it turned out to be the splitter putting bad values in the workunit header. You might compare the header portion of a good result file to the header in the slow result. I'd also like to have a look; you could send me an archive of the WU and any related files: jsegur at westelcom dot com. Joe |
[B^S] madmac Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"I'd also like to have a look, you could send me an archive of the WU and any related files: jsegur at westelcom dot com. Joe" Done - you should have mail. |
Mr. Onderdijk, Bert Joined: 9 Aug 99 Posts: 1 Credit: 444,483 RAC: 0 |
Got the same problem. My PC takes two hours to reach 0.020%. Restarting BOINC just makes it start from 0.000 again. |
sic Joined: 16 Dec 06 Posts: 1 Credit: 140,480 RAC: 0 |
Fine... here is another WU with the problem. http://setiathome.berkeley.edu//result.php?resultid=590581942 The workunit finished just before I clicked abort. (Sorry, my English is limited.) |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"I'd also like to have a look, you could send me an archive of the WU and any related files: jsegur at westelcom dot com. Joe" Got it, and the problem is: `<triplet_thresh>-2.06835318</triplet_thresh>`. Because all data points are positive, that puts the Triplet finding into a very slow exhaustive comparison. But since there are no points below threshold it can't possibly find a triplet, hence it goes on doing that on each successive array. It's simply a situation which wasn't expected when triplet finding was designed. I did finally get a WU from the 04mr07ab set. It also had a low triplet threshold but not negative, so it simply overflowed very quickly. The pattern looks exactly like what was seen in May 2006, but Murphy probably will make the cause different. Joe Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"I'd also like to have a look, you could send me an archive of the WU and any related files: jsegur at westelcom dot com. Joe" I have another from the same batch (04mr07ab.32128.5798.15.4.43) queued up next to run, and it also has `<triplet_thresh>-2.06835318</triplet_thresh>`. Your advice please? Is there anything to be gained from running these, or should we just abort and move on? And can we rely on your good offices to alert whoever's manning the splitters in Eric's absence, to get them to put a sanity check on the threshold? |
bounty.hunter Joined: 22 Mar 04 Posts: 442 Credit: 459,063 RAC: 0 |
"Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler." Thanks Joe, that is a good suggestion. I would probably have aborted any others too, if you hadn't mentioned this... Ha! |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Edit: Aborting these WUs would simply impose an additional load on the servers as they reissue the WUs to others. I suggest anyone who has one Suspend it until the project has a chance to react. They could cancel the WUs to keep reissues from happening and that should give project aborts of the WUs to each host as it contacts the Scheduler." Good thinking. Suspended it is. |
Browny Joined: 22 Sep 99 Posts: 33 Credit: 814,772 RAC: 0 |
Hi All, I have the same problem over here; a restart of BOINC didn't help. In the Messages tab I can read the following, which is pretty clear, but what can I do about it: let it run, or something else? 15-8-2007 18:46:35|SETI@home|Restarting task 04mr07ab.32128.6616.15.4.75_1 using setiathome_enhanced version 527 15-8-2007 18:47:12|SETI@home|app reporting negative CPU: -0.281250 Regards, Browny |
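
The negative `<triplet_thresh>` value Joe identified is something a volunteer could look for before letting a task run. The following is a minimal Python sketch, not project code: it assumes the workunit header is plain text containing a `<triplet_thresh>` element in the form quoted in the thread, and the names `triplet_thresh` and `is_suspect` are hypothetical. It treats only negative thresholds as suspect, since the low but positive threshold mentioned in the thread overflowed quickly rather than stalling.

```python
import re

# Matches the <triplet_thresh> element as quoted in the thread.
# Assumption: the value is a plain decimal number, optionally signed.
TRIPLET_RE = re.compile(
    r"<triplet_thresh>\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*</triplet_thresh>"
)


def triplet_thresh(header_text):
    """Return the triplet threshold found in header_text, or None if absent."""
    match = TRIPLET_RE.search(header_text)
    return float(match.group(1)) if match else None


def is_suspect(header_text):
    """Return True when the header carries a negative triplet threshold.

    Assumption: a negative value is the only condition checked here,
    following the diagnosis given in the thread.
    """
    value = triplet_thresh(header_text)
    return value is not None and value < 0.0
```

Reading the header text out of a workunit file is left to the caller, since file locations vary by installation. A task flagged this way would be a candidate for suspending rather than aborting, as suggested in the thread.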