Message boards :
Number crunching :
Pending Credits almost cleared!!!
Author | Message |
---|---|
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
These are all that are left - at an all-time low for me, thanks to the long outage:

Result ID | Claimed credit
525244870 | 57.96
525245072 | 55.26
526060966 | 18.05
Pending credit: 131.28

However, it is wishful thinking that they might be cleared before the restart. All 3 of these Result IDs are in long queues moving very slowly. The database/file status is reassuringly clearing, but is unlikely to be completely cleared before restart:

Results in progress | 494,759 | 54m
Workunits waiting for validation | 3 | 54m
Workunits waiting for assimilation | 512,650 | 54m

I am intrigued that there are now fewer "Results in Progress" than there are "Workunits Waiting for Assimilation". Keith |
Astro Send message Joined: 16 Apr 02 Posts: 8026 Credit: 600,015 RAC: 0 |
Keith, remember that when it says "results" it's talking about tasks/results. Each wu NOW consists of a min of 3 results (used to be 4). So, this might make your comparison take on a whole new look. |
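Astro's point - that the two counters count different things - can be sketched with a toy model. All numbers here are made up, and treating a WU as joining the assimilation queue once every issued result is back is a simplification (in BOINC, assimilation actually follows validation):

```python
# Toy model of why "Results in progress" can fall below "Workunits
# waiting for assimilation": a result leaves the in-progress count as
# soon as it is returned, but a finished WU sits in the assimilation
# queue for as long as the science database is offline.

REPLICATION = 3  # results issued per workunit (minimum, per the post above)

# Each entry: how many of a workunit's results have already been returned.
workunits = [3, 3, 2, 3, 1, 3]  # made-up sample

# Results still out in the field, summed across all workunits.
results_in_progress = sum(REPLICATION - returned for returned in workunits)

# Workunits fully returned and therefore queued for assimilation.
waiting_assimilation = sum(1 for returned in workunits if returned == REPLICATION)

print(results_in_progress)   # 3 results still in progress
print(waiting_assimilation)  # 4 workunits waiting for assimilation
```

With the assimilator stalled, the second number only grows while the first only shrinks, so the crossover Keith noticed is expected.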
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Hmmmm.... Well that makes sense to me; the two aren't really directly related. As long as the MSD is down, the number of WUs waiting to assimilate will continue to grow. As results are returned and validated, the number in progress will continue to decrease except for reissues, and eventually even those will be gone.

Personally, I don't know why the team is worrying about getting Thumper back online ASAP, as this would seem to me to be a perfect opportunity to get all outstanding work returned, go over everything with a fine-tooth comb, clean up all the DBs, fully test all the new gear they've installed recently, and maybe even experiment a little with some things they would have liked to try before but felt were too risky while in production mode. I don't think it would have a very big impact on user retention overall.

I for one am more than willing to cut them some slack and let them do anything they feel is necessary to improve their backend situation, and my hosts will still be sitting here waiting to dig in when they're ready, even if that's a month from now. Alinator |
Keith T. Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
I just noticed that R I P (Results In Progress) is now < 500,000 too. As most of those have been in progress for at least 10 days now, I would like to hear from Matt, Eric, or another member of the staff about the possibility of re-sending results early for the workunits that have not yet reached quorum. Is it just a simple couple of lines of code for the scheduler, or would it be a major undertaking which could screw up other things? None of my PCs have run out of work yet, but Beta now has under 3K results ready to send, and my oldest box is not capable of running Rosetta as it only has 128MB RAM. [edit]I missed out the word EARLY[/edit] Sir Arthur C Clarke 1917-2008 |
Clyde C. Phillips, III Send message Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0 |
Yeah, my pending credit is down to just over 100, which is about a tenth the normal amount. That makes sense: units can still be validated, and when no new results are being sent out, naturally the number of pendings decreases. As soon as Thumper is replaced and new units start being issued again, the pendings and RACs will go back up to normal. |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
I'm just curious, and I saw you suggested this before, but the part I'm not seeing is what the benefit would be from the project's POV. Since 3 are sent initially and only 2 need to be returned to complete a WU, to just summarily reissue another one would most likely mean hosts wasting their time doing unnecessary results. I was looking through the list of results for a host mentioned elsewhere, and by far the majority of the WUs there don't need another result to complete the science. Alinator |
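Alinator's argument can be sketched with made-up numbers: with an initial replication of 3 and a quorum of 2, a blanket early reissue would mostly duplicate work, since any WU with 2 or more results already back needs nothing further:

```python
# Sketch of the reissue argument; the per-WU return counts below are
# hypothetical, and the quorum/replication figures come from the thread
# (3 results issued, 2 needed to validate).

INITIAL_REPLICATION = 3
MIN_QUORUM = 2

returned_counts = [2, 3, 1, 2, 0, 3, 2, 2]  # made-up: results back per WU

# Only WUs still short of quorum would genuinely benefit from a reissue.
needs_reissue = [n for n in returned_counts if n < MIN_QUORUM]

# For the rest, an extra result is wasted host time.
already_done = [n for n in returned_counts if n >= MIN_QUORUM]

print(len(needs_reissue))  # 2 WUs short of quorum
print(len(already_done))   # 6 WUs where another result adds nothing
```

If the real distribution looks anything like this, most early reissues would be redundant, which is the cost Alinator is weighing against Keith T.'s suggestion.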
Keith T. Send message Joined: 23 Aug 99 Posts: 962 Credit: 537,293 RAC: 9 |
I was thinking that there may be a lot of "ghost" WUs out there like this one of mine. I never received the WU, and there may have been many others like it which could be crunched over the next few days while we wait for the new server to arrive and be put into service. At the moment they get re-issued when they time out; I am suggesting that they could be re-issued early to speed up the quorum process. If this would be a major undertaking for the staff, then ignore the suggestion, but if it is a simple process to re-send them then please do it. |
Philadelphia Send message Joined: 12 Feb 07 Posts: 1590 Credit: 399,688 RAC: 0 |
I was just checking on my pendings to see how the folks were doing that needed to validate against them, and ran across this computer, which has a huge number of WUs that have already expired and many more about to. Do you think he's crunching them but unable to report because of the outage, or ??? http://setiathome.berkeley.edu/show_host_detail.php?hostid=2941574 |
gregh Send message Joined: 10 Jun 99 Posts: 220 Credit: 4,292,549 RAC: 0 |
"These are all that are left - at an all time low for me, thanks to the long outage"

I have only 3 machines doing SETI. 2 have already done all the packets they had, and this one I am typing on has about 24% of one last packet to go, and that will be done very soon. It doesn't matter in the long run. I remember the old days when you would do 1 packet, return it and pick up another - no queues. If SETI went down back then, you didn't have any other WUs to do other than the one being done at the time! |
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
"I was just checking on my pendings to see how the folks were doing that needed to validate on my pendings and ran across this computer who has a huge number of WU's that have already expired and many more ready too. Do you think he's crunching them but unable to report because of the outage or ???"

Maybe, but I don't think it's likely. Even as mine ran out of work, they were still able to reach the project frequently enough that I don't think they would have had a problem reporting on time. Of course, if your host was running a monster cache it might have more time pressure on it to report ASAP after finishing one. In any event, I'm down to three where this scenario is playing out. So I'm just not seeing it as a big enough issue that the team should go looking for trouble by messing around with stuff to fix something which is already proven to take care of itself the way things are. IOW, you are no worse off than you would have been if the failure of Thumper had not occurred. Alinator |
Keith Send message Joined: 19 May 99 Posts: 483 Credit: 938,268 RAC: 0 |
"I was just checking on my pendings to see how the folks were doing that needed to validate on my pendings and ran across this computer who has a huge number of WU's that have already expired and many more ready too. Do you think he's crunching them but unable to report because of the outage or ???" Philadelphia

I, too, have "investigated" the list of "Your Results" on my computer and, although I have no results left to work on, there are 4 listed as In Progress. 3 of these appear in an extract of the list:

Result ID | WU ID | Sent | Deadline / Returned | Status | Outcome | State | CPU time | Claimed | Granted
532127097 | 128633205 | 1 May 2007 23:05:48 UTC | 3 May 2007 20:14:59 UTC | Over | Success | Done | 10,516.11 | 49.91 | 49.91
532126675 | 128633047 | 1 May 2007 23:05:28 UTC | 18 May 2007 10:10:10 UTC | In Progress | Unknown | New | --- | --- | ---
532126673 | 128633049 | 1 May 2007 23:05:28 UTC | 5 May 2007 11:23:11 UTC | Over | Success | Done | 23,440.59 | 49.91 | 49.91
532126651 | 128633044 | 1 May 2007 23:05:28 UTC | 4 May 2007 23:08:28 UTC | Over | Success | Done | 23,451.94 | 49.92 | 49.92
532126625 | 128633043 | 1 May 2007 23:05:28 UTC | 18 May 2007 10:10:10 UTC | In Progress | Unknown | New | --- | --- | ---
532126585 | 128633028 | 1 May 2007 23:05:28 UTC | 18 May 2007 10:10:10 UTC | In Progress | Unknown | New | --- | --- | ---
532123764 | 128632104 | 1 May 2007 23:02:46 UTC | 3 May 2007 16:26:58 UTC | Over | Success | Done | 5,578.81 | 48.68 | 48.68

Now, I happen to know when this occurred: it was when I upgraded BOINC Manager. They disappeared from my "Tasks" page and were no longer in the BOINC data's sub-folder for SETI, but remained in the "Your Results" listing. I guess these are classic "ghost results" or "orphaned results". I wonder if that is what has happened to host 2941574? You, by the way, are lucky with your "huge number of WUs" amounting to 183 - I've got 2 similar cases, one with 600 and another with 300 WUs holding back my pendings!!! I agree absolutely with the suggestions made on this thread.

This outage, as I have said before, is a blessing in disguise: at the small expense of an extra 2 or 3 days, it gives an opportunity to make a clean start once all the units have been crunched and validated. Any few that remain when everything comes to a standstill must be the problem "results", and they can be removed to continue with a clean database. That must be worthwhile. Keith |
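For anyone curious how a "ghost" like Keith's could be spotted mechanically, here is a hypothetical sketch. It assumes tasks appear as `<result><name>...</name></result>` entries in the client's client_state.xml; the sample task names and the server-side list are invented for illustration:

```python
# Hypothetical ghost-result detector: a "ghost" is a result the server
# still lists for your host but that no longer exists in the client's
# state file (e.g. lost in a BOINC Manager upgrade, as Keith describes).
import xml.etree.ElementTree as ET

def local_result_names(client_state_xml: str) -> set:
    """Collect the <name> of every <result> element in a client_state.xml string."""
    root = ET.fromstring(client_state_xml)
    return {r.findtext("name") for r in root.iter("result")}

# Tiny made-up example (not real SETI@home task names):
state = """<client_state>
  <result><name>task_A</name></result>
  <result><name>task_B</name></result>
</client_state>"""

# Names the "Your Results" web page lists for this host (invented).
server_side = {"task_A", "task_B", "task_C"}

ghosts = server_side - local_result_names(state)
print(ghosts)  # {'task_C'} - listed on the server, missing locally
```

In practice you would paste the result names from the website into `server_side`; anything left in `ghosts` is a candidate for the kind of orphaned entry holding back pendings.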
Alinator Send message Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
Agreed, and they came pretty close to doing that when the project was down, almost 2 years ago now, for the server closet upgrade. Unfortunately, IIRC, there was a malfunction not too long after they went back online that started the Zombies accumulating again. ;-) Although there are far fewer today than there were then (or so it seems). I don't have any right now on my account, but back then I had only been crunching BOINC for a few months and had about a dozen, as I recall. Alinator |
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
"Even as mine ran out of work they were still able to reach the project frequently enough I don't think they would have had a problem reporting on time. Of course if your host was running a monster cache it might have more time pressure on it to report ASAP after finishing one."

Something to keep in mind: BOINC core clients (CCs) that don't have return_results_immediately enabled will wait a while before reporting. My own 5.4.11 host went into roughly a one-week backoff on attempting to report until I went and gave it a nudge.

Something else to consider... I have two hosts, both using the same cable connection and going through the same router. This host (my AMD) has had no joy in contacting the scheduler every time I've attempted it in the past ~36 hours. I had left my Intel host alone and let it continue to try to contact the scheduler in hopes of snagging a reissue (hasn't happened); it has been able to connect, most recently about 3 hours ago. So two machines sharing the same connection are getting different results... |
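The option Brian mentions can be set in the client's cc_config.xml in the BOINC data directory. This is a sketch from memory - later clients spell it report_results_immediately - so check the option name against your client version's documentation before relying on it:

```xml
<!-- cc_config.xml (BOINC data directory); sketch only, option name may
     vary by client version (report_results_immediately in later clients) -->
<cc_config>
  <options>
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>
```

With this set, the client reports each finished task on its next scheduler contact instead of batching reports, at the cost of more frequent server connections.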
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.