Pending Credits almost cleared!!!

Message boards : Number crunching : Pending Credits almost cleared!!!

Keith
Joined: 19 May 99
Posts: 483
Credit: 938,268
RAC: 0
United Kingdom
Message 564653 - Posted: 10 May 2007, 17:45:10 UTC

These are all that are left - at an all-time low for me, thanks to the long outage:-

Result ID Claimed credit
525244870 57.96
525245072 55.26
526060966 18.05
Pending credit: 131.28

However, it is wishful thinking that they might be cleared before the restart. All 3 of these Result IDs are in long queues moving very slowly.

The database/file status is reassuringly clearing, but is unlikely to be completely cleared before restart:-

Results in progress 494,759 54m
Workunits waiting for validation 3 54m
Workunits waiting for assimilation 512,650 54m

I am intrigued that there are now fewer "Results in Progress" than there are "Workunits Waiting for Assimilation".

Keith
ID: 564653 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 564659 - Posted: 10 May 2007, 18:00:17 UTC

Keith, remember that when it says "results" it's talking about tasks/results. Each WU now consists of a minimum of 3 results (it used to be 4). So, this might make your comparison take on a whole new look.
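
To put the two counters on the same footing, here is a rough back-of-the-envelope sketch. It assumes no workunit has more than 3 results out at once, which gives only a lower bound on the number of workunits involved (many WUs will have just 1 or 2 results still outstanding). The function and variable names are made up for illustration.

```python
# Figures taken from the server status numbers quoted in the first post.
results_in_progress = 494_759
workunits_waiting_for_assimilation = 512_650

# Assumption: at most 3 results per workunit are outstanding at any time.
RESULTS_PER_WORKUNIT = 3


def min_workunits(results: int, per_workunit: int) -> float:
    """Smallest number of workunits that `results` outstanding results could span."""
    return results / per_workunit


lower_bound = min_workunits(results_in_progress, RESULTS_PER_WORKUNIT)

print(f"Results in progress:           {results_in_progress:,}")
print(f"Spread over at least (WUs):    {lower_bound:,.0f}")
print(f"WUs waiting for assimilation:  {workunits_waiting_for_assimilation:,}")
```

Under that assumption the in-progress results cover somewhere upwards of roughly 165,000 workunits, so the raw "results" and "workunits" counts are not measuring the same thing and cannot be compared one-to-one.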
ID: 564659 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 564660 - Posted: 10 May 2007, 18:01:49 UTC - in response to Message 564653.  
Last modified: 10 May 2007, 18:03:11 UTC



<snip>

I am intrigued that there are now fewer "Results in Progress" than there are "Workunits Waiting for Assimilation".

Keith


Hmmmm....

Well that makes sense to me, the two aren't really directly related.

As long as the MSD is down, the number of WUs waiting to assimilate will continue to grow. As results are returned and validated, the number in progress will continue to decrease, except for reissues, and eventually even those will be gone.

Personally, I don't know why the team is worrying about getting Thumper back online ASAP, as this would seem to me to be a perfect opportunity to get all outstanding work returned, go over everything with a fine-tooth comb, clean up all the DBs, fully test all the new gear they've installed recently, and maybe even experiment a little with some things they would have liked to try before but felt were too risky while in production mode.

I don't think it would have a very big impact on user retention overall. I for one am more than willing to cut them some slack and let them do anything they feel is necessary to improve their backend situation, and my hosts will still be sitting here waiting to dig in when they're ready, even if that's a month from now.

Alinator




ID: 564660 · Report as offensive
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 564664 - Posted: 10 May 2007, 18:21:19 UTC - in response to Message 564653.  
Last modified: 10 May 2007, 18:26:34 UTC


Results in progress 494,759 54m
Workunits waiting for validation 3 54m
Workunits waiting for assimilation 512,650 54m

I am intrigued that there are now fewer "Results in Progress" than there are "Workunits Waiting for Assimilation".

Keith


I just noticed that R I P is now < 500,000 too. As most of those have been in progress for at least 10 days now, I would like to know from Matt or Eric or another member of the Staff about the possibility of re-sending Results early for the Workunits that have not yet reached Quorum.

Is it just a simple couple of lines of code for the Scheduler, or would it be a major undertaking which could screw up other things?

None of my PCs have run out of work yet, but Beta now has under 3K results ready to send, and my oldest box is not capable of running Rosetta as it only has 128MB RAM.

[edited]I missed out the word EARLY[/edit]
Sir Arthur C Clarke 1917-2008
ID: 564664 · Report as offensive
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 564666 - Posted: 10 May 2007, 18:27:46 UTC

Yeah, my pending credit is down to just over 100, which is about a tenth of the normal amount. That makes sense because units can still be validated, etc., and when there are no results to send in, naturally the number of pendings decreases. As soon as Thumper is replaced and new units start being issued again, the pendings and RACs will go back up to normal.
ID: 564666 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 564672 - Posted: 10 May 2007, 18:43:34 UTC - in response to Message 564664.  
Last modified: 10 May 2007, 18:45:05 UTC


I just noticed that R I P is now < 500,000 too. As most of those have been in progress for at least 10 days now, I would like to know from Matt or Eric or another member of the Staff about the possibility of re-sending Results early for the Workunits that have not yet reached Quorum.

Is it just a simple couple of lines of code for the Scheduler, or would it be a major undertaking which could screw up other things?

None of my PCs have run out of work yet, but Beta now has under 3K results ready to send, and my oldest box is not capable of running Rosetta as it only has 128MB RAM.

[edited]I missed out the word EARLY[/edit]


I'm just curious, and I saw you suggested this before, but the part I'm not seeing is what the benefit would be from the project's POV of doing this.

Since 3 are sent initially and only 2 need to be returned to complete it, to just summarily reissue another one would most likely mean hosts would just be wasting their time doing unnecessary results.

I was looking through the list of results for a host mentioned elsewhere, and by far the majority of the WUs there don't need another result to complete the science.
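
The quorum argument can be sketched in a few lines. This is only an illustration of the reasoning, not the project's actual scheduler or validator code: the `Workunit` class and both helper functions are hypothetical, and the numbers 3 and 2 are simply the ones mentioned in this thread.

```python
from dataclasses import dataclass

INITIAL_REPLICATION = 3  # results sent out per workunit, as stated in this thread
QUORUM = 2               # returned results needed to complete a workunit


@dataclass
class Workunit:
    returned: int     # results already returned successfully
    in_progress: int  # results still out on hosts


def early_reissue_is_redundant(wu: Workunit) -> bool:
    """True if quorum is already met, so an extra copy could only be wasted work."""
    return wu.returned >= QUORUM


def needs_another_result(wu: Workunit) -> bool:
    """True only if returned plus outstanding results can no longer reach quorum."""
    return wu.returned + wu.in_progress < QUORUM


# Two results back, one still out: quorum met, a reissue would be wasted.
print(early_reissue_is_redundant(Workunit(returned=2, in_progress=1)))
# One result back, two still out: quorum can still be reached without a reissue.
print(needs_another_result(Workunit(returned=1, in_progress=2)))
# One result back, nothing out (e.g. the others were lost): a reissue is needed.
print(needs_another_result(Workunit(returned=1, in_progress=0)))
```

The middle case is the debatable one: a reissue is not strictly needed, but it would help if the outstanding copies never come back.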

Alinator
ID: 564672 · Report as offensive
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 564711 - Posted: 10 May 2007, 20:20:11 UTC - in response to Message 564672.  


I just noticed that R I P is now < 500,000 too. As most of those have been in progress for at least 10 days now, I would like to know from Matt or Eric or another member of the Staff about the possibility of re-sending Results early for the Workunits that have not yet reached Quorum.

Is it just a simple couple of lines of code for the Scheduler, or would it be a major undertaking which could screw up other things?

None of my PCs have run out of work yet, but Beta now has under 3K results ready to send, and my oldest box is not capable of running Rosetta as it only has 128MB RAM.

[edited]I missed out the word EARLY[/edit]


I'm just curious, and I saw you suggested this before, but the part I'm not seeing is what the benefit would be from the project's POV of doing this.

Since 3 are sent initially and only 2 need to be returned to complete it, to just summarily reissue another one would most likely mean hosts would just be wasting their time doing unnecessary results.

I was looking through the list of results for a host mentioned elsewhere, and by far the majority of the WUs there don't need another result to complete the science.

Alinator


I was thinking that there may be a lot of "ghost" WUs out there like this one of mine. I never received the WU.

There may have been many others like this which could be crunched over the next few days while we wait for the new server to arrive and be put into service.

At the moment they get re-issued when they time out. I am suggesting that they could be re-issued early to speed up the quorum process. If this would be a major undertaking for the staff, then ignore the suggestion, but if it is a simple process to re-send them, then please do it.
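
As a sketch of what the suggestion amounts to, the rule below compares "reissue at the deadline" with "reissue early". It is not BOINC scheduler code: `should_reissue` and its `early_after` parameter are hypothetical, the dates are made up for illustration, and the 10-day threshold is just the figure mentioned earlier in this thread.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_reissue(sent: datetime, deadline: datetime, now: datetime,
                   early_after: Optional[timedelta] = None) -> bool:
    """Decide whether an unreturned result should be sent to another host.

    With early_after=None, a result is reissued only once its deadline passes.
    With early_after set, it is also reissued once it has been out that long.
    """
    if now >= deadline:
        return True
    if early_after is not None and now - sent >= early_after:
        return True
    return False


# Illustrative dates only, not taken from any real result.
sent = datetime(2007, 4, 30, 0, 0)
deadline = datetime(2007, 5, 17, 0, 0)
now = datetime(2007, 5, 10, 20, 20)

# Deadline-only rule: the result is not yet eligible for reissue.
print(should_reissue(sent, deadline, now))
# Early rule with a 10-day threshold: the same result would be reissued now.
print(should_reissue(sent, deadline, now, early_after=timedelta(days=10)))
```

Whether a change like this really is "a simple couple of lines" depends on how the real scheduler tracks deadlines, which this sketch does not attempt to model.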
ID: 564711 · Report as offensive
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 564723 - Posted: 10 May 2007, 20:35:46 UTC
Last modified: 10 May 2007, 20:35:59 UTC

I was just checking on my pendings to see how the folks who need to validate on them were doing, and ran across this computer, which has a huge number of WUs that have already expired and many more ready to expire. Do you think he's crunching them but unable to report because of the outage or ???

http://setiathome.berkeley.edu/show_host_detail.php?hostid=2941574
ID: 564723 · Report as offensive
gregh
Joined: 10 Jun 99
Posts: 220
Credit: 4,292,549
RAC: 0
Australia
Message 564740 - Posted: 10 May 2007, 21:13:18 UTC - in response to Message 564653.  

These are all that are left - at an all time low for me, thanks to the long outage:-

Result ID Claimed credit
525244870 57.96
525245072 55.26
526060966 18.05
Pending credit: 131.28



I have only 3 machines doing Seti. 2 have already done all the packets they had and this one I am typing on has about 24% of one last packet to go and that will be done very soon.

It doesn't matter in the long run. I remember the old days when you would do 1 packet, return it and pick up another, no queues. If Seti went down back then, you didn't have any other WUs to do other than the one being done at the time!
ID: 564740 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 564744 - Posted: 10 May 2007, 21:33:39 UTC - in response to Message 564723.  

I was just checking on my pendings to see how the folks who need to validate on them were doing, and ran across this computer, which has a huge number of WUs that have already expired and many more ready to expire. Do you think he's crunching them but unable to report because of the outage or ???

http://setiathome.berkeley.edu/show_host_detail.php?hostid=2941574


Maybe, but I don't think it's likely. Even as mine ran out of work they were still able to reach the project frequently enough I don't think they would have had a problem reporting on time. Of course if your host was running a monster cache it might have more time pressure on it to report ASAP after finishing one.

In any event, I'm down to three where this scenario is playing out. So I'm just not seeing it as a big enough issue that the team should go looking for trouble by messing around with stuff to fix something which is already proven to take care of itself the way things are.

IOW, you are no worse off than you would have been if the failure of Thumper had not occurred.

Alinator
ID: 564744 · Report as offensive
Keith
Joined: 19 May 99
Posts: 483
Credit: 938,268
RAC: 0
United Kingdom
Message 564745 - Posted: 10 May 2007, 21:34:45 UTC - in response to Message 564723.  

I was just checking on my pendings to see how the folks who need to validate on them were doing, and ran across this computer, which has a huge number of WUs that have already expired and many more ready to expire. Do you think he's crunching them but unable to report because of the outage or ???

http://setiathome.berkeley.edu/show_host_detail.php?hostid=2941574


Philadelphia
I, too, have "investigated" the list of "Your Results" on my computer and, although I have no results left to work on, there are 4 listed. 3 of these appear in an extract of the list:-

532127097 128633205 1 May 2007 23:05:48 UTC 3 May 2007 20:14:59 UTC Over Success Done 10,516.11 49.91 49.91
532126675 128633047 1 May 2007 23:05:28 UTC 18 May 2007 10:10:10 UTC In Progress Unknown New --- --- ---
532126673 128633049 1 May 2007 23:05:28 UTC 5 May 2007 11:23:11 UTC Over Success Done 23,440.59 49.91 49.91
532126651 128633044 1 May 2007 23:05:28 UTC 4 May 2007 23:08:28 UTC Over Success Done 23,451.94 49.92 49.92
532126625 128633043 1 May 2007 23:05:28 UTC 18 May 2007 10:10:10 UTC In Progress Unknown New --- --- ---
532126585 128633028 1 May 2007 23:05:28 UTC 18 May 2007 10:10:10 UTC In Progress Unknown New --- --- ---
532123764 128632104 1 May 2007 23:02:46 UTC 3 May 2007 16:26:58 UTC Over Success Done 5,578.81 48.68 48.68

Now, I happen to know when this occurred. It was when I upgraded BOINC Manager. They disappeared from my "Tasks" page and were no longer in the BOINC Data's sub folder for SETI, but remained in the "Your Results" listing. I guess these are classic "Ghost results" or "Orphaned Results". I wonder if that is what has happened to host 2941574? You, by the way, are lucky with the "huge number of WUs" amounting to 183. I've got 2 similar cases, but one with 600, and another with 300 WUs holding back my pendings!!!

I agree absolutely with the suggestions made on this thread. This outage, as I have said before, is a blessing in disguise, giving an opportunity at the small expense of an extra 2 or 3 days to continue with a clean start once all the units have been crunched and validated. Any few that remain when everything comes to a standstill must be the problem "results" and can be removed to continue with a clean database. That must be worthwhile.

Keith
ID: 564745 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 564749 - Posted: 10 May 2007, 21:46:19 UTC

Agreed, and they came pretty close to doing that when the project was down almost 2 years ago now for the server closet upgrade.

Unfortunately, IIRC there was a malfunction not too long after they went back online that started the Zombies accumulating again. ;-)

Although there are far fewer today than there were then (or so it seems). I don't have any right now on my account, but back then I had only been crunching BOINC for a few months and had about a dozen as I recall.

Alinator
ID: 564749 · Report as offensive
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 564751 - Posted: 10 May 2007, 21:54:15 UTC - in response to Message 564744.  

Even as mine ran out of work they were still able to reach the project frequently enough I don't think they would have had a problem reporting on time. Of course if your host was running a monster cache it might have more time pressure on it to report ASAP after finishing one.


Something to keep in mind: BOINC CCs that don't have return_results_immediately enabled will wait for a while to report. My own 5.4.11 host went into about a one week backoff on attempting to report until I went to give it a nudge.

Something else to consider... I have two hosts, both using the same cable connection and going through the same router. This host (my AMD) has had no joy in contacting the scheduler every time I've attempted it in the past ~36 hours. I had left my Intel host alone and let it continue to try to contact the scheduler in hopes of snagging a reissue (hasn't happened). It has been able to connect, most recently about 3 hours ago. So two machines having the same connection have different results...
ID: 564751 · Report as offensive

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.