Aborted by Project

Message boards : Number crunching : Aborted by Project

Martin Johnson

Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 589325 - Posted: 20 Jun 2007, 0:46:30 UTC

A unit suddenly aborted itself with the status "Aborted by Project", LESS than 24 hours after download. There is nothing about this in the Messages tab. I had not started work on it, but it was next in the queue. The two wingmen had gained credit for it.

What has happened, and why?
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 589332 - Posted: 20 Jun 2007, 1:01:58 UTC
Last modified: 20 Jun 2007, 1:16:51 UTC

The project has recently enabled a couple of newer BOINC backend features that it hadn't used before.

The first is the ability to resend a lost or 'ghost' result to your host.

The second is the ability to command the host to abort results under certain conditions. The first case is unconditionally aborting results that are 'defective' or have been canceled by the project, whether they have started or not. The other case is what you are seeing: the host will abort unstarted results whose WU has already formed a quorum (for SAH, that is 2 strongly similar results). That is all that's needed to complete the science for the WU, so running your result would just waste your host's time and your money.
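The two abort cases described above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not BOINC's actual scheduler code: the `Result` class, its fields, the `abort_reason` function, and the task names are all hypothetical, and the quorum of 2 is taken from the post.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of a result waiting in a host's queue.
# Field names are illustrative only; they are not BOINC's real schema.
@dataclass
class Result:
    name: str
    started: bool               # has the host begun crunching it?
    defective: bool = False     # flagged as bad by the project
    canceled: bool = False      # canceled by the project
    agreeing_results: int = 0   # strongly similar results already returned for this WU

MIN_QUORUM = 2  # SAH quorum as described in this thread

def abort_reason(r: Result, min_quorum: int = MIN_QUORUM) -> Optional[str]:
    """Return why the host should abort r, or None if it should run."""
    # Case 1: defective or canceled work is aborted whether or not it has started.
    if r.defective or r.canceled:
        return "defective or canceled"
    # Case 2: unstarted work whose WU already has a quorum is redundant.
    if not r.started and r.agreeing_results >= min_quorum:
        return "quorum already reached"
    return None

if __name__ == "__main__":
    queue = [
        Result("wu_a_2", started=False, agreeing_results=2),  # like the unit in the first post
        Result("wu_b_0", started=True, agreeing_results=2),
        Result("wu_c_1", started=False, canceled=True),
        Result("wu_d_1", started=False, agreeing_results=1),
    ]
    for r in queue:
        print(r.name, abort_reason(r))
```

With this sample queue, only `wu_a_2` (unstarted, quorum met) and `wu_c_1` (canceled) get an abort reason; the started `wu_b_0` keeps running under these two rules.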

HTH,

Alinator
Martin Johnson

Joined: 9 Jun 01
Posts: 201
Credit: 224,995
RAC: 0
United Kingdom
Message 589335 - Posted: 20 Jun 2007, 1:08:57 UTC

I see, thanks. Funny it said nothing in the Messages tab.
John McLeod VII
Volunteer developer
Volunteer tester

Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 589341 - Posted: 20 Jun 2007, 2:03:35 UTC - in response to Message 589332.  

There is actually another sub-case for the command to abort results. If the WU has been validated, your task is late, and the WU has been moved to the science database (i.e., removed from the list of active WUs), then your task will be aborted even if it has started. You aren't going to get credit for it anyway.
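This extra sub-case can be sketched as a separate check. Again this is a hypothetical illustration, not the real server logic: `Task`, `abort_late_task`, and their fields are invented names, and the actual decision is made from the project's own database state. What it shows is that whether the task has started is never consulted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical view of one task and the state of its workunit (WU) on the server.
@dataclass
class Task:
    name: str
    started: bool       # deliberately ignored below: started tasks are aborted too
    deadline: datetime  # report deadline for this task
    wu_validated: bool  # the WU already has a validated result
    wu_active: bool     # False once the WU has moved to the science database

def abort_late_task(t: Task, now: datetime) -> bool:
    """True if t is late and its WU is validated and no longer active."""
    late = now > t.deadline
    return t.wu_validated and late and not t.wu_active

if __name__ == "__main__":
    now = datetime(2007, 6, 20, 2, 0)
    t = Task("wu_x_2", started=True, deadline=now - timedelta(days=1),
             wu_validated=True, wu_active=False)
    print(abort_late_task(t, now))  # True, even though the task has started
```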


BOINC WIKI
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 589356 - Posted: 20 Jun 2007, 2:41:14 UTC - in response to Message 589341.  


I suppose you could technically put that in the 'defective' category. ;-)

Alinator
Dave C

Joined: 22 Jan 02
Posts: 364
Credit: 1,025,962
RAC: 0
United States
Message 589903 - Posted: 21 Jun 2007, 13:21:48 UTC

I think I'm done with this, because one of the computers I use will never get any more credit: it will never finish a unit before the other computers do. And I know there are some slower ones out there that will be in the same boat.

Do they expect us to keep buying new computers just to keep up with everyone else? I can't afford that.
Avians and Myrmicats, the Octospiders, and the Humans all living in one huge cylinder in space called RAMA.
Astro
Volunteer tester

Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 589905 - Posted: 21 Jun 2007, 13:29:22 UTC - in response to Message 589903.  

Hi, if I'm understanding what the people smarter than I am are saying, it isn't aborting WUs that have already started, only those that haven't. You'll still contribute and get credit for the ones you run.

Your network traffic will increase, since you'll spend bandwidth downloading WUs that never run. I don't pay by the bit, so it doesn't matter to me.
Dingo
Volunteer tester

Joined: 28 Jun 99
Posts: 104
Credit: 16,364,896
RAC: 1
Australia
Message 589909 - Posted: 21 Jun 2007, 13:43:23 UTC

I also notice that these WUs aborted by the project end up as a "Client error" on the user's account. Maybe there should be a category other than "User Error" for these?

Astro
Volunteer tester

Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 589911 - Posted: 21 Jun 2007, 13:45:45 UTC

After firing a couple more neurons... I suppose the "abort by project" thing would encourage those of us with SLOW computers to keep a smaller cache, to avoid wasted downloads.

PS. I agree that project-aborted units shouldn't count against the quota.
ohiomike

Joined: 14 Mar 04
Posts: 357
Credit: 650,069
RAC: 0
United States
Message 589919 - Posted: 21 Jun 2007, 14:29:29 UTC - in response to Message 589911.  

I have suggested that about 100 times so far, to no avail. Maybe eventually someone will notice and do something about it. I also don't like my machine being flagged with a lot of "Client Errors" when, in effect, I have none.

Boinc Button Abuser In Training >My Shrubbers<
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 589929 - Posted: 21 Jun 2007, 15:04:31 UTC - in response to Message 589919.  

I can relate to the "aesthetics" of how it appears, but all you really need to do for now is keep an eye on your WU limit per quota day and make sure that you aren't getting penalized...
Geek@Play
Volunteer tester

Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 589938 - Posted: 21 Jun 2007, 15:43:01 UTC
Last modified: 21 Jun 2007, 15:51:27 UTC

I upgraded to version 5.10.X about 2 weeks ago. After starting this version, I initially had about 30% of my cache aborted. I understand that this can be troubling when it happens. Now, at any given time on my 5 boxes, I see anywhere between none and 5 aborted work units waiting to be reported. I am still running a 5-day cache on all 5 boxes, and things have pretty much sorted themselves out. I actually like it this way. I have gotten used to seeing some of the work being aborted, but BOINC immediately downloads more work to take its place. The work that needs to be done eventually "bubbles" to the top of the cache and gets crunched.

RAC on my boxes ranges from 710 to over 5,000, and this abort procedure has NOT affected the RAC on any box.

Now I can be certain that work done by my computers is actually needed by the project. If an outage occurs, none of the cache will be aborted, since there is no communication with Berkeley, so I can happily crunch all the cache on hand if needed and get credit for all of it. When Berkeley comes back online, work is reported and new work is downloaded as before. Like I said, I kind of like it now that it has settled down.

Boinc....Boinc....Boinc....Boinc....
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 589941 - Posted: 21 Jun 2007, 15:52:34 UTC - in response to Message 589938.  


While I'm hesitant to stir you up, there is still a scenario where large caches could be wiped out during a specific type of "outage": one in which uploading and reporting work, so the Scheduler can still tell your host to abort work, but downloading does not, so no replacements arrive. That happened a time or two (or more) over the past couple of months. Come to think of it, those incidents involved problems with the download servers, but it could also happen during a splitter-only outage that left the Ready to Send queue dry or low...

Like I said, hesitant to stir you up if you're happy, but I want to make sure that it's understood that a large cache still could very well be zapped if certain conditions happen...

Brian
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 589951 - Posted: 21 Jun 2007, 16:19:50 UTC

For slow systems, I don't think there's much impact unless the user wants it. Left alone, the core client normally contacts the Scheduler only when it requests work. If the Scheduler aborts some unstarted work in its reply as well as sending the requested work, the core client will request replacement work after recalculating the queue fill level. The impact is the added contact with the Scheduler and the download of the replacement work. On a really slow system, a user might want to click the Update button once in a while to clear useless work out of the queue and replace it with potentially useful WUs.
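The refill behavior described above can be modelled roughly as "compute the shortfall, drop the aborted tasks, compute it again." This is a deliberately simplified sketch, assuming the cache setting is just a number of days of work per CPU; the real BOINC work-fetch policy weighs more factors, and `QueuedTask`, `shortfall_seconds`, and `apply_aborts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical, heavily simplified model of a client's work queue.
# The real BOINC work-fetch policy is more involved; this only shows the refill idea.
@dataclass
class QueuedTask:
    name: str
    est_seconds: float  # estimated run time remaining on this host

SECONDS_PER_DAY = 86400

def shortfall_seconds(queue: List[QueuedTask], buffer_days: float, ncpus: int = 1) -> float:
    """Seconds of work needed to bring the queue back up to the cache setting."""
    target = buffer_days * SECONDS_PER_DAY * ncpus
    queued = sum(t.est_seconds for t in queue)
    return max(0.0, target - queued)

def apply_aborts(queue: List[QueuedTask], aborted_names: Iterable[str]) -> List[QueuedTask]:
    """Drop the tasks the scheduler reply told the client to abort."""
    aborted = set(aborted_names)
    return [t for t in queue if t.name not in aborted]

if __name__ == "__main__":
    queue = [QueuedTask(f"t{i}", 40000.0) for i in range(4)]
    print(shortfall_seconds(queue, buffer_days=2))  # 12800.0
    queue = apply_aborts(queue, ["t0", "t1"])      # two unstarted tasks aborted
    print(shortfall_seconds(queue, buffer_days=2))  # 92800.0, so ask for replacement work
```

In this toy model the aborts simply enlarge the shortfall, which is why the next scheduler request asks for more work; on a slow host the replacements may in turn be aborted before they run, which is the extra download traffic mentioned earlier in the thread.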

Many out-of-the-ordinary cases have already been discussed, and I'll just note that personally I'd rather not have my hosts doing scientifically useless work, even during outages. But I won't criticize those who disagree; the credit system and the competition around it are probably major factors in the amount of work we get done. It doesn't matter to me why a user is crunching.

When Dr. Anderson first described his idea for this feature, it included a message on hosts when a started WU was no longer needed. I suppose the reason that wasn't implemented is the difficulty of wording the message clearly, but I still wish it had been included.
Joe
Geek@Play
Volunteer tester

Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 589952 - Posted: 21 Jun 2007, 16:19:57 UTC - in response to Message 589941.  

Thanks... I tend to "suspend communications" at the first sign of trouble and don't return to "normal communications" until Berkeley has been back online for at least several hours with no new problems reported in the forum. This procedure has worked well for me in the past.

Boinc....Boinc....Boinc....Boinc....
Astro
Volunteer tester

Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 592366 - Posted: 25 Jun 2007, 11:27:53 UTC

bumped from the first page for Modesto



 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.