Project Backoff too extreme...

Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 971657 - Posted: 19 Feb 2010, 14:55:47 UTC

When the servers are slow, the "Project Backoff" sucks. If I leave it alone, I keep having more results waiting to upload than actually uploading. It wants to keep waiting an hour or two when uploads are working but just slow. On its own, BOINC managed to upload 18 units; in the last 20 minutes I made it upload 185 units. I'll take the flak, as I feel this is too extreme. The scheduler should know the difference between a slow connection and one that is not working. If they don't see that anything is wrong and want to tell us, maybe I can break it the rest of the way so they will fix it!
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 971657 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 971696 - Posted: 19 Feb 2010, 16:34:13 UTC - in response to Message 971657.  

The scheduler should know...

Uploads and downloads aren't done via the scheduler. Simply said, only reports and requests for new work use the scheduler. Up- and downloads just transfer data from the project's server (disk) to your computer (disk).


ID: 971696 · Report as offensive
Profile hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 971699 - Posted: 19 Feb 2010, 16:47:50 UTC - in response to Message 971696.  

The scheduler should know...

Uploads and downloads aren't done via the scheduler. Simply said, only reports and requests for new work use the scheduler. Up- and downloads just transfer data from the project's server (disk) to your computer (disk).


What tells it how long to wait from one try to the next?
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 971699 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65759
Credit: 55,293,173
RAC: 49
United States
Message 971701 - Posted: 19 Feb 2010, 16:50:20 UTC - in response to Message 971657.  

Yes, it does. I can see the retry in x number of minutes, but backoff frequently means longer and longer waits, and when it does try, it's only for a minute or two before it goes right back to backoff. Somebody rip it out. If not, I could go back to a version of BOINC that doesn't support it, like 6.10.18 (I think that one doesn't, at least).
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971701 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 971704 - Posted: 19 Feb 2010, 16:54:42 UTC

I use v6.10.18 and it has project backoff.
ID: 971704 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65759
Credit: 55,293,173
RAC: 49
United States
Message 971708 - Posted: 19 Feb 2010, 16:58:43 UTC - in response to Message 971704.  

I use v6.10.18 and it has project backoff.

I wasn't sure, Thanks.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971708 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 971724 - Posted: 19 Feb 2010, 17:21:58 UTC - in response to Message 971701.  

Somebody rip It out

It's there to stop computers DDoS'ing the project. Max back-off will be 24 hours.
If you don't like it ... The first version to have it was 6.6.38
ID: 971724 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971731 - Posted: 19 Feb 2010, 17:29:01 UTC - in response to Message 971657.  

When the servers are slow, the "Project Backoff" sucks. If I leave it alone, I keep having more results waiting to upload than actually uploading. It wants to keep waiting an hour or two when uploads are working but just slow. On its own, BOINC managed to upload 18 units; in the last 20 minutes I made it upload 185 units. I'll take the flak, as I feel this is too extreme. The scheduler should know the difference between a slow connection and one that is not working. If they don't see that anything is wrong and want to tell us, maybe I can break it the rest of the way so they will fix it!

I'm sorry, my friend, but the backoffs are not big enough.

Your complaint is that when a transaction fails, it takes too long for the next retry.

My statement is "if the backoff was long enough, the very next try would almost always be successful."

According to the "30 day" graphs on Scarecrow's site, something just under 50,000 results per hour are reported. That implies that there are about 50,000 uploads per hour (call it just under fourteen per second).

SETI was struggling for a bit over 30 hours.

That's a backlog of 1,500,000 uploads, and if I remember correctly, the cap on retries is 4 hours -- meaning we're asking the upload servers to take over 100 completed uploads per second.

That won't ever happen.
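
To make the arithmetic above easy to check, here is a minimal Python sketch. The inputs are the figures quoted in the post (about 50,000 results per hour, a 30-hour outage, and a four-hour retry cap recalled from memory), so they are assumptions rather than measured values, and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the backlog figures quoted in the post.
# All inputs are the poster's estimates, not measured server data.
RESULTS_PER_HOUR = 50_000   # "just under 50,000 results per hour"
OUTAGE_HOURS = 30           # "struggling for a bit over 30 hours"
RETRY_CAP_HOURS = 4         # retry cap as recalled in the post (unverified)


def per_second(count, hours):
    """Convert a count spread over `hours` into a per-second rate."""
    return count / (hours * 3600)


normal_rate = per_second(RESULTS_PER_HOUR, 1)
backlog = RESULTS_PER_HOUR * OUTAGE_HOURS
catch_up_rate = per_second(backlog, RETRY_CAP_HOURS)

print(f"normal upload rate : {normal_rate:.1f} per second")
print(f"backlog after outage: {backlog:,} uploads")
print(f"rate if every client retries within the cap: {catch_up_rate:.1f} per second")
```

Under these assumptions, normal traffic works out to roughly 13.9 uploads per second, and clearing the whole backlog within one retry window would need a bit over 100 per second.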

So, the project wide backoff helps by saying "if one upload failed, another upload in a few seconds will also likely fail, so there is no great need to try."

... but there are many clients that don't have the project-wide backoff, and even if the project-wide backoff cut the load in half, it's still going to be a monster number.

What is needed is a mechanism to tell the clients to slow down at times like this. If we knew that the upload servers could handle 20 uploads per second, and we could tell all the clients to back-off to the point where there were exactly 20 per second, then backlogs would clear very much faster.
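
As a rough illustration of the throttling idea, the sketch below estimates how long a backlog takes to drain if clients are held to exactly what the servers can accept. The 20 uploads per second capacity is the hypothetical figure from the post, the 50,000 per hour arrival rate is the estimate quoted earlier in it, and `hours_to_clear` is an invented helper, not anything in BOINC.

```python
import math


def hours_to_clear(backlog, capacity_per_s, arrival_per_s):
    """Hours needed to drain `backlog` uploads when the servers accept
    `capacity_per_s` and new results keep arriving at `arrival_per_s`.

    Returns math.inf if capacity does not exceed the arrival rate,
    because the backlog then never shrinks.
    """
    spare = capacity_per_s - arrival_per_s
    if spare <= 0:
        return math.inf
    return backlog / spare / 3600


# Hypothetical inputs taken from the figures in the post.
arrival = 50_000 / 3600  # new uploads per second
print(f"{hours_to_clear(1_500_000, 20, arrival):.1f} hours to clear")
```

The point of the sketch is only that the drain time depends on the spare capacity left after normal traffic, which is the part that failed retries eat into when clients are not throttled.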

The biggest single problem is that our normal reaction is that anything will fit if we just push very hard. Sometimes we still think "push harder" is the right answer even after we break the tool.
ID: 971731 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 971733 - Posted: 19 Feb 2010, 17:31:08 UTC - in response to Message 971708.  

I use v6.10.18 and it has project backoff.

I wasn't sure, Thanks.

You'd have to go back to whatever the last version of 6.6 was IIRC.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
ID: 971733 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971735 - Posted: 19 Feb 2010, 17:35:32 UTC - in response to Message 971699.  

The scheduler should know...

Uploads and downloads aren't done via the scheduler. Simply said, only reports and requests for new work use the scheduler. Up- and downloads just transfer data from the project's server (disk) to your computer (disk).


What tells it how long to wait from one try to the next?

It is called an "exponential back-off" -- the back-off is calculated based on the number of times the upload has failed.

The first few back-offs are fairly short, and it increases to an upper limit.

The average, as I remember, is two hours. It doesn't go all the way to four hours and stay there (which is the normal way of doing an exponential back-off).
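
A minimal sketch of a capped, randomized exponential back-off of the kind described here. This is not BOINC's actual formula: the one-minute starting delay and the uniform random draw are assumptions chosen so that, once the four-hour cap is reached, the average wait comes out near two hours, matching the recollection above.

```python
import random

MIN_DELAY = 60.0         # seconds; hypothetical starting delay
MAX_DELAY = 4 * 3600.0   # four-hour cap, as recalled in the post


def backoff_delay(failures, rng=random):
    """Return a retry delay in seconds after `failures` consecutive failures.

    The ceiling doubles with each failure until it hits MAX_DELAY; the
    actual delay is drawn uniformly between MIN_DELAY and that ceiling,
    so clients that failed together do not all retry at the same moment.
    """
    exponent = min(failures, 20)  # keep 2 ** exponent small
    ceiling = min(MAX_DELAY, MIN_DELAY * 2 ** exponent)
    return rng.uniform(MIN_DELAY, ceiling)


for n in range(10):
    print(n, round(backoff_delay(n)))
```

The random draw is the design choice that matters: without it, every client that failed at the same time would come back at the same time and the servers would see the same spike again.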

I think most of us know (directly or indirectly) what happens if you jam stuff into the garbage disposer too fast.

Feed it at a reasonable rate, and it grinds up the garbage into nice tiny bits that are flushed down the drain.

Feed it too fast, and the motor jams, the circuit breaker pops, and you have to get the tool and unjam the motor -- if you're lucky. If you clog the drain you might have to disassemble the trap or even call the rooter-dude.
ID: 971735 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 971740 - Posted: 19 Feb 2010, 17:43:52 UTC

The biggest single problem is that our normal reaction is that anything will fit if we just push very hard. Sometimes we still think "push harder" is the right answer even after we break the tool.



Get a bigger hammer?? :-)


PROUD MEMBER OF Team Starfire World BOINC
ID: 971740 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65759
Credit: 55,293,173
RAC: 49
United States
Message 971761 - Posted: 19 Feb 2010, 18:13:17 UTC - in response to Message 971731.  

When the servers are slow, the "Project Backoff" sucks. If I leave it alone, I keep having more results waiting to upload than actually uploading. It wants to keep waiting an hour or two when uploads are working but just slow. On its own, BOINC managed to upload 18 units; in the last 20 minutes I made it upload 185 units. I'll take the flak, as I feel this is too extreme. The scheduler should know the difference between a slow connection and one that is not working. If they don't see that anything is wrong and want to tell us, maybe I can break it the rest of the way so they will fix it!

I'm sorry, my friend, but the backoffs are not big enough.

Your complaint is that when a transaction fails, it takes too long for the next retry.

My statement is "if the backoff was long enough, the very next try would almost always be successful."

According to the "30 day" graphs on Scarecrow's site, something just under 50,000 results per hour are reported. That implies that there are about 50,000 uploads per hour (call it just under fourteen per second).

SETI was struggling for a bit over 30 hours.

That's a backlog of 1,500,000 uploads, and if I remember correctly, the cap on retries is 4 hours -- meaning we're asking the upload servers to take over 100 completed uploads per second.

That won't ever happen.

So, the project wide backoff helps by saying "if one upload failed, another upload in a few seconds will also likely fail, so there is no great need to try."

... but there are many clients that don't have the project-wide backoff, and even if the project-wide backoff cut the load in half, it's still going to be a monster number.

What is needed is a mechanism to tell the clients to slow down at times like this. If we knew that the upload servers could handle 20 uploads per second, and we could tell all the clients to back-off to the point where there were exactly 20 per second, then backlogs would clear very much faster.

The biggest single problem is that our normal reaction is that anything will fit if we just push very hard. Sometimes we still think "push harder" is the right answer even after we break the tool.

My backoffs have been ongoing for nearly a week. SETI needs to get AP off its turf and add a few more servers so that acks can be sent out, as I realize the CPUs are under what amounts to a heavy bombardment. It's that, or forbid any new accounts for SETI@home until something changes. At least the CPUs haven't turned molten.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971761 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971782 - Posted: 19 Feb 2010, 18:43:04 UTC - in response to Message 971761.  

My backoffs have been ongoing for nearly a week. SETI needs to get AP off its turf and add a few more servers so that acks can be sent out, as I realize the CPUs are under what amounts to a heavy bombardment. It's that, or forbid any new accounts for SETI@home until something changes. At least the CPUs haven't turned molten.

But the problem comes from the fact that, given a choice of letting things cool off, and turning up the heat, too many people want to try to melt the CPUs.
ID: 971782 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 971784 - Posted: 19 Feb 2010, 18:44:11 UTC - in response to Message 971761.  

My backoffs have been ongoing for nearly a week. SETI needs to get AP off its turf and add a few more servers so that acks can be sent out

Or they can just figure out what the present problem is & fix it. Unlikely to require new hardware to do that.

Grant
Darwin NT
ID: 971784 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971785 - Posted: 19 Feb 2010, 18:44:23 UTC - in response to Message 971740.  

The biggest single problem is that our normal reaction is that anything will fit if we just push very hard. Sometimes we still think "push harder" is the right answer even after we break the tool.



Get a bigger hammer?? :-)

Exactly.
ID: 971785 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971786 - Posted: 19 Feb 2010, 18:45:31 UTC - in response to Message 971784.  

My backoffs have been ongoing for nearly a week. SETI needs to get AP off its turf and add a few more servers so that acks can be sent out

Or they can just figure out what the present problem is & fix it. Unlikely to require new hardware to do that.

Look at the numbers. The problem isn't the hardware on the server side, it's 100 upload attempts per second.
ID: 971786 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 971788 - Posted: 19 Feb 2010, 18:47:18 UTC - in response to Message 971724.  

Somebody rip It out

It's there to stop computers DDoS'ing the project. Max back-off will be 24 hours.
If you don't like it ... The first version to have it was 6.6.38

Project jack off........IMHO.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 971788 · Report as offensive
Profile Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 971791 - Posted: 19 Feb 2010, 18:53:49 UTC - in response to Message 971785.  

The biggest single problem is that our normal reaction is that anything will fit if we just push very hard. Sometimes we still think "push harder" is the right answer even after we break the tool.



Get a bigger hammer?? :-)

Exactly.


Well, that shows everybody who said Ned doesn't know Jack.

ID: 971791 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65759
Credit: 55,293,173
RAC: 49
United States
Message 971794 - Posted: 19 Feb 2010, 18:57:54 UTC - in response to Message 971786.  

My backoffs have been ongoing for nearly a week. SETI needs to get AP off its turf and add a few more servers so that acks can be sent out

Or they can just figure out what the present problem is & fix it. Unlikely to require new hardware to do that.

Look at the numbers. The problem isn't the hardware on the server side, it's 100 upload attempts per second.

And what is the maximum capacity to generate acks by the servers?

Are we 50% over? 100% over? Or what?
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971794 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 971802 - Posted: 19 Feb 2010, 19:03:36 UTC - in response to Message 971794.  

My backoffs have been ongoing for nearly a week. SETI needs to get AP off its turf and add a few more servers so that acks can be sent out

Or they can just figure out what the present problem is & fix it. Unlikely to require new hardware to do that.

Look at the numbers. The problem isn't the hardware on the server side, it's 100 upload attempts per second.

And what is the maximum capacity to generate acks by the servers?

Are we 50% over? 100% over? Or what?

Probably more like 250%
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
ID: 971802 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.