Panic Mode On (18) Server problems

Message boards : Number crunching : Panic Mode On (18) Server problems

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13765
Credit: 208,696,464
RAC: 304
Australia
Message 913759 - Posted: 3 Jul 2009, 22:18:48 UTC - in response to Message 913755.  

How did it clear magically like that?

Enough work had been returned that the servers weren't being hammered by clients trying to return work.
Also the download traffic has almost dropped to nothing, so that frees up some bandwidth & server & scheduler resources as well.
Grant
Darwin NT
ID: 913759
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14655
Credit: 200,643,578
RAC: 874
United Kingdom
Message 913760 - Posted: 3 Jul 2009, 22:22:53 UTC

When a host's upload queue drops below 2*nCPUs, it's away - there's nothing to stop it. Once the ninth-but-last, or seventeenth-but-last, or whatever, upload goes through, new work requests start up with a vengeance, and it can download hundreds in the next few minutes. That could drain any ready-to-send queue very quickly indeed; and with Matt's recent change to a temporary table for ready-to-send, instead of a free-running query from each splitter, it could take a while for new work generation to pick up.
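
The threshold described above can be sketched as a simple predicate. This is an illustration only: the function name `may_request_work` and the exact comparison (strictly below 2*nCPUs) are assumptions taken from the wording of this post, not from BOINC's client code.

```python
def may_request_work(pending_uploads: int, n_cpus: int) -> bool:
    """Return True when the upload backlog is small enough for work requests.

    Assumption: the limit is 2 * n_cpus, as described in the post; the real
    client's check may differ in detail.
    """
    return pending_uploads < 2 * n_cpus


# A quad-core host: requests stay blocked at 8 pending uploads, resume at 7.
print(may_request_work(8, 4))  # False
print(may_request_work(7, 4))  # True
```

Under this model a host with many cores has a higher threshold, so it starts asking for work again while more of its uploads are still outstanding.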
ID: 913760
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913761 - Posted: 3 Jul 2009, 22:37:18 UTC

So there is no longer an upload crisis?
ID: 913761
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 913764 - Posted: 3 Jul 2009, 22:42:04 UTC - in response to Message 913761.  

So there is no longer an upload crisis?

Correct. (Until the next one...)

Note that if the downloads remain borked for a significant length of time, then when they come back the resulting pressure on bandwidth could cause another upload crisis; and so it goes on...

F.
ID: 913764
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13765
Credit: 208,696,464
RAC: 304
Australia
Message 913765 - Posted: 3 Jul 2009, 22:42:07 UTC - in response to Message 913761.  

So there is no longer an upload crisis?

Not at the moment; all mine are uploading as they finish. It'll probably be a few hours, though, before all the outstanding work has been uploaded.
Grant
Darwin NT
ID: 913765
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14655
Credit: 200,643,578
RAC: 874
United Kingdom
Message 913766 - Posted: 3 Jul 2009, 22:44:51 UTC - in response to Message 913761.  

So there is no longer an upload crisis?

There was never a crisis. There was a bit of a squash, like fans leaving a football ground at the end of a match, but game's over now and the freeway is flowing again.

As Ned keeps (very properly) reminding us, BOINC was designed to cope with these ups and downs. It quietly goes about its business, backing off at the first sign of trouble, and biding its time with saintly patience. It's only us humans that use words like 'panic' and 'crisis'.
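
For readers unfamiliar with the idea, the "backing off" mentioned above can be sketched as randomized exponential backoff. This is a generic illustration, not BOINC's implementation: the function name `backoff_delay` and the constants (60-second base, 4-hour cap) are assumptions chosen for the example.

```python
import random
from typing import Optional


def backoff_delay(failures: int, base: float = 60.0, cap: float = 14400.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before the next retry after `failures` consecutive failures.

    The delay ceiling doubles with each failure up to `cap`. The actual delay
    is drawn at random between half the ceiling and the ceiling, so that many
    clients do not all retry at the same instant. All constants are
    illustrative assumptions.
    """
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** max(0, failures - 1)))
    return rng.uniform(ceiling / 2, ceiling)


# Print one sample delay for each of the first five consecutive failures.
for n in range(1, 6):
    print(n, round(backoff_delay(n)))
```

The random spread is what keeps thousands of hosts from hammering the server in lockstep once it recovers.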
ID: 913766
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913767 - Posted: 3 Jul 2009, 22:49:52 UTC - in response to Message 913766.  

So I can tell my friend (who somehow got 109 tasks in under 10 hours) to run an update command now?

(Update also forces the uploads, right?)
ID: 913767
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13765
Credit: 208,696,464
RAC: 304
Australia
Message 913768 - Posted: 3 Jul 2009, 22:56:41 UTC - in response to Message 913767.  


Just tell them to make sure network access is enabled & then just let it do its thing. They'll all upload the next time they try.
Grant
Darwin NT
ID: 913768
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14655
Credit: 200,643,578
RAC: 874
United Kingdom
Message 913770 - Posted: 3 Jul 2009, 22:59:47 UTC - in response to Message 913767.  

So I can tell my friend (who somehow got 109 tasks in under 10 hours) to run an update command now?

(Update also forces the uploads, right?)

Wrong, I'm afraid.

When the job is finished, two things happen, in this order:

First, the data has to be uploaded. That's a straight file transfer.

Once the data resides on the server's disk (following a successful transfer), the job can be reported.

You can't force anything. You can instigate a 'please try again' by visiting the Transfers tab (in advanced view), selecting a pending upload, and clicking the 'Retry now' button. If that seems to work consistently, you can re-submit the entire queue by choosing "Retry communications" from the Advanced drop-down menu.

Once all the uploads have completed, then you can use the Update button to instigate the second, reporting stage. But you may find that BOINC relieves you of this minor chore by requesting new work, and reporting what it's done already while it's at it.
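
The two stages described above can be modelled as a tiny state machine. This is a sketch to show the ordering only; the names `Stage`, `Task`, `retry_upload` and `update` are hypothetical and do not correspond to BOINC's internal code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    COMPUTED = auto()   # job finished, output not yet on the server
    UPLOADED = auto()   # output file transferred successfully
    REPORTED = auto()   # server has been told the job is done


@dataclass
class Task:
    name: str
    stage: Stage = Stage.COMPUTED


def retry_upload(task: Task, transfer_ok: bool) -> None:
    """Stage 1: a plain file transfer; the task advances only if it succeeds."""
    if task.stage is Stage.COMPUTED and transfer_ok:
        task.stage = Stage.UPLOADED


def update(tasks: list[Task]) -> list[str]:
    """Stage 2: report every task whose upload has completed; skip the rest."""
    reported = []
    for task in tasks:
        if task.stage is Stage.UPLOADED:
            task.stage = Stage.REPORTED
            reported.append(task.name)
    return reported


tasks = [Task("wu_1"), Task("wu_2")]
print(update(tasks))            # [] : nothing uploaded yet, so nothing to report
retry_upload(tasks[0], transfer_ok=True)
print(update(tasks))            # ['wu_1']
```

In this model an update on its own never moves a task out of the first stage, which is the point of the answer above.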
ID: 913770
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913774 - Posted: 3 Jul 2009, 23:02:11 UTC - in response to Message 913770.  

K

I found it strange that it reported so many tasks and then just stopped; I assumed that all of the tasks it had were stuck in the random backoffs.

Unfortunately, I can't seem to get into his client through the select computer command.
ID: 913774
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 913785 - Posted: 3 Jul 2009, 23:24:38 UTC - in response to Message 913755.  

So it's all over now? No more failed uploads?

How did it clear magically like that?

Because there is a huge difference between running at 95% capacity, and running at 105% capacity. The moment we dipped under about 97% we went from most uploads failing to most uploads working, and then the need went away.
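
A toy queue model shows why a few percent either side of capacity makes such a difference. The function name `backlog_after` and all the numbers are assumptions picked for illustration; this is not a measurement of the real servers.

```python
def backlog_after(ticks: int, arrivals: int, capacity: int, start: int = 0) -> int:
    """Uploads still waiting after `ticks` intervals of a simple queue model.

    Each interval, `arrivals` new uploads join the queue and the server clears
    at most `capacity` of them. Units and numbers are illustrative.
    """
    backlog = start
    for _ in range(ticks):
        backlog = max(0, backlog + arrivals - capacity)
    return backlog


# Offered load at 105% of capacity: the backlog keeps growing.
print(backlog_after(100, arrivals=105, capacity=100))            # 500
# Offered load at 95% of capacity: even a backlog of 500 drains away.
print(backlog_after(100, arrivals=95, capacity=100, start=500))  # 0
```

Above capacity the queue never catches up however long you wait; below it, any backlog is eventually cleared.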
ID: 913785
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 913803 - Posted: 3 Jul 2009, 23:52:43 UTC - in response to Message 913766.  
Last modified: 3 Jul 2009, 23:54:56 UTC


Well, you might say there wasn't a "crisis", but I had 4 crunchers that I wanted to reduce to 2 due to the heat in my flat, and they couldn't upload for DAYS and DAYS. I didn't want to abort for the sake of the wingman, as it isn't their fault. I have finally decided to stop ALL seti@home as I cannot control it. The current temperature in my flat is 33°C (91°F) at 0:48 am!! with only 2 machines running. No, seti@home is out of user control, and whilst I am crunching WCG, I will finish this weekend and my 2 redundant crunchers are destined for the council recycle bin.

Bernie

PS If anyone is interested they are BAV-3 and BAV-4 in my list
ID: 913803
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913805 - Posted: 3 Jul 2009, 23:54:50 UTC - in response to Message 913803.  

I'LL TAKE THOSE CRUNCHERS OFF OF YOUR HANDS!!!!

(seriously, we need to find a way to reroute that garbage truck to Illinois)
ID: 913805
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 913807 - Posted: 4 Jul 2009, 0:00:04 UTC - in response to Message 913805.  

I'LL TAKE THOSE CRUNCHERS OFF OF YOUR HANDS!!!!

(seriously, we need to find a way to reroute that garbage truck to Illinois)


LOL, don't think dustcarts (what we call 'em in the UK) do transatlantic crossings really well!!

Bernie


ID: 913807
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913809 - Posted: 4 Jul 2009, 0:10:20 UTC - in response to Message 913807.  

FedEx work for you?
ID: 913809
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 913825 - Posted: 4 Jul 2009, 1:05:54 UTC - in response to Message 913803.  
Last modified: 4 Jul 2009, 1:07:29 UTC

LOL...

I don't need any more old timers, got enough already! ;-)

Just an FYI: Only you can see the host names you have for your machines. All the rest of us can only see their HID.

Alinator
ID: 913825
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13765
Credit: 208,696,464
RAC: 304
Australia
Message 913830 - Posted: 4 Jul 2009, 1:14:55 UTC - in response to Message 913825.  


Hmm.
Something's borked. Assimilator queue is draining, uploads are tapering off. But the splitters still haven't kicked in.
Grant
Darwin NT
ID: 913830
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913836 - Posted: 4 Jul 2009, 1:30:55 UTC - in response to Message 913830.  

Assimilator... what exactly is getting absorbed?
ID: 913836
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13765
Credit: 208,696,464
RAC: 304
Australia
Message 913840 - Posted: 4 Jul 2009, 1:41:44 UTC - in response to Message 913836.  

Assimilator... what exactly is getting absorbed?

Results that have been validated get moved into the Science database.
Grant
Darwin NT
ID: 913840
Vistro
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 913841 - Posted: 4 Jul 2009, 1:46:45 UTC

Yeah... that table is a little confusing...

I just want to know the following, really, at any given time:

How many people are waiting to upload

How many tasks are ready to be shipped out right now

How many are in the feeder ready to be put into the scheduler

What day are we at
ID: 913841


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.