Panic Mode On (70) Server problems?

Message boards : Number crunching : Panic Mode On (70) Server problems?

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1203311 - Posted: 7 Mar 2012, 11:21:49 UTC
Last modified: 7 Mar 2012, 11:25:35 UTC

07/03/2012 15:20:18 SETI@home Reporting 259 completed tasks, requesting new tasks for CPU and GPU
07/03/2012 15:20:40 Project communication failed: attempting access to reference site
07/03/2012 15:20:40 SETI@home Scheduler request failed: Couldn't connect to server


07/03/2012 15:24:46 SETI@home Reporting 261 completed tasks, requesting new tasks for CPU and GPU
07/03/2012 15:25:08 Project communication failed: attempting access to reference site
07/03/2012 15:25:08 SETI@home Scheduler request failed: Couldn't connect to server
07/03/2012 15:25:10 Internet access OK - project servers may be temporarily down.
ID: 1203311 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1203313 - Posted: 7 Mar 2012, 11:28:30 UTC

yes, Synergy has trouble handling the connections.

It was running smoothly at first, I wonder what changed.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1203313 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1203317 - Posted: 7 Mar 2012, 11:56:03 UTC - in response to Message 1203313.  

yes, Synergy has trouble handling the connections.

It was running smoothly at first, I wonder what changed.

Probably too many hosts reporting at once; some of them would have backed off earlier while the project was down for maintenance.

Claggy
ID: 1203317 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1203323 - Posted: 7 Mar 2012, 12:43:55 UTC

I would vote for Synergy just being overloaded. Look at the server status page: half of the list is on Synergy. Granted, some of them are disabled, but it's still a lot of resource-intensive processes running simultaneously.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1203323 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1203338 - Posted: 7 Mar 2012, 13:56:59 UTC

Looking back in my logs I see the expected "Scheduler request failed: HTTP gateway timeout" messages after the maintenance completed. Since then, most of my requests have been met with "Project has no tasks available" or "This computer has reached a limit on tasks in progress".

It is a little odd seeing several "no tasks" messages and then the limit message. It seems to me that the logic to check the limit would go before the check for available tasks. Through the logs I see the limit-reached response arriving, on average, much faster than the no-tasks response. The difference doesn't seem great, though: in the 0-3 second range for the limit response and 5-30 seconds for no tasks.
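For anyone who wants to repeat this comparison on their own logs, here is a minimal Python sketch. It assumes log lines in the pipe-separated "date time | project | message" form that appears elsewhere in this thread, and it treats the first matching outcome line after a "Sending scheduler request" line as the response. The function name `response_times` and the `OUTCOMES` list are made up for illustration; the sample lines are copied from a log posted in this thread.

```python
from collections import defaultdict
from datetime import datetime

# Outcome messages quoted in this thread; extend the tuple as needed.
OUTCOMES = (
    "Scheduler request failed",
    "Project has no tasks available",
    "This computer has reached a limit on tasks in progress",
)


def response_times(lines, outcomes=OUTCOMES):
    """Map each outcome message to a list of seconds between request and outcome."""
    sent_at = None
    delays = defaultdict(list)
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # not a "timestamp | project | message" line
        stamp, _project, message = parts
        try:
            when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        except ValueError:
            continue  # timestamp is in some other format
        if message.startswith("Sending scheduler request"):
            sent_at = when
        elif sent_at is not None and message.startswith(outcomes):
            delays[message].append((when - sent_at).total_seconds())
            sent_at = None  # wait for the next request
    return dict(delays)


log = [
    "07/03/2012 14:54:04 | SETI@home | Sending scheduler request: To fetch work.",
    "07/03/2012 14:54:04 | SETI@home | Requesting new tasks for NVIDIA GPU",
    "07/03/2012 14:55:27 | SETI@home | Scheduler request failed: HTTP internal server error",
    "07/03/2012 14:57:29 | SETI@home | Sending scheduler request: To fetch work.",
    "07/03/2012 14:57:29 | SETI@home | Reporting 1 completed tasks, requesting new tasks for NVIDIA GPU",
    "07/03/2012 14:57:51 | SETI@home | Scheduler request failed: Couldn't connect to server",
]

for message, seconds in response_times(log).items():
    print(message, seconds)
```

Averaging each list then gives a per-message response time that can be compared across the "limit" and "no tasks" replies.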
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1203338 · Report as offensive
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1203360 - Posted: 7 Mar 2012, 14:59:52 UTC

And now at 14:58hrs GMT
07/03/2012 14:54:04 | SETI@home | Sending scheduler request: To fetch work.
07/03/2012 14:54:04 | SETI@home | Requesting new tasks for NVIDIA GPU
07/03/2012 14:55:27 | SETI@home | Scheduler request failed: HTTP internal server error
07/03/2012 14:57:29 | SETI@home | Sending scheduler request: To fetch work.
07/03/2012 14:57:29 | SETI@home | Reporting 1 completed tasks, requesting new tasks for NVIDIA GPU
07/03/2012 14:57:51 | SETI@home | Scheduler request failed: Couldn't connect to server
07/03/2012 14:57:54 | | Project communication failed: attempting access to reference site
07/03/2012 14:57:56 | | Internet access OK - project servers may be temporarily down.

Are we back to square one?


Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1203360 · Report as offensive
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1203372 - Posted: 7 Mar 2012, 16:06:36 UTC - in response to Message 1203370.  
Last modified: 7 Mar 2012, 16:09:01 UTC

Just got 4 x GPU tasks.. smidgin after 08:00 PT:-)
Someone [?Matt?] must have gotten in early and given a server or two a boot in the OS's:-)
[edit]

And 7 mins later BOINC asks for more crunchies and gets told to go play with itself.. There ain't no more available.

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1203372 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1203394 - Posted: 7 Mar 2012, 16:57:31 UTC

The more whining I hear about the limits, the more I am tempted to say next time a problem crops up 'sod it' and just leave you big guys to your own devices.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1203394 · Report as offensive
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1203405 - Posted: 7 Mar 2012, 17:27:25 UTC - in response to Message 1203379.  
Last modified: 7 Mar 2012, 17:30:07 UTC

There are always 200,000 to 300,000 WUs ready to send, but I never get anything.

One part of the problem is the tiny-minuscule-microscopic-subatomic-nothingness server sub-cache, which contains nothing: 2 people out of 500-3000 queries/second get something and all the others get nothing-niet-nada.

And you cannot ask again for another 5 minutes :(
And if you don't get an answer, you cannot ask again for another 5 minutes.

That server cache NEEDS to be more than doubled; it needs to be 10X bigger.

If it's 100 tasks cached per second, it needs to be 1,000.
If it's 1,000 tasks cached per minute, it needs to be 10,000!


Another part of the problem is the low limit on the tasks a PC can get. We aren't in the 1990s anymore: today's crunchers have 20X-200X-2000X the processing power we had last century. The maximum number of tasks should be based on the RAC the PC has, not on a fixed limit regardless of how much power you have.


With a RAC of 19,743.64 I can't see why you have a big problem with the current limits!
ID: 1203405 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1203435 - Posted: 7 Mar 2012, 18:38:02 UTC - in response to Message 1203405.  


Network traffic is still very ragged, and my log is full of "Scheduler request failed: Couldn't connect to server" & "Scheduler request failed: HTTP internal server error" messages.
Grant
Darwin NT
ID: 1203435 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1203436 - Posted: 7 Mar 2012, 18:39:31 UTC - in response to Message 1203394.  

The more whining I hear about the limits, the more I am tempted to say next time a problem crops up 'sod it' and just leave you big guys to your own devices.

It would be nice if they could sort out the DCF problem. It's been a while now.
Grant
Darwin NT
ID: 1203436 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1203438 - Posted: 7 Mar 2012, 18:43:21 UTC

No errors on my 5 machines just


1513 SETI@home 07/03/2012 18:31:02 Project has no tasks available

ID: 1203438 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1203439 - Posted: 7 Mar 2012, 18:46:22 UTC - in response to Message 1203438.  

No errors on my 5 machines just


1513 SETI@home 07/03/2012 18:31:02 Project has no tasks available

I've got a few of those, along with the odd request that does result in work.
But most of the requests result in an error message.
Grant
Darwin NT
ID: 1203439 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1203443 - Posted: 7 Mar 2012, 19:06:54 UTC - in response to Message 1203442.  
Last modified: 7 Mar 2012, 19:07:15 UTC


With a RAC of 19,743.64 I can't see why you have a big problem with the current limits!


If it's me: I have 27k RAC (24k SETI) and it should be 35k SETI, 0 everything else.

He was referring to your fastest computer, not your total.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1203443 · Report as offensive
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1203488 - Posted: 7 Mar 2012, 21:15:31 UTC - in response to Message 1203442.  


With a RAC of 19,743.64 I can't see why you have a big problem with the current limits!


If it's me: I have 27k RAC (24k SETI) and it should be 35k SETI, 0 everything else.


It's the per-computer RAC that matters, and yours are 19,818.22 and 7,865.20. No sensible regime could be based on the overall RAC.

ID: 1203488 · Report as offensive
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22221
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1203489 - Posted: 7 Mar 2012, 21:16:29 UTC
Last modified: 7 Mar 2012, 21:21:56 UTC

The number of tasks available for distribution sits at around 200k. There are upper and lower limits in place, and the pool cycles between the two.

They are distributed in lots of 100. When a lot is assigned to a cruncher, another lot of 100 is requested. If you make your request for new work when there are some available to be assigned, you will get some; otherwise you will get the message about no work being available. Some crunchers are far better than others at hitting the short window when work is available - it's a fact of life... (One of my crunchers gets work about two out of three attempts, the other about one in five - there are no prizes for guessing which one wants the most work....)

RAC (Recent Average Credit) is a sort of rolling average. It is supposed to smooth out the lumps and bumps, but it is very much more sensitive to a period of low credit, such as happens during an outage, than to a sudden burst of high credit, such as might happen shortly after an extended outage like the one we've just been through. If you are really that anxious about your rate of credit accrual, it would be far better to generate your own figures from the raw data, which will show trends more clearly.
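As a rough way to explore how a rolling average of this kind reacts to an outage followed by a catch-up burst, here is a small Python sketch. It models RAC as an exponentially decayed daily average. The seven-day half-life, the once-a-day update, and the credit figures are all assumptions chosen for illustration; none of them comes from this thread, and the names `update_rac` and `simulate` are hypothetical.

```python
HALF_LIFE_DAYS = 7.0  # assumed half-life, for illustration only


def update_rac(rac, credit_today, half_life_days=HALF_LIFE_DAYS):
    """Blend one day's granted credit into the running average."""
    weight = 0.5 ** (1.0 / half_life_days)  # share of the old average kept per day
    return rac * weight + (1.0 - weight) * credit_today


def simulate(start_rac, daily_credits):
    """Return the average after each day in daily_credits."""
    rac = start_rac
    history = []
    for credit in daily_credits:
        rac = update_rac(rac, credit)
        history.append(rac)
    return history


# Made-up scenario: three days of outage, one catch-up day, then normal days.
days = [0, 0, 0, 4000, 1000, 1000, 1000]
for day, rac in enumerate(simulate(1000.0, days), start=1):
    print(f"day {day}: credit {days[day - 1]:>5} -> average {rac:7.1f}")
```

Swapping in your own daily credit totals in place of `days` is one way to "generate your own figures from the raw data" and compare them with what the smoothed number shows.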
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1203489 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1203492 - Posted: 7 Mar 2012, 21:28:21 UTC - in response to Message 1203489.  

The number of tasks available for distribution sits at around 200k. There are upper and lower limits in place, and the pool cycles between the two.

They are distributed in lots of 100. When a lot is assigned to a cruncher, another lot of 100 is requested. If you make your request for new work when there are some available to be assigned, you will get some; otherwise you will get the message about no work being available. Some crunchers are far better than others at hitting the short window when work is available - it's a fact of life... (One of my crunchers gets work about two out of three attempts, the other about one in five - there are no prizes for guessing which one wants the most work....)

That seems a common observation.

I do wonder if what happens when a large request from a fast computer comes in might be:

Quick check to see if there are any in the 100 feeder-lot.
OK, there are, we can continue.
Long time spent reconciling the work in progress with work allocated, seeing if any need to be resent.
Long time spent housekeeping on the work being reported, and acknowledging it.
Er, what was the question again?
Ooops, they've all gone - those slippery little 1-WU requests have slipped in and out again, emptying the pot.

LOL.
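The sequence speculated about above can be played out in a toy Python simulation. This is only a sketch of the guess, not of how the scheduler is known to work: it assumes a fixed pot that is never refilled, that every request takes its tasks only once its housekeeping is finished, and that small requests have almost no housekeeping. The function name `simulate_pot` and all timings are invented for illustration.

```python
def simulate_pot(pot_size, requests):
    """Hand out tasks from a fixed pot.

    requests is a list of (name, arrival_s, housekeeping_s, wanted) tuples.
    A request takes its tasks at arrival_s + housekeeping_s, so a host with
    a lot of housekeeping can be overtaken by hosts that arrived later.
    """
    grabs = sorted(
        (arrival + housekeeping, name, wanted)
        for name, arrival, housekeeping, wanted in requests
    )
    remaining = pot_size
    granted = {}
    for _grab_time, name, wanted in grabs:
        got = min(wanted, remaining)
        remaining -= got
        granted[name] = got
    return granted


# One fast host reporting a lot of work, then 100 single-task requests.
big = ("big", 0.0, 20.0, 50)
small = [(f"small{i}", 0.1 * i, 0.5, 1) for i in range(1, 101)]

slow_housekeeping = simulate_pot(100, [big] + small)
no_housekeeping = simulate_pot(100, [("big", 0.0, 0.0, 50)] + small)
print("big host with 20 s of housekeeping gets:", slow_housekeeping["big"])
print("big host with no housekeeping gets:", no_housekeeping["big"])
```

In this toy model the outcome for the big host depends entirely on whether its housekeeping finishes before the small requests drain the pot, which is the effect being joked about.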
ID: 1203492 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1203529 - Posted: 8 Mar 2012, 0:32:52 UTC

My AP-only cache is nearing empty. I've got a little more than 1 full day left. It was full at 10 days before the problems arose. I'll be idle waiting for the v6 theory to be implemented.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1203529 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1203593 - Posted: 8 Mar 2012, 5:06:48 UTC - in response to Message 1202414.  
Last modified: 8 Mar 2012, 5:10:25 UTC

Still getting "Scheduler request failed: Couldn't connect to server" messages, but not as many as I had been. Now it's mostly "Project has no tasks available" messages. Every now & then I get a WU or 2.
Network traffic is still looking very ragged.
Grant
Darwin NT
ID: 1203593 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1203614 - Posted: 8 Mar 2012, 7:28:19 UTC

My single-core machine is staying full. Every time it asks for work, it gets 1 MB.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1203614 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.