Panic Mode On (81) Server Problems?



Message boards : Number crunching : Panic Mode On (81) Server Problems?

Author Message
Kathy
Project donor
Joined: 5 Jan 03
Posts: 311
Credit: 4,875,303
RAC: 8,927
United States
Message 1335459 - Posted: 7 Feb 2013, 15:28:36 UTC

Haven't been able to connect for 3 days now, and get the same messages every time:

2/7/2013 10:25:47 AM | SETI@home | Reporting 81 completed tasks, requesting new tasks for CPU and ATI
2/7/2013 10:26:11 AM | SETI@home | Scheduler request failed: Couldn't connect to server
2/7/2013 10:26:15 AM | | Project communication failed: attempting access to reference site
2/7/2013 10:26:16 AM | | Internet access OK - project servers may be temporarily down.


James Sotherden
Project donor
Joined: 16 May 99
Posts: 8791
Credit: 34,109,593
RAC: 59,305
United States
Message 1335460 - Posted: 7 Feb 2013, 15:30:55 UTC
Last modified: 7 Feb 2013, 15:32:00 UTC

Yes, it seems lately we have had a lot of problems. No work, ghost work units, can't report, can't download, can't upload. Or download speeds so slow you could go to the lab and get them faster in person.

I can assure you that no one likes it. Not even the lab crew. But when you are the number 1 BOINC project in terms of active members, and underfunded, it's understandable.

I do other projects when I have no work. I'd rather not, but I do, just to keep my computers busy. And they are worthy projects in their own right.
____________

Old James

Cliff Harding
Project donor
Volunteer tester
Joined: 18 Aug 99
Posts: 978
Credit: 51,441,145
RAC: 37,041
United States
Message 1335483 - Posted: 7 Feb 2013, 16:34:53 UTC

Don't know what happened, just woke up from dozing off for about an hour and noticed that the scheduler has dumped almost a full fuel load into my fastest rig, finishing as I opened my eyes. Now if it would only bless my other rig, both machines will quit bitc(&^$$^%%$inhg.
____________


I don't buy computers, I build them!!

Gone
Joined: 31 May 99
Posts: 150
Credit: 125,774,760
RAC: 2
United Kingdom
Message 1335485 - Posted: 7 Feb 2013, 16:43:45 UTC - in response to Message 1335483.

Don't know what happened, just woke up from dozing off for about an hour and noticed that the scheduler has dumped almost a full fuel load into my fastest rig, finishing as I opened my eyes. Now if it would only bless my other rig, both machines will quit bitc(&^$$^%%$inhg.





Similar thing just happened to me, except I was only dreaming.

When I awoke all was still broken ...

:)

Cliff Harding
Project donor
Volunteer tester
Joined: 18 Aug 99
Posts: 978
Credit: 51,441,145
RAC: 37,041
United States
Message 1335490 - Posted: 7 Feb 2013, 16:55:38 UTC - in response to Message 1335485.

Don't know what happened, just woke up from dozing off for about an hour and noticed that the scheduler has dumped almost a full fuel load into my fastest rig, finishing as I opened my eyes. Now if it would only bless my other rig, both machines will quit bitc(&^$$^%%$inhg.





Similar thing just happened to me, except I was only dreaming.

When I awoke all was still broken ...

:)


My luck is still holding; I just got 2 more OpenCL units since my initial post.
____________


I don't buy computers, I build them!!

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1335495 - Posted: 7 Feb 2013, 17:09:51 UTC

Sure would be nice to have a place to get some information about what is being done to correct this issue. Or is there one and I just don't know where to look?

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 11,565,199
RAC: 12,724
United Kingdom
Message 1335496 - Posted: 7 Feb 2013, 17:19:32 UTC - in response to Message 1335495.

Sure would be nice to have a place to get some information about what is being done to correct this issue. Or is there one and I just don't know where to look?


This is the place.
Welcome to Room 101.

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1335500 - Posted: 7 Feb 2013, 17:27:22 UTC - in response to Message 1335496.

Sure would be nice to have a place to get some information about what is being done to correct this issue. Or is there one and I just don't know where to look?


This is the place.
Welcome to Room 101.


Thanks. Guess I need a crash course in reading comprehension. :)

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,124,661
RAC: 4,389
United States
Message 1335501 - Posted: 7 Feb 2013, 17:28:16 UTC - in response to Message 1335495.
Last modified: 7 Feb 2013, 17:47:59 UTC

I had been running NNT (No New Tasks) to trim out the existing work units. I don't run GPU work units from SETI for a number of reasons, so I suppose my frustration level is lower than many people's. These days, as the project repeatedly bounces its collective head against the AP mass-traffic tie-up (something which has happened many times over the past few months without a resolution), when I encounter the 'Dead SETI' scenario, I simply suspend the project on the handful of workstations still running SETI and feed projects that don't suffer from this sort of persistent reliability problem.

Perhaps at some point the folks back at the project will figure out that running the high-traffic-volume AP work units alongside the worthless, less-than-one-minute CPU work units only serves to reduce the project's effective work, and thus might be best avoided.

Until then, I plan to watch closely, run and report remaining work during the increasingly rare periods of full functionality, and, once that clears, simply watch to see whether the project demonstrates that it has learned anything.

Alternatively, it might be part of a grand plan at the project to reduce problems by pushing away users until the traffic levels are low enough to be sustained.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5811
Credit: 58,788,301
RAC: 48,478
Australia
Message 1335513 - Posted: 7 Feb 2013, 17:59:36 UTC - in response to Message 1335501.


Woke up this morning to find both systems rapidly running out of work. Large backoffs due to not being able to connect to the Scheduler when it tries to do so.
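The "large backoffs" come from the client waiting longer after each failed attempt to reach the Scheduler. A minimal sketch of the general idea, assuming a capped exponential backoff with random jitter; the function `backoff_delays` and its base (60 s), cap (4 h), and jitter range are hypothetical placeholders, not BOINC's actual code or constants.

```python
import random


def backoff_delays(failures: int, base_s: float = 60.0, cap_s: float = 4 * 3600.0,
                   seed: int = 0) -> list[float]:
    """Delay in seconds before each retry after consecutive failed connections.

    Hypothetical capped exponential backoff with jitter: the ceiling doubles
    with each failure up to cap_s, and the actual delay is drawn from
    [ceiling / 2, ceiling] so that many clients don't retry in lockstep.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    delays = []
    for attempt in range(failures):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays


for n, delay in enumerate(backoff_delays(8), start=1):
    print(f"after failure {n}: wait about {delay / 60:.1f} min")
```

After only a handful of consecutive failures the wait grows from about a minute to hours, which is why a host can drain its cache while it sits in backoff.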
____________
Grant
Darwin NT.

KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 11,565,199
RAC: 12,724
United Kingdom
Message 1335515 - Posted: 7 Feb 2013, 18:07:27 UTC - in response to Message 1335513.


Woke up this morning to find both systems rapidly running out of work. Large backoffs due to not being able to connect to the Scheduler when it tries to do so.

This is obviously the start of a 1920s blues song.
"Woke up this morning,
Found I was nearly out of work.
Woke up this morning,
etc...

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3413
Credit: 46,478,578
RAC: 8,143
Russia
Message 1335519 - Posted: 7 Feb 2013, 18:13:19 UTC
Last modified: 7 Feb 2013, 18:13:40 UTC

Can't report tasks, scheduler not responding. Is anyone else seeing the same now?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5811
Credit: 58,788,301
RAC: 48,478
Australia
Message 1335531 - Posted: 7 Feb 2013, 18:36:53 UTC - in response to Message 1335519.
Last modified: 7 Feb 2013, 18:44:51 UTC

Can't report tasks, scheduler not responding. Is anyone else seeing the same now?

For about 3-4 days now.


EDIT: and if you are able to contact the Scheduler and don't get a "Failure when receiving data from the peer" message, it takes 2-5 min to get a response.
____________
Grant
Darwin NT.

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,807,818
RAC: 2,219
United States
Message 1335544 - Posted: 7 Feb 2013, 19:05:48 UTC
Last modified: 7 Feb 2013, 19:44:55 UTC

Now here is a possible cause that super crunchers likely won't realize. Because the 100-unit limit is less than what I would normally have based on my queue size, the client requests more units every five minutes, only to be rebuffed (when I get through) due to the 100-unit limit. Super crunchers are likely to actually have completed units to report every five minutes, but I complete maybe only 2-3 units per hour on average. So 9 or 10 more requests per hour than are required are sent to the server. Now multiply that by all the hosts that are crunching as "slow" as me or slower (daily RAC ranks my host in the top 4,500-5,000 range), and that adds up to a lot of meaningless requests to the server.

Could that be causing a problem?
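The arithmetic behind the "9 or 10" figure can be sketched as follows. This assumes, as the post describes, a work request every 5 minutes (12 per hour) and, as a simplification, that at best each completed task justifies one request; the function names and any host count passed in are hypothetical, for illustration only.

```python
def wasted_requests_per_hour(request_interval_min: float, completions_per_hour: int) -> int:
    """Scheduler requests per hour from one host that carry no completed task.

    Assumes the client asks for work on a fixed interval and that, at best,
    each completed task justifies one request (a simplifying assumption).
    """
    requests_per_hour = int(60 // request_interval_min)
    useful = min(completions_per_hour, requests_per_hour)
    return requests_per_hour - useful


def fleet_wasted_requests(hosts: int, request_interval_min: float, completions_per_hour: int) -> int:
    """Scale the per-host figure to a hypothetical number of similar hosts."""
    return hosts * wasted_requests_per_hour(request_interval_min, completions_per_hour)


# Figures from the post: a request every 5 minutes, 2-3 completed tasks per hour.
for done in (2, 3):
    print(done, "completions/h ->", wasted_requests_per_hour(5, done), "extra requests/h")
```

With 12 requests per hour and 2 or 3 completions, this gives 10 or 9 extra requests per hour per host, and the fleet function shows how quickly that scales with the number of slower hosts.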
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5811
Credit: 58,788,301
RAC: 48,478
Australia
Message 1335548 - Posted: 7 Feb 2013, 19:15:57 UTC - in response to Message 1335544.

Could that be causing a problem?

It wouldn't be helping with the bandwidth issues, but it wouldn't be causing a problem in its own right.
We've had Scheduler issues on & off for months now, but the current one began about 4 days ago.
Prior to that, for a while at least, Scheduler responses were nice & quick. Now they take forever, if you can connect & if you don't get an error once you've done so.

____________
Grant
Darwin NT.

S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 161
Credit: 16,242,894
RAC: 24,714
Netherlands
Message 1335551 - Posted: 7 Feb 2013, 19:18:34 UTC

Sometime this afternoon the server "granted" my main rig 200 WUs... only for me to discover that all the GPU ones were shorties, increasing the server load even more :S

ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 108
Credit: 135,717,275
RAC: 241,880
United States
Message 1335558 - Posted: 7 Feb 2013, 19:41:36 UTC - in response to Message 1335551.

Sometime this afternoon the server "granted" my main rig 200 WUs... only for me to discover that all the GPU ones were shorties, increasing the server load even more :S

I hate it when I get tons of shorties. Too much bandwidth for too little credit.


BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 12,124,661
RAC: 4,389
United States
Message 1335560 - Posted: 7 Feb 2013, 19:50:47 UTC - in response to Message 1335558.

Given the evidence of the past several months, it seems that no solution is envisioned or affordable for the clear bandwidth choke point, so perhaps some software filtering solution might instead be a reasonable approach to mitigating the obvious problem.

Of course, there is another approach, which takes no effort at all: as people realize that the project's reliability has in fact been seriously compromised by traffic levels, they may do as Greeley suggested, 'go elsewhere, dear user'. There are in fact many other, more reliable and interesting projects out there.



Sometime this afternoon the server "granted" my main rig 200 WUs... only for me to discover that all the GPU ones were shorties, increasing the server load even more :S

I hate it when I get tons of shorties. Too much bandwidth for too little credit.

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,807,818
RAC: 2,219
United States
Message 1335561 - Posted: 7 Feb 2013, 19:51:14 UTC - in response to Message 1335548.
Last modified: 7 Feb 2013, 19:51:54 UTC

Could that be causing a problem?

It wouldn't be helping with the bandwidth issues, but it wouldn't be causing a problem in its own right.
We've had Scheduler issues on & off for months now, but the current one began about 4 days ago.
Prior to that, for a while at least, Scheduler responses were nice & quick. Now they take forever, if you can connect & if you don't get an error once you've done so.


I know, but it's acting a lot like a memory leak. Everything stays responsive right up to the point where the application starts hitting the swap space hard, and then it becomes slow and unresponsive.

Not saying it's a memory leak, just that it's acting as if some buffer that isn't being emptied quite as fast as it is being filled finally overflows, and then it becomes a crapshoot whether or not the next item actually gets into the buffer or is lost.
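The buffer hypothesis can be illustrated with a toy model: a fixed-size queue that receives slightly more requests per tick than it can drain. This is only a sketch of the idea, not a description of how the SETI@home scheduler actually works; the function `simulate_bounded_buffer` and the capacity and rates used below are hypothetical.

```python
def simulate_bounded_buffer(capacity: int, arrivals_per_tick: int,
                            served_per_tick: int, ticks: int) -> list[tuple[int, int]]:
    """Track a fixed-size request buffer that is filled faster than it is drained.

    Returns one (backlog_after_service, dropped_this_tick) pair per tick.
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        # Only as many arrivals fit as there is free space; the rest are lost.
        accepted = min(arrivals_per_tick, capacity - backlog)
        dropped = arrivals_per_tick - accepted
        backlog += accepted
        # The server drains up to served_per_tick items from the backlog.
        backlog -= min(served_per_tick, backlog)
        history.append((backlog, dropped))
    return history


# Hypothetical numbers: 5 requests arrive per tick, only 4 are handled.
for tick, (backlog, dropped) in enumerate(simulate_bounded_buffer(10, 5, 4, 10), start=1):
    print(f"tick {tick}: backlog={backlog} dropped={dropped}")
```

With these numbers nothing is lost for the first six ticks while the backlog quietly grows; from then on the buffer is full at every arrival and one request per tick is dropped, matching the "fine until suddenly it isn't" pattern described above.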
____________
"Life is just nature's way of keeping meat fresh." - The Doctor



Copyright © 2014 University of California