Panic Mode On (81) Server Problems?


Message boards : Number crunching : Panic Mode On (81) Server Problems?

Kathy
Joined: 5 Jan 03
Posts: 310
Credit: 4,527,781
RAC: 10,203
United States
Message 1335459 - Posted: 7 Feb 2013, 15:28:36 UTC

Haven't been able to connect for 3 days now, and get the same messages every time:

2/7/2013 10:25:47 AM | SETI@home | Reporting 81 completed tasks, requesting new tasks for CPU and ATI
2/7/2013 10:26:11 AM | SETI@home | Scheduler request failed: Couldn't connect to server
2/7/2013 10:26:15 AM | | Project communication failed: attempting access to reference site
2/7/2013 10:26:16 AM | | Internet access OK - project servers may be temporarily down.

____________

Profile James Sotherden
Joined: 16 May 99
Posts: 8554
Credit: 31,480,458
RAC: 57,505
United States
Message 1335460 - Posted: 7 Feb 2013, 15:30:55 UTC
Last modified: 7 Feb 2013, 15:32:00 UTC

Yes, it seems lately we have had a lot of problems. No work, ghost work units, can't report, can't download, can't upload. Or download speeds so slow you could go to the lab and get them faster in person.

I can assure you that no one likes it. Not even the lab crew. But when you are the number 1 BOINC project in terms of active members and underfunded, it's understandable.

I do other projects when I have no work. I'd rather not, but I do, just to keep my computers busy. And they are worthy projects in their own right.
____________

Old James

Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 892
Credit: 49,484,102
RAC: 38,702
United States
Message 1335483 - Posted: 7 Feb 2013, 16:34:53 UTC

Don't know what happened, just woke up from dozing off for about an hour and noticed that the scheduler has dumped almost a full fuel load into my fastest rig, finishing as I opened my eyes. Now if it would only bless my other rig, both machines will quit bitc(&^$$^%%$inhg.
____________


I don't buy computers, I build them!!

Gone
Joined: 31 May 99
Posts: 150
Credit: 125,774,629
RAC: 154
United Kingdom
Message 1335485 - Posted: 7 Feb 2013, 16:43:45 UTC - in response to Message 1335483.

Don't know what happened, just woke up from dozing off for about an hour and noticed that the scheduler has dumped almost a full fuel load into my fastest rig, finishing as I opened my eyes. Now if it would only bless my other rig, both machines will quit bitc(&^$$^%%$inhg.





Similar thing just happened to me, except I was only dreaming.

When I awoke all was still broken ...

:)
____________

Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 892
Credit: 49,484,102
RAC: 38,702
United States
Message 1335490 - Posted: 7 Feb 2013, 16:55:38 UTC - in response to Message 1335485.

Don't know what happened, just woke up from dozing off for about an hour and noticed that the scheduler has dumped almost a full fuel load into my fastest rig, finishing as I opened my eyes. Now if it would only bless my other rig, both machines will quit bitc(&^$$^%%$inhg.





Similar thing just happened to me, except I was only dreaming.

When I awoke all was still broken ...

:)


My luck is still holding, just got 2 more Opencl units since my initial post.
____________


I don't buy computers, I build them!!

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1335495 - Posted: 7 Feb 2013, 17:09:51 UTC

Sure would be nice to have a place to get some information as to what is being done to correct this issue. Or is there and I just don't know where to look?

Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 10,946,358
RAC: 13,370
United Kingdom
Message 1335496 - Posted: 7 Feb 2013, 17:19:32 UTC - in response to Message 1335495.

Sure would be nice to have a place to get some information as to what is being done to correct this issue. Or is there and I just don't know where to look?


This is the place.
Welcome to Room 101.
____________

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1335500 - Posted: 7 Feb 2013, 17:27:22 UTC - in response to Message 1335496.

Sure would be nice to have a place to get some information as to what is being done to correct this issue. Or is there and I just don't know where to look?


This is the place.
Welcome to Room 101.


Thanks. Guess I need a crash course in reading comprehension. :)

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 11,907,222
RAC: 3,818
United States
Message 1335501 - Posted: 7 Feb 2013, 17:28:16 UTC - in response to Message 1335495.
Last modified: 7 Feb 2013, 17:47:59 UTC

I had been running NNT to trim out the existing work units. I don't run GPU work units from SETI for a number of reasons, so I suppose my frustration level is lower than many. These days, as the project repeatedly bounces its collective head against the AP mass traffic tie-up - something which has happened many times over the past few months without a resolution -- when I encounter the 'Dead SETI' scenario, I simply suspend the project on the handful of workstations still running SETI and feed projects that don't suffer from this sort of persistent reliability problem.

Perhaps at some point the folks back at the project will figure out that running the high traffic volume AP work units along with the worthless less-than-one-minute CPU work units only serves to reduce the project's effective work and thus might be best avoided.

Until then, I figure to watch closely, run and report in remaining work during the increasingly rare full functionality periods and once that clears, simply watch to see if a project learning process is demonstrated.

Alternatively, it might be part of a grand plan at the project to reduce problems by pushing away users until the traffic levels are low enough to be sustained.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5698
Credit: 56,466,006
RAC: 49,094
Australia
Message 1335513 - Posted: 7 Feb 2013, 17:59:36 UTC - in response to Message 1335501.


Woke up this morning to find both systems rapidly running out of work. Large backoffs due to not being able to connect to the Scheduler when it tries to do so.
____________
Grant
Darwin NT.

Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 922
Credit: 10,946,358
RAC: 13,370
United Kingdom
Message 1335515 - Posted: 7 Feb 2013, 18:07:27 UTC - in response to Message 1335513.


Woke up this morning to find both systems rapidly running out of work. Large backoffs due to not being able to connect to the Scheduler when it tries to do so.

This is obviously the start of a 1920s Blues song.
"Woke up this morning,
Found I was nearly out of work.
Woke up this morning,
etc...
____________

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,109,345
RAC: 20,113
Russia
Message 1335519 - Posted: 7 Feb 2013, 18:13:19 UTC
Last modified: 7 Feb 2013, 18:13:40 UTC

Can't report tasks, scheduler not responding. Anyone seeing the same now?
____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38327
Credit: 561,202,244
RAC: 651,901
United States
Message 1335523 - Posted: 7 Feb 2013, 18:18:48 UTC - in response to Message 1335519.

Can't report tasks, scheduler not responding. Anyone seeing the same now?

Been extremely hard to contact the scheduler for days now.....
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5698
Credit: 56,466,006
RAC: 49,094
Australia
Message 1335531 - Posted: 7 Feb 2013, 18:36:53 UTC - in response to Message 1335519.
Last modified: 7 Feb 2013, 18:44:51 UTC

Can't report tasks, scheduler not responding. Anyone seeing the same now?

For about 3-4 days now.


EDIT - and if you are able to contact the Scheduler & don't get a "Failure when receiving data from the peer" message, it takes 2-5 min to get a response.
____________
Grant
Darwin NT.

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,705,455
RAC: 2,228
United States
Message 1335544 - Posted: 7 Feb 2013, 19:05:48 UTC
Last modified: 7 Feb 2013, 19:44:55 UTC

Now here is a possible cause that super crunchers likely won't notice. Because the 100 unit max limit is less than what I would normally have based on queue size, the client requests more units every five minutes, only to be rebuffed (when I get through) due to the 100 unit limit. Super crunchers are likely to actually have completed units to report every five minutes, but I complete maybe only 2-3 units per hour on average. At 12 requests an hour, that means 9 or 10 more requests than are required are sent to the server every hour. Now multiply that by all the hosts that are crunching as "slow" or slower than mine (daily RAC ranks my host in the top 4,500-5,000 range) and that adds up to a lot of meaningless requests to the server.

Could that be causing a problem?
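
For what it's worth, here is a rough back-of-the-envelope sketch of that arithmetic in Python. It uses only the figures from the post (a request every five minutes, 2-3 completed units per hour) and assumes, as a worst case, that each completed unit needs its own request to be reported. The function name and the host count are made up for illustration; no real number of slow hosts is given in this thread.

```python
def wasted_requests_per_hour(request_interval_min, useful_reports_per_hour):
    """Requests per hour that carry no completed work to report.

    Assumes one request per completed unit is enough to report it.
    """
    requests_per_hour = 60 / request_interval_min
    return max(0.0, requests_per_hour - useful_reports_per_hour)


# Figures from the post: a request every 5 minutes, 2-3 units done per hour.
low = wasted_requests_per_hour(5, 3)
high = wasted_requests_per_hour(5, 2)
print(f"Pointless requests per host per hour: {low:.0f} to {high:.0f}")

# Hypothetical host count, purely illustrative (not a figure from the thread).
ASSUMED_SLOW_HOSTS = 100_000
print(f"Across {ASSUMED_SLOW_HOSTS} such hosts: "
      f"{low * ASSUMED_SLOW_HOSTS:.0f} to {high * ASSUMED_SLOW_HOSTS:.0f} per hour")
```

Whether that volume matters depends on how cheap a rebuffed request is for the server, which this sketch says nothing about.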
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5698
Credit: 56,466,006
RAC: 49,094
Australia
Message 1335548 - Posted: 7 Feb 2013, 19:15:57 UTC - in response to Message 1335544.

Could that be causing a problem?

It wouldn't be helping with the bandwidth issues, but it wouldn't be causing a problem in its own right.
We've had Scheduler issues on & off for months now, but the current one began about 4 days ago.
Prior to that, for a while at least, Scheduler responses were nice & quick. Now they take forever, if you can connect & if you don't get an error once you've done so.

____________
Grant
Darwin NT.

Profile S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 159
Credit: 15,396,205
RAC: 27,229
Netherlands
Message 1335551 - Posted: 7 Feb 2013, 19:18:34 UTC

Sometime this afternoon the server "granted" my main rig 200 WUs... Only to discover soon after that all the GPU ones were shorties... increasing the server load even more :S
____________

ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 108
Credit: 125,553,691
RAC: 159,942
United States
Message 1335558 - Posted: 7 Feb 2013, 19:41:36 UTC - in response to Message 1335551.

Sometime this afternoon the server "granted" my main rig 200 WUs... Only to discover soon after that all the GPU ones were shorties... increasing the server load even more :S

I hate it when I get tons of shorties. Too much bandwidth for too little credit.

____________

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 11,907,222
RAC: 3,818
United States
Message 1335560 - Posted: 7 Feb 2013, 19:50:47 UTC - in response to Message 1335558.

Given the evidence of the past many months, it seems that there is no solution envisioned or affordable for the clear bandwidth choke point, so perhaps instead some software filtering solution might be a reasonable approach to mitigating the obvious problem.

Of course, there is another approach, which takes no effort at all: as people realize that the project reliability has in fact been seriously compromised due to traffic levels, they may do as Greeley suggested -- 'go elsewhere dear user' -- there are in fact many other more reliable and interesting projects out there.



Sometime this afternoon the server "granted" my main rig 200 WUs... Only to discover soon after that all the GPU ones were shorties... increasing the server load even more :S

I hate it when I get tons of shorties. Too much bandwidth for too little credit.

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,705,455
RAC: 2,228
United States
Message 1335561 - Posted: 7 Feb 2013, 19:51:14 UTC - in response to Message 1335548.
Last modified: 7 Feb 2013, 19:51:54 UTC

Could that be causing a problem?

It wouldn't be helping with the bandwidth issues, but it wouldn't be causing a problem in its own right.
We've had Scheduler issues on & off for months now, but the current one began about 4 days ago.
Prior to that, for a while at least, Scheduler responses were nice & quick. Now they take forever, if you can connect & if you don't get an error once you've done so.


I know, but it's acting a lot like a memory leak. Everything stays responsive right up to the point the application starts hitting the swap space hard and it becomes slow and unresponsive.

Not saying it's a memory leak, just that it's acting as if some buffer that isn't being emptied quite as fast as it is being filled finally overflows, and it becomes a crapshoot whether the next item actually gets into the buffer or is lost.
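
To show the kind of behaviour I mean, here is a toy Python sketch of a fixed-size buffer that is filled slightly faster than it is drained. This is only an illustration of the analogy, not a model of the actual scheduler; the function name, capacity and rates are all made-up numbers.

```python
from collections import deque


def simulate(capacity, arrivals_per_tick, served_per_tick, ticks):
    """Return how many items were dropped at each tick of a bounded FIFO buffer."""
    buffer = deque()
    dropped_per_tick = []
    for tick in range(ticks):
        dropped = 0
        # New items arrive; anything that finds the buffer full is lost.
        for i in range(arrivals_per_tick):
            if len(buffer) < capacity:
                buffer.append((tick, i))
            else:
                dropped += 1
        # The consumer drains a little less than what arrived.
        for _ in range(min(served_per_tick, len(buffer))):
            buffer.popleft()
        dropped_per_tick.append(dropped)
    return dropped_per_tick


# Hypothetical numbers: 5 items arrive per tick, 4 are served, buffer holds 20.
drops = simulate(capacity=20, arrivals_per_tick=5, served_per_tick=4, ticks=30)
first_drop = next(t for t, d in enumerate(drops) if d > 0)
print("first tick with a lost item:", first_drop)
print("items lost per tick after that:", drops[first_drop:])
```

With these numbers nothing is lost for the first 16 ticks, and then one item is lost on every tick, which is the "fine for a while, then suddenly unreliable" pattern described above.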
____________
"Life is just nature's way of keeping meat fresh." - The Doctor


Copyright © 2014 University of California