Panic Mode On (78) Server Problems?


Message boards : Number crunching : Panic Mode On (78) Server Problems?

Cherokee150
Joined: 11 Nov 99
Posts: 103
Credit: 20,510,654
RAC: 29,165
United States
Message 1301867 - Posted: 3 Nov 2012, 23:42:58 UTC - in response to Message 1301815.

Thank you, Juan!

Now we know that they know, and we all know that means they will fix the problem as soon as they can, as they have always done before. :)

Keith White
Joined: 29 May 99
Posts: 369
Credit: 2,479,500
RAC: 2,353
United States
Message 1301871 - Posted: 4 Nov 2012, 0:00:59 UTC - in response to Message 1301828.

They will be acknowledged as reported right now only if you select "No New Tasks" on the projects tab.

I wish that were true.
I'll mention it again- even with only a couple of tasks to report & No New Tasks set the Scheduler still usually times out. There's 1 time in about 20 where it doesn't.
Uploads have been slow for much of this time as well.

Well it does on my system. Task gets done, it uploads and then the scheduler reports it, it goes through and the task vanishes from my client's task list.

Don't know what to say.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,450,263
RAC: 42,585
Australia
Message 1301872 - Posted: 4 Nov 2012, 0:02:10 UTC - in response to Message 1301858.

Something other than just bandwidth saturation is at play here.

That's my feeling.
There have been many times in the past where network traffic has been maxed out, and downloads are pretty much impossible, but you are still able to contact the Scheduler to report work & get more work allocated.

The fact is that even now with the network traffic maxed out, if you do (somehow) manage to get some work, it's downloading fairly quickly. Certainly much, much faster than in the past, and when you were still able to get a response from the Scheduler.

Over the last few months we've had issues with Scheduler timeouts, but not for nearly as long as this time, nor nearly as severe- from memory i would get a response about 1 in 5 to 7 attempts. Now i'm lucky if it's 1 in 20, No New Tasks set or not.
Hence i suspect it's a system configuration/load problem, not a network load one.
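
To put rough numbers on those odds, here is a minimal sketch. It assumes each Scheduler request succeeds independently with a fixed probability, which is a simplification and not something measured in this thread; the function names are made up for illustration.

```python
def expected_attempts(p_success: float) -> float:
    """Average number of tries until the first success (geometric distribution)."""
    return 1.0 / p_success


def p_at_least_one_success(p_success: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries gets through."""
    return 1.0 - (1.0 - p_success) ** attempts


if __name__ == "__main__":
    # Success rates quoted in the post: about 1 in 5 to 7 before, 1 in 20 now.
    for label, p in (("1 in 5", 1 / 5), ("1 in 7", 1 / 7), ("1 in 20", 1 / 20)):
        print(
            f"{label}: about {expected_attempts(p):.0f} tries on average, "
            f"{p_at_least_one_success(p, 10):.0%} chance of getting through in 10 tries"
        )
```

Under that assumption, going from 1 in 5 to 1 in 20 means roughly four times as many Update clicks on average for each successful contact.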
____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37434
Credit: 500,547,857
RAC: 535,489
United States
Message 1301875 - Posted: 4 Nov 2012, 0:05:24 UTC - in response to Message 1301871.

They will be acknowledged as reported right now only if you select "No New Tasks" on the projects tab.

I wish that were true.
I'll mention it again- even with only a couple of tasks to report & No New Tasks set the Scheduler still usually times out. There's 1 time in about 20 where it doesn't.
Uploads have been slow for much of this time as well.

Well it does on my system. Task gets done, it uploads and then the scheduler reports it, it goes through and the task vanishes from my client's task list.

Don't know what to say.

It gets really hard to quantify it when I have 9 rigs trying to report 1000s of WUs. Hits and misses go by unnoticed by me until I check the stats page and see some rigs have not reported for hours.
That page is usually my barometer for the rigs, if I see one has not reported for a while, I suspect a crash and check it out.

Not a reliable barometer at the moment.
____________
******************
Seti whacko, resident evil, and town clown...

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,450,263
RAC: 42,585
Australia
Message 1301878 - Posted: 4 Nov 2012, 0:07:17 UTC - in response to Message 1301871.

They will be acknowledged as reported right now only if you select "No New Tasks" on the projects tab.

I wish that were true.
I'll mention it again- even with only a couple of tasks to report & No New Tasks set the Scheduler still usually times out. There's 1 time in about 20 where it doesn't.
Uploads have been slow for much of this time as well.

Well it does on my system. Task gets done, it uploads and then the scheduler reports it, it goes through and the task vanishes from my client's task list.

Don't know what to say.

My client_state is 2.4MB in size, my sched_request_setiathome.berkeley.edu is 450kB in size. I suspect yours are a lot smaller. You're in the US, i'm a few thousand kms away.
End result- you may be able to get work, i'm lucky if i can even report work- even after 30min of endless Update clicking with No New Tasks set.
____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37434
Credit: 500,547,857
RAC: 535,489
United States
Message 1301880 - Posted: 4 Nov 2012, 0:14:38 UTC - in response to Message 1301878.

They will be acknowledged as reported right now only if you select "No New Tasks" on the projects tab.

I wish that were true.
I'll mention it again- even with only a couple of tasks to report & No New Tasks set the Scheduler still usually times out. There's 1 time in about 20 where it doesn't.
Uploads have been slow for much of this time as well.

Well it does on my system. Task gets done, it uploads and then the scheduler reports it, it goes through and the task vanishes from my client's task list.

Don't know what to say.

My client_state is 2.4MB in size, my sched_request_setiathome.berkeley.edu is 450kB in size. I suspect yours are a lot smaller. You're in the US, i'm a few thousand kms away.
End result- you may be able to get work, i'm lucky if i can even report work- even after 30min of endless Update clicking with No New Tasks set.

Grant, don't feel special.
The kitties are equally fukayed from the midwest USA.....
I could be on the Berk campus and still be screwed as bad as you right now.

____________
******************
Seti whacko, resident evil, and town clown...

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,450,263
RAC: 42,585
Australia
Message 1301887 - Posted: 4 Nov 2012, 0:26:29 UTC - in response to Message 1301880.


Just to add to the present fun, i'm now getting some "Couldn't connect to server" messages in response to a Scheduler request.
____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37434
Credit: 500,547,857
RAC: 535,489
United States
Message 1301888 - Posted: 4 Nov 2012, 0:28:22 UTC - in response to Message 1301887.


Just to add to the present fun, i'm now getting some "Couldn't connect to server" messages in response to a Scheduler request.

Been getting those for the last 24 hours or more.
#2 rig has not been able to get through for an hour and a half.
____________
******************
Seti whacko, resident evil, and town clown...

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2204
Credit: 8,022,753
RAC: 4,307
United States
Message 1301899 - Posted: 4 Nov 2012, 1:10:33 UTC

I'm having no trouble reporting 1-5 tasks every couple of hours. My cache got a little over-filled when I started hoarding APs, so I'm not asking for more work presently, which I suspect is nearly the equivalent of NNT, since both are effectively "not asking for more work."

When I was downloading the APs, they were coming in at around 25KB/sec for each one, sometimes I had 5-8 of them going at a time.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,236,055
RAC: 0
United States
Message 1301901 - Posted: 4 Nov 2012, 1:12:11 UTC

my linux boxes are reporting and obtaining normal work. but my one windoz box shows nothing but scheduler timeout. it seems to upload ok .. slowly but ok.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37434
Credit: 500,547,857
RAC: 535,489
United States
Message 1301903 - Posted: 4 Nov 2012, 1:18:22 UTC

You folks don't have a clue about hosts processing and trying to return 100s of results an hour.

Not the same as saying....oh, my rig did 2 tasks last hour, and both got reported just fine.

Yah, right. The kitties got YOUR back, bud.

Not dissing anybody, but it's a different class of problems here.

The kitties need access to the servers 24/7.


____________
******************
Seti whacko, resident evil, and town clown...

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,450,263
RAC: 42,585
Australia
Message 1301905 - Posted: 4 Nov 2012, 1:29:07 UTC - in response to Message 1301899.

When I was downloading the APs, they were coming in at around 25KB/sec for each one, sometimes I had 5-8 of them going at a time.

Which is more support for the Scheduler issues being server related, not network traffic.
____________
Grant
Darwin NT.

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,236,055
RAC: 0
United States
Message 1301912 - Posted: 4 Nov 2012, 1:42:33 UTC

msattler wrote:
The kitties need access to the servers 24/7.


They ought to just send you 500gb unsplit raw drives to crunch :-)
Then we'd all have some network bandwidth to spare.

betreger
Joined: 29 Jun 99
Posts: 1753
Credit: 3,616,781
RAC: 7,811
United States
Message 1301914 - Posted: 4 Nov 2012, 1:51:19 UTC - in response to Message 1301903.

I can't believe it is Karma. It is just that the big guys are constipating the system. Everybody knows that. Until a politically acceptable and economically doable solution is proposed this is just venting.
____________

betreger
Joined: 29 Jun 99
Posts: 1753
Credit: 3,616,781
RAC: 7,811
United States
Message 1301916 - Posted: 4 Nov 2012, 1:54:01 UTC - in response to Message 1301912.

Tron, that gets away from the concept of distributed computing.
____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37434
Credit: 500,547,857
RAC: 535,489
United States
Message 1301917 - Posted: 4 Nov 2012, 1:55:18 UTC - in response to Message 1301912.

msattler wrote:
The kitties need access to the servers 24/7.


They ought to just send you 500gb unsplit raw drives to crunch :-)
Then we'd all have some network bandwidth to spare.

Hey, that might work....LOL.

I'll broach the subject with Eric in our next chat.

I'd have to install the splitting software. I think the GPUUG has sent enough HDs for shuttle service. That would be a grand solution, I think.

They would have to trust the kitties' science trail....err, tails.

Something tells me that letting raw data out of the house would not work scientifically.

The kitties are simply scintillated by the thought, though.
____________
******************
Seti whacko, resident evil, and town clown...

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,236,055
RAC: 0
United States
Message 1301919 - Posted: 4 Nov 2012, 1:59:16 UTC - in response to Message 1301916.

Tron, that gets away from the concept of distributed computing.



They would still distribute the work, and only, say, the top 25 machines would participate in an HD exchange program.
I don't think the work would need to be split either, just one long stream with nominal checkpointing.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2204
Credit: 8,022,753
RAC: 4,307
United States
Message 1301920 - Posted: 4 Nov 2012, 1:59:25 UTC - in response to Message 1301903.

You folks don't have a clue about hosts processing and trying to return 100s of results an hour.

Not the same as saying....oh, my rig did 2 tasks last hour, and both got reported just fine.

Yah, right. The kitties got YOUR back, bud.

Not dissing anybody, but it's a different class of problems here.

The kitties need access to the servers 24/7.


I wasn't trying to say that it was a situation of "it must just be a problem on your end," I was merely pointing out that I don't have any/many scheduler contact issues, even when only reporting a very small number of tasks. Others are having connection issues when reporting a small number of tasks, and so are those who are reporting a large number.

Consider my message as a data point on a graph, or a breadcrumb for trying to pin-point the actual problem.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,450,263
RAC: 42,585
Australia
Message 1301958 - Posted: 4 Nov 2012, 4:54:15 UTC


...and to rub salt into the wounds, on the one occasion where the Scheduler responded to a request for work, i got 1 WU.
1,000 would be nice, a couple of thousand would be better. Probably at least half as many again to fill my caches.
1 WU!
____________
Grant
Darwin NT.

Cherokee150
Joined: 11 Nov 99
Posts: 103
Credit: 20,510,654
RAC: 29,165
United States
Message 1301975 - Posted: 4 Nov 2012, 5:34:07 UTC

Perhaps this might be a situation where there is more than one problem occurring at the same time. That, as most of us know from experience, makes diagnosis exceedingly difficult, which seems to fit the current crisis.

Could this be a possibility? What do you think?


Copyright © 2014 University of California