Panic Mode On (79) Server Problems?


juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5343
Credit: 298,485,828
RAC: 463,843
Brazil
Message 1310022 - Posted: 25 Nov 2012, 2:58:44 UTC - in response to Message 1310017.
Last modified: 25 Nov 2012, 2:59:36 UTC

100 per CPU/GPU = 200 on a GPU host

I only use my GPUs, of course...

0 CPU + 100 GPU = 100
That's why you only have a 100 WU cache.
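
The arithmetic above can be sketched as follows. This is only an illustration of the limit as described in this thread (100 tasks per resource type in use); the function name and the constant are made up for the example and are not taken from the server code.

```python
# Limit quoted in the thread: 100 tasks per resource type (CPU, GPU).
PER_RESOURCE_LIMIT = 100

def max_cached_tasks(uses_cpu, uses_gpu, limit=PER_RESOURCE_LIMIT):
    """Upper bound on cached tasks for a host, given which resources crunch."""
    return limit * (int(uses_cpu) + int(uses_gpu))

print(max_cached_tasks(uses_cpu=True, uses_gpu=True))   # CPU + GPU host
print(max_cached_tasks(uses_cpu=False, uses_gpu=True))  # GPU-only host
```

A host that crunches on both resources gets twice the cap of one that uses only its GPUs, which matches the 200 versus 100 figures above.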
____________

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1286
Credit: 48,688,130
RAC: 112,701
United States
Message 1310026 - Posted: 25 Nov 2012, 3:12:39 UTC
Last modified: 25 Nov 2012, 4:06:06 UTC

Get Shorty!

I thought 4-minute shorties were a pain until I saw where the one and only AstroPulse I snagged went. Check it out: http://setiathome.berkeley.edu/workunit.php?wuid=1110816835

After working out how long 660,846.5 seconds is, I can't find it in me to complain about a 4-minute shorty...
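
For anyone who wants to check the figure, the conversion can be sketched like this. The helper name is invented for the example; only the 660,846.5 seconds comes from the post.

```python
def split_seconds(total_seconds):
    """Break a duration in seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(total_seconds, 86400)   # 86400 seconds per day
    hours, rem = divmod(rem, 3600)             # 3600 seconds per hour
    minutes, seconds = divmod(rem, 60)
    return int(days), int(hours), int(minutes), seconds

days, hours, minutes, seconds = split_seconds(660846.5)
print(f"{days}d {hours}h {minutes}m {seconds}s")
```

That works out to 7 days, 15 hours, 34 minutes and 6.5 seconds of run time.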

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2268
Credit: 8,713,643
RAC: 3,937
United States
Message 1310030 - Posted: 25 Nov 2012, 4:13:24 UTC

Wooow.. 7.64 days of run time but only 2 seconds of CPU time before erroring out. That's brutal.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5831
Credit: 59,392,535
RAC: 47,405
Australia
Message 1310044 - Posted: 25 Nov 2012, 5:22:07 UTC - in response to Message 1307257.

Timeouts, Timeouts, Timeouts!!!!!!!

Now it's Couldn't connect to server, Couldn't connect to server, Couldn't connect to server!!!
____________
Grant
Darwin NT.

Mark Fiske (Project donor)
Joined: 15 Aug 11
Posts: 713
Credit: 7,392,921
RAC: 0
United States
Message 1310060 - Posted: 25 Nov 2012, 6:29:29 UTC - in response to Message 1310044.

Well, I wasn't expecting this but I just got 94 CPU WU's out of the blue. Happy Camper!

Mark

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5831
Credit: 59,392,535
RAC: 47,405
Australia
Message 1310067 - Posted: 25 Nov 2012, 7:26:14 UTC - in response to Message 1310044.

Timeouts, Timeouts, Timeouts!!!!!!!

Now it's Couldn't connect to server, Couldn't connect to server, Couldn't connect to server!!!

Back to Timeouts, Timeouts, Timeouts!!!!!!! again.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5831
Credit: 59,392,535
RAC: 47,405
Australia
Message 1310075 - Posted: 25 Nov 2012, 7:57:21 UTC - in response to Message 1310067.
Last modified: 25 Nov 2012, 7:57:46 UTC

Timeouts, Timeouts, Timeouts!!!!!!!

Now it's Couldn't connect to server, Couldn't connect to server, Couldn't connect to server!!!

Back to Timeouts, Timeouts, Timeouts!!!!!!! again.


I think it's now just throwing random errors. Timeouts, couldn't connect & failure when receiving data from the peer depending on the mood it's in.


I've even received a "No tasks sent", but that was on some GPU requests. I got a whole bunch of VLARs on one CPU request, so at least that particular response makes sense.
____________
Grant
Darwin NT.

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5343
Credit: 298,485,828
RAC: 463,843
Brazil
Message 1310111 - Posted: 25 Nov 2012, 10:11:29 UTC - in response to Message 1310075.

Use a proxy; with one, the communication errors are minimized and you can easily rebuild your caches.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5831
Credit: 59,392,535
RAC: 47,405
Australia
Message 1310206 - Posted: 25 Nov 2012, 18:03:01 UTC - in response to Message 1310111.

Use a proxy; with one, the communication errors are minimized and you can easily rebuild your caches.

Then I've got the hassle of finding a working proxy, then finding a new one every few days when the working one no longer does.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5831
Credit: 59,392,535
RAC: 47,405
Australia
Message 1310213 - Posted: 25 Nov 2012, 18:08:45 UTC - in response to Message 1310206.


Just noticed that the AP Science Database & Assimilators haven't been running for a few days; lots of work to be assimilated is backing up.
____________
Grant
Darwin NT.

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,836,127
RAC: 2,198
United States
Message 1310216 - Posted: 25 Nov 2012, 18:16:03 UTC

Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it.

11/25/2012 1:01:03 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:01:03 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:01:25 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:01:25 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:01:26 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:03:06 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:03:06 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:03:29 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:03:29 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:03:30 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:06:11 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:06:11 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:06:34 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:06:34 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:06:35 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:12:52 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:54 PM | | Internet access OK - project servers may be temporarily down.
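
As a rough way to see how the retry interval grows in a log like the one above, here is a minimal sketch that pulls out the "Sending scheduler request" timestamps and prints the gaps between attempts. It assumes the event log format shown in this post (timestamp, project and message separated by `|`); the function names are made up for the example.

```python
from datetime import datetime

# The four request lines from the event log quoted above.
LOG = """\
11/25/2012 1:01:03 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:03:06 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:06:11 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
"""

def request_times(log_text):
    """Return the timestamps of every 'Sending scheduler request' line."""
    times = []
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) == 3 and parts[2].startswith("Sending scheduler request"):
            times.append(datetime.strptime(parts[0], "%m/%d/%Y %I:%M:%S %p"))
    return times

def retry_gaps(times):
    """Seconds between consecutive scheduler requests."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

gaps = retry_gaps(request_times(LOG))
print(gaps)  # [123, 185, 378]
```

The gaps get longer after each failed attempt, which looks like the client backing off between retries rather than hammering the server at a fixed interval.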


____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Keith White
Joined: 29 May 99
Posts: 370
Credit: 2,836,127
RAC: 2,198
United States
Message 1310230 - Posted: 25 Nov 2012, 18:34:01 UTC - in response to Message 1310226.

Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it.



I've been getting a lot of can't connect errors here too. Not on your end.

See what I mean... VOODOO!

11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:12:52 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:54 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:27:50 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:27:50 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:29:03 PM | SETI@home | Scheduler request completed: got 11 new tasks

____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Lionel
Joined: 25 Mar 00
Posts: 545
Credit: 230,812,080
RAC: 249,479
Australia
Message 1310389 - Posted: 26 Nov 2012, 4:43:53 UTC - in response to Message 1310111.

Use a proxy; with one, the communication errors are minimized and you can easily rebuild your caches.


As Grant has basically said, it doesn't always work...they are waking up to the traffic that seti puts through and soon this avenue will be closed for many of us...what they need to do is increase the bandwidth beyond 100Mbps...

____________

Lionel
Joined: 25 Mar 00
Posts: 545
Credit: 230,812,080
RAC: 249,479
Australia
Message 1310390 - Posted: 26 Nov 2012, 4:46:42 UTC - in response to Message 1310230.

Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it.



I've been getting a lot of can't connect errors here too. Not on your end.

See what I mean... VOODOO!

11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:12:52 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:54 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:27:50 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:27:50 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:29:03 PM | SETI@home | Scheduler request completed: got 11 new tasks


getting a lot of that here as well...suspect that many of us will be...




____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5831
Credit: 59,392,535
RAC: 47,405
Australia
Message 1310413 - Posted: 26 Nov 2012, 7:08:23 UTC - in response to Message 1310389.

Use a proxy; with one, the communication errors are minimized and you can easily rebuild your caches.


As Grant has basically said, it doesn't always work...they are waking up to the traffic that seti puts through and soon this avenue will be closed for many of us...what they need to do is increase the bandwidth beyond 100Mbps...

That would help (massively- till the next bottleneck is hit), but what doesn't make sense is why using a proxy does give better connections & speeds than not using one?
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5831
Credit: 59,392,535
RAC: 47,405
Australia
Message 1310414 - Posted: 26 Nov 2012, 7:09:55 UTC - in response to Message 1310413.
Last modified: 26 Nov 2012, 7:10:58 UTC

Even with all the weirdness going on, my systems have managed to stay busy while at work.

And while the inbound network traffic has been rather odd (little peaks here & there & gradually increasing overall) since coming back up after the multiple Scheduler breakdowns, there have been a couple of significant dips while I was away. And they also affected the download traffic.
____________
Grant
Darwin NT.

Gary Charpentier (Project donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 12569
Credit: 6,878,964
RAC: 6,681
United States
Message 1310417 - Posted: 26 Nov 2012, 7:27:07 UTC - in response to Message 1310413.

Use a proxy; with one, the communication errors are minimized and you can easily rebuild your caches.


As Grant has basically said, it doesn't always work...they are waking up to the traffic that seti puts through and soon this avenue will be closed for many of us...what they need to do is increase the bandwidth beyond 100Mbps...

That would help (massively- till the next bottleneck is hit), but what doesn't make sense is why using a proxy does give better connections & speeds than not using one?

Because, as Eric stated, there is a problem upstream from SSL, possibly in the campus tunnel. It has nothing to do with pipe size.

Oh and can you imagine how much worse the scheduler ghosts woes would be if the pipe was 10X wider? Would there be 10X the number of ghosts?

IIRC Eric was able to get a test in and a 5X increase in pipe size hits a bottleneck that may not be surmountable. I also have a question, do you think the hardware can take 5X additional 24/7 or what will break next?

____________

Lionel
Joined: 25 Mar 00
Posts: 545
Credit: 230,812,080
RAC: 249,479
Australia
Message 1310420 - Posted: 26 Nov 2012, 7:52:22 UTC - in response to Message 1310417.

Use a proxy; with one, the communication errors are minimized and you can easily rebuild your caches.


As Grant has basically said, it doesn't always work...they are waking up to the traffic that seti puts through and soon this avenue will be closed for many of us...what they need to do is increase the bandwidth beyond 100Mbps...

That would help (massively- till the next bottleneck is hit), but what doesn't make sense is why using a proxy does give better connections & speeds than not using one?

Because, as Eric stated, there is a problem upstream from SSL, possibly in the campus tunnel. It has nothing to do with pipe size.

Oh and can you imagine how much worse the scheduler ghosts woes would be if the pipe was 10X wider? Would there be 10X the number of ghosts?

IIRC Eric was able to get a test in and a 5X increase in pipe size hits a bottleneck that may not be surmountable. I also have a question, do you think the hardware can take 5X additional 24/7 or what will break next?


Maximum throughput (down to us) is governed by the maximum rate at which work can be created and sent; that is the natural ceiling. Given a maximum down rate, you can approximate a maximum up rate based on the average returned work unit size. The pipe should be wider than both of these to allow for other things such as overhead/management traffic.
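
A back-of-the-envelope version of that estimate might look like the sketch below. Every number in it is a placeholder chosen for illustration (task creation rate, task and result sizes, overhead allowance); none of them are measured values from the project, and the function names are invented for the example.

```python
def down_rate_mbps(tasks_created_per_second, task_size_mbit):
    """Ceiling on download traffic: work cannot be sent faster than it is created."""
    return tasks_created_per_second * task_size_mbit

def up_rate_mbps(down_mbps, task_size_mbit, result_size_mbit):
    """Approximate upload traffic from the down rate and the average result size."""
    tasks_per_second = down_mbps / task_size_mbit
    return tasks_per_second * result_size_mbit

def pipe_with_headroom(rate_mbps, overhead_fraction):
    """Add an allowance for overhead/management traffic on top of the raw rate."""
    return rate_mbps * (1.0 + overhead_fraction)

# Hypothetical inputs: 25 tasks/s created, 3.0 Mbit per task, 0.25 Mbit per result.
down = down_rate_mbps(25, 3.0)
up = up_rate_mbps(down, 3.0, 0.25)
print(down, up)
print(pipe_with_headroom(down, 0.10), pipe_with_headroom(up, 0.10))
```

Plugging in real averages for task size, result size and splitter output would give the actual ceiling; the point is only that the required pipe follows from those three numbers plus some headroom.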



____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5831
Credit: 59,392,535
RAC: 47,405
Australia
Message 1310421 - Posted: 26 Nov 2012, 7:55:41 UTC - in response to Message 1310417.
Last modified: 26 Nov 2012, 7:56:29 UTC

Oh and can you imagine how much worse the scheduler ghosts woes would be if the pipe was 10X wider? Would there be 10X the number of ghosts?

Maybe, maybe not.
When the Scheduler was using the campus network, it was responding in less than 7 seconds, often within 2-4.
So it would appear the network congestion is a factor: remove it & there are no more ghosts at all.


IIRC Eric was able to get a test in and a 5X increase in pipe size hits a bottleneck that may not be surmountable. I also have a question, do you think the hardware can take 5X additional 24/7 or what will break next?

Keep in mind that if there were a 5-fold increase in available bandwidth, the load on the servers would drop 5 times faster.
The load would probably be less than it is now because there wouldn't be all the retries going on, or the accumulation of ghosts.

I have no doubt we'd find some new major problem sooner rather than later, but it would erase completely several existing ones.
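
To put a rough number on the retry point: if each request fails independently with probability p (a simplifying assumption, real failures are probably bursty), the average number of attempts per successful request is 1 / (1 - p). A minimal sketch, with a made-up function name:

```python
def expected_attempts(failure_probability):
    """Mean number of attempts per successful request, assuming independent failures."""
    if not 0.0 <= failure_probability < 1.0:
        raise ValueError("failure_probability must be in [0, 1)")
    return 1.0 / (1.0 - failure_probability)

for p in (0.0, 0.5, 0.75):
    print(f"failure rate {p:.0%}: {expected_attempts(p):.1f} attempts per success")
```

So at a 75% failure rate the servers see about four requests for every one that actually completes, and that extra traffic goes away once requests succeed the first time.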
____________
Grant
Darwin NT.


Copyright © 2014 University of California