Panic Mode On (79) Server Problems?

Message boards : Number crunching : Panic Mode On (79) Server Problems?

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,561,823
RAC: 7,818
Panama
Message 1310022 - Posted: 25 Nov 2012, 2:58:44 UTC - in response to Message 1310017.  
Last modified: 25 Nov 2012, 2:59:36 UTC

100 per CPU/GPU = 200 on a GPU host

I only use my GPUs, of course...

0 CPU + 100 GPU = 100
That's why you only have a 100 WU cache.
ID: 1310022
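The cache arithmetic in this exchange can be written out as a small sketch. It assumes the limit as described in this thread (100 tasks in progress per resource type, with CPU and GPU counted separately). The function name and parameters are hypothetical and are not part of BOINC or the SETI@home scheduler.

```python
# Illustrative only: models the per-host limit as described in this thread,
# not the actual scheduler logic.
TASKS_PER_RESOURCE = 100  # limit quoted in the thread


def max_cached_tasks(uses_cpu: bool, uses_gpu: bool) -> int:
    """Return the largest cache a host could hold under the quoted limit."""
    # Each resource type in use contributes its own allowance of 100 tasks.
    return TASKS_PER_RESOURCE * (int(uses_cpu) + int(uses_gpu))


print(max_cached_tasks(uses_cpu=True, uses_gpu=True))   # 200
print(max_cached_tasks(uses_cpu=False, uses_gpu=True))  # 100
```

A GPU-only host therefore tops out at 100 work units, which matches the cache size discussed above.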
TBar
Volunteer tester

Joined: 22 May 99
Posts: 3070
Credit: 122,689,297
RAC: 92,075
United States
Message 1310026 - Posted: 25 Nov 2012, 3:12:39 UTC
Last modified: 25 Nov 2012, 4:06:06 UTC

Get Shorty!

I thought 4-minute shorties were a pain until I saw where the one and only AstroPulse I snagged went. Check it out: http://setiathome.berkeley.edu/workunit.php?wuid=1110816835 After working out how long 660,846.5 seconds is, I can't find it in me to complain about a 4-minute shorty...
ID: 1310026
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,622,211
RAC: 332
United States
Message 1310030 - Posted: 25 Nov 2012, 4:13:24 UTC

Wooow.. 7.64 days of run time but only 2 seconds of CPU time before erroring out. That's brutal.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1310030
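For anyone checking the arithmetic, the 660,846.5 seconds of run time quoted above can be converted as in the sketch below. The function name is made up for illustration.

```python
def split_seconds(total_seconds: float) -> tuple[int, int, int, float]:
    """Break a duration in seconds into days, hours, minutes and seconds."""
    days, rem = divmod(total_seconds, 86_400)   # 86,400 seconds per day
    hours, rem = divmod(rem, 3_600)             # 3,600 seconds per hour
    minutes, seconds = divmod(rem, 60)
    return int(days), int(hours), int(minutes), seconds


days, hours, minutes, seconds = split_seconds(660_846.5)
print(f"{days}d {hours}h {minutes}m {seconds:.1f}s")
print(f"{660_846.5 / 86_400:.4f} days")
```

That works out to 7 days, 15 hours and 34 minutes, or roughly the 7.64 days mentioned above.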
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7489
Credit: 91,160,661
RAC: 46,425
Australia
Message 1310044 - Posted: 25 Nov 2012, 5:22:07 UTC - in response to Message 1307257.  

Timeouts, Timeouts, Timeouts!!!!!!!

Now it's Couldn't connect to server, Couldn't connect to server, Couldn't connect to server!!!
Grant
Darwin NT
ID: 1310044
Mark Fiske

Joined: 15 Aug 11
Posts: 713
Credit: 7,392,921
RAC: 0
United States
Message 1310060 - Posted: 25 Nov 2012, 6:29:29 UTC - in response to Message 1310044.  

Well, I wasn't expecting this but I just got 94 CPU WU's out of the blue. Happy Camper!

Mark
ID: 1310060
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7489
Credit: 91,160,661
RAC: 46,425
Australia
Message 1310067 - Posted: 25 Nov 2012, 7:26:14 UTC - in response to Message 1310044.  

Timeouts, Timeouts, Timeouts!!!!!!!

Now it's Couldn't connect to server, Couldn't connect to server, Couldn't connect to server!!!

Back to Timeouts, Timeouts, Timeouts!!!!!!! again.
Grant
Darwin NT
ID: 1310067
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7489
Credit: 91,160,661
RAC: 46,425
Australia
Message 1310075 - Posted: 25 Nov 2012, 7:57:21 UTC - in response to Message 1310067.  
Last modified: 25 Nov 2012, 7:57:46 UTC

Timeouts, Timeouts, Timeouts!!!!!!!

Now it's Couldn't connect to server, Couldn't connect to server, Couldn't connect to server!!!

Back to Timeouts, Timeouts, Timeouts!!!!!!! again.


I think it's now just throwing random errors. Timeouts, couldn't connect & failure when receiving data from the peer depending on the mood it's in.


I've even received a "No tasks sent", but that was on some GPU requests. I got a whole bunch of VLARs on one CPU request, so at least that particular response makes sense.
Grant
Darwin NT
ID: 1310075
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,561,823
RAC: 7,818
Panama
Message 1310111 - Posted: 25 Nov 2012, 10:11:29 UTC - in response to Message 1310075.  

Use a proxy; with one, communication errors are minimized and you can easily rebuild your caches.
ID: 1310111
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7489
Credit: 91,160,661
RAC: 46,425
Australia
Message 1310206 - Posted: 25 Nov 2012, 18:03:01 UTC - in response to Message 1310111.  

Use a proxy; with one, communication errors are minimized and you can easily rebuild your caches.

Then I've got the hassle of finding a working proxy, then finding a new one every few days when the working one no longer works.
Grant
Darwin NT
ID: 1310206
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7489
Credit: 91,160,661
RAC: 46,425
Australia
Message 1310213 - Posted: 25 Nov 2012, 18:08:45 UTC - in response to Message 1310206.  


Just noticed that the AP Science Database & Assimilators haven't been running for a few days- lots of work to be assimilated is backing up.
Grant
Darwin NT
ID: 1310213
Keith White
Joined: 29 May 99
Posts: 384
Credit: 4,636,724
RAC: 1,958
United States
Message 1310216 - Posted: 25 Nov 2012, 18:16:03 UTC

Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it.

11/25/2012 1:01:03 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:01:03 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:01:25 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:01:25 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:01:26 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:03:06 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:03:06 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:03:29 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:03:29 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:03:30 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:06:11 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:06:11 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:06:34 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:06:34 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:06:35 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:12:52 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:54 PM | | Internet access OK - project servers may be temporarily down.


"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1310216
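To see how the retries in an event log like the one above are spaced, a rough sketch follows. It assumes the three-field `timestamp | project | message` layout shown in the post and an English locale for the AM/PM marker. The function names are hypothetical, and the sample is an excerpt of the log lines posted above.

```python
from datetime import datetime

# Excerpt of the event log posted above (send and failure lines only).
LOG = """\
11/25/2012 1:01:03 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:01:25 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:03:06 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:03:29 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:06:11 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:06:34 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
"""


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, message


def retry_gaps(log_text):
    """Return the seconds between consecutive scheduler requests."""
    sent = [
        when
        for when, _project, message in map(parse_line, log_text.splitlines())
        if message.startswith("Sending scheduler request")
    ]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(sent, sent[1:])]


print(retry_gaps(LOG))  # [123, 185, 378]
```

On this excerpt the gaps grow from about 2 minutes to about 6 minutes, which is consistent with the client waiting longer after each failed attempt.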
kittyman (Project Donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 45941
Credit: 815,391,067
RAC: 124,936
United States
Message 1310226 - Posted: 25 Nov 2012, 18:29:37 UTC - in response to Message 1310216.  

Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it.



I've been getting a lot of can't connect errors here too. Not on your end.
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
ID: 1310226
Keith White
Joined: 29 May 99
Posts: 384
Credit: 4,636,724
RAC: 1,958
United States
Message 1310230 - Posted: 25 Nov 2012, 18:34:01 UTC - in response to Message 1310226.  

Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it.



I've been getting a lot of can't connect errors here too. Not on your end.

See what I mean... VOODOO!

11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:12:52 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:54 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:27:50 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:27:50 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:29:03 PM | SETI@home | Scheduler request completed: got 11 new tasks

"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1310230
Lionel

Joined: 25 Mar 00
Posts: 665
Credit: 351,231,210
RAC: 139,384
Australia
Message 1310389 - Posted: 26 Nov 2012, 4:43:53 UTC - in response to Message 1310111.  

Use a proxy; with one, communication errors are minimized and you can easily rebuild your caches.


As Grant has basically said, it doesn't always work...they are waking up to the traffic that seti puts through and soon this avenue will be closed for many of us...what they need to do is increase the bandwidth beyond 100Mbps...

ID: 1310389
Lionel

Joined: 25 Mar 00
Posts: 665
Credit: 351,231,210
RAC: 139,384
Australia
Message 1310390 - Posted: 26 Nov 2012, 4:46:42 UTC - in response to Message 1310230.  

Just posting what I'm currently getting in my event log since it seems to succeed within an hour of me posting what I'm currently getting in my event log. I don't question the voodoo, I just go with it.



I've been getting a lot of can't connect errors here too. Not on your end.

See what I mean... VOODOO!

11/25/2012 1:12:29 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:12:29 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:12:52 PM | | Project communication failed: attempting access to reference site
11/25/2012 1:12:52 PM | SETI@home | Scheduler request failed: Couldn't connect to server
11/25/2012 1:12:54 PM | | Internet access OK - project servers may be temporarily down.
11/25/2012 1:27:50 PM | SETI@home | Sending scheduler request: To fetch work.
11/25/2012 1:27:50 PM | SETI@home | Reporting 24 completed tasks, requesting new tasks for CPU and ATI
11/25/2012 1:29:03 PM | SETI@home | Scheduler request completed: got 11 new tasks


getting a lot of that here as well...suspect that many of us will be...




ID: 1310390
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7489
Credit: 91,160,661
RAC: 46,425
Australia
Message 1310413 - Posted: 26 Nov 2012, 7:08:23 UTC - in response to Message 1310389.  

Use a proxy; with one, communication errors are minimized and you can easily rebuild your caches.


As Grant has basically said, it doesn't always work...they are waking up to the traffic that seti puts through and soon this avenue will be closed for many of us...what they need to do is increase the bandwidth beyond 100Mbps...

That would help (massively- till the next bottleneck is hit), but what doesn't make sense is why using a proxy does give better connections & speeds than not using one?
Grant
Darwin NT
ID: 1310413
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7489
Credit: 91,160,661
RAC: 46,425
Australia
Message 1310414 - Posted: 26 Nov 2012, 7:09:55 UTC - in response to Message 1310413.  
Last modified: 26 Nov 2012, 7:10:58 UTC

Even with all the weirdness going on, my systems have managed to stay busy while at work.

And while the inbound network traffic has been rather odd (little peaks here & there & gradually increasing overall) since coming back up after the multiple Scheduler breakdowns, there have been a couple of significant dips while I was away. And they also affected the download traffic.
Grant
Darwin NT
ID: 1310414
Gary Charpentier (Crowdfunding Project Donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 18645
Credit: 21,482,808
RAC: 19,362
United States
Message 1310417 - Posted: 26 Nov 2012, 7:27:07 UTC - in response to Message 1310413.  

Use a proxy; with one, communication errors are minimized and you can easily rebuild your caches.


As Grant has basically said, it doesn't always work...they are waking up to the traffic that seti puts through and soon this avenue will be closed for many of us...what they need to do is increase the bandwidth beyond 100Mbps...

That would help (massively- till the next bottleneck is hit), but what doesn't make sense is why using a proxy does give better connections & speeds than not using one?

Because, as Eric stated, there is a problem upstream from SSL, possibly in the campus tunnel. It has nothing to do with pipe size.

Oh and can you imagine how much worse the scheduler ghosts woes would be if the pipe was 10X wider? Would there be 10X the number of ghosts?

IIRC Eric was able to get a test in and a 5X increase in pipe size hits a bottleneck that may not be surmountable. I also have a question, do you think the hardware can take 5X additional 24/7 or what will break next?

ID: 1310417
Lionel

Joined: 25 Mar 00
Posts: 665
Credit: 351,231,210
RAC: 139,384
Australia
Message 1310420 - Posted: 26 Nov 2012, 7:52:22 UTC - in response to Message 1310417.  

Use a proxy; with one, communication errors are minimized and you can easily rebuild your caches.


As Grant has basically said, it doesn't always work...they are waking up to the traffic that seti puts through and soon this avenue will be closed for many of us...what they need to do is increase the bandwidth beyond 100Mbps...

That would help (massively- till the next bottleneck is hit), but what doesn't make sense is why using a proxy does give better connections & speeds than not using one?

Because, as Eric stated, there is a problem upstream from SSL, possibly in the campus tunnel. It has nothing to do with pipe size.

Oh and can you imagine how much worse the scheduler ghosts woes would be if the pipe was 10X wider? Would there be 10X the number of ghosts?

IIRC Eric was able to get a test in and a 5X increase in pipe size hits a bottleneck that may not be surmountable. I also have a question, do you think the hardware can take 5X additional 24/7 or what will break next?


maximum throughput (down to us) is governed by the maximum rate at which work can be created and sent...that is the natural ceiling...given a maximum down rate you can approximate a maximum up rate based on average returned work unit size...the pipe should be wider than these to allow for other things such as overhead/management traffic...



ID: 1310420
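The estimate described in the last post can be sketched roughly as follows. Every number here is a placeholder chosen for illustration, not a measured SETI@home figure, and the function name is hypothetical. The sketch also assumes a full-duplex link, so download and upload do not compete for the same capacity.

```python
def estimate_link(down_mbps, wu_size_kb, result_size_kb, overhead_fraction):
    """Approximate the upload rate and the minimum link size for a given download rate.

    Each work unit sent out eventually comes back as one result, so the
    upload rate scales with the ratio of result size to work unit size.
    """
    up_mbps = down_mbps * result_size_kb / wu_size_kb
    # The link must carry the busier direction plus overhead/management traffic.
    min_link_mbps = max(down_mbps, up_mbps) * (1 + overhead_fraction)
    return up_mbps, min_link_mbps


# Placeholder inputs: 80 Mbps of outgoing work, 400 KB work units,
# 20 KB results, 10% overhead.
up, link = estimate_link(80, 400, 20, 0.10)
print(f"upload ~{up:.1f} Mbps, link should be at least ~{link:.1f} Mbps")
```

With these made-up inputs the returned results need only a small fraction of the outgoing rate, so the download side sets the size of the pipe.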
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 7489
Credit: 91,160,661
RAC: 46,425
Australia
Message 1310421 - Posted: 26 Nov 2012, 7:55:41 UTC - in response to Message 1310417.  
Last modified: 26 Nov 2012, 7:56:29 UTC

Oh and can you imagine how much worse the scheduler ghosts woes would be if the pipe was 10X wider? Would there be 10X the number of ghosts?

Maybe, maybe not.
When the Scheduler was using the campus network, it was responding in less than 7 seconds, often within 2-4.
So it would appear the network congestion is a factor- remove it & no more ghosts at all.


IIRC Eric was able to get a test in and a 5X increase in pipe size hits a bottleneck that may not be surmountable. I also have a question, do you think the hardware can take 5X additional 24/7 or what will break next?

Keep in mind if there were a 5 fold increase in available bandwidth, the load on the servers would drop 5 times faster.
The load would probably be less than it is now because there wouldn't be all the re-tries going on, or the accumulation of ghosts.

I have no doubt we'd find some new major problem sooner rather than later, but it would erase completely several existing ones.
Grant
Darwin NT
ID: 1310421


 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.