Panic Mode On (100) Server Problems?

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1726868 - Posted: 18 Sep 2015, 15:41:34 UTC - in response to Message 1726863.  


EDIT....
And as I posted this, Jeff replied that he just spied the replica problem and is working on it now.

And back up, albeit ~ 20 hrs behind ...

It'll catch up soon enough.

Now, if he can just get the splitters shifted into a higher gear.....
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1726868 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1726873 - Posted: 18 Sep 2015, 16:03:18 UTC - in response to Message 1726871.  


Don't forget about hoping that they fix the two letter things, that isn't "M" and "B" :-)

Yeah, well....
That might come after we work down the MB queue again.
Which will prove to be more difficult at the moment, as RTS has just hit zero again.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1726873 · Report as offensive
betreger (Project Donor)
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1726874 - Posted: 18 Sep 2015, 16:07:41 UTC

Since the RTS = 0 it is time to bring out the big red PANIC
ID: 1726874 · Report as offensive
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1726886 - Posted: 18 Sep 2015, 16:47:06 UTC - in response to Message 1726874.  

Since the RTS = 0 it is time to bring out the big red PANIC


I don't think so. The splitters have been feeding WUs the whole time; the latest "Project has no tasks available" message is over an hour old.

They just can't refill the RTS to 300,000, but they are holding on.
ID: 1726886 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1726915 - Posted: 18 Sep 2015, 18:37:56 UTC - in response to Message 1726873.  


Don't forget about hoping that they fix the two letter things, that isn't "M" and "B" :-)

Yeah, well....
That might come after we work down the MB queue again.
Which will prove to be more difficult at the moment, as RTS has just hit zero again.

That would probably explain why my main cruncher just downloaded 48 new Betas even though I lowered Beta's RS recently.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1726915 · Report as offensive
betreger (Project Donor)
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1726921 - Posted: 18 Sep 2015, 19:01:48 UTC

Somebody in the lab PANICKED as the SSP shows most of the servers are down.
ID: 1726921 · Report as offensive
betreger (Project Donor)
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1726931 - Posted: 18 Sep 2015, 19:32:01 UTC - in response to Message 1726922.  

It must be contagious, I have been getting this message from Einstein.
9/18/2015 12:29:04 PM | Einstein@Home | Project is temporarily shut down for maintenance

ID: 1726931 · Report as offensive
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1726945 - Posted: 18 Sep 2015, 20:05:05 UTC

We-e-ll, I was just watching Great British Menu on iPlayer (my broadband is currently at 2268 kbps, just enough to watch 1280x720 HD) when it all stopped, even though gkrellm was showing the net flatlining at 248 kBps. Looked at BOINC and it was downloading shedloads of new jobs and a new executable. When the dust settled, there had been 91 new jobs and setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah -- so I guess that's out of beta. My RAC should go up, as I'd never got around to setting up the anonymous mechanism on this PC, so I've only had GPU work when there are AP jobs to be had. We shall see...
Can't check my full stats, though, with the replica being behind times.
Oh, and Matt beat Eve for the North-west crown once the judging was finished </spoiler>.
ID: 1726945 · Report as offensive
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1726946 - Posted: 18 Sep 2015, 20:08:41 UTC - in response to Message 1726945.  

setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah --


Wonder if there is an equivalent Windows release as well?

I'd like to take a whack at it outside of Beta as well.

Hmm.. But you are right, anonymous platform will mean modification to try and get that app and executable...
ID: 1726946 · Report as offensive
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1726955 - Posted: 18 Sep 2015, 20:42:24 UTC - in response to Message 1726946.  

setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah --

Wonder if there is an equivalent Windows release as well?

Not according to http://setiathome.berkeley.edu/apps.php , though there are some new Mac CPU apps as well.


ID: 1726955 · Report as offensive
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1726959 - Posted: 18 Sep 2015, 21:07:02 UTC - in response to Message 1726945.  

Might need some tuning -- the GPU jobs are hogging a CPU each!
top - 22:01:36 up 5 days, 19:07,  6 users,  load average: 15.07, 14.97, 14.72
Tasks: 274 total,  13 running, 261 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.0 us,  7.1 sy, 77.1 ni, 13.7 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   8129424 total,  6893796 used,  1235628 free,   396776 buffers
KiB Swap:  5119996 total,       36 used,  5119960 free.  3226776 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                            
 8112 ivan      30  10 28.277g  80788  32484 R  99.7  1.0   8:10.46 setiathome_7.08                    
 8610 ivan      30  10 28.279g 104412  48080 R  99.7  1.3   5:58.53 setiathome_7.08                    
 4032 ivan      39  19  108976  39884     12 R  81.8  0.5  26:53.10 setiathome_7.01                    
 4791 ivan      39  19  107752  38664     12 R  74.5  0.5  21:07.65 setiathome_7.01                    
30201 ivan      39  19  109012  39972     12 R  72.1  0.5  81:36.75 setiathome_7.01                    
 6779 ivan      39  19  108060  39252     12 R  52.5  0.5  11:25.18 setiathome_7.01                    
 6643 ivan      39  19  107548  38568     12 R  49.5  0.5  11:20.55 setiathome_7.01                    
20264 ivan      39  19  110816  42320     12 R  47.9  0.5 165:22.65 setiathome_7.01                    
32684 ivan      39  19  108824  39732     12 R  47.2  0.5  58:39.61 setiathome_7.01                    
 7160 ivan      39  19  109356  40016     12 R  42.9  0.5   8:52.95 setiathome_7.01                    
 


[homepc01:BOINC] > nvidia-smi
Fri Sep 18 22:03:55 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.30     Driver Version: 352.30         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 660 Ti  Off  | 0000:01:00.0     N/A |                  N/A |
| 30%   65C    P0    N/A /  N/A |    644MiB /  2043MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
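For anyone who wants to put a number on "hogging a CPU each", here is a minimal sketch that totals the %CPU column per executable from `top` rows. It assumes the standard 12-column `top` layout shown above and that the setiathome_7.08 rows are the OpenCL GPU tasks; the `cpu_by_command` helper is a made-up name, and only the first four process rows are copied in.

```python
from collections import defaultdict

# Process rows copied from the `top` listing in this post (first four only).
TOP_ROWS = """\
 8112 ivan      30  10 28.277g  80788  32484 R  99.7  1.0   8:10.46 setiathome_7.08
 8610 ivan      30  10 28.279g 104412  48080 R  99.7  1.3   5:58.53 setiathome_7.08
 4032 ivan      39  19  108976  39884     12 R  81.8  0.5  26:53.10 setiathome_7.01
 4791 ivan      39  19  107752  38664     12 R  74.5  0.5  21:07.65 setiathome_7.01
"""


def cpu_by_command(rows):
    """Return {command: [process_count, summed_%CPU]} for top-style rows."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in rows.splitlines():
        fields = line.split()
        # Skip blank lines and the column header (first field is not a PID).
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        command, cpu = fields[11], float(fields[8])  # COMMAND and %CPU columns
        totals[command][0] += 1
        totals[command][1] += cpu
    return dict(totals)


usage = cpu_by_command(TOP_ROWS)
for command, (count, cpu) in sorted(usage.items()):
    print(f"{command}: {count} processes, {cpu:.1f}% CPU in total")
```

Pasting in the full listing would show the same pattern: each 7.08 process sits near a full core, while the 7.01 CPU tasks share what is left.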

ID: 1726959 · Report as offensive
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1726967 - Posted: 18 Sep 2015, 21:30:19 UTC - in response to Message 1726959.  
Last modified: 18 Sep 2015, 21:30:58 UTC


If those are the opencl_nvidia_sah tasks, that is normal for them.

Here's a little info for you on that. If you run 2 of those at a time, the CPU usage will actually go down to about 85% of a core for each work unit.

I would not recommend going over 2 opencl_nvidia_sah tasks per card.

On the plus side, you will help to get rid of all those VLARs ;)
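
One way to run 2 at a time without going anonymous platform is a BOINC app_config.xml in the project directory. Below is a rough sketch that generates one with Python's standard library. The app name "setiathome_v7" is an assumption (check the names in your own client_state.xml), `build_app_config` is a hypothetical helper, and note that `gpu_versions` applies to every GPU version of that app, not only opencl_nvidia_sah.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpus_per_task):
    """Build an <app_config> tree that shares one GPU between several tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of a GPU each task claims: 0.5 means 2 per card.
    ET.SubElement(gpu, "gpu_usage").text = str(1 / tasks_per_gpu)
    # cpu_usage is the CPU share the client budgets for each GPU task.
    ET.SubElement(gpu, "cpu_usage").text = str(cpus_per_task)
    return root


# "setiathome_v7" is an assumed app name, not taken from this thread.
config = build_app_config("setiathome_v7", tasks_per_gpu=2, cpus_per_task=1.0)
print(ET.tostring(config, encoding="unicode"))
```

Save the printed XML as app_config.xml in the SETI@home project folder and have the client re-read its config files. Budgeting a full CPU per task is the cautious choice given the usage figures above.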
ID: 1726967 · Report as offensive
Darth Beaver (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1726973 - Posted: 18 Sep 2015, 21:43:29 UTC

Guess we've still got major drama with the servers. Hopefully it will get fixed; I'm almost outa units, or at least I will be by dinner time.
ID: 1726973 · Report as offensive
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1726975 - Posted: 18 Sep 2015, 21:48:56 UTC - in response to Message 1726967.  

I think the only machine I run >2 GPU jobs on is my work Linux machine, which is running anonymous. Not got the feedback yet (because the replica DB was still catching up last I looked) to see how this will affect my RAC -- I don't expect it to go down! What happens when the next batch of AP WUs are available is another matter.
ID: 1726975 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1726992 - Posted: 18 Sep 2015, 22:46:53 UTC - in response to Message 1726868.  
Last modified: 18 Sep 2015, 22:53:52 UTC

Now, if he can just get the splitters shifted into a higher gear.....

Still no joy.
Server Status Page shows green, but splitter output is poor at best, often non-existent over the last few hours.


To add to that, if you do happen to score some work, downloads aren't happening. They try (download timer is running), but not a byte is being downloaded; project backoffs soon follow.



EDIT- after excessive use of the retry button I've managed to download a few WUs. Usually my download speed is around 155kB/s, the best at the moment is 40kB/s, most around 20-30. A couple only at 10.
Eventually they started (and finished) downloading by themselves.
Grant
Darwin NT
ID: 1726992 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1726999 - Posted: 18 Sep 2015, 23:08:24 UTC - in response to Message 1726992.  

Server Status Page shows green, but splitter output is poor at best, often non-existent over the last few hours.

The last time I checked, at 22:40 UTC, there were over 25,000 MBs and 14 APs available for download. That had grown from 700 MBs at 22:10 UTC to over 25,000 in just half an hour. Now (SSP shows 23:00 UTC) they are already all gone.
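
As a back-of-envelope check on those figures, here is a small sketch of the net fill rate of the ready-to-send buffer. It treats "over 25,000" as exactly 25,000 and borrows the 300,000 target mentioned earlier in the thread; `seconds_to_fill` is a hypothetical helper and a constant rate is assumed, which the spurts described here clearly are not.

```python
# Ready-to-send counts quoted in this post (multibeam tasks only).
rts_start, rts_end = 700, 25_000
interval_s = 30 * 60  # 22:10 UTC to 22:40 UTC

# Net rate = splitter output minus whatever the scheduler handed out meanwhile.
net_fill_rate = (rts_end - rts_start) / interval_s


def seconds_to_fill(target, current, rate_per_s):
    """Seconds to reach `target` at a constant net rate; None if not growing."""
    if rate_per_s <= 0:
        return None
    return max(target - current, 0) / rate_per_s


print(f"net fill rate: {net_fill_rate:.1f} tasks/s")
hours = seconds_to_fill(300_000, rts_end, net_fill_rate) / 3600
print(f"hours to reach 300,000 at that rate: {hours:.1f}")
```

Since the buffer was empty again twenty minutes later, the real net rate obviously swings negative as soon as hosts start fetching, so treat the result as a best case.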
ID: 1726999 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1727003 - Posted: 18 Sep 2015, 23:26:38 UTC - in response to Message 1726999.  
Last modified: 18 Sep 2015, 23:27:07 UTC

There's definitely some sort of server issue going on: just before 22:00hrs Berkeley time MB splitter output fell over, Awaiting Validation started climbing, and there was a drop in the received-in-the-last-hour numbers, then a sharp spike.
In Progress fell steadily, recovered a bit around 01:00hrs Berkeley time and then levelled off slightly (hard to see on the graph). Splitter output has picked up again over the last hour or so, but once again it's just in spurts; they just don't seem to be able to crank up, pump out the work, then shut down once the ready-to-send buffer is full.
Looking at the graphs for the last few hours, while the splitters were running and the ready-to-send buffer was empty, there were occasional peaks to 35/s, lots of drops to 25/s, and several to less than 20/s.
Grant
Darwin NT
ID: 1727003 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1727005 - Posted: 18 Sep 2015, 23:42:07 UTC - in response to Message 1726967.  

On the plus side, you will help to get rid of all those VLARS ;)

I like the sound of that. I'm looking forward to when it comes over to here. I have no idea when that will be
ID: 1727005 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1727012 - Posted: 19 Sep 2015, 0:22:01 UTC

Geez....
The ol' kittyman goes in to work some OT this afternoon and everything falls apart.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1727012 · Report as offensive
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1727033 - Posted: 19 Sep 2015, 18:17:45 UTC

Woohoo, first to post after the unplanned outage! :)
Servers are reachable again but still too busy...
Aloha, Uli

ID: 1727033 · Report as offensive

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.