Panic Mode On (107) Server Problems?

Message boards : Number crunching : Panic Mode On (107) Server Problems?
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1884171 - Posted: 16 Aug 2017, 12:12:19 UTC - in response to Message 1884107.  


Not by much, really. I have a 40 MHz bump in the core clock and a 200 MHz bump in the memory clock. The cards do their own GPU Boost thing, which amazes me, since two of them are running close to 80 °C because of poor ventilation from sitting in adjacent slots.


. . Well, my units are Gainwards and factory clocked fairly high, but your times are 15 to 30 secs better on all task types. Very impressive. Maybe I am running my units too cool ... :)

Stephen

:)
ID: 1884171
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1884208 - Posted: 16 Aug 2017, 15:32:09 UTC - in response to Message 1884171.  


. . Well, my units are Gainwards and factory clocked fairly high, but your times are 15 to 30 secs better on all task types. Very impressive. Maybe I am running my units too cool ... :)

Stephen

:)

The 970s are reference cards from EVGA. I do run them in power state 0 instead of the power state 2 that the Nvidia driver defaults to. I just used the same mild overclocks I ran them at when they were in Windows machines. Cooling was not an issue then because there were only two cards per machine, with adequate spacing between them for cooling.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1884208
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1884220 - Posted: 16 Aug 2017, 16:20:58 UTC

Umm, could someone mount a "tape" or two on the Beta machines? Beta is out of work!

Hello, from Albany, CA!...
ID: 1884220
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1884356 - Posted: 17 Aug 2017, 1:55:41 UTC

And just when I was thinking to myself, hmmmm..... strange, the work caches have been full all day, I wonder what has changed... Now the Win 10 and Linux machines are failing to get enough new work to replace the work they turn in. Jinxed I say, Jinxed.
ID: 1884356
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1884365 - Posted: 17 Aug 2017, 2:56:50 UTC

Looks like the best way to get tasks flowing again is the 'lost task recovery' mechanism that has been mentioned.
ID: 1884365
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1884508 - Posted: 17 Aug 2017, 17:33:49 UTC - in response to Message 1882794.  

My mind must have still been in the mountains.


That is still a lovely place to be :)

Tom
A proud member of the OFA (Old Farts Association).
ID: 1884508
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1884510 - Posted: 17 Aug 2017, 17:43:28 UTC - in response to Message 1883932.  

Automating a BOINC exit is a nice idea to try. I have been thinking I needed to set up some sort of remote desktop for all the boxes so I can monitor them while I'm away this weekend.


I think I read somewhere that using the classic Windows Remote Desktop program crashes things, so I wouldn't try that without testing it while you're home.

Tom
ID: 1884510
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 1884513 - Posted: 17 Aug 2017, 17:45:54 UTC - in response to Message 1884208.  

The 970s are reference cards from EVGA. I do run them in power state 0 instead of power state 2 that the Nvidia driver defaults to.


Is that possible under Windows? And does anyone know if GTX 1060 3GBs have the same P2/P0 power states? What utility do you use?

Thank you,
Tom
ID: 1884513
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1884531 - Posted: 17 Aug 2017, 19:18:06 UTC - in response to Message 1884510.  

Automating a BOINC exit is a nice idea to try. I have been thinking I needed to set up some sort of remote desktop for all the boxes so I can monitor them while I'm away this weekend.


I think I read somewhere that using the classic Windows Remote Desktop program crashes things, so I wouldn't try that without testing it while you're home.

Tom

I'm going to try TeamViewer since it was recommended.
ID: 1884531
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1884533 - Posted: 17 Aug 2017, 19:22:22 UTC - in response to Message 1884513.  

The 970s are reference cards from EVGA. I do run them in power state 0 instead of power state 2 that the Nvidia driver defaults to.


Is that possible under Windows? And does anyone know if GTX 1060 3GBs have the same P2/P0 power states? What utility do you use?

Thank you,
Tom

As far as I can remember, Nvidia has relegated their GPUs to run in the P2 state with reduced clocks whenever the driver detects that distributed computing is running on the cards. That goes back at least to whatever driver version was current when I got my 970s.

In Windows, I use Nvidia Inspector to set my cards back to P0 state plus add some additional minor overclocks when they are doing distributed computing. The drop in clocks in P2 state is rather significant and the performance suffers accordingly. I prefer to run the cards at what they are capable of.
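On Linux, one way to see which performance state each card is actually in is nvidia-smi's query mode. The sketch below is a hypothetical helper (the name report_pstates and the sample numbers are illustrative assumptions, not from this thread) that reads the CSV printed by nvidia-smi --query-gpu=index,pstate,clocks.mem --format=csv,noheader and flags any card that is not in P0. Because it only reads text, it can be tried without a GPU by piping in sample lines.

```shell
#!/bin/bash
# Hypothetical helper: report each GPU's performance state and flag anything not in P0.
# Expects CSV lines shaped like the output of
#   nvidia-smi --query-gpu=index,pstate,clocks.mem --format=csv,noheader
# e.g. "0, P2, 3004 MHz" (sample values are made up for illustration).
report_pstates() {
  local idx pstate memclk
  # Split on commas and spaces; the trailing "MHz" unit lands in the throwaway "_" field.
  while IFS=', ' read -r idx pstate memclk _; do
    [[ -z $idx ]] && continue   # skip blank lines
    if [[ $pstate == P0 ]]; then
      echo "GPU $idx: P0 (memory clock $memclk MHz)"
    else
      echo "GPU $idx: $pstate, not P0 (memory clock $memclk MHz)"
    fi
  done
}

# Usage on a real machine:
#   nvidia-smi --query-gpu=index,pstate,clocks.mem --format=csv,noheader | report_pstates
```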
ID: 1884533
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1884538 - Posted: 17 Aug 2017, 19:28:36 UTC - in response to Message 1884531.  

I'm going to try TeamViewer since it was recommended.
I use VNC at home, but TeamViewer for remote access since it is so firewall friendly, whereas with VNC you usually have to open up ports on your router. I haven't tried TeamViewer on Linux yet, though I should; that would be better than going TeamViewer into a Windows box and then VNC into the others ... a bit SLOW, but I rarely need remote access.
ID: 1884538
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1884563 - Posted: 17 Aug 2017, 21:21:09 UTC - in response to Message 1884208.  
Last modified: 17 Aug 2017, 21:24:29 UTC


The 970s are reference cards from EVGA. I do run them in power state 0 instead of the power state 2 that the Nvidia driver defaults to. I just used the same mild overclocks I ran them at when they were in Windows machines. Cooling was not an issue then because there were only two cards per machine, with adequate spacing between them for cooling.


. . Hi Keith

. . I have checked mine and they are running in P2. The only difference is the memory clock is 6 GHz instead of 7 GHz. So how did you manage to persuade yours to jump to the higher clock rate? The nVidia app will not budge it; I set it to max performance but it stays at P2.

Stephen

?

P.S. I just read about the app you use in Windows but what about Linux?
ID: 1884563
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1884577 - Posted: 17 Aug 2017, 22:07:53 UTC - in response to Message 1884563.  


The 970s are reference cards from EVGA. I do run them in power state 0 instead of the power state 2 that the Nvidia driver defaults to. I just used the same mild overclocks I ran them at when they were in Windows machines. Cooling was not an issue then because there were only two cards per machine, with adequate spacing between them for cooling.


. . Hi Keith

. . I have checked mine and they are running in P2. The only difference is the memory clock is 6 GHz instead of 7 GHz. So how did you manage to persuade yours to jump to the higher clock rate? The nVidia app will not budge it; I set it to max performance but it stays at P2.

Stephen

?

P.S. I just read about the app you use in Windows but what about Linux?

Stephen, I use some scripts to set persistence mode and an overclock.
unrestricted.sh
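#!/bin/bash

# Send stdout and stderr to the system log (logger -s also echoes them to stderr)
exec 1> >(logger -s -t "$(basename "$0")") 2>&1

# Enable persistence mode so the driver stays loaded even when no application is using the GPUs
/usr/bin/nvidia-smi -pm 1

# Allow non-root users to change the application clocks
/usr/bin/nvidia-smi -acp UNRESTRICTED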


gpuoverclock.sh
#!/bin/bash

# Send stdout and stderr to the system log (logger -s also echoes them to stderr)
exec 1> >(logger -s -t "$(basename "$0")") 2>&1

# nvidia-settings needs a running X session with Coolbits enabled in xorg.conf
# Take manual control of each card's fan
/usr/bin/nvidia-settings -a "[gpu:0]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUFanControlState=1"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUFanControlState=1"

# PowerMizer mode 1 = "Prefer Maximum Performance"
/usr/bin/nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUPowerMizerMode=1"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUPowerMizerMode=1"

# +200 memory transfer rate offset and +40 MHz graphics clock offset, applied to performance level [3]
/usr/bin/nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=200" -a "[gpu:0]/GPUGraphicsClockOffset[3]=40"
/usr/bin/nvidia-settings -a "[gpu:1]/GPUMemoryTransferRateOffset[3]=200" -a "[gpu:1]/GPUGraphicsClockOffset[3]=40"
/usr/bin/nvidia-settings -a "[gpu:2]/GPUMemoryTransferRateOffset[3]=200" -a "[gpu:2]/GPUGraphicsClockOffset[3]=40"

# Application clocks, given as <memory MHz>,<graphics MHz>
/usr/bin/nvidia-smi -i 0 -ac 3605,1394
/usr/bin/nvidia-smi -i 1 -ac 3605,1394
/usr/bin/nvidia-smi -i 2 -ac 3605,1394
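Since the same settings are repeated for gpu:0, gpu:1 and gpu:2, a loop makes gpuoverclock.sh easier to adapt to a different number of cards. This is only a sketch: NUM_GPUS, CORE_OFFSET, MEM_OFFSET, APP_CLOCKS, DRY_RUN and the run/overclock_all helpers are hypothetical names introduced here, while the +40/+200 offsets, the [3] performance level and the 3605,1394 application clocks are copied from the script above and should be adjusted for your own cards. With DRY_RUN=1 it only prints the commands, which is a safe way to check them before applying anything.

```shell
#!/bin/bash
# Hypothetical loop version of gpuoverclock.sh (variable and function names are illustrative).
# Default values are copied from the script above; adjust them for your own cards.
NUM_GPUS=${NUM_GPUS:-3}              # cards gpu:0 .. gpu:(NUM_GPUS-1)
CORE_OFFSET=${CORE_OFFSET:-40}       # graphics clock offset
MEM_OFFSET=${MEM_OFFSET:-200}        # memory transfer rate offset
APP_CLOCKS=${APP_CLOCKS:-3605,1394}  # nvidia-smi application clocks: memory,graphics

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Apply fan control, PowerMizer mode, clock offsets and application clocks to every card.
overclock_all() {
  local gpu
  for (( gpu = 0; gpu < NUM_GPUS; gpu++ )); do
    run /usr/bin/nvidia-settings -a "[gpu:$gpu]/GPUFanControlState=1"
    run /usr/bin/nvidia-settings -a "[gpu:$gpu]/GPUPowerMizerMode=1"
    run /usr/bin/nvidia-settings \
      -a "[gpu:$gpu]/GPUMemoryTransferRateOffset[3]=$MEM_OFFSET" \
      -a "[gpu:$gpu]/GPUGraphicsClockOffset[3]=$CORE_OFFSET"
    run /usr/bin/nvidia-smi -i "$gpu" -ac "$APP_CLOCKS"
  done
}

# Usage:
#   DRY_RUN=1 overclock_all   # preview the commands
#   overclock_all             # apply them (as root, with X running)
```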

GPUPriority.sh
#!/bin/bash
# Run in root terminal, NOT sudo

exec 1> >(logger -s -t "$(basename "$0")") 2>&1

nvidia-smi -pm 1

# Loop forever, re-applying the settings every 5 seconds so newly started tasks are picked up
for (( ; ; ))
do
  # Assign CPU Priority (19=Nice/LowPriority, 0=Normal, -20=HighPriority)
  # This was code Petri gave out
  # GPU Tasks get high Priority
  schedtool -n -20 `pidof setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda80`
  schedtool -n -20 `pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100`
  # CPU Tasks get (a little) Below Normal Priority (0 being normal) to make sure it doesn't choke the OS
  schedtool -n 5 `pidof ap_7.05r2728_bdver1_linux64`
  schedtool -n 5 `pidof MBv8_8.05r3345_avx_linux64`

  # Assign CPU Usage Threads (0-7)
  # Brent added this to Petri's code
  # Keep GPU tasks on threads 1 3 5 7
  schedtool -a 1,3,5,7 `pidof setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda80`
  schedtool -a 1,3,5,7 `pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100`
  # Keep CPU tasks on threads 0 2 4 6
  schedtool -a 0,2,4,6 `pidof MBv8_8.05r3345_avx_linux64`
  schedtool -a 0,2,4,6 `pidof ap_7.05r2728_bdver1_linux64`

  # CPU Priority Assignment Script
  date
  # lscpu | grep MHz
  sleep 5
  echo "  CPU Priority and Assignment Script (8 Threads)"
done
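One possible refinement, sketched here as an assumption rather than something from Petri's or Brent's code: when one of the apps isn't running (no AstroPulse work, say), pidof prints nothing and schedtool is called with no PID to act on, every 5 seconds. The hypothetical helper apply_to_procs below only runs the command when pidof finds something; it relies on pidof exiting non-zero when no process matches, and on schedtool accepting several PIDs at once, as the original script already does.

```shell
#!/bin/bash
# Hypothetical helper (not part of schedtool or the original script):
# run a command on every PID of a named process, skipping it when none is running.
#   $1    process name to look up with pidof
#   $2..  command to run; the PIDs are appended as extra arguments
apply_to_procs() {
  local name=$1
  shift
  local pids
  # pidof exits non-zero when no process matches, so there is nothing to do.
  pids=$(pidof "$name") || return 0
  # $pids is left unquoted on purpose so each PID becomes its own argument.
  "$@" $pids
}

# Example, inside the for (( ; ; )) loop:
#   apply_to_procs setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda80 schedtool -n -20
#   apply_to_procs setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda80 schedtool -a 1,3,5,7
#   apply_to_procs MBv8_8.05r3345_avx_linux64 schedtool -n 5
```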

I run these scripts from a root terminal before I start BOINC. Almost all of it is borrowed from code Petri and Brent posted in the forum. It all works quite well along with Jeff's fan control GUI app. Hope you can adapt them to your setup and get your 970 into Performance Level 1 in Linux.

Cheers
ID: 1884577
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1884581 - Posted: 17 Aug 2017, 22:15:18 UTC - in response to Message 1884577.  

I just noticed that I switched those comments: the Petri/Brent credits are the wrong way around. Petri was doing the thread assignment; I added the priority part.
No big deal, just thought I'd better mention it.
ID: 1884581
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1884589 - Posted: 17 Aug 2017, 22:48:13 UTC - in response to Message 1882794.  

Sorry everyone, this was entirely my fault. I had just gotten back from vacation, started the outage, and then managed to accidentally kill the outage. I discovered this much later and restarted it. My mind must have still been in the mountains.

These things happen, Jeff. I am interested to know: is that part of the maintenance window automated, given that you had to restart it? I hope you had a good time while you were in the mountains.
ID: 1884589
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1884596 - Posted: 17 Aug 2017, 23:15:35 UTC - in response to Message 1884581.  

I just noticed that I switched those comments: the Petri/Brent credits are the wrong way around. Petri was doing the thread assignment; I added the priority part.
No big deal, just thought I'd better mention it.

As you say ... no big deal. I do appreciate the effort both of you put into the script to get the special app running 'especially' well.
ID: 1884596
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1884602 - Posted: 17 Aug 2017, 23:38:25 UTC - in response to Message 1884577.  


I run these scripts from a root terminal before I start BOINC. Almost all of it is borrowed from code Petri and Brent posted in the forum. It all works quite well along with Jeff's fan control GUI app. Hope you can adapt them to your setup and get your 970 into Performance Level 1 in Linux.
Cheers


. . Thanks Keith,

. . I will have a play with them when I get home ...

Stephen

:)
ID: 1884602
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13864
Credit: 208,696,464
RAC: 304
Australia
Message 1884650 - Posted: 18 Aug 2017, 5:25:06 UTC

The forums are extremely sluggish at the moment, along with the account, task and computer pages.
And the whole web site has been going missing for short periods.
Grant
Darwin NT
ID: 1884650
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1884656 - Posted: 18 Aug 2017, 5:33:03 UTC - in response to Message 1884650.  

Starting to get no work responses from the servers
ID: 1884656
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13864
Credit: 208,696,464
RAC: 304
Australia
Message 1884661 - Posted: 18 Aug 2017, 5:46:32 UTC - in response to Message 1884656.  
Last modified: 18 Aug 2017, 5:47:02 UTC

Starting to get no work responses from the servers

Just looked in my Event Log: it's been that way for about 5 hours. It's taking 3-7 report/requests to actually get any work.
At least there's no sign of Scheduler errors.
Grant
Darwin NT
ID: 1884661
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.