What percent utilization to expect on the CUDA90 linux client?

Profile Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 2005092 - Posted: 1 Aug 2019, 0:04:38 UTC - in response to Message 2005084.  
Last modified: 1 Aug 2019, 0:48:39 UTC

...Noticed for the first time that GPU tasks were waiting for memory. I fixed that by assigning 95% for both in-use and idle time. I have a pair of 4 GB sticks. This system isn't used for anything else, so I probably don't need any more, although I do have a GTX 650 Ti not being used and an empty slot on the 4-in-1.
I know someone who had the Exact same problem with BOINC saying 'waiting for memory'; he too had 8 GB of RAM. Only he blamed the BOINC error on the CUDA App instead of on being Low on memory. To this day he still blames the CUDA App, even though No One Else had that problem with it. I'd suggest you add more RAM if you don't want any more problems.


I changed the RAM limit from 50% to 95% for when the system is "in use".
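
For reference, the same limits can also be set locally in global_prefs_override.xml in the BOINC data directory; a minimal sketch (element names from memory, worth checking against the BOINC docs):

<global_preferences>
    <!-- percent of system RAM BOINC may use while the machine is in use / idle -->
    <ram_max_used_busy_pct>95</ram_max_used_busy_pct>
    <ram_max_used_idle_pct>95</ram_max_used_idle_pct>
</global_preferences>

Have the client re-read preferences afterwards (boinccmd --read_global_prefs_override, or the Manager's read-prefs option).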

Just looked at memory:

jstateson@tb85-nvidia:~$ free -t
              total        used        free      shared  buff/cache   available
Mem:        8104636     4390076      725680     1658244     2988880     1802888
Swap:       2097148           0     2097148
Total:     10201784     4390076     2822828


If I read this correctly, the Swap is actually disk space, not real memory; current swap usage is 0 and there is a huge buffer/cache space.

I saw the warning "suspended waiting for memory" immediately upon booting after replacing the USB3 cables. After a few minutes that memory warning disappeared, but then it would reappear seemingly at random. Usually one WU, sometimes two, were suspended. I was monitoring all of this using BoincTasks on another system.

It looks like changing the RAM limit from 50% to 95% fixed the problem. All 8 cores are running nearly 100%, which I normally never see on any other systems or projects. Was this caused by -nobs? Since swap is zero, no CPUs are doing any swapping, so it would appear all the CPUs are really busy unless -nobs causes "idle busy" behavior.

Did -nobs cause the huge jump in CPU usage, or was it when I replaced the faulty USB3 cables? Going to run some tests, including trying that Gen2 BIOS change.

[EDIT] Got one answer: "-nobs" is just idle busy. The CPU must be looping, looking to feed the GPU and not returning to the scheduler to get a different assignment. I removed it and CPU usage dropped to 30% or under as before. Will monitor GPU throughput using my BoincTasks history analyzer to see the difference. This may allow me to add additional GPUs since the CPUs are not really 100% utilized.

[EDIT-2] Removing "-nobs" did decrease GPU utilization slightly. It really does make a noticeable, though slight, difference.

[EDIT-3] Set Gen2 in the BIOS and see this:
nvidia-smi --query-gpu=name,pci.bus_id,pcie.link.gen.current --format=csv
GeForce GTX 1070, 00000000:01:00.0, 2
GeForce GTX 1060 6GB, 00000000:02:00.0, 2
GeForce GTX 1060 3GB, 00000000:03:00.0, 1
GeForce GTX 1060 3GB, 00000000:04:00.0, 1
GeForce GTX 1070, 00000000:05:00.0, 1
GeForce GTX 1060 3GB, 00000000:08:00.0, 2
GeForce GTX 1060 3GB, 00000000:09:00.0, 2
GeForce GTX 1060 3GB, 00000000:0A:00.0, 2
jstateson@tb85-nvidia:~$
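
The same query can also report the current link width, which helps show whether a riser or splitter is dropping lanes as well as generation; for example:

nvidia-smi --query-gpu=name,pci.bus_id,pcie.link.gen.current,pcie.link.width.current --format=csv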


Also, I was mistaken: the previous setting was "auto", not Gen1. I had reset the CMOS and Gen1 went back to auto.
ID: 2005092
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2005095 - Posted: 1 Aug 2019, 1:01:54 UTC

The parameter stands for "no blocking sync". It pins the CPU thread to the GPU task so its attention doesn't get pulled away to service another process, which would force the GPU thread to wait its turn in the round-robin thread rotation before it could ask for more data from the CPU. So it speeds up the GPU task because the thread gets serviced promptly every time the GPU asks for more data or attention.

I believe your other handle over on the other projects is BeemerBiker, isn't it? Or at least your avatar looks similar. The change from blocking sync to spin sync over at GPUGrid knocks significant time off the crunching time. It is the recommended sync mode at that project. It can save several seconds here at Seti with the special sauce app.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2005095
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2005098 - Posted: 1 Aug 2019, 1:43:30 UTC - in response to Message 2005092.  
Last modified: 1 Aug 2019, 1:46:29 UTC

I changed the RAM limit from 50% to 95% for when the system is "in use".

Just looked at memory:
jstateson@tb85-nvidia:~$ free -t
              total        used        free      shared  buff/cache   available
Mem:        8104636     4390076      725680     1658244     2988880     1802888
Swap:       2097148           0     2097148
Total:     10201784     4390076     2822828
From 50 to 95% is a big jump, and the buffer is supposed to be available if needed. With BOINC running at such a low priority, I'm not sure how well that works, though.
Compare that to my 7 GPU machine with 16 GB
             total        used        free      shared  buff/cache   available
Mem:       16377688     4030396     8365100     1432360     3982192    10570160
Swap:      12287996           0    12287996
Total:     28665684     4030396    20653096
It's constantly showing 8 GB 'Free', or about half. In the System Monitor/Resources it shows only 5.5 GB in use, until I fire up Firefox...
But free -t does show mine as having 10 GB available.

The old CUDA App had -poll, which does the same as -nobs: it tries to assign a full CPU core to the CUDA App even though it doesn't need a full core. It will use what it has, though, which is why my 14-GPU machine shows around 50% CPU use. nvidia-settings uses a full CPU; that leaves 7 cores for 14 GPUs, or 50%. That 50% saves just as much time as 100% would if it could use 100%. I think it works better: the same savings while only using 50% of the CPU. I did see quite a difference in RAC by going to -nobs on the 14-GPU machine. All machines are different, though, and you'll just have to test it. My CPU coolers work fine on the i7-6700 but don't cut it on the 6700K.
ID: 2005098
Profile Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 2005099 - Posted: 1 Aug 2019, 1:53:44 UTC - in response to Message 2005095.  
Last modified: 1 Aug 2019, 1:55:36 UTC

The parameter stands for "no blocking sync". It pins the CPU thread to the GPU task so its attention doesn't get pulled away to service another process, which would force the GPU thread to wait its turn in the round-robin thread rotation before it could ask for more data from the CPU. So it speeds up the GPU task because the thread gets serviced promptly every time the GPU asks for more data or attention.

I believe your other handle over on the other projects is BeemerBiker, isn't it? Or at least your avatar looks similar. The change from blocking sync to spin sync over at GPUGrid knocks significant time off the crunching time. It is the recommended sync mode at that project. It can save several seconds here at Seti with the special sauce app.



I was guessing it meant "No B.S." and was looking in cc_config for usage. Only today did I realize it goes in app_config and is for SETI.

In Win 10, under "User variables for josep", I have SWAN_SYNC set to 1, but I have not rebooted or signed out/in, so it has not taken effect.
I assume that is the same as -nobs? Also, do you know if it goes into "System variables" instead?

It takes a long time to re-sync my GRC wallet and I don't want to reboot while staking, so I have not tested that yet.

Also, a number of projects have outdated Google captchas and I have been unable to change personal information because the captcha does not work. GPUGrid is one of those projects; I was able to change my nickname, but I can't change any personal info until they update their web server to the latest Google captcha. I retired 10 years ago. Most of the 80k miles on my R1100RT were before I retired, and I passed it on to my grandson two months ago.
ID: 2005099
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2005101 - Posted: 1 Aug 2019, 2:05:50 UTC - in response to Message 2005099.  

You put the -nobs parameter into the cmdline entry in either app_info.xml or an app_config.xml. It is only for the Seti special app.

This is my entry in my app_config.xml:
<cmdline>-nobs -pfb 32</cmdline>
The advantage of putting it in app_config.xml is that you don't have to restart BOINC to read it. You can use the Manager to re-read config files to pick up the change. If you put it into app_info.xml, then you need to stop and restart BOINC to pick it up.
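
For anyone setting this up from scratch, a minimal app_config.xml sketch; the app name and plan class shown here are assumptions and should be checked against the <app_version> entries in your own client_state.xml or app_info.xml:

<app_config>
    <app_version>
        <app_name>setiathome_v8</app_name>   <!-- assumed name; verify in client_state.xml -->
        <plan_class>cuda90</plan_class>      <!-- assumed plan class for the special app -->
        <avg_ncpus>1</avg_ncpus>             <!-- reserve a full core per GPU task for -nobs -->
        <ngpus>1</ngpus>
        <cmdline>-nobs -pfb 32</cmdline>
    </app_version>
</app_config>

It goes in the project directory (projects/setiathome.berkeley.edu), and the Manager's "Read config files" option picks it up without a restart.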

I fondly remember all the years and miles with my BMW R90S. Until I crashed it. Oh well.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2005101
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2005109 - Posted: 1 Aug 2019, 3:19:21 UTC - in response to Message 2005035.  

Is 85% utilization about right? I am guessing, as nvidia-smi shows the instantaneous value at the time I run the program. I do see 0% and 100% occasionally, and the power meter fluctuates from 790 to 900 watts. I have 3 GPUs on a single splitter and the other 5 are on their own slots.


I have a raft of GTX 1060/1070s running on a 1-to-4 expander. When I use -nobs on the command line with 1 CPU / GPU, I usually get 98%+ from my nvidia-smi command except when changing tasks.
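
If the instantaneous readings bounce around too much, nvidia-smi can poll on an interval instead, e.g. every 5 seconds:

nvidia-smi --query-gpu=index,name,utilization.gpu,power.draw --format=csv -l 5

That gives a rolling picture of utilization and power draw per card rather than a single snapshot.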

I don't think you are running any CPU tasks, so I would experiment both with and without -nobs, and also try 0.49 CPU / GPU versus 1-to-1 on the CPU-to-GPU ratio.

When I run without -nobs, the Linux task manager normally shows the GPUs mostly stopped and only running a bit. You can also try running without -nobs but with 1 CPU to 1 GPU. While this will use all your CPU threads, it will not load the CPUs up as much (I think that was a result I had).

HTH,
Tom
A proud member of the OFA (Old Farts Association).
ID: 2005109
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2005115 - Posted: 1 Aug 2019, 4:21:04 UTC - in response to Message 2005092.  
Last modified: 1 Aug 2019, 4:34:02 UTC

Glad you found the setting to increase the memory share and fix the problem. You really don't need a lot of RAM if you're just running SETI; 8 GB is more than sufficient once you bump that setting up. As you said, you're not doing anything else with the system. No need to have almost half your RAM sitting there doing nothing when it could be used.

Oh, and forget about trying to run the 650 Ti, at least not in that same system. That card is too old to be used with the special app. You could put it in a different system running SoG.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2005115
Profile Joseph Stateson Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 2005120 - Posted: 1 Aug 2019, 6:32:33 UTC

Just tried a GTX 1050 Ti and it's not working with the CUDA90 app:
"Not enough VRAM for Autocorrelations"

   Device 5: GeForce GTX 1050 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 1050 Ti
Using pfb = 32 from command line args
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1

setiathome v8 enhanced x41p_V0.98b1, Cuda 9.00 special
Modifications done by petri33, compiled by TBar

Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.007390
Sigma 378
Sigma > GaussTOffsetStop: 378 > -314
plan autocorr R2C batched FFT failed 5
Not enough VRAM for Autocorrelations...
setiathome_CUDA: CUDA runtime ERROR in device memory allocation, attempt 1 of 6
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
1,2,3,4,5,6,7,8,9,10,10,11,12,cudaAcc_free() DONE.
13 waiting 5 seconds...
 Reinitialising Cuda Device...
Cuda error 'Couldn't get cuda device count
' in file 'cuda/cudaAcceleration.cu' in line 154 : invalid device symbol


I thought the GTX 1050 Ti was OK. Maybe this is one of those fakes sold on eBay occasionally?
Anyway, it was worth a try. It does run Einstein OK, it just cratered on SETI.

tb85-nvidia

5			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 390.11, CUDA version 9.1, compute capability 6.1, 4096MB, 3986MB available, 6852 GFLOPS peak)	
6			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 1: GeForce GTX 1060 6GB (driver version 390.11, CUDA version 9.1, compute capability 6.1, 4096MB, 3988MB available, 4698 GFLOPS peak)	
7			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 2: GeForce GTX 1060 3GB (driver version 390.11, CUDA version 9.1, compute capability 6.1, 3019MB, 2944MB available, 3936 GFLOPS peak)	
8			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 3: GeForce GTX 1060 3GB (driver version 390.11, CUDA version 9.1, compute capability 6.1, 3019MB, 2944MB available, 3936 GFLOPS peak)	
9			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 4: GeForce GTX 1050 Ti (driver version 390.11, CUDA version 9.1, compute capability 2.1, 4027MB, 3924MB available, 576 GFLOPS peak)	
10			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 5: GeForce GTX 1060 3GB (driver version 390.11, CUDA version 9.1, compute capability 6.1, 3019MB, 2944MB available, 3936 GFLOPS peak)	
11			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 6: GeForce GTX 1070 (driver version 390.11, CUDA version 9.1, compute capability 6.1, 4096MB, 3986MB available, 6463 GFLOPS peak)	
12			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 7: GeForce GTX 1060 3GB (driver version 390.11, CUDA version 9.1, compute capability 6.1, 3019MB, 2944MB available, 3936 GFLOPS peak)	
13			8/1/2019 1:13:54 AM	CUDA: NVIDIA GPU 8: GeForce GTX 1060 3GB (driver version 390.11, CUDA version 9.1, compute capability 6.1, 3019MB, 2944MB available, 3936 GFLOPS peak)	
14			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 390.116, device version OpenCL 1.2 CUDA, 8118MB, 3986MB available, 6852 GFLOPS peak)	
15			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 1: GeForce GTX 1060 6GB (driver version 390.116, device version OpenCL 1.2 CUDA, 6078MB, 3988MB available, 4698 GFLOPS peak)	
16			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 2: GeForce GTX 1060 3GB (driver version 390.116, device version OpenCL 1.2 CUDA, 3019MB, 2944MB available, 3936 GFLOPS peak)	
17			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 3: GeForce GTX 1060 3GB (driver version 390.116, device version OpenCL 1.2 CUDA, 3019MB, 2944MB available, 3936 GFLOPS peak)	
18			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 4: GeForce GTX 1050 Ti (driver version 390.116, device version OpenCL 1.1 CUDA, 4027MB, 3924MB available, 576 GFLOPS peak)	
19			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 5: GeForce GTX 1060 3GB (driver version 390.116, device version OpenCL 1.2 CUDA, 3019MB, 2944MB available, 3936 GFLOPS peak)	
20			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 6: GeForce GTX 1070 (driver version 390.116, device version OpenCL 1.2 CUDA, 8120MB, 3986MB available, 6463 GFLOPS peak)	
21			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 7: GeForce GTX 1060 3GB (driver version 390.116, device version OpenCL 1.2 CUDA, 3019MB, 2944MB available, 3936 GFLOPS peak)	
22			8/1/2019 1:13:54 AM	OpenCL: NVIDIA GPU 8: GeForce GTX 1060 3GB (driver version 390.116, device version OpenCL 1.2 CUDA, 3019MB, 2944MB available, 3936 GFLOPS peak)	
ID: 2005120
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2005121 - Posted: 1 Aug 2019, 6:33:13 UTC - in response to Message 2005092.  
Last modified: 1 Aug 2019, 7:02:44 UTC

...I'd suggest you add more RAM if you don't want any more problems.
Here's a Tip you and others might try when testing new configurations, to help cut down on the reported Errors.
Before starting BOINC, open client_state.xml, go to the end of the file, and set it to:
    </file_ref>
</project_files>
<active_task_set>
</active_task_set>
Removing all the active tasks between <active_task_set> and </active_task_set> makes all the tasks start from the beginning.
And set:
<user_network_request>3</user_network_request>
The 3 starts BOINC in Network Suspended mode and prevents tasks from uploading.
Then copy client_state_prev.xml and client_state.xml and paste them to a different directory; I use my Home folder since the BOINC folder is also there.
Then start BOINC. If tasks immediately Error out, stop BOINC, make sure all tasks are stopped, then simply copy those two files back into the BOINC folder, overwriting the ones that are now filled with Errors. Since no tasks were uploaded, you are back to a clean client_state.xml without Errors. If it doesn't spew errors, you can simply resume networking once everything is running OK.
Otherwise, you will be reporting quite a few Errors. It works for me, most of the time...
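
A rough shell sketch of that backup/restore cycle, assuming a package install with the data directory at /var/lib/boinc-client and a boinc-client systemd service (adjust paths and service name for your setup):

sudo systemctl stop boinc-client                                                           # make sure BOINC and all tasks are stopped
cp /var/lib/boinc-client/client_state.xml /var/lib/boinc-client/client_state_prev.xml ~/   # keep clean copies in the home folder
sudo systemctl start boinc-client
# if tasks immediately error out:
sudo systemctl stop boinc-client
sudo cp ~/client_state.xml ~/client_state_prev.xml /var/lib/boinc-client/                  # restore the clean copies
sudo systemctl start boinc-client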

The best thing to do with the 1050 Ti is to test it in another Linux CUDA machine, preferably by itself, and see how it works there. I always test my 'new' cards in a limited setting before adding them to the flock.
In fact, you don't even need to do that this time with that fake 1050 Ti. Look here: CUDA: NVIDIA GPU 4: GeForce GTX 1050 Ti (driver version 390.11, CUDA version 9.1, compute capability 2.1, 4027MB, 3924MB available...
Compute capability 2.1 means it's a Fermi card, faked to look like a Pascal, straight from China no doubt. I have a fake 970 that also reports compute capability 2.1; mine's really a 550 Ti, with 6 one-GB RAM chips on it.
ID: 2005121