New CUDA 10.2 v0.99 Mutex Special App

Message boards : Number crunching : New CUDA 10.2 v0.99 Mutex Special App
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022381 - Posted: 8 Dec 2019, 18:25:28 UTC
Last modified: 8 Dec 2019, 18:31:46 UTC

Both petri33 and Oddbjornik have given the OK to release this more publicly so here it is!

This is the work-sync mutex version of the famed Linux "special sauce" application, named v0.99. petri33 is the author of the bulk of the source code; Oddbjornik added the work-sync mutex function, and I only compiled it :). This app cuts out the load time of the next WU by pre-loading it while the first WU is still running, which saves 1-5 seconds, sometimes more, of processing time per WU. As such, it is slightly more productive overall than the regular v0.98 application.

Download here: https://drive.google.com/open?id=1hTuDeZhtGDwyEiVQnsUZfZvcvrFZIPF6

Requirements/notes:
1. You must be running Linux; I built and tested this on Ubuntu 18.04.
2. You must have Nvidia driver version 440.xx or newer installed; 440.36 is the latest driver at this time. This is a CUDA 10.2 requirement, not a Special app requirement. If you don't have a 440-series (or newer) driver, the app will error out.
3. As always, make sure the executable is set to allow execution (chmod +x), or you will generate compute errors due to lack of permissions.
4. This is the app ONLY. You need to either add it to the AIO package distributed by TBar and edit the app_info.xml file appropriately (see the app_info.xml sketch after the note below), or add it to your BOINC directory for a repository install (if you've added the special app to your repo install before, you should already know where and how to put this).
5. For this to function as intended, you need to configure BOINC to run 2 WUs at a time on your GPU(s). You do this using an app_config.xml file like so:
<app_config>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

*Note: If you run AP tasks on your GPU and you want them processed in FIFO fashion, you also need to configure 2x WUs for AP tasks (as in the example above), or they will never run, or will only run when they get close to timing out or when you have no MB/v8 tasks to work on.
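For anonymous-platform (AIO) installs, the app_info.xml entry ends up looking roughly like the sketch below. This is only an illustration: the file name, version number, plan class and any command-line options are placeholders that you should copy from the existing v0.98 entry in your own app_info.xml, changing only the executable name to the v0.99 file you downloaded.

<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <!-- placeholder: use the exact file name of the v0.99 executable you downloaded -->
    <name>V099_EXECUTABLE_FILE_NAME</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <!-- placeholders: copy version_num, plan_class and any <cmdline> options from your existing entry -->
    <version_num>800</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <plan_class>cuda102</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>V099_EXECUTABLE_FILE_NAME</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>

Restart the BOINC client after editing app_info.xml so it picks up the new entry.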

The one downside:
Since you are running double the number of WUs, you will use roughly double the memory resources, both system RAM and GPU memory.

With 2 RTX 2080s running this, the system uses ~4GB of system RAM and ~3GB of memory per GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.36       Driver Version: 440.36       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    On   | 00000000:01:00.0  On |                  N/A |
| 65%   70C    P2   193W / 200W |   3254MiB /  7979MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2080    On   | 00000000:03:00.0 Off |                  N/A |
| 50%   64C    P2   198W / 200W |   2832MiB /  7982MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       970      G   /usr/lib/xorg/Xorg                           240MiB |
|    0      1109      G   /usr/bin/gnome-shell                         195MiB |
|    0      1704      C   ./keepP2                                     111MiB |
|    0     19733      C   ...p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102  1351MiB |
|    0     19793      C   ...p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102  1351MiB |
|    1       970      G   /usr/lib/xorg/Xorg                             6MiB |
|    1      1705      C   ./keepP2                                     111MiB |
|    1     19692      C   ...p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102  1351MiB |
|    1     19759      C   ...p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102  1351MiB |
+-----------------------------------------------------------------------------+


So first and foremost, you may need to increase the BOINC memory allocation (the default is 50%; if the machine is a dedicated cruncher, going to 90% is fine), or simply add more memory if you are able to or want to.
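You can raise the limit through the Manager's computing preferences; if you prefer a file, a minimal global_prefs_override.xml sketch is shown below (the tag names assume a current BOINC client, so double-check them against your version before relying on this).

<global_preferences>
  <!-- let BOINC use up to 90% of system RAM whether the machine is in use or idle -->
  <ram_max_used_busy_pct>90</ram_max_used_busy_pct>
  <ram_max_used_idle_pct>90</ram_max_used_idle_pct>
</global_preferences>

Place the file in the BOINC data directory and have the client re-read local preferences, or just set the same values in the Manager.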

You may run into problems if you run this on 2GB GPUs, on 3GB GPUs that are driving a monitor, or with only ~2GB of system RAM per GPU. I would probably say only use this on 4GB+ cards to be safe.

Enjoy!
ID: 2022381
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2022401 - Posted: 8 Dec 2019, 21:26:28 UTC
Last modified: 8 Dec 2019, 21:54:42 UTC

Excellent!

If anyone wants to try it, be aware: the mutex special sauce crunching program works on the SETI project only. AFAIK the other projects do not have any equivalent app.
ID: 2022401
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022415 - Posted: 8 Dec 2019, 22:50:10 UTC - in response to Message 2022401.  

Yeah this is just the SETI Science app. It doesn’t have anything to do with BOINC other than the setting to run 2 WU/GPU.

You may need to play around with your resource allocation if you’re running multiple projects. I only run SETI right now, so I can’t really comment on it.
ID: 2022415
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022437 - Posted: 9 Dec 2019, 1:35:25 UTC - in response to Message 2022415.  

Yeah this is just the SETI Science app. It doesn’t have anything to do with BOINC other than the setting to run 2 WU/GPU.

You may need to play around with your resource allocation if you’re running multiple projects. I only run SETI right now, so I can’t really comment on it.

I found it needed too much fiddling on a host running multiple concurrent projects.
ID: 2022437
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022440 - Posted: 9 Dec 2019, 1:49:14 UTC - in response to Message 2022437.  

What was the issue you had specifically?

I’m guessing that it tended to only run SETI jobs and not allow other projects to run?
ID: 2022440
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022442 - Posted: 9 Dec 2019, 2:05:09 UTC - in response to Message 2022440.  

What was the issue you had specifically?

I’m guessing that it tended to only run SETI jobs and not allow other projects to run?

No, I had set my other projects to 0.5 gpu count usage too. I just didn't see the speedup when using the mutex app mainly because I have SSD and M.2 storage so load access was minimal already.
ID: 2022442
wujj123456

Joined: 5 Sep 04
Posts: 40
Credit: 20,877,975
RAC: 219
China
Message 2022445 - Posted: 9 Dec 2019, 2:44:17 UTC - in response to Message 2022442.  
Last modified: 9 Dec 2019, 2:46:17 UTC

What was the issue you had specifically?

I’m guessing that it tended to only run SETI jobs and not allow other projects to run?

No, I had set my other projects to 0.5 gpu count usage too. I just didn't see the speedup when using the mutex app mainly because I have SSD and M.2 storage so load access was minimal already.

I am curious what you see with "nvidia-smi dmon". For me, it usually shows a 3-4 second pause in SM load when switching to the next WU, and that gap is completely gone with the mutex app. I still have the exact same setup for my rig, with one 1080 and one 1660 Super. Previously I set 1 CPU + 1 GPU per WU and now it's 0.5 CPU + 0.5 GPU per WU. The CPU utilization is the same from what I observe. I have a cron job pulling the WU count every 10 minutes; it used to churn out 15-16 WUs every 10 minutes, but now it's 16-18. Fairly sure it's statistically significant given how consistent the rate is.

I don't see an obvious difference in that pause period between my two cards, which are on 16x and 4x PCIe 3.0 respectively, so I assume it is mostly disk loads. My Linux install is currently on a temporary USB stick (100MB/s read) since my only M.2 slot is occupied by my Windows installation, but I am upgrading soon. I was actually thinking about just going with a cheap SATA SSD, but if an M.2 SSD is fast enough to make the pause go away entirely, I will definitely make 2+ M.2 slots a requirement when looking for a new motherboard. The price difference between a decent SATA and M.2 SSD isn't that big at this point.
ID: 2022445
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2022447 - Posted: 9 Dec 2019, 2:49:50 UTC - in response to Message 2022442.  
Last modified: 9 Dec 2019, 2:57:16 UTC

What was the issue you had specifically?

I’m guessing that it tended to only run SETI jobs and not allow other projects to run?

No, I had set my other projects to 0.5 gpu count usage too. I just didn't see the speedup when using the mutex app mainly because I have SSD and M.2 storage so load access was minimal already.

Since you use fast storage devices and CPUs, the gain is minimal.
The slower your host (CPU, memory, PCIe speed, HDD, SSD, M.2, WU cache size, etc.; everything counts), the bigger your gain.
That is one of the wonders of the mutex. LOL
And since the other projects don't have mutex support, there is no gain on a multi-project host.
That is why I always say: use the mutex mainly when you have a SETI-only host with a lot of memory to spare.
ID: 2022447
Joseph Stateson
Volunteer tester
Joined: 27 May 99
Posts: 309
Credit: 70,759,933
RAC: 3
United States
Message 2022485 - Posted: 9 Dec 2019, 15:27:50 UTC

Ran the incredible new program overnight. Got 3000+ work units processed on each of two systems, running two per GPU,
for a total of 16,000 samples. Both systems had adequate RAM and similar CPUs, and neither was using "-nobs".

GTX 1070/80 with SSD: got a 3-second improvement over the single WU per GPU:
1.17 => 1.14

GTX 1060 only, with HD: got a 12-second improvement ==> 500 extra work units a day.
2.17 => 2.05
ID: 2022485
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2022487 - Posted: 9 Dec 2019, 15:44:12 UTC - in response to Message 2022485.  

I have SSDs on all my crunchers. When I tested the 0.99 app I saw no improvement over 0.98. It seems that if you have modern, fast storage there isn't any advantage in load times.
I'm working on 0.98, which may at some point be accepted at SETI Beta. I doubt very seriously that 0.99 will ever be considered for the SETI servers, which means I won't waste my, or anyone else's, time on it.
ID: 2022487
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022495 - Posted: 9 Dec 2019, 17:16:45 UTC - in response to Message 2022487.  

All of my crunchers use SATA 3 SSDs also. I still saw 3-5s improvements. It’s more than just the SSD. Slower CPUs will benefit also.
ID: 2022495
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022497 - Posted: 9 Dec 2019, 17:18:59 UTC - in response to Message 2022485.  

Ran the incredible new program overnight. Got 3000+ work units processed on each of two systems, running two per GPU,
for a total of 16,000 samples. Both systems had adequate RAM and similar CPUs, and neither was using "-nobs".

GTX 1070/80 with SSD: got a 3-second improvement over the single WU per GPU:
1.17 => 1.14

GTX 1060 only, with HD: got a 12-second improvement ==> 500 extra work units a day.
2.17 => 2.05


Thanks for the results. Glad it’s working well for you :)
ID: 2022497
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022506 - Posted: 9 Dec 2019, 18:00:01 UTC

Another benefit of this mutex build is that running with or without the mutex is user-configurable.

If you want to run the mutex, you set the app_config to run 2 WUs per GPU as above. If you do not want to run the mutex, you just set it to 1 WU per GPU (see the snippet below) and it's functionally identical to the v0.98 app. You have the choice.
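For completeness, a minimal sketch of the 1-WU-per-GPU (no-mutex) setting, in the same app_config.xml format as above:

<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <!-- 1.0 = one task per GPU, so the mutex never comes into play -->
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>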

As has been postulated in the past, future development on the Special app is likely to continue from v0.99, since petri seems happy with it.
ID: 2022506
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2022515 - Posted: 9 Dec 2019, 18:41:06 UTC
Last modified: 9 Dec 2019, 18:42:32 UTC

It's a reality: the mutex builds are here and they work. So why not use them?

Something to think about:

If you run a SETI-only cruncher...

- Is the mutex better? Yes. It gives you a few seconds of extra crunching time per WU, and it can work either as a normal crunching program (1 WU at a time) or with the mutex option (2 or more WUs at a time). The bigger your WU cache or the slower your host's file access time, the more likely the mutex is for you.

- Is the mutex for everyone? No. It uses 2x the memory resources when you run 2 WUs at a time, so only those with a lot of memory to spare should try it.

If you run a multi-project host...

- Stay away from the mutex. For now there is no support for the mutex in the other projects, so weird things could happen.

And BTW, the mutex builds are Linux only, unless someone decides to port them to Windows.

my 0.02
ID: 2022515
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022517 - Posted: 9 Dec 2019, 18:51:11 UTC - in response to Message 2022515.  
Last modified: 9 Dec 2019, 18:57:42 UTC


If you run a multi-project host...

- Stay away from the mutex. For now there is no support for the mutex in the other projects, so weird things could happen.


Keith posted that it does work with other projects if you set them to also run 2 WUs at a time (a sketch for a second project is below). When another project's WU lines up behind a SETI task, it won't wait like a SETI task would; it will run both at the same time. The same thing happens with AP tasks. If your resource share at other projects is fairly low, this is probably tolerable. You can also play around with the exact resource allocation for each app type.
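The other project gets its own app_config.xml in its own project directory. The app name below is a placeholder, so take the real <name> value from that project's entries in client_state.xml:

<app_config>
  <app>
    <!-- placeholder: use the actual app name from that project's client_state.xml entries -->
    <name>OTHER_PROJECT_GPU_APP</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>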

But if you are devoting a large portion of your computer's resources to other projects, I would stick to running 1 WU per GPU in your settings; the mutex code won't ever come into play at all, and it doesn't hurt anything to do that.

Either way, this app is certainly aimed more at those who are only running SETI.
ID: 2022517
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022519 - Posted: 9 Dec 2019, 19:02:12 UTC

Yes, it does work for other projects occupying the same card as a SETI task. But I noticed that once the shared SETI task finishes, a mate from the other running project joins it, so you end up with the normal two tasks from a single project running for as long as they need to rebalance the REC debt to SETI, or until deadlines are approaching. Once that has happened, a SETI task resumes until the other project's task finishes, and then the expected mutex-lock SETI task takes residence. I only tested with MilkyWay and Einstein. MW tasks run around 2 minutes or so and Einstein tasks run for around 10 minutes, so that is about the longest you might see two different projects' tasks running alongside the mutex-lock app. All the various project tasks ran correctly and validated. I didn't see anything different with the card memory or temps. I never run GPUGrid tasks two-up, so that was never tested; those tasks utilize every bit of the GPU and run for 4-5 hours.
ID: 2022519
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2032756 - Posted: 17 Feb 2020, 3:16:48 UTC

I trimmed some fat from my previous compile.

Get the collection of apps here: https://drive.google.com/open?id=1FK5qg-KaSn4kDeNbi90n8UUWQ-csCMZt

There are 3 versions:

_MPT = support for all Maxwell/Pascal/Turing cards
_MP = support for Maxwell/Pascal only, no Turing support (should run a little faster than MPT in some cases)
_PT = support for Pascal/Turing only, no Maxwell support (should run a little faster than MPT in some cases)

I figured most people would be running these combinations of card generations, or all the same card. I ran several hours of tests on a GTX 1650, and the PT app was a hair faster overall than the MPT app, and even the MPT app was faster than the previous version. So unless you're running all three card generations in the same system, or constantly moving cards around, you're probably better off using one of the more targeted apps.

Let me know if you have any issues. The permissions should be set this time, but as always, make sure you double-check.
ID: 2032756
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2032785 - Posted: 17 Feb 2020, 11:19:15 UTC - in response to Message 2032756.  
Last modified: 17 Feb 2020, 11:26:50 UTC

Looks like there was a problem with the MP file. I guess the build didn't like the way I commented out some code. I regenerated that one; it should work now.

here's a new link to the package of builds: https://drive.google.com/open?id=1ZXl8naZRdfTfozWUzZWAnS21keu5CYCH

I've also made it so that you don't have to download all 3; you can pick the one that applies to your system(s).
ID: 2032785
