Message boards :
Number crunching :
SETI applications for NVIDIA GPU improvement - how you can help
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Good to hear that, thanks. Please try something like -tt 30 and -tt 60: do lags appear, and at what value? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: Raistmer, what is the difference between -tt 1500 and -high_prec_timer?
-high_prec_timer sets the system's multimedia timer resolution to 1 ms. This allows a real 1 ms sleep when Sleep(1) is specified (instead of something like 15 ms on some hosts). So -use_sleep will work more precisely and cause less performance degradation while keeping CPU consumption low. But beware: such a system-wide change could have side effects. Testing is needed. The -tt F option, on the other hand, just sets the desired length of a partial PulseFind kernel in ms. So it is quite strange that you see a big difference with and without -tt 1500 on a VHAR task. I think some statistics collection will be required here. |
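A rough way to see why -high_prec_timer matters for -use_sleep: under the simplifying assumption that a sleeping thread is only woken on a timer tick, any requested delay is rounded up to a whole number of ticks. The 15.625 ms tick below is an assumed typical Windows default (the post only says roughly 15 ms), and the function name is hypothetical; this is a sketch of the effect, not how Windows or the app actually implements sleeping.

```python
import math


def effective_sleep_ms(requested_ms: float, timer_resolution_ms: float) -> float:
    """Simplified model of a tick-based Sleep().

    Assumption: the thread is only woken on a timer tick, so the requested
    delay is rounded up to a whole number of ticks. Real schedulers add
    jitter on top of this.
    """
    if requested_ms <= 0:
        return 0.0
    ticks = math.ceil(requested_ms / timer_resolution_ms)
    return ticks * timer_resolution_ms


DEFAULT_TICK_MS = 15.625  # assumed typical Windows default (64 Hz)
HIGH_PREC_TICK_MS = 1.0   # resolution requested by -high_prec_timer

if __name__ == "__main__":
    for tick in (DEFAULT_TICK_MS, HIGH_PREC_TICK_MS):
        print(f"Sleep(1) with {tick} ms ticks -> ~{effective_sleep_ms(1, tick)} ms")
```

With the coarse tick, each Sleep(1) keeps the host thread asleep roughly 15 times longer than requested, which is the extra delay -high_prec_timer is meant to remove.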
tazzduke Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings Raistmer, apologies, but my crunchers are dual-GPU setups: an AMD card drives the video display and the NVIDIA GPUs sit in the second slot of every crunching PC. I have 3 AMD cards, but only 2 are crunching, and they are running default r3500 with no command-line options. I should have mentioned this in my previous email. Regards |
AMDave Joined: 9 Mar 01 Posts: 234 Credit: 11,671,730 RAC: 0 |
Quote: New set of builds (r3500) available here: https://cloud.mail.ru/public/LJ8s/c3WyRR8ip
I know you would like varied ARs; however, all WUs originate from Beta and are all Guppi VLARs. Let me know if my testing needs improving.
> w/o -use_sleep
Task 24493997 Run time : 9s CPU time : 7s WU true angle range : 0.008175
Task 24494050 Run time : 18m 24s CPU time : 18m 19s
> -tt 1000
Task 24494101 Run time : 17m 32s CPU time : 17m 25s WU true angle range : 0.008175
Task 24494145 Run time : 8m 45s CPU time : 8m 43s WU true angle range : 0.008175
Task 24494245 Run time : 9s CPU time : 7s
> -use_sleep -tt 1000
Task 24495067 Run time : 20m 40s CPU time : 3m 16s WU true angle range : 0.008168
Task 24495176 Run time : 9s CPU time : 7s WU true angle range : 0.008175
Task 24494294 Run time : 6m CPU time : 1m 51s
> -use_sleep with -high_prec_timer
Task 24505601 Run time : 18m 45s CPU time : 6m 7s WU true angle range : 0.008175
Task 24505832 Run time : 18m 46s CPU time : 6m 15s |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: Let me know if my testing needs improving.
Thanks for the report. A lot of the listed tasks are overflows. They are not representative, so they can be excluded from future reports completely.
These 2 ran in identical environments, right? They still show a very big divergence in timings. I would like to see some statistics for each of the options then (let's say 5 non-overflowed results per set of settings). Also, did you run a single task per GPU in this test, or more? P.S. From this app's point of view your device can be considered a high-end one, so increasing the -sbs N value (default is 128), switching to the high-perf path and other options for high-end GPUs could perhaps improve performance without GUI lags. The longest partial PulseFind was under 100 ms (that is, there is no difference between -tt 100 and -tt 1000 for this device). P.P.S. Did you experience GUI lags with -tt 1000 and without -use_sleep? |
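The statistics requested above (about 5 non-overflowed results per set of settings) can be summarised with a small script. This is a minimal sketch: it assumes run times are copied by hand from the task pages in the "18m 02s" style used in this thread and that the overflow status of each task is already known; the function names are hypothetical.

```python
import re
import statistics

# Matches durations such as "18m 02s", "6m" or "9s" (hours are not handled).
_DURATION = re.compile(r"\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*")


def parse_duration(text: str) -> int:
    """Convert a duration like '18m 02s' to seconds."""
    match = _DURATION.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2) or 0)
    return minutes * 60 + seconds


def summarize(run_times: list[str], overflowed: list[bool]) -> tuple[float, float]:
    """Mean and sample standard deviation (seconds) of non-overflowed run times."""
    kept = [parse_duration(t) for t, ovf in zip(run_times, overflowed) if not ovf]
    if len(kept) < 2:
        raise ValueError("need at least two non-overflowed results")
    return statistics.mean(kept), statistics.stdev(kept)


if __name__ == "__main__":
    # Example input: five normal runs plus one overflowed task that is excluded.
    times = ["18m 02s", "17m 59s", "18m 03s", "18m 10s", "18m 03s", "9s"]
    flags = [False, False, False, False, False, True]
    mean_s, sd_s = summarize(times, flags)
    print(f"mean {mean_s:.1f} s, stdev {sd_s:.1f} s over {flags.count(False)} tasks")
```

If the spread within a group is large compared with the gap between two groups' means, the difference may be noise rather than an effect of the option being tested.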
AMDave Joined: 9 Mar 01 Posts: 234 Credit: 11,671,730 RAC: 0 |
Quote: Lot of listed tasks are overflows. They are not representative ones so can be excluded from future reports completely.
Single, sorry, should have mentioned that.
Quote: P.S. From this app's point of view your device can be considered as high-end one so increasing -sbs N value (default is 128), switching to high-perf path and other options for high-end GPU can improve performance w/o GUI lags perhaps.
Sporadic, only minor. I'll add -high_perf to all subsequent testing. |
AMDave Joined: 9 Mar 01 Posts: 234 Credit: 11,671,730 RAC: 0 |
r3500: Here are the latest, but I have a bad feeling that they are all overflows. What in the stderr output indicates an overflow? Are overflows preventable on my end? The WUs are running upwards of 4 minutes longer than with the stock r3486. Additionally, my GPU is not OC'd; it's strictly stock with default settings.
> -high_perf -use_sleep
Task 24508400 Run time : 20m 26s CPU time : 4m 8s WU true angle range : 0.008175
Task 24508677 Run time : 20m 7s CPU time : 5m 22s WU true angle range : 0.008175
Task 24508496 Run time : 18m 59s CPU time : 4m 48s WU true angle range : 0.008175
Task 24508572 Run time : 20m 42s CPU time : 5m 9s WU true angle range : 0.008175
Task 24508494 Run time : 20m 42s CPU time : 3m 45s |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
An overflow is signalled by the following lines in stderr:
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.
In a properly working app, overflow is completely data-driven and can't be prevented (it's just noisy data in the task). Please provide a similar set of results for -high_perf without -use_sleep. |
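A minimal sketch of applying that rule when collecting results: a task is treated as an overflow if its stderr contains the result_overflow message quoted above. How the stderr text is obtained (copied from the task page, saved to a file) is left open, and the helper names and sample texts are hypothetical.

```python
OVERFLOW_MARKER = "result_overflow"


def is_overflow(stderr_text: str) -> bool:
    """True if the stderr contains the SETI@home result_overflow message."""
    return OVERFLOW_MARKER in stderr_text


def representative_tasks(stderr_by_task: dict[int, str]) -> list[int]:
    """Task IDs whose results did not overflow, i.e. the ones worth reporting."""
    return sorted(task for task, text in stderr_by_task.items() if not is_overflow(text))


if __name__ == "__main__":
    # Hypothetical task IDs and placeholder stderr texts.
    sample = {
        1: "SETI@Home Informational message -9 result_overflow\n"
           "NOTE: The number of results detected equals the storage space allocated.",
        2: "(placeholder for a normal, non-overflowed stderr)",
    }
    print(representative_tasks(sample))
```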
AMDave Joined: 9 Mar 01 Posts: 234 Credit: 11,671,730 RAC: 0 |
Quote: The sign of overflow is the next lines in stderr:
Thanks for the clarification. The WUs below were the fastest yet.
> -high_perf
Task 24508683 Run time : 18m 02s CPU time : 18m WU true angle range : 0.008175
Task 24508590 Run time : 17m 59s CPU time : 17m 56s WU true angle range : 0.008175
Task 24508506 Run time : 18m 03s CPU time : 18m 01s WU true angle range : 0.008175
Task 24508481 Run time : 18m 10s CPU time : 18m 05s WU true angle range : 0.008168
Task 24508042 Run time : 18m 03s CPU time : 18m |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Thanks. It seems that for your GPU/CPU combo enabling sleep costs only about 2/18 of GPU performance (roughly 20 vs. 18 minutes per task). I wonder how much of a GUPPI task (less than one, of course, but what percentage of a task) your CPU could complete in the 14 minutes of CPU time freed this way? |
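The 2/18 estimate can be checked against the two five-task sets AMDave posted above (-high_perf -use_sleep vs. -high_perf alone). A small sketch of the arithmetic, using only the reported run and CPU times: it computes the relative run-time penalty of sleeping and the CPU time freed per task.

```python
from statistics import mean

# (minutes, seconds) as reported in this thread for r3500
SLEEP_RUN = [(20, 26), (20, 7), (18, 59), (20, 42), (20, 42)]  # -high_perf -use_sleep
SLEEP_CPU = [(4, 8), (5, 22), (4, 48), (5, 9), (3, 45)]
BUSY_RUN = [(18, 2), (17, 59), (18, 3), (18, 10), (18, 3)]     # -high_perf
BUSY_CPU = [(18, 0), (17, 56), (18, 1), (18, 5), (18, 0)]


def mean_seconds(pairs):
    """Average of (minutes, seconds) pairs, in seconds."""
    return mean(m * 60 + s for m, s in pairs)


# Relative slowdown caused by -use_sleep, and CPU time it gives back per task.
slowdown = (mean_seconds(SLEEP_RUN) - mean_seconds(BUSY_RUN)) / mean_seconds(BUSY_RUN)
cpu_freed_s = mean_seconds(BUSY_CPU) - mean_seconds(SLEEP_CPU)

print(f"run-time penalty of -use_sleep: {slowdown:.1%} (2/18 = {2 / 18:.1%})")
print(f"CPU time freed per task: {cpu_freed_s / 60:.1f} min")
```

On these numbers the penalty comes out at about 11.8% (close to 2/18, about 11.1%), and roughly 13.4 minutes of CPU time are freed per task, which is where the ~14 minutes above comes from.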
AMDave Joined: 9 Mar 01 Posts: 234 Credit: 11,671,730 RAC: 0 |
r3500: These were the next fastest WUs. They were 30 s to 60 s longer than -high_perf.
> -high_perf -sbs 384 -high_prec_timer
Task 24508711 Run time : 18m 10s CPU time : 18m 03s WU true angle range : 0.008175
Task 24508487 Run time : 18m 03s CPU time : 17m 58s WU true angle range : 0.008175
Task 24508354 Run time : 18m 09s CPU time : 18m 05s WU true angle range : 0.008175
Task 24508413 Run time : 18m 11s CPU time : 18m 05s WU true angle range : 0.008604
Task 24513083 Run time : 18m 04s CPU time : 18m
------------------------------------------
Task 24508474 Run time : 20m 29s CPU time : 3m 55s WU true angle range : 0.008175
Task 24508392 Run time : 20m 02s CPU time : 4m 32s WU true angle range : 0.008604
Task 24513116 Run time : 21m 12s CPU time : 3m 35s WU true angle range : 0.008604
Task 24513081 Run time : 19m 30s CPU time : 4m 36s WU true angle range : 0.008168
Task 24513798 Run time : 20m 11s CPU time : 4m 10s |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: Let me know if you want any other combination of parameters in the cmdline. I'm going to test some that are listed in 'ReadMe_MultiBeam_OpenCL.txt' under the 'NV specific info' section. Please check your PM inbox.
Please remind me of your GPU model. And please try a completely "stock" config with only a single option added: -tt F
Values to try for F: 30, 60, 90, 120, 180
Watch the crunching experience with BLC* tasks: at what value do GUI lags begin, and how strong are they?
P.S. I would like to receive similar feedback from others too, especially from owners of low-end and midrange GPU models. High-end GPUs can't be saturated on PulseFind, so they give distorted results for this test.
P.P.S. It would also be good to repeat exactly the same test with one more option added: -sbs 256 |
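For anyone following this request, the sweep amounts to ten runs: five -tt values on an otherwise stock config, then the same five with -sbs 256 added. A trivial sketch that lists them in order; the helper name is hypothetical, and how the options are actually passed to the app depends on the installation and is not covered here.

```python
TT_VALUES = (30, 60, 90, 120, 180)  # values requested for -tt F


def test_command_lines(extra_options=("", "-sbs 256")):
    """Command lines for the -tt sweep: stock config plus one -tt value,
    then the same sweep repeated with each extra option appended."""
    lines = []
    for extra in extra_options:
        for tt in TT_VALUES:
            lines.append(f"-tt {tt} {extra}".strip())
    return lines


for cmdline in test_command_lines():
    print(cmdline)
```

Keeping one option change per run is the point of the design: any GUI lag that appears can then be attributed to a specific -tt value.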
rob smith Joined: 7 Mar 03 Posts: 22457 Credit: 416,307,556 RAC: 380 |
A question: one of my crunchers has two different GPU types, a GTX1080 and two GTX970s. (BOINC reports this cruncher as having three GTX1080s; if only, if only...) When it comes to setting up the "command line", should I use the settings for the GTX970 or the GTX1080, or something in between? Or maybe I should try running two copies of BOINC, one for each GPU type? I'm not near my computers just now and won't be able to access them for a few hours. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: A question, one of my crunchers has two different GPUs - a GTX1080 and two GTX970s.
I'm afraid both the GTX970 and the GTX1080 are too fast to provide reliable results for usability testing with the -tt F option. Try -tt 1500: do you see any lags on BLC VLARs? Also note that unless a special config file is used for the MB app, an option is applied to all installed GPUs of the same vendor. |
Mike Joined: 17 Feb 01 Posts: 34354 Credit: 79,922,639 RAC: 80 |
Quote: A question, one of my crunchers has two different GPUs - a GTX1080 and two GTX970s.
I can provide a device-specific command line file if you are interested. PM me for details. With each crime and every kindness we birth our future. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hi Raistmer,
. . I have added the command line for r3500 as follows: -use_sleep -high_prec_timer -sbs 512 -tt 1500
. . I think these are the settings you recommended for GTX970 cards.
. . Runtimes are now about 8-9 mins for non-VLARs and 14-16 mins for Guppies, but these are BLC5 Guppies, the slowest and meanest of all the 2-bit Guppies. The good news is that I am experiencing NO perceptible mouse or keyboard lag. Now my typos are all just due to my thick, clumsy fingers, not lag.
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
That means the GTX970 indeed can't be saturated on partial PulseFind; otherwise you would experience quite noticeable lags. -tt 1500 means 1.5 s (!) kernels. Evidently, a single PulseFind just can't provide enough work for a GTX970 to allow such a long-running kernel. That's why I am asking owners of low-end and midrange GPUs to provide data about the -tt F option and GUI usability. |
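One way to picture why a fast GPU never reaches the -tt limit: suppose (an assumption for illustration, not a description of the app's actual code) that a PulseFind needing T ms of GPU time is cut into ceil(T / tt) equal partial kernels. If T is already shorter than tt, the single kernel just runs for T, so raising -tt changes nothing. That is consistent with the earlier observation that the longest partial PulseFind on AMDave's GPU was under 100 ms, making -tt 100 and -tt 1000 equivalent there. The function name and durations below are made up.

```python
import math


def split_pulsefind(total_ms: float, tt_ms: float) -> tuple[int, float]:
    """Hypothetical split of one PulseFind into partial kernels.

    Returns (number of partial kernels, length of each in ms), assuming the
    work is divided into equal pieces no longer than tt_ms.
    """
    if total_ms <= 0 or tt_ms <= 0:
        raise ValueError("durations must be positive")
    pieces = math.ceil(total_ms / tt_ms)
    return pieces, total_ms / pieces


# Made-up example: a fast GPU where the whole PulseFind takes well under 100 ms.
print(split_pulsefind(80, 100), split_pulsefind(80, 1500))
# Made-up example: a slow GPU where the same search needs several seconds.
print(split_pulsefind(4600, 1500))
```

Under this model only a GPU slow enough that T exceeds tt actually runs long kernels, which is why low-end and midrange cards are needed for the lag test.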
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . OK, after the WOW event is over I will try it out on the GT730 rig. I like to think of it as the higher end of the low range, with CC 3.5, but it is definitely low range with only 2 CUs. So in 2 weeks I will let you know what effects I observe on that rig.
. . As for the 970s, there have been 4 lockups since installing r3500 this morning, but I am still convinced that is more a system management problem between Win10 and the MoBo. But usage is less than optimal running one WU. Should I try the tuning command line Richard listed in the Lunatics 0.45 Beta thread? He missed out the 970, but I am guessing the 980 string might work.
. . There is a definite utilisation pattern observable. For the first half of the WU, usage is high, about 80-90%, but then drops to about 40-50% on non-VLARs. It remains more constant on Guppies, but there is a noticeable drop of about 10-15% after the halfway mark.
. . I am not sure whether to try multiple instances or not at this point. P.S. If 1500 is over the top, should I reduce -tt to a lower value?
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: But usage is less than optimum running one WU. Should I try the tuning command line Richard listed in the Lunatics 0.45 Beta thread? He missed out the 970 but I am guessing the 980 string might work.
One should clearly understand that the requests I make in this thread are in no way connected with optimizing a particular host's setup. Such efforts should go elsewhere. My goal is to improve the app for all.
This info can be translated into search types and their relative performance: task processing proceeds from zero up to the highest possible chirp (chirp accounts for Doppler shift). As the chirp value increases, the algorithm switches to rougher and rougher searches, keeping only some of the FFT sizes and turning off Gaussian search and even PulseFind. The remaining searches are less compute-intensive, so GPU usage drops. Hence it's generally recommended to run at least 2 tasks per GPU (provided the context-switch penalty is low enough; for example, never run 2 tasks on a pre-Fermi GPU).
Quote: P.S. If 1500 is over the top should I reduce -tt to a lower value?
The Windows TDR delay is 2 seconds, so 1.5 s should run without driver restarts. Actually, I used -tt 1500 on a GT720 while doing some other profiling, when I wanted to reduce the number of partial PulseFind kernels to a minimum. Same on a C-60. With great GUI lags, of course, but without driver restarts. What is the "lockup" you mention? A driver restart, or a complete GUI freeze requiring a cold reset? If the latter, then it's probably a driver issue. |
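The TDR reasoning above reduces to a simple comparison: a partial kernel sized by -tt has to finish before the Windows GPU watchdog fires, whose delay is 2 s per the post. A sketch that checks a -tt value against that limit; the helper names and the 20% safety margin are illustrative assumptions. Some margin is sensible because -tt sets a desired, not a guaranteed, kernel length.

```python
WINDOWS_TDR_DELAY_MS = 2000  # TDR delay of 2 s, as stated in the post


def tdr_headroom_ms(tt_ms: float, tdr_ms: float = WINDOWS_TDR_DELAY_MS) -> float:
    """Time left before the watchdog would fire for a kernel of tt_ms."""
    return tdr_ms - tt_ms


def is_tt_safe(tt_ms: float, margin: float = 0.2, tdr_ms: float = WINDOWS_TDR_DELAY_MS) -> bool:
    """True if tt_ms leaves at least `margin` (a fraction of the TDR delay) spare.

    The 20% default margin is an arbitrary illustrative choice.
    """
    return tdr_headroom_ms(tt_ms, tdr_ms) >= margin * tdr_ms


for tt in (1500, 1800, 2500):
    print(tt, is_tt_safe(tt), tdr_headroom_ms(tt))
```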
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . It is the latter case. I am pretty sure it is a Win10 driver issue, as it did not happen at all when running Win7. I am more and more convinced I need to return to Win7 to solve it completely. But for now, minimising the effect is my short-term objective.
. . I will take the tuning issue to another thread. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.