Message boards :
Number crunching :
High performance Linux clients at SETI
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
It appears to be running faster with -nobs -pfb 6, but this is the first time I have tried it on either machine. My larger box is running a mix of GTX 1070 Ti's and GTX 1060 3GB's. Do you have an idea of which -pfb value might be better? Or is it best to leave the default, which (I hope) changes based on the video card? Tom A proud member of the OFA (Old Farts Association). |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
i guess i was most surprised by the difference between the 1080 and the 1080ti. The 1080 acts more like the lesser cards with a 10-15% increase, but the 1080ti gets a larger percentage increase, 20+%, even though these cards are pretty similar in design: same architecture, same memory type. I would have expected the same percentage increase. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
if you don't know what you're doing, it's best to just leave default values. Petri did mention in a previous post that you can try to use a larger unroll value. default is 1, try adding -unroll 2 to the command line. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
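For reference, one common way to pass flags like -unroll to the app is the <cmdline> element in BOINC's configuration files. Below is a hypothetical sketch of an app_config.xml; the app_name and plan_class values are assumptions and must match the app version shown in your own app_info.xml or client log.

```xml
<!-- Hypothetical sketch: app_config.xml in the SETI@home project directory.
     app_name and plan_class here are assumptions; check your app_info.xml
     or the BOINC event log for the values your installation actually uses. -->
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda101</plan_class>
    <cmdline>-nobs -unroll 2</cmdline>
  </app_version>
</app_config>
```

Restart the BOINC client, or use "Read config files", for the change to take effect.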
W3Perl Send message Joined: 29 Apr 99 Posts: 251 Credit: 3,696,783,867 RAC: 12,606 |
1060 => 10, 1070 Ti => 19, so have a try with 16! You can use the benchmark test to check which value is the best for you. On my GTX 1070, I don't see any difference between 16 and 32... so I keep 16 (1920/128 = 15, so this is what I expected). Using -nobs will only give you a few extra seconds. |
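W3Perl's rule of thumb can be sketched as follows: pick a -pfb of roughly the SM count, estimated as CUDA cores divided by 128 (consumer Pascal GPUs have 128 FP32 cores per SM). The core counts below are the published specifications for these cards; treat the whole thing as a starting point for benchmarking, not a guarantee.

```python
# Sketch of the rule of thumb: -pfb ~ number of streaming multiprocessors,
# estimated as CUDA cores / 128 (128 FP32 cores per SM on consumer Pascal).
# Core counts are the published specs for these cards.
CUDA_CORES = {
    "GTX 1060 3GB": 1152,
    "GTX 1060 6GB": 1280,
    "GTX 1070": 1920,
    "GTX 1070 Ti": 2432,
    "GTX 1080": 2560,
}

def suggested_pfb(card: str) -> int:
    """Estimated SM count, used as a starting value for -pfb."""
    return CUDA_CORES[card] // 128

for card, cores in CUDA_CORES.items():
    print(f"{card}: {cores} cores -> try -pfb {suggested_pfb(card)}")
```

This reproduces the numbers quoted in the thread: 10 for the 1060 6GB, 15 for the 1070, 19 for the 1070 Ti, and 9 for the 1060 3GB.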
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I decided to test offline the -pfb argument for values of 16 and 32. Observed absolutely no difference running with or without the argument. The differences in run times were hundredths of a second. Likely measurement error.

All jobs ran setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 on GPU device 0 and finished in state COMPLETE:

Job# | app_args | start | finish | tot_time | wu_name
---|---|---|---|---|---
0 | -device 0 -nobs | 20:15:31 | 20:16:10 | 0:00:39.035 | 21no18aa.19740.24238.14.41.27.wu
1 | -device 0 -nobs -pfb 32 | 20:16:10 | 20:16:49 | 0:00:39.037 | 21no18aa.19740.24238.14.41.27.wu
2 | -device 0 -nobs | 20:16:49 | 20:17:52 | 0:01:03.052 | 10jn08ab.12748.230459.14.41.51.vlar.wu
3 | -device 0 -nobs -pfb 32 | 20:17:52 | 20:18:55 | 0:01:03.060 | 10jn08ab.12748.230459.14.41.51.vlar.wu
4 | -device 0 -nobs | 20:18:55 | 20:19:58 | 0:01:03.045 | 12fe07ac.6929.16025.15.42.86.vlar.wu
5 | -device 0 -nobs -pfb 32 | 20:19:58 | 20:21:01 | 0:01:03.047 | 12fe07ac.6929.16025.15.42.86.vlar.wu
6 | -device 0 -nobs | 20:21:01 | 20:21:55 | 0:00:54.042 | blc04_2bit_blc04_guppi_57898_17662_DIA
7 | -device 0 -nobs -pfb 32 | 20:21:55 | 20:22:49 | 0:00:54.041 | blc04_2bit_blc04_guppi_57898_17662_DIA
8 | -device 0 -nobs | 20:22:49 | 20:23:49 | 0:01:00.048 | 08ja07aa.588.24203.13.40.58.vlar.wu
9 | -device 0 -nobs -pfb 32 | 20:23:49 | 20:24:49 | 0:01:00.061 | 08ja07aa.588.24203.13.40.58.vlar.wu
10 | -device 0 -nobs | 20:24:49 | 20:25:28 | 0:00:39.037 | blc04_2bit_guppi_57976_08930_HIP74235_
11 | -device 0 -nobs -pfb 32 | 20:25:28 | 20:26:07 | 0:00:39.040 | blc04_2bit_guppi_57976_08930_HIP74235_
12 | -device 0 -nobs | 20:26:07 | 20:26:46 | 0:00:39.031 | blc04_2bit_guppi_57976_09361_HIP74981_
13 | -device 0 -nobs -pfb 32 | 20:26:46 | 20:27:25 | 0:00:39.039 | blc04_2bit_guppi_57976_09361_HIP74981_
14 | -device 0 -nobs | 20:27:25 | 20:28:28 | 0:01:03.055 | 07mr07ai.12583.49160.3.30.231.vlar.wu
15 | -device 0 -nobs -pfb 32 | 20:28:28 | 20:29:29 | 0:01:00.053 | 07mr07ai.12583.49160.3.30.231.vlar.wu
16 | -device 0 -nobs | 20:29:29 | 20:30:08 | 0:00:39.038 | blc13_2bit_guppi_58405_85972_GJ687_002
17 | -device 0 -nobs -pfb 32 | 20:30:08 | 20:30:47 | 0:00:39.029 | blc13_2bit_guppi_58405_85972_GJ687_002
18 | -device 0 -nobs | 20:30:47 | 20:31:23 | 0:00:36.029 | blc04_2bit_guppi_57976_09694_HIP74284_
19 | -device 0 -nobs -pfb 32 | 20:31:23 | 20:31:59 | 0:00:36.027 | blc04_2bit_guppi_57976_09694_HIP74284_
20 | -device 0 -nobs | 20:31:59 | 20:32:35 | 0:00:36.033 | blc04_2bit_guppi_57976_10365_HIP74315_
21 | -device 0 -nobs -pfb 32 | 20:32:35 | 20:33:11 | 0:00:36.033 | blc04_2bit_guppi_57976_10365_HIP74315_
22 | -device 0 -nobs | 20:33:11 | 20:33:50 | 0:00:39.030 | blc13_2bit_guppi_58406_23240_HIP20842_
23 | -device 0 -nobs -pfb 32 | 20:33:50 | 20:34:29 | 0:00:39.041 | blc13_2bit_guppi_58406_23240_HIP20842_
24 | -device 0 -nobs | 20:34:29 | 20:35:17 | 0:00:48.044 | 13ap08ab.27985.20931.12.39.96.wu
25 | -device 0 -nobs -pfb 32 | 20:35:17 | 20:36:05 | 0:00:48.036 | 13ap08ab.27985.20931.12.39.96.wu
26 | -device 0 -nobs | 20:36:05 | 20:36:53 | 0:00:48.041 | 16dc18ab.471.25016.10.37.208.wu
27 | -device 0 -nobs -pfb 32 | 20:36:53 | 20:37:41 | 0:00:48.039 | 16dc18ab.471.25016.10.37.208.wu

I think I will experiment with -unroll values of 2 next. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
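The paired A/B runs in Keith's table can be summarized with a short script. The numbers below are the tot_time values transcribed from the table, converted to seconds, as one (baseline -nobs, -nobs -pfb 32) tuple per workunit:

```python
# Paired run times in seconds, transcribed from the table above:
# (time with "-device 0 -nobs", time with "-device 0 -nobs -pfb 32")
# for the same workunit, jobs 0/1 through 26/27.
pairs = [
    (39.035, 39.037), (63.052, 63.060), (63.045, 63.047), (54.042, 54.041),
    (60.048, 60.061), (39.037, 39.040), (39.031, 39.039), (63.055, 60.053),
    (39.038, 39.029), (36.029, 36.027), (36.033, 36.033), (39.030, 39.041),
    (48.044, 48.036), (48.041, 48.039),
]

def pct_delta(base: float, test: float) -> float:
    """Percent change of the -pfb 32 run relative to the baseline run."""
    return 100.0 * (test - base) / base

deltas = [pct_delta(b, t) for b, t in pairs]
mean_delta = sum(deltas) / len(deltas)
print(f"mean delta: {mean_delta:+.3f}% (negative = -pfb 32 faster)")
```

The mean is dominated by the single ~3-second difference on the 07mr07ai pair; every other pair differs by hundredths of a second, consistent with the measurement-error reading above.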
BoincSpy Send message Joined: 3 Apr 99 Posts: 146 Credit: 124,775,115 RAC: 353 |
Hi Everyone, I have installed the cuda apps from TBar and get the following error. My configuration: Fedora 29, BOINC client 7.14.2, NVIDIA driver 418.56. Observations: if the application chooses the GTX 660 Ti I get the error; if it uses the RTX 2070 it works okay. Is there anything I can try to fix this error (i.e. use the stock cuda app for the GTX 660 Ti)? I won't be able to test today as I was hit by the BOINC cuda limit for the day.

<core_client_version>7.14.2</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255)</message> <stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536 computeCap 7.5, multiProcs 36 pciBusID = 2, pciSlotID = 0
Device 2: GeForce GTX 660 Ti, 1999 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 7 pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 660 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 660 Ti
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
setiathome v8 enhanced x41p_V0.98b1, Cuda 9.00 special
Modifications done by petri33, compiled by TBar
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info: ...............
WU true angle range is : 0.440868
Sigma 3
Cuda error '(cudaBindTextureToArray( dev_gauss_dof_lcgf_cache_TEX, dev_gauss_dof_lcgf_cache, channelDesc))' in file 'cuda/cudaAcc_gaussfit.cu' in line 851 : invalid texture reference.
</stderr_txt> ]]> |
Wiggo Send message Joined: 24 Jan 00 Posts: 34748 Credit: 261,360,520 RAC: 489 |
The GTX 660 isn't supported by the app. You need at least a Maxwell based card or later for it to run (I'd just ignore or remove the GTX 660); Cheers. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I decided to test offline the -pfb argument for values of 16 and 32. Observed absolutely no difference running with or without the argument. The difference in run_times were hundredths of a second. Likely measurement error..... I think I will experiment with -unroll values of 2 next.

I've never seen any improvements using any setting other than -nobs either. Even back in the Windows days I never saw any difference, except maybe if I turned the setting up all the way I could see half a second. But then I got more than a few Inconclusive results with it set that high. I spent two weeks trying to get a speed improvement on my Maxwell GPUs, and finally gave up after testing every conceivable Toolkit. On the BLCs the 750 Ti gives better times, the 950/960 don't, and the 970 is a few seconds slower. I don't have a 980, but considering the trajectory, it should be slower. Fortunately, they all should be faster on the Arecibo tasks, especially the Arecibo VLARs... shame most of the Arecibo tasks are gone now. With any luck, the improvements might be enough to get the App on Beta. We'll see. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I just tested offline with -unroll 2 argument and except for one task which was 3 seconds faster, there was no difference in run_times. Differences again of just hundredths of a second. Measurement error likely. [Edit] I tested on the RTX 2080. Didn't try on the Pascal cards. I tested on a mix of Arecibo standard AR, Arecibo VLAR's and BLC tasks and didn't see any differences. The only difference I found was on the blc04_2bit_blc04_guppi_57898_17662_DIAG_KIC8462852_OFF_0020.11514.409.17.26.12.vlar.wu which processed 1 second faster with -unroll 2 parameter compared to stock -unroll 1 in actual time. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
yup, as previously mentioned, your 660ti is not supported unfortunately. Best to move that 660ti into another system and run a different app on it (like the stock SoG app), and run petri's special app on the 2070. that will be best. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
BoincSpy Send message Joined: 3 Apr 99 Posts: 146 Credit: 124,775,115 RAC: 353 |
What I did is add the following to cc_config.xml (inside the <options> section). Will see if this works tomorrow...

<exclude_gpu>
   <url>setiathome.berkeley.edu</url>
   <device_num>1</device_num>
   <type>NVIDIA</type>
   <app>setiathome_v8</app>
</exclude_gpu>

It would be nice if <app> could take the specialized app name and then fall back to the stock cuda app otherwise... I am not complaining too badly, as the RTX 2070 will process a typical WU in about 1 minute and 10 seconds. One question: is there a way to increase the number of cuda units in a day? Cheers, Bob |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Assume you are referring to your previous post about reaching a task limit for the day? That would be because of your errored tasks trying to run the 660 Ti on the special app. Once you turn in completed and validated tasks for the 2070, BOINC will send more work to your host. Your steady state cache size will be 100 cpu tasks and 100 gpu tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
New petri binary is now running here :)

. . Hi Laurent,
. . Those numbers surprised me. I have better times on Arecibo tasks than Blc(32) on all 4 boxes and GPU types (GTX 1050, 1050 Ti, 970 and 1060), but I am still running v0.97 on the 2 Linux boxes and SoG on the Windows boxes (the x2 indicates running 2 concurrent tasks). Perhaps where you say Arecibo they are VLAR tasks?

1050 (SoG x2) : Arecibo => 20 to 21 mins : Blc32 => 28 to 29 mins
1060 (SoG x2) : Arecibo => 11 to 13 mins : Blc32 => 15 to 17 mins
1050 Ti (0.97) : Arecibo => 235 to 245 secs : Blc32 => 260 to 265 secs
970 (0.97) : Arecibo => 135 to 140 secs : Blc32 => 160 to 165 secs

Stephen |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Trying "-nobs -pfb 6" . . That works for me on the 1050ti. For a 1060 3GB I would suggest -pfb 9. Stephen . . |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
1080 acts more like the lesser cards with a 10-15% increase, but the 1080ti gets a larger percentage increase, 20+%. even though these cards are pretty similar in design, same architecture, same memory type. I would expect the same percentage increase. Probably, with the extra compute units, the previous memory I/O requirements had a bigger impact than on cards with fewer compute units. So the reductions in memory requirements & I/O gave a much bigger performance improvement than they did on the lesser cards. Grant Darwin NT |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . The 1060 3GB has 9 CUs; it is the 1060 6GB that has 10 CUs. Stephen . . |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I decided to test offline the -pfb argument for values of 16 and 32. Observed absolutely no difference running with or without the argument. The difference in run_times were hundredths of a second. Likely measurement error. . . As Ian said, Petri stated that for 0.98 / CUDA 10.1 there was a negligible difference between -unroll 1 and -unroll 2, so your data helps clarify that. It would also indicate that higher values are relatively meaningless, as your posted data verifies. Stephen . . |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I decided to test offline the -pfb argument for values of 16 and 32. Observed absolutely no difference running with or without the argument. The difference in run_times were hundredths of a second. Likely measurement error. The key is what Petri said in this quote: "The pulse find algorithm search stage was completely rewritten. It does not need any buffer for temporary values." That is why playing around with -pfb and -unroll is fruitless. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I am now getting a significant number of inconclusives on a daily basis on both of my Linux boxes while running CUDA 10.1. Is this the price of doing business? These are on the CPU app. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
What does the gpu app have to do with inconclusives on the cpu? Either the computer is stable running cpu tasks or it isn't. Either the computer is stable running gpu tasks or it isn't. If running both at the same time is causing inconclusives, you need to back down the settings on both parts of the computer. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.