High performance Linux clients at SETI

Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990519 - Posted: 18 Apr 2019, 14:58:30 UTC - in response to Message 1990510.  



Thank you. I previously was not running -nobs on my test system because I was trying to maximize my CPU thread count. Since apparently I can't do that, I will see if the above command line works on a GTX 1060 3GB.
Trying "-nobs -pfb 6"
Tom


Rather try 8 or 16 (the 1060 has 1280 cores => 1280/128 = 10).


It appears to be running faster with -nobs -pfb 6, but it is the first time I have tried it on either machine.

My larger box is running a mix of GTX 1070 Ti's and GTX 1060 3GB's. Do you have an idea of which -pfb might be better? Or is it better to leave the default, which (I hope) changes based on the video card?

Tom
A proud member of the OFA (Old Farts Association).
ID: 1990519 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1990521 - Posted: 18 Apr 2019, 15:04:07 UTC - in response to Message 1990515.  

I guess I was most surprised by the difference between the 1080 and the 1080 Ti.

The 1080 acts more like the lesser cards with a 10-15% increase, but the 1080 Ti gets a larger percentage increase, 20+%, even though these cards are pretty similar in design: same architecture, same memory type. I would have expected the same percentage increase.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1990521 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1990522 - Posted: 18 Apr 2019, 15:05:50 UTC - in response to Message 1990519.  



It appears to be running faster with -nobs -pfb 6, but it is the first time I have tried it on either machine.

My larger box is running a mix of GTX 1070 Ti's and GTX 1060 3GB's. Do you have an idea of which -pfb might be better? Or is it better to leave the default, which (I hope) changes based on the video card?

Tom


If you don't know what you're doing, it's best to just leave the default values.

Petri did mention in a previous post that you can try a larger unroll value. The default is 1; try adding -unroll 2 to the command line.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1990522 · Report as offensive
W3Perl Project Donor
Volunteer tester

Send message
Joined: 29 Apr 99
Posts: 251
Credit: 3,696,783,867
RAC: 12,606
France
Message 1990531 - Posted: 18 Apr 2019, 16:32:08 UTC - in response to Message 1990519.  
Last modified: 18 Apr 2019, 16:33:53 UTC



It appears to be running faster with -nobs -pfb 6, but it is the first time I have tried it on either machine.

My larger box is running a mix of GTX 1070 Ti's and GTX 1060 3GB's. Do you have an idea of which -pfb might be better? Or is it better to leave the default, which (I hope) changes based on the video card?

Tom


1060 => 10
1070 Ti => 19

So have a try with 16!
You can use the benchmark test to check which value is the best for you.
On my GTX 1070, I don't see any difference between 16 and 32... so I keep 16 (1920/128 = 15, so this is what I expected).

Using -nobs will only give you a few extra seconds.
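
For reference on where the cores/128 numbers come from: Pascal GeForce cards have 128 CUDA cores per SM, so the suggested -pfb value works out to the card's streaming-multiprocessor count (1280/128 = 10 for the 1060 6GB, 1920/128 = 15 for the 1070, 2432/128 = 19 for the 1070 Ti). A minimal, purely illustrative CUDA sketch (not part of the special app) that prints the SM count per device, if you don't want to look the numbers up:

// sm_count.cu -- illustrative helper only: print each CUDA device's
// streaming-multiprocessor count as a starting point for -pfb.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        printf("No CUDA devices found\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("Device %d: %s, %d SMs\n", i, p.name, p.multiProcessorCount);
    }
    return 0;
}

Build with nvcc sm_count.cu -o sm_count and run it; the SM count it prints is the value the cores/128 arithmetic above lands on.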
ID: 1990531 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990548 - Posted: 18 Apr 2019, 20:41:32 UTC

I decided to test the -pfb argument offline for values of 16 and 32. I observed absolutely no difference running with or without the argument. The differences in run_times were hundredths of a second; likely measurement error.

All 28 jobs ran setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101 on the GPU (slot NA) and completed.

Job  app_args                  start     finish    tot_time     wu_name
0    -device 0 -nobs           20:15:31  20:16:10  0:00:39.035  21no18aa.19740.24238.14.41.27.wu
1    -device 0 -nobs -pfb 32   20:16:10  20:16:49  0:00:39.037  21no18aa.19740.24238.14.41.27.wu
2    -device 0 -nobs           20:16:49  20:17:52  0:01:03.052  10jn08ab.12748.230459.14.41.51.vlar.wu
3    -device 0 -nobs -pfb 32   20:17:52  20:18:55  0:01:03.060  10jn08ab.12748.230459.14.41.51.vlar.wu
4    -device 0 -nobs           20:18:55  20:19:58  0:01:03.045  12fe07ac.6929.16025.15.42.86.vlar.wu
5    -device 0 -nobs -pfb 32   20:19:58  20:21:01  0:01:03.047  12fe07ac.6929.16025.15.42.86.vlar.wu
6    -device 0 -nobs           20:21:01  20:21:55  0:00:54.042  blc04_2bit_blc04_guppi_57898_17662_DIA
7    -device 0 -nobs -pfb 32   20:21:55  20:22:49  0:00:54.041  blc04_2bit_blc04_guppi_57898_17662_DIA
8    -device 0 -nobs           20:22:49  20:23:49  0:01:00.048  08ja07aa.588.24203.13.40.58.vlar.wu
9    -device 0 -nobs -pfb 32   20:23:49  20:24:49  0:01:00.061  08ja07aa.588.24203.13.40.58.vlar.wu
10   -device 0 -nobs           20:24:49  20:25:28  0:00:39.037  blc04_2bit_guppi_57976_08930_HIP74235_
11   -device 0 -nobs -pfb 32   20:25:28  20:26:07  0:00:39.040  blc04_2bit_guppi_57976_08930_HIP74235_
12   -device 0 -nobs           20:26:07  20:26:46  0:00:39.031  blc04_2bit_guppi_57976_09361_HIP74981_
13   -device 0 -nobs -pfb 32   20:26:46  20:27:25  0:00:39.039  blc04_2bit_guppi_57976_09361_HIP74981_
14   -device 0 -nobs           20:27:25  20:28:28  0:01:03.055  07mr07ai.12583.49160.3.30.231.vlar.wu
15   -device 0 -nobs -pfb 32   20:28:28  20:29:29  0:01:00.053  07mr07ai.12583.49160.3.30.231.vlar.wu
16   -device 0 -nobs           20:29:29  20:30:08  0:00:39.038  blc13_2bit_guppi_58405_85972_GJ687_002
17   -device 0 -nobs -pfb 32   20:30:08  20:30:47  0:00:39.029  blc13_2bit_guppi_58405_85972_GJ687_002
18   -device 0 -nobs           20:30:47  20:31:23  0:00:36.029  blc04_2bit_guppi_57976_09694_HIP74284_
19   -device 0 -nobs -pfb 32   20:31:23  20:31:59  0:00:36.027  blc04_2bit_guppi_57976_09694_HIP74284_
20   -device 0 -nobs           20:31:59  20:32:35  0:00:36.033  blc04_2bit_guppi_57976_10365_HIP74315_
21   -device 0 -nobs -pfb 32   20:32:35  20:33:11  0:00:36.033  blc04_2bit_guppi_57976_10365_HIP74315_
22   -device 0 -nobs           20:33:11  20:33:50  0:00:39.030  blc13_2bit_guppi_58406_23240_HIP20842_
23   -device 0 -nobs -pfb 32   20:33:50  20:34:29  0:00:39.041  blc13_2bit_guppi_58406_23240_HIP20842_
24   -device 0 -nobs           20:34:29  20:35:17  0:00:48.044  13ap08ab.27985.20931.12.39.96.wu
25   -device 0 -nobs -pfb 32   20:35:17  20:36:05  0:00:48.036  13ap08ab.27985.20931.12.39.96.wu
26   -device 0 -nobs           20:36:05  20:36:53  0:00:48.041  16dc18ab.471.25016.10.37.208.wu
27   -device 0 -nobs -pfb 32   20:36:53  20:37:41  0:00:48.039  16dc18ab.471.25016.10.37.208.wu

I think I will experiment with -unroll values of 2 next.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990548 · Report as offensive
BoincSpy
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 146
Credit: 124,775,115
RAC: 353
Canada
Message 1990549 - Posted: 18 Apr 2019, 20:42:23 UTC

Hi Everyone,

I have installed the CUDA apps from TBar and get the following error.

My configuration: Fedora 29, BOINC client 7.14.2, NVIDIA driver 418.56.

Observations: if the application chooses the GTX 660 I get the error; if it uses the RTX 2070 it works okay. Is there anything I can try to fix this error (i.e. use the stock CUDA app for the GTX 660)? I won't be able to test today, as I hit my BOINC CUDA task limit for the day.

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce RTX 2070, 7952 MiB, regsPerBlock 65536
computeCap 7.5, multiProcs 36
pciBusID = 2, pciSlotID = 0
Device 2: GeForce GTX 660 Ti, 1999 MiB, regsPerBlock 65536
computeCap 3.0, multiProcs 7
pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
Device 2: GeForce GTX 660 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 660 Ti
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1

setiathome v8 enhanced x41p_V0.98b1, Cuda 9.00 special
Modifications done by petri33, compiled by TBar

Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.440868
Sigma 3
Cuda error '(cudaBindTextureToArray( dev_gauss_dof_lcgf_cache_TEX, dev_gauss_dof_lcgf_cache, channelDesc))' in file 'cuda/cudaAcc_gaussfit.cu' in line 851 : invalid texture reference.

</stderr_txt>
]]>
ID: 1990549 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34748
Credit: 261,360,520
RAC: 489
Australia
Message 1990551 - Posted: 18 Apr 2019, 20:59:20 UTC

The GTX 660 isn't supported by the app.

You need at least a Maxwell-based card or later for it to run; the 660 Ti is Kepler, compute capability 3.0 (I'd just ignore or remove the GTX 660).

Cheers.
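
The compute capability in the app's stderr is the quick way to tell (computeCap 7.5 for the 2070, 3.0 for the 660 Ti above); Maxwell starts at compute capability 5.0. A short illustrative CUDA sketch (not a SETI tool, just the standard runtime query) that flags cards below that cutoff:

// cc_check.cu -- illustrative only: flag devices below compute capability 5.0
// (pre-Maxwell), which the special app does not support.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("Device %d: %s, compute %d.%d -> %s\n", i, p.name, p.major, p.minor,
               p.major >= 5 ? "usable with the special app" : "pre-Maxwell, not supported");
    }
    return 0;
}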
ID: 1990551 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1990553 - Posted: 18 Apr 2019, 21:12:01 UTC - in response to Message 1990548.  

I decided to test the -pfb argument offline for values of 16 and 32. I observed absolutely no difference running with or without the argument. The differences in run_times were hundredths of a second; likely measurement error..... I think I will experiment with -unroll values of 2 next.
I've never seen any improvement from any setting other than -nobs either. Even back in the Windows days I never saw any difference, except maybe if I turned the setting up all the way I could see half a second. But then I got more than a few inconclusive results with it set that high.

I spent two weeks trying to get a speed improvement on my Maxwell GPUs, and finally gave up after testing every conceivable toolkit. On the BLCs the 750 Ti gives better times, the 950/960 don't, and the 970 is a few seconds slower. I don't have a 980, but considering the trajectory, it should be slower. Fortunately, they all should be faster on the Arecibo tasks, especially the Arecibo VLARs... a shame most of the Arecibo tasks are gone now. With any luck, the improvements might be enough to get the app onto Beta.
We'll see.
ID: 1990553 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990555 - Posted: 18 Apr 2019, 21:14:24 UTC
Last modified: 18 Apr 2019, 21:21:14 UTC

I just tested offline with the -unroll 2 argument and, except for one task which was 3 seconds faster, there was no difference in run_times. Differences again of just hundredths of a second; measurement error likely.

[Edit] I tested on the RTX 2080, and didn't try the Pascal cards. I tested on a mix of standard-AR Arecibo, Arecibo VLAR and BLC tasks and didn't see any differences.

The only difference I found was on blc04_2bit_blc04_guppi_57898_17662_DIAG_KIC8462852_OFF_0020.11514.409.17.26.12.vlar.wu, which processed 1 second faster in actual time with the -unroll 2 parameter compared to the stock -unroll 1.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990555 · Report as offensive
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1990556 - Posted: 18 Apr 2019, 21:27:19 UTC - in response to Message 1990549.  

Yup, as previously mentioned, your 660 Ti is unfortunately not supported. Best to move the 660 Ti into another system and run a different app on it (like the stock SoG app), and run Petri's special app on the 2070.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1990556 · Report as offensive
BoincSpy
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 146
Credit: 124,775,115
RAC: 353
Canada
Message 1990558 - Posted: 18 Apr 2019, 21:44:44 UTC - in response to Message 1990556.  

What I did was add the following to cc_config.xml (inside the existing <options> section). I'll see if this works tomorrow...

<options>
    <exclude_gpu>
        <url>setiathome.berkeley.edu</url>
        <device_num>1</device_num>
        <type>NVIDIA</type>
        <app>setiathome_v8</app>
    </exclude_gpu>
</options>

It would be nice if <app> could be set to the specialized app name, with the stock CUDA app used otherwise... I am not complaining too badly, as the RTX 2070 will process a typical WU in about 1 minute and 10 seconds.

One question: is there a way to increase the number of CUDA work units allowed in a day?

Cheers,
Bob
ID: 1990558 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990560 - Posted: 18 Apr 2019, 21:51:12 UTC - in response to Message 1990558.  

I assume you are referring to your previous post about reaching the task limit for the day? That would be because of your errored tasks from trying to run the 660 Ti on the special app. Once you turn in completed and validated tasks from the 2070, BOINC will send more work to your host. Your steady-state cache size will be 100 CPU tasks and 100 GPU tasks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990560 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1990565 - Posted: 18 Apr 2019, 22:51:14 UTC - in response to Message 1990504.  

New petri binary is now running here :)
Average values:
750 Ti => blc: 344 sec, arecibo: 467 sec
1050 Ti => blc: 262 sec, arecibo: 344 sec
1070 => blc: 95 sec, arecibo: 244 sec
1080 => blc: 79 sec, arecibo: 108 sec
Thanks again for such a nice piece of software :)


. . Hi Laurent,

. . Those numbers surprised me; I have better times on Arecibo tasks than BLC(32) on all 4 boxes and GPU types (GTX 1050, 1050 Ti, 970 and 1060), but I am still running v0.97 on the 2 Linux boxes and SoG on the Windows boxes (the x2 indicates running 2 concurrent tasks). Perhaps where you say Arecibo they are VLAR tasks?

1050 (SoG x 2) : Arecibo => 20 to 21 mins : Blc32 => 28 to 29 mins
1060 (SoG x 2) : Arecibo => 11 to 13 mins : Blc32 => 15 to 17 mins

1050ti (0.97) : Arecibo => 235 to 245 secs : Blc32 => 260 to 265 secs
970 . . . (0.97) : Arecibo => 135 to 140 secs : Blc32 => 160 to 165 secs

Stephen

? ?
ID: 1990565 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1990567 - Posted: 18 Apr 2019, 22:59:23 UTC - in response to Message 1990505.  

Trying "-nobs -pfb 6"
Tom


. . That works for me on the 1050ti. For a 1060 3GB I would suggest -pfb 9.

Stephen

. .
ID: 1990567 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1990568 - Posted: 18 Apr 2019, 23:04:43 UTC - in response to Message 1990521.  
Last modified: 18 Apr 2019, 23:10:02 UTC

The 1080 acts more like the lesser cards with a 10-15% increase, but the 1080 Ti gets a larger percentage increase, 20+%, even though these cards are pretty similar in design: same architecture, same memory type. I would have expected the same percentage increase.

Probably with the extra compute units, the previous memory I/O requirements had a bigger impact than on cards with fewer compute units, so those reductions in memory requirements & I/O gave a much bigger performance improvement than they did on the lesser cards.
Grant
Darwin NT
ID: 1990568 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1990569 - Posted: 18 Apr 2019, 23:06:46 UTC - in response to Message 1990531.  


1060 => 10
1070 Ti => 19

Using -nobs will only give you a few extra seconds.


. . The 1060 3GB has 9 CUs; it is the 1060 6GB that has 10 CUs.

Stephen

. .
ID: 1990569 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1990573 - Posted: 18 Apr 2019, 23:12:15 UTC - in response to Message 1990548.  

I decided to test the -pfb argument offline for values of 16 and 32. I observed absolutely no difference running with or without the argument. The differences in run_times were hundredths of a second; likely measurement error.

I think I will experiment with -unroll values of 2 next.


. . As Ian said, Petri stated that for 0.98 / CUDA 10.1 there was a negligible difference between unroll 1 and unroll 2, so your data will help clarify that. But that would indicate that higher values are relatively meaningless, as your posted data verifies.

Stephen

. .
ID: 1990573 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990579 - Posted: 19 Apr 2019, 0:14:38 UTC - in response to Message 1990573.  

I decided to test the -pfb argument offline for values of 16 and 32. I observed absolutely no difference running with or without the argument. The differences in run_times were hundredths of a second; likely measurement error.

I think I will experiment with -unroll values of 2 next.


. . As Ian said, Petri stated that for 0.98 / CUDA 10.1 there was a negligible difference between unroll 1 and unroll 2, so your data will help clarify that. But that would indicate that higher values are relatively meaningless, as your posted data verifies.

Stephen

. .

The key is what Petri said in this quote.

The pulse find algorithm search stage was completely rewritten. It does not need any buffer for temporary values.
The scan is fully unrolled to all SM units but does not require any memory to store data.


That is why playing around with -pfb and -unroll is fruitless.
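
For anyone curious what "fully unrolled to all SM units but does not require any memory to store data" can look like in practice, here is a generic, purely illustrative sketch of the technique (this is NOT Petri's pulse-find code): a reduction kept entirely in registers via warp shuffles, so no shared or global scratch buffer is needed for intermediate values.

// warp_max.cu -- illustrative only, NOT the special app's kernel: a max
// reduction done in registers with warp shuffles, no temporary buffer.
// Assumes the inputs are non-negative (e.g. power values), so the
// int-reinterpretation trick with atomicMax preserves float ordering.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void max_power(const float* in, int n, float* result) {
    float v = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        v = fmaxf(v, in[i]);                                   // per-thread partial max

    for (int off = 16; off > 0; off >>= 1)                     // warp-level reduction,
        v = fmaxf(v, __shfl_down_sync(0xffffffffu, v, off));   // registers only

    if ((threadIdx.x & 31) == 0)                               // lane 0 of each warp
        atomicMax(reinterpret_cast<int*>(result), __float_as_int(v));
}

int main() {
    const int n = 1 << 20;
    float *d_in, *d_out, h_out = 0.0f;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));        // dummy data
    cudaMemset(d_out, 0, sizeof(float));           // result must start at 0
    max_power<<<256, 256>>>(d_in, n, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("max = %f\n", h_out);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

None of this is the actual app code; it is only meant to illustrate the kind of register-resident scan Petri describes, where there is no temporary buffer left for options like -pfb to tune.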
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990579 · Report as offensive
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1990608 - Posted: 19 Apr 2019, 4:06:53 UTC

I am now getting a significant number of inconclusives on a daily basis on both of my Linux boxes while running CUDA 10.1.

Is this the price of doing business? These are on the CPU app.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1990608 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1990613 - Posted: 19 Apr 2019, 6:00:48 UTC - in response to Message 1990608.  

What does the GPU app have to do with inconclusives on the CPU? Either the computer is stable running CPU tasks or it isn't. Either the computer is stable running GPU tasks or it isn't. If running both at the same time is causing inconclusives, you need to back down the settings on both parts of the computer.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1990613 · Report as offensive