Linux CUDA 'Special' App finally available, featuring Low CPU use

Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use

Previous · 1 . . . 80 · 81 · 82 · 83

Profile Brent Norman Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2466
Credit: 290,092,782
RAC: 689,022
Canada
Message 1947748 - Posted: 3 Aug 2018, 20:01:23 UTC - in response to Message 1947741.  

Petri did mention the -pfl flag before in https://setiathome.berkeley.edu/forum_thread.php?id=78569&postid=1944686
<snip>
Could you try with a new flag: -pfl 64
It sets the splitting point in pulse finding that decides which of the internal functions to use.
I find -pfl 64 gives faster run times compared to the default (no flag set equals -pfl 512). I'll change that in confsettings.cpp to 64 in the future.
ID: 1947748
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5850
Credit: 411,414,585
RAC: 1,052,417
United States
Message 1947751 - Posted: 3 Aug 2018, 20:22:30 UTC - in response to Message 1947748.  

Thanks Brent, missed that earlier.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1947751
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4412
Credit: 283,229,046
RAC: 487,708
United States
Message 1947753 - Posted: 3 Aug 2018, 20:41:15 UTC - in response to Message 1947748.  

Yes, and on my machines it appears to be just as much of a red herring as the other settings clustered with it in the code:
//defaults for long pulsefinds, need to be set at runtime in InitConfig()
int pfBlocksPerSM = 0;
int pfPeriodsPerLaunch = 0;
int pfPeriodsPerLaunch2 = 1;
int g_pfFftLimit = 64;
int unroll = 0;
bool chirpInAdvance = false;

extern APP_INIT_DATA app_init_data;

void initConfig(int pcibusid,int pcislotid)
{
// mbcuda.cfg only used for Windows at the moment, currently uses Windows API
// should switch to standard C functions if Linux/MAC needs similar functionality
// Current non-Windows path just sets sensible defaults.

That's why I don't use any of those other settings either; to see any difference at all, you have to set them so high that you start receiving many Inconclusives.

Another setting that doesn't appear to do much, except on the 750 Ti, is S_LIMIT in cudaAcc_pulsefind.cu. On the 750 Ti a setting of 255 reduces run times to around 5 minutes on the BLC tasks, and around 4.5 minutes on the lower-numbered BLC tasks. On all my other GPUs there doesn't appear to be any difference between 255, 319, and 383.

It appears the server sent a couple of Arecibo shorties, and as already stated, they don't work: Validate state: Invalid.
Hopefully Petri is close to finding a solution for this one troublesome problem. It appears the old problem of Bad Best Pulse is still around as well, but that one is not as troublesome as the failed shorties.
ID: 1947753
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4412
Credit: 283,229,046
RAC: 487,708
United States
Message 1947815 - Posted: 4 Aug 2018, 5:03:40 UTC - in response to Message 1947751.  

Here you go.
App setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 was compiled with the old default setting of g_pfFftLimit = 512;
setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 was compiled with g_pfFftLimit = 64.
Both apps used the old S_LIMIT of 255 instead of 383, on a standard GTX 750 Ti on an old Intel board in an x4 PCIe1 slot.
It is a little faster on a board with PCIe2 and an x16 slot; on my other HP board it's consistently right around 4.5 minutes.

Starting benchmark run...
----------------------------------------------------------------
Listing wu-file(s) in /testWUs :
blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
reference_work_unit_r3215.wu

Listing executable(s) in /APPS :
setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92

Listing executable in /REF_APPS :
setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91
----------------------------------------------------------------
Current WU: blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
----------------------------------------------------------------
Running default app with command :... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 -nobs -device 1
Best scores written
Out file closed
Cuda free done
Cuda device reset done
Elapsed Time: ....................... 283 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 -nobs -device 1
Best scores written
Out file closed
Cuda free done
Cuda device reset done
Elapsed Time : ...................... 284 seconds
Speed compared to default : ......... 99 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 100.0%
----------------------------------------------------------------
Done with blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
====================================================================
Current WU: reference_work_unit_r3215.wu
----------------------------------------------------------------
Running default app with command :... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 -nobs -device 1
Best scores written
Out file closed
Cuda free done
Cuda device reset done
Elapsed Time: ....................... 100 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 -nobs -device 1
Best scores written
Out file closed
Cuda free done
Cuda device reset done
Elapsed Time : ...................... 99 seconds
Speed compared to default : ......... 101 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 100.0%
----------------------------------------------------------------
Done with reference_work_unit_r3215.wu
I'd say that's pretty much inconclusive, which is what I get when testing the other settings, pfBlocksPerSM and pfPeriodsPerLaunch.
The S_LIMIT setting does make a difference with the 750 Ti though; 4.5 minutes is very good considering what it takes in OpenCL.
ID: 1947815
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1615
Credit: 385,228,084
RAC: 372,860
Finland
Message 1947839 - Posted: 4 Aug 2018, 8:07:35 UTC - in response to Message 1947741.  

@Petri, can you explain simply what the new tuning parameter in the latest 0.96 Beta does? I am referring to:

{Using default pulse Fft limit (-pfl 64)} in the stderr.txt output for a task. This seems to be a new parameter not mentioned in the original x41zi or x41p_zi3v notes.


The pfl flag sets an FftLength-based limit that selects which implementation of the pulse find algorithm to use.
Now that I have found that the default value works well, I'll probably remove the flag. It was for testing purposes.

The 0.95 and 0.96 builds got a 30% speed gain from reorganizing the 'folding' part of the pulse find process. The reorganization reduced memory writes and reads by 50%.
0.96 is a bit slower than 0.95 but gives more accurate results with noise bombs. 0.96 is still not ready; something bad may still be lurking with Arecibo shorties on some NV cards.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1947839
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5850
Credit: 411,414,585
RAC: 1,052,417
United States
Message 1947843 - Posted: 4 Aug 2018, 8:35:37 UTC - in response to Message 1947839.  

Thanks Petri. Nice to see some significant progress is being made. Hope the trend continues and you can fix the lurking bugs before your vacation runs out.
ID: 1947843
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4412
Credit: 283,229,046
RAC: 487,708
United States
Message 1948179 - Posted: 6 Aug 2018, 16:44:10 UTC

I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha
Right now zi3v is listed as PetriR_raw3. With Jason MIA, who are we going to get to post the next code?
It would be nice to change PetriR_raw3 to maybe PetriR_zi3v, and just add a new folder with the next release in it, maybe PetriR_V0.97.
Providing of course, V0.97 fixes the Arecibo shorty problem and doesn't create any new problems.
In case you missed it, there is a Mac in the Top Ten at SETI....
ID: 1948179
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1615
Credit: 385,228,084
RAC: 372,860
Finland
Message 1948187 - Posted: 6 Aug 2018, 18:01:28 UTC - in response to Message 1948179.  

I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha
Right now zi3v is listed as PetriR_raw3. With Jason MIA, who are we going to get to post the next code?
It would be nice to change PetriR_raw3 to maybe PetriR_zi3v, and just add a new folder with the next release in it, maybe PetriR_V0.97.
Providing of course, V0.97 fixes the Arecibo shorty problem and doesn't create any new problems.
In case you missed it, there is a Mac in the Top Ten at SETI....


Looking at your Mac in the FreeDC stats, it seems you got a 30% boost a week ago. Your Mac will be at #8 in no time with 135-140,000 RAC.
ID: 1948187
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 5850
Credit: 411,414,585
RAC: 1,052,417
United States
Message 1948188 - Posted: 6 Aug 2018, 18:12:30 UTC - in response to Message 1948179.  

Ask whoever is in charge of the SETI SVN repository to give you FTP access so you can make changes to the branches.

I just had Arkayn give me access to the CA file directory structure so I could upload the missing All-in-One files of yours so that the download links work again.
ID: 1948188
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 3769
Credit: 88,562,839
RAC: 159,540
Australia
Message 1948295 - Posted: 7 Aug 2018, 3:01:23 UTC - in response to Message 1948179.  
Last modified: 7 Aug 2018, 3:08:19 UTC

I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha
Right now zi3v is listed as PetriR_raw3. With Jason MIA, who are we going to get to post the next code?
It would be nice to change PetriR_raw3 to maybe PetriR_zi3v, and just add a new folder with the next release in it, maybe PetriR_V0.97.
Providing of course, V0.97 fixes the Arecibo shorty problem and doesn't create any new problems.
In case you missed it, there is a Mac in the Top Ten at SETI....


. . Aarrrggghhh! The end of the world must be near ... :)

. . OK, so when it lists an Intel-processor machine with an OS of Darwin, that is a Mac then?

Stephen

:)
ID: 1948295


 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.