Linux CUDA 'Special' App finally available, featuring Low CPU use

Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947751 - Posted: 3 Aug 2018, 20:22:30 UTC - in response to Message 1947748.  

Thanks Brent, missed that earlier.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1947751
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1947753 - Posted: 3 Aug 2018, 20:41:15 UTC - in response to Message 1947748.  

Yes, and on my machines it appears to be just as much of a red herring as the other settings clustered with it in the code:
//defaults for long pulsefinds, need to be set at runtime in InitConfig()
int pfBlocksPerSM = 0;
int pfPeriodsPerLaunch = 0;
int pfPeriodsPerLaunch2 = 1;
int g_pfFftLimit = 64;
int unroll = 0;
bool chirpInAdvance = false;

extern APP_INIT_DATA app_init_data;

void initConfig(int pcibusid,int pcislotid)
{
// mbcuda.cfg only used for Windows at the moment, currently uses Windows API
// should switch to standard C functions, if Linux/MAC needs similar functionality
// Current non-Windows path just sets sensible defaults.

That's why I don't use any of those other settings either. To see any difference at all, you have to set them so high that you start receiving many inconclusives.

Another setting that doesn't appear to do much, except on the 750 Ti, is S_LIMIT in cudaAcc_pulsefind.cu. On the 750 Ti, a setting of 255 brings run times down to around 5 minutes on the BLC tasks, and around 4.5 minutes on the lower-numbered BLC tasks. On all my other GPUs, there doesn't appear to be any difference between 255, 319, and 383.

It appears the server sent a couple of Arecibo shorties and, as already stated, they don't work (Validate state: Invalid).
Hopefully Petri is close to finding a solution for this one troublesome problem. It appears the old Bad Best Pulse problem is still around as well, but that one is not as troublesome as the failed shorties.
ID: 1947753
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1947815 - Posted: 4 Aug 2018, 5:03:40 UTC - in response to Message 1947751.  

Here you go.
The app setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 was compiled with the old default setting of g_pfFftLimit = 512.
The app setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 was compiled with the setting g_pfFftLimit = 64.
Both apps used the old S_LIMIT of 255 instead of 383, running on a standard GTX 750 Ti on an old Intel board in an x4 PCIe1 slot.
It is a little faster on a board with PCIe2 in an x16 slot; on my other HP board it's consistently right around 4.5 minutes.

Starting benchmark run...
----------------------------------------------------------------
Listing wu-file(s) in /testWUs :
blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
reference_work_unit_r3215.wu

Listing executable(s) in /APPS :
setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92

Listing executable in /REF_APPS :
setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91
----------------------------------------------------------------
Current WU: blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
----------------------------------------------------------------
Running default app with command :... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 -nobs -device 1
Best scores written
Out file closed
Cuda free done
Cuda device reset done
Elapsed Time: ....................... 283 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 -nobs -device 1
Best scores written
Out file closed
Cuda free done
Cuda device reset done
Elapsed Time : ...................... 284 seconds
Speed compared to default : ......... 99 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 100.0%
----------------------------------------------------------------
Done with blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
====================================================================
Current WU: reference_work_unit_r3215.wu
----------------------------------------------------------------
Running default app with command :... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 -nobs -device 1
Best scores written
Out file closed
Cuda free done
Cuda device reset done
Elapsed Time: ....................... 100 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 -nobs -device 1
Best scores written
Out file closed
Cuda free done
Cuda device reset done
Elapsed Time : ...................... 99 seconds
Speed compared to default : ......... 101 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 100.0%
----------------------------------------------------------------
Done with reference_work_unit_r3215.wu
I'd say that's pretty much inconclusive, which is also what I get when testing the other settings, pfBlocksPerSM and pfPeriodsPerLaunch.
The S_LIMIT setting does make a difference with the 750 Ti though, 4.5 minutes is very good considering what it takes in OpenCL.
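
For anyone reading the log, the "Speed compared to default" figures are consistent with a simple ratio of the two elapsed times. The benchmark script itself is not shown in this thread, so the function name `speedPercent` and the truncating integer division below are assumptions inferred from the numbers (283 s vs 284 s is reported as 99 %, 100 s vs 99 s as 101 %), not the script's actual code.

```cpp
// Hypothetical reconstruction of the "Speed compared to default" figure.
// referenceSeconds: elapsed time of the default (reference) app
// appSeconds:       elapsed time of the app under test
// Returns the speed of the tested app as a whole-number percentage of the
// reference; values above 100 mean the tested app finished sooner.
int speedPercent(int referenceSeconds, int appSeconds) {
    if (appSeconds <= 0) {
        return 0;  // avoid dividing by zero on a failed or instant run
    }
    // Integer division truncates, which matches 283/284 being shown as 99 %
    // rather than being rounded up to 100 %.
    return (100 * referenceSeconds) / appSeconds;
}
```

With whole-second timings, a one-second difference on a 100 to 300 second run moves the figure by about 1 %, so 99 % and 101 % are within the noise of the measurement.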
ID: 1947815
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1947839 - Posted: 4 Aug 2018, 8:07:35 UTC - in response to Message 1947741.  

Quote from Message 1947741:
@Petri, can you explain simply what the new tuning parameter in the latest 0.96 Beta does? I am referring to:

{Using default pulse Fft limit (-pfl 64)} in the stderr.txt output for a task. This seems to be a new parameter not mentioned in the original x41zi or x41p_zi3v notes.


The pfl flag sets an FFT-length-based limit that determines which implementation of the pulse find algorithm to use.
Now that I have found that the default value works well, I'll probably remove the flag. It was for testing purposes.
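
As a rough illustration of how a threshold like this could work, the sketch below parses a `-pfl <n>` option and picks one of two code paths by comparing the FFT length against the limit. Only the flag name and the default of 64 come from the stderr line quoted above. `parsePulseFftLimit`, `choosePulseFindPath`, the `PulseFindPath` names, and the direction of the comparison are assumptions made for illustration, not the app's actual code.

```cpp
#include <climits>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: scan the command line for "-pfl <n>" and return n.
// Falls back to `fallback` if the flag is absent or its value is not a
// positive integer that fits in an int.
int parsePulseFftLimit(int argc, const char* const argv[], int fallback) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "-pfl") == 0) {
            char* end = nullptr;
            const long value = std::strtol(argv[i + 1], &end, 10);
            const bool parsedAll = (end != argv[i + 1]) && (*end == '\0');
            if (parsedAll && value > 0 && value <= INT_MAX) {
                return static_cast<int>(value);
            }
        }
    }
    return fallback;
}

// Two placeholder implementations of the pulse find step.
enum class PulseFindPath { ShortFft, LongFft };

// Assumed rule: FFT lengths at or below the limit take one path, longer
// ones take the other. The real app may compare in the opposite direction.
PulseFindPath choosePulseFindPath(int fftLength, int limit) {
    return fftLength <= limit ? PulseFindPath::ShortFft
                              : PulseFindPath::LongFft;
}
```

Under these assumptions, running without the flag and with a fallback of 64 would send an FFT length of 64 down the first path and 128 down the second.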

Versions 0.95 and 0.96 got a 30% speed gain from reorganizing the 'folding' part of the pulse find process. The reorganization reduced memory writes and reads by 50%.
0.96 is a bit slower than 0.95 but gives more accurate results with noise bombs. 0.96 is still not ready: there may be something bad lurking with Arecibo shorties and some NV cards.
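
For readers unfamiliar with the term, here is a generic CPU-side sketch of folding and of one common way memory traffic can be cut by fusing passes. It is not the kernel code from 0.95/0.96, and the post does not say that this is the change that was made. The names `foldBy`, `foldTwoPasses`, and `foldBy4Fused` are hypothetical, and the input length is assumed to be a multiple of 4.

```cpp
#include <cstddef>
#include <vector>

// Fold `in` by `factor`: split it into `factor` equal-length segments and
// sum them element-wise. Trailing samples that do not fill a segment are
// ignored.
std::vector<float> foldBy(const std::vector<float>& in, std::size_t factor) {
    const std::size_t len = in.size() / factor;
    std::vector<float> out(len, 0.0f);
    for (std::size_t k = 0; k < factor; ++k) {
        for (std::size_t j = 0; j < len; ++j) {
            out[j] += in[k * len + j];
        }
    }
    return out;
}

// Two passes: fold by 2, write the intermediate array to memory, then read
// it back and fold by 2 again.
std::vector<float> foldTwoPasses(const std::vector<float>& in) {
    return foldBy(foldBy(in, 2), 2);
}

// Fused: each output element is built from four input reads and written
// once, so the intermediate array is never stored or re-read.
std::vector<float> foldBy4Fused(const std::vector<float>& in) {
    const std::size_t len = in.size() / 4;
    std::vector<float> out(len);
    for (std::size_t j = 0; j < len; ++j) {
        out[j] = in[j] + in[j + len] + in[j + 2 * len] + in[j + 3 * len];
    }
    return out;
}
```

Both versions add the same four samples into each output element. They differ only in the order of the additions and in whether an intermediate buffer is written out, which is the kind of saving a reorganization of this sort targets.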
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1947839
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1947843 - Posted: 4 Aug 2018, 8:35:37 UTC - in response to Message 1947839.  

Thanks Petri. Nice to see some significant progress is being made. Hope the trend continues and you can fix the lurking bugs before your vacation runs out.
ID: 1947843
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1948179 - Posted: 6 Aug 2018, 16:44:10 UTC

I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha
Right now zi3v is listed as PetriR_raw3. With Jason MIA, who are we going to get to post the next code?
It would be nice to change PetriR_raw3 to maybe PetriR_zi3v, and just add a new folder with the next release in it, maybe PetriR_V0.97.
Providing of course, V0.97 fixes the Arecibo shorty problem and doesn't create any new problems.
In case you missed it, there is a Mac in the Top Ten at SETI....
ID: 1948179
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1948187 - Posted: 6 Aug 2018, 18:01:28 UTC - in response to Message 1948179.  

Quote from Message 1948179:
I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha
Right now zi3v is listed as PetriR_raw3. With Jason MIA, who are we going to get to post the next code?
It would be nice to change PetriR_raw3 to maybe PetriR_zi3v, and just add a new folder with the next release in it, maybe PetriR_V0.97.
Providing of course, V0.97 fixes the Arecibo shorty problem and doesn't create any new problems.
In case you missed it, there is a Mac in the Top Ten at SETI....


Looking at your Mac in the FreeDC stats, it seems you got a 30% boost a week ago. Your Mac will be at #8 in no time with a RAC of 135,000-140,000.
ID: 1948187
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1948188 - Posted: 6 Aug 2018, 18:12:30 UTC - in response to Message 1948179.  

Ask whoever is in charge of the Seti SVN repository to give you access to the repository for an FTP upload so you can make changes to the branches.

I just had Arkayn give me access to the CA file directory structure so I could upload the missing All-in-One files of yours so that the download links work again.
ID: 1948188
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1948295 - Posted: 7 Aug 2018, 3:01:23 UTC - in response to Message 1948179.  
Last modified: 7 Aug 2018, 3:08:19 UTC

Quote from Message 1948179:
I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha
Right now zi3v is listed as PetriR_raw3. With Jason MIA, who are we going to get to post the next code?
It would be nice to change PetriR_raw3 to maybe PetriR_zi3v, and just add a new folder with the next release in it, maybe PetriR_V0.97.
Providing of course, V0.97 fixes the Arecibo shorty problem and doesn't create any new problems.
In case you missed it, there is a Mac in the Top Ten at SETI....


. . Aarrrggghhh! The end of the world must be near ... :)

. . OK, so when it lists an Intel processor machine with an OS of Darwin that is a Mac then?

Stephen

:)
ID: 1948295
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.