Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks Brent, missed that earlier. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Yes, and on My machines it appears to be just as much of a Red Herring as the other settings clustered with it in the code:

    //defaults for long pulsefinds, need to be set at runtime in InitConfig()
    int pfBlocksPerSM = 0;
    int pfPeriodsPerLaunch = 0;
    int pfPeriodsPerLaunch2 = 1;
    int g_pfFftLimit = 64;
    int unroll = 0;
    bool chirpInAdvance = false;
    extern APP_INIT_DATA app_init_data;
    void initConfig(int pcibusid,int pcislotid) {
    // mbcuda.cfg only used for Windows at the moment, currently uses Windows API
    // should switch to standard C functions, if Linux/MAC needs similar functionality
    // Current non-Windows path just sets sensible defaults.

That's why I don't use any of those other settings either. To see any difference at all, you have to set it SO High that you start receiving Many Inconclusives.

Another setting that doesn't appear to do much, except on the 750 Ti, is S_LIMIT in cudaAcc_pulsefind.cu. On the 750 Ti the setting 255 reduces the times down to around 5 minutes on the BLC tasks, and around 4.5 minutes on the lower numbered BLC tasks. On all My other GPUs, there doesn't appear to be any difference between 255, 319, and 383.

It appears the Server sent a couple of Arecibo shorties, and as already stated, they don't work: Validate state : Invalid. Hopefully Petri is close to finding a solution for this one troublesome problem. It appears the old problem of Bad Best Pulse is still around also, but that one is not so troublesome as the failed shorties. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here you go.

App setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 was compiled with the old default setting of g_pfFftLimit = 512
setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 was compiled with the setting g_pfFftLimit = 64

Both Apps used the old S_LIMIT 255 instead of 383, using a Standard GTX 750 Ti on an old Intel board in a x4 PCIe1 slot. It is a little faster on a board with PCIe2 and in a x16 slot, on my other HP board it's consistently right around 4.5 minutes.

    Starting benchmark run...
    ----------------------------------------------------------------
    Listing wu-file(s) in /testWUs :
    blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu reference_work_unit_r3215.wu
    Listing executable(s) in /APPS :
    setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92
    Listing executable in /REF_APPS :
    setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91
    ----------------------------------------------------------------
    Current WU: blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
    ----------------------------------------------------------------
    Running default app with command :... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 -nobs -device 1
    Best scores written
    Out file closed
    Cuda free done
    Cuda device reset done
    Elapsed Time: ....................... 283 seconds
    ----------------------------------------------------------------
    Running app with command : .......... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 -nobs -device 1
    Best scores written
    Out file closed
    Cuda free done
    Cuda device reset done
    Elapsed Time : ...................... 284 seconds
    Speed compared to default : ......... 99 %
    -----------------
    Comparing results
    Result : Strongly similar, Q= 100.0%
    ----------------------------------------------------------------
    Done with blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
    ====================================================================
    Current WU: reference_work_unit_r3215.wu
    ----------------------------------------------------------------
    Running default app with command :... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda91 -nobs -device 1
    Best scores written
    Out file closed
    Cuda free done
    Cuda device reset done
    Elapsed Time: ....................... 100 seconds
    ----------------------------------------------------------------
    Running app with command : .......... setiathome_x41p_V0.96_x86_64-pc-linux-gnu_cuda92 -nobs -device 1
    Best scores written
    Out file closed
    Cuda free done
    Cuda device reset done
    Elapsed Time : ...................... 99 seconds
    Speed compared to default : ......... 101 %
    -----------------
    Comparing results
    Result : Strongly similar, Q= 100.0%
    ----------------------------------------------------------------
    Done with reference_work_unit_r3215.wu

I'd say that's pretty much inconclusive. Which is what I get when testing the other settings, pfBlocksPerSM, and pfPeriodsPerLaunch. The S_LIMIT setting does make a difference with the 750 Ti though, 4.5 minutes is very good considering what it takes in OpenCL. |
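For anyone wondering how the "Speed compared to default" percentages above relate to the elapsed times, here is a minimal sketch. It is not taken from the benchmark script. It assumes speed is the reference time divided by the test time, expressed in percent and truncated to a whole number, which happens to reproduce both figures in the log (99 and 101). The function name `speedVsDefaultPercent` is made up for illustration.

```cpp
// Hypothetical helper, NOT from the benchmark script.
// Assumption: speed is inversely proportional to elapsed time, and the
// percentage is truncated toward zero (integer division).
// Precondition: testSeconds > 0.
constexpr int speedVsDefaultPercent(int refSeconds, int testSeconds) {
    return refSeconds * 100 / testSeconds;
}

// Compile-time checks against the two runs quoted in the log above.
static_assert(speedVsDefaultPercent(283, 284) == 99,
              "BLC task: 283 s reference vs 284 s test");
static_assert(speedVsDefaultPercent(100, 99) == 101,
              "reference WU: 100 s reference vs 99 s test");
```

Under that reading, a one-second difference on a 283-second task is well under 1%, which fits the "pretty much inconclusive" verdict.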
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
@Petri, can you explain simply what the new tuning parameter in the latest 0.96 Beta does. I am referring to:

The pfl flag sets an FftLength-based limit that decides which implementation of the pulse find algorithm to use. Now that I have found out the default value works well, I'll probably remove the flag. It was for testing purposes.

The 0.95 and 0.96 got a 30% speed gain from reorganizing the 'folding' part of the pulse find process. The reorganization reduced memory writes and reads by 50%. 0.96 is a bit slower than 0.95 but gives more accurate results with noise bombs.

The 0.96 is still not ready. There may still be something bad lurking with Arecibo shorties and some NV cards.

To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
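To make the "FftLength based limit" idea concrete, here is a rough sketch of what such a switch could look like. This is not Petri's code. The enum, the function name `selectPulseFind`, the direction of the comparison, and whether the limit itself is inclusive are all assumptions made purely for illustration.

```cpp
#include <cstddef>

// Hypothetical labels for the two pulse find code paths; the real app's
// implementations are not named in this thread.
enum class PulseFindPath { AtOrBelowLimit, AboveLimit };

// Assumption: FFT lengths up to and including the limit take one path and
// longer FFT lengths take the other. The real app may compare differently.
constexpr PulseFindPath selectPulseFind(std::size_t fftLength,
                                        std::size_t pfFftLimit) {
    return fftLength <= pfFftLimit ? PulseFindPath::AtOrBelowLimit
                                   : PulseFindPath::AboveLimit;
}
```

With a scheme like this, moving the limit from 512 to 64 would only change the path taken for FFT lengths between the two values (for example 128 or 256), so a small timing difference between the two builds would not be surprising.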
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks Petri. Nice to see some significant progress is being made. Hope the trend continues and you can fix the lurking bugs before your vacation runs out. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha Right now zi3v is listed as PetriR_raw3. With Jason MIA, who are we going to get to post the next code? It would be nice to change PetriR_raw3 to maybe PetriR_zi3v, and just add a new folder with the next release in it, maybe PetriR_V0.97. Providing of course, V0.97 fixes the Arecibo shorty problem and doesn't create any new problems. In case you missed it, there is a Mac in the Top Ten at SETI.... |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha

Looking at your Mac in the FreeDC stats, it seems like you got a 30% boost a week ago. Your Mac will be at #8 in no time with 135,000-140,000 RAC.

To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Ask whoever is in charge of the Seti SVN repository to give you access to the repository for an FTP upload so you can make changes to the branches. I just had Arkayn give me access to the CA file directory structure so I could upload the missing All-in-One files of yours so that the download links work again. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I was just thinking about the source code at https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha . . Aarrrggghhh! The end of the world must be near ... :) . . OK, so when it lists an Intel processor machine with an OS of Darwin that is a Mac then? Stephen :) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.