The GTX750(Ti) Thread
Author | Message |
---|---|
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
-use_sleep -unroll 16 -oclfft_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp You are using local radix 8; change it to 16 on your 8800. On this old card, try only -oclfft_plan 256 16 256 without any other switches. With each crime and every kindness we birth our future. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Anything above -unroll 18 has no effect. Using -oclFFT_plan speeds things up by 15%. FFTs are processed in 8-point FFT kernels by default. Using different FFT kernel planning can speed up processing significantly. In most cases 16-point FFT kernels are fastest for AstroPulse v7. Example: -use_sleep -unroll 18 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 8 1 -tune 2 64 8 1 With each crime and every kindness we birth our future. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
-use_sleep -unroll 16 -oclfft_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp I tried that and received: WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiathome.berkeley.edu\AP_clFFTplan_GeForce8800GT_32768_gr256_lr16_wg256_tw0_r2721.bin_26658, continue with recompile... The only setting that will run is -oclfft_plan 256 8 256. It runs but doesn't 'work'. I was using -unroll 6 -oclFFT_plan 256 16 256 -ffa_block 4096 -ffa_block_fetch 2048 |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
-use_sleep -unroll 16 -oclfft_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -tune 2 64 4 1 -hp I think it's the different memory management. I have had no chance to test on such a card. Can you run an offline bench on this card? I would give you some params to try. With each crime and every kindness we birth our future. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I think it's the different memory management. I suppose I could. I really wasn't going to keep using this machine though. I thought I would just run it until the AP_v7 credits came up a bit. Just trying to add some slow results to CreditFew hoping it might help. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
I think it's the different memory management. You can just download it from the APV7 thread. I have one configured on my cloud. Go to benchcfg.txt and edit the 4 entries at the bottom: AP7_win_x86_SSE2_OpenCL_NV_r2721.exe AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 64 AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 128 AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 256 On the rest of the lines just put a "#" in front. This deactivates the other params. Run ap_bench213.cmd. The result is stored in the testdata folder. Send me the result file via email. It's called ap_hostname_date_time.txt. Use the email address on my website. With each crime and every kindness we birth our future. |
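The benchcfg.txt edit described above can also be scripted. This is only a minimal Python sketch: it assumes benchcfg.txt is plain text with one entry per line and that a leading "#" disables a line, as the post says. The helper name `select_entries` is hypothetical, and the real file may contain other directives this sketch does not know about.

```python
def select_entries(lines, keep):
    """Disable every active line not listed in `keep`, then append missing ones.

    Assumption: one entry per line, and a leading '#' disables an entry.
    """
    wanted = [entry.strip() for entry in keep]
    result = []
    seen = set()
    for line in lines:
        stripped = line.strip()
        if stripped in wanted:
            result.append(stripped)        # keep the requested entry active
            seen.add(stripped)
        elif not stripped or stripped.startswith("#"):
            result.append(line)            # blank or already disabled
        else:
            result.append("#" + line)      # disable everything else
    # Add requested entries that were not in the file, in the given order.
    result.extend(entry for entry in wanted if entry not in seen)
    return result


APP = "AP7_win_x86_SSE2_OpenCL_NV_r2721.exe"
KEEP = [
    APP,
    APP + " -oclFFT_plan 128 8 64",
    APP + " -oclFFT_plan 128 8 128",
    APP + " -oclFFT_plan 128 8 256",
]

# Hypothetical file content, for illustration only.
sample = [APP, APP + " -unroll 6", "# an older, disabled entry"]
for line in select_entries(sample, KEEP):
    print(line)
```

To apply it to a real file, read the lines with `open(...).read().splitlines()` and write the returned list back, after keeping a backup copy of the original.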
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Up until recently, on my old xw9400 I had been running a mixed GPU configuration with a GTX660, GTX650 and two GTX640s. Purely by coincidence, in the 2 weeks leading up to the AP v7 rollout, I had gradually upgraded that host to two GTX660s and two GTX750Tis, still mixed, but a bit more closely matched. Under the old configuration and running AP v6, I had made what I felt was conservative use of the ap_cmdline capabilities, with just a simple: -unroll 6 -ffa_block 2048 -ffa_block_fetch 1024 -hp This seemed to accommodate the compute unit disparities between the 3 different GPU types I was running and I was content with it. Since AP v7 rolled out the day after I made the last upgrade to that machine, I thought I might just let it go back to the defaults, inasmuch as AP v7 was supposed to adjust those parameters according to the specific GPU it was running on. However, I've noticed that on my T7400, which runs stock, AP v7 is assigning the equivalent of "-unroll 6 -ffa_block 1536 -ffa_block_fetch 768" to a GTX660 which is a matching unit to one of the ones in the xw9400, with 6 CUs. These appear to be even more conservative values than what I'm already running on the xw9400, so I left the ap_cmdline alone for the time being. Anyway, now that the GPUs are more closely matched on that host (1 with 6 CUs and 3 with 5 CUs), and seeing that the AP v7 default values appear to be even more conservative than I am, what might a recommended ap_cmdline for that host look like? (For that matter, what values might work for the T7400, which currently has a mixed bag of GTX780, GTX670, and GTX660?) |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Up until recently, on my old xw9400 I had been running a mixed GPU configuration with a GTX660, GTX650 and two GTX640s. Purely by coincidence, in the 2 weeks leading up to the AP v7 rollout, I had gradually upgraded that host to two GTX660s and two GTX750Tis, still mixed, but a bit more closely matched. Since the slowest GPU in that host has 6 CUs, you can use: -use_sleep -unroll 12 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 Let's just run it for a few days and maybe you can increase it a little bit more. With each crime and every kindness we birth our future. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Up until recently, on my old xw9400 I had been running a mixed GPU configuration with a GTX660, GTX650 and two GTX640s. Purely by coincidence, in the 2 weeks leading up to the AP v7 rollout, I had gradually upgraded that host to two GTX660s and two GTX750Tis, still mixed, but a bit more closely matched. I assume you mean that for the T7400. How about for the xw9400, with the two GTX660s (one with 6 CUs and one with 5 CUs) and two GTX750Tis (both with 5 CUs)? |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Up until recently, on my old xw9400 I had been running a mixed GPU configuration with a GTX660, GTX650 and two GTX640s. Purely by coincidence, in the 2 weeks leading up to the AP v7 rollout, I had gradually upgraded that host to two GTX660s and two GTX750Tis, still mixed, but a bit more closely matched. This would be: -use_sleep -unroll 10 -oclfft_plan 256 16 256 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -tune 2 64 4 1 With each crime and every kindness we birth our future. |
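The two suggestions in this exchange (-unroll 12 when the slowest GPU has 6 CUs, -unroll 10 when it has 5) look like roughly two unroll steps per compute unit of the slowest card, with 18 as the ceiling mentioned earlier in the thread. The Python sketch below only encodes that reading. The 2x factor is inferred from these two examples and is not a documented rule; the function name `suggest_cmdline` is hypothetical, and the remaining switches are copied unchanged from the posts.

```python
def suggest_cmdline(compute_units, unroll_cap=18):
    """Build a command line from the per-GPU compute-unit counts of one host.

    Assumption: unroll = 2 x CUs of the slowest GPU, capped at `unroll_cap`.
    """
    slowest = min(compute_units)
    unroll = min(2 * slowest, unroll_cap)
    return (
        f"-use_sleep -unroll {unroll} -oclfft_plan 256 16 256 "
        "-ffa_block 8192 -ffa_block_fetch 4096 "
        "-tune 1 64 4 1 -tune 2 64 4 1"
    )


# xw9400 as described: one GPU with 6 CUs and three with 5 CUs.
print(suggest_cmdline([6, 5, 5, 5]))
# A host whose slowest GPU has 6 CUs.
print(suggest_cmdline([6]))
```

Treat the result as a starting point to run for a few days and then adjust, not as a tuned value.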
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
This would be. Okay, thanks, Mike. I'll try those out. Since I run both APs and MBs, less than 3% of the tasks I receive tend to be APs, so I don't expect the overall boost to be that great, but it should be interesting to try! |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Go to benchcfg.txt. So, I started running it, and it appears to be stuck running r2262? It finished the GPU test then started the AP6 CPU test...about 20 minutes ago. All the files in testdata appear to be old files from some test with Vista... Hmmmm.... So it says -oclFFT_plan 128 8 256 is the quickest but it appears a little too quick to me. The CPU test was 74 seconds with the GPU tests around 5 secs. Now it's just running this AP6 CPU file. How long is it supposed to run? 26 minutes so far... ------------ Running app : AP6_win_x86_SSE_CPU_r2262.exe with WU : #ap_genwis.dat Started at : 19:47:40.453 Ended at : 19:48:54.562 74.000 secs Elapsed 70.750 secs CPU time ------------ Running app : AP7_win_x86_SSE2_OpenCL_NV_r2721.exe with WU : #ap_genwis.dat Started at : 19:48:57.718 Ended at : 19:49:56.843 59.047 secs Elapsed 54.938 secs CPU time Speedup : 22.35% Ratio : 1.29x Skipping validation, genwis run. ------------ Running app : AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 64 with WU : #ap_genwis.dat Started at : 19:50:00.453 Ended at : 19:50:05.687 5.188 secs Elapsed 1.281 secs CPU time Speedup : 98.19% Ratio : 55.23x Skipping validation, genwis run. ------------ Running app : AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 128 with WU : #ap_genwis.dat Started at : 19:50:08.968 Ended at : 19:50:14.109 5.063 secs Elapsed 1.313 secs CPU time Speedup : 98.14% Ratio : 53.88x Skipping validation, genwis run. ------------ Running app : AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 256 with WU : #ap_genwis.dat Started at : 19:50:17.390 Ended at : 19:50:22.406 4.953 secs Elapsed 1.313 secs CPU time Speedup : 98.14% Ratio : 53.88x Skipping validation, genwis run. ------------ Running app : AP6_win_x86_SSE_CPU_r2262.exe with WU : ap_Zblank_9LC67.wu Started at : 19:50:25.671 !!!!!!!!!!!! OK, it finished the CPU and is now running the GPU... 
------------ Running app : AP6_win_x86_SSE_CPU_r2262.exe with WU : ap_Zblank_9LC67.wu Started at : 19:50:25.671 Ended at : 20:33:34.078 Result : stored as ref for validations. 2588.344 secs Elapsed 2574.156 secs CPU time ------------ Running app : AP7_win_x86_SSE2_OpenCL_NV_r2721.exe with WU : ap_Zblank_9LC67.wu Started at : 20:33:37.218 ^^^^^^^^^^^ All the files are still old files... |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It finally finished the first long test with the -oclFFT_plan. Apparently it doesn't work with a NV 8800 GT with Driver 266.58 in XP; ------------ Running app : AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 64 with WU : ap_Zblank_9LC67.wu Started at : 20:57:22.187 Ended at : 21:25:08.625 1666.375 secs Elapsed 38.938 secs CPU time Speedup : 98.49% Ratio : 66.11x ref-AP6_win_x86_SSE2_OpenCL_ATI_r2346.exe-ap_Zblank_9LC67.wu.res: <ap_signal>18,<pulses>8,<best_pulses>10 result-2-AP7_win_x86_SSE2_OpenCL_NV_r2721.exe-ap_Zblank_9LC67.wu.res: <ap_signal>10,<pulses>0,<best_pulses>10 All Signals: Weakly similar or Different. Pulses: pulse at signal 0 has no match (direction -->) Weakly similar or Different. Best Pulses: Weakly similar or Different. -(.\testDatas\ref\ref-AP6_win_x86_SSE2_OpenCL_ATI_r2346.exe-ap_Zblank_9LC67.wu.res)- Reportable Single Pulses: 4 [Weak], 1 above threshold*THRESHOLD_FUDGE Reportable Repeating Pulses: 4 [Weak] Single Pulses (Best): 10 [Weak], 1 above threshold*THRESHOLD_FUDGE -(.\testDatas\result-2-AP7_win_x86_SSE2_OpenCL_NV_r2721.exe-ap_Zblank_9LC67.wu.res)- Reportable Single Pulses: 0 [Weak], 0 above threshold*THRESHOLD_FUDGE Reportable Repeating Pulses: 0 [Weak] Single Pulses (Best): 0 [Weak], 0 above threshold*THRESHOLD_FUDGE ref-AP6_win_x86_SSE_CPU_r2262.exe-ap_Zblank_9LC67.wu.res: <ap_signal>18,<pulses>8,<best_pulses>10 result-2-AP7_win_x86_SSE2_OpenCL_NV_r2721.exe-ap_Zblank_9LC67.wu.res: <ap_signal>10,<pulses>0,<best_pulses>10 All Signals: Weakly similar or Different. Pulses: pulse at signal 0 has no match (direction -->) Weakly similar or Different. Best Pulses: Weakly similar or Different. 
-(.\testDatas\ref\ref-AP6_win_x86_SSE_CPU_r2262.exe-ap_Zblank_9LC67.wu.res)- Reportable Single Pulses: 4 [Weak], 1 above threshold*THRESHOLD_FUDGE Reportable Repeating Pulses: 4 [Weak] Single Pulses (Best): 10 [Weak], 1 above threshold*THRESHOLD_FUDGE -(.\testDatas\result-2-AP7_win_x86_SSE2_OpenCL_NV_r2721.exe-ap_Zblank_9LC67.wu.res)- Reportable Single Pulses: 0 [Weak], 0 above threshold*THRESHOLD_FUDGE Reportable Repeating Pulses: 0 [Weak] Single Pulses (Best): 0 [Weak], 0 above threshold*THRESHOLD_FUDGE ------------ Running app : AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 128 with WU : ap_Zblank_9LC67.wu Started at : 21:25:12.343... !!!!!!!!!!!!!!!!!!!!! The other 2 finished, they are the same; -(.\testDatas\result-3-AP7_win_x86_SSE2_OpenCL_NV_r2721.exe-ap_Zblank_9LC67.wu.res)- Reportable Single Pulses: 0 [Weak], 0 above threshold*THRESHOLD_FUDGE Reportable Repeating Pulses: 0 [Weak] Single Pulses (Best): 0 [Weak], 0 above threshold*THRESHOLD_FUDGE -(.\testDatas\result-4-AP7_win_x86_SSE2_OpenCL_NV_r2721.exe-ap_Zblank_9LC67.wu.res)- Reportable Single Pulses: 0 [Weak], 0 above threshold*THRESHOLD_FUDGE Reportable Repeating Pulses: 0 [Weak] Single Pulses (Best): 0 [Weak], 0 above threshold*THRESHOLD_FUDGE FUDGE! |
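The Speedup and Ratio figures in the log above are consistent with a comparison of CPU time rather than elapsed time, which flatters a GPU app that spends most of its run waiting on the card. A short Python check using the ap_Zblank_9LC67.wu numbers from the log follows; the formulas are inferred from the logged values, not taken from the bench script's source.

```python
def ratio(ref_secs, test_secs):
    """How many times faster the test run is than the reference run."""
    return ref_secs / test_secs


def speedup_percent(ref_secs, test_secs):
    """Share of the reference time that the test run saved, in percent."""
    return (1.0 - test_secs / ref_secs) * 100.0


# Reference: AP6_win_x86_SSE_CPU_r2262.exe
ref_elapsed, ref_cpu = 2588.344, 2574.156
# Test: AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 64
gpu_elapsed, gpu_cpu = 1666.375, 38.938

print(f"CPU-time ratio  : {ratio(ref_cpu, gpu_cpu):.2f}x")
print(f"CPU-time speedup: {speedup_percent(ref_cpu, gpu_cpu):.2f}%")
print(f"Elapsed ratio   : {ratio(ref_elapsed, gpu_elapsed):.2f}x")
```

The CPU-time figures reproduce the logged 66.11x and 98.49%, while the wall-clock ratio for the same pair of runs is only about 1.55x. Neither number means much here, since the run found none of the expected pulses.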
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
The CPU test was 74 seconds with the GPU tests around 5 secs. :) "with WU : #ap_genwis.dat" is not a CPU/GPU test - it only generates the .bin/.wisdom files (to be used in the next real runs) - ALF - "Find out what you don't do well ..... then don't do it!" :) |
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
 Don't mind comparisons 'ref-AP6' vs 'result-2-AP7' - I think they will always be incompatible (will be just a coincidence if they match) 'FUDGE' is % of allowed "uncertainty" (I think it is 1% - THRESHOLD_FUDGE = 1.01) From ap_validate_inc.h (in ...\APbench\Tools\rescmpAP\src.zip\) /* don't consider mismatches between pulses of power below * threshold*THRESHOLD_FUDGE as errors.*/ static const float THRESHOLD_FUDGE = 1.01f; At the end of the test there will be file in ...\APbench\Testdatas\ with a name like: ComputerName-20140802-0548-benchAP.txt Look for 'Quick timetable' at bottom of that file.   - ALF - "Find out what you don't do well ..... then don't do it!" :)  |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
 The ap_Zblank_9LC67.wu has zero blanking and the results SHOULD match between AP v6 and AP v7 processing. The 8 signals which ought to be found cover both dispersion polarities of each kind of analysis, so a mismatch could possibly point to some specific kind of error in processing. But more commonly when things go wrong there will be many extra false signals found which didn't really come from the WU data. Joe |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
...At the end of the test there will be file in ...\APbench\Testdatas\ with a name like: Well, the test was taking much longer than expected. It was getting late. The machine is in a bedroom. After it started what appeared to be a 3rd round of tests I shut it down. It never finished. The machine is working fine again since removing the settings, http://setiathome.berkeley.edu/results.php?hostid=6813106, are you suggesting it might work with one of those settings that produced No Found Pulses? |
WezH Send message Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
Asus GTX750TI-OC-2GD5, core 1188MHz, memory 2048MB DDR5 @ 1350MHz, factory defaults. Now running AP v7 (after the outages during the weekend) and RAC has risen to ~12.5k. Still on the stock app; I just want to see what RAC is going to be. "Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address - |
WezH Send message Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
Asus GTX750TI-OC-2GD5, core 1188MHz, memory 2048MB DDR5 @ 1350MHz, factory defaults. Still running stock AP v7 (with two v6 in queue), RAC is now about 13.5K. Somebody is probably wondering right now why I don't run optimized applications with command line switches. The answer is that I want to know what stock apps can do on my host, and to give a comparison to optimized hosts. "Please keep Your signature under four lines so Internet traffic doesn't go up too much" - In 1992 when I had my first e-mail address - |
WezH Send message Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
Still running stock AP v7 (with two v6 in queue), RAC is now about 13.5K. RAC is now about 16K. I didn't get any tasks for a while, so RAC is lower than it should be. |