OpenCL NV MultiBeam v8 SoG edition for Windows

Author	Message
Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1795788 - Posted: 12 Jun 2016, 21:58:27 UTC - in response to Message 1795782. TBar, Is that 1 at a time or multiples on the same card? ID: 1795788 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1795797 - Posted: 12 Jun 2016, 22:20:38 UTC - in response to Message 1795788. "TBar, Is that 1 at a time or multiples on the same card?" Those apps are using Petri's code, which runs one task at a time in streams. Modifications done by petri33. ID: 1795797 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1795805 - Posted: 12 Jun 2016, 22:35:04 UTC - in response to Message 1795782. Last modified: 12 Jun 2016, 22:36:53 UTC "This task was run with the App set to maxrregcount=128; http://setiathome.berkeley.edu/result.php?resultid=4979681158 Run time: 8 min 22 sec CPU time: 8 min 14 sec" I'm afraid this particular task is not a good example, at least right now: Name: blc2_2bit_guppi_57451_64017_HIP116936_0007.21158.416.17.26.209.vlar_1; Workunit: 2182225561; Created: 11 Jun 2016, 7:20:47 UTC; Sent: 11 Jun 2016, 13:55:11 UTC; Report deadline: 3 Aug 2016, 18:54:53 UTC; Received: 12 Jun 2016, 4:51:58 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x0); Computer ID: 6796479; Run time: 8 min 22 sec; CPU time: 8 min 14 sec; Validate state: Checked, but no consensus yet; Credit: 0.00; Device peak FLOPS: 2,022.14 GFLOPS; Application version: SETI@home v8 Anonymous platform (NVIDIA GPU); Peak working set size: 186.86 MB; Peak swap size: 27,254.51 MB; Peak disk usage: 0.03 MB. That is, inconclusive. Maybe the wingman will ultimately be the invalid one, but it is better to give a clearly valid task as an example. Also, there are no CUDA compiler options that can be passed to the OpenCL compiler. At least it was so in older CUDA SDKs. Need to check the new one. Strange, because passing options to the compiler is supported in the OpenCL standard. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1795805 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1795810 - Posted: 12 Jun 2016, 22:46:31 UTC - in response to Message 1795805. Last modified: 12 Jun 2016, 22:53:06 UTC Petri's latest mod results in the guppis being off by at least 1 pulse count even though the previous code gave the correct results. I'm waiting for the next version. The difference in maxrregcount is valid though. It has been noticed in previous builds for some time now. I believe I even posted about it over at Beta some time ago. If you set the compiler to maxrregcount=32 the App will be much slower. Unfortunately the GPUs below Compute Capability 3.2 can't use any setting above 32; the compiler will just ignore the setting for those GPUs and use 32. ID: 1795810 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1795820 - Posted: 12 Jun 2016, 23:41:58 UTC - in response to Message 1795810. PulseFind code has quite a lot of variables, so keeping them from spilling is a good thing. With 128 regs per thread it would be possible to do some of the last iterations purely inside registers, which would give a very good speedup on those iterations. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1795820 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1796236 - Posted: 15 Jun 2016, 2:21:37 UTC Raistmer, Just noticed you removed r3430 SoG from your downloadables. That version works best for me. Fortunately, I still had the zip on another computer. ID: 1796236 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1796255 - Posted: 15 Jun 2016, 4:52:30 UTC - in response to Message 1796236. I also have noticed that the r3430 app is faster than the new "improved" r3472 app. In general about 50-100 seconds faster on non-VLAR (.41 AR typical) tasks compared to r3472 app. About 100 seconds faster on VLAR BLC2 GUPPI's. I haven't switched over my dedicated cruncher partly due to that fact and also due to the fact I dumped all my 8.00 tasks when I updated to the r3472 app and didn't notice the change in the app version. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1796255 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1796315 - Posted: 15 Jun 2016, 9:57:10 UTC - in response to Message 1796236. Last modified: 15 Jun 2016, 10:01:07 UTC "Raistmer, Just noticed you removed r3430 SoG from your downloadables. That version works best for me. Fortunately, I still had the zip on another computer." It was replaced with the bug-fixed SoG. It should be similar to 3430 but without the hang on the -use_sleep option. What relative slowdown do you see (that is, not the absolute number of seconds but 100%*deltaT(slowdown)/Elapsed_time)? BTW, such things will go unnoticed currently because I can't immediately run the NV SoG build at all. So all observations of performance changes are on you ... (Also, it would be good to see formal offline KWSN-made benches with full-length tasks.) SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1796315 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1796316 - Posted: 15 Jun 2016, 10:02:40 UTC - in response to Message 1796255. "I also have noticed that the r3430 app is faster than the new 'improved' r3472 app. In general about 50-100 seconds faster on non-VLAR (.41 AR typical) tasks compared to r3472 app. About 100 seconds faster on VLAR BLC2 GUPPI's. I haven't switched over my dedicated cruncher partly due to that fact and also due to the fact I dumped all my 8.00 tasks when I updated to the r3472 app and didn't notice the change in the app version." The same question: what is the relative slowdown? 100 seconds per 300 total is one thing; 100 seconds per 100 ks of elapsed time is quite another. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1796316 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1796348 - Posted: 15 Jun 2016, 13:17:24 UTC - in response to Message 1796315. "What relative slowdown do you see (that is, not the absolute number of seconds but 100%*deltaT(slowdown)/Elapsed_time)? BTW, such things will go unnoticed currently because I can't immediately run the NV SoG build at all. So all observations of performance changes are on you ... (Also, it would be good to see formal offline KWSN-made benches with full-length tasks.)" Well, since I was over at Beta testing, I have not had the chance to try the new installer. That, and blc2, 3 and 5 and blc7, 8 all have different run times. It's harder to get a true "average". Probably later today. ID: 1796348 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1796410 - Posted: 15 Jun 2016, 17:23:44 UTC - in response to Message 1796316. "The same question: what is the relative slowdown? 100 seconds per 300 total is one thing; 100 seconds per 100 ks of elapsed time is quite another." I was just comparing a dozen or so completed tasks in my lists for both machines. r3430 tasks, non-VLAR (AR 0.41), completed in an average of around 700 seconds. r3430 tasks, VLAR blc2 GUPPI, completed in an average of about 1200 seconds. All tasks were done two up on each card in both machines. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1796410 ·
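Raistmer's measure, 100%*deltaT(slowdown)/Elapsed_time, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is made up here, the r3430 averages quoted above are taken as the baseline elapsed times, and the 50-100 second differences reported earlier in the thread are taken as the slowdown of r3472. The thread does not say which run time is meant as the denominator, so treat the percentages as rough.

```python
def relative_slowdown_percent(delta_seconds: float, elapsed_seconds: float) -> float:
    """Return 100% * deltaT / elapsed_time (hypothetical helper name)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * delta_seconds / elapsed_seconds


# The two illustrative cases from the thread: same 100 s, very different weight.
print(f"{relative_slowdown_percent(100, 300):.1f}%")      # 33.3%
print(f"{relative_slowdown_percent(100, 100_000):.1f}%")  # 0.1%

# Reported figures, ASSUMING the r3430 averages are the baseline:
# non-VLAR tasks, about 700 s, reported 50-100 s faster than r3472.
print(f"{relative_slowdown_percent(50, 700):.1f}%")    # 7.1%
print(f"{relative_slowdown_percent(100, 700):.1f}%")   # 14.3%
# VLAR blc2 GUPPI tasks, about 1200 s, reported about 100 s faster.
print(f"{relative_slowdown_percent(100, 1200):.1f}%")  # 8.3%
```

Under those assumptions the reported difference would be roughly 7-14% on non-VLAR tasks and about 8% on VLAR GUPPI tasks.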

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1796412 - Posted: 15 Jun 2016, 17:40:45 UTC - in response to Message 1796410. Last modified: 15 Jun 2016, 17:44:15 UTC "The same question: what is the relative slowdown? 100 seconds per 300 total is one thing; 100 seconds per 100 ks of elapsed time is quite another." "I was just comparing a dozen or so completed tasks in my lists for both machines. r3430 tasks, non-VLAR (AR 0.41), completed in an average of around 700 seconds. r3430 tasks, VLAR blc2 GUPPI, completed in an average of about 1200 seconds. All tasks were done two up on each card in both machines." Best to compare guppi to guppi and non-guppi to non-guppi. Before this current revision (r3471?) there was another between it and r3430, if I remember correctly. That version slowed down the processing of the GUPPIs, so that is why I reverted back to r3430. Tonight I will give this new one a try and will see if we can get comparables. Or maybe it was the non-SoG version of r3430; hard to remember all the different ones, lol. ID: 1796412 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1796414 - Posted: 15 Jun 2016, 17:51:54 UTC - in response to Message 1796412. "Best to compare guppi to guppi and non-guppi to non-guppi. Before this current revision (r3471?) there was another between it and r3430, if I remember correctly. That version slowed down the processing of the GUPPIs, so that is why I reverted back to r3430. Tonight I will give this new one a try and will see if we can get comparables. Or maybe it was the non-SoG version of r3430; hard to remember all the different ones, lol." I was comparing the runtimes of non-guppi to non-guppi and guppi to guppi between the original r3430 and r3472 apps. I was running the interim SoG r3430 app that didn't have SoG in its filename even though it was a SoG app. There was lots of confusion about that, and it showed in the forum threads. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1796414 ·
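The like-for-like comparison discussed above (guppi against guppi, non-guppi against non-guppi, per app revision) could be sketched as below. Everything here is an assumption for illustration: the helper names are made up, the sample task names and run times are invented rather than taken from anyone's task list, and classifying by the substring "guppi" simply follows the one task name quoted in this thread.

```python
from collections import defaultdict
from statistics import mean


def task_class(task_name: str) -> str:
    """Classify a task by name, ASSUMING GUPPI tasks carry 'guppi' in the name."""
    return "guppi" if "guppi" in task_name.lower() else "non-guppi"


def mean_runtime_by_class(results):
    """Average run time in seconds per (app revision, task class).

    `results` is an iterable of (revision, task_name, run_seconds) tuples.
    """
    buckets = defaultdict(list)
    for revision, task_name, run_seconds in results:
        buckets[(revision, task_class(task_name))].append(run_seconds)
    return {key: mean(values) for key, values in buckets.items()}


# HYPOTHETICAL sample: names and times are invented for illustration only.
sample = [
    ("r3430", "hypothetical_guppi_task_a", 1190.0),
    ("r3430", "hypothetical_guppi_task_b", 1210.0),
    ("r3430", "hypothetical_other_task_a", 700.0),
    ("r3472", "hypothetical_guppi_task_c", 1300.0),
    ("r3472", "hypothetical_other_task_b", 780.0),
]

averages = mean_runtime_by_class(sample)
for (revision, kind), seconds in sorted(averages.items()):
    print(f"{revision} {kind}: {seconds:.0f} s")
```

Keeping the two task classes in separate buckets avoids mixing run times that differ for reasons unrelated to the app revision.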

zoom3+1=4 Volunteer tester Send message Joined: 30 Nov 03 Posts: 65745 Credit: 55,293,173 RAC: 49	Message 1796423 - Posted: 15 Jun 2016, 19:37:12 UTC I am very happy with r3472. I like SoG; this one works for me, at the very least. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's ID: 1796423 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.