Question about SOG

Message boards : Number crunching : Question about SOG
betreger (Project Donor)
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1838581 - Posted: 29 Dec 2016, 17:53:16 UTC

SoG is an OpenCL app, as I understand it. Is there an appreciable difference in run time between running OpenCL on Windows vs Linux? The reason I ask is that Einstein recently came out with an OpenCL app, and it runs 5 to 10 times faster on Linux than on Windows. Is that something inherent in Linux, or just the way the app was written?
ID: 1838581
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1838585 - Posted: 29 Dec 2016, 17:59:22 UTC

When I moved one of my crunchers from Windows to Linux, it briefly ran SoG applications. I found very little difference in run times, but there was a reduction in the demands on the CPU.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1838585
Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34257
Credit: 79,922,639
RAC: 80
Germany
Message 1838646 - Posted: 29 Dec 2016, 22:06:56 UTC

Since the SETI apps have the same codebase for Windows and Linux, I can say there is no big difference in speed.
Of course the drivers are different, so it might vary a little bit.


With each crime and every kindness we birth our future.
ID: 1838646
baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1840588 - Posted: 7 Jan 2017, 11:38:49 UTC

I have noticed a *tremendous* difference when using Linux vs Windows. My dual Nvidia 1070 system maxes out at around 45k RAC under Windows, but under Linux it hovers around 70-75k RAC with the "special sauce" app. I can't explain why there's a discrepancy, or what causes it, but Linux gets significantly more work done in a given time period than Windows.
ID: 1840588
rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1840608 - Posted: 7 Jan 2017, 12:44:57 UTC

The big improvement of the TBar/Petri special application over SoG or SAH is that the special application is very highly optimised, using some "tricks" that may not work properly under Windows. A few folks are working hard to get them to work under Windows, but it is proving somewhat challenging.
ID: 1840608
M_M
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1840630 - Posted: 7 Jan 2017, 14:08:44 UTC - in response to Message 1840608.  
Last modified: 7 Jan 2017, 14:10:45 UTC

Is it in essence mostly about "sleep" and timer accuracy?

If so, can HPET be used in Windows? Sure, it has to be enabled first, as I think it is disabled by default...
ID: 1840630
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1840721 - Posted: 7 Jan 2017, 22:28:33 UTC - in response to Message 1840630.  
Last modified: 7 Jan 2017, 22:28:55 UTC

Hi,

It is not about sleep and other things. It is mostly about
a) distributing the work to be done across all streaming multiprocessors (SM/SMX),
b) doing 'some' optimisations on the code itself,
c) using shared memory where applicable (a technique not yet published to other CUDA developers),
d) doing the autocorrelation FFT in a novel way that needs far fewer memory accesses and less computation,
e) queueing tasks so that more gets done at a time,
f) optimising kernel register usage and kernel size,
g) some other minor stuff.
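Point d) rests on a standard trick worth sketching: by the Wiener-Khinchin theorem, the autocorrelation of a signal is the inverse FFT of its power spectrum, turning an O(n²) sum into O(n log n) work. The toy pure-Python version below only illustrates that general idea; every name in it is made up for this sketch, and none of it comes from Petri's (unpublished) code.

```python
# Toy illustration: autocorrelation via the FFT of the power spectrum.
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def ifft(x):
    """Inverse FFT via the conjugation identity."""
    n = len(x)
    y = fft([v.conjugate() for v in x])
    return [v.conjugate() / n for v in y]

def autocorr_direct(x):
    """O(n^2) definition: r[lag] = sum_i x[i] * x[i + lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(n)]

def autocorr_fft(x):
    """O(n log n): zero-pad (to avoid circular wrap-around), FFT,
    power spectrum, inverse FFT."""
    n = len(x)
    spec = fft([complex(v) for v in x] + [0j] * n)
    corr = ifft([v * v.conjugate() for v in spec])
    return [corr[lag].real for lag in range(n)]
```

Both routines produce the same lags; the FFT route just touches memory far less often for large n, which is the point made in d).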

Item a) is the hardest part to get right.
I'm now running version l (L stands for locking/synchronising globally). That version helps a lot, but it is not bug-free and cannot be published yet, the main reason being occasional lockups.

Previous versions had problems with the order of finding/reporting pulses, and sometimes reported a bad value.

My code is pre-alpha. I test it first. Then others (superior people) test it, and after it has been field-proven 'valid' on beta by a small and then a larger group of users, everyone else can get it. Otherwise we'd ruin the science.

You may ask: why do you run it on main? I do because it is allowed and encouraged. At the same time I can show what is possible, and the caveats of doing so. And I also run it on beta, of course.

--
Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1840721
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1840728 - Posted: 7 Jan 2017, 23:14:08 UTC - in response to Message 1840721.  

e) queueing tasks so that more gets done at a time

I suspect that alone saves a good chunk of time.
I notice when running SoG that the first 14-20 secs of WU processing isn't done on the GPU (GPU load is 0%). Pre-processing the next WU so that, when it starts, it starts crunching on the GPU straight away would save that 14-20 secs on every WU.
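A back-of-the-envelope sketch of that saving: if the setup of the next WU overlaps the GPU crunching of the current one, only the very first setup is exposed. The 17 s figure below is the middle of the observed 14-20 s gap; the 240 s crunch time is an assumed number, not a measurement.

```python
# Makespan with and without overlapping per-WU CPU setup and GPU crunching.
PREP_S = 17.0     # assumed CPU setup per WU (middle of the observed 14-20 s gap)
CRUNCH_S = 240.0  # assumed GPU crunch time per WU (hypothetical)

def serial_makespan(n_wus):
    """Setup and crunch strictly alternate; the GPU idles during every setup."""
    return n_wus * (PREP_S + CRUNCH_S)

def pipelined_makespan(n_wus):
    """Setup of WU i+1 overlaps crunch of WU i; only the first setup is
    exposed (valid while PREP_S <= CRUNCH_S, i.e. the GPU is the bottleneck)."""
    return PREP_S + n_wus * CRUNCH_S

n = 14  # roughly an hour of work at these assumed times
saved = serial_makespan(n) - pipelined_makespan(n)
print(f"saved over {n} WUs: {saved:.0f} s")  # (n - 1) * PREP_S = 221 s
```

The saving grows as (n - 1) × PREP_S, i.e. one hidden setup per WU after the first.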
Grant
Darwin NT
ID: 1840728
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1840760 - Posted: 8 Jan 2017, 2:20:32 UTC - in response to Message 1840728.  
Last modified: 8 Jan 2017, 2:22:10 UTC

e) queueing tasks so that more gets done at a time

I suspect that alone saves a good chunk of time.
I notice running SoG that the first 14-20secs of WU processing isn't done on the GPU (GPU load is 0%). Pre-processing the next WU to run so that when it starts, it starts crunching on the GPU straight away would save that 14-20secs on every WU.


That is an interesting find.

I'm sure Raistmer can tell more about that. And anyone running SoG can run 2 at a time to overcome it.
The queueing in e) is done on the CPU, to fill the GPU queues to the max. That can be micromanaged too. Sometimes doing some work beforehand pays off in the end (grand total).

Please insert space(s) wherever you want to. Then press the space bar to continue!
EDIT: My RAC hit 200,000 while writing this.
ID: 1840760
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1840770 - Posted: 8 Jan 2017, 3:21:09 UTC - in response to Message 1840760.  
Last modified: 8 Jan 2017, 3:23:08 UTC

I'm sure Raistmer can tell more about that. And anyone running SoG can run 2 at a time to overcome that.

I've run 2 at a time with my particular command line settings, and the only improvement I got was about an extra 1-1.5 WUs per hour. Not really worth it, IMHO.
But it looks like that gain would mostly be the result of offsetting that initial CPU setup period. It's nothing like the benefit of running 2 (or more) at a time with CUDA50.
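That 1-1.5 WUs/hour figure is roughly consistent with hiding a 14-20 s setup per WU, if a WU takes around four minutes on the GPU. Both per-WU times below are assumed for the sake of the sanity check, not measured:

```python
# Throughput gain from hiding the per-WU setup gap by running 2 WUs at a time.
PREP_S = 17.0     # assumed setup per WU (the observed 14-20 s GPU-idle gap)
CRUNCH_S = 226.0  # assumed GPU crunch time per WU (hypothetical, ~4 minutes)

one_at_a_time = 3600.0 / (PREP_S + CRUNCH_S)  # WUs/hour, setup gap exposed
two_at_a_time = 3600.0 / CRUNCH_S             # WUs/hour, setup gap hidden
gain = two_at_a_time - one_at_a_time
print(f"extra WUs per hour: {gain:.2f}")  # about 1.1
```

So the observed gain is about what setup-hiding alone would predict, supporting the "mostly offsetting the initial CPU setup" reading.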


EDIT: My RAC hit 200 000! while writing this.

Nothing else comes close to boosting the numbers like all Arecibo work.
;-)
ID: 1840770
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1840775 - Posted: 8 Jan 2017, 3:45:34 UTC - in response to Message 1840770.  
Last modified: 8 Jan 2017, 3:47:07 UTC

I'm sure Raistmer can tell more about that. And anyone running SoG can run 2 at a time to overcome that.

I've run 2 at a time with my particular command line settings, and the only improvement I got was about an extra 1-1.5 WUs per hour. Not really worth it IMHO.
But it looks like it that gain would mostly be the result of offsetting that initial CPU setup work period. It's nothing like the benefit with CUDA50 of running 2 (or more) at a time.


EDIT: My RAC hit 200 000! while writing this.

Nothing else comes close to boosting the numbers like all Arecibo work.
;-)


Yes,

I noticed the APR bump up from 1300 to over 1700.
I wonder what it will do: one credit per second, times four, times 24 hours. (If it lasts)
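Taking that multiplication literally, and guessing that "times four" means four GPUs (an assumption, since the post doesn't say), the daily rate would be:

```python
CREDIT_PER_S = 1          # one credit per second, per GPU (stated above)
GPUS = 4                  # assumed meaning of "times four"
SECONDS_PER_DAY = 24 * 3600
daily = CREDIT_PER_S * GPUS * SECONDS_PER_DAY
print(daily)  # 345600 credits per day
```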
ID: 1840775

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.