High performance Linux clients at SETI

Message boards : Number crunching : High performance Linux clients at SETI

Previous · 1 . . . 12 · 13 · 14 · 15 · 16 · 17 · 18 . . . 20 · Next

rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22227
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1992709 - Posted: 5 May 2019, 17:27:13 UTC

For a number of years SETI has only allowed 100 concurrent tasks for the CPU plus 100 per GPU.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1992709
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1992712 - Posted: 5 May 2019, 17:50:30 UTC - in response to Message 1992707.  

Is there any workaround for this? Otherwise, with every downtime longer than a few hours the machine becomes idle.

The workaround is to anticipate the loss of work from the planned and unplanned outages. You can reschedule GPU work from the GPU cache to the CPU cache in advance of an outage, and then move it back afterwards, using any one of the rescheduling solutions. Rescheduling has had its own dedicated thread for a few years now; I suggest a read.
https://setiathome.berkeley.edu/forum_thread.php?id=79954&postid=1803817#1803817
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1992712
Profile M_M
Joined: 20 May 04
Posts: 76
Credit: 45,752,966
RAC: 8
Serbia
Message 1992718 - Posted: 5 May 2019, 18:21:58 UTC - in response to Message 1992708.  
Last modified: 5 May 2019, 18:23:22 UTC

The most work we have ever been able to download for our work caches is 100 tasks per CPU + 100 tasks per GPU. So the "days of work" and "additional days of work" settings are meaningless on modern fast hosts. Maybe still applicable to phones and single-board computers like the Raspberry Pi.


It would make sense for the size of the work cache to be related to RAC: keep 100+100 as the default (a minimum, appropriate for slow hosts) and increase it per host, based on RAC, to cover the typical planned outage period. That way everyone is happy, resources are used optimally, and no "unnecessary" additional work is sent to hosts... Probably not so difficult to implement the logic...
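As a rough illustration of the kind of scaling being proposed, here is a hypothetical Python sketch. The function name, the assumed average credit per task, and the outage length are all invented for illustration; nothing like this exists in the actual scheduler.

```python
def task_limit(rac, outage_hours=24.0, base=100):
    """Hypothetical per-device task limit scaled by RAC.

    Estimates how many tasks a host burns through during a planned
    outage, using its RAC (credit/day) and an assumed average credit
    per task, then keeps the historical 100-task floor for slow hosts.
    All constants are illustrative, not project values.
    """
    CREDIT_PER_TASK = 80.0          # assumed average credit awarded per task
    tasks_per_day = rac / CREDIT_PER_TASK
    needed = int(tasks_per_day * outage_hours / 24.0)
    return max(base, needed)

# A slow host keeps the 100-task default; a fast host gets enough
# work to ride out a one-day outage.
print(task_limit(rac=500))      # 100
print(task_limit(rac=40000))    # 500
```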
ID: 1992718
Loren Datlof

Joined: 24 Jan 14
Posts: 73
Credit: 19,652,385
RAC: 0
United States
Message 1992720 - Posted: 5 May 2019, 18:43:11 UTC - in response to Message 1992712.  
Last modified: 5 May 2019, 18:59:04 UTC

Is there any workaround for this? Otherwise, with every downtime longer than a few hours the machine becomes idle.

The workaround is to anticipate the loss of work from the planned and unplanned outages. You can reschedule GPU work from the GPU cache to the CPU cache in advance of an outage, and then move it back afterwards, using any one of the rescheduling solutions. Rescheduling has had its own dedicated thread for a few years now; I suggest a read.
https://setiathome.berkeley.edu/forum_thread.php?id=79954&postid=1803817#1803817
I just use another project and set its resource share to zero. That way it only runs when Seti is down or out of WUs, and your computer doesn't sit idle.
ID: 1992720
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1992725 - Posted: 5 May 2019, 19:40:45 UTC - in response to Message 1992718.  

It would make sense for the size of the work cache to be related to RAC: keep 100+100 as the default (a minimum, appropriate for slow hosts) and increase it per host, based on RAC, to cover the typical planned outage period. That way everyone is happy, resources are used optimally, and no "unnecessary" additional work is sent to hosts... Probably not so difficult to implement the logic...

This topic has been raised innumerable times, and the reality is that the current work cache size is not going to change. The project has never guaranteed an endless supply of work, and it recommends backup projects for when Seti has none. The current work cache size was sufficient back when the project first started on the hardware of the time, when the point was to use spare computer cycles by letting the Seti screensaver crunch work in the background. It still hews to that original purpose; you still have a Seti screensaver, even though the performance capabilities of today's hardware are orders of magnitude greater than when the project started.

Also, after 10 years of development, the BOINC code is very complicated and not as simple to fix or change as you suggest.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1992725
Loren Datlof

Joined: 24 Jan 14
Posts: 73
Credit: 19,652,385
RAC: 0
United States
Message 1992763 - Posted: 6 May 2019, 1:00:40 UTC

You guys were right it was a VRAM issue. The GT 730 (2 GB VRAM) is up and running the CUDA60 app flawlessly. Thanks for your help.
ID: 1992763
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1992798 - Posted: 6 May 2019, 9:37:27 UTC - in response to Message 1992763.  

Since you are running a GPU with 2 GB of VRAM, you should go back to the default unroll setting. Just remove -unroll 1 and it will go back to autotune and automatically run unroll 2 or whatever. The times will improve on that card with unroll 2.
ID: 1992798
Loren Datlof

Joined: 24 Jan 14
Posts: 73
Credit: 19,652,385
RAC: 0
United States
Message 1992821 - Posted: 6 May 2019, 14:06:53 UTC - in response to Message 1992798.  

Since you are running a GPU with 2 GB of VRAM, you should go back to the default unroll setting. Just remove -unroll 1 and it will go back to autotune and automatically run unroll 2 or whatever. The times will improve on that card with unroll 2.
Done. Thanks again.
ID: 1992821
BoincSpy
Volunteer tester

Joined: 3 Apr 99
Posts: 146
Credit: 124,775,115
RAC: 353
Canada
Message 1992880 - Posted: 6 May 2019, 23:31:20 UTC

Hi

Are there specific command-line arguments I should try for the RTX 2070 graphics card?

Thanks in advance,
BoincSpy
ID: 1992880
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1992901 - Posted: 7 May 2019, 1:20:58 UTC - in response to Message 1992880.  

Without knowing what your hardware consists of, because you have hidden your hosts, it is impossible to suggest anything.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1992901
BoincSpy
Volunteer tester

Joined: 3 Apr 99
Posts: 146
Credit: 124,775,115
RAC: 353
Canada
Message 1993034 - Posted: 7 May 2019, 23:05:41 UTC - in response to Message 1992901.  

Sorry about that, I thought I had the hosts visible. They are now visible...

Thank you in advance.
ID: 1993034
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1993055 - Posted: 8 May 2019, 1:06:24 UTC - in response to Message 1993034.  

Sorry about that, I thought I had the hosts visible. They are now visible...

Thank you in advance.

You could speed them up a bit by adding the -nobs parameter to the <cmdline></cmdline> entry in either the app_info.xml or the app_config.xml.
Make sure you reduce your CPU thread usage, as the -nobs parameter requires a full CPU core to support each GPU task.
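For anyone unsure where the parameter goes, a minimal app_config.xml might look like the sketch below. The app name and plan class here are taken from the app_info.xml fragments posted in this thread and may differ on your install; avg_ncpus is raised to 1.0 to reflect the full core that -nobs occupies.

```xml
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <cmdline>-nobs</cmdline>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
  </app_version>
</app_config>
```

Unlike app_info.xml, changes to app_config.xml take effect after a re-read of config files rather than a full client restart.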
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1993055
BoincSpy
Volunteer tester

Joined: 3 Apr 99
Posts: 146
Credit: 124,775,115
RAC: 353
Canada
Message 1993132 - Posted: 8 May 2019, 16:15:14 UTC - in response to Message 1993055.  
Last modified: 8 May 2019, 16:15:37 UTC

Thanks for the suggestion. I am now getting work units completed in just shy of a minute.
ID: 1993132
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1993142 - Posted: 8 May 2019, 17:28:31 UTC - in response to Message 1993132.  

I don't see any use of the -nobs parameter yet. Did you restart BOINC or re-read config files in the Manager?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1993142
BoincSpy
Volunteer tester

Joined: 3 Apr 99
Posts: 146
Credit: 124,775,115
RAC: 353
Canada
Message 1993144 - Posted: 8 May 2019, 17:38:27 UTC - in response to Message 1993142.  

I just suspended BOINC and resumed it after I added -nobs; I have not restarted BOINC. Here is a portion of my app_info.xml:

<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>801</version_num>
    <plan_class>cuda90</plan_class>
    <cmdline>-nobs</cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <file_ref>
      <file_name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101</file_name>
      <main_program/>
    </file_ref>
  </app_version>
...
ID: 1993144
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1993145 - Posted: 8 May 2019, 17:43:30 UTC

Suspending BOINC does not re-read the app_info.xml. You have to completely restart the client. If the parameter is placed in an app_config.xml file, only a re-read of config files is necessary.
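On a typical Linux install the two cases look roughly like this (a sketch assuming the packaged client runs as a systemd service; the service name and whether you need sudo vary by distro):

```shell
# After editing app_info.xml (anonymous platform), a full client restart is required:
sudo systemctl restart boinc-client

# After editing app_config.xml, re-reading the config files is enough:
boinccmd --read-cc-config
```

The second command is the command-line equivalent of "Options / Read config files" in the BOINC Manager.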
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1993145
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1993431 - Posted: 11 May 2019, 15:50:20 UTC

Hi,
I am wondering if I should roll back to the previous version on both of my HEDC boxes?

I am still seeing a lot of inconclusives on a daily basis. And while some of them are "Darwin" disagreements, at least some are not.

Could someone take a look and offer an opinion?

I know some of my problems came from pushing my AMD CPU harder than it appears able to work. But a lot are just plain inconclusives.

Thank you.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1993431
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1993437 - Posted: 11 May 2019, 17:29:14 UTC

I don't see anything out of the ordinary. You can discount all the overflows too.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1993437
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1993439 - Posted: 11 May 2019, 17:40:21 UTC - in response to Message 1993437.  

I don't see anything out of the ordinary. You can discount all the overflows too.


Ok, Thank you.

I just hate slowing things down because of a simple fixable problem.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1993439
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1995072 - Posted: 24 May 2019, 22:23:22 UTC - in response to Message 1991417.  
Last modified: 24 May 2019, 22:37:46 UTC

Updated to 0.98b1 CUDA90

In stderr:
@
SETI@home using CUDA accelerated device GeForce GTX 1050 Ti
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1
@

Should I change params, or is it OK for the 1050 Ti to run with such default settings?


. . I thought that unroll = 1 was the default for 0.98b1, because it has the reduced external-access process that keeps everything in GPU memory for pulse find, so the unroll setting doesn't matter.

Stephen

? ?
ID: 1995072


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. Astropulse is funded in part by the NSF through grant AST-0307956.