The Server Issues / Outages Thread - Panic Mode On! (118)

Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024474 - Posted: 22 Dec 2019, 22:58:24 UTC - in response to Message 2024470.  

Yes, you can do that with a carefully composed app_info. The application doesn't matter, but the task description - platform, version number, plan_class - does.


Hi Richard, are you saying there's some kind of work-around possible just by modifying the app_info? If so, details would be great. :)
ID: 2024474
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2024475 - Posted: 22 Dec 2019, 23:01:33 UTC - in response to Message 2024474.  

I don't think it will be a long-term solution. As I understand it, if you even have an app_info.xml file present, the server marks you as anonymous platform and will stop sending you new tasks.

I think it will only work with the tasks you already have. Then you would have to switch back to stock to get more tasks, and repeat.

Please correct me if I'm wrong.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2024475
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2024479 - Posted: 22 Dec 2019, 23:12:00 UTC - in response to Message 2024475.  

Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster.
ID: 2024479
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2024480 - Posted: 22 Dec 2019, 23:12:25 UTC

It might actually be a good plan for those of us running Anonymous to set No New Tasks until this gets figured out, in order to quit hammering the servers for those running stock who could actually get some work.
Personally, Einstein is running fine here until it's squared away and I just can't see breaking what works here to work around a far-end issue ... Just a thought.
ID: 2024480
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2024484 - Posted: 22 Dec 2019, 23:57:04 UTC - in response to Message 2024479.  

Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster.

Seems to work. I edited the app_info to add info on cuda60 and cpu 8.05, so far so good...
ID: 2024484
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024485 - Posted: 22 Dec 2019, 23:58:05 UTC - in response to Message 2024479.  
Last modified: 22 Dec 2019, 23:59:10 UTC

Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster.


Okay, I was running stock successfully, built up some tasks and tried this. BOINC complained there was no app for the tasks and dumped them all. I did suspend SETI and shutdown BOINC before adding the app_info.xml file in and then restarted BOINC, unsuspend SETI.
ID: 2024485
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2024486 - Posted: 23 Dec 2019, 0:01:04 UTC

This is interesting:
Results received in last hour **	0	1,452	110,219

Since most of the mid/big crunchers who run the anonymous app are now out of work, that shows about 25% of the SETI work is done by them.

Hope Eric can fix the bug ASAP.
ID: 2024486
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2024488 - Posted: 23 Dec 2019, 0:14:41 UTC - in response to Message 2024480.  

It might actually be a good plan for those of us running Anonymous to set No New Tasks until this gets figured out, in order to quit hammering the servers for those running stock who could actually get some work.
Personally, Einstein is running fine here until it's squared away and I just can't see breaking what works here to work around a far-end issue ... Just a thought.


Just setting the No New Tasks doesn't seem to stop my hitting the Server. I probably have to increase the "get additional tasks" to something like 0.25 or 0.3

Tom
A proud member of the OFA (Old Farts Association).
ID: 2024488
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2024494 - Posted: 23 Dec 2019, 0:30:31 UTC - in response to Message 2024485.  

Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster.


Okay, I was running stock successfully, built up some tasks and tried this. BOINC complained there was no app for the tasks and dumped them all. I did suspend SETI and shutdown BOINC before adding the app_info.xml file in and then restarted BOINC, unsuspend SETI.


You need to change the version number also, to whatever the app is labelled. The SoG app is 8.22, so in the version number field in the app_info file, you should put 822.

I have a mix of sah and SoG tasks at the moment, so I just created two v8 sections, one for plan class opencl_nvidia_SoG and another for opencl_nvidia_sah, pointing them both to the special app. It seems to have worked. I had to abort a few tasks that were previously running on the SoG/sah apps and had checkpointed, but otherwise it's working fine.

It's too troublesome to keep going back and forth like this, but I'm just trying to load test this new system lol. The SoG/sah apps only load my 2080s to ~150W maximum, but the special app will hit the full 200W load.
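
For anyone following along, one such section might look roughly like the sketch below. It assumes the stock tasks on the host are labelled v8.22 with plan class opencl_nvidia_SoG on 64-bit Linux. The file name your_special_app_binary is a hypothetical placeholder, and it would need a matching <file_info> entry elsewhere in the file.

```xml
<!-- Sketch only: one app_version section whose labels match the stock tasks. -->
<app_version>
  <app_name>setiathome_v8</app_name>
  <platform>x86_64-pc-linux-gnu</platform>     <!-- assumed: 64-bit Linux host -->
  <version_num>822</version_num>               <!-- stock app 8.22, written without the dot -->
  <plan_class>opencl_nvidia_SoG</plan_class>   <!-- must match the plan class of the tasks on hand -->
  <coproc>
    <type>NVIDIA</type>
    <count>1</count>
  </coproc>
  <file_ref>
    <file_name>your_special_app_binary</file_name> <!-- hypothetical placeholder name -->
    <main_program/>
  </file_ref>
</app_version>
```

A second copy with <plan_class>opencl_nvidia_sah</plan_class> would cover the sah tasks in the same way.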
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2024494
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2024495 - Posted: 23 Dec 2019, 0:33:22 UTC - in response to Message 2024488.  
Last modified: 23 Dec 2019, 0:33:46 UTC

It might actually be a good plan for those of us running Anonymous to set No New Tasks until this gets figured out, in order to quit hammering the servers for those running stock who could actually get some work.
Personally, Einstein is running fine here until it's squared away and I just can't see breaking what works here to work around a far-end issue ... Just a thought.


Just setting the No New Tasks doesn't seem to stop my hitting the Server. I probably have to increase the "get additional tasks" to something like 0.25 or 0.3

Tom
Which is probably a good idea if you're running Einstein in the interim. Weird that it would "check in" if there's nothing to report and it's set NNT. I don't think I see that here ... if I do, the logs don't reflect it. No activity since I set NNT on either Linux box. On the Win box I don't have NNT, but have nothing to report and it hasn't pinged the servers in the last hour either. Sure you're not seeing retries based on server-side back-off timers?
ID: 2024495
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2024499 - Posted: 23 Dec 2019, 0:47:07 UTC - in response to Message 2024494.  

Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster.


Okay, I was running stock successfully, built up some tasks and tried this. BOINC complained there was no app for the tasks and dumped them all. I did suspend SETI and shutdown BOINC before adding the app_info.xml file in and then restarted BOINC, unsuspend SETI.


You need to change the version number also, to whatever the app is labelled. The SoG app is 8.22, so in the version number field in the app_info file, you should put 822.

I have a mix of sah and SoG tasks at the moment, so I just created two v8 sections, one for plan class opencl_nvidia_SoG and another for opencl_nvidia_sah, pointing them both to the special app. It seems to have worked. I had to abort a few tasks that were previously running on the SoG/sah apps and had checkpointed, but otherwise it's working fine.

It's too troublesome to keep going back and forth like this, but I'm just trying to load test this new system lol. The SoG/sah apps only load my 2080s to ~150W maximum, but the special app will hit the full 200W load.


I'm not sure how to do that. Could you post or message me yours as an example? If the spoofing works (should since it is above SETI), I could load up and then make the change. May not bother; I am finally getting SoG and sah types rather than cuda60. I'm only running my middle system stock for now. The others are Einstein.
ID: 2024499
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 2024502 - Posted: 23 Dec 2019, 0:54:28 UTC - in response to Message 2024449.  

As you frequently have, Stephen, you've voiced a point fairly succinctly. I don't think you're being over dramatic. Issues of confidence are always the most difficult to assuage. For myself, I have had a large number of 'errors' in the last 24 hours or so, on GPU tasks. It might be the nine-year-old Corsair HX PSU needing replacing or fixing (possible fur build-up for example) or it could be the same for the GTX 970......or both of them. I have an EVGA PSU I can substitute, but throwing more money at another GPU is out of the question. I can't justify the cost when, even on Ebay a good 970 from Asus, EVGA or Gigabyte is going to cost £100+. A GTX 980? Are you kidding? You can forget the 10XX series, too......big time. That £100+ might not sound much to some, but, if you factor in 'project confidence', it is almost akin to throwing the cash down the nearest drain.....more so, given that the card might only last a week. Since I'm not seeing any problem in any other use including games, it is probably a moot point. My only other alternatives, would be to take one of the paired, water-cooled 970s out of an i5-4690K gaming system or, given the work involved, for that gaming system to become my 'daily' and 'cruncher'. Doing the latter, would effectively mean committing even more hardware and financial resources to a project where my 'reward' over time, would be less than it is now, if we have to run 'stock'! I well remember the 'very extended maintenance' of a few years ago and had no problem with that (I was one of the many who advocated it), but this latest 'episode' is something else. It may well be, make or break, as you suggest. For me, that moment appears to have arrived. Best of luck to you all.
Don't take life too seriously, as you'll never come out of it alive!
ID: 2024502
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2024509 - Posted: 23 Dec 2019, 1:12:17 UTC - in response to Message 2024502.  

As you frequently have, Stephen, you've voiced a point fairly succinctly. I don't think you're being over dramatic. Issues of confidence are always the most difficult to assuage. For myself, I have had a large number of 'errors' in the last 24 hours or so, on GPU tasks. It might be the nine-year-old Corsair HX PSU needing replacing or fixing (possible fur build-up for example) or it could be the same for the GTX 970......or both of them. I have an EVGA PSU I can substitute, but throwing more money at another GPU is out of the question. I can't justify the cost when, even on Ebay a good 970 from Asus, EVGA or Gigabyte is going to cost £100+. A GTX 980? Are you kidding? You can forget the 10XX series, too......big time. That £100+ might not sound much to some, but, if you factor in 'project confidence', it is almost akin to throwing the cash down the nearest drain.....more so, given that the card might only last a week. Since I'm not seeing any problem in any other use including games, it is probably a moot point. My only other alternatives, would be to take one of the paired, water-cooled 970s out of an i5-4690K gaming system or, given the work involved, for that gaming system to become my 'daily' and 'cruncher'. Doing the latter, would effectively mean committing even more hardware and financial resources to a project where my 'reward' over time, would be less than it is now, if we have to run 'stock'! I well remember the 'very extended maintenance' of a few years ago and had no problem with that (I was one of the many who advocated it), but this latest 'episode' is something else. It may well be, make or break, as you suggest. For me, that moment appears to have arrived. Best of luck to you all.
Actually Iona, your failed tasks are all from "device 1", which should be your 2nd GPU, as "device 0" is completing its tasks. If a restart doesn't clear the problem (it may have just had a driver crash and hasn't recovered), then just remove that offending GPU.

Cheers.
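
If pulling the card is inconvenient, a software-side alternative may be to tell the BOINC client not to use that device for the project. This is a minimal cc_config.xml sketch, assuming the failing card really is NVIDIA device 1 and that the URL below matches the project URL shown in the client:

```xml
<cc_config>
  <options>
    <!-- Assumption: device 1 is the GPU producing the errors. -->
    <exclude_gpu>
      <url>http://setiathome.berkeley.edu/</url>
      <device_num>1</device_num>
      <type>NVIDIA</type>
    </exclude_gpu>
  </options>
</cc_config>
```

Restarting the client after saving the file should make sure the exclusion is picked up.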
ID: 2024509
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2024510 - Posted: 23 Dec 2019, 1:16:29 UTC - in response to Message 2024499.  

I'm not sure how to do that. Could you post or message me yours as an example? If the spoofing works (should since it is above SETI), I could load up and then make the change. May not bother; I am finally getting SoG and sah types rather than cuda60. I'm only running my middle system stock for now. The others are Einstein.


I only run SETI, so you may need to make some modifications. I also do not run any CPU work.

I just took the app_info.xml file supplied in the AIO and added a second v8 GPU section, since I had a mix of two different kinds of tasks (SoG and sah). Both of these apps are labelled v8.22. The SoG app has plan class "opencl_nvidia_SoG" and the sah app has plan class "opencl_nvidia_sah". So all I did to my existing app_info file was add another copy of the v8 GPU section, change the plan classes for these two sections, and then change the version numbers in these two sections.

Here is my app_info in its entirety. Make whatever changes are needed for the task types you have on your systems or the apps you want to run. If you don't understand the changes you need to make, then I wouldn't touch it and would just wait for the project to fix itself.

<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_x41p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>822</version_num>
    <plan_class>opencl_nvidia_SoG</plan_class>
    <cmdline>-nobs</cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <file_ref>
      <file_name>setiathome_x41p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_x41p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>822</version_num>
    <plan_class>opencl_nvidia_sah</plan_class>
    <cmdline>-nobs</cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <file_ref>
      <file_name>setiathome_x41p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
    <executable/>
  </file_info>
  <file_info>
    <name>AstroPulse_Kernels_r2751.cl</name>
  </file_info>
  <file_info>
    <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>708</version_num>
    <plan_class>opencl_nvidia_100</plan_class>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <file_ref>
      <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>AstroPulse_Kernels_r2751.cl</file_name>
    </file_ref>
    <file_ref>
      <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
      <open_name>ap_cmdline.txt</open_name>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>800</version_num>
    <file_ref>
      <file_name>MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>ap_7.05r2728_sse3_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>704</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <plan_class></plan_class>
    <file_ref>
      <file_name>ap_7.05r2728_sse3_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2024510
Profile Retvari Zoltan

Joined: 28 Apr 00
Posts: 35
Credit: 128,746,856
RAC: 230
Hungary
Message 2024516 - Posted: 23 Dec 2019, 1:37:46 UTC - in response to Message 2024510.  
Last modified: 23 Dec 2019, 1:55:30 UTC

The scheduler won't send work to anonymous platform hosts, so you can't use app_info.xml; therefore there is no way that tweaking app_info.xml would resolve this issue on the client side.

I have a workaround for those who want to use the special app, while avoiding anonymous platform:
(this workaround will work only on single GPU hosts)

1. Rename your app_info.xml to something else (for example: app_info_.xml).
2. Exit BOINC Manager, closing the science apps as well.
3. Restart BOINC Manager.
4. Let BOINC Manager download some work, preferably of all types (CUDA6.0, opencl_sah, opencl_SoG).
5. Exit BOINC Manager, closing the science apps as well.
6. Edit cc_config.xml and add the following line to the <options> section:
<dont_check_file_sizes>1</dont_check_file_sizes>
Save the changes.
7. Copy your favorite special app over the original executables (it is good to make a backup copy of the original ones, adding an _ to their filenames).
8. Restart BOINC Manager.
9. Enjoy the fastest ever "opencl" app.
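
Step 6 can be sketched as follows, assuming cc_config.xml has no other options set. If the file already has an <options> section, only the one line needs to be added inside it.

```xml
<cc_config>
  <options>
    <!-- Skip the client's size checks on project files, so that a
         swapped-in executable of a different size is not rejected. -->
    <dont_check_file_sizes>1</dont_check_file_sizes>
  </options>
</cc_config>
```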

I can't give you a link to my recent workunits, somehow they didn't appear in the task list...

ps.: if you use the -nobs parameter in the command line, you should set the avg_ncpus and the max_ncpus to 1 (instead of 0.1)
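
As a sketch of that last point, the relevant lines of an app_version section might read as follows. Only these three elements are shown, and the rest of the section is assumed unchanged.

```xml
<!-- Fragment of an <app_version> section; other elements omitted. -->
<cmdline>-nobs</cmdline>
<avg_ncpus>1</avg_ncpus>  <!-- was 0.1 -->
<max_ncpus>1</max_ncpus>  <!-- was 0.1 -->
```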
ID: 2024516
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2024518 - Posted: 23 Dec 2019, 1:48:20 UTC - in response to Message 2024516.  
Last modified: 23 Dec 2019, 2:00:00 UTC

Brilliant workaround... applying to every machine stat and thank you! :^)
Edit: I tried it first on a 2x2080ti machine and it was doing two at once fine... watched it complete half a dozen like that. Hrm.
ID: 2024518
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2024522 - Posted: 23 Dec 2019, 1:55:04 UTC - in response to Message 2024470.  

Yes, you can do that with a carefully composed app_info. The application doesn't matter, but the task description - platform, version number, plan_class - does.


. . So as long as you leave your app_info.xml at the description/platform/version/plan_class of the stock apps the servers will not call you anonymous platform even if the special app is actually doing the grunt work? Very interesting!

. . BTW, while I am still getting the response "no tasks available", I can now report completed tasks without needing to set NNT first. So something seems to have been partly fixed. (crossing fingers)

Stephen

8^}
ID: 2024522
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2024524 - Posted: 23 Dec 2019, 1:58:58 UTC - in response to Message 2024480.  

It might actually be a good plan for those of us running Anonymous to set No New Tasks until this gets figured out, in order to quit hammering the servers for those running stock who could actually get some work.
Personally, Einstein is running fine here until it's squared away and I just can't see breaking what works here to work around a far-end issue ... Just a thought.


. . Or you could do what I have done with the machines with no work, turn them off and save on power bills.

Stephen

:)
ID: 2024524
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2024525 - Posted: 23 Dec 2019, 2:05:13 UTC - in response to Message 2024485.  

Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster.


Okay, I was running stock successfully, built up some tasks and tried this. BOINC complained there was no app for the tasks and dumped them all. I did suspend SETI and shutdown BOINC before adding the app_info.xml file in and then restarted BOINC, unsuspend SETI.


. . I think you missed the key element ... you needed to edit your app_info.xml to include the descriptors and version/plan_class information for the stock apps on your machine, but with the app name changed to the special app, before placing it back into the SETI folder. Your previous app_info.xml would probably not have sections to cover those tasks, so BOINC does not know which app to use to process them and dumps them.

Stephen

:(
ID: 2024525
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2024526 - Posted: 23 Dec 2019, 2:12:13 UTC - in response to Message 2024486.  

This is interesting:
Results received in last hour **	0	1,452	110,219

Since most of the mid/big crunchers who run the anonymous app are now out of work, that shows about 25% of the SETI work is done by them.
Hope Eric can fix the bug ASAP.


. . Hi Juan,

. . I think the percentage might be higher than that. I think the average enhanced host is doing something like 10 to 20 times the work that the average "run it in the background" hosts are doing, and the really big guns are doing many times that again. With that kind of ratio it might be more like 1/3. But I might be a little optimistic there, or is that cynical?

Stephen

<shrug>
ID: 2024526
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.