The Server Issues / Outages Thread - Panic Mode On! (118)
Author | Message |
---|---|
Freewill Send message Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
|
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I don't think it will be a long-term solution. As I understand it, if you even have an app_info.xml file present, the server marks you as anonymous platform and will stop sending you new tasks. I think it will only work with the tasks you already have. Then you would have to switch back to stock to get more tasks, and repeat. Please correct me if I'm wrong. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster. |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
It might actually be a good plan for those of us running Anonymous to set No New Tasks until this gets figured out, in order to quit hammering the servers for those running stock who could actually get some work. Personally, Einstein is running fine here until it's squared away and I just can't see breaking what works here to work around a far-end issue ... Just a thought. |
JohnDK Send message Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster. Seems to work. I edited the app_info to add info on cuda60 and cpu 8.05, so far so good... |
Freewill Send message Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster. Okay, I was running stock successfully, built up some tasks and tried this. BOINC complained there was no app for the tasks and dumped them all. I did suspend SETI and shut down BOINC before adding the app_info.xml file, and then restarted BOINC and unsuspended SETI. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
This is interesting: Results received in last hour ** 0 1,452 110,219 Since most of the mid/big crunchers who run the anonymous app are now out of work, that shows about 25% of the SETI work is done by them. Hope Eric can fix the bug ASAP. |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
It might actually be a good plan for those of us running Anonymous to set No New Tasks until this gets figured out, in order to quit hammering the servers for those running stock who could actually get some work. Just setting the No New Tasks doesn't seem to stop my hitting the Server. I probably have to increase the "get additional tasks" to something like 0.25 or 0.3 Tom A proud member of the OFA (Old Farts Association). |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster. you need to change the version number also. whatever the app is labelled. the SoG app is 8.22, so in the version number field in the app_info file, you should put 822. I have a mix of sah and SoG tasks at the moment, so I just created 2 v8 sections one for plan class opencl_nvidia_SoG and another for opencl_nvidia_sah pointing them both to the special app. it seems to have worked. I had to abort a few tasks that were previously running on the sog/sah apps and had checkpointed, but otherwise its working fine. it's too troublesome to keep going back and forth like this, but i'm just trying to load test this new system lol. the SoG/sah apps only load my 2080's to ~150W maximum, but the special app will hit the full 200W load. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
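The version-number point above (app labelled 8.22, so the `version_num` field gets 822) can be sketched as a tiny conversion. This is only an illustration: the helper name `version_num` is made up here, and it assumes the label is always written as major.minor with a two-digit minor part, as in the 8.22 and 7.08 labels seen in this thread.

```python
def version_num(label: str) -> int:
    """Convert an app label such as "8.22" into the integer used in app_info.xml.

    Assumption: the label is "major.minor" with a two-digit minor part.
    """
    major, minor = label.split(".")
    return int(major) * 100 + int(minor)


if __name__ == "__main__":
    for label in ("8.22", "7.08", "8.05"):
        print(label, "->", version_num(label))
```

So a stock app shown as 8.22 would be entered as 822, and an AstroPulse app shown as 7.08 as 708.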
Jimbocous Send message Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
Which is probably a good idea if you're running Einstein in the interim. Weird that it would "check in" if there's nothing to report and it's set NNT. I don't think I see that here ... if I do, the logs don't reflect it. No activity since I set NNT on either Linux box. On the Win box I don't have NNT, but have nothing to report and it hasn't pinged the servers in the last hour either. Sure you're not seeing retries based on server-side back-off timers? |
Freewill Send message Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster. I'm not sure how to do that. Could you post or message me yours as an example? If the spoofing works (should since it is above SETI), I could load up and then make the change. May not bother; I am finally getting SoG and sah types rather than cuda60. I'm only running my middle system stock for now. The others are Einstein. |
Iona Send message Joined: 12 Jul 07 Posts: 790 Credit: 22,438,118 RAC: 0 |
As you frequently have, Stephen, you've voiced a point fairly succinctly. I don't think you're being over dramatic. Issues of confidence are always the most difficult to assuage. For myself, I have had a large number of 'errors' in the last 24 hours or so, on GPU tasks. It might be the nine-year-old Corsair HX PSU needing replacing or fixing (possible fur build-up for example) or it could be the same for the GTX 970......or both of them. I have an EVGA PSU I can substitute, but throwing more money at another GPU is out of the question. I can't justify the cost when, even on Ebay a good 970 from Asus, EVGA or Gigabyte is going to cost £100+. A GTX 980? Are you kidding? You can forget the 10XX series, too......big time. That £100+ might not sound much to some, but, if you factor in 'project confidence', it is almost akin to throwing the cash down the nearest drain.....more so, given that the card might only last a week. Since I'm not seeing any problem in any other use including games, it is probably a moot point. My only other alternatives, would be to take one of the paired, water-cooled 970s out of an i5-4690K gaming system or, given the work involved, for that gaming system to become my 'daily' and 'cruncher'. Doing the latter, would effectively mean committing even more hardware and financial resources to a project where my 'reward' over time, would be less than it is now, if we have to run 'stock'! I well remember the 'very extended maintenance' of a few years ago and had no problem with that (I was one of the many who advocated it), but this latest 'episode' is something else. It may well be, make or break, as you suggest. For me, that moment appears to have arrived. Best of luck to you all. Don't take life too seriously, as you'll never come out of it alive! |
Wiggo Send message Joined: 24 Jan 00 Posts: 36351 Credit: 261,360,520 RAC: 489 |
Actually, Iona, your failed tasks are all from "device 1", which should be your 2nd GPU, as "device 0" is completing its tasks. If a restart doesn't clear the problem (it may have just had a driver crash and hasn't recovered), then just remove that offending GPU. Cheers. |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I'm not sure how to do that. Could you post or message me yours as an example? If the spoofing works (should since it is above SETI), I could load up and then make the change. May not bother; I am finally getting SoG and sah types rather than cuda60. I'm only running my middle system stock for now. The others are Einstein.
I only run SETI, so you may need to make some modifications. I also do not run any CPU work. I just took the app_info.xml file supplied in the AIO and added a second v8 GPU section, since I had a mix of 2 different kinds of tasks (SoG and sah). Both of these apps are labelled v8.22. The SoG app has plan class "opencl_nvidia_SoG" and the sah app has plan class "opencl_nvidia_sah". So all I did to my existing app_info file was add another copy of the v8 GPU section, change the plan classes for these 2 sections, and then change the version numbers in these two sections. Here is my app_info in its entirety. Make whatever changes are needed for the task types you have on your systems or the apps you want to run. If you don't understand the changes you need to make, then I wouldn't touch it and would just wait for the project to fix itself.
<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_x41p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>822</version_num>
    <plan_class>opencl_nvidia_SoG</plan_class>
    <cmdline>-nobs</cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <file_ref>
      <file_name>setiathome_x41p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_x41p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>822</version_num>
    <plan_class>opencl_nvidia_sah</plan_class>
    <cmdline>-nobs</cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <file_ref>
      <file_name>setiathome_x41p_V0.99b1p3_x86_64-pc-linux-gnu_cuda102</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
    <executable/>
  </file_info>
  <file_info>
    <name>AstroPulse_Kernels_r2751.cl</name>
  </file_info>
  <file_info>
    <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>708</version_num>
    <plan_class>opencl_nvidia_100</plan_class>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <file_ref>
      <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>AstroPulse_Kernels_r2751.cl</file_name>
    </file_ref>
    <file_ref>
      <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
      <open_name>ap_cmdline.txt</open_name>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>800</version_num>
    <file_ref>
      <file_name>MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>ap_7.05r2728_sse3_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>704</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <plan_class></plan_class>
    <file_ref>
      <file_name>ap_7.05r2728_sse3_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Retvari Zoltan Send message Joined: 28 Apr 00 Posts: 35 Credit: 128,746,856 RAC: 230 |
The scheduler won't send work for anonymous platform, so you can't use the app_info.xml, therefore no way tweaking the app_info.xml would resolve this issue on the client side. I have a workaround for those who want to use the special app, while avoiding anonymous platform (this workaround will work only on single GPU hosts):
1. rename your app_info.xml to something else (for example: app_info_.xml)
2. exit BOINC manager with closing the science apps
3. restart BOINC manager
4. let the BOINC manager download some work, preferably from all types (CUDA6.0, or opencl_sah, opencl_SoG)
5. exit BOINC manager with closing the science apps
6. edit cc_config.xml, add the following line to the <options> section, and save the changes:
   <dont_check_file_sizes>1</dont_check_file_sizes>
7. copy your favorite special app over the original executables (it is good to make a backup copy of the original ones, adding an _ to their filename)
8. restart BOINC manager
9. enjoy the fastest ever "opencl" app
I can't give you a link to my recent workunits, somehow they didn't appear in the task list...
ps.: if you use the -nobs parameter in the command line, you should set the avg_ncpus and the max_ncpus to 1 (instead of 0.1) |
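For anyone who would rather script the cc_config.xml edit than do it by hand, here is a minimal sketch using only Python's standard library. The function name `enable_dont_check_file_sizes` is made up for illustration, and the sketch assumes the file has a `<cc_config>` root with at most one `<options>` section. It works on the file's text, so reading and writing the real file (with BOINC shut down and a backup copy taken first) is left to you.

```python
import xml.etree.ElementTree as ET


def enable_dont_check_file_sizes(cc_config_xml: str) -> str:
    """Return cc_config.xml text with <dont_check_file_sizes>1</...> set in <options>.

    Assumptions: the root element is <cc_config>; a missing <options>
    section is created; any other options are left untouched.
    """
    root = ET.fromstring(cc_config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    flag = options.find("dont_check_file_sizes")
    if flag is None:
        flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    before = "<cc_config><options><use_all_gpus>1</use_all_gpus></options></cc_config>"
    print(enable_dont_check_file_sizes(before))
```

Running the function twice on the same text leaves a single flag in place, so it is safe to re-run. Note that ElementTree drops XML comments when parsing, so a hand edit may be preferable if your cc_config.xml is commented.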
Mr. Kevvy Send message Joined: 15 May 99 Posts: 3797 Credit: 1,114,826,392 RAC: 3,319 |
|
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Yes, you can do that with a carefully composed app_info. The application doesn't matter, but the task description - platform, version number, plan_class - does. . . So as long as you leave your app_info.xml at the description/platform/version/plan_class of the stock apps the servers will not call you anonymous platform even if the special app is actually doing the grunt work? Very interesting! . . BTW, while I am still getting the response "no tasks available" I can now report completed tasks without needing to set NNT first. So something seems to have been partly fixed. (crossing fingers) Stephen 8^} |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
It might actually be a good plan for those of us running Anonymous to set No New Tasks until this gets figured out, in order to quit hammering the servers for those running stock who could actually get some work. . . Or you could do what I have done with the machines with no work, turn them off and save on power bills. Stephen :) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Yes, that's as far as I've got - fetch work as stock, when you've got enough, drop in an app_info.xml file and process them faster. . . I think you missed the key element ... you needed to edit your app_info.xml to include the descriptors and version/plan_class information for the stock apps on your machine, but with the app name changed to the special app, before placing it back into the SETI folder. Your previous app_info.xml would probably not have sections to cover those tasks, so it does not know which app to use to process them and dumps them. Stephen :( |
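Before dropping an edited app_info.xml back in, it may help to check that every kind of task you are holding has a matching section. The sketch below is only a rough check under a stated assumption: it treats a task as covered when some `<app_version>` has the same app name, version number and plan class, which is how the matching is described in this thread, not something taken from the BOINC source. The function name `missing_coverage` and the trimmed sample file are made up for illustration.

```python
import xml.etree.ElementTree as ET


def missing_coverage(app_info_xml, task_descriptors):
    """Return the (app_name, version_num, plan_class) tuples with no matching <app_version>.

    Assumption: matching is by exact equality of the three fields.
    """
    root = ET.fromstring(app_info_xml)
    covered = {
        (
            (av.findtext("app_name") or "").strip(),
            int(av.findtext("version_num") or 0),
            (av.findtext("plan_class") or "").strip(),
        )
        for av in root.findall("app_version")
    }
    return [d for d in task_descriptors if d not in covered]


# Trimmed, hypothetical app_info.xml with only an SoG section.
SAMPLE = """
<app_info>
  <app><name>setiathome_v8</name></app>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>822</version_num>
    <plan_class>opencl_nvidia_SoG</plan_class>
  </app_version>
</app_info>
"""

if __name__ == "__main__":
    held = [
        ("setiathome_v8", 822, "opencl_nvidia_SoG"),
        ("setiathome_v8", 822, "opencl_nvidia_sah"),
    ]
    print(missing_coverage(SAMPLE, held))
```

With the sample above, the sah descriptor comes back as missing, which is the situation where tasks would presumably get dumped on restart.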
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
This is interesting: . . Hi Juan, . . I think the percentage might be higher than that. I think the average enhanced host is doing something like 10 to 20 times the work that the average "run it in the background" hosts are doing, and the really big guns are doing many times that again. With that kind of ratio it might be more like 1/3. But I might be a little optimistic there, or is that cynical? Stephen <shrug> |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.