The Server Issues / Outages Thread - Panic Mode On! (118)
Author | Message |
---|---|
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
From an Anonymous Platform host:
Sun 22 Dec 2019 02:26:21 PM EST | SETI@home | [sched_op] Starting scheduler request
Sun 22 Dec 2019 02:26:21 PM EST | SETI@home | Sending scheduler request: Requested by user.
Sun 22 Dec 2019 02:26:21 PM EST | SETI@home | Requesting new tasks for NVIDIA GPU
Sun 22 Dec 2019 02:26:21 PM EST | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Sun 22 Dec 2019 02:26:21 PM EST | SETI@home | [sched_op] NVIDIA GPU work request: 2617920.00 seconds; 30.00 devices
Sun 22 Dec 2019 02:26:31 PM EST | SETI@home | Scheduler request completed: got 0 new tasks
Sun 22 Dec 2019 02:26:31 PM EST | SETI@home | [sched_op] Server version 715
Sun 22 Dec 2019 02:26:31 PM EST | SETI@home | Project has no tasks available
Sun 22 Dec 2019 02:26:31 PM EST | SETI@home | Project requested delay of 303 seconds
Sun 22 Dec 2019 02:26:31 PM EST | SETI@home | [sched_op] Deferring communication for 00:05:03
Sun 22 Dec 2019 02:26:31 PM EST | SETI@home | [sched_op] Reason: requested by project
The request completed quickly (10 seconds), but returned that no work was available. It didn't time out like it had been doing previously.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
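For anyone who wants to read those work-request figures without squinting at the log, here is a minimal Python sketch that pulls the seconds and device count out of a `[sched_op] ... work request` line and converts them to hours per device. The regular expression is based only on the two request lines quoted above; the function names are made up for illustration, and other client versions may format these lines differently.

```python
import re

# Pattern for the "[sched_op] <resource> work request" lines quoted in the log above.
REQUEST_RE = re.compile(
    r"\[sched_op\] (?P<resource>.+?) work request: "
    r"(?P<seconds>[\d.]+) seconds; (?P<devices>[\d.]+) devices"
)

def parse_work_request(line):
    """Return (resource, seconds, devices), or None if the line does not match."""
    m = REQUEST_RE.search(line)
    if m is None:
        return None
    return m.group("resource"), float(m.group("seconds")), float(m.group("devices"))

def hours_per_device(seconds, devices):
    """Average hours of work requested per device; 0.0 when no devices are listed."""
    if devices == 0:
        return 0.0
    return seconds / devices / 3600.0

line = ("Sun 22 Dec 2019 02:26:21 PM EST | SETI@home | [sched_op] NVIDIA GPU "
        "work request: 2617920.00 seconds; 30.00 devices")
resource, seconds, devices = parse_work_request(line)
print(resource, round(hours_per_device(seconds, devices), 2))  # NVIDIA GPU 24.24
```

So the 30-GPU host was asking for roughly a day of work per device and got nothing back.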
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
The response time to a request is very slow. It used to be so fast that I couldn't read quickly enough to keep up with the log; now it pauses for so long that I wonder if it is still doing something. 20-30 seconds sounds about right.
I think the slow response time is purely because this glitch has also turned 'resend lost tasks' back on, when we have a huge number of tasks in the database. I replicated our problem at LHC, and there I'm getting an 'internal server error' response in 1 second, and it's always 'internal server error' - never a timeout or 'no tasks available'. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
the request completed quickly (10 seconds), but returned that no work was available. didn't timeout like it had been doing previously.
Looking at your 30 GPU host (and we can see it quickly now - yay!), it has no tasks thought by the server to be in progress. That means the check for lost tasks can run very quickly, which will be a great help. Or perhaps Eric has managed to turn off 'resend lost tasks' since I last heard - that would help, too. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I know there are a lot of people trying to help who are more capable than me with my old, rusty C coding, but... I have not read the entire server code, so I can't be sure, but I was unable to locate where any value is assigned to avid & appid when running anonymous platform. ) { if (avid < 0) { return appid*1000000 - avid; } return avid; } So the return of if (!ready) return true; will always be true. Who knows? |
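To make the arithmetic in that fragment easier to follow, here is a small Python sketch of the same branch. The function name `combined_version_id` and the example values are hypothetical; only the `avid < 0` test and the `appid*1000000 - avid` expression come from the quoted code, and what a negative `avid` actually signifies on the server is not shown in the fragment.

```python
def combined_version_id(avid, appid):
    """Mirror of the quoted branch: negative avid values are folded together
    with appid into one positive number, other values pass through unchanged."""
    if avid < 0:
        return appid * 1000000 - avid
    return avid

# Made-up inputs, purely to show the arithmetic.
print(combined_version_id(-2, 29))    # 29000002
print(combined_version_id(1234, 29))  # 1234
```

The result is only as meaningful as its inputs, which is the point of the question: if `avid` and `appid` are never assigned on the anonymous platform path, the value returned is equally undefined.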
Freewill Send message Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
I'm running stock with a 1080ti and 2070 Super. Here's a job that just finished. Should it be running cuda60 or something better? 46+ minutes seems long even for stock.
Application: SETI@home v8 8.01 (cuda60)
Name: 20dc19aa.24041.21335.5.32.215.vlar
State: Ready to report
Received: Sun 22 Dec 2019 08:29:24 AM EST
Report deadline: Thu 13 Feb 2020 01:29:05 PM EST
Resources: 0.516 CPUs + 1 NVIDIA GPU
Estimated computation size: 183,887 GFLOPs
CPU time: 00:04:42
Elapsed time: 00:46:49
Executable: setiathome_8.01_x86_64-pc-linux-gnu__cuda60
Here is app_config and the error log is complaining about cuda90 (doesn't match any app versions).
<app_config>
  <project_max_concurrent>12</project_max_concurrent>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
    <cmdline>-nobs</cmdline>
  </app_version>
</app_config> |
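The "doesn't match any app versions" complaint can be reproduced offline with a short Python sketch that lists every `plan_class` in an app_config which is not among the plan classes the host actually has. The `available` set below is an assumption built from the stock plan classes named in this thread (cuda60, opencl_nvidia_sah, opencl_nvidia_SoG); it is not read from the server or the client, and `unmatched_plan_classes` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# The app_config from the post above.
APP_CONFIG = """
<app_config>
  <project_max_concurrent>12</project_max_concurrent>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>1.0</ngpus>
    <cmdline>-nobs</cmdline>
  </app_version>
</app_config>
"""

def unmatched_plan_classes(app_config_xml, available):
    """Return the plan_class values in app_config that are not in `available`."""
    root = ET.fromstring(app_config_xml)
    missing = []
    for app_version in root.findall("app_version"):
        plan_class = (app_version.findtext("plan_class") or "").strip()
        if plan_class not in available:
            missing.append(plan_class)
    return missing

# Assumption: the stock plan classes mentioned elsewhere in this thread.
available = {"cuda60", "opencl_nvidia_sah", "opencl_nvidia_SoG"}
print(unmatched_plan_classes(APP_CONFIG, available))  # ['cuda90']
```

A leftover cuda90 entry is what you would expect from a config written for the anonymous platform special app and then carried over to a stock setup.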
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Since I am using the All-In-One, I don't even have a stock to revert to. I'd need to archive the BOINC folder, download/install the detested Repository version, reconnect to SETI, download/install all the setup/apps/work including some ancient and slow CUDA50 that takes 10x as long to finish if it doesn't crash, then when this is fixed (which with my luck will happen exactly when I have completed this) wait for the work to complete, uninstall it, unpack the All-In-One back and hope for the best...
. . I only have 5 machines but I have been thinking much the same. Apart from this one running SoG under lunatics, which I can change back to stock in less than half an hour from the time it runs out of work, on the others it would be much easier (and far less intimidating) to just change projects.
. . I may be being a little overdramatic but this 'disastrous' change could be the death knell for SETI@home. It has the potential to drive most/all of the project's most productive volunteers to other projects or simply away. By forcing these people to shut down their machines for what is looking like an indefinite period, they will inevitably seek other hobbies or pastimes which may then preclude them from returning. We really do need assurances that this is not a long term problem due to some administrative BOINC issue over which the SETI guys have no control. Stephen please! |
JohnDK Send message Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
You're still using the anonymous app_config setup. Here's the one I use for running stock apps:
<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_sah</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>-sbs 256 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64</cmdline>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_SoG</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>-sbs 256 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64</cmdline>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda60</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>-sbs 256 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64</cmdline>
  </app_version>
</app_config>
Yes, cuda60 sucks. I don't know if one can force the server to send SoG or sah work. I think the server is supposed to learn which app is the best and only send the right one; my main host doesn't get cuda60 work anymore, while my other host only gets cuda60 work :( |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
The LHC project at CERN now has the responsibility for testing and releasing new server code. Their website says: <snip> . . But does that mean if requesting new work when there are NO tasks to be running you could expect to get some? Because I am not seeing that here on any machine... Stephen ? |
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
Any oldtimers here remember that one Christmas season that seti was very down... was it a month?? two?? I'm sorry that there is a group of you that aren't getting WUs. Eric has responded, and even tried last night to track down the problem. I'm amazed they have kept the system up for so many of us. They are trying to fix things on a weekend and a holiday season, and that is way more than I would ask for (but I'm grateful). |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
You can, if the platform, version and plan_class strings in app_info match the values for the stock tasks you have received. That's how the Lunatics installer worked: all known platform, version and plan_class combinations were covered in the supplied app_info files. . . This raises a question for me Richard. How does S@H decide that a host is 'anonymous platform'? If we go back to stock and then edit app_info.xml to redirect all appropriate platforms to use the enhanced app won't that cause it to be classified as 'anonymous platform'? Stephen ?? |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
My understanding is that the status pages are driven from the replica database . . But we were still seeing the status page while the replica database was offline ... Stephen ? ? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
If you have an app_info.xml file, and it's active - in place when BOINC was started - your reports to the server will say that you're running anonymous platform. That's the definition. If you didn't have an app_info.xml file active when you started BOINC, you're running stock. My comment was that if you prepare an app_info.xml file offline that matches the characteristics of the tasks you're running as stock, you can switch from stock to anonymous platform without losing work - I've done that today at LHC. I don't think you can ever switch back from anonymous platform to stock and keep existing cached work. |
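As a rough illustration of that definition, the Python sketch below checks whether a project directory contains an `app_info.xml` file. It is only an approximation: what counts is whether the file was in place when BOINC started, which a look at the disk cannot tell you. The helper name `has_app_info` is made up, and the demonstration uses a throwaway temporary directory rather than a real BOINC project folder.

```python
from pathlib import Path
import tempfile

def has_app_info(project_dir):
    """True if an app_info.xml file sits in the given project directory."""
    return (Path(project_dir) / "app_info.xml").is_file()

# Demonstration in a temporary directory, not a real BOINC install.
with tempfile.TemporaryDirectory() as tmp:
    before = has_app_info(tmp)
    (Path(tmp) / "app_info.xml").write_text("<app_info></app_info>")
    after = has_app_info(tmp)

print(before, after)  # False True
```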
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I'm running stock with a 1080ti and 2070 Super. Here's a job that just finished. Should it be running cuda60 or something better? 46+ minutes seems long even for stock.
. . Cuda 60 is reallly sloooowww!
. . I had this problem when I moved my test machine into Beta so I brought it back to main. It occurred to me later on that the Nvidia drivers I am using in Linux do NOT have OpenCL support so the servers will never send SoG tasks. If you check and confirm that you have Nvidia drivers WITH OpenCL support hopefully you should start to receive SoG WUs. Stephen :) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Any oldtimers here remember that one Christmas season that seti was very down... was it a month?? two??
. . That talk of being down for 2 months has me wanting to set my hair on fire ...
. . And I am sure most/all of us appreciate the efforts made by the SETI HQ crew and especially Eric, but frustration is a very driving issue ... :( Stephen :( |
Siran d'Vel'nahr Send message Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238 |
Very simple to switch from Anonymous platform to Stock even with the All-In-One. All you have to do is change the names of the two files app_info.xml & app_config.xml to something such as app_info1.xml & app_config1.xml; that will revert you to Stock. To change back to Anonymous platform, rename the files to the original names app_info.xml & app_config.xml.
It's not that simple in my experience. Or it is to get back to stock, but if you want to be able to restore your anonymous setup later, then it is better to move or copy the anonymous apps out of the project folder. Boinc has a habit of deleting any file in the project folder it doesn't know what to do with. And sometimes even when it does!
Hi Ville, I alleviated that by renaming my anonymous project folder with "anonymous_" attached to the front of the existing folder name before I restarted BOINC and reset the project. When this issue gets resolved, all I need to do is attach "stock_" to the front of the stock folder and remove the "anonymous_" from the other folder name, restart BOINC and reset the project. Of course others may say just delete the stock folder. Yeah, could do that, but if this happens again, this will save bandwidth on non-WU downloads. Maybe... I don't know, I won't know fer sure until I have to do it. ;) Have a great day! :) Siran CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |
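The rename trick from the first quote can be sketched in Python as below. The file names and the "1" suffix come from that quote; the function name `toggle_anonymous_files` is made up. This is an untested-against-BOINC sketch: it assumes the client is stopped while the files are renamed, and it does nothing about the warning above that BOINC may delete files it does not recognise, so copying the anonymous apps elsewhere first is still the safer route. The demonstration runs in a temporary directory.

```python
from pathlib import Path
import tempfile

# (active name, parked name) pairs, following the naming used in the post above.
PAIRS = [("app_info.xml", "app_info1.xml"), ("app_config.xml", "app_config1.xml")]

def toggle_anonymous_files(project_dir, enable):
    """Rename the two files so they are active (enable=True) or parked (enable=False).

    Returns the renames performed as a list of (old_name, new_name) tuples.
    A file is skipped if it is missing or if the target name already exists.
    """
    project_dir = Path(project_dir)
    done = []
    for active, parked in PAIRS:
        src, dst = (parked, active) if enable else (active, parked)
        src_path, dst_path = project_dir / src, project_dir / dst
        if src_path.is_file() and not dst_path.exists():
            src_path.rename(dst_path)
            done.append((src, dst))
    return done

# Demonstration in a temporary directory, not a real BOINC project folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("app_info.xml", "app_config.xml"):
        (Path(tmp) / name).write_text("<placeholder/>")
    parked = toggle_anonymous_files(tmp, enable=False)   # switch to stock
    restored = toggle_anonymous_files(tmp, enable=True)  # switch back
    final_names = sorted(p.name for p in Path(tmp).iterdir())

print(parked)
print(restored)
print(final_names)
```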
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Hi Richard, . . Yep, I am with you on the process for not trashing WUs when shifting from stock to 'anonymous platform', but I had the impression that the other party had hoped that we could continue to get work issued by being 'stock' but get it redirected to run with the special app. Stephen . . |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Yes, you can do that with a carefully composed app_info. The application doesn't matter, but the task description - platform, version number, plan_class - does. |
AllenIN Send message Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
I'm sure there are many of you that are on top of this but I thought I would just show what I am getting when I try to update. Yes, I'm running Anonymous. 12/22/2019 5:32:36 PM | SETI@home | Requesting new tasks for CPU and Intel GPU 12/22/2019 5:33:01 PM | SETI@home | Scheduler request failed: HTTP internal server error Just thought the " HTTP internal server error" might be a clue to the problem. Allen |
Siran d'Vel'nahr Send message Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238 |
Yes, you can do that with a carefully composed app_info. The application doesn't matter, but the task description - platform, version number, plan_class - does. Hi Richard, My stock project folder does not have the app_info.xml file. My anonymous project folder does. Is this file just used for the anonymous platform? What will happen if I place that file in the stock project folder and restart BOINC, say after doing a NNT first? Will I hose BOINC to where I have to reinstall? Or will I go back to running cuda90? Have a great day! :) Siran CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath |