Panic Mode On (107) Server Problems?
Author | Message |
---|---|
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks for the comments, Z. Not good to hear when the WOW contest starts in 10 hours. I have had this issue on the Windows machines last week for a while but they have calmed down over the weekend and are full up. Just having the problem currently on the Linux box. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Z, try my full exit of BOINC and a restart after your normal 5 minute timeout interval and see if that nets you work. I'd like to see if my temporary solution works for others or is specific to my machine. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, now the Linux box thinks it needs to be in 10-minute Nvidia backoff intervals for some reason. Still down 140 GPU tasks from normal GPU cache levels. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I have the same problem with my Mac. Quite some time ago I posted the Log clearly showing the Client was doing its job by calculating the Work Needed and making a Work Request to the Server every 5 minutes. You can see the amount requested increase with each request, and each request simply being Ignored by the Server. Not much you can do when your machine is working properly and being Ignored. Nothing has changed since this post months ago: Message 1851867 - Posted: 27 Feb 2017, 17:00:00 UTC. Mon Feb 27 11:33:02 2017 | SETI@home | Sending scheduler request: To fetch work. Clearly the Requests are being received, and even token tasks are being sent on occasion. The only thing I can suggest is to set <log_flags> <sched_op_debug>1</sched_op_debug> </log_flags> in cc_config.xml, post the log showing your machine making Work Requests and being Ignored, and then ask Why that machine isn't being sent enough Work to keep it running. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Keith Myers wrote: "Thanks for the comments, Z. Not good to hear when the WOW contest starts in 10 hours. I have had this issue on the Windows machines last week for a while but they have calmed down over the weekend and are full up. Just having the problem currently on the Linux box." The work buffers on my 3 Linux boxes have been fluctuating this morning, but none have gotten below about 60% full. A couple that had dropped a good bit just recently got transfusions of about 80-100 tasks and the 3 boxes currently stand at 92%, 91%, and 94% full. (They normally stay right at 100% when there aren't a lot of Arecibo VLARs in the feeder.) No special intervention has been required here. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks for the post, TBar. Yes, the logs always show a GPU work shortfall, but the servers are ignoring the deficit and not sending work when it is available. And I know that there isn't any competition from other projects on that machine since it is solely a SETI@Home cruncher with 100% resource share. On another idea, can someone post their minimal cc_config.xml for a machine running the special app please. I wonder if I have too many extraneous flags in it that might be confusing the servers. I had copied over my Windows cc_config and removed any flag that BOINC complained about at startup. But I wonder if there are still some settings that don't need to be there for a Linux box. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Jeff Buck wrote: "The work buffers on my 3 Linux boxes have been fluctuating this morning, but none have gotten below about 60% full. A couple that had dropped a good bit just recently got transfusions of about 80-100 tasks and the 3 boxes currently stand at 92%, 91%, and 94% full. (They normally stay right at 100% when there aren't a lot of Arecibo VLARs in the feeder.) No special intervention has been required here." If my boxes would refill at that level of tasks pulled down, I wouldn't worry too much about it. But at a maximum of 20 tasks received per work request, the special app machine processes work faster than it can be pulled down, and my cache slowly falls to zero. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
This is what I use on my machines: <cc_config> <log_flags> <sched_op_debug>1</sched_op_debug> </log_flags> <options> <save_stats_days>365</save_stats_days> <dont_contact_ref_site>1</dont_contact_ref_site> <use_all_gpus>1</use_all_gpus> <max_file_xfers_per_project>8</max_file_xfers_per_project> <no_priority_change>1</no_priority_change> <skip_cpu_benchmarks>1</skip_cpu_benchmarks> </options> </cc_config> All my machines are SETI Only, with No other Projects interfering with Work Requests. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Keith Myers wrote: "On another idea, can someone post their minimal cc_config.xml for a machine running the special app please." Here's what I've got on my #1 cruncher: <cc_config> <log_flags> <cpu_sched>1</cpu_sched> </log_flags> <options> <use_all_gpus>1</use_all_gpus> <no_priority_change>1</no_priority_change> </options> </cc_config> EDIT: ...and #2 has: <cc_config> <log_flags> <cpu_sched>1</cpu_sched> </log_flags> <options> <max_event_log_lines>10000</max_event_log_lines> <use_all_gpus>1</use_all_gpus> </options> </cc_config> EDIT2: ...and to round it off, #3 has: <cc_config> <log_flags> <cpu_sched>1</cpu_sched> </log_flags> <options> <use_all_gpus>1</use_all_gpus> </options> </cc_config> All in all, pretty basic stuff only. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks, TBar and Jeff. Need to do some editing. What does the <dont_contact_ref_site>1</dont_contact_ref_site> do? Is that the connection check to Google for the network status debug flag? |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Keith, I've tried the method you told me about for ghost work units... Seems to work for this as well. Z |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Hi Z, I would think so, yeah. It's basically the same thing: forcing the BOINC servers to look at the BOINC Manager client as a new startup. Seems to refresh the status or something. I wonder what gets old and stale and makes the servers think you don't need work when your shortfall is really there. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I've simplified my cc_config files on all machines down to bare minimum. There was a lot of fluff in there. Basically every option that can be set was in the file and even though they mostly were all set to 0, I figure if the client doesn't need to read them, all the better. Will be interesting to watch and see if that makes any kind of dramatic difference in my work request issues. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Just took a quick look at my 3 big boxes again and the work buffers are now at 100%, 91%, and 100% of capacity (for the moment, anyway). |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Keith Myers wrote: "What does the <dont_contact_ref_site>1</dont_contact_ref_site> do? Is that the connection check to Google for the network status debug flag?" From the wiki: "To determine if a physical network connection exists, the client occasionally contacts a highly-available web site (google.com). If this flag is set, this behavior is suppressed. This flag also suppresses a periodic fetch of a project list from boinc.berkeley.edu." http://boinc.berkeley.edu/wiki/Client_configuration#Options |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Keith Myers wrote: "I've simplified my cc_config files on all machines down to bare minimum. There was a lot of fluff in there. Basically every option that can be set was in the file and even though they mostly were all set to 0, I figure if the client doesn't need to read them, all the better. Will be interesting to watch and see if that makes any kind of dramatic difference in my work request issues." My cc_config is set in a similar way. If I add an option for testing I leave it in and set to 0. I figure it is less work if I want to use it later. Currently my standard cc_config, that I copy when setting up a new host, is 82 lines long. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Keith Myers wrote: "I've been having issues getting work on the special app machine. Toggling preferences in apps or using TBar's request method is not working. Funny part is that the Windows machines have had no issues maintaining their cache levels. I am down about 200 gpu tasks on the Linux machine. The only thing that I have found to work in getting my normal 20 task download when the cache level is low is exiting BOINC and not restarting until the previous 5 minute countdown has expired. Then the machine will get maybe 5 cycles of download requests netting me around 100 tasks and then it stalls out again with "no work is available". The gpu cache level will run down to 0 if I don't intervene with my manual stopping and restarting of BOINC." Hi Keith, my rig with the 970s is doing the same. It will be OK for a while, then stop getting tasks and just show the "project has no tasks available" message. But so far I am finding the "kick in the pants" approach (premature requests for work after a failed attempt) is still working. Generally the lower the cache gets the more likely it will be given some work, but about six or seven hours ago it was down to about 20 tasks instead of 200. It came good after giving it the kick. Stephen :( |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Keith Myers wrote: "This has me concerned because all the machines are going to be left unattended from Friday till next Tuesday this week when I leave for Idaho and the solar eclipse. And this is during the WOW contest to make matters worse." Keith, I'm thinking that the c2g.sh frontend script I made and gave you could help you there. Even if you don't move files, you could edit it to stop BOINC, leave it shut down for 5.5 minutes, and then start it again. Do that in the loop every 3-4 hours, and just let it run unattended. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Just came in from 3 hours of edging and mowing and the boxes are all up. So I don't know whether the cc_config edit or setting sched_op_debug had any effect. We shall see. The config file was the original one that the BOINC installation wrote with EVERY possible parameter listed. I found the answer to my question about that parameter by refreshing my memory on the Client Configuration wiki. Automating a BOINC exit is a nice idea to try. I have been thinking I needed to set up some sort of remote desktop for all the boxes so I can monitor them when I'm away this weekend. I don't have any experience with that, though, apart from a recent session with TeamViewer on my daily driver so the SMA tech support could look at my solar monitoring equipment and make changes while I was watching. I don't know if that is a viable solution or not. Just restarting BOINC Manager every few hours might be an easier solution. I guess I have this week to see whether the changes I've made have removed the problem or whether I need to take more drastic measures. Thanks for the ideas and comments, everyone. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Keith, a bunch of us use TeamViewer to keep an eye on our headless machines. |
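
The stop, wait, restart cycle suggested in the thread (shut BOINC down for 5.5 minutes, repeat every 3-4 hours) could be sketched roughly as below. This is not Brent's c2g.sh script, which is not shown in the thread; it is a minimal Python sketch under stated assumptions. In particular, `STOP_CMD` and `START_CMD` are guesses at how BOINC is stopped and started on a given host: `boinccmd --quit` asks a running client to exit (it may need to be run from the BOINC data directory or given the RPC password), and the `systemctl` service name is hypothetical and depends on how BOINC was installed.

```python
import subprocess
import time
from typing import Callable

# Assumed commands; neither is given in the thread. Adjust for your install.
STOP_CMD = ["boinccmd", "--quit"]                   # ask the running client to exit
START_CMD = ["systemctl", "start", "boinc-client"]  # hypothetical service name

DOWNTIME_S = 5.5 * 60    # stay down for 5.5 minutes, as suggested in the thread
CYCLE_S = 3.5 * 60 * 60  # repeat every 3-4 hours; 3.5 is an arbitrary midpoint

Runner = Callable[..., object]
Sleeper = Callable[[float], None]


def restart_once(
    run: Runner = subprocess.run,
    sleep: Sleeper = time.sleep,
    downtime_s: float = DOWNTIME_S,
) -> None:
    """Stop BOINC, wait out the backoff window, then start it again."""
    # check=False: carry on even if BOINC was already stopped.
    run(STOP_CMD, check=False)
    sleep(downtime_s)
    run(START_CMD, check=False)


def restart_loop(
    cycles: int,
    run: Runner = subprocess.run,
    sleep: Sleeper = time.sleep,
    cycle_s: float = CYCLE_S,
    downtime_s: float = DOWNTIME_S,
) -> None:
    """Run `cycles` restart cycles, waiting `cycle_s` seconds before each one."""
    for _ in range(cycles):
        sleep(cycle_s)
        restart_once(run, sleep, downtime_s)
```

Calling `restart_loop(cycles=8)` would cover a little over a day (8 cycles of 3.5 hours plus the downtime). The `run` and `sleep` parameters exist only so the timing logic can be exercised without touching a real BOINC install.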