Panic Mode On (104) Server Problems?
Author | Message |
---|---|
Wiggo Send message Joined: 24 Jan 00 Posts: 36560 Credit: 261,360,520 RAC: 489 |
After the outage, we had a 3hr power outage here due to storms, but everything is now fully loaded. Cheers. |
Wiggo Send message Joined: 24 Jan 00 Posts: 36560 Credit: 261,360,520 RAC: 489 |
Well 16 of mine have now validated. Cheers. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
What I'm still trying to figure out is why I have to change the Application settings once or twice a day to be able to keep getting work. Even after the post-outage congestion, the usual response from the Scheduler these days is "Project has no tasks available." Change the Application settings, and it has work available, at least for 12 or more hours. Then I get to do it all over again. The rough gist is that when hard limits are set, such as project backoffs and queue sizes, statistically there will be people that always fit into the expected area, some that sometimes fit and sometimes don't, and still more that always fall into the always-breaks regime. My guess is that if the backoffs and/or request intervals were somewhat randomised, it would allow fairer work distribution. Sadly, actual statistics and control systems theory don't seem to be on the agenda for BOINC anytime soon. So you'll need to either continue babysitting, or change something such that you fall into a different 'chance' bucket. [Edit:] Either manually, or some script to force an update at a random interval after the project backoff expires, might work. The client's aggressive request on backoff expiration won't. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions. |
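The randomised-backoff idea can be sketched in a few lines of Python. This is purely an illustration of the jitter concept, not BOINC's actual client or server code; the function name, the 5-minute base, and the 25% jitter window are assumptions for the example.

```python
import random

def next_request_delay(base_backoff_s: float = 300.0,
                       jitter_frac: float = 0.25) -> float:
    """Return a polling delay with random jitter around the base backoff.

    If every host retried at exactly base_backoff_s, the same hosts would
    keep hitting the feeder at the same moments, and some would always find
    it empty. Spreading retries over a window breaks that lock-step pattern.
    """
    jitter = base_backoff_s * jitter_frac
    return base_backoff_s + random.uniform(-jitter, jitter)

# A host on a 5-minute cycle would retry somewhere between 225 s and 375 s.
delay = next_request_delay()
assert 225.0 <= delay <= 375.0
```

Each host lands at a different point in the window, so which hosts catch a freshly refilled feeder changes from cycle to cycle instead of being fixed by phase.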
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
Rough gist is that when hard limits are set, such as project backoffs and queue sizes, statistically there will be people that always fit into the expected area, some that sometimes fit and sometimes don't, and still more that always fall into the always-breaks regime. My guess is that if the backoffs and/or request intervals were somewhat randomised, it would allow fairer work distribution. Sadly, actual statistics and control systems theory don't seem to be on the agenda for BOINC anytime soon. So you'll need to either continue babysitting, or change something such that you fall into a different 'chance' bucket. Problem is, it has nothing to do with the backoffs. Every 5 minutes the Manager will ask for work, and for some reason, after a certain period of time it's necessary to change the project's application settings to keep it coming. Even though I don't have an AP application to process AP work, I have to set that option to Yes to get work (along with "If no work available for selected application, accept work from other applications"). Then later on I have to set it to No, then Yes, then No, and so on. It's been this way for a few weeks now. I suspect it's related to the issue that was reported in late Dec for people that had selected to do AP work and "If no work avail, do other work", with v8 left unselected. At the end of Dec they had to specifically enable v8 work in order to get any, whereas before it wasn't necessary. (Of course, running out of work during the weekly outages does result in ridiculously long backoff times if you're not around to hit Retry.) Grant Darwin NT |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Well, the 5 minutes is across the board, so constant for all of us. Likely bugs aside (a big if), there are still periods of a full or empty feeder queue which you may fall into. It's a task to determine whether there is a bug in the work issuing, or you are simply falling into the empty-feeder bucket for some reason (it could be anything, such as the latency you described). Perhaps an answer is to increase the backoff so that more users can get in, maybe not. Either way, there will always be some proportion of users that can never get work. What I'm proposing is that if fixed time intervals are used, then no matter what, some proportion of hosts will end up in a state of never being able to get work. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions. |
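The "fixed intervals lock some hosts out forever" point can be shown with a toy simulation. All the numbers here are invented for illustration, not real SETI@home parameters: the feeder receives a fixed batch of tasks at the start of each cycle, and each host polls once per cycle at a fixed phase offset.

```python
def winners(phases, tasks_per_period=3, periods=4):
    """Count tasks each host gets when the feeder receives
    tasks_per_period tasks at the start of every period and each host
    polls once per period at a fixed phase offset (in seconds)."""
    got = {p: 0 for p in phases}
    for _ in range(periods):
        feeder = tasks_per_period          # feeder refilled each period
        for phase in sorted(phases):       # polls arrive in phase order
            if feeder > 0:                 # task available: hand one out
                feeder -= 1
                got[phase] += 1
    return got

fixed = winners([0, 50, 100, 150, 200, 250])
# The same three early-phase hosts win every period; the rest never do.
assert fixed[0] == fixed[50] == fixed[100] == 4
assert fixed[150] == fixed[200] == fixed[250] == 0
```

With fixed phases the outcome is identical every cycle, so the losing hosts lose forever; randomising the phases (as in the jitter suggestion above) rotates which hosts find the feeder empty.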
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
Likely bugs aside (a big if), there are still periods of a full or empty feeder queue which you may fall into. It's a task to determine whether there is a bug in the work issuing, or you are simply falling into the empty-feeder bucket for some reason (it could be anything, such as the latency you described). Possible, but unlikely IMHO. Work returned per hour is less than at times in the past, yet this issue persists. Also, even after the extended outages, with work returned right up there and the demand for new work way up there, if the first few requests for work result in none, changing the application settings results in getting work, up to the point the cache is full. If it starts getting work after the outages, it continues to fill up normally. In both cases it will generally get work with each request, just the odd one or two it might miss out on. However, if the project does ever get the number of crunchers they're hoping for to crunch all the new data, I'm expecting significant issues getting work with things as they stand. A 5 min ±30 sec interval might help with the Scheduler and feeder loads, but as the number of active hosts increases, wouldn't the sheer number of hosts result in an (effectively) random load anyway, due to each system's randomness in the timing of its initial Scheduler request after BOINC starts? Grant Darwin NT |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yes, 'unlikely'. That may possibly account for your ability to change settings, last for 12 hours, then fall back into a hole. I'll be very interested to see if you can break out of that rut without a code change on client or server. Being Australia Day, I'd like to point out that falling into such holes is pretty much the Australian way. Usually the result of doing so is fairly disruptive. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
A 5min +-30 seconds might help with the Scheduler and feeder loads, but as the number of active hosts increases, wouldn't the sheer number of hosts result in an (effectively) random load due to each system's randomness with the initial Scheduler request after BOINC starting? Most likely the backoff needs to be proportional to the rate of requests, since the feed rate is likely more or less constant. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
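That proportionality could look something like the sketch below. The function and its numbers are invented for illustration: scale the backoff by the ratio of the incoming request rate to the feeder's roughly constant refill rate, so that demand above supply stretches the retry interval instead of producing a wall of failed requests.

```python
def proportional_backoff(requests_per_s: float,
                         refill_per_s: float,
                         base_backoff_s: float = 300.0) -> float:
    """Stretch the base backoff by how far demand exceeds supply.

    When requests arrive no faster than the feeder refills, keep the base
    interval; when demand is N times supply, ask hosts to wait N times
    longer, which pulls the aggregate request rate back toward the refill
    rate.
    """
    load_ratio = max(1.0, requests_per_s / refill_per_s)
    return base_backoff_s * load_ratio

# Supply keeps up: the normal 5-minute interval is unchanged.
assert proportional_backoff(10.0, 10.0) == 300.0
# Demand is 3x supply: hosts are asked to wait 15 minutes instead.
assert proportional_backoff(30.0, 10.0) == 900.0
```

The server would be the natural place to compute this, since only it sees the aggregate request rate; clients just honour whatever delay the scheduler reply asks for.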
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Grant, maybe try adding the AP application; maybe that is the one thing that makes your requests different from others'. There are a few getting split right now, but normally you won't get any (or many) anyway. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
Grant, maybe try adding the AP application; maybe that is the one thing that makes your requests different from others'. True, but really I've no desire to re-do the setup again. I'm happy to run v8 only and crunch all the guppies others don't want. It would be nice if the issue were just fixed. I'll just continue with my manual workaround for now. Grant Darwin NT |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Just adding an idea here, but the feeder can only re-fill so fast/frequently/often, so what if, instead of letting someone get lucky and grab all 200 tasks it has in one request, the feeder were limited to assigning 20 or 50 tasks at a time? I know there will be more groaning and griping about that too, but if the feeder is more likely to have tasks in it, then there should, theoretically, be fewer people who get absolutely nothing because it is empty. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
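A per-request cap like the one suggested is easy to sketch. The names and the cap of 20 are illustrative, not the real scheduler code:

```python
from collections import deque

def assign_tasks(feeder: deque, requested: int,
                 per_request_cap: int = 20) -> list:
    """Hand out at most per_request_cap tasks per scheduler request.

    Capping each reply keeps some tasks in the feeder for the next host,
    instead of letting one lucky request drain it completely.
    """
    n = min(requested, per_request_cap, len(feeder))
    return [feeder.popleft() for _ in range(n)]

feeder = deque(range(100))          # 100 tasks waiting in the feeder
first = assign_tasks(feeder, requested=200)
assert len(first) == 20             # capped, despite asking for 200
assert len(feeder) == 80            # 80 tasks left for other hosts
```

The trade-off is exactly the one anticipated above: a big cache takes several requests to fill, but far fewer hosts walk away with nothing.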
Jimbocous Send message Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
Just adding an idea here, but the feeder can only re-fill so fast/frequently/often, so what if, instead of letting someone get lucky and grab all 200 tasks it has in one request, the feeder were limited to assigning 20 or 50 tasks at a time? I know there will be more groaning and griping about that too, but if the feeder is more likely to have tasks in it, then there should, theoretically, be fewer people who get absolutely nothing because it is empty. +1 Makes a lot of sense. It would also help with new folks who take a load and are never heard from again, in terms of wingmen waiting for timeouts. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
+1 Seems to be the old norm for me anyway, at least it WAS back last year before this new problem cropped up. I never got more than 40 or so tasks in the first downloads after the outage anyway. I've NEVER received the full 100-task buffer output for my download request. I always sort of thought that was the server's doing, or it was just my luck of the draw always pulling ~40 tasks out of the 100-task buffer. It never took more than 4 or 5 downloads to get back to my quota after the outage once my work request got answered in the queue deluge. I gather from the thread comments that the servers DON'T actually have this mechanism in play. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Just adding an idea here, but the feeder can only re-fill so fast/frequently/often, so what if, instead of letting someone get lucky and grab all 200 tasks it has in one request, the feeder were limited to assigning 20 or 50 tasks at a time? I know there will be more groaning and griping about that too, but if the feeder is more likely to have tasks in it, then there should, theoretically, be fewer people who get absolutely nothing because it is empty. Something similar was done with AP work requests a few years ago; at least it seemed that way once no more than ~7 tasks were assigned per request. If hitting an empty feeder were the only issue, it seems like more users would be seeing their queues dropping, and toggling settings wouldn't fix things for the users getting no work. At another project there was an issue with the feeder running dry, so the admin adjusted some settings for it. I think they said they increased the number of tasks the feeder held at once, but they may have increased how often it was filled. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Just adding an idea here, but the feeder can only re-fill so fast/frequently/often, so what if, instead of letting someone get lucky and grab all 200 tasks it has in one request, the feeder were limited to assigning 20 or 50 tasks at a time? I know there will be more groaning and griping about that too, but if the feeder is more likely to have tasks in it, then there should, theoretically, be fewer people who get absolutely nothing because it is empty. . . That is exactly how it seems to me as well, but I can offer no idea of just where the issue lies, except to say it is a recent development; since they upgraded the OS on one or more of the servers, in fact. Stephen . |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
That is exactly how it seems to me as well, but I can offer no idea of just where the issue lies, except to say it is a recent development; since they upgraded the OS on one or more of the servers, in fact. It wouldn't be the first time major changes resulted in configuration files going missing or being ignored in one way or another. Grant Darwin NT |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It seems the problem with downloading work has surfaced again after being absent for a while. Just as before, the machines with only ATI GPUs are Not affected. The problems are with the machines that just had GPUs swapped. One machine went from having 3 nVidia cards to 2 NV and 1 ATI. The other machine went from 2 NV and 1 ATI to 3 nVidia cards. Both machines have been having problems since the GPUs were swapped a couple of days ago; up until then they hadn't had any problems since the last post a couple of weeks ago. Suddenly, they are back to having problems. The machine with the 3 NV GPUs is down over a hundred tasks; changing the preferences works for a few hours, then the problem returns. Mon Jan 30 09:27:58 2017 | SETI@home | [sched_op] Starting scheduler request Mon 30 Jan 2017 09:33:05 AM EST | SETI@home | Requesting new tasks for NVIDIA :-( |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Change the preferences and the work returns...for a little while; Mon 30 Jan 2017 10:26:32 AM EST | SETI@home | Sending scheduler request: To fetch work. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Remind me again, what preferences are you changing to get that effect? Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Run only the selected applications: AstroPulse v7: no. It doesn't matter how they are set. Just change them. Right now one machine is set to the above, while the other machine is: Run only the selected applications: AstroPulse v7: yes. The next time I'll just swap the settings again. It's the act of changing them that matters. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.