The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)


juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045929 - Posted: 20 Apr 2020, 3:02:36 UTC

This is new: I've never seen an upper limit on what a host could do per day.

Sun 19 Apr 2020 08:23:03 PM EST | SETI@home | Sending scheduler request: To report completed tasks.
Sun 19 Apr 2020 08:23:03 PM EST | SETI@home | Reporting 66 completed tasks
Sun 19 Apr 2020 08:23:03 PM EST | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | Scheduler request completed: got 0 new tasks
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | No tasks sent
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | No tasks are available for SETI@home v8
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has reached a limit on tasks in progress
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | Project requested delay of 1818 seconds

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2045930 - Posted: 20 Apr 2020, 3:20:59 UTC - in response to Message 2045929.  
Last modified: 20 Apr 2020, 3:21:57 UTC

This is new: I've never seen an upper limit on what a host could do per day.
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks
Probably the first time anyone has been able to actually hit it.
67344 is a rather odd sort of value.


And I finally managed to pick up another 179 to crunch. Will keep the GPUs occupied for a while.
Grant
Darwin NT
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045935 - Posted: 20 Apr 2020, 3:40:06 UTC - in response to Message 2045930.  

This is new: I've never seen an upper limit on what a host could do per day.
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks
Probably the first time anyone has been able to actually hit it.
67344 is a rather odd sort of value.

A new record broken. I'll drink to that now! LOL
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2045945 - Posted: 20 Apr 2020, 5:52:36 UTC

Been picking up work with each request for a couple of hours now.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2045948 - Posted: 20 Apr 2020, 6:05:50 UTC - in response to Message 2045945.  
Last modified: 20 Apr 2020, 6:06:45 UTC

Been picking up work with each request for a couple of hours now.
And another batch. I've been hitting the server-side limits for a while now. I'm wondering if the script is broken: heaps of sustained work coming through, not just little blips here & there.
Grant
Darwin NT
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2045951 - Posted: 20 Apr 2020, 6:54:42 UTC - in response to Message 2045945.  

Been picking up work with each request for a couple of hours now.

Me too, on both CPU and GPU.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045952 - Posted: 20 Apr 2020, 7:06:09 UTC - in response to Message 2045907.  

Instead, the hosts that will have the bigger problems must be the slow ones that use the standard 150-WU cache with the 10+10 day configuration. A large number of them are left without any user interaction (just set & go); those hosts could have serious problems with this reduction.
The main source of the problems here is that the boinc client really doesn't seem to care about the deadlines at all when it requests work. If you have a 10+10 day configuration and the scheduler suddenly starts giving you work with a 14 day deadline, it will happily buffer 20 days of those 14 day jobs. Such hosts would automatically abort some of the tasks because they missed their deadline before starting, and return all the rest after the deadline because their deadline passed during crunching. This means that all their tasks generate resends, so the deadline reduction would have completely the opposite effect to what was intended.
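That over-buffering point can be sketched with a toy calculation (the 20-day cache and 14-day deadline are the numbers from the post above; this is my own simplification, not anything from the BOINC source):

```python
# Toy illustration of the over-buffering problem: a client with a
# 10+10 day cache buffers 20 days of work, so with a 14-day deadline
# everything queued beyond day 14 can only miss its deadline.
cache_days = 10 + 10     # "store at least" + "store up to additional" days
deadline_days = 14       # the hypothetical shortened deadline

late_days = max(0, cache_days - deadline_days)
late_fraction = late_days / cache_days
print(f"{late_fraction:.0%} of the buffered work will miss its deadline")
# prints "30% of the buffered work will miss its deadline"
```

And every one of those late tasks comes back as a resend, which is exactly the load the shorter deadline was supposed to reduce.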
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045953 - Posted: 20 Apr 2020, 7:12:47 UTC - in response to Message 2045929.  

Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks
This is obviously some kind of server-side glitch. The biggest batches of tasks a single scheduler request can get seem to be somewhere around 330 tasks. If we assume the scheduler buffer size to be 350, then the maximum a host can get in a day with an 1818-second request cooldown is less than 17000 tasks.
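For anyone checking the arithmetic, a quick sketch (the 350-task buffer is an assumption; the 1818-second delay comes from the log above):

```python
# Upper bound on tasks/day under normal scheduler behaviour:
# one request every 1818 s, at most ~350 tasks per request (assumed).
SECONDS_PER_DAY = 86400
REQUEST_COOLDOWN = 1818      # "Project requested delay of 1818 seconds"
BUFFER_PER_REQUEST = 350     # assumed scheduler buffer size

requests_per_day = SECONDS_PER_DAY // REQUEST_COOLDOWN
max_tasks_per_day = requests_per_day * BUFFER_PER_REQUEST
print(requests_per_day, max_tasks_per_day)  # prints "47 16450"
```

So roughly 16450 tasks/day at best, well under the 67344 quota the host reportedly hit.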
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045954 - Posted: 20 Apr 2020, 7:16:20 UTC - in response to Message 2045952.  

Have you tested that on a project which already has 7 or 14 day deadlines? You can try it at Einstein, which allows you to see the server log of the work request. The server makes a parallel assessment of how long the work being allocated is going to take to run, based on the speed/usage data reported by your client. That's how it knows how many tasks to send - assuming you're requesting a timed amount of work, rather than asking for the universe and relying on the server limits to keep you sane.

I've never pushed it to the limit - it would be an interesting experiment to try some day - but I believe the server is aware of its own limits and won't send work which it's impossible to complete before deadline.
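That server-side check could look roughly like this (my own simplification with invented names, not the real scheduler code):

```python
# Rough sketch of the sanity check described above: the server only
# sends as many tasks as the host can finish before the deadline,
# given the work it already has queued. All names are invented.
def max_sendable(est_runtime_s, deadline_s, already_queued_s):
    """How many tasks of est_runtime_s fit before the deadline,
    after the host works through what it already has queued?"""
    free_s = max(0, deadline_s - already_queued_s)
    return free_s // est_runtime_s

# e.g. 7-day deadline, 1-hour tasks, 5 days of work already queued:
print(max_sendable(3600, 7 * 86400, 5 * 86400))  # prints "48"
```

A real scheduler would also account for core count and on-fraction, but the principle is the same: past a certain queue depth, the sendable count drops to zero.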
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045956 - Posted: 20 Apr 2020, 7:22:17 UTC - in response to Message 2045953.  

This is obviously some kind of server-side glitch.
You will be aware that the GPUUG has produced private clients which circumvent the server limits, and Juan is a primary exponent of this practice? His machine currently has a daily limit of 68191 GPU tasks per day: he'll be able to fetch more work now, because the day has rolled over to tomorrow.
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 2045957 - Posted: 20 Apr 2020, 7:28:16 UTC - in response to Message 2045954.  

I've never pushed it to the limit - it would be an interesting experiment to try some day - but I believe the server is aware of its own limits and won't send work which it's impossible to complete before deadline.

Yep, had that many years ago because of a way too high DCF (caused by non-BOINC applications); the answer from the server was something like "No tasks sent, won't finish before deadline".
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045966 - Posted: 20 Apr 2020, 8:46:53 UTC

My slower host's very slow CPU has received more AP work than it can crunch before the deadline on several occasions, forcing me to crunch some of the tasks with the GPU to speed things up. Which has only worsened the problem by making the CPU appear faster.
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 2045968 - Posted: 20 Apr 2020, 8:56:12 UTC - in response to Message 2045966.  
Last modified: 20 Apr 2020, 9:01:00 UTC

Let me guess: your host isn't sending the list of the tasks it has? Or how can it get over 17k in progress tasks for 1 CPU + 2 GPUs?
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045973 - Posted: 20 Apr 2020, 9:25:54 UTC - in response to Message 2045968.  

Let me guess: your host isn't sending the list of the tasks it has? Or how can it get over 17k in progress tasks for 1 CPU + 2 GPUs?
It is sending them. Not doing that would make the scheduler insta-expire all my tasks as ghosts.

I would rather not send them because listing all those tasks as verbose xml makes the scheduler request huge and slow and more likely to fail when the server is heavily loaded :(
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045976 - Posted: 20 Apr 2020, 10:01:20 UTC - in response to Message 2045966.  

My slower host's very slow CPU has received more AP work than it can crunch before the deadline on several occasions, forcing me to crunch some of the tasks with the GPU to speed things up. Which has only worsened the problem by making the CPU appear faster.
I don't think we should decide project-wide policies like deadlines on the basis of your, very particular, special case. Your machines are clearly under firm personal control, and you will be able to adapt to any 'new normal' as and when it's implemented.

The project should be more concerned by 'grab and run' one-time users, who download a cache and are never heard from again - thus locking up those tasks for seven weeks or more.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045979 - Posted: 20 Apr 2020, 10:23:33 UTC - in response to Message 2045976.  

I don't think we should decide project-wide policies like deadlines on the basis of your, very particular, special case. Your machines are clearly under firm personal control, and you will be able to adapt to any 'new normal' as and when it's implemented.
I used it as an example of what can happen with a long cache setting. There's no reason for this problem to be specific to my host, and there certainly are a lot of hosts running on 'autopilot' with long caches.

The project should be more concerned by 'grab and run' one-time users, who download a cache and are never heard from again - thus locking up those tasks for seven weeks or more.
At this point those users are very unlikely to be an issue any more. The project is officially closed, so there is no constant stream of new users, and people who use BOINC to stress test their overclocking setups go to the projects that hand out new work reliably.
Scrooge McDuck
Joined: 26 Nov 99
Posts: 722
Credit: 1,674,173
RAC: 54
Germany
Message 2045986 - Posted: 20 Apr 2020, 11:02:19 UTC - in response to Message 2045979.  
Last modified: 20 Apr 2020, 11:18:40 UTC

At this point those users are very unlikely to be an issue any more. The project is officially closed, so there is no constant stream of new users, and people who use BOINC to stress test their overclocking setups go to the projects that hand out new work reliably.

You are right and (some) exceptions prove that rule!
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8629071 Host is active - at least BOINC is contacting the servers regularly. All tasks (or most - can't tell, since they are purged immediately) are timing out: huge cache, five-year-old low-cost CPU, no GPU. And it got lots of resends today.

The scheduler seems to shorten the deadline of tasks assigned to such overloaded hosts to two hours 10 minutes? I don't know if the deadline was set initially or somehow shortened later on.
example: https://setiathome.berkeley.edu/workunit.php?wuid=3869091667
Initially in February, there was only one task replicated? Strange.

I haven't seen such behaviour (2 hour deadlines) before.
[edited: purging behaviour]
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045988 - Posted: 20 Apr 2020, 11:17:20 UTC - in response to Message 2045986.  

The scheduler seems to shorten the deadline of tasks assigned to such overloaded hosts to two hours 10 minutes? I don't know if the deadline was set initially or somehow shortened later on.
example: https://setiathome.berkeley.edu/workunit.php?wuid=3869091667
Initially in February, there was only one task replicated? Strange.

I haven't seen such behaviour (2 hour deadlines) before.
It wasn't a 2 hour deadline. It was the scheduler deciding to zap a ghost task. The server has automatic ghost recovery on, but the ghost recovery really only works for some hosts. For the others the scheduler won't resend the ghost but sets its deadline to the current time instead, making it expire instantly.

Also, this zapped result was part of the initial replication. You can see it from its task id being only one bigger than the other initial result's, and from its task name ending in '_1'. It received a new 'Sent' date in a ghost recovery that was successful from the server's point of view, but it ended up as a ghost again, and a new ghost recovery two hours later decided to kill it instead. It's a Windows host, so I guess a trigger-happy antivirus blocked the download and the task got ghosted again and again.
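The two possible outcomes of a ghost-recovery pass, as described above, could be sketched like this (purely illustrative: the function and field names are invented, and this is not actual BOINC scheduler code):

```python
import time

# Illustrative sketch of the ghost-recovery decision: either the ghost
# task is resent (fresh 'Sent' date), or the scheduler "zaps" it by
# setting its deadline to the current time, so it expires instantly.
def recover_ghost(task, now=None):
    now = now if now is not None else time.time()
    if task.get("resend_works", False):
        task["sent_time"] = now      # successful recovery: new 'Sent' date
        task["status"] = "resent"
    else:
        task["deadline"] = now       # the zap: deadline == current time
        task["status"] = "expired"
    return task

ghost = {"name": "example_task_1", "resend_works": False}  # invented names
print(recover_ghost(ghost)["status"])  # prints "expired"
```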
Scrooge McDuck
Joined: 26 Nov 99
Posts: 722
Credit: 1,674,173
RAC: 54
Germany
Message 2045989 - Posted: 20 Apr 2020, 11:24:54 UTC - in response to Message 2045988.  

Ville Saari, thanks for the detailed explanation!
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045995 - Posted: 20 Apr 2020, 12:23:16 UTC - in response to Message 2045956.  
Last modified: 20 Apr 2020, 12:57:35 UTC

This is obviously some kind of server-side glitch.
You will be aware that the GPUUG has produced private clients which circumvent the server limits, and Juan is a primary exponent of this practice? His machine currently has a daily limit of 68191 GPU tasks per day: he'll be able to fetch more work now, because the day has rolled over to tomorrow.

Yes, I run the controversial client, but the host was not receiving that kind of volume of new (resend) WUs. I don't have the exact numbers since I don't track them, but it can't have received even 100 WUs yesterday. During this past night it received about 700 resends from the latest floodgates session (from the posts, I think all hosts received a large amount too). Still far away from the 68191 daily limit!
So what triggers this server glitch is still a mystery to me. But as Grant wisely posted, it's unknown territory: nobody has ever reached such a high cache and maintained it for so long.

BTW, those resends are crunching now, but as there were so many, they will take a couple of hours to finish, unless I decide to turn on the rest of the GPUs and throttle them back up to high clocks. The host is running at 1/4 impulse now. I'll see if that's necessary later, after the welcome early coffee & aspirins.
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.