The Server Issues / Outages Thread - Panic Mode On! (119)
Author | Message |
---|---|
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
This is new: I've never seen an upper limit on what a host could do per day. Sun 19 Apr 2020 08:23:03 PM EST | SETI@home | Sending scheduler request: To report completed tasks. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13824 Credit: 208,696,464 RAC: 304 |
"This is new: I've never seen an upper limit on what a host could do per day." Probably the first time anyone has been able to actually hit it. 67344 is a rather odd sort of value. And I finally managed to pick up another 179 to crunch. Will keep the GPUs occupied for a while. Grant Darwin NT |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"This is new: I've never seen an upper limit on what a host could do per day." "Probably the first time anyone has been able to actually hit it." A new record broken. Will drink to that now! LOL |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13824 Credit: 208,696,464 RAC: 304 |
Been picking up work with each request for a couple of hours now. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13824 Credit: 208,696,464 RAC: 304 |
"Been picking up work with each request for a couple of hours now." And another batch; I've been hitting the server-side limits for a while now. I'm wondering if the script is broken: heaps of sustained work coming through, not just little blips here & there. Grant Darwin NT |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
"Been picking up work with each request for a couple of hours now." Me too, on both CPU and GPU. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Instead, on the other hand, the hosts that will have bigger problems must be the slow ones that use the standard 150 WU cache with the 10+10 day configuration. A large number of them are left running without any user interaction (just set & go); these hosts could have serious problems with this reduction." The main source of the problems here is that the BOINC client really doesn't seem to care about deadlines at all when it requests work. If you have a 10+10 day configuration and the scheduler suddenly starts giving you work with a 14-day deadline, it will happily buffer 20 days of those 14-day jobs. Such hosts would automatically abort some of the tasks because their deadline passed before they were started, and return all the rest after the deadline because it passed during crunching. This means that all their tasks generate resends, so the deadline reduction would have the completely opposite effect to what was intended. |
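Ville's argument can be checked with a toy simulation. This is a sketch under loudly stated assumptions, not the BOINC client's actual scheduling logic: one task runs at a time, tasks are crunched strictly in arrival order, and the buffer is topped up to `cache_days` of work before each task regardless of deadlines. The real client has an earliest-deadline-first mode for tasks it predicts will be late, so this model exaggerates the effect. The function name and its parameters are hypothetical.

```python
from collections import deque


def simulate_fifo_host(cache_days, deadline_days, task_days, horizon_days):
    """Toy model of a host that fetches work without looking at deadlines.

    Hypothetical assumptions: one task runs at a time, tasks run strictly in
    arrival order (FIFO), and the buffer is refilled to cache_days of work
    before each task is picked. All times are in days.
    """
    queue = deque()          # deadlines of buffered tasks, oldest first
    queued_work = 0.0        # days of work sitting in the buffer
    now = 0.0
    counts = {"on_time": 0, "late": 0, "aborted": 0}
    while now < horizon_days:
        # Deadline-blind fetch: top the buffer up to cache_days of work.
        while queued_work < cache_days:
            queue.append(now + deadline_days)
            queued_work += task_days
        deadline = queue.popleft()
        queued_work -= task_days
        if now > deadline:
            counts["aborted"] += 1   # deadline passed before the task started
            continue
        now += task_days             # crunch the task
        if now > deadline:
            counts["late"] += 1      # finished, but after its deadline
        else:
            counts["on_time"] += 1
    return counts


print(simulate_fifo_host(cache_days=20, deadline_days=14, task_days=0.5, horizon_days=60))
print(simulate_fifo_host(cache_days=10, deadline_days=14, task_days=0.5, horizon_days=60))
```

With `cache_days=20` against a 14-day deadline the counts include both aborted and late tasks, while `cache_days=10` finishes everything on time, because in this model no task ever waits more than about 10 days.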
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks" This is obviously some kind of server-side glitch. The biggest batches of tasks a single scheduler request can get seem to be somewhere around 330 tasks. If we assume the scheduler buffer size to be 350, then the maximum a host can get in a day with a 1818-second request cooldown is less than 17000 tasks. |
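Ville's upper bound is simple arithmetic. The 350-task batch and the 1818 s cooldown are his assumed figures; the sketch further assumes a first request at the start of the day and a full batch on every request.

```python
SECONDS_PER_DAY = 86_400


def max_tasks_per_day(tasks_per_request, cooldown_seconds):
    """Upper bound on tasks one host can fetch per day.

    Assumes a first request at t = 0, one more every cooldown_seconds, and a
    full batch of tasks_per_request on every request.
    """
    # Requests at t = 0, c, 2c, ... strictly before the end of the day.
    requests = -(-SECONDS_PER_DAY // cooldown_seconds)   # ceiling division
    return requests * tasks_per_request


print(max_tasks_per_day(350, 1818))   # 48 requests x 350 = 16800
```

That gives 48 requests and 16,800 tasks, roughly a quarter of the 67344 in the log message.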
Richard Haselgrove Joined: 4 Jul 99 Posts: 14667 Credit: 200,643,578 RAC: 874 |
Have you tested that on a project which already has 7- or 14-day deadlines? You can try it at Einstein, which allows you to see the server log of the work request. The server makes a parallel assessment of how long the work being allocated is going to take to run, based on the speed/usage data reported by your client. That's how it knows how many tasks to send - assuming you're requesting a timed amount of work, rather than asking for the universe and relying on the server limits to keep you sane. I've never pushed it to the limit - it would be an interesting experiment to try some day - but I believe the server is aware of its own limits and won't send work which it's impossible to complete before deadline. |
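The check Richard describes, estimating runtime from the host's reported speed and refusing work that cannot finish in time, can be sketched as follows. This is not BOINC's scheduler code: the field names, the single processing slot and the way the duration correction factor (DCF) is applied are simplifying assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostInfo:
    # Hypothetical, simplified stand-ins for what a client reports.
    flops_per_second: float   # estimated processing speed of one slot
    on_fraction: float        # fraction of wall time spent crunching (0..1]
    dcf: float                # duration correction factor applied to estimates
    queued_seconds: float     # estimated work already queued on the host


def tasks_to_send(host, requested_seconds, task_flops, deadline_seconds, available):
    """How many tasks to send for a timed work request (illustrative only).

    Assumes one processing slot that runs tasks back to back; deadline_seconds
    is measured from the moment the tasks are sent.
    """
    runtime = task_flops / host.flops_per_second * host.dcf / host.on_fraction
    sent = 0
    finish = host.queued_seconds   # estimated time at which queued work is done
    while sent < available and sent * runtime < requested_seconds:
        if finish + runtime > deadline_seconds:
            break                  # this task would miss its deadline: stop
        finish += runtime
        sent += 1
    return sent


fast = HostInfo(flops_per_second=1e10, on_fraction=1.0, dcf=1.0, queued_seconds=0.0)
print(tasks_to_send(fast, requested_seconds=10 * 86_400, task_flops=1e14,
                    deadline_seconds=7 * 86_400, available=1_000))
```

With a 7-day deadline the loop stops at 60 tasks even though 10 days of work were requested. Setting `dcf=100.0` makes each task look like about 11.6 days of work, so nothing is sent at all, the kind of "won't finish before deadline" reply Link reports in this thread.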
Richard Haselgrove Joined: 4 Jul 99 Posts: 14667 Credit: 200,643,578 RAC: 874 |
"This is obviously some kind of server-side glitch." You will be aware that the GPUUG has produced private clients which circumvent the server limits, and Juan is a primary exponent of this practice? His machine currently has a daily limit of 68191 GPU tasks: he'll be able to fetch more work now, because the day has rolled over to tomorrow. |
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
"I've never pushed it to the limit - it would be an interesting experiment to try some day - but I believe the server is aware of its own limits and won't send work which it's impossible to complete before deadline." Yep, I had that many years ago because of a way-too-high DCF (caused by non-BOINC applications); the answer from the server was something like "No tasks sent, won't finish before deadline". |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
My slower host's very slow CPU has received more AP work than it can crunch before the deadline on several occasions, forcing me to crunch some of the tasks on the GPU to speed things up, which has only worsened the problem by making the CPU appear faster. |
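Why GPU-crunched tasks make the CPU "appear faster" can be shown with a deliberately simple model: if the server judges a slot's throughput from the average elapsed time of the tasks it returns, a few fast GPU runs pull the average down. The plain average and the runtimes below are hypothetical; the real server keeps its own running statistics per host and app version.

```python
from statistics import mean


def estimated_tasks_per_day(elapsed_hours):
    """Throughput a scheduler might assume for one CPU slot, using a plain
    average of reported elapsed times (a deliberate simplification)."""
    return 24 / mean(elapsed_hours)


cpu_only = [40.0, 42.0, 38.0, 40.0]   # hypothetical AP runtimes on a slow CPU
mixed = [40.0, 42.0, 1.0, 1.0]        # two of the "CPU" tasks really ran on a GPU
print(estimated_tasks_per_day(cpu_only))   # 0.6
print(estimated_tasks_per_day(mixed))      # about 1.14
```

The estimated capacity nearly doubles, so the server would hand the slow CPU even more work, which is the feedback loop described above.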
Link Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
Let me guess: your host isn't sending the list of the tasks it has? Otherwise, how can it have over 17k in-progress tasks with 1 CPU + 2 GPUs? |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Let me guess: your host isn't sending the list of the tasks it has? Otherwise, how can it have over 17k in-progress tasks with 1 CPU + 2 GPUs?" It is sending them. Not doing that would make the scheduler insta-expire all my tasks as ghosts. I would rather not send them, because listing all those tasks as verbose XML makes the scheduler request huge and slow, and more likely to fail when the server is heavily loaded :( |
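To see why listing every in-progress task bloats the request, here is a mock request body built with Python's standard `xml.etree.ElementTree`. The element and task names are made up for illustration and do not reproduce the exact BOINC request format; the point is only that the body grows linearly with the number of tasks listed.

```python
import xml.etree.ElementTree as ET


def request_size_bytes(n_tasks):
    """Size of a mock request body that lists n_tasks in-progress tasks.

    Element and task names are illustrative, not the exact BOINC format.
    """
    root = ET.Element("scheduler_request")
    results = ET.SubElement(root, "other_results")
    for i in range(n_tasks):
        entry = ET.SubElement(results, "other_result")
        ET.SubElement(entry, "name").text = f"hypothetical_task_{i:06d}_0"
        ET.SubElement(entry, "app_version").text = "0"
    return len(ET.tostring(root))


for n in (100, 1_000, 17_000):
    print(n, request_size_bytes(n))
```

Every extra task adds the same fixed-size entry, so a 17k-task host sends a request well over a megabyte in this mock format on every scheduler contact.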
Richard Haselgrove Joined: 4 Jul 99 Posts: 14667 Credit: 200,643,578 RAC: 874 |
"My slower host's very slow CPU has received more AP work than it can crunch before the deadline on several occasions, forcing me to crunch some of the tasks on the GPU to speed things up, which has only worsened the problem by making the CPU appear faster." I don't think we should decide project-wide policies like deadlines on the basis of your very particular special case. Your machines are clearly under firm personal control, and you will be able to adapt to any 'new normal' as and when it's implemented. The project should be more concerned by 'grab and run' one-time users, who download a cache and are never heard from again, thus locking up those tasks for seven weeks or more. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"I don't think we should decide project-wide policies like deadlines on the basis of your very particular special case. Your machines are clearly under firm personal control, and you will be able to adapt to any 'new normal' as and when it's implemented." I used it as an example of what can happen with a long cache-days setting. There's no reason for this problem to be specific to my host, and there certainly are a lot of hosts running on 'autopilot' with long caches. "The project should be more concerned by 'grab and run' one-time users, who download a cache and are never heard from again, thus locking up those tasks for seven weeks or more." At this point those users are very unlikely to be an issue any more. The project is officially closed, so there is no constant stream of new users, and people who use BOINC to stress-test their overclocking setups go to projects that hand out new work reliably. |
Scrooge McDuck Joined: 26 Nov 99 Posts: 1024 Credit: 1,674,173 RAC: 54 |
"At this point those users are very unlikely to be an issue any more. The project is officially closed, so there is no constant stream of new users, and people who use BOINC to stress-test their overclocking setups go to projects that hand out new work reliably." You are right, and (some) exceptions prove that rule! https://setiathome.berkeley.edu/show_host_detail.php?hostid=8629071 The host is active; at least BOINC is contacting the servers regularly. The scheduler seems to shorten the deadline of tasks assigned to such overloaded hosts to two hours 10 minutes? I don't know if the deadline was set initially or somehow shortened later on. Example: https://setiathome.berkeley.edu/workunit.php?wuid=3869091667 Initially, in February, only one task was replicated? Strange. I haven't seen such behaviour (2-hour deadlines) before. [edited: purging behaviour] |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"The scheduler seems to shorten the deadline of tasks assigned to such overloaded hosts to two hours 10 minutes? I don't know if the deadline was set initially or somehow shortened later on." It wasn't a 2-hour deadline. It was the scheduler deciding to zap a ghost task. The server has automatic ghost recovery on, but ghost recovery really only works for some hosts. For the others, the scheduler won't resend the ghost but sets its deadline to the current time instead, making it expire instantly. Also, this zapped result was part of the initial replication. You can see it from its task ID being only one bigger than the other initial result, and its task name ends in '_1'. It received a new 'Sent' date in a ghost recovery that was successful from the server's point of view, but it ended up as a ghost again, and a new ghost recovery two hours later decided to kill it instead. It's a Windows host, so I guess a trigger-happy antivirus blocked the download and the task got ghosted again and again. |
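The ghost handling Ville describes boils down to a set difference between what the server thinks is on the host and what the host reports in its request. A minimal sketch: the function, the task names and the `resendable` set are hypothetical, since Ville only says that recovery works for some hosts and that, for the rest, the deadline is set to the current time.

```python
def reconcile(server_in_progress, client_reported, now, resendable):
    """Split ghost tasks into those resent and those expired on the spot.

    A ghost is a task the server believes is on the host but that the host
    did not list in its request. `resendable` is a hypothetical set standing
    in for whatever decides that recovery works for a given task and host.
    Returns (resend, expire), where expire maps task name -> new deadline.
    """
    ghosts = sorted(set(server_in_progress) - set(client_reported))
    resend = [t for t in ghosts if t in resendable]
    expire = {t: now for t in ghosts if t not in resendable}   # deadline = now
    return resend, expire


server = ["wu1_0", "wu2_1", "wu3_0"]   # hypothetical task names
client = ["wu1_0"]
print(reconcile(server, client, now=1_587_300_000, resendable={"wu3_0"}))
```

Here `wu2_1` is a ghost that cannot be resent, so its deadline becomes the current time and it expires immediately, matching the pattern in the linked workunit.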
Scrooge McDuck Joined: 26 Nov 99 Posts: 1024 Credit: 1,674,173 RAC: 54 |
Ville Saari, thanks for the detailed explanation! |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"This is obviously some kind of server-side glitch." "You will be aware that the GPUUG has produced private clients which circumvent the server limits, and Juan is a primary exponent of this practice? His machine currently has a daily limit of 68191 GPU tasks: he'll be able to fetch more work now, because the day has rolled over to tomorrow." Yes, I run the controversial client, but the host was not receiving such an amount of new (resend) WUs. I don't have the numbers since I don't follow them, but it can't have received even 100 WUs yesterday. During this past night it received about 700 resends due to the last floodgates season (from the posts, I think all hosts received a large amount too). That's far away from the 68191 daily limit! So what triggers this server glitch is still a mystery to me. But as Grant wisely posted, it's unknown territory; nobody has ever reached such high cache numbers and maintained them for so long. BTW, those resends are crunching now, but as there were so many, they will take a couple of hours to finish, or until I decide to turn on the rest of the GPUs and set them back to run at high clocks. The host is running now at 1/4 impulse. Will look later if necessary, after the welcome early coffee & aspirins. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.