The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)


juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045929 - Posted: 20 Apr 2020, 3:02:36 UTC

This is new: I've never seen an upper limit on what a host could do per day.

Sun 19 Apr 2020 08:23:03 PM EST | SETI@home | Sending scheduler request: To report completed tasks.
Sun 19 Apr 2020 08:23:03 PM EST | SETI@home | Reporting 66 completed tasks
Sun 19 Apr 2020 08:23:03 PM EST | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | Scheduler request completed: got 0 new tasks
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | No tasks sent
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | No tasks are available for SETI@home v8
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has reached a limit on tasks in progress
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | Project requested delay of 1818 seconds

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2045930 - Posted: 20 Apr 2020, 3:20:59 UTC - in response to Message 2045929.  
Last modified: 20 Apr 2020, 3:21:57 UTC

This is new: I've never seen an upper limit on what a host could do per day.
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks
Probably the first time anyone has been able to actually hit it.
67344 is a rather odd sort of value.


And I finally managed to pick up another 179 to crunch. Will keep the GPUs occupied for a while.
Grant
Darwin NT
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045935 - Posted: 20 Apr 2020, 3:40:06 UTC - in response to Message 2045930.  

This is new: I've never seen an upper limit on what a host could do per day.
Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks
Probably the first time anyone has been able to actually hit it.
67344 is a rather odd sort of value.

A new record broken. I'll drink to that now! LOL
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2045945 - Posted: 20 Apr 2020, 5:52:36 UTC

Been picking up work with each request for a couple of hours now.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13755
Credit: 208,696,464
RAC: 304
Australia
Message 2045948 - Posted: 20 Apr 2020, 6:05:50 UTC - in response to Message 2045945.  
Last modified: 20 Apr 2020, 6:06:45 UTC

Been picking up work with each request for a couple of hours now.
And another batch. I've been hitting the server-side limits for a while now. I'm wondering if the script is broken: heaps of sustained work coming through, not just little blips here & there.
Grant
Darwin NT
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2045951 - Posted: 20 Apr 2020, 6:54:42 UTC - in response to Message 2045945.  

Been picking up work with each request for a couple of hours now.

Me too, on both CPU and GPU.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045952 - Posted: 20 Apr 2020, 7:06:09 UTC - in response to Message 2045907.  

Instead, the hosts that will have the bigger problems must be the slow ones that use the standard 150-WU cache with the 10+10 day configuration. A large number of them are left without any user interaction (just set & go); those hosts could have serious problems with this reduction.
The main source of the problems here is that the boinc client really doesn't seem to care about the deadlines at all when it requests work. If you have a 10+10 day configuration and the scheduler suddenly starts giving you work with a 14 day deadline, it will happily buffer 20 days of those 14 day jobs. Such hosts would automatically abort some of the tasks because they missed their deadline before starting, and return all the rest after the deadline because their deadline passed during crunching. This means that all their tasks generate resends, so the deadline reduction would have completely the opposite effect to what was intended.
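That over-buffering point can be sketched with a toy calculation (the 20-day cache and 14-day deadline are the numbers from the post above; this is my own simplification, not anything from the BOINC source):

```python
# Toy illustration of the over-buffering problem: a client with a
# 10+10 day cache buffers 20 days of work, so with a 14-day deadline
# everything queued beyond day 14 can only miss its deadline.
cache_days = 10 + 10     # "store at least" + "store up to additional" days
deadline_days = 14       # the hypothetical shortened deadline

late_days = max(0, cache_days - deadline_days)
late_fraction = late_days / cache_days
print(f"{late_fraction:.0%} of the buffered work will miss its deadline")
# prints "30% of the buffered work will miss its deadline"
```

And every one of those late tasks comes back as a resend, which is exactly the load the shorter deadline was supposed to reduce.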
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045953 - Posted: 20 Apr 2020, 7:12:47 UTC - in response to Message 2045929.  

Sun 19 Apr 2020 08:23:06 PM EST | SETI@home | This computer has finished a daily quota of 67344 tasks
This is obviously some kind of server-side glitch. The biggest batches of tasks a single scheduler request can get seem to be somewhere around 330 tasks. If we assume the scheduler buffer size to be 350, then the maximum a host can get in a day with an 1818-second request cooldown is less than 17000 tasks.
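For anyone checking the arithmetic, a quick sketch (the 350-task buffer is an assumption; the 1818-second delay comes from the log above):

```python
# Upper bound on tasks/day under normal scheduler behaviour:
# one request every 1818 s, at most ~350 tasks per request (assumed).
SECONDS_PER_DAY = 86400
REQUEST_COOLDOWN = 1818      # "Project requested delay of 1818 seconds"
BUFFER_PER_REQUEST = 350     # assumed scheduler buffer size

requests_per_day = SECONDS_PER_DAY // REQUEST_COOLDOWN
max_tasks_per_day = requests_per_day * BUFFER_PER_REQUEST
print(requests_per_day, max_tasks_per_day)  # prints "47 16450"
```

So roughly 16450 tasks/day at best, well under the 67344 quota the host reportedly hit.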
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045954 - Posted: 20 Apr 2020, 7:16:20 UTC - in response to Message 2045952.  

Have you tested that on a project which already has 7 or 14 day deadlines? You can try it at Einstein, which allows you to see the server log of the work request. The server makes a parallel assessment of how long the work being allocated is going to take to run, based on the speed/usage data reported by your client. That's how it knows how many tasks to send - assuming you're requesting a timed amount of work, rather than asking for the universe and relying on the server limits to keep you sane.

I've never pushed it to the limit - it would be an interesting experiment to try some day - but I believe the server is aware of its own limits and won't send work which it's impossible to complete before deadline.
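That server-side check could look roughly like this (my own simplification with invented names, not the real scheduler code):

```python
# Rough sketch of the sanity check described above: the server only
# sends as many tasks as the host can finish before the deadline,
# given the work it already has queued. All names are invented.
def max_sendable(est_runtime_s, deadline_s, already_queued_s):
    """How many tasks of est_runtime_s fit before the deadline,
    after the host works through what it already has queued?"""
    free_s = max(0, deadline_s - already_queued_s)
    return free_s // est_runtime_s

# e.g. 7-day deadline, 1-hour tasks, 5 days of work already queued:
print(max_sendable(3600, 7 * 86400, 5 * 86400))  # prints "48"
```

A real scheduler would also account for core count and on-fraction, but the principle is the same: past a certain queue depth, the sendable count drops to zero.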
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045956 - Posted: 20 Apr 2020, 7:22:17 UTC - in response to Message 2045953.  

This is obviously some kind of server-side glitch.
You will be aware that the GPUUG has produced private clients which circumvent the server limits, and Juan is a primary exponent of this practice? His machine currently has a daily limit of 68191 GPU tasks per day: he'll be able to fetch more work now, because the day has rolled over to tomorrow.
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 2045957 - Posted: 20 Apr 2020, 7:28:16 UTC - in response to Message 2045954.  

I've never pushed it to the limit - it would be an interesting experiment to try some day - but I believe the server is aware of its own limits and won't send work which it's impossible to complete before deadline.

Yep, had that many years ago because of a way too high DCF (caused by non-BOINC applications); the answer from the server was something like "No tasks sent, won't finish before deadline".
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045966 - Posted: 20 Apr 2020, 8:46:53 UTC

My slower host's very slow CPU has received more AP work than it can crunch before the deadline on several occasions, forcing me to crunch some of the tasks with the GPU to speed things up. Which has only worsened the problem by making the CPU appear faster.
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 2045968 - Posted: 20 Apr 2020, 8:56:12 UTC - in response to Message 2045966.  
Last modified: 20 Apr 2020, 9:01:00 UTC

Let me guess: your host isn't sending the list of the tasks it has? Or how can it get over 17k in progress tasks for 1 CPU + 2 GPUs?
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045973 - Posted: 20 Apr 2020, 9:25:54 UTC - in response to Message 2045968.  

Let me guess: your host isn't sending the list of the tasks it has? Or how can it get over 17k in progress tasks for 1 CPU + 2 GPUs?
It is sending them. Not doing that would make the scheduler insta-expire all my tasks as ghosts.

I would rather not send them because listing all those tasks as verbose xml makes the scheduler request huge and slow and more likely to fail when the server is heavily loaded :(
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045976 - Posted: 20 Apr 2020, 10:01:20 UTC - in response to Message 2045966.  

My slower host's very slow CPU has received more AP work than it can crunch before the deadline on several occasions, forcing me to crunch some of the tasks with the GPU to speed things up. Which has only worsened the problem by making the CPU appear faster.
I don't think we should decide project-wide policies like deadlines on the basis of your, very particular, special case. Your machines are clearly under firm personal control, and you will be able to adapt to any 'new normal' as and when it's implemented.

The project should be more concerned by 'grab and run' one-time users, who download a cache and are never heard from again - thus locking up those tasks for seven weeks or more.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045979 - Posted: 20 Apr 2020, 10:23:33 UTC - in response to Message 2045976.  

I don't think we should decide project-wide policies like deadlines on the basis of your, very particular, special case. Your machines are clearly under firm personal control, and you will be able to adapt to any 'new normal' as and when it's implemented.
I used it as an example of what can happen with a long cache setting. There's no reason for this problem to be specific to my host, and there certainly are a lot of hosts running on 'autopilot' with long caches.

The project should be more concerned by 'grab and run' one-time users, who download a cache and are never heard from again - thus locking up those tasks for seven weeks or more.
At this point those users are very unlikely to be an issue any more. The project is officially closed, so there is no constant stream of new users, and people who use BOINC to stress test their overclocking setups go to the projects that hand out new work reliably.
Scrooge McDuck
Joined: 26 Nov 99
Posts: 722
Credit: 1,674,173
RAC: 54
Germany
Message 2045986 - Posted: 20 Apr 2020, 11:02:19 UTC - in response to Message 2045979.  
Last modified: 20 Apr 2020, 11:18:40 UTC

At this point those users are very unlikely to be an issue any more. The project is officially closed, so there is no constant stream of new users, and people who use BOINC to stress test their overclocking setups go to the projects that hand out new work reliably.

You are right and (some) exceptions prove that rule!
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8629071 Host is active - at least BOINC is contacting the servers regularly. All tasks (or most - can't tell, since they are purged immediately) are timing out: huge cache, five-year-old low-cost CPU, no GPU. And it got lots of resends today.

The scheduler seems to shorten the deadline of tasks assigned to such overloaded hosts to two hours 10 minutes? I don't know if the deadline was set initially or somehow shortened later on.
example: https://setiathome.berkeley.edu/workunit.php?wuid=3869091667
Initially in February, there was only one task replicated? Strange.

I haven't seen such behaviour (2 hour deadlines) before.
[edited: purging behaviour]
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045988 - Posted: 20 Apr 2020, 11:17:20 UTC - in response to Message 2045986.  

The scheduler seems to shorten the deadline of tasks assigned to such overloaded hosts to two hours 10 minutes? I don't know if the deadline was set initially or somehow shortened later on.
example: https://setiathome.berkeley.edu/workunit.php?wuid=3869091667
Initially in February, there was only one task replicated? Strange.

I haven't seen such behaviour (2 hour deadlines) before.
It wasn't a 2 hour deadline. It was the scheduler deciding to zap a ghost task. The server has automatic ghost recovery on, but the ghost recovery really only works for some hosts. For the others the scheduler won't resend the ghost but sets its deadline to the current time instead, making it expire instantly.

Also, this zapped result was part of the initial replication. You can see it from its task id being only one bigger than the other initial result's, and from its task name ending in '_1'. It received a new 'Sent' date in a ghost recovery that was successful from the server's point of view, but it ended up as a ghost again, and a new ghost recovery two hours later decided to kill it instead. It's a Windows host, so I guess a trigger-happy antivirus blocked the download and the task got ghosted again and again.
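The two possible outcomes of a ghost-recovery pass, as described above, could be sketched like this (purely illustrative: the function and field names are invented, and this is not actual BOINC scheduler code):

```python
import time

# Illustrative sketch of the ghost-recovery decision: either the ghost
# task is resent (fresh 'Sent' date), or the scheduler "zaps" it by
# setting its deadline to the current time, so it expires instantly.
def recover_ghost(task, now=None):
    now = now if now is not None else time.time()
    if task.get("resend_works", False):
        task["sent_time"] = now      # successful recovery: new 'Sent' date
        task["status"] = "resent"
    else:
        task["deadline"] = now       # the zap: deadline == current time
        task["status"] = "expired"
    return task

ghost = {"name": "example_task_1", "resend_works": False}  # invented names
print(recover_ghost(ghost)["status"])  # prints "expired"
```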
Scrooge McDuck
Joined: 26 Nov 99
Posts: 722
Credit: 1,674,173
RAC: 54
Germany
Message 2045989 - Posted: 20 Apr 2020, 11:24:54 UTC - in response to Message 2045988.  

Ville Saari, thanks for the detailed explanation!
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045995 - Posted: 20 Apr 2020, 12:23:16 UTC - in response to Message 2045956.  
Last modified: 20 Apr 2020, 12:57:35 UTC

This is obviously some kind of server-side glitch.
You will be aware that the GPUUG has produced private clients which circumvent the server limits, and Juan is a primary exponent of this practice? His machine currently has a daily limit of 68191 GPU tasks per day: he'll be able to fetch more work now, because the day has rolled over to tomorrow.

Yes, I run the controversial client, but the host was not receiving that kind of volume of new (resend) WUs. I don't have the exact numbers since I don't track them, but it can't have received even 100 WUs yesterday. During this past night it received about 700 resends from the latest floodgates session (from the posts, I think all hosts received a large amount too). Still far away from the 68191 daily limit!
So what triggers this server glitch is still a mystery to me. But as Grant wisely posted, it's unknown territory: nobody has ever reached such a high cache and maintained it for so long.

BTW, those resends are crunching now, but as there were so many, they will take a couple of hours to finish, unless I decide to turn on the rest of the GPUs and throttle them back up to high clocks. The host is running at 1/4 impulse now. I'll see if that's necessary later, after the welcome early coffee & aspirins.
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.