Panic Mode On (110) Server Problems?

juan BFP Volunteer tester Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1926415 - Posted: 24 Mar 2018, 22:15:52 UTC Last modified: 24 Mar 2018, 22:17:13 UTC What is even stranger is what you see if you look at the WU itself: https://setiathome.berkeley.edu/workunit.php?wuid=2912007028
6510415901 8389310 24 Mar 2018, 11:13:54 UTC 6 May 2018, 13:55:14 UTC In progress --- --- --- SETI@home v8 v8.08 (alt) windows_x86_64
6510415902 8396902 24 Mar 2018, 11:13:59 UTC 24 Mar 2018, 11:14:18 UTC Timed out - no response 0.00 0.00 --- SETI@home v8 Anonymous platform (NVIDIA GPU)
They were created at the same time (as expected), but the task sent to host 8389310 has a time limit of 6 May 2018, while the one sent to my host has a time limit of 11:14:18, less than 20 seconds after it was sent! I still don't have a clue why this is happening or how to avoid it in the future. It's a complete waste of DL/UL resources, besides the computer time wasted on it. ID: 1926415 ·

Stargate (SA) Volunteer tester Send message Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0	Message 1926420 - Posted: 24 Mar 2018, 22:31:05 UTC Last modified: 24 Mar 2018, 22:32:34 UTC My log states that, at the time of the download, the server may be down; after a while it states that it gave up trying to download them! Roughly 4 hrs ago.. O_o ID: 1926420 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1926426 - Posted: 24 Mar 2018, 22:51:50 UTC Splitters are back in struggle mode. Grant Darwin NT ID: 1926426 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1926493 - Posted: 25 Mar 2018, 5:44:50 UTC - in response to Message 1926426. Starting to get pretty low in the RTS buffer. Splitters just aren't doing the work. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1926493 ·

Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1926514 - Posted: 25 Mar 2018, 8:48:20 UTC Scheduler requests are not working properly here this morning. If I do a manual request, all looks normal and tasks are sent. BOINC then counts down the normal 5 minutes and goes into infinite back-off. I have no stuck transfers. I thought maybe the change to Daylight Savings Time did it, but a BOINC restart did not help. What is going on ? Humans may rule the world...but bacteria run it... ID: 1926514 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1926515 - Posted: 25 Mar 2018, 8:56:21 UTC - in response to Message 1926514. Last modified: 25 Mar 2018, 8:58:11 UTC Scheduler requests are not working properly here this morning. If I do a manual request, all looks normal and tasks are sent. BOINC then counts down the normal 5 minutes and goes into infinite back-off. I have no stuck transfers. I thought maybe the change to Daylight Savings Time did it, but a BOINC restart did not help. What is going on ? Most likely the usual Application preference & Scheduler weirdness showing its ugly face again. Have you tried a triple update? Edit- the Ready-to-send buffer is very low, but not (yet) empty. So you should be able to get work. Grant Darwin NT ID: 1926515 ·

Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1926518 - Posted: 25 Mar 2018, 9:10:53 UTC - in response to Message 1926515. Scheduler requests are not working properly here this morning. If I do a manual request, all looks normal and tasks are sent. BOINC then counts down the normal 5 minutes and goes into infinite back-off. I have no stuck transfers. I thought maybe the change to Daylight Savings Time did it, but a BOINC restart did not help. What is going on ? Most likely the usual Application preference & Scheduler weirdness showing its ugly face again. Have you tried a triple update? Edit- the Ready-to-send buffer is very low, but not (yet) empty. So you should be able to get work. Yes, I'm getting work, IF I do a manual update request. Haven't tried the triple update...how to ? Humans may rule the world...but bacteria run it... ID: 1926518 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1926519 - Posted: 25 Mar 2018, 9:22:28 UTC - in response to Message 1926518. Last modified: 25 Mar 2018, 9:24:06 UTC Haven't tried the triple update...how to ? OK. Hit Update. You should then get "Scheduler request pending; Requested by user" (or something along those lines). As soon as you get "Scheduler request in progress", hit Update again. This time wait for the request to complete. Then update again. The next automatic Scheduler request should then result in work, and usually the following requests as well. No idea why it works, but if there is work available, and the Scheduler's just being funny about allocating it, that seems to get it working again. I used to have to change work preference application settings to keep the work coming, but Tbar came up with this triple update. If it doesn't get the work flowing again, there's some other issue. If work still isn't regularly forthcoming, we'll have to look further (and what you're describing does sound different to the usual issue of not getting work due to Scheduler weirdness). Grant Darwin NT ID: 1926519 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1926522 - Posted: 25 Mar 2018, 9:52:09 UTC - in response to Message 1926519. we'll have to look further The Event Log (with appropriate debug flags) is always a good place to start looking. ID: 1926522 ·

Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1926524 - Posted: 25 Mar 2018, 10:06:00 UTC - in response to Message 1926519. Last modified: 25 Mar 2018, 10:09:16 UTC Tnx, Grant....alas, that didn't help. BOINC still gives me tasks on manual updates, but then backs off. I'll leave it alone now to find out how long the back offs actually are. Weird...never experienced this before. Edit : Richard : Which flags would you suggest ? I have activated sched-op-debug. Humans may rule the world...but bacteria run it... ID: 1926524 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1926526 - Posted: 25 Mar 2018, 10:25:35 UTC - in response to Message 1926524. Edit : Richard : Which flags would you suggest ? I have activated sched-op-debug. sched_op_debug is an extremely good place to start - it gets activated immediately on any machine I use. It may not be sufficient on its own in this case: for that, I'd let work_fetch_debug run once, and then turn it off again while you decipher the output. I wrote it up in message 1900544 ID: 1926526 ·
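For anyone following along: the two flags Richard names are log flags that the BOINC client reads from the log_flags section of its cc_config.xml. What follows is only a minimal sketch in Python of what such a file could contain. The helper name build_cc_config is made up for illustration, and where the file lives and how the client is told to re-read it depend on your installation, so check before overwriting an existing file.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text that switches on the given log flags.

    Hypothetical helper for illustration; it only builds the XML text
    and does not touch any BOINC installation.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text is 1 (on) or 0 (off).
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


# The two flags mentioned in the thread.
config_xml = build_cc_config(["sched_op_debug", "work_fetch_debug"])
print(config_xml)
```

As Richard suggests, work_fetch_debug is noisy, so it is worth switching it back off once one work fetch cycle has been captured.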

Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1926531 - Posted: 25 Mar 2018, 11:09:44 UTC - in response to Message 1926526. Sched-op-debug doesn't run when there is no update triggered by BOINC or me. work-fetch-debug looks like this when the back-off is active :
25/03/2018 12:46:12 | | [work_fetch] Request work fetch: Backoff ended for SETI@home
25/03/2018 12:46:14 | | [work_fetch] ------- start work fetch state -------
25/03/2018 12:46:14 | | [work_fetch] target work buffer: 43200.00 + 43200.00 sec
25/03/2018 12:46:14 | | [work_fetch] --- project states ---
25/03/2018 12:46:14 | Einstein@Home | [work_fetch] REC 356.206 prio 0.000 can't request work: suspended via Manager
25/03/2018 12:46:14 | SETI@home | [work_fetch] REC 247437.379 prio -1.769 can request work
25/03/2018 12:46:14 | | [work_fetch] --- state for CPU ---
25/03/2018 12:46:14 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 86722.91 busy 0.00
25/03/2018 12:46:14 | Einstein@Home | [work_fetch] share 0.000 zero resource share
25/03/2018 12:46:14 | SETI@home | [work_fetch] share 1.000
25/03/2018 12:46:14 | | [work_fetch] --- state for NVIDIA GPU ---
25/03/2018 12:46:14 | | [work_fetch] shortfall 42290.59 nidle 0.00 saturated 44003.31 busy 0.00
25/03/2018 12:46:14 | Einstein@Home | [work_fetch] share 0.000 zero resource share
25/03/2018 12:46:14 | SETI@home | [work_fetch] share 1.000
25/03/2018 12:46:14 | | [work_fetch] ------- end work fetch state -------
25/03/2018 12:46:14 | | [work_fetch] No project chosen for work fetch
25/03/2018 12:46:52 | | [work_fetch] Request work fetch: application exited
Humans may rule the world...but bacteria run it... ID: 1926531 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1926534 - Posted: 25 Mar 2018, 11:22:21 UTC - in response to Message 1926531. Behaviour by design. For SETI, both CPU (86,722 seconds) and NVidia (44,003 seconds) are 'saturated' above your target work buffer (43,200 seconds). You have as much work as you've asked for. When one of those drops below the target (which should happen quite soon for NVidia), BOINC should request more work. While it's online, it should ask for the 'additional' work you've requested (the second 43,200 seconds). That should keep it busy for the next 12 hours until it needs to request work again. The idea is to get all the work you could possibly need (or at least, what you've asked for) once every 12 hours, so you don't keep pestering the servers - they're busy enough with everybody else. ID: 1926534 ·
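Richard's reading of the log can be checked mechanically. The sketch below is a rough illustration in Python, not how the BOINC client itself decides: it pulls the target buffer and the per-resource "saturated" figures out of work_fetch lines copied from the post above, then lists any resource whose saturated time has dropped below the first buffer figure. The function names parse_work_fetch and below_target are made up, and the rule "below target means a request is due" is taken from the explanation in this thread rather than from the client source.

```python
import re

# Lines copied from the work_fetch log quoted earlier in the thread.
LOG = """\
25/03/2018 12:46:14 | | [work_fetch] target work buffer: 43200.00 + 43200.00 sec
25/03/2018 12:46:14 | | [work_fetch] --- state for CPU ---
25/03/2018 12:46:14 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 86722.91 busy 0.00
25/03/2018 12:46:14 | | [work_fetch] --- state for NVIDIA GPU ---
25/03/2018 12:46:14 | | [work_fetch] shortfall 42290.59 nidle 0.00 saturated 44003.31 busy 0.00
"""


def parse_work_fetch(log_text):
    """Return (min_buffer, extra_buffer, {resource: saturated_seconds})."""
    min_buf = extra_buf = None
    saturated = {}
    resource = None
    for line in log_text.splitlines():
        m = re.search(r"target work buffer: ([\d.]+) \+ ([\d.]+) sec", line)
        if m:
            min_buf, extra_buf = float(m.group(1)), float(m.group(2))
            continue
        m = re.search(r"--- state for (.+?) ---", line)
        if m:
            resource = m.group(1)  # e.g. "CPU" or "NVIDIA GPU"
            continue
        m = re.search(r"saturated ([\d.]+)", line)
        if m and resource is not None:
            saturated[resource] = float(m.group(1))
    return min_buf, extra_buf, saturated


def below_target(min_buf, saturated):
    """Resources whose saturated time is under the first buffer figure."""
    return [name for name, secs in saturated.items() if secs < min_buf]


min_buf, extra_buf, saturated = parse_work_fetch(LOG)
print(saturated)
print(below_target(min_buf, saturated))
```

With the figures from the log, neither resource falls below 43,200 seconds, which matches "No project chosen for work fetch"; the NVIDIA figure is only about 800 seconds above the target.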

Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1926538 - Posted: 25 Mar 2018, 11:38:04 UTC - in response to Message 1926534. Behaviour by design. For SETI, both CPU (86,722 seconds) and NVidia (44,003 seconds) are 'saturated' above your target work buffer (43,200 seconds). You have as much work as you've asked for. When one of those drops below the target (which should happen quite soon for NVidia), BOINC should request more work. While it's online, it should ask for the 'additional' work you've requested (the second 43,200 seconds). That should keep it busy for the next 12 hours until it needs to request work again. The idea is to get all the work you could possibly need (or at least, what you've asked for) once every 12 hours, so you don't keep pestering the servers - they're busy enough with everybody else. I do understand that. And I didn't account for the AP WUs, thinking I WAS below target. Also, normally there WILL be a work request when the counter runs out, even if cache is full. In such cases I get a "cache full" message from BOINC. This is how it has worked before, so why not now ? Humans may rule the world...but bacteria run it... ID: 1926538 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1926543 - Posted: 25 Mar 2018, 12:04:46 UTC - in response to Message 1926538. Next time it happens, run 'work_fetch_debug' again. The answer will be in there somewhere, though it sometimes takes some hard looking before you see it. ID: 1926543 ·

Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1926545 - Posted: 25 Mar 2018, 12:43:13 UTC - in response to Message 1926543. Next time it happens, run 'work_fetch_debug' again. The answer will be in there somewhere, though it sometimes takes some hard looking before you see it. This is getting a bit confusing. When my cache is "officially" full, I still get tasks if I do a manual update. At the moment it is not full (155+32 AP), returned completed tasks, and still no updates at the 5 minute mark... ;-) Humans may rule the world...but bacteria run it... ID: 1926545 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1926642 - Posted: 26 Mar 2018, 2:27:04 UTC - in response to Message 1926545. This is getting a bit confusing. When my cache is "officially" full, I still get tasks if I do a manual update. At the moment it is not full (155+32 AP), returned completed tasks, and still no updates at the 5 minute mark... ;-) How do you have your cache set? Store at least xx days of work Store up to an additional x.x days of work If you run a 4 day cache, and have "Store at least xx days of work" set to 4, and "Store up to an additional x.x days of work" set to something like 0.05, then as you complete work, BOINC will ask for more. If, due to processing speed, you can't actually get a full cache with the 100 WU server-side limits, then it will ask for work every 5 (and a bit) minutes even if you haven't completed any WUs in that time. If you have "Store at least xx days of work" set to 2, and "Store up to an additional x.x days of work" set to 2, then the cache will tend to run down to around 2 days' worth, then refill the extra 2 days in one (or several) goes. Grant Darwin NT ID: 1926642 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1926671 - Posted: 26 Mar 2018, 8:24:41 UTC - in response to Message 1926642. How do you have your cache set? He posted that in the work fetch log: 43200 + 43200, or 0.5 days + 0.5 days. If he doesn't understand what's happening, he needs to post that again, with the cycles before, during, and after the fetch. All this speculation and guesswork is useless. Set the logs, and read the logs. The logs will also show the backoffs that will prevent BOINC behaving the way Grant describes. ID: 1926671 ·

Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1926683 - Posted: 26 Mar 2018, 9:35:44 UTC - in response to Message 1926671. Settings are : Store 1.1 days of work + an additional 0.5 days. At the moment I have 92 CPU tasks in progress, 92 GPU tasks and 9 AP tasks. The reason I asked this question is that everything has been working perfectly without change to these settings. I simply got curious when things suddenly started behaving differently. Obviously, I don't know enough about the way BOINC works, so I'll just leave it alone. But tnx for trying to help :-) Humans may rule the world...but bacteria run it... ID: 1926683 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1926690 - Posted: 26 Mar 2018, 10:18:46 UTC - in response to Message 1926683. Settings are : Store 1.1 days of work + an additional 0.5 days. Richard says that your log shows it as : 43200 + 43200, or 0.5 days + 0.5 days. Wild speculation- for whatever reason, the Manager carrying a smaller cache than you had set previously (0.5 not 1.1), combined with a higher than usual number of AP WUs being about, could be responsible for the present work fetch behavior. Grant Darwin NT ID: 1926690 ·
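The mismatch Grant points out is easy to see once both sets of figures are in the same unit. A small sketch in Python, using only numbers quoted in the thread; the helper names are made up for illustration, and nothing here explains why the client is using the smaller value.

```python
SECONDS_PER_DAY = 86400


def days_to_seconds(days):
    """Convert a cache setting in days to whole seconds."""
    return round(days * SECONDS_PER_DAY)


def seconds_to_days(seconds):
    """Convert a work_fetch log figure in seconds to days."""
    return seconds / SECONDS_PER_DAY


# Settings as reported in the thread: 1.1 days + 0.5 days.
reported = (days_to_seconds(1.1), days_to_seconds(0.5))

# Figures from the "target work buffer" log line: 43200 + 43200 sec.
logged = (43200, 43200)

print(reported)                                   # settings, in seconds
print(tuple(seconds_to_days(s) for s in logged))  # log values, in days
print(reported[0] - logged[0])                    # gap in the first figure
```

The "additional" figure agrees (0.5 days is 43,200 seconds), but the first figure differs by 51,840 seconds, so the client appears to be working from a 0.5 day setting rather than the 1.1 days reported.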
