Panic Mode On (110) Server Problems?

Message boards : Number crunching : Panic Mode On (110) Server Problems?
juan BFP Crowdfunding Project Donor* Special Project $75 donor Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1926415 - Posted: 24 Mar 2018, 22:15:52 UTC
Last modified: 24 Mar 2018, 22:17:13 UTC

What is even stranger is what you see if you look at the WU itself: https://setiathome.berkeley.edu/workunit.php?wuid=2912007028

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
6510415901 | 8389310 | 24 Mar 2018, 11:13:54 UTC | 6 May 2018, 13:55:14 UTC | In progress | --- | --- | --- | SETI@home v8 v8.08 (alt) windows_x86_64
6510415902 | 8396902 | 24 Mar 2018, 11:13:59 UTC | 24 Mar 2018, 11:14:18 UTC | Timed out - no response | 0.00 | 0.00 | --- | SETI@home v8 Anonymous platform (NVIDIA GPU)


They were created at the same time (as expected), but the task sent to host 8389310 has a time limit of 6 May 2018, while the one sent to my host has a time limit of 11:14:18, less than 20 seconds after it was sent!

I still don't have any clue why that is happening, or how to avoid it in the future. It's a complete waste of DL/UL resources, besides the computer time wasted on it.
ID: 1926415
Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 1926420 - Posted: 24 Mar 2018, 22:31:05 UTC
Last modified: 24 Mar 2018, 22:32:34 UTC

My log states that, at the time of the download, the server may have been down; after a while it states that it gave up trying to download them! That was roughly 4 hrs ago.. O_o
ID: 1926420
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1926426 - Posted: 24 Mar 2018, 22:51:50 UTC

Splitters are back in struggle mode.
Grant
Darwin NT
ID: 1926426
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1926493 - Posted: 25 Mar 2018, 5:44:50 UTC - in response to Message 1926426.  

Starting to get pretty low in the RTS buffer. Splitters just aren't doing the work.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1926493
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1926514 - Posted: 25 Mar 2018, 8:48:20 UTC

Scheduler requests are not working properly here this morning.
If I do a manual request, all looks normal and tasks are sent. BOINC then counts down the normal 5 minutes and goes into infinite back-off.
I have no stuck transfers.
I thought maybe the change to Daylight Saving Time did it, but a BOINC restart did not help.
What is going on?
Humans may rule the world...but bacteria run it...
ID: 1926514
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1926515 - Posted: 25 Mar 2018, 8:56:21 UTC - in response to Message 1926514.  
Last modified: 25 Mar 2018, 8:58:11 UTC

Most likely the usual Application preference & Scheduler weirdness showing its ugly face again.

Have you tried a triple update?

Edit: the Ready-to-send buffer is very low, but not (yet) empty, so you should be able to get work.
Grant
Darwin NT
ID: 1926515
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1926518 - Posted: 25 Mar 2018, 9:10:53 UTC - in response to Message 1926515.  

Yes, I'm getting work, IF I do a manual update request.
Haven't tried the triple update...how to ?
Humans may rule the world...but bacteria run it...
ID: 1926518
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1926519 - Posted: 25 Mar 2018, 9:22:28 UTC - in response to Message 1926518.  
Last modified: 25 Mar 2018, 9:24:06 UTC

Haven't tried the triple update...how to ?

OK.
Hit Update. You should then get "Scheduler request pending; Requested by user" (or something along those lines)
As soon as you get "Scheduler request in progress", hit update again.
This time wait for the request to complete.
Then update again.

The next automatic Scheduler request should then result in work, and usually the following requests as well. No idea why it works, but if there is work available and the Scheduler's just being funny about allocating it, that seems to get it working again.
I used to have to change work preference application settings to keep the work coming, but Tbar came up with this triple update. If it doesn't get the work flowing again, there's some other issue.

If work still isn't regularly forthcoming, we'll have to look further (and what you're describing does sound different to the usual issue of not getting work due to Scheduler weirdness).
Grant
Darwin NT
ID: 1926519
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1926522 - Posted: 25 Mar 2018, 9:52:09 UTC - in response to Message 1926519.  

we'll have to look further
The Event Log (with appropriate debug flags) is always a good place to start looking.
ID: 1926522
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1926524 - Posted: 25 Mar 2018, 10:06:00 UTC - in response to Message 1926519.  
Last modified: 25 Mar 2018, 10:09:16 UTC

Tnx, Grant... alas, that didn't help. BOINC still gives me tasks on manual updates, but then backs off.
I'll leave it alone now to find out how long the back-offs actually are.
Weird... never experienced this before.

Edit: Richard: Which flags would you suggest? I have activated sched_op_debug.
Humans may rule the world...but bacteria run it...
ID: 1926524
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1926526 - Posted: 25 Mar 2018, 10:25:35 UTC - in response to Message 1926524.  

Edit: Richard: Which flags would you suggest? I have activated sched_op_debug.
sched_op_debug is an extremely good place to start - it gets activated immediately on any machine I use. It may not be sufficient on its own in this case: for that, I'd let work_fetch_debug run once, and then turn it off again while you decipher the output.

I wrote it up in message 1900544
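
For reference, both flags named above are set in the <log_flags> section of the BOINC client's cc_config.xml. What follows is only a minimal sketch that builds such a file as text; the helper name build_cc_config is made up for illustration, and it assumes you then save the result as cc_config.xml in your BOINC data directory and have the client re-read its config files (or restart it).

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each log flag set to 1 (on) or 0 (off).

    `flags` maps a log flag name to a bool. The helper itself is hypothetical;
    only the flag names come from the discussion above.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name, enabled in flags.items():
        ET.SubElement(log_flags, name).text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


# Turn on both flags mentioned in the thread.
config_text = build_cc_config({"sched_op_debug": True, "work_fetch_debug": True})
print(config_text)
```

Since work_fetch_debug is verbose, the same helper can be called again with that flag set to False once one fetch cycle has been captured.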
ID: 1926526
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1926531 - Posted: 25 Mar 2018, 11:09:44 UTC - in response to Message 1926526.  

sched_op_debug produces no output when no update is triggered by BOINC or me.
work_fetch_debug looks like this when the back-off is active:

25/03/2018 12:46:12 | | [work_fetch] Request work fetch: Backoff ended for SETI@home
25/03/2018 12:46:14 | | [work_fetch] ------- start work fetch state -------
25/03/2018 12:46:14 | | [work_fetch] target work buffer: 43200.00 + 43200.00 sec
25/03/2018 12:46:14 | | [work_fetch] --- project states ---
25/03/2018 12:46:14 | Einstein@Home | [work_fetch] REC 356.206 prio 0.000 can't request work: suspended via Manager
25/03/2018 12:46:14 | SETI@home | [work_fetch] REC 247437.379 prio -1.769 can request work
25/03/2018 12:46:14 | | [work_fetch] --- state for CPU ---
25/03/2018 12:46:14 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 86722.91 busy 0.00
25/03/2018 12:46:14 | Einstein@Home | [work_fetch] share 0.000 zero resource share
25/03/2018 12:46:14 | SETI@home | [work_fetch] share 1.000
25/03/2018 12:46:14 | | [work_fetch] --- state for NVIDIA GPU ---
25/03/2018 12:46:14 | | [work_fetch] shortfall 42290.59 nidle 0.00 saturated 44003.31 busy 0.00
25/03/2018 12:46:14 | Einstein@Home | [work_fetch] share 0.000 zero resource share
25/03/2018 12:46:14 | SETI@home | [work_fetch] share 1.000
25/03/2018 12:46:14 | | [work_fetch] ------- end work fetch state -------
25/03/2018 12:46:14 | | [work_fetch] No project chosen for work fetch
25/03/2018 12:46:52 | | [work_fetch] Request work fetch: application exited
Humans may rule the world...but bacteria run it...
ID: 1926531
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1926534 - Posted: 25 Mar 2018, 11:22:21 UTC - in response to Message 1926531.  

Behaviour by design. For SETI, both CPU (86,722 seconds) and NVidia (44,003 seconds) are 'saturated' above your target work buffer (43,200 seconds). You have as much work as you've asked for.

When one of those drops below the target (which should happen quite soon for NVidia), BOINC should request more work. While it's online, it should ask for the 'additional' work you've requested (the second 43,200 seconds). That should keep it busy for the next 12 hours until it needs to request work again.

The idea is to get all the work you could possibly need (or at least, what you've asked for) once every 12 hours, so you don't keep pestering the servers - they're busy enough with everybody else.
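
The comparison described above can be checked mechanically. This is a rough sketch only: it pulls the target buffer and the per-resource 'saturated' figures out of the work_fetch lines posted earlier and flags which resources sit at or above the target. The function name and the regular expressions are assumptions fitted to those sample lines, and the "saturated >= target" test is a simplification of the explanation in this post, not the client's actual work-fetch algorithm.

```python
import re

# Excerpt of the work_fetch lines posted earlier in the thread.
LOG = """\
25/03/2018 12:46:14 | | [work_fetch] target work buffer: 43200.00 + 43200.00 sec
25/03/2018 12:46:14 | | [work_fetch] --- state for CPU ---
25/03/2018 12:46:14 | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 86722.91 busy 0.00
25/03/2018 12:46:14 | | [work_fetch] --- state for NVIDIA GPU ---
25/03/2018 12:46:14 | | [work_fetch] shortfall 42290.59 nidle 0.00 saturated 44003.31 busy 0.00
"""


def saturation_report(log_text):
    """Return (target_seconds, {resource: (saturated_seconds, at_or_above_target)})."""
    target = None
    resource = None
    report = {}
    for line in log_text.splitlines():
        m = re.search(r"target work buffer: ([\d.]+) \+ ([\d.]+) sec", line)
        if m:
            target = float(m.group(1))  # first figure: the minimum buffer
            continue
        m = re.search(r"--- state for (.+?) ---", line)
        if m:
            resource = m.group(1)  # e.g. "CPU" or "NVIDIA GPU"
            continue
        m = re.search(r"saturated ([\d.]+)", line)
        if m and resource is not None and target is not None:
            saturated = float(m.group(1))
            report[resource] = (saturated, saturated >= target)
    return target, report


target, report = saturation_report(LOG)
for name, (saturated, full) in report.items():
    status = "at or above target" if full else "below target"
    print(f"{name}: saturated {saturated:.0f} s vs target {target:.0f} s ({status})")
```

On the posted excerpt both CPU and NVIDIA GPU come out at or above the 43,200 second target, which matches the reading given in this post.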
ID: 1926534
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1926538 - Posted: 25 Mar 2018, 11:38:04 UTC - in response to Message 1926534.  

I do understand that. And I didn't account for the AP WUs, thinking I WAS below target.
Also, normally there WILL be a work request when the counter runs out, even if the cache is full. In such cases I get a "cache full" message from BOINC. This is how it has worked before, so why not now?
Humans may rule the world...but bacteria run it...
ID: 1926538
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1926543 - Posted: 25 Mar 2018, 12:04:46 UTC - in response to Message 1926538.  

Next time it happens, run 'work_fetch_debug' again. The answer will be in there somewhere, though it sometimes takes some hard looking before you see it.
ID: 1926543
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1926545 - Posted: 25 Mar 2018, 12:43:13 UTC - in response to Message 1926543.  

This is getting a bit confusing. When my cache is "officially" full, I still get tasks if I do a manual update.
At the moment it is not full (155+32 AP), returned completed tasks, and still no updates at the 5 minute mark... ;-)
Humans may rule the world...but bacteria run it...
ID: 1926545
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1926642 - Posted: 26 Mar 2018, 2:27:04 UTC - in response to Message 1926545.  

How do you have your cache set?
Store at least xx days of work
Store up to an additional x.x days of work
If you run a 4 day cache, with "Store at least xx days of work" set to 4 and "Store up to an additional x.x days of work" set to something like 0.05, then BOINC will ask for more work as you complete it. If, due to processing speed, you can't actually get a full cache under the 100 WU server-side limits, then it will ask for work every 5 (and a bit) minutes even if you haven't completed any WUs in that time.
If you have "Store at least xx days of work" set to 2, and "Store up to an additional x.x days of work" set to 2, then the cache will tend to run down to around 2 days' worth, then refill the extra 2 days in one (or several) goes.
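
The difference between those two settings can be pictured with a toy model. This is purely illustrative: it assumes work drains at a fixed rate and that a single fetch tops the cache up to "at least" plus "additional" as soon as the level falls below "at least". It ignores server-side limits, back-offs and everything else the real client does, and the function name and parameters are invented for the sketch.

```python
def simulate_cache(min_days, extra_days, burn_per_step, steps, start_days):
    """Toy model of cache level over time, in days of work.

    Each step burns `burn_per_step` days of work; once the level falls
    below `min_days`, one fetch refills it to `min_days + extra_days`.
    """
    level = start_days
    history = []
    for _ in range(steps):
        level = max(0.0, level - burn_per_step)
        if level < min_days:
            level = min_days + extra_days  # refill in one go
        history.append(round(level, 2))
    return history


# 2 days + 2 additional: the cache runs down, then refills in a big jump.
wide = simulate_cache(2.0, 2.0, 0.5, 8, 4.0)
# 4 days + 0.05 additional: the cache is topped up at every step.
narrow = simulate_cache(4.0, 0.05, 0.5, 8, 4.05)
print(wide)
print(narrow)
```

In this model the 2 + 2 setting produces a sawtooth with only an occasional fetch, while the 4 + 0.05 setting triggers a fetch on every step.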
Grant
Darwin NT
ID: 1926642
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1926671 - Posted: 26 Mar 2018, 8:24:41 UTC - in response to Message 1926642.  

How do you have your cache set?
He posted that in the work fetch log: 43200 + 43200, or 0.5 days + 0.5 days.

If he doesn't understand what's happening, he needs to post that again, with the cycles before, during, and after the fetch. All this speculation and guesswork is useless. Set the logs, and read the logs.

The logs will also show the backoffs that will prevent BOINC behaving the way Grant describes.
ID: 1926671
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1926683 - Posted: 26 Mar 2018, 9:35:44 UTC - in response to Message 1926671.  

Settings are : Store 1.1 days of work + an additional 0.5 days.
At the moment I have 92 CPU tasks in progress, 92 GPU tasks and 9 AP tasks.

The reason I asked this question is that everything has been working perfectly without change to these settings.
I simply got curious when things suddenly started behaving differently.
Obviously, I don't know enough about the way BOINC works, so I'll just leave it alone.
But tnx for trying to help :-)
Humans may rule the world...but bacteria run it...
ID: 1926683
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1926690 - Posted: 26 Mar 2018, 10:18:46 UTC - in response to Message 1926683.  

Settings are : Store 1.1 days of work + an additional 0.5 days.

Richard says that your log shows it as : 43200 + 43200, or 0.5 days + 0.5 days.

Wild speculation: for whatever reason the Manager is carrying a smaller cache than you previously set (0.5 days, not 1.1). That, combined with a higher than usual number of AP WUs being about, could be responsible for the present work fetch behavior.
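
The mismatch is easy to confirm with a little arithmetic. A small sketch, assuming only that the log reports the buffer in seconds and the preferences are entered in days (the helper names are made up for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def days_to_seconds(days):
    """Convert a cache setting in days to seconds, as the log reports it."""
    return days * SECONDS_PER_DAY


def seconds_to_days(seconds):
    """Convert a logged buffer size in seconds back to days."""
    return seconds / SECONDS_PER_DAY


# What the log shows for each half of the buffer, converted to days.
logged_days = seconds_to_days(43200.0)
# What a 1.1 day "store at least" setting would look like in the log.
expected_seconds = round(days_to_seconds(1.1))

print(logged_days)
print(expected_seconds)
```

So a client that was really using a 1.1 day minimum would be expected to log a first figure of about 95,040 seconds rather than 43,200.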
Grant
Darwin NT
ID: 1926690
 