Panic Mode On (54) Server problems?

Message boards : Number crunching : Panic Mode On (54) Server problems?

Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1153830 - Posted: 19 Sep 2011, 10:41:47 UTC - in response to Message 1153829.  

Indeed, it looks as if the scheduler is assigning work again. The cricket graph is back to maxed out on the download side; however, uploads still aren't going through. I believe you are the first to report receiving AP tasks. Perhaps someone at the lab is having an early morning, or maybe a late night ;)
ID: 1153830 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1153831 - Posted: 19 Sep 2011, 10:51:23 UTC - in response to Message 1153829.  

I am in the process of downloading 6 AP WUs with completion times of 10x. I thought AP downloads were turned off.

I'm beginning to wonder if it might be working like this:

BOINC servers aren't supposed to send out work if it can't be completed in time. The servers know, obviously, what work has been allocated to a host: and they also know what work the host knows about (otherwise 'resend lost results' wouldn't work).

What if (and I'm guessing blind here): you have something approaching a 10-day cache. The server is suddenly thinking your host is ten times slower than it really is (because of the APR cap). That would make the server think your 10-day cache is going to take you 100 days. That would take you beyond deadline, so there wouldn't be any point in sending you work that (it thinks) you wouldn't finish in time.

But once your (real) cache has dropped below 2.5 days, then the server's (inflated) estimate of your cache size would come down to 25 days - and you'd start to be capable of completing extra work before deadline.

It's a theory, anyway.
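
To make the arithmetic in that theory concrete, here is a minimal Python sketch. It is not BOINC's scheduler code: the function names, the 10x slowdown factor and the 25-day deadline are assumptions chosen only to match the numbers in the post.

```python
def server_estimated_days(real_cache_days: float, slowdown_factor: float) -> float:
    """Days the server would believe the host needs to clear its cache."""
    return real_cache_days * slowdown_factor


def would_send_work(real_cache_days: float,
                    slowdown_factor: float = 10.0,   # assumed effect of the APR cap
                    deadline_days: float = 25.0) -> bool:  # assumed deadline
    """True if the inflated estimate still leaves room before the deadline."""
    return server_estimated_days(real_cache_days, slowdown_factor) < deadline_days


for days in (10.0, 5.0, 2.5, 2.0):
    est = server_estimated_days(days, 10.0)
    print(f"real cache {days} d -> server sees {est} d, send work: {would_send_work(days)}")
```

Under these assumptions a host only becomes eligible for new work once its real cache falls below 2.5 days, which is the behaviour the theory predicts.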
ID: 1153831 · Report as offensive
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1153833 - Posted: 19 Sep 2011, 10:58:09 UTC - in response to Message 1153831.  

Makes sense to me.

Geoff, what are the estimated times for those tasks? And what is your current DCF?

I know my Q8300 machine does AP tasks with the Lunatics app in ~12 hours. With the current calculations being wildly overestimated, I'm interested to see just how long the estimates are. My CPU work was off by about 4x on my i7 machine, with GPU work being off by about 60x for the same machine's GTX 460s. My other machines have been mostly unable to download new work during this time, so I can't compare them intelligently.
ID: 1153833 · Report as offensive
geoff

Joined: 25 Apr 00
Posts: 123
Credit: 34,100,351
RAC: 18
United Kingdom
Message 1153834 - Posted: 19 Sep 2011, 10:59:52 UTC
Last modified: 19 Sep 2011, 11:00:54 UTC

DCF 1.1. Estimated completion times are 87.55 hrs! Normal completion time is about 9 hrs.
ID: 1153834 · Report as offensive
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1153837 - Posted: 19 Sep 2011, 11:17:56 UTC - in response to Message 1153834.  

So it looks like it's off by about a factor of 9. By itself that isn't AWFUL, but when you figure the tasks are usually 8-24 hours long, they could fill up even a 10-day cache VERY fast. Using your machine as an example: a quad-core taking 80 hours per task (the 87.5 hrs adjusted back to a DCF of 1.0) would need only 12 tasks to fill 10 days.
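
The numbers above can be checked with a short Python sketch. The helper names are made up for illustration, and it assumes (as the post implies) that the displayed estimate scales linearly with DCF.

```python
import math


def normalise_estimate(estimated_hours: float, dcf: float) -> float:
    """Scale a displayed estimate back to what it would be at DCF 1.0
    (assumes the estimate is simply multiplied by DCF)."""
    return estimated_hours / dcf


def tasks_to_fill_cache(cores: int, cache_days: float, hours_per_task: float) -> int:
    """Number of tasks whose estimated run time covers the whole cache."""
    core_hours = cores * cache_days * 24
    return math.ceil(core_hours / hours_per_task)


per_task = normalise_estimate(87.55, 1.1)   # roughly 80 hours, as rounded in the post
print(round(per_task, 1))
print(tasks_to_fill_cache(cores=4, cache_days=10, hours_per_task=80))
```

With 4 cores and a 10-day cache there are 960 core-hours to fill, so at 80 estimated hours per task the cache looks full after 12 tasks.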
ID: 1153837 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19062
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1153861 - Posted: 19 Sep 2011, 12:41:10 UTC - in response to Message 1153831.  

I can shoot that theory down immediately as the DCF on my computer has stayed in the range 0.5 to 2.0. This is thanks to the incorrect APR for AP that thought my computer was faster than it is and gave me 16 AP tasks just before the maintenance.

By forcing the computer to always have at least one AP task running, every time one finishes it forces the DCF up; the last increase was to 1.88xxxx.
ID: 1153861 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1153939 - Posted: 19 Sep 2011, 17:38:01 UTC
Last modified: 19 Sep 2011, 17:39:38 UTC

Well, AP work is obviously being handed out. Last night, ready to send was above 30k, and now it is below 10k, so 20k have been issued already... or thrown at /dev/null and are being re-split.

I'm hoping to pick some up really soon here though. My 10-day cache is going to be empty probably during maintenance tomorrow. I'll start to have idle cores in about 10 hours. With a completely empty cache, I wonder if I can get one of these 9x ETA tasks. Presently I'm asking for ~3.1M seconds of work and getting nothing in return.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1153939 · Report as offensive
Profile Floyd
Joined: 19 May 11
Posts: 524
Credit: 1,870,625
RAC: 0
United States
Message 1153943 - Posted: 19 Sep 2011, 18:02:25 UTC - in response to Message 1153861.  

I can shoot that theory down immediately as the DCF on my computer has stayed in the range 0.5 to 2.0. This is thanks to the incorrect APR for AP that thought my computer was faster than it is and gave me 16 AP tasks just before the maintenance.

By forcing the computer to always have at least one AP task running, every time one finishes it forces the DCF up; the last increase was to 1.88xxxx.



OK... as a noob I am wondering about the DCF: what is a good one?

According to the computer details page, my Task duration correction factor is 0.265752.

Yet I am only getting work 1 or 2 WUs at a time... as I finish one, it (after reporting) only sends me 1 or 2 more.

Yet when the WU starts it still says about 25x more time than it actually takes to do the unit!

So how does that work?
ID: 1153943 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1153955 - Posted: 19 Sep 2011, 18:45:44 UTC - in response to Message 1153943.  

Yet when the WU starts it still says about 25x more time than it actually takes to do the unit!

So how does that work?

Check the "Shorties estimate up from three minutes to six hours" thread.
In short: a server-side update borked things badly. The fact that the upload server is also down isn't helping.
Grant
Darwin NT
ID: 1153955 · Report as offensive
Profile SciManStev Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Jun 99
Posts: 6652
Credit: 121,090,076
RAC: 0
United States
Message 1153965 - Posted: 19 Sep 2011, 19:44:45 UTC - in response to Message 1153964.  

Will there ever come a day again, when AP flows freely, or will this drought of AP units force me to stray away from the path of AP only?

To have AP's, or to not have AP's, that's the question. Just like Shakespeare said....

At this point, I would be happy just being able to obtain and hold a full cache, regardless of what was in it.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1153965 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1153968 - Posted: 19 Sep 2011, 19:49:11 UTC - in response to Message 1153964.  

Will there ever come a day again, when AP flows freely, or will this drought of AP units force me to stray away from the path of AP only?

To have AP's, or to not have AP's, that's the question. Just like Shakespeare said....

I don't know, the last AP I got took over twenty minutes to download :).
ID: 1153968 · Report as offensive
Profile Jim_S
Joined: 23 Feb 00
Posts: 4705
Credit: 64,560,357
RAC: 31
United States
Message 1153969 - Posted: 19 Sep 2011, 19:50:15 UTC

At this point in time has anyone got anything to upload or download?

I Desire Peace and Justice, Jim Scott (Mod-Ret.)
ID: 1153969 · Report as offensive
Profile Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1153974 - Posted: 19 Sep 2011, 19:57:53 UTC - in response to Message 1153969.  

At this point in time has anyone got anything to upload or download?

Well, considering that I had well over 100 WUs in my cache before the server crash on the 13th, I probably have 30 or so AP tasks waiting to upload right now...
ID: 1153974 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1153976 - Posted: 19 Sep 2011, 20:02:27 UTC

Can't upload right now (for a few hours already)
ID: 1153976 · Report as offensive
bryn

Joined: 6 Mar 01
Posts: 7
Credit: 145,339
RAC: 0
United Kingdom
Message 1153979 - Posted: 19 Sep 2011, 20:13:33 UTC
Last modified: 19 Sep 2011, 20:15:43 UTC

I don't think anything can be working right at the moment. I have only ever returned 1 AP, and I am currently crunching my second one, yet my account only shows the one that I have returned!
Managed to upload 13 WUs earlier, but they went through very slowly over about 5-6 hours. I now have another 6 ready to return!

Wow, just as I post that, they all went through!
ID: 1153979 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1153981 - Posted: 19 Sep 2011, 20:19:13 UTC - in response to Message 1153969.  

At this point in time has anyone got anything to upload or download?

Seven tasks still waiting to upload. Zero to download.
ID: 1153981 · Report as offensive
MikeN

Joined: 24 Jan 11
Posts: 319
Credit: 64,719,409
RAC: 85
United Kingdom
Message 1153991 - Posted: 19 Sep 2011, 21:09:53 UTC - in response to Message 1153969.  
Last modified: 19 Sep 2011, 21:16:25 UTC

Yes, I can get WUs to upload, but it takes a lot of effort. It has just taken 5 minutes and three attempts to upload 1 WU. The good news is that once they are uploaded, reporting works fine.

Edit: the cricket graph also shows uploads currently at over 20 Mbit/s.

Edit 2: but the server status page is currently showing the upload server as disabled, and its date/time seems to be up to date. If the upload server is disabled, where is all the uploaded data going???
ID: 1153991 · Report as offensive
Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1153992 - Posted: 19 Sep 2011, 21:11:34 UTC
Last modified: 19 Sep 2011, 21:28:23 UTC

9/19/2011 10:58:22 PM SETI@home Sending scheduler request: To fetch work.
9/19/2011 10:58:22 PM SETI@home Requesting new tasks for CPU
9/19/2011 10:58:27 PM SETI@home Scheduler request completed: got 0 new tasks
9/19/2011 10:58:27 PM SETI@home No tasks sent
9/19/2011 10:58:27 PM SETI@home This computer has reached a limit on tasks in progress

Do we have a new surprise (?) :)

This is happening on two of my CPU-only hosts and a GPU one, too. Cache nowhere near the usual 5 days I run.
Edit: I hope this is a temporary measure to prevent overfetch after they revert back to normal run times. Does that mean they will fix it? Yay, if so!
ID: 1153992 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1153994 - Posted: 19 Sep 2011, 21:28:25 UTC - in response to Message 1153992.  

9/19/2011 10:58:22 PM SETI@home Sending scheduler request: To fetch work.
9/19/2011 10:58:22 PM SETI@home Requesting new tasks for CPU
9/19/2011 10:58:27 PM SETI@home Scheduler request completed: got 0 new tasks
9/19/2011 10:58:27 PM SETI@home No tasks sent
9/19/2011 10:58:27 PM SETI@home This computer has reached a limit on tasks in progress

Do we have a new surprise (?) :)

This is happening on two of my CPU-only hosts and a GPU one, too. Cache nowhere near the usual 5 days I run.

That does sound like a very sensible precaution by the project, to protect against runaway work-fetch during recovery from the APR and DCF problem.

We've seen the limit applied before during recovery from problems: it will be a limit on the number of tasks in progress, not taking any account of their expected or actual run-time. Roughly how many tasks do you have in progress on the affected boxes?
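
As a rough Python sketch of what a purely count-based limit means (the limit value and function name here are hypothetical, not the project's actual settings or code):

```python
from typing import Sequence

HYPOTHETICAL_LIMIT = 100  # assumed value, for illustration only


def may_send_more(task_runtimes_hours: Sequence[float],
                  limit: int = HYPOTHETICAL_LIMIT) -> bool:
    """Count-based check: only the number of tasks in progress matters,
    the run times themselves are never looked at."""
    return len(task_runtimes_hours) < limit


shorties = [0.1] * 100    # 100 short tasks, about 10 hours of work in total
long_tasks = [9.0] * 100  # 100 long tasks, about 900 hours of work in total

print(may_send_more(shorties))    # blocked despite the small amount of work
print(may_send_more(long_tasks))  # blocked in exactly the same way
```

Both hosts hit the same wall, even though one of them holds roughly ninety times as much work as the other.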
ID: 1153994 · Report as offensive
Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1153997 - Posted: 19 Sep 2011, 21:38:04 UTC - in response to Message 1153994.  

That does sound like a very sensible precaution by the project, to protect against runaway work-fetch during recovery from the APR and DCF problem.

Yes, I was thinking it's for that reason, too. At least we have confirmation the fix is on the way.

We've seen the limit applied before during recovery from problems: it will be a limit on the number of tasks in progress, not taking any account of their expected or actual run-time. Roughly how many tasks do you have in progress on the affected boxes?

340 or so shorties on the CPU one, but that will be enough for 2-3 days, so I'm not panicking.
ID: 1153997 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.