Panic Mode On (21) Server problems

Henk Haneveld
Volunteer tester
Joined: 16 May 99
Posts: 154
Credit: 1,577,293
RAC: 1
Netherlands
Message 919371 - Posted: 19 Jul 2009, 14:39:18 UTC - in response to Message 919362.  

Note that in the last hour - at about 06:30 on a Sunday morning, Pacific time - somebody has disabled the Astropulse splitters, put an extra 200 GB of data online (4 'tapes'), and restarted the splitters.

Any more volunteers for the Sunday morning staff shift?


Great that somebody is willing to do that, but in the current situation it is not very useful.

It would have been better to shut down the splitters for a while to let the project run out of work, and restart the upload server, giving everybody the opportunity to return all the finished work they have sitting on their hosts.
ID: 919371
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 919381 - Posted: 19 Jul 2009, 15:26:53 UTC - in response to Message 919371.  
Last modified: 19 Jul 2009, 15:37:53 UTC

So really, is someone going to turn the upload server back on today, or did they load all the AP so we'd just have something to look at? Kinda like look but don't touch at this point.
ID: 919381
Melanie
Joined: 10 Feb 02
Posts: 3
Credit: 645,683
RAC: 0
United Kingdom
Message 919389 - Posted: 19 Jul 2009, 15:52:39 UTC - in response to Message 919381.  

thinks.....what about a change in message ? " Quick Quick ! server is temporarily UP " ;-)
ID: 919389
John G
Joined: 29 Dec 01
Posts: 68
Credit: 10,932,850
RAC: 0
Canada
Message 919392 - Posted: 19 Jul 2009, 15:55:58 UTC - in response to Message 919381.  

It's my best guess that yes, the upload server will be turned back on. It's been a back and forth thing all weekend (it almost appears to be 6 hours on, then 6 hours off), or very hard to upload while the AP splitters are doing their thing ????

Waiting Patiently While having 87 to upload and still counting

John G
ID: 919392
John G
Joined: 29 Dec 01
Posts: 68
Credit: 10,932,850
RAC: 0
Canada
Message 919409 - Posted: 19 Jul 2009, 17:11:58 UTC

Well I guess that theory was shot down !!!! Been about 7 hours now !!!! The astropulse gods must rule the roost !!!! So be it. I continue to crunch, with 104 to upload as I speak.

John G
ID: 919409
Rick B
Joined: 6 Mar 01
Posts: 299
Credit: 1,532,791
RAC: 0
Canada
Message 919419 - Posted: 19 Jul 2009, 17:37:24 UTC - in response to Message 919409.  

If the upload server has been down for 7 hours, how is it that 15333 results have been received in the last hour?
Rick
**************************
ID: 919419
john deneer
Volunteer tester
Joined: 16 Nov 06
Posts: 331
Credit: 20,996,606
RAC: 0
Netherlands
Message 919420 - Posted: 19 Jul 2009, 17:38:15 UTC - in response to Message 919409.  

Well I guess that theory was shot down !!!! Been about 7 hours now !!!! The astropulse gods must rule the roost !!!! So be it. I continue to crunch, with 104 to upload as I speak.

John G

The upload server is still turned off, indeed. What I don't understand is that on the server status page it says that some 15,000 results have been received over the last hour. Although (I think) that number is slowly decreasing since the server was turned off, I wonder how it can be receiving anything once it has been turned off? Does anybody know what that number (results received last hour) actually means?

Regards,
John.
ID: 919420
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919422 - Posted: 19 Jul 2009, 17:41:22 UTC

When I read these threads and everyone saying "I hope they turn the upload server back on" it really makes me wonder.

It's not like they turn things on and off for no reason. Life is better for Eric and Matt and Jeff when things are all turned on.

It's either broken, or it is off for a reason.

If it's broken, and can't be fixed remotely, then someone has to make the commute from home to the office, and while it's bad, it may not be critical enough to drop everything and go right now.

If it's off for a reason, then there is a plan to turn it back on, and a benefit to having it off.

I don't know which, or why, but I'm pretty sure that it's not "because we just decided to turn it off."

... and maybe it's just me, but why are we asking "are they going to turn it back on?" -- if they don't, the project is over, done, finished, so the answer to that question seems fairly obvious.
ID: 919422
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 919424 - Posted: 19 Jul 2009, 17:50:36 UTC - in response to Message 919422.  


If it's off for a reason, then there is a plan to turn it back on, and a benefit to having it off.

I don't know which, or why, but I'm pretty sure that it's not "because we just decided to turn it off."

... and maybe it's just me, but why are we asking "are they going to turn it back on?" -- if they don't, the project is over, done, finished, so the answer to that question seems fairly obvious.

I don't think anyone was thinking they turned it off to just turn it off, and the question (at least by me) wasn't merely if they will turn it back on, but instead will they turn it back on today as well as why have it off when a slew of AP were just loaded. They are questions that aren't real obvious to some including myself. I also don't see why just because someone can't fix it today for whatever reason, means we shouldn't ask questions that aren't obvious to us.

BTW 17,054 received and rising, still not sure how that works either
ID: 919424
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919428 - Posted: 19 Jul 2009, 17:55:12 UTC - in response to Message 919424.  


If it's off for a reason, then there is a plan to turn it back on, and a benefit to having it off.

I don't know which, or why, but I'm pretty sure that it's not "because we just decided to turn it off."

... and maybe it's just me, but why are we asking "are they going to turn it back on?" -- if they don't, the project is over, done, finished, so the answer to that question seems fairly obvious.

I don't think anyone was thinking they turned it off to just turn it off, and the question (at least by me) wasn't merely if they will turn it back on, but instead will they turn it back on today as well as why have it off when a slew of AP were just loaded. They are questions that aren't real obvious to some including myself. I also don't see why just because someone can't fix it today for whatever reason, means we shouldn't ask questions that aren't obvious to us.

BTW 17,054 received and rising, still not sure how that works either

My guess? Bruno isn't the only upload server at the moment.
ID: 919428
WendyR
Volunteer tester
Joined: 1 Aug 05
Posts: 44
Credit: 1,962,140
RAC: 0
United States
Message 919431 - Posted: 19 Jul 2009, 17:59:28 UTC - in response to Message 919424.  

BTW 17,054 received and rising, still not sure how that works either


Remember, returning results works in two phases. First, you upload the result. That's the slow part that uses the upload server, and happens as soon as you finish the workunit. The status goes from "Uploading" to "Ready to Report" when this is done.

Then you connect to the database and report a bunch of results. That second phase isn't down.
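
A minimal Python sketch of the two phases described above, for illustration only. The class and function names are made up for this example and are not BOINC's actual client code; the "Reported" state is an assumed label for whatever follows "Ready to Report".

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    # "Uploading" and "Ready to Report" are the states named in the post;
    # "Reported" is an assumed label for the final state.
    state: str = "Uploading"


def try_upload(result, upload_server_up):
    """Phase 1: send the output file. This needs the upload server."""
    if result.state == "Uploading" and upload_server_up:
        result.state = "Ready to Report"


def report_all(results, scheduler_up):
    """Phase 2: report every uploaded result in one scheduler contact."""
    if not scheduler_up:
        return 0
    reported = 0
    for r in results:
        if r.state == "Ready to Report":
            r.state = "Reported"
            reported += 1
    return reported


# Two results were uploaded before the outage; a third finished afterwards.
results = [
    Result("wu_a", "Ready to Report"),
    Result("wu_b", "Ready to Report"),
    Result("wu_c"),
]
for r in results:
    try_upload(r, upload_server_up=False)  # upload server is down
count = report_all(results, scheduler_up=True)  # scheduler still answers
```

With the upload server down, wu_c stays at "Uploading", while the two results uploaded earlier are still reported. That would be consistent with a nonzero "results received" count during the outage.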

ID: 919431
Lazydude
Volunteer tester
Joined: 17 Jan 01
Posts: 45
Credit: 96,158,001
RAC: 136
Sweden
Message 919434 - Posted: 19 Jul 2009, 18:03:22 UTC - in response to Message 919424.  
Last modified: 19 Jul 2009, 18:04:08 UTC

Uploading and reporting:

Once a WU is finished, BOINC uploads the data file directly, without any database communication.

Then it reports that those WUs are ready for the server to work with.

The reports are uploaded as follows - taken from John Macloud's answer in the previous thread (20)
>>
Reporting is done once per day, unless there is time pressure. This operation uses the database heavily, so multiple work units are reported in one transaction.


Reporting does not happen on a schedule, rather it happens at the first of:

1) 24 hours before deadline.
2) Connect every X before completion.
3) Immediately after upload completes if the time is later than either 1 or 2.
4) On a work request.
5) When any other task reports.
6) 24 hours after completion.
7) On a trickle up message (CPDN only as far as I know).
8) On a trickle down request (no projects that I am aware of).
9) On a server scheduled connection. (maximum duration between connections)
10) When the user hits the update button.
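
The "first of" rule can be sketched in Python as taking the earliest of the candidate trigger times. This is only an illustration under stated assumptions: it assumes the result has already been uploaded, it models only rules 1 and 6 plus event-driven triggers (such as rules 4, 5 and 10) supplied as timestamps, and the function name next_report_time is hypothetical, not part of BOINC.

```python
from datetime import datetime, timedelta


def next_report_time(completed, deadline, event_times=()):
    """Earliest trigger time for a result that is already uploaded.

    Time-based candidates: 24 hours before the deadline (rule 1) and
    24 hours after completion (rule 6). Event-driven triggers (work
    request, another task reporting, manual update) are passed in as
    timestamps; events before completion are ignored.
    """
    candidates = [
        deadline - timedelta(hours=24),
        completed + timedelta(hours=24),
    ]
    candidates += [t for t in event_times if t >= completed]
    # A result cannot be reported before it exists, so clamp to completion.
    return max(completed, min(candidates))


completed = datetime(2009, 7, 19, 6, 0)
deadline = datetime(2009, 8, 10, 6, 0)

# No other triggers: the 24-hours-after-completion rule fires first.
quiet = next_report_time(completed, deadline)

# A manual update later the same day fires earlier than that.
manual = next_report_time(completed, deadline, [datetime(2009, 7, 19, 18, 46)])
```

In this sketch a host that does nothing else would still report within 24 hours of completion, and a manual update simply moves the report earlier.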


Edit: Wendy was quicker
ID: 919434
Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 919435 - Posted: 19 Jul 2009, 18:04:10 UTC - in response to Message 919431.  
Last modified: 19 Jul 2009, 18:28:30 UTC

BTW 17,054 received and rising, still not sure how that works either


Remember, returning results works in two phases. First, you upload the result. That's the slow part that uses the upload server, and happens as soon as you finish the workunit. The status goes from "Uploading" to "Ready to Report" when this is done.

Then you connect to the database and report a bunch of results. That second phase isn't down.

That makes sense, and even though it's been almost 8 hours down now, I can see how some people would still be reporting work uploaded that long ago.

Edit: forgot to say thanks to all.
ID: 919435
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 919442 - Posted: 19 Jul 2009, 18:37:58 UTC - in response to Message 919428.  

...
BTW 17,054 received and rising, still not sure how that works either

My guess? Bruno isn't the only upload server at the moment.

My guess? The 24 hour maximum delay after which the BOINC core client reports completed work even if there are too many uncompleted uploads to allow it to ask for new work. "Received" of course means "reported" since the database doesn't know about an upload until it has been reported.
Joe
ID: 919442
John G
Joined: 29 Dec 01
Posts: 68
Credit: 10,932,850
RAC: 0
Canada
Message 919443 - Posted: 19 Jul 2009, 18:44:29 UTC - in response to Message 919419.  

Good question Rick ----- The world wakes up at different times duh !!!!
ID: 919443
Rick B
Joined: 6 Mar 01
Posts: 299
Credit: 1,532,791
RAC: 0
Canada
Message 919444 - Posted: 19 Jul 2009, 18:46:39 UTC - in response to Message 919442.  
Last modified: 19 Jul 2009, 18:48:34 UTC

That must be the results reported number. I just did a manual update and sure enough the 23 results I had "ready to report" did report.

Edit: Thanks all for your answers.
Rick
**************************
ID: 919444
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 919448 - Posted: 19 Jul 2009, 19:05:47 UTC

I think it's over 12 hrs since Bruno went down, as I have been getting "connect failed" since 07:55 BST or 06:55 UTC. I managed to upload at 06:38 BST, and nothing has uploaded since then. Most of my WUs have been either 30-minute shorties or 105-minute ones.
ID: 919448
Hofman's Atlantic
Volunteer tester
Joined: 6 Jan 05
Posts: 32
Credit: 11,359,969
RAC: 0
United States
Message 919450 - Posted: 19 Jul 2009, 19:24:43 UTC

7/19/09 2:27:16 PM|SETI@home Beta Test|Scheduler request failed: Error 405
Does anyone know what this error is?

ID: 919450
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 919451 - Posted: 19 Jul 2009, 19:27:06 UTC
Last modified: 19 Jul 2009, 19:38:01 UTC


Ahh.. yes.. what do you think/guess? ;-)

Yes, my GPU cruncher is idle again.. after ~ 1/3 of a day or so..
Because there are ~ 500 results in the UL overview and the UL server is offline.


So - is it not a good idea to increase the > 'CPUs x 2' for GPU crunchers?
http://setiathome.berkeley.edu/forum_thread.php?id=54711
Every day.. either idle the whole day.. or the UL server is reachable (bandwidth) or online.. UL and work requests are possible.. I can catch some WUs.. and then UL problems again after some minutes.. many stalled result ULs in the overview.. and then no work request is possible.. only some WUs in the WU cache.. not enough for a whole day..


BTW.
About results being reported while the UL server is offline..

My GPU cruncher has the CUDA_V12_VLARkill_app.
Every time a 'bad WU header' (VLAR kill) happens, it can be reported immediately without the UL having happened.
The UL that happens later is, I guess, for nothing, or is it needed to send back a ~ 21.x KB file?

That's the reason my GPU cruncher now has a maximum daily WU quota per CPU of 62/day.
It's not every day; sometimes it goes one to 3 days or so with no killed VLARs.

So all PCs with VLARkill can report like my GPU cruncher, and they count.

ID: 919451
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 919453 - Posted: 19 Jul 2009, 19:29:44 UTC - in response to Message 919442.  

...
BTW 17,054 received and rising, still not sure how that works either

My guess? Bruno isn't the only upload server at the moment.

My guess? The 24 hour maximum delay after which the BOINC core client reports completed work even if there are too many uncompleted uploads to allow it to ask for new work. "Received" of course means "reported" since the database doesn't know about an upload until it has been reported.
Joe

Doh. Of course.
ID: 919453


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.