Panic Mode On (77) Server Problems?

Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1301132 - Posted: 2 Nov 2012, 2:40:41 UTC - in response to Message 1301125.  
Last modified: 2 Nov 2012, 2:48:23 UTC

If you have set NNT you don't have to wait the 5 minutes for another "hit".

Interesting to know. I thought it was a universal 303 seconds between accepted contacts. Not that I ever have a huge pile of tasks to report since I'm CPU-only and AP.

My guess is that the limit is applied in the Scheduler, and that a straight report without a request for new work doesn't go through the Scheduler, so the limit is not applied.

T.A.
ID: 1301132
Brother Frank

Joined: 10 Dec 11
Posts: 26
Credit: 15,142,410
RAC: 0
United States
Message 1301141 - Posted: 2 Nov 2012, 3:28:27 UTC - in response to Message 1301112.  

Dear Sutaru Tsureku,

I did what you suggested and set my download to no new tasks and then waited five minutes or so. Then I "abused my Update button" by pressing it, but only once. I used to really hit the darned thing so often, but I'm getting more subtle and not "using that hammer" all the time. A few minutes later, all of my 50 or so built-up completed tasks had been swallowed up. I moved from this desktop with one Nvidia Ti 550 graphics card to my new notebook and did the same thing. Again, in just a few minutes my 40 or so completed tasks were swallowed. I went downstairs to my i7 2600k desktop with two Nvidia Ti 550 graphics cards and it had swallowed all its completed tasks -- probably 70-80 of them.

I am feeling a great sense of relief, like you have helped me find a wonderful brand of computer laxative. Wow, thanks for reminding me of this wonderful inexpensive procedure. I am breaking out in song: "Oh What A Relief IT IS -- la la la la la.... la la. Hurray." I hope this is not too racy. It's in the same fine tradition found in the writings of the esteemed 16th-century author Francois Rabelais, a scholar and humorist who penned the tales of Gargantua and Pantagruel. French literature, including Rabelais and Michel de Montaigne, was one of my very favorite college courses.

Brother Frank
ID: 1301141
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24875
Credit: 3,081,182
RAC: 7
Ireland
Message 1301144 - Posted: 2 Nov 2012, 3:58:19 UTC

This is a first for me. After working on my Win 8 host, allowed it to download tasks.

After several timeouts, found that I had downloaded 16 WUs. However, on looking at the My computers page, found that I had 119 WUs, but 103 marked "abandoned"?
ID: 1301144
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1301148 - Posted: 2 Nov 2012, 4:17:39 UTC - in response to Message 1301144.  


While at work today, not once was either of my systems able to report any work.
"Scheduler request failed: Timeout was reached" is the only response they get.

And naturally their caches continue to shrink.
Grant
Darwin NT
ID: 1301148
Khangollo

Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1301151 - Posted: 2 Nov 2012, 4:41:23 UTC - in response to Message 1301148.  
Last modified: 2 Nov 2012, 4:41:57 UTC

After several timeouts, found that I had downloaded 16 WUs. However, on looking at the My computers page, found that I had 119 WUs, but 103 marked "abandoned"?

You're lucky. My computer got its entire 5-day cache of 692 tasks (including some of the new APs) abandoned today for absolutely no reason. The worst of it is, none of the workunits were deleted locally, so if I hadn't aborted them, I'd be doing 5 days of pointless work.
ID: 1301151
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1301188 - Posted: 2 Nov 2012, 6:23:16 UTC

As someone pointed out, and as I see on my own machine, the workunits we've uploaded are being processed and credited. What is failing is the notification back to the client that would clear the workunits listed as "ready to report" off the task tab, as well as the fetching of any new workunits.

Of course you know what this means. When the blockage is finally fixed we will be getting a metric ton of new workunits, all with the same timestamp and likely paired with the same wingman or two. And if history is any indication, there is a high likelihood that the wingman will be someone who a) has a CUDA application that always overflows, leading to an inconclusive validation; b) has a 29.9-day turnaround time; or c) forgot they signed up for SETI@home, so all the tasks time out after six or so weeks.

No, I'm not cynical. Not at all. :p
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1301188
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1301197 - Posted: 2 Nov 2012, 6:58:45 UTC
Last modified: 2 Nov 2012, 6:59:08 UTC

All I can add here is right now there is a LOT of bandwidth simply being wasted.
When I watch my 9 rigs try over and over again to report completed tasks, only to have the attempts go awry after sending the data........

All that bandwidth has been wasted, further clogging the pipes with comms that accomplish nothing.

I can only hope that the new up/download servers and other improvements the GPUUG is working on may somewhat mitigate this disaster.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1301197
Uli
Volunteer tester

Joined: 6 Feb 00
Posts: 10923
Credit: 5,996,015
RAC: 1
Germany
Message 1301202 - Posted: 2 Nov 2012, 7:09:27 UTC

I have been monitoring my rig and have no issues.
Except Mark is partnered with me on one of my inconclusives. I'm sure it is stuck in the upload problem.
Pluto will always be a planet to me.

Seti Ambassador
Not too late to order an Anni Shirt
ID: 1301202
musicplayer

Joined: 17 May 10
Posts: 2430
Credit: 926,046
RAC: 0
Message 1301210 - Posted: 2 Nov 2012, 8:08:17 UTC
Last modified: 2 Nov 2012, 8:10:00 UTC

What about the idea of perhaps assigning the so-called "shorties" to the slower hosts?

Most of these "shorties" probably do not return results of any particular significance. Therefore the more powerful hosts (computers) could be assigned the longer tasks, which carry out the Gaussian search, as well as possibly the AstroPulse (AP) tasks.
ID: 1301210
Uli
Volunteer tester

Joined: 6 Feb 00
Posts: 10923
Credit: 5,996,015
RAC: 1
Germany
Message 1301213 - Posted: 2 Nov 2012, 8:17:12 UTC
Last modified: 2 Nov 2012, 8:19:15 UTC

MP, all our science counts. Your fascination with Gaussians astounds me. When, in time, we find the signal, everyone will win.
Pluto will always be a planet to me.

Seti Ambassador
Not too late to order an Anni Shirt
ID: 1301213
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1301282 - Posted: 2 Nov 2012, 15:45:44 UTC

Mark, I noticed something last night that I think you're talking about. I reported one completed AP and the website showed it as reported, but BOINC didn't get the memo and instead.. "timeout was reached." The next report cleared it up, but every time you talk to the scheduler, your client_state has to be sent. I imagine for the faster rigs, it is probably in the >1MB range, so I agree.. wasted bandwidth for scheduler time-outs.

Kind of like cramming 16MB of data through the pipe only to have it be 100% blanked.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1301282
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1301288 - Posted: 2 Nov 2012, 15:54:01 UTC - in response to Message 1301282.  
Last modified: 2 Nov 2012, 15:54:59 UTC

Mark, I noticed something last night that I think you're talking about. I reported one completed AP and the website showed it as reported, but BOINC didn't get the memo and instead.. "timeout was reached." The next report cleared it up, but every time you talk to the scheduler, your client_state has to be sent. I imagine for the faster rigs, it is probably in the >1MB range, so I agree.. wasted bandwidth for scheduler time-outs.

Kind of like cramming 16MB of data through the pipe only to have it be 100% blanked.

Yeah....
All the data gets sent through the pipe, and then the server fumbles the ball......more data is then transferred trying to determine who recovered the fumble.

Uhh...back to 1st down and goal to go.

More bandwidth consumed on the next attempt.

4th down......how many times are they gonna try?

Oh crap, they missed the field goal.

1st down.......more bandwidth consumed and we still have not scored.

You get my drift. When things get tangled like this, it's a downward spiral. More bandwidth gets used trying to recover fumbles than moving the dang ball.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1301288
cov_route

Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1301293 - Posted: 2 Nov 2012, 16:01:11 UTC

Now I have over 3000 phantom WUs, up from about 2k last night. This is about 5x my normal cache size. Should I set NNT? Or just let it do its thing?
ID: 1301293
zoom3+1=4
Volunteer tester

Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 1301297 - Posted: 2 Nov 2012, 16:08:36 UTC - in response to Message 1301202.  

I have been monitoring my rig and have no issues.
Except Mark is partnered with me on one of my inconclusives. I'm sure it is stuck in the upload problem.

I'm having less trouble uploading than reporting. Something needs some computer Ex-Lax for the reporting...
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1301297
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1301299 - Posted: 2 Nov 2012, 16:10:48 UTC - in response to Message 1301293.  

Now I have over 3000 phantom WUs, up from about 2k last night. This is about 5x my normal cache size. Should I set NNT? Or just let it do its thing?

The kitties are just letting BOINC do its thing.

The only change I made a while back, when the scheduler was tied in knots, was to add a bit to my cc_config file to report only 100 WUs at a time. That helped at the time, but it is not improving things much right now.
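
For anyone wanting to try the same kind of cap, here is a minimal Python sketch that generates such a cc_config file. It assumes the client option is named `max_tasks_reported` and sits inside `<options>`; that is not stated in the post, so check the BOINC client configuration docs for your version. The helper name `build_cc_config` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported: int) -> str:
    """Return cc_config.xml text capping tasks reported per scheduler request.

    Assumption: the client reads <max_tasks_reported> inside <options>.
    """
    if max_tasks_reported < 1:
        raise ValueError("max_tasks_reported must be at least 1")
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 100 is the cap mentioned in the post above.
    print(build_cc_config(100))
```

The output would be saved as cc_config.xml in the BOINC data directory; the client should pick it up after a restart or after re-reading its config files.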

"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1301299
Link

Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1301305 - Posted: 2 Nov 2012, 16:24:57 UTC - in response to Message 1301282.  

(...) but every time you talk to the scheduler, your client_state has to be sent. I imagine for the faster rigs, it is probably in the >1MB range, so I agree.. wasted bandwidth for scheduler time-outs.

sched_request_setiathome.berkeley.edu.xml is what gets sent to the scheduler, and it should be quite a bit smaller than client_state.xml, since it doesn't contain all the information about all files and other projects, and holds only very sparse information about the SETI tasks on that machine.
ID: 1301305
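
One way to check this on your own machine is to compare the two file sizes directly. A rough Python sketch, assuming both files sit in the same BOINC data directory (the directory you pass in is yours to supply, and `request_vs_state_sizes` is just an illustrative name):

```python
from pathlib import Path


def request_vs_state_sizes(data_dir: Path,
                           project_host: str = "setiathome.berkeley.edu") -> dict:
    """Return sizes in bytes of client_state.xml and the scheduler request file.

    Files that are missing are left out of the result.
    """
    candidates = (
        data_dir / "client_state.xml",
        data_dir / f"sched_request_{project_host}.xml",
    )
    return {p.name: p.stat().st_size for p in candidates if p.is_file()}


if __name__ == "__main__":
    # "." is a placeholder; point this at your own BOINC data directory.
    for name, size in request_vs_state_sizes(Path(".")).items():
        print(f"{name}: {size} bytes")
```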
Cherokee150

Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1301320 - Posted: 2 Nov 2012, 17:09:22 UTC - in response to Message 1301299.  

Mark, I suspect, however, that the kitties are not purring right now. ;)

By the way, I did do one thing to help alleviate the obviously gigantic mess developing between the SETI host and its clients. I suspended network activity on all my machines.

I always keep my caches maxed out to handle these emergencies, so I'm good-to-go for at least a week. It'll be a while before any of my units time out. When the dust settles and traffic is back to normal, I'll just let my rigs report their stored results one machine at a time. :)
ID: 1301320
Wibble

Joined: 25 Nov 02
Posts: 4
Credit: 1,168,325
RAC: 0
United Kingdom
Message 1301354 - Posted: 2 Nov 2012, 18:29:08 UTC - in response to Message 1301112.  
Last modified: 2 Nov 2012, 18:30:50 UTC

For maybe the last 20 hours or so, no successful scheduler contact. Always:
Scheduler request failed: Timeout was reached


*new tasks* was enabled.

I set *no new tasks* - and then 178 uploaded tasks were accepted by the scheduler server in a bunch (successful report).


That also worked for me. My last successful scheduler request was on the 30th; I just set 'no new tasks', clicked 'update', and the request/report went through straight away.

I think I'll leave 'no new tasks' set for a while.
ID: 1301354
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1301358 - Posted: 2 Nov 2012, 18:33:37 UTC - in response to Message 1301354.  

That also worked for me. My last successful scheduler request was on the 30th; I just set 'no new tasks', clicked 'update', and the request/report went through straight away.

I've tried it with NNT set, and without.
I think the only advantage of NNT set is you can click update as soon as you get the timeout response from the Scheduler without it complaining about it being too soon.


The fact is that overnight neither of my systems was able to get a response from the Scheduler that wasn't a timeout. Completed work to be reported piles up, and my caches get smaller & smaller.
Grant
Darwin NT
ID: 1301358
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1301392 - Posted: 2 Nov 2012, 20:18:24 UTC - in response to Message 1301358.  


I hope they can sort this Scheduler out soon. With all the shorties in the system, and the inability to get work on more than 1 attempt in 20, I'm going to run out of work very quickly when almost everything that does get downloaded will be done in minutes.
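
To put a rough number on that: if each request independently succeeds only 1 time in 20, you expect 20 attempts per success, 19 of them wasted. A back-of-envelope Python sketch follows. The independence assumption, the fixed retry interval, and the request size are all placeholders rather than measured values; 303 seconds is the contact interval mentioned earlier in the thread, and `retry_cost` is an illustrative name.

```python
def retry_cost(one_success_in: int, request_bytes: int, retry_interval_s: int) -> dict:
    """Estimate the average cost of one successful scheduler contact.

    Assumes independent attempts that each succeed with probability
    1/one_success_in, sent at a fixed interval (the real client backs off
    differently, so treat the waiting time as a rough figure).
    """
    if one_success_in < 1:
        raise ValueError("one_success_in must be at least 1")
    failed = one_success_in - 1  # expected failures before the first success
    return {
        "expected_attempts": one_success_in,
        "wasted_bytes": failed * request_bytes,
        "waiting_seconds": failed * retry_interval_s,
    }


if __name__ == "__main__":
    # 1 in 20 and 303 s come from the thread; 1_000_000 bytes is a guess.
    print(retry_cost(20, 1_000_000, 303))
```

With those inputs that works out to 19 wasted requests and roughly 96 minutes of waiting per successful contact.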
Grant
Darwin NT
ID: 1301392


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.