Panic Mode On (78) Server Problems?


Message boards : Number crunching : Panic Mode On (78) Server Problems?

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 3966
Credit: 31,948,659
RAC: 13,397
United Kingdom
Message 1302921 - Posted: 7 Nov 2012, 0:20:24 UTC - in response to Message 1302915.

You might consider not requesting new work until the folks back at the farm figure out what has been in 'mangled condition' regarding the scheduler for the past week or so.


They had all day to figure that out. Looks like they haven’t ...
Internal server error is an important message that justifies posting.
Your suggestion has been considered <roll>

That's fairly normal on a recovery from an outage. If you have <max_tasks_reported> set low enough, it'll go through:

07/11/2012 00:16:15 SETI@home [sched_op_debug] Starting scheduler request
07/11/2012 00:16:15 SETI@home Sending scheduler request: Requested by user.
07/11/2012 00:16:15 SETI@home Reporting 10 completed tasks, not requesting new tasks
07/11/2012 00:16:15 SETI@home [sched_op_debug] CPU work request: 0.00 seconds; 0.00 CPUs
07/11/2012 00:16:15 SETI@home [sched_op_debug] NVIDIA GPU work request: 0.00 seconds; 0.00 GPUs
07/11/2012 00:16:15 SETI@home [sched_op_debug] ATI GPU work request: 0.00 seconds; 0.00 GPUs
07/11/2012 00:16:55 SETI@home Scheduler request completed
07/11/2012 00:16:55 SETI@home [sched_op_debug] Server version 701
07/11/2012 00:16:55 SETI@home Project requested delay of 303 seconds
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.156_0
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.152_0
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.150_0
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.149_1
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.148_1
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.144_0
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.138_1
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.131_1
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.130_1
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.128_0
07/11/2012 00:16:55 SETI@home [sched_op_debug] Deferring communication for 5 min 3 sec
07/11/2012 00:16:55 SETI@home [sched_op_debug] Reason: requested by project

Claggy
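A log like the one above can be checked mechanically to confirm that every reported task was acknowledged. The following is only a minimal sketch: the function name `summarize` and the two regular expressions are illustrative assumptions based on the message wording in this particular log excerpt, not part of BOINC, and other client versions may word these lines differently.

```python
import re

# Patterns assumed from the wording of the log excerpt in this thread.
REPORT_RE = re.compile(r"Reporting (\d+) completed tasks")
ACK_RE = re.compile(r"got ack for result (\S+)")

def summarize(log_text):
    """Return (number of tasks reported, list of acknowledged result names)."""
    reported = None
    acked = []
    for line in log_text.splitlines():
        match = REPORT_RE.search(line)
        if match:
            reported = int(match.group(1))
        match = ACK_RE.search(line)
        if match:
            acked.append(match.group(1))
    return reported, acked

# Abridged excerpt of the log above (only two of the ten ack lines are kept).
SAMPLE_LOG = """\
07/11/2012 00:16:15 SETI@home Reporting 10 completed tasks, not requesting new tasks
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.156_0
07/11/2012 00:16:55 SETI@home [sched_op_debug] handle_scheduler_reply(): got ack for result 23jn11ae.18408.22971.140733193388044.10.152_0
"""

reported, acked = summarize(SAMPLE_LOG)
print(reported, len(acked))
```

Run against the full log, the reported count and the number of acks should match if the whole batch went through.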

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 11,478,226
RAC: 5,176
United States
Message 1302930 - Posted: 7 Nov 2012, 0:39:36 UTC - in response to Message 1302921.

Claggy, I realize that -- but the problem which has persisted for a week or so seems unrelated to the 4 hour Tuesday outage. A number of people have reported it over the past week or more. Some are getting a tad frustrated. As for me, like I said, that's what other projects are for. Einstein, POEM and Malaria are getting a bit more of my CPU cycles for now.

I'm sure the folks back at the ranch, once they figure out what the issue is with the scheduler specific to this problem (not the 'normal 4 hour outage 4 hour recovery Tuesday cycle'), will work toward resolving it and let folks know they've spotted it.

bill
Joined: 16 Jun 99
Posts: 848
Credit: 20,700,918
RAC: 18,354
United States
Message 1302938 - Posted: 7 Nov 2012, 1:05:41 UTC - in response to Message 1302930.

Claggy, I realize that -- but the problem which has persisted for a week or so seems unrelated to the 4 hour Tuesday outage. A number of people have reported it over the past week or more. Some are getting a tad frustrated. As for me, like I said, that's what other projects are for. Einstein, POEM and Malaria are getting a bit more of my CPU cycles for now.

(snipped)


It doesn't hurt that other projects hand out credit at a higher rate than Seti does, if that sort of thing is important to you.

BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 11,478,226
RAC: 5,176
United States
Message 1302941 - Posted: 7 Nov 2012, 1:17:22 UTC - in response to Message 1302938.

Actually, those projects I noted are not all that credit generous - at least not for CPU tasks. POEM's GPU task credit is likely above that of SETI, but below that of many other GPU projects.




Claggy, I realize that -- but the problem which has persisted for a week or so seems unrelated to the 4 hour Tuesday outage. A number of people have reported it over the past week or more. Some are getting a tad frustrated. As for me, like I said, that's what other projects are for. Einstein, POEM and Malaria are getting a bit more of my CPU cycles for now.

(snipped)


It doesn't hurt that other projects hand out credit at a higher rate than Seti does, if that sort of thing is important to you.

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 739
Credit: 22,272,598
RAC: 26,926
United States
Message 1302946 - Posted: 7 Nov 2012, 1:35:14 UTC
Last modified: 7 Nov 2012, 1:35:36 UTC

I got through and reported the first 50 while on No New Tasks, but the following efforts failed. I can't get to my tasks page, and Replica is way behind. Going to be a while....

Hoping the new task limits have gone away. See comments in this thread before the maintenance window.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

bill
Joined: 16 Jun 99
Posts: 848
Credit: 20,700,918
RAC: 18,354
United States
Message 1302950 - Posted: 7 Nov 2012, 2:07:28 UTC - in response to Message 1302941.

Einstein's handing out 2000 credits about every 55 minutes on my dual GTX560Ti's at the moment. That's a bit better than Seti was doing.

I'm still working on a 7 day cache of APs on the CPU.

bluestar
Joined: 5 Sep 12
Posts: 204
Credit: 1,175,841
RAC: 14
Message 1302953 - Posted: 7 Nov 2012, 2:35:39 UTC
Last modified: 7 Nov 2012, 2:40:31 UTC

So you are thinking that you are observing or watching in the sky?

Or at least trying to detect an intelligent signal from another civilization.

Could they perhaps be moving a little bit around?

Anyway, right now I have finished off my short tasks here a little while ago and now have 18 tasks that carry out the gaussian search.

Therefore it takes a little more time to finish up these tasks.

The question is then - are such tasks based on short tasks which have already been completed and have now been resent to users because they returned better numbers, possibly with a new task name or designation?

Or do the tasks carrying out the gaussian search need a separate observation or re-observation of their own?

Thanks for explaining this to me!

betreger
Joined: 29 Jun 99
Posts: 1759
Credit: 3,649,795
RAC: 8,063
United States
Message 1302975 - Posted: 7 Nov 2012, 4:49:20 UTC

Is a validator stuck?

Cherokee150
Joined: 11 Nov 99
Posts: 103
Credit: 20,631,346
RAC: 29,979
United States
Message 1302985 - Posted: 7 Nov 2012, 5:19:55 UTC
Last modified: 7 Nov 2012, 5:26:18 UTC

Those of us having trouble reporting tasks might find this helpful. I did a little experimenting this evening.

I had about 150 tasks completed and waiting to report. Multiple attempts to upload with max_tasks_reported set to 0 (everything) failed as expected.
Then I tried max_tasks_reported set to 10. It worked fine, also as expected.
Next I tried max_tasks_reported set to 20. It also worked.
Now I tried max_tasks_reported set to 40. Once again, success.

Well, I thought, let's see where the upper limit is.

So, I tried max_tasks_reported set to 110. It failed.
Now I tried max_tasks_reported set to 100. IT WORKED!

Although not 100% conclusive, it would appear that setting max_tasks_reported to anything from 10 to 100 should work when we are having these reporting troubles.

I hope this is helpful. :)

Keith White
Joined: 29 May 99
Posts: 369
Credit: 2,488,698
RAC: 2,388
United States
Message 1302988 - Posted: 7 Nov 2012, 5:32:35 UTC

Well I'm getting scheduler timeouts again. I'll try connecting again tomorrow morning, should have a slew of units done by then since I just hit my massive shorty stack of units.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Cherokee150
Joined: 11 Nov 99
Posts: 103
Credit: 20,631,346
RAC: 29,979
United States
Message 1302992 - Posted: 7 Nov 2012, 5:41:55 UTC - in response to Message 1302988.
Last modified: 7 Nov 2012, 5:42:55 UTC

Keith,
Try setting your cc_config.xml file's <max_tasks_reported>0</max_tasks_reported> line to:
<max_tasks_reported>100</max_tasks_reported>
Then give the old manual Update button a push or two. No new tasks does -not- need to be on. I think you will find up to 100 of your completed tasks will report just fine.

Of course, getting new tasks is an entirely different, and more difficult, problem.

David :)

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,236,055
RAC: 0
United States
Message 1303001 - Posted: 7 Nov 2012, 6:05:20 UTC

OK, now I'm getting a whole cache full of resends... I was not aware I had any lost tasks in the first place. My cache was empty, no ghosts. Weird.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,634,014
RAC: 44,440
Australia
Message 1303003 - Posted: 7 Nov 2012, 6:15:34 UTC - in response to Message 1302992.
Last modified: 7 Nov 2012, 6:20:41 UTC

Whatever they did during the outage, I wish they would undo it.
Before, no matter how bad the network traffic was, I was still able to download work, even if it was very, very, very slowly.

Now all that happens is the elapsed time starts counting, but nothing actually downloads.
After about 4 minutes (for a couple of WUs) they eventually started to download. I've got the rest of them sitting there, 6 minutes and counting, and not even a bit has transferred yet.

Eventually they downloaded, although one of them took a couple of eventual timeouts and 12 minutes before it started to download.


EDIT- I am getting fewer Scheduler timeouts and "couldn't connect to server" errors than before, but there are still plenty of them. Probably half of the Scheduler requests are resulting in an error.
____________
Grant
Darwin NT.

S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 155
Credit: 12,963,366
RAC: 25,902
Netherlands
Message 1303010 - Posted: 7 Nov 2012, 6:43:27 UTC - in response to Message 1302992.


Of course, getting new tasks is an entirely different, and more difficult, problem.

David :)



The max. number of tasks is set ridiculously low... I think I have about 2.5 days left and the scheduler still tells me I can't get new WUs.

Keith White
Joined: 29 May 99
Posts: 369
Credit: 2,488,698
RAC: 2,388
United States
Message 1303022 - Posted: 7 Nov 2012, 8:00:58 UTC
Last modified: 7 Nov 2012, 8:02:57 UTC

Thanks David, but I'm seeing the same behavior as before. The scheduler is reporting failure, yet the results I uploaded have been validated; my client just doesn't know that. Also, it appears I'm building yet another ghost army.

It appears I don't have a cc_config.xml on my system already. I believe it goes in programdata\boinc (I'm running Win 7 64-bit) but it didn't seem to help (yes I shut the BOINC Manager down and restarted it). I'll paste the cc_config.xml below to get an opinion if I built it right.

<cc_config>
  <options>
    <max_tasks_reported>100</max_tasks_reported>
  </options>
</cc_config>
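For anyone wanting to sanity-check a hand-built file like this one, here is a rough sketch that only verifies the XML shape shown in this post (a `<cc_config>` root with `<options>` and an integer `<max_tasks_reported>`). The function name `read_max_tasks_reported` is made up for illustration, and passing this check does not prove the BOINC client has actually picked the file up.

```python
import xml.etree.ElementTree as ET

def read_max_tasks_reported(xml_text):
    """Parse cc_config.xml text and return the <max_tasks_reported> value."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    if root.tag != "cc_config":
        raise ValueError("root element should be <cc_config>")
    node = root.find("options/max_tasks_reported")
    if node is None or node.text is None:
        raise ValueError("missing <options><max_tasks_reported>")
    value = int(node.text.strip())  # raises ValueError if not an integer
    if value < 0:
        raise ValueError("max_tasks_reported should not be negative")
    return value

# The file contents exactly as posted above.
SAMPLE = """<cc_config>
<options>
<max_tasks_reported>100</max_tasks_reported>
</options>
</cc_config>"""

print(read_max_tasks_reported(SAMPLE))
```

To check a real file, read its text first and pass that in; the file location on any given system is left to the reader.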

I'm off to a late night gym excursion before I get socked in by the arrival of winter so I'll check tomorrow for a reply.

Thanks ahead of time.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 739
Credit: 22,272,598
RAC: 26,926
United States
Message 1303061 - Posted: 7 Nov 2012, 12:07:28 UTC

It appears I don't have a cc_config.xml on my system already. I believe it goes in programdata\boinc (I'm running Win 7 64-bit) but it didn't seem to help (yes I shut the BOINC Manager down and restarted it). I'll paste the cc_config.xml below to get an opinion if I built it right.

<cc_config>
  <options>
    <max_tasks_reported>100</max_tasks_reported>
  </options>
</cc_config>


That looks okay - and yes, it belongs in the top-level data directory, not one of the project directories or the program directory. Scheduler was modified a few weeks ago to accept a max of 64, so there's little point in using higher values unless you run other projects and need it there.
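As a back-of-the-envelope illustration of what a per-request cap means for a backlog, here is a small sketch. It assumes one batch is reported per scheduler request and that the project-requested delay between requests stays at the 303 seconds seen in the log earlier in this thread; both are assumptions, and the helper names are invented for this example.

```python
import math

def requests_needed(pending, batch_size):
    """Number of scheduler requests needed to report `pending` tasks."""
    if pending < 0 or batch_size <= 0:
        raise ValueError("pending must be >= 0 and batch_size must be > 0")
    return math.ceil(pending / batch_size)

def min_wait_seconds(pending, batch_size, delay_seconds=303):
    """Lower bound on time spent waiting between requests (assumed fixed delay)."""
    # No wait is needed before the first request, only between requests.
    return max(requests_needed(pending, batch_size) - 1, 0) * delay_seconds

# A backlog of about 150 tasks, as mentioned earlier in the thread.
print(requests_needed(150, 64), min_wait_seconds(150, 64))
print(requests_needed(150, 100), min_wait_seconds(150, 100))
```

With these assumptions, 150 pending tasks take three requests at a batch size of 64 and two at 100, so the difference is one extra deferral.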

Project is running inconsistently. Looking at my log for last 8 hours, I see failure to connect, timeouts, no tasks available, over the limit messages, and I got 18 cpu tasks overnight. I couldn't get any for 24 hours before that. I'm still below limit for cpu work and down to 99 cpu tasks for 6 cores. Could be a problem here, so I'm looking for corroboration that Scheduler is refusing you when you're certain that you are below the limit (50/cpu core and 400/gpu). Hard to tell with the variety of responses and connection issues.
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1303066 - Posted: 7 Nov 2012, 12:29:15 UTC

Limits now are NOT 50/cpu core, they are fixed and less than 200 WUs per HOST, BTW.
So that was the fix.

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 739
Credit: 22,272,598
RAC: 26,926
United States
Message 1303069 - Posted: 7 Nov 2012, 12:45:20 UTC

Limits now are NOT 50/cpu core, they are fixed and less than 200 WUs per HOST, BTW.
So that was the fix.

Okay, that would explain what we were discussing yesterday in this thread. I got a few more taking me to 100 cpu tasks, and then got the limits message on the next request - so maybe that's the number? Have you seen anything posted on this? Any idea on gpu limit?
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Paul Bowyer
Joined: 15 Aug 99
Posts: 8
Credit: 50,511,955
RAC: 130,524
United States
Message 1303071 - Posted: 7 Nov 2012, 12:52:39 UTC - in response to Message 1303069.

Ever since the splitters were turned back on, I have been consistently held to exactly 100 CPU and 100 GPU.

Ronald R CODNEY
Joined: 19 Nov 11
Posts: 87
Credit: 420,280
RAC: 3
United States
Message 1303084 - Posted: 7 Nov 2012, 13:16:07 UTC

WOW... AP city. 7666 in the queue for all to be had. Maybe now, the bee-atching will turn to yelps of glee.


Copyright © 2014 University of California