Panic Mode On (78) Server Problems?

BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 1302930 - Posted: 7 Nov 2012, 0:39:36 UTC - in response to Message 1302921.  

Cleggy, I realize that -- but the problem which has persisted for a week or so seems unrelated to the 4 hour Tuesday outage. A number of people have reported it over the past week or more. Some are getting a tad frustrated. As for me, like I said, that's what other projects are for. Einstein, POEM and Malaria are getting a bit more of my CPU cycles for now.

I'm sure the folks back at the ranch, once they figure out what the issue is with the scheduler specific to this problem (not the 'normal 4 hour outage 4 hour recovery Tuesday cycle'), will work toward resolving it and let folks know they've spotted it.

ID: 1302930 · Report as offensive
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1302938 - Posted: 7 Nov 2012, 1:05:41 UTC - in response to Message 1302930.  

Cleggy, I realize that -- but the problem which has persisted for a week or so seems unrelated to the 4 hour Tuesday outage. A number of people have reported it over the past week or more. Some are getting a tad frustrated. As for me, like I said, that's what other projects are for. Einstein, POEM and Malaria are getting a bit more of my CPU cycles for now.

(snipped)


It doesn't hurt that other projects hand out credit at a higher rate than Seti does, if that sort of thing is important to you.
ID: 1302938 · Report as offensive
BarryAZ

Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 1302941 - Posted: 7 Nov 2012, 1:17:22 UTC - in response to Message 1302938.  

Actually, those projects I noted are not all that credit-generous, at least not for CPU tasks. POEM's GPU task credit is likely above that of SETI, but below that of many other GPU projects.




Cleggy, I realize that -- but the problem which has persisted for a week or so seems unrelated to the 4 hour Tuesday outage. A number of people have reported it over the past week or more. Some are getting a tad frustrated. As for me, like I said, that's what other projects are for. Einstein, POEM and Malaria are getting a bit more of my CPU cycles for now.

(snipped)


It doesn't hurt that other projects hand out credit at a higher rate than Seti does, if that sort of thing is important to you.

ID: 1302941 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1302946 - Posted: 7 Nov 2012, 1:35:14 UTC
Last modified: 7 Nov 2012, 1:35:36 UTC

I got through and reported the first 50 while on No New Tasks, but the following efforts failed. I can't get to my tasks page, and Replica is way behind. Going to be a while....

Hoping the new task limits have gone away. See comments in this thread before the maintenance window.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1302946 · Report as offensive
bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1302950 - Posted: 7 Nov 2012, 2:07:28 UTC - in response to Message 1302941.  

Einstein's handing out 2000 credits about every 55 minutes on my dual GTX560Ti's at the moment. That's a bit better than Seti was doing.

I'm still working on a 7 day cache of APs on the cpu.
ID: 1302950 · Report as offensive
bluestar

Joined: 5 Sep 12
Posts: 7011
Credit: 2,084,789
RAC: 3
Message 1302953 - Posted: 7 Nov 2012, 2:35:39 UTC
Last modified: 7 Nov 2012, 2:40:31 UTC

So you are thinking that you are observing or watching in the sky?

Or at least trying to detect an intelligence from another civilization.

Could they perhaps be moving a little bit around?

Anyway, I finished off my short tasks here a little while ago and now have 18 tasks that carry out the Gaussian search.

Therefore it takes a little more time to finish up these tasks.

The question then is: are such tasks based on short tasks which have already been completed and have now been resent to users because they returned better numbers, possibly under a new task name or designation?

Or do the tasks carrying out the Gaussian search need a separate observation or re-observation of their own?

Thanks for explaining this to me!
ID: 1302953 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1302975 - Posted: 7 Nov 2012, 4:49:20 UTC

Is a validator stuck?
ID: 1302975 · Report as offensive
Cherokee150

Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1302985 - Posted: 7 Nov 2012, 5:19:55 UTC
Last modified: 7 Nov 2012, 5:26:18 UTC

Those of us having trouble reporting tasks might find this helpful. I did a little experimenting this evening.

I had about 150 tasks completed and waiting to report. Multiple attempts to upload with max_tasks_reported set to 0 (everything) failed as expected.
Then I tried max_tasks_reported set to 10. It worked fine, also as expected.
Next I tried max_tasks_reported set to 20. It also worked.
Now I tried max_tasks_reported set to 40. Once again, success.

Well, I thought, let's see where the upper limit is.

So, I tried max_tasks_reported set to 110. It failed.
Now I tried max_tasks_reported set to 100. IT WORKED!

Although not 100% conclusive, it would appear that setting max_tasks_reported to anything from 10 to 100 should work when we are having these reporting troubles.

I hope this is helpful. :)
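
For anyone who wants to picture what the setting does, here is a minimal Python sketch. It assumes that max_tasks_reported simply caps how many completed tasks go out with each scheduler request, that 0 means no cap, and that whatever is left waits for the next request. The function name is made up for illustration and this is not BOINC client code.

```python
def report_batches(completed, max_tasks_reported):
    """Split a count of completed tasks into per-request batch sizes.

    Assumption: 0 means "no cap" (everything goes in one request), and any
    positive value caps the number of tasks reported per scheduler request.
    """
    if completed < 0 or max_tasks_reported < 0:
        raise ValueError("counts must be non-negative")
    if max_tasks_reported == 0:
        return [completed] if completed else []
    batches = []
    remaining = completed
    while remaining > 0:
        batch = min(remaining, max_tasks_reported)
        batches.append(batch)
        remaining -= batch
    return batches


if __name__ == "__main__":
    print(report_batches(150, 0))    # one request carrying all 150 tasks
    print(report_batches(150, 100))  # two requests: 100 tasks, then 50
```

Under those assumptions, my roughly 150 waiting tasks would need two pushes of the Update button at a setting of 100, instead of one oversized request that fails.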
ID: 1302985 · Report as offensive
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1302988 - Posted: 7 Nov 2012, 5:32:35 UTC

Well I'm getting scheduler timeouts again. I'll try connecting again tomorrow morning, should have a slew of units done by then since I just hit my massive shorty stack of units.
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1302988 · Report as offensive
Cherokee150

Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1302992 - Posted: 7 Nov 2012, 5:41:55 UTC - in response to Message 1302988.  
Last modified: 7 Nov 2012, 5:42:55 UTC

Keith,
Try setting your cc_config.xml file's <max_tasks_reported>0</max_tasks_reported> line to:
<max_tasks_reported>100</max_tasks_reported>
Then give the old manual Update button a push or two. No new tasks does -not- need to be on. I think you will find up to 100 of your completed tasks will report just fine.
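
If you would rather not edit the file by hand, a small Python sketch along these lines should do the same thing. The function name is hypothetical, the path you pass in is up to you (it has to be the cc_config.xml in your own BOINC data directory), and it assumes the usual layout with an options element inside cc_config. Treat it as a sketch rather than an official tool.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def set_max_tasks_reported(config_path, value):
    """Create or update cc_config.xml so max_tasks_reported equals value."""
    path = Path(config_path)
    if path.exists():
        # Keep whatever other options are already in the file.
        tree = ET.parse(str(path))
        root = tree.getroot()
        if root.tag != "cc_config":
            raise ValueError("unexpected root element: " + root.tag)
    else:
        root = ET.Element("cc_config")
        tree = ET.ElementTree(root)

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    setting = options.find("max_tasks_reported")
    if setting is None:
        setting = ET.SubElement(options, "max_tasks_reported")
    setting.text = str(value)

    tree.write(str(path), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Writes to the current directory; copy or point it at the real file yourself.
    set_max_tasks_reported("cc_config.xml", 100)
```

The client still has to pick up the changed file before the new value takes effect.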

Of course, getting new tasks is an entirely different, and more difficult, problem.

David :)
ID: 1302992 · Report as offensive
Profile Tron

Send message
Joined: 16 Aug 09
Posts: 180
Credit: 2,250,468
RAC: 0
United States
Message 1303001 - Posted: 7 Nov 2012, 6:05:20 UTC

OK, now I'm getting a whole cache full of resends... I was not aware I had any lost tasks in the first place. My cache was empty, no ghosts. Weird.
ID: 1303001 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1303003 - Posted: 7 Nov 2012, 6:15:34 UTC - in response to Message 1302992.  
Last modified: 7 Nov 2012, 6:20:41 UTC

Whatever they did during the outage, I wish they would undo it.
Before, no matter how bad the network traffic was, I was still able to download work, even if it was very, very, very slowly.

Now all that happens is the elapsed time starts counting, but nothing actually downloads.
After about 4 minutes (for a couple of WUs) they eventually started to download. I've got the rest of them sitting there, 6 minutes & counting & not even a bit has transferred yet.

Eventually they downloaded, although one of them took a couple of timeouts & 12 minutes before it started to download.


EDIT: I am getting fewer Scheduler timeouts & "couldn't connect to server" errors than before, but there are still plenty of them. Probably half of the Scheduler requests are resulting in an error.
Grant
Darwin NT
ID: 1303003 · Report as offensive
Profile S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1303010 - Posted: 7 Nov 2012, 6:43:27 UTC - in response to Message 1302992.  


Of course, getting new tasks is an entirely different, and more difficult, problem.

David :)



The max. number of tasks is set ridiculously low... I think I have about 2.5 days left and the scheduler still tells me I can't get new WUs.
ID: 1303010 · Report as offensive
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1303022 - Posted: 7 Nov 2012, 8:00:58 UTC
Last modified: 7 Nov 2012, 8:02:57 UTC

Thanks David, but I'm seeing the same behavior as before. The scheduler is reporting failure, yet the results I uploaded have been validated; my client just doesn't know that. Also, it appears I'm building yet another ghost army.

It appears I don't have a cc_config.xml on my system already. I believe it goes in programdata\boinc (I'm running Win 7 64-bit) but it didn't seem to help (yes I shut the BOINC Manager down and restarted it). I'll paste the cc_config.xml below to get an opinion if I built it right.

<cc_config>
  <options>
    <max_tasks_reported>100</max_tasks_reported>
  </options>
</cc_config>
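
One way to sanity-check a file like this is to run it through an XML parser. The Python sketch below only confirms that the text is well-formed and that max_tasks_reported sits under cc_config/options; the function name is made up, and it says nothing about whether the client has actually read the file.

```python
import xml.etree.ElementTree as ET

CONFIG_TEXT = """<cc_config>
<options>
<max_tasks_reported>100</max_tasks_reported>
</options>
</cc_config>"""


def read_max_tasks_reported(xml_text):
    """Return the max_tasks_reported value as an int, or None if absent.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError("root element should be cc_config, got " + root.tag)
    value = root.findtext("options/max_tasks_reported")
    if value is None or not value.strip():
        return None
    return int(value.strip())


if __name__ == "__main__":
    print(read_max_tasks_reported(CONFIG_TEXT))
```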

I'm off to a late night gym excursion before I get socked in by the arrival of winter so I'll check tomorrow for a reply.

Thanks ahead of time.
"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1303022 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1303061 - Posted: 7 Nov 2012, 12:07:28 UTC

It appears I don't have a cc_config.xml on my system already. I believe it goes in programdata\boinc (I'm running Win 7 64-bit) but it didn't seem to help (yes I shut the BOINC Manager down and restarted it). I'll paste the cc_config.xml below to get an opinion if I built it right.

<cc_config>
  <options>
    <max_tasks_reported>100</max_tasks_reported>
  </options>
</cc_config>


That looks okay, and yes, it belongs in the top-level data directory, not one of the project directories or the program directory. Scheduler was modified a few weeks ago to accept a max of 64, so there's little point in using higher values unless you run other projects and need it there.

Project is running inconsistently. Looking at my log for last 8 hours, I see failure to connect, timeouts, no tasks available, over the limit messages, and I got 18 cpu tasks overnight. I couldn't get any for 24 hours before that. I'm still below limit for cpu work and down to 99 cpu tasks for 6 cores. Could be a problem here, so I'm looking for corroboration that Scheduler is refusing you when you're certain that you are below the limit (50/cpu core and 400/gpu). Hard to tell with the variety of responses and connection issues.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1303061 · Report as offensive
Profile Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1303066 - Posted: 7 Nov 2012, 12:29:15 UTC

Limits now are NOT 50/cpu core, they are fixed and less than 200 WUs per HOST, BTW.
So that was the fix.
ID: 1303066 · Report as offensive
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1303069 - Posted: 7 Nov 2012, 12:45:20 UTC

Limits now are NOT 50/cpu core, they are fixed and less than 200 WUs per HOST, BTW.
So that was the fix.

Okay, that would explain what we were discussing yesterday in this thread. I got a few more taking me to 100 cpu tasks, and then got the limits message on the next request - so maybe that's the number? Have you seen anything posted on this? Any idea on gpu limit?
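
To make the difference concrete, here is a rough Python sketch comparing the per-core rule I had in mind (50 per CPU core, so 300 on my 6 cores) with a fixed per-host CPU limit. The figure of 100 is only my guess from where the requests stopped, and the function names are made up; the real scheduler logic may well differ.

```python
FIXED_CPU_LIMIT = 100  # assumption: inferred from where my requests stopped


def old_cpu_limit(cores, per_core=50):
    """Per-core rule: 50 tasks in progress per CPU core."""
    return cores * per_core


def can_get_more(in_progress, limit):
    """True while the host is still below the given limit."""
    return in_progress < limit


if __name__ == "__main__":
    cores = 6
    for tasks in (99, 100):
        print(
            tasks,
            can_get_more(tasks, old_cpu_limit(cores)),  # per-core rule
            can_get_more(tasks, FIXED_CPU_LIMIT),       # fixed per-host guess
        )
```

With 100 tasks on hand, the per-core rule would still let me fetch more, while a fixed limit of 100 would produce exactly the limits message I got.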
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1303069 · Report as offensive
Paul Bowyer
Volunteer tester

Joined: 15 Aug 99
Posts: 11
Credit: 137,603,890
RAC: 0
United States
Message 1303071 - Posted: 7 Nov 2012, 12:52:39 UTC - in response to Message 1303069.  

Ever since the splitters were turned back on, I have been consistently held to exactly 100 CPU and 100 GPU.
ID: 1303071 · Report as offensive
Profile Ronald R CODNEY
Joined: 19 Nov 11
Posts: 87
Credit: 420,920
RAC: 0
United States
Message 1303084 - Posted: 7 Nov 2012, 13:16:07 UTC

W O W,,,A/P city. 7666 in the queue for all to be had. Maybe now, the bee-atching will turn to yelps of glee.
ID: 1303084 · Report as offensive
Cherokee150

Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1303135 - Posted: 7 Nov 2012, 14:51:45 UTC - in response to Message 1303061.  

Fred,
It appears that the 64 unit reporting limit you were told has already been changed by the staff. If you look at my post http://setiathome.berkeley.edu/forum_thread.php?id=69890&postid=1302985 and Paul's post http://setiathome.berkeley.edu/forum_thread.php?id=69890&postid=1303071 it would appear that they have now set the limit to 100 CPU, 100 GPU and 100 reports, or, probably, 100 on everything.

Sadly, this is not going to fix what is obviously a different problem, as the system has proven for a long time that it can handle far larger numbers of uploads, downloads, and caches per host. As most of us know, this whole problem has been recently introduced and will require a specific fix. Limiting our hosts only compounds the problem by forcing our hosts to hit the servers far more often.

I wish Matt were back, as this is his expertise (I know this to be true from my extensive, personally conducted tour of SETI and visit with him not too long ago). He would be able to track down the problem faster and patch things up quicker. Furthermore, there is also just too much work for the remaining staff to tackle problems quickly, and we are beginning to see the results of this.

Oh, for a government that would realize the importance of and would fund true scientific research, or, should I say, true scientific research that does not lead to more powerful weapons and surveillance technology!

But I digress... If you only knew what I knew....... (that almost sounds like lyrics to a song, lol!)
ID: 1303135 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.