Panic Mode On (78) Server Problems?


Profile Fred E.
Volunteer tester
Send message
Joined: 22 Jul 99
Posts: 732
Credit: 22,246,415
RAC: 27,787
United States
Message 1304017 - Posted: 9 Nov 2012, 15:20:39 UTC
Last modified: 9 Nov 2012, 15:21:26 UTC

However, since I have two GPUs, that should allow close to 200 WUs, and it is getting down to 100 shortly.


There hasn't been any hard info on the limits; I'm just going with what others are reporting. I got down to 100 CPU tasks yesterday at about this time, and I run on NNT most of the time. I request work when I get down to 15 or 20, and when I'm lucky and get a successful work request, the server replenishes me to exactly 100. I'm not down there on GPU yet, and there haven't been many posts to confirm that limit. Still assuming it is per GPU, so your limit should be 200. Not sure if ghosts are causing your problem.

With so many of our loyal SETI community members upset and frustrated over the current situation, perhaps it would be a good time for one of the SETI staff to take a few minutes to let us know what their ideas are about the problem, why they have limited us so severely, and whether they have been able to determine a way to fix things and release the restrictions.

+1. I'm sure they are working on it, but...
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

msattler
Volunteer tester
Avatar
Send message
Joined: 9 Jul 00
Posts: 37450
Credit: 502,273,448
RAC: 568,772
United States
Message 1304031 - Posted: 9 Nov 2012, 16:09:35 UTC - in response to Message 1303992.
Last modified: 9 Nov 2012, 16:10:49 UTC



As Richard points out, depending on when you access the page it can be up to 9:59 behind!

Not quite true.
The status page typically updates every 10 minutes...
But, for reasons unknown to me, it will sometimes only update every 20 minutes.

EDIT...
Ooops. Sorry, I see somebody already mentioned that.
____________
******************
Just a kittyman kinda guy.

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Profile Fred E.
Volunteer tester
Send message
Joined: 22 Jul 99
Posts: 732
Credit: 22,246,415
RAC: 27,787
United States
Message 1304039 - Posted: 9 Nov 2012, 16:24:31 UTC

Missed the edit window, but the more I think about it, the gpu limit may be total gpu rather than per gpu. Was a quick and dirty implementation. Can anyone confirm?
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Cherokee150
Send message
Joined: 11 Nov 99
Posts: 103
Credit: 20,590,809
RAC: 29,714
United States
Message 1304044 - Posted: 9 Nov 2012, 16:34:36 UTC - in response to Message 1304039.

While I cannot confirm it, I will say that, based on the way most SETI logic has been written, the limits are most likely 100 CPU units per host and 100 GPU units per host. I know this is not what any of us want to hear, but I suspect it is the case. :(

msattler
Volunteer tester
Avatar
Send message
Joined: 9 Jul 00
Posts: 37450
Credit: 502,273,448
RAC: 568,772
United States
Message 1304045 - Posted: 9 Nov 2012, 16:35:12 UTC - in response to Message 1304039.

Missed the edit window, but the more I think about it, the gpu limit may be total gpu rather than per gpu. Was a quick and dirty implementation. Can anyone confirm?

I can't...
The kitties are still burning off cache. And I don't use any 3rd party software to monitor the 9 rigs, so it's hard for me to know exactly how what I actually have on hand compares to what the servers think I have on hand.
____________
******************
Just a kittyman kinda guy.

Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

zoom314
Avatar
Send message
Joined: 30 Nov 03
Posts: 44592
Credit: 35,454,505
RAC: 9,051
Message 1304048 - Posted: 9 Nov 2012, 16:40:33 UTC - in response to Message 1303752.

Tried that route as well. It works (SOMETIMES).

I surely hope this problem gets ironed out.

Me too. Maybe an Acme anvil dropped on it would flatten the problem. ;)

Just kidding of course.
____________

Profile Tron
Send message
Joined: 16 Aug 09
Posts: 180
Credit: 2,236,055
RAC: 0
United States
Message 1304113 - Posted: 9 Nov 2012, 18:30:43 UTC

Methinks an anvil might not be adequate.

rob smith
Volunteer moderator
Send message
Joined: 7 Mar 03
Posts: 7694
Credit: 45,181,840
RAC: 78,937
United Kingdom
Message 1304145 - Posted: 9 Nov 2012, 19:15:44 UTC

APs are NOT the whole problem. One problem is the crazy long back-off that BOINC throws up. Any reasonable model of a try/retry/back-off delivery system shows that the very worst thing you can do is back off for a long time - all it does is make things worse, far worse. The best solution is actually to set the retry/back-off time to around 50-75% of the server swap time (remember the download servers swap over every five minutes or so). This time must be properly random, and not the "high end biased" random that is currently in use.
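
A minimal sketch, in Python, of that retry policy: delays drawn from 50-75% of the server swap interval. The 300-second swap interval and the plain uniform draw (one reading of "properly random") are illustrative assumptions, not BOINC's actual scheduler code.

    import random

    # Assumed: download servers swap roughly every 5 minutes.
    SERVER_SWAP_INTERVAL = 300.0  # seconds

    def next_retry_delay(swap_interval=SERVER_SWAP_INTERVAL):
        # Uniformly random back-off of 50-75% of the swap interval.
        return random.uniform(0.50, 0.75) * swap_interval

    if __name__ == "__main__":
        # Print a few candidate delays; each falls between 150 s and 225 s.
        for _ in range(5):
            print(f"retry in {next_retry_delay():.0f} s")

With a uniform draw in that window, retries spread evenly across the second half of a swap cycle instead of piling up at the high end.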
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Profile Bill G
Avatar
Send message
Joined: 1 Jun 01
Posts: 341
Credit: 30,418,719
RAC: 78,926
United States
Message 1304161 - Posted: 9 Nov 2012, 19:49:41 UTC - in response to Message 1304039.

Missed the edit window, but the more I think about it, the gpu limit may be total gpu rather than per gpu. Was a quick and dirty implementation. Can anyone confirm?

Looks like that is what it is.
____________

MikeN
Send message
Joined: 24 Jan 11
Posts: 286
Credit: 25,786,941
RAC: 34,854
United Kingdom
Message 1304172 - Posted: 9 Nov 2012, 20:30:36 UTC

Apologies if I have missed something, been busy this week, but is there not a major logic problem here? I have just run Boinc Rescheduler on my No2 cruncher and it tells me I have 324 tasks in progress, whilst my SETI account page for this PC shows 426 in progress, so presumably 102 ghosts. Cruncher No2 cannot currently get any more tasks off SETI as it is over the newly imposed limit, so it cannot download these ghosts and get them off the system. For this cruncher the situation will presumably eventually resolve itself, as it has a GPU and so should be allowed a total of 200 WUs. So when the SETI total in progress drops below 200 I assume the ghosts will be resent.

However, by analogy, my No1 cruncher with a GTX460 graphics card probably has around 1000 ghosts out of 4000 WU in total at present. It will never be able to get down to 200 WU as it has more than that many ghosts, and SETI will not give it the ghosts until it has less than 200 in total! Eventually therefore it will end up doing nothing (or more likely crunching Einstein) and the 1000 ghosts will stay in the SETI system for ever.
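
A small sketch of that deadlock, using the numbers from this post and assuming a combined 200-task in-progress limit per host, with ghosts resent only once the server-side count drops below the limit:

    # Assumed combined CPU+GPU in-progress limit per host.
    LIMIT = 200

    def can_recover_ghosts(real_tasks, ghosts):
        in_progress = real_tasks + ghosts   # what the server believes the host holds
        return in_progress < LIMIT          # only then will lost tasks be resent

    # Cruncher No2: once its real tasks burn down below 98, recovery can start.
    print(can_recover_ghosts(real_tasks=90, ghosts=102))    # True
    # Cruncher No1: ~1000 ghosts alone exceed the limit, so recovery never starts.
    print(can_recover_ghosts(real_tasks=0, ghosts=1000))    # False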
____________

WezH
Volunteer tester
Send message
Joined: 19 Aug 99
Posts: 78
Credit: 3,053,044
RAC: 18,379
Finland
Message 1304176 - Posted: 9 Nov 2012, 20:36:09 UTC - in response to Message 1304039.

Missed the edit window, but the more I think about it, the gpu limit may be total gpu rather than per gpu. Was a quick and dirty implementation. Can anyone confirm?

Can't talk about GPU, but the CPU limit is 100. Running a new* installation of BOINC on an AMD 6-core processor, with 100 tasks.

*A fresh install of BOINC 7.0.31 on a machine which previously ran the 7.0.28 client.

And now... out of 100, 99 tasks are ghosts.
____________

Profile Bernie Vine
Volunteer tester
Avatar
Send message
Joined: 26 May 99
Posts: 6615
Credit: 22,464,922
RAC: 14,283
United Kingdom
Message 1304192 - Posted: 9 Nov 2012, 21:13:03 UTC

With so many of our loyal SETI community members upset and frustrated over the current situation, perhaps it would be a good time for one of the SETI staff to take a few minutes to let us know what their ideas are about the problem, why they have limited us so severely, and whether they have been able to determine a way to fix things and release the restrictions.

+1. I'm sure they are working on it, but...


Too late for me. I have given up waiting for any meaningful communication from the project. I can no longer justify running 4 crunchers without really having any idea what is going on.

Whatever the reasons, it is not good enough.
____________


Today is life, the only life we're sure of. Make the most of today.

juan BFB
Volunteer tester
Avatar
Send message
Joined: 16 Mar 07
Posts: 4620
Credit: 234,942,357
RAC: 364,174
Brazil
Message 1304195 - Posted: 9 Nov 2012, 21:23:48 UTC
Last modified: 9 Nov 2012, 21:31:51 UTC

Because of their inability to cure the disease, they are choosing to kill the patient.

This is very, very sad...
____________

bill
Send message
Joined: 16 Jun 99
Posts: 848
Credit: 20,679,859
RAC: 17,986
United States
Message 1304199 - Posted: 9 Nov 2012, 21:37:56 UTC

I figure, in about 4 hours, our little yellow feathered friend will show up to join the party.

Josef W. Segur
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4141
Credit: 1,005,254
RAC: 266
United States
Message 1304201 - Posted: 9 Nov 2012, 21:42:29 UTC - in response to Message 1303999.

Questions:

1. Would it be possible for Eric (or someone) to flush all the ghosts out of the scheduler? (I.e., somehow identify them as ghosts and deassign them so the scheduler isn't trying to keep track of so many.)

2. If it's possible, would it help the situation any to do it?

In theory that would be possible if the download server logs are detailed enough. The general idea would be that for a task marked sent at a particular time, there ought to be a corresponding download of the WU. The download server would only know the IP address of the system which asked for the file; hopefully that would usually match the address in the host record. If not, perhaps simply checking that there were as many completed downloads as tasks assigned for a WU would be an adequate fallback.

Whether it's practical is another matter. I guess it would take at least a day of programming effort trying to foresee all the possible problems, and another day testing with copies of the download server logs and data extracted from the BOINC database to see if the list of probable ghosts produced makes sense.
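
A rough Python sketch of that cross-check. The input shapes (task rows extracted from the BOINC database, (workunit, IP) pairs parsed from the download logs) and the fallback rule are assumptions for illustration, not the project's actual tooling.

    from collections import defaultdict

    def probable_ghosts(assigned, downloads):
        # assigned  -- list of (task_id, wu_name, host_ip) rows for tasks marked "sent"
        # downloads -- list of (wu_name, client_ip) rows from the download server logs
        dl_by_wu_ip = defaultdict(int)   # completed downloads per (workunit, client IP)
        dl_by_wu = defaultdict(int)      # completed downloads per workunit
        for wu_name, ip in downloads:
            dl_by_wu_ip[(wu_name, ip)] += 1
            dl_by_wu[wu_name] += 1

        sent_by_wu = defaultdict(int)    # tasks marked "sent" per workunit
        for _, wu_name, _ in assigned:
            sent_by_wu[wu_name] += 1

        ghosts = []
        for task_id, wu_name, host_ip in assigned:
            ip_match = dl_by_wu_ip[(wu_name, host_ip)] > 0
            enough = dl_by_wu[wu_name] >= sent_by_wu[wu_name]   # fallback rule
            if not (ip_match or enough):
                ghosts.append(task_id)   # no evidence this task ever reached the host
        return ghosts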

I have too foggy a view of what the problem really is to try to guess whether the effort would help. Eric and Jeff of course have much better knowledge of what's going on with the servers.

------------------------

The most puzzling aspect I see is that if the lost work checking is performed for every work request, it ought to be impossible for one host to have more than ~200 ghosts. That is, as soon as there are any ghosts no new tasks should be assigned until those ghosts are turned into real live tasks the host can report as other_results in its scheduler requests. Richard Haselgrove's post to boinc_dev last Sunday made it clear enough, but perhaps Dr. Anderson failed to look into it once Eric had implemented damage containment changes.
Joe

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5566
Credit: 51,573,886
RAC: 44,072
Australia
Message 1304203 - Posted: 9 Nov 2012, 21:46:31 UTC - in response to Message 1304161.

Missed the edit window, but the more I think about it, the gpu limit may be total gpu rather than per gpu. Was a quick and dirty implementation. Can anyone confirm?

Looks like that is what it is.

A shame if it is the case.

Would be better if they could have just overridden people's cache settings. Set it to 2 days for now; that would get us through most outages.
Although given the difference in work fetch between v6.x & v7.x clients, that probably wouldn't work too well either.

The present settings will mean I'll run out of GPU work during the weekly outages. Probably CPU work as well on my i7 if they're mostly shorties.
____________
Grant
Darwin NT.

Richard Haselgrove
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8275
Credit: 45,009,870
RAC: 13,693
United Kingdom
Message 1304236 - Posted: 9 Nov 2012, 23:01:07 UTC - in response to Message 1304201.

The most puzzling aspect I see is that if the lost work checking is performed for every work request, it ought to be impossible for one host to have more than ~200 ghosts. That is, as soon as there are any ghosts no new tasks should be assigned until those ghosts are turned into real live tasks the host can report as other_results in its scheduler requests. Richard Haselgrove's post to boinc_dev last Sunday made it clear enough, but perhaps Dr. Anderson failed to look into it once Eric had implemented damage containment changes.
Joe

Joe, you have some extra background by email.

Profile Brother Frank
Send message
Joined: 10 Dec 11
Posts: 26
Credit: 15,142,410
RAC: 0
United States
Message 1304254 - Posted: 10 Nov 2012, 1:30:30 UTC

What SETI Really Needs And Our Current Data Problems: $350 Gift On The Way

By Frank Elliott, Brother Frank on Seti@Home

It's very discouraging to be purchasing computers especially designed or manufactured to give my wife and me a high level of Seti @ Home signal crunching and to read very little about any efforts being made to solve problems undercutting our ability to do work for the project. However, I think the problem is not likely that of the dedicated staff and scientists at Seti not wanting to communicate with us. I think instead it is a funding problem at the personnel level and at the equipment and communication level too. I think too that many of us may want to, and be able to, step in for funding help. The difficulty I see is that I really don't know what would be needed to get the project running well in terms of handling the huge amount of data that needs to be analyzed. We probably need an interim funding goal and a continued yearly funding goal. I don't know either if there are gifted people willing to step in, but who absolutely must have it be a paid job so that they can support themselves and their families. There is a severe loss in productivity, and high turnover, if the project relies on too many part-time and volunteer positions at headquarters. Continuity and staff knowledge and skills are so important in the medium and long term.

I am able to do the following. I have about 200 dollars in one of my bonus accounts for a credit card and can use it to pay the bill for that account. I'll use it this November and send the roughly $200 I didn't have to pay to Seti. I also have another 150 bucks from a bank bonus account for reaching some goals. I will send that to Seti too. I hope some of you, or maybe many of you who can, will send money you have in bonus or reward card accounts to Seti. So, on the next business day, next Monday, I will send $350 to Seti. I would like Seti to dedicate $175 of that money toward staff and $175 toward equipment if they can. However, this first year, you may use this money in a general account toward any of your needs as you see fit.

When my wife and I lived in Toledo almost thirty years ago, we belonged to an Astronomy Society. I believe there were about 100 of us. We had a fund drive to build a rather large 25-inch reflecting telescope, the mounting, and an observatory. You wouldn't believe how much money we raised in that small group. Owens Corning donated the 25-inch mirror blank to us and we ground it ourselves. Other people stepped in with a land donation. We all worked helping to build the telescope and the observatory and then mount the big rotating dome on top of the building.

I hope that Seti can raise several million dollars a year. We should be able to with the dedicated volunteers we have. We will need a much larger fundraising group and many contacts with local and national media to spread the word that we are serious about finding a way to continue funding the project. Then, other large donors like Paul Allen and many medium and small ones will step in too. We need the help of a dedicated group of small donors year after year with sustaining pledges like Public Television does, with one-at-a-time pledges, and with large donations that do special things. I have been taught in fundraising that most large donors don't want to supply ongoing funding to a project and that it is very important for a project to have a self-sustaining base. That then attracts donors who want to help with the extras that are very important but not quite essential, and with start-up funds like Paul Allen provided for the Allen Telescope Array. So $350 is my immediate pledge.

I hope many of you out there will donate as much as you comfortably can to Seti's fundraising for this year and on a regular basis if you can. I will make further donations as I am able, but the majority of that is contingent on Seti really deciding what it wants to do on a yearly basis and what that would cost.

I want that to be realistic. Not a lowball figure that counts on all kinds of good possibilities happening. Not a highball figure that goes far beyond making the staff adequate and solving some very severe problems we are having now, like getting data out to and back from the volunteers. I have watched as things have gone downhill with data delivery and data input problems since about last February. It is now about as bad as it can get for very high level crunchers, perhaps those in the top 200. For me, at 313 now among the top 1,000 participants (about 34,300 RAC), it is a big issue running around to my computers, turning no new tasks on and off, and then remembering to turn new tasks back on so I can get some data to crunch.

Please, please SETI: let us know what you need and make a reasonable budget. I know many, many of us are willing and able to help. One-time small donations of $10 or more are also going to help, but just think what having 10,000 donors who are willing to send $200 to $500 per year and above to the project would do. The project would be on firm footing. What is a $350 average times 10,000? It is 3.5 million dollars. My wife and I have donated a few hundred to $1,000 per year to many projects over the years. I don't understand why Seti could not be one of those for us, and perhaps for many of us who are committed to astronomy and space exploration and the search for intelligent civilizations in our galaxy and its star systems that we can target as likely locations for life. That would work for the project, wouldn't it? I am thinking it would make people like Paul Allen very happy too. My first donation of $350 will be sent to you online or via the web on Monday.
____________
Frank Elliott, Member of Carepages.com, a chronic illness support site. Was FrankLivingFully there. Free user name & pw needed. My Google+ Profile is:
https://profiles.google.com/u/0/10871372137584 Science, SF, Space, Astronomy, Medicine, Psyc Topics.

Profile Bill G
Avatar
Send message
Joined: 1 Jun 01
Posts: 341
Credit: 30,418,719
RAC: 78,926
United States
Message 1304261 - Posted: 10 Nov 2012, 2:10:43 UTC - in response to Message 1304203.

Missed the edit window, but the more I think about it, the gpu limit may be total gpu rather than per gpu. Was a quick and dirty implementation. Can anyone confirm?

Looks like that is what it is.

A shame if it is the case.

Would be better if they could have just overridden people's cache settings. Set it to 2 days for now; that would get us through most outages.
Although given the difference in work fetch between v6.x & v7.x clients, that probably wouldn't work too well either.

The present settings will mean I'll run out of GPU work during the weekly outages. Probably CPU work as well on my i7 if they're mostly shorties.

I am certain that is how it is; in fact it counts WUs in progress as well as WUs waiting to run. Between my cache and running jobs, the downloads bring the total up to 200. I'm down to 3100 ghosts, which are slowly downloading.

I would normally never have that many WUs on my computer... it is a shame that the programming kept trying to send me WUs even when there were ghosts waiting to download.
____________

Profile betreger
Avatar
Send message
Joined: 29 Jun 99
Posts: 1757
Credit: 3,640,779
RAC: 8,183
United States
Message 1304264 - Posted: 10 Nov 2012, 2:31:40 UTC - in response to Message 1304254.

Your Karma is good.
____________
