Panic Mode On (19) Server problems


FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 914019 - Posted: 4 Jul 2009, 17:24:52 UTC

I bet a select few are able to download these WUs, which maxes out the system and leaves the rest of us unable to connect.
Then, by the time we can upload or download again, all the tasks have been taken, leaving the cupboard bare.
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 914020 - Posted: 4 Jul 2009, 17:36:29 UTC

Do you see that big 18 hour plus gap in the bandwidth???

Cricket Graph

This gap is where I performed my deception and blocked the bandwidth in order to prevent all the crunchers from discovering my home planet. Of course, I required assistance from those other aliens.... you know them as the "Grays". Their job was to suppress knowledge of this among the human race, but they have failed. They did not discover the Cricket in time. Now arrangements are being made with politicians to cover this up, along the same lines as the Roswell Incident.

Boinc....Boinc....Boinc....Boinc....
Matthew Love
Volunteer tester
Joined: 26 Sep 99
Posts: 7763
Credit: 879,151
RAC: 0
United States
Message 914024 - Posted: 4 Jul 2009, 17:45:35 UTC

CHARLIE CHAN WILL SAVE THE DAY FROM THE TERRIBLE FOES!!

LETS BEGIN IN 2010
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 914025 - Posted: 4 Jul 2009, 17:46:34 UTC - in response to Message 914016.  

Still seeing RED.

By the way.....how can 69,000+ MultiBeam work units instantly show up ready for download?
And then the bandwidth maxes out? Instantly?

They didn't show up instantly. At 8:30 the result creation rate jumped to over thirty (that part was instant :-). The resulting download rush used up the bandwidth (almost instantly), so the ready-to-send count could only build up over the next two hours.
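
As a toy illustration of that build-up (all numbers below are made up, purely to show the shape of the effect):

# Gundolf's point as a toy queue model: the ready-to-send count grows by
# whatever margin the creation rate has over the rate at which the choked
# link can actually hand results out. Hypothetical numbers only.

def ready_to_send(initial: int, creation_per_sec: float,
                  dispatch_per_sec: float, seconds: float) -> int:
    """Ready-to-send queue size after `seconds` at constant rates."""
    return max(0, int(initial + (creation_per_sec - dispatch_per_sec) * seconds))

# e.g. creating ~30 results/sec while the saturated link lets only ~20/sec
# out gives roughly a 72,000-result backlog after two hours:
print(ready_to_send(0, 30, 20, 2 * 3600))  # 72000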

Regards,
Gundolf
Computers aren't everything in life. (Just a little joke.)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 914027 - Posted: 4 Jul 2009, 17:53:59 UTC - in response to Message 914023.  

Do you see that big 18 hour plus gap in the bandwidth???

Cricket Graph

This gap is where I performed my deception and blocked the bandwidth in order to prevent all the crunchers from discovering my home planet. Of course, I required assistance from those other aliens.... you know them as the "Grays". Their job was to suppress knowledge of this among the human race, but they have failed. They did not discover the Cricket in time. Now arrangements are being made with politicians to cover this up, along the same lines as the Roswell Incident.


Did you really get permission from our home planet to reveal that?


Oh my........ this is rapidly falling apart. Knowledge of the gap is spreading around the planet, and Obama is the only politician reachable on this holiday weekend. He does not have the necessary clearance, and besides, he would make a major TV broadcast about it and speak for several hours. But hey, maybe that would work..... put the entire planet to sleep with him speaking on and on and on........ must do further research on this.

Boinc....Boinc....Boinc....Boinc....
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 914030 - Posted: 4 Jul 2009, 18:04:15 UTC - in response to Message 914015.  
Last modified: 4 Jul 2009, 18:07:14 UTC

Look, we know the litany: the SETI project is doing a *lot* with *relatively* limited resources. (I say *relatively* limited because there are dozens of other projects with far fewer resources than SETI.) Given that the 'Give me more power (or money or resources)' lament, from Captain user to Engineer SETI project, is a constant and one never to be fulfilled (more resources begets the need for more resources), it seems to me that folks really ought to be exploring one of the actual good things about the BOINC platform -- project diversity. Add more projects, tamp down the share of your CPU (and/or GPU) cycles that is allocated to SETI, and balance the load.

SETI currently has four times as many users as the next largest batch of BOINC projects (Rosetta, Climate, World Grid, and Einstein), and after that the user count drops way off to much smaller projects -- which, by the way, are operating pretty reliably on much smaller resource budgets, in part because users are drawn, moths to the flame, to SETI. These other projects are very often doing quite serious science as well.

If you are running CUDA devices, consider GPUGrid, for example; or, if you have fast ATI GPU resources sitting unused, consider MilkyWay, which currently provides the only optimized application that supports ATI GPUs.

If you prefer long-running work units, consider Climate; for mid-length work units, Einstein works fine; and for shorter work, there are a host of other projects which are generally running more reliably than SETI.

I mean, let's face it, the 'work' being done here is probably best characterized as speculative science. That isn't a bad thing, but if there is a resource bottleneck (and there is), it simply makes a lot of sense to reduce frustration levels by supporting, to a larger degree, projects engaged in basic science research, thereby easing the apparently permanent overload condition that exists here.

Folks who simply stay with SETI and moan about its many issues (and, if people elect to be honest about it, SETI is, for various reasons, among the least 'solid' of the BOINC projects), or folks who go into denial about those issues and tout the 'give SETI more resources' line to the exclusion of alternatives, seem to me to rather miss the mark, and they have missed it consistently over the past few years.




IMHO Ned - well spoken.
However, one must also consider the frustration that ensues when one can't upload, download, or even report - as is currently the case, probably until after next Tuesday.

HAL
Joined: 28 Mar 03
Posts: 704
Credit: 870,617
RAC: 0
United States
Message 914032 - Posted: 4 Jul 2009, 18:15:40 UTC

56 more minutes and the first of my farm goes offline permanently - that's 2 WU's a day the supercrunchers won't have to compete for. The next one exits tomorrow morning.
No wingmen will be left stranded!
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 914049 - Posted: 4 Jul 2009, 19:39:03 UTC - in response to Message 914015.  

IMHO Ned - well spoken.
However, one must also consider the frustration that ensues when one can't upload, download, or even report - as is currently the case, probably until after next Tuesday.


But your machines aren't frustrated - and there's no reason you should be either. You are not being deprived of anything other than worthless credits if your computers can't upload.

There's no reason why any of us should be so emotionally invested in the machine side of this project.
Matthew Love
Volunteer tester
Joined: 26 Sep 99
Posts: 7763
Credit: 879,151
RAC: 0
United States
Message 914054 - Posted: 4 Jul 2009, 19:53:17 UTC

Did they have a major server meltdown?

LETS BEGIN IN 2010
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 914056 - Posted: 4 Jul 2009, 19:58:41 UTC - in response to Message 914054.  

Did they have a major server meltdown?


Several issues have compounded the problem, but the major one seems to be the high traffic generated by clients requesting work more often, because shorter workunits are being sent out. Also, some new AP work was created after a period with no AP; those workunits are a bit larger than the normal ones, and there are a lot of fast, hungry crunchers asking for that work as well.

To add to that, it seems that every time the requests die down, the staff have their weekly server outage to perform their routine tasks, which means holding off the masses until the servers are back up. Then everyone comes rushing back in for more work and the cycle starts all over again.
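
As a toy model of that post-outage rush (made-up numbers, and not a description of the actual scheduler, just an illustration of why the backlog takes hours to drain):

# Requests that pile up during the weekly outage all arrive at once when
# the servers come back, so the rush itself takes hours to clear.
# All figures below are hypothetical.

def hours_to_drain(hosts_waiting: int, requests_served_per_hour: int,
                   new_requests_per_hour: int) -> float:
    """Hours until the post-outage backlog of work requests is cleared."""
    spare_capacity = requests_served_per_hour - new_requests_per_hour
    if spare_capacity <= 0:
        return float("inf")  # fresh demand alone saturates the servers
    return hosts_waiting / spare_capacity

# 150,000 hosts held off during the outage, servers clearing 40,000
# requests/hour, 25,000 fresh requests/hour arriving on top:
print(hours_to_drain(150_000, 40_000, 25_000))  # 10.0 hours of crunch afterwards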
DPRGI - Luivul
Joined: 24 Jan 03
Posts: 17
Credit: 20,639,801
RAC: 0
Italy
Message 914062 - Posted: 4 Jul 2009, 20:22:29 UTC - in response to Message 914056.  

Did they have a major server meltdown?


Several issues have compounded the problem, but the major one seems to be the high traffic generated by clients requesting work more often, because shorter workunits are being sent out. Also, some new AP work was created after a period with no AP; those workunits are a bit larger than the normal ones, and there are a lot of fast, hungry crunchers asking for that work as well.

To add to that, it seems that every time the requests die down, the staff have their weekly server outage to perform their routine tasks, which means holding off the masses until the servers are back up. Then everyone comes rushing back in for more work and the cycle starts all over again.


Just one question: why did the system work fine, with normal network traffic, from 13:30 on Friday to 8:30 on Saturday?
That it lasted only about 20 hours seems very strange to me.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 914069 - Posted: 4 Jul 2009, 20:41:13 UTC - in response to Message 914062.  
Last modified: 4 Jul 2009, 20:45:19 UTC

Just one question: why did the system work fine, with normal network traffic, from 13:30 on Friday to 8:30 on Saturday?

Network traffic hasn't been normal for any length of time in over 2 weeks. During the period you mention, the splitters weren't running properly and very little work was being produced. That has been fixed, which is why we now have so much download traffic.
Given the length of that problem period, it will probably take well over 18 hours for this to clear.
Grant
Darwin NT
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 914072 - Posted: 4 Jul 2009, 20:50:25 UTC

OK, I'm having a bit of difficulty getting my head around what is going on with uploads/downloads here.

Accepted - when download traffic is too high, upload traffic is strangled. Within an hour of that starting, my quaddie + GTX 295 will stop asking for new work, since the number of WUs waiting to upload will be more than twice the number of CPUs, so I am no longer contributing to the download traffic. For the top hosts, with multiple GPUs, this will happen even quicker.

The current download "spike" has lasted rather more than 4 hours. By my reckoning that should mean that all C2D and C2Q hosts (as well as anything faster) should have a backlog of uploads that is preventing work requests by now. So the download "spike" is being maintained by hosts that crunch relatively few WU's per day to fill a gap that was left by a a 9-hour lack of downloads?

Something does not seem to add up...

F.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 914076 - Posted: 4 Jul 2009, 20:57:05 UTC - in response to Message 914072.  

Something does not seem to add up...

Don't forget that the traffic previously died before all hosts had refilled their caches. So you've got those caches that still weren't full, combined with all the work that was processed and not replaced while the splitters weren't splitting. And it looks like there's no more AP work being generated, so those hosts will now be getting more MB work than when there was AP work available.
Grant
Darwin NT
TCP JESUS
Joined: 19 Jan 03
Posts: 205
Credit: 1,248,845
RAC: 0
Canada
Message 914080 - Posted: 4 Jul 2009, 21:20:38 UTC

What cache sizes are the bigger 'crunchers' (with GPUs) running at currently?

I have been running a 4-day cache and was lucky enough to make it through the last little 'hiccup' the other day without running out of work (with the help of Reschedule 1.7)..... but today is a different story. It looks like I will be idle before the sun goes down if I can't upload some results.

I had considered increasing my cache by a day or so, but I left it alone to see what would happen after the last network max-out. Unfortunately, I wasn't able to come even close to refilling HALF my cache, so a change on my part wouldn't have helped at the time.

Is a 4-day cache size common for a 'big rig' (octo-core OC'd i7 w/ twin OC'd GTX 260s)? Should I consider 5 or more days?

Thanks.
Allan
I am TCP JESUS...The Carpenter Phenom Jesus....and HAMMERING is what I do best!
formerly known as...MC Hammer.
Andy Williams
Volunteer tester
Joined: 11 May 01
Posts: 187
Credit: 112,464,820
RAC: 0
United States
Message 914084 - Posted: 4 Jul 2009, 21:26:37 UTC - in response to Message 914080.  

Is a 4-day cache size common for a 'big rig' (octo-core OC'd i7 w/ twin OC'd GTX 260s)? Should I consider 5 or more days?


I've been running with 3 days since January. I ran out a couple of times. Given the latest round of troubles, I bumped it to 5 days halfway through June. Haven't run dry.
--
Classic 82353 WU / 400979 h
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 914087 - Posted: 4 Jul 2009, 21:33:49 UTC


I got ~ 250 WUs today..

~ 80 VLARs killed, ~ 170 'shorties' and 'normal' WUs.

Ohh well..

My GPU cruncher now just sits idle during the day.

Yes.. it needs ~ 800 MB AR=0.44x WUs/day.

For some hours now, 14 result uploads haven't wanted to go to Berkeley.
So no new work requests.. even though new work would be available?


I'm really %$@&"§$%@!$€%§$% !!


I think the best thing would be to switch the GPU cruncher OFF and sell it.
Then I would have a more relaxed life and wouldn't need to 'babysit' the PC.


Sorry.. but.. since I got my GPU cruncher I have had no fun with SETI@home.


BTW.
I'll answer the questions about my post in the last 'panic thread' tomorrow.
Right now I'm tired.


BTW.
BOINC can only hold about a 3-day cache on my GPU cruncher, ~ 2,400 WUs. If I went any higher, BOINC would go into crazy EDF mode and the PC couldn't crunch - the system RAM and CPU would be overloaded.
It's now the second time in ~ 1 1/2 months that I've needed to switch it OFF. ~ 250 W in idle mode is too much just to wait for new work.

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 914091 - Posted: 4 Jul 2009, 21:45:46 UTC - in response to Message 914080.  

Is a 4-day cache size common for a 'big rig' (octo-core OC'd i7 w/ twin OC'd GTX 260s)? Should I consider 5 or more days?

Thanks.
Allan

Going beyond half the minimum turn-round time increases the chances of running into EDF, so I have stuck at 3 days, and I have only run out of work once, after I deliberately ran down my cache to detach/re-attach following a failed upgrade, releasing about 1000 WUs back into the pool. The problem with EDF and CUDA (particularly if you have more than one GPU) is that multiple WUs can be put into "waiting to run", and this seems to increase the likelihood of CPU fall-back being invoked.

I would recommend staying below 3.5 days for the cache.
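
As a rough sketch of that rule of thumb (the 7-day deadline below is only an assumption implied by the 3.5-day ceiling, not a figure quoted anywhere in this thread):

# Keep the cache under half the shortest task deadline so a backlog or an
# outage doesn't push BOINC into EDF (earliest-deadline-first) panic mode.

def max_safe_cache_days(shortest_deadline_days: float,
                        safety_factor: float = 0.5) -> float:
    """Largest cache (in days) that stays under the given fraction of the
    shortest deadline among the tasks a host receives."""
    return shortest_deadline_days * safety_factor

print(max_safe_cache_days(7.0))        # 3.5 -- the recommended ceiling
print(max_safe_cache_days(7.0) < 4.0)  # True: a 4-day cache already risks EDF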

F.
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 914093 - Posted: 4 Jul 2009, 21:52:53 UTC - in response to Message 914015.  

IMHO Ned - well spoken.
However, one must also consider the frustration that ensues when one can't upload, download, or even report - as is currently the case, probably until after next Tuesday.

Frustration, sure. Frustration because demand is exceeding capacity; my frustration because BOINC has a tendency to beat on the servers.

In my opinion, most of this is caused by loading, and the thing that takes care of loading is time -- and weekdays and weekends are all the same: time passes even when Matt isn't at his desk.

Frustration too because most people seem to compare SETI (which is not that time-critical) to something like Amazon.com, where missed connections are lost revenue. Amazon needs multiply-redundant everything. BOINC projects don't.

We just went through a longish period of failed uploads, and someone else commented that bandwidth is now pegged with downloads.

BOINC is supposed to tolerate all of this, and as a general rule, it does pretty well.

But "incompetent" is a little much. Those of us who hang out here know that Matt and Eric and Jeff, et. al., are both competent and dedicated, or we wouldn't see them on the weekends.
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 914095 - Posted: 4 Jul 2009, 21:56:30 UTC - in response to Message 914030.  

Look, we know the litany: the SETI project is doing a *lot* with *relatively* limited resources. (I say *relatively* limited because there are dozens of other projects with far fewer resources than SETI.) Given that the 'Give me more power (or money or resources)' lament, from Captain user to Engineer SETI project, is a constant and one never to be fulfilled (more resources begets the need for more resources), it seems to me that folks really ought to be exploring one of the actual good things about the BOINC platform -- project diversity. Add more projects, tamp down the share of your CPU (and/or GPU) cycles that is allocated to SETI, and balance the load.

SETI currently has four times as many users as the next largest batch of BOINC projects (Rosetta, Climate, World Grid, and Einstein), and after that the user count drops way off to much smaller projects -- which, by the way, are operating pretty reliably on much smaller resource budgets, in part because users are drawn, moths to the flame, to SETI. These other projects are very often doing quite serious science as well.

If you are running CUDA devices, consider GPUGrid, for example; or, if you have fast ATI GPU resources sitting unused, consider MilkyWay, which currently provides the only optimized application that supports ATI GPUs.

If you prefer long-running work units, consider Climate; for mid-length work units, Einstein works fine; and for shorter work, there are a host of other projects which are generally running more reliably than SETI.

I mean, let's face it, the 'work' being done here is probably best characterized as speculative science. That isn't a bad thing, but if there is a resource bottleneck (and there is), it simply makes a lot of sense to reduce frustration levels by supporting, to a larger degree, projects engaged in basic science research, thereby easing the apparently permanent overload condition that exists here.

Folks who simply stay with SETI and moan about its many issues (and, if people elect to be honest about it, SETI is, for various reasons, among the least 'solid' of the BOINC projects), or folks who go into denial about those issues and tout the 'give SETI more resources' line to the exclusion of alternatives, seem to me to rather miss the mark, and they have missed it consistently over the past few years.

Amen.