Panic Mode On (34) Server Problems

Message boards : Number crunching : Panic Mode On (34) Server Problems
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51522
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1011387 - Posted: 3 Jul 2010, 22:32:59 UTC - in response to Message 1011384.  


Anyone else finding the forums & web site going down for a few minutes at a time over the last few hours?


If it stays up long enough to post this reply.. YES!!!

It's been doing it since very early this morning...
"Time is simply the mechanism that keeps everything from happening all at once."

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 1011388 - Posted: 3 Jul 2010, 22:40:28 UTC - in response to Message 1011387.  


Nice to know it's not just me.
Grant
Darwin NT
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 1011535 - Posted: 4 Jul 2010, 6:37:56 UTC

Well, looks like another problem rears its head again.
My main rig was starting to do OK on the quota of 20, but will run out of work soon.
The reason: downloads not happening.
Well happy 4th July.


Dave
Hellsheep
Volunteer tester
Joined: 12 Sep 08
Posts: 428
Credit: 784,780
RAC: 0
Australia
Message 1011538 - Posted: 4 Jul 2010, 6:51:47 UTC - in response to Message 1011535.  

Well, looks like another problem rears its head again.
My main rig was starting to do OK on the quota of 20, but will run out of work soon.
The reason: downloads not happening.
Well happy 4th July.


Dave


Same problem. Project backoff due to constant HTTP errors.

Seems the servers are screwed after the crash.
- Jarryd
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13903
Credit: 208,696,464
RAC: 304
Australia
Message 1011541 - Posted: 4 Jul 2010, 6:56:22 UTC - in response to Message 1011538.  

Same problem. Project backoff due to constant HTTP errors.

Seems the servers are screwed after the crash.

Network traffic graphs show the pipe is full, with downloads occurring as fast as they are requested, hence the HTTP errors. When the traffic drops off, the errors will too.

Grant
Darwin NT
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1011542 - Posted: 4 Jul 2010, 6:58:07 UTC

I hope this is a silly question...
Are the inter-computer communications within SETI@home riding on the same network hub as the internet connection?


Janice
ToxicTBag
Joined: 5 Feb 10
Posts: 101
Credit: 57,197,902
RAC: 0
United Kingdom
Message 1011543 - Posted: 4 Jul 2010, 7:00:31 UTC

Happy July 4th all have a good one ;-)
Hellsheep
Volunteer tester
Joined: 12 Sep 08
Posts: 428
Credit: 784,780
RAC: 0
Australia
Message 1011553 - Posted: 4 Jul 2010, 7:11:43 UTC

I'm uploading fine. Just downloads are failing.
- Jarryd
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 1011577 - Posted: 4 Jul 2010, 8:06:16 UTC - in response to Message 1011558.  

If your tasks are all VLARs, then you will need to ride it out.
If you have a mix, then Reschedule some of them to the GPUs.

Dave
MadMaC
Volunteer tester
Joined: 4 Apr 01
Posts: 201
Credit: 47,158,217
RAC: 0
United Kingdom
Message 1011581 - Posted: 4 Jul 2010, 8:45:37 UTC

For all the Americans out there: Happy Independence Day, guys!

Am I reading the news post right?
Are the servers really only going to allow 20 tasks per PC per day, or have I read it wrong?
I'm curious, as one of my hosts hasn't been able to get any work since the outage and is currently crunching MilkyWay until it can get work...

Could someone explain? I'm not that clear on how it works.
Hellsheep
Volunteer tester
Joined: 12 Sep 08
Posts: 428
Credit: 784,780
RAC: 0
Australia
Message 1011597 - Posted: 4 Jul 2010, 10:00:11 UTC - in response to Message 1011581.  

For all the Americans out there: Happy Independence Day, guys!

Am I reading the news post right?
Are the servers really only going to allow 20 tasks per PC per day, or have I read it wrong?
I'm curious, as one of my hosts hasn't been able to get any work since the outage and is currently crunching MilkyWay until it can get work...

Could someone explain? I'm not that clear on how it works.


Happy 4th July to you from Australia. :)

Back to the task at hand: after the last 3-day outage, a 20 WU limit was put in place to reduce the stress that specific hosts put on the servers and to give more than just the big-time crunchers access to work units. It also keeps work units on the server ready to send out, rather than dumping them all at once. That means big-time crunchers with 10-day caches don't get all the work, and part-time crunchers with a half-day cache can get work throughout the week when their machines need it.

The problem with this is some machines are getting CPU work only, and some are getting Astropulse only. Not to mention that the big time crunchers will not be able to keep their machines at 100% load through the next outage.

My basic home machine runs 4 CPU WUs at a time, each taking about 1 hr 30 min, plus 16 GPU tasks taking about 10 minutes each. That's pretty much 20 work units in just under 2 hours.
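Taken at face value, timings like these give a quick way to estimate how long a 20-task quota lasts on one machine. The sketch below is a rough, hypothetical helper (the name `drain_minutes` is invented here, not part of BOINC or SETI@home). It assumes CPU and GPU work run side by side, each CPU slot and each GPU runs one task at a time, and all tasks of a kind take the same time. The post doesn't say how many GPUs the machine has, so that is left as a parameter.

```python
import math


def drain_minutes(cpu_tasks, cpu_slots, cpu_minutes_each,
                  gpu_tasks, gpus, gpu_minutes_each):
    """Rough wall-clock minutes to finish a batch of tasks.

    Hypothetical helper. Assumes CPU and GPU work run concurrently,
    each CPU slot and each GPU runs one task at a time, and every
    task of a given kind takes the same time.
    """
    # CPU tasks are processed in waves of `cpu_slots` tasks at once.
    cpu_time = math.ceil(cpu_tasks / cpu_slots) * cpu_minutes_each
    # GPU tasks are spread evenly over the available GPUs.
    gpu_time = math.ceil(gpu_tasks / gpus) * gpu_minutes_each
    # The batch is done when the slower of the two queues empties.
    return max(cpu_time, gpu_time)


if __name__ == "__main__":
    # Figures from the post: 4 CPU tasks at ~90 min, 16 GPU tasks at ~10 min.
    # The GPU count is an assumption, so try one and two GPUs.
    for gpus in (1, 2):
        minutes = drain_minutes(4, 4, 90, 16, gpus, 10)
        print(f"{gpus} GPU(s): about {minutes} minutes")
```

Under these assumptions, with a single GPU the GPU queue is the bottleneck (about 160 minutes); with two GPUs the CPU tasks set the pace (about 90 minutes). Either way, a 20-task quota lasts well under three hours on a machine like this.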

Now there is a news post saying they're going to remove the limit Monday morning Berkeley time to allow big crunchers to fill up caches for the outage.

We'll see what happens, though; I doubt all the big crunchers will get work.
- Jarryd
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 1011598 - Posted: 4 Jul 2010, 10:09:41 UTC

VLARs will cause a huge problem for GPU crunchers, mine included.
As a task finishes and reports, if there are large numbers of VLARs, they will be sent.
If you are not using VLAR_KILL, all your work will be CPU-based.
Thus GPUs go cold.
There is a reverse to this, of course: if there is a shorty storm, CPUs will go cold.
I would prefer the second option.

Dave
Wandering Willie
Volunteer tester
Joined: 19 Aug 99
Posts: 136
Credit: 2,127,073
RAC: 0
United Kingdom
Message 1011606 - Posted: 4 Jul 2010, 11:05:36 UTC


Perhaps the quota system is working: WUs in the field are now down to 3,258,508.

I have not seen it this low for a while; perhaps a few of the pendings are beginning to validate at last.

Michael
Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1011612 - Posted: 4 Jul 2010, 11:27:06 UTC - in response to Message 1011606.  

I would say it's because nobody can refill their cache. Every personal cache setting is now overridden by the server rule of 20 workunits per host,
so the number of workunits out in the field will be reduced.

Helli
Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1011636 - Posted: 4 Jul 2010, 12:56:18 UTC - in response to Message 1011612.  

Well, looks like roughly a 12k/h deficit.
I find those spikes in up/downloads every 4 hours interesting.

Saying nobody can fill his cache is unfair to some 140,000 users with a RAC of 500 or less, to whom 20 WU would be enough for at least a 4 day cache.
That said, the 27,894 users with a RAC above 500 (thanks Robert Ribbeck, Scarecrow et al.) are way below capacity and requirement.
So an estimated 80% of active users got their piece of cake and will be happy. How about starting to give second helpings of the remaining cake?
At least that cake doesn't rot...

Seeing as it's Sunday, and Monday is a holiday in the US, it looks highly unlikely to be enough if the outage goes ahead as scheduled.

Wonder if the electricity companies see a dip in power consumption?

I hope patience does not run out as fast as WUs... We'll need a lot more over the coming weeks.
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
PP
Joined: 3 Jul 99
Posts: 42
Credit: 10,012,664
RAC: 0
Sweden
Message 1011643 - Posted: 4 Jul 2010, 13:05:03 UTC - in response to Message 1011636.  

For those interested: my AP-only cruncher has 37 tasks in progress right now, so the limit doesn't seem to apply to AP.
http://setiathome.berkeley.edu/results.php?hostid=5097356&offset=0&show_names=0&state=1
Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1011644 - Posted: 4 Jul 2010, 13:05:55 UTC

Yup, but these 27,000 users need, I assume, 80% of all workunits available.

Helli
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 1011648 - Posted: 4 Jul 2010, 13:19:41 UTC - in response to Message 1011540.  

No problems with downloads here. I turn in a WU, and I get one back right away.


The same with me. That is fine while the download servers are turned on, but at this rate I expect to enter the next 3 day outrage with about 4 hours of cache. And I increased my cache setting to 5 days, from 3.

Question: the effects of all this on us users is obvious, but what is the effect on the project? Compared to a (recent) typical week of unplanned outages, is S@H processing more work, less work, or about the same? And how does the processing speed compare to the flow of new work from the telescope?

Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1011653 - Posted: 4 Jul 2010, 13:35:26 UTC - in response to Message 1011644.  
Last modified: 4 Jul 2010, 13:48:09 UTC

Yup, but these 27,000 users need, I assume, 80% of all workunits available.

Helli


From the RAC/users chart, with daily credits added, they need about 60%.
EDIT: or perhaps 52% if the numbers are calculated differently *sigh*
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51522
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1011702 - Posted: 4 Jul 2010, 16:05:07 UTC - in response to Message 1011648.  



Question: the effects of all this on us users is obvious, but what is the effect on the project? Compared to a (recent) typical week of unplanned outages, is S@H processing more work, less work, or about the same? And how does the processing speed compare to the flow of new work from the telescope?


It will probably take a few weeks of this new cycle and the subsequent 'fine tuning' to assess the impact on Seti workflow of folks like me who are going to stick around but may not be able to get enough work to continue to crunch 24/7, and those who choose to leave for their own personal reasons.
"Time is simply the mechanism that keeps everything from happening all at once."

©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.