Panic Mode On (63) Server problems?

kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1180965 - Posted: 27 Dec 2011, 17:49:18 UTC - in response to Message 1180964.  


Same goes for GPU hosting rigs....the faster they are right now, the worse off they are.

What makes it really tough on the big rigs is that when things are working OK, as in between shorty storms, they are not now allowed to build a large enough cache to carry them through the times when comms tighten up.

That's what really sux for us right now.



You said it bro...

That's why I have been stumping for weeks now for the Admins and Devs to address the Boinc code problems, get them behind us, and get the dang limits lifted.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1180965 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1180967 - Posted: 27 Dec 2011, 17:51:38 UTC - in response to Message 1180966.  

Shouldn't we be in the middle of the usual Tuesday outage by now? Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?

That could be the case, but I am not certain.
If they are on hiatus for the whole week, they may let things limp along as they are, other than possibly what could be addressed remotely.
Or, possibly an outage later in the week.

Just dunno for sure.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1180967 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1180970 - Posted: 27 Dec 2011, 17:55:03 UTC - in response to Message 1180961.  

What is so odd about that? All things being equal, the quad core is going to do 4 times the amount of work that the single core would. So it would have to get and successfully download 4 times the amount of work just to stay even, much less build its cache. So when up/download and work requests are not flowing well, it's going to be the first one to feel the pain.
Same goes for GPU hosting rigs....the faster they are right now, the worse off they are.

What makes it really tough on the big rigs is that when things are working OK, as in between shorty storms, they are not now allowed to build a large enough cache to carry them through the times when comms tighten up.

That's what really sux for us right now.


I do agree; however, the part that I forgot was that the single-core machine would get at least one task about 95% of the time it asked for work. The quad-core machine would have about a 10% success rate. The slow machine would get its ~50 MBs in fewer than 10 requests, but the quad would have to ask for work 50+ times to get maybe 75.
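To put rough numbers on those success rates, here is a minimal sketch. It assumes each work request succeeds independently with a fixed probability, and that a successful reply hands out a fixed number of tasks. The `tasks_per_reply` parameter and the target of 10 tasks are made up for illustration, not measured values from the project.

```python
import math


def expected_requests(tasks_needed, success_rate, tasks_per_reply=1):
    """Average number of work requests needed to collect tasks_needed tasks.

    Assumes every request succeeds independently with probability
    success_rate, and that each successful reply carries tasks_per_reply
    tasks (a hypothetical figure, not taken from the scheduler).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    if tasks_needed < 0 or tasks_per_reply < 1:
        raise ValueError("tasks_needed must be >= 0 and tasks_per_reply >= 1")
    # Number of replies that must succeed, then scale by the failure odds.
    successful_replies = math.ceil(tasks_needed / tasks_per_reply)
    return successful_replies / success_rate


# Success rates quoted in the post: ~95% (single-core) and ~10% (quad-core).
single = expected_requests(tasks_needed=10, success_rate=0.95)
quad = expected_requests(tasks_needed=10, success_rate=0.10)
print(f"single-core: ~{single:.1f} requests, quad-core: ~{quad:.0f} requests")
```

The point is only the ratio: at a 10% success rate a host needs roughly ten times as many requests per task as a host that almost always gets work, before even counting that it burns through tasks four times faster.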

Something else I'm pondering is if there is any way to speed up the refill rate for the feeder. I've heard that it fills up every two seconds. I wonder if that can be dropped to 1 second if it's even possible? That may alleviate a lot of those "project has no tasks available" messages when server status shows 200,000+ waiting to be assigned.
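A back-of-the-envelope sketch of why the refill interval could matter, assuming the feeder tops up a fixed number of slots once per interval and the scheduler can only hand out what is sitting in those slots. The slot count of 100 is a placeholder for illustration; I don't know the actual value on the SETI servers.

```python
def feeder_ceiling(slots, refill_seconds):
    """Upper bound on tasks handed out per second if the feeder refills
    `slots` slots once every `refill_seconds` seconds."""
    if slots <= 0 or refill_seconds <= 0:
        raise ValueError("slots and refill_seconds must be positive")
    return slots / refill_seconds


ASSUMED_SLOTS = 100  # hypothetical slot count, not a confirmed server setting

for interval in (2.0, 1.0):
    ceiling = feeder_ceiling(ASSUMED_SLOTS, interval)
    print(f"refill every {interval:.0f} s -> at most {ceiling:.0f} tasks/s")
```

Under these assumptions, halving the interval doubles the ceiling. Requests that arrive after the slots have been emptied and before the next refill would see "no tasks available" regardless of how many results are waiting in the database.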
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1180970 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1180971 - Posted: 27 Dec 2011, 17:57:35 UTC - in response to Message 1180970.  

What is so odd about that? All things being equal, the quad core is going to do 4 times the amount of work that the single core would. So it would have to get and successfully download 4 times the amount of work just to stay even, much less build its cache. So when up/download and work requests are not flowing well, it's going to be the first one to feel the pain.
Same goes for GPU hosting rigs....the faster they are right now, the worse off they are.

What makes it really tough on the big rigs is that when things are working OK, as in between shorty storms, they are not now allowed to build a large enough cache to carry them through the times when comms tighten up.

That's what really sux for us right now.


I do agree; however, the part that I forgot was that the single-core machine would get at least one task about 95% of the time it asked for work. The quad-core machine would have about a 10% success rate. The slow machine would get its ~50 MBs in fewer than 10 requests, but the quad would have to ask for work 50+ times to get maybe 75.

Something else I'm pondering is if there is any way to speed up the refill rate for the feeder. I've heard that it fills up every two seconds. I wonder if that can be dropped to 1 second if it's even possible? That may alleviate a lot of those "project has no tasks available" messages when server status shows 200,000+ waiting to be assigned.

I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that has to happen first, then scheduler or other server based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1180971 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1180972 - Posted: 27 Dec 2011, 18:05:24 UTC - in response to Message 1180971.  

I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that has to happen first, then scheduler or other server based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed.

That is true. And I've stated a few times that if we can get more bandwidth, it may create a whole new pile of problems all by itself by allowing more successful contacts to the database. It's one of those things that we'll just have to wait and see what happens and have some contingency plans lined up for some of the possible scenarios.

However, the good news is that with all of the enterprise-class networking equipment that is in place, we can get an actual gigabit link, but still rate-limit it to 100mbit, or 150mbit, whatever seems to allow the smoothest data transfer while keeping the database from getting DDoS'ed.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1180972 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1180974 - Posted: 27 Dec 2011, 18:09:30 UTC - in response to Message 1180972.  
Last modified: 27 Dec 2011, 18:09:53 UTC

I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that has to happen first, then scheduler or other server based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed.

That is true. And I've stated a few times that if we can get more bandwidth, it may create a whole new pile of problems all by itself by allowing more successful contacts to the database. It's one of those things that we'll just have to wait and see what happens and have some contingency plans lined up for some of the possible scenarios.

However, the good news is that with all of the enterprise-class networking equipment that is in place, we can get an actual gigabit link, but still rate-limit it to 100mbit, or 150mbit, whatever seems to allow the smoothest data transfer while keeping the database from getting DDoS'ed.


Well, if you peruse the information in the GPUUG fundraising thread, you will see that many hardware upgrades are well on their way to being completed. With more to come.

As far as I know, we still do not have a real path in place for upgrading the bandwidth, other than having the project's pleas fall on the deaf ears of the Berk IT admins.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1180974 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1181033 - Posted: 28 Dec 2011, 0:01:54 UTC - in response to Message 1180966.  

Shouldn't we be in the middle of the usual Tuesday outage by now? Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?


Hmm, the UC Berkeley Academic Calendar shows Monday, Tuesday, Thursday, and Friday as "Academic and Administrative Holiday".
Joe
ID: 1181033 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51525
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1181034 - Posted: 28 Dec 2011, 0:11:53 UTC - in response to Message 1181033.  

Shouldn't we be in the middle of the usual Tuesday outage by now? Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?


Hmm, the UC Berkeley Academic Calendar shows Monday, Tuesday, Thursday, and Friday as "Academic and Administrative Holiday".
Joe

Ahhh....
So it looks like some of our indentured servants may be in the lab tomorrow for an outage party.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1181034 · Report as offensive
Richard1949

Joined: 20 Oct 99
Posts: 18
Credit: 232,635
RAC: 0
United States
Message 1181044 - Posted: 28 Dec 2011, 0:38:23 UTC

"I do agree, however the part that I forgot was that the single-core machine would get at least one task about 95% of the time it asked for work."
---------------------------------------------------
I can't even get anything for my single core machine.
ID: 1181044 · Report as offensive
Richard1949

Joined: 20 Oct 99
Posts: 18
Credit: 232,635
RAC: 0
United States
Message 1181046 - Posted: 28 Dec 2011, 0:41:18 UTC

"Something else I'm pondering is if there is any way to speed up the refill rate for the feeder. I've heard that it fills up every two seconds. I wonder if that can be dropped to 1 second if it's even possible? That may alleviate a lot of those "project has no tasks available" messages when server status shows 200,000+ waiting to be assigned."
----------------------------------------------
I keep getting "not requesting any tasks."
ID: 1181046 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13904
Credit: 208,696,464
RAC: 304
Australia
Message 1181111 - Posted: 28 Dec 2011, 8:21:06 UTC


15min to download 1 WU is a bit of a PITA when it takes less than 3min to do 2.
Grant
Darwin NT
ID: 1181111 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 37613
Credit: 261,360,520
RAC: 489
Australia
Message 1181112 - Posted: 28 Dec 2011, 8:56:51 UTC - in response to Message 1181111.  


15min to download 1 WU is a bit of a PITA when it takes less than 3min to do 2.

Personally, I still put the current problems on the connection itself between the USA side of our undersea cable and HE, as using a proxy here quickly clears any backlogs that occur.

Cheers.
ID: 1181112 · Report as offensive
Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1181113 - Posted: 28 Dec 2011, 9:05:51 UTC - in response to Message 1181046.  

I keep getting "not requesting any tasks."

And why would that be a server problem when your client doesn't ask for work?

Gruß,
Gundolf
ID: 1181113 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1181114 - Posted: 28 Dec 2011, 9:15:31 UTC - in response to Message 1181112.  
Last modified: 28 Dec 2011, 9:16:15 UTC


15min to download 1 WU is a bit of a PITA when it takes less than 3min to do 2.

Personally, I still put the current problems on the connection itself between the USA side of our undersea cable and HE, as using a proxy here quickly clears any backlogs that occur.

Cheers.

Yeah, that certainly would appear to be that undersea cable. I noticed in my messages tab last night that between "starting download" and "finished download" for an AP, 19 seconds elapsed (~430 KB/sec). Of course it was a B3_P1 WU, so it took 24 seconds to error out once processing started. Go figure.
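For anyone who wants to check the arithmetic, here is a small sketch that works back from the rate and elapsed time to the approximate file size, then shows what the rate would be if a file of that same size took the 15 minutes mentioned earlier in the thread. That second figure is only a comparison under the assumption that both downloads were the same size, which nobody here has confirmed.

```python
def transfer_size_kb(rate_kb_per_s, seconds):
    """Approximate size of a transfer from its average rate and duration."""
    return rate_kb_per_s * seconds


def transfer_rate_kb_per_s(size_kb, seconds):
    """Average rate of a transfer from its size and duration."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_kb / seconds


# Figures from the post: ~430 KB/sec for 19 seconds.
size_kb = transfer_size_kb(430, 19)
print(f"approximate file size: {size_kb} KB (~{size_kb / 1024:.1f} MB)")

# Hypothetical comparison: the same size stretched over 15 minutes.
slow_rate = transfer_rate_kb_per_s(size_kb, 15 * 60)
print(f"same file over 15 min: ~{slow_rate:.1f} KB/sec")
```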
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1181114 · Report as offensive
Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1181118 - Posted: 28 Dec 2011, 10:41:02 UTC

So are we in an outage? Why is there no work?
ID: 1181118 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22723
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1181125 - Posted: 28 Dec 2011, 11:37:13 UTC

No signs of an outage, and I've been getting a fairly steady stream of work
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1181125 · Report as offensive
Richard1949

Joined: 20 Oct 99
Posts: 18
Credit: 232,635
RAC: 0
United States
Message 1181127 - Posted: 28 Dec 2011, 11:49:05 UTC - in response to Message 1181113.  

I keep getting "not requesting any tasks."

And why would that be a server problem when your client doesn't ask for work?

Gruß,
Gundolf


I don't know what's going on. I have uninstalled and reinstalled. I have reset the project. I have tried different BOINC versions. Nothing works. All I get when I request tasks is the "not requesting tasks" message. Every other BOINC project works fine. It's only SETI I can't get anything from, even though it shows it has hundreds of thousands of tasks ready to send out.

ID: 1181127 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1181128 - Posted: 28 Dec 2011, 12:00:09 UTC - in response to Message 1181125.  

No signs of an outage, and I've been getting a fairly steady stream of work



Bouncing off the limits here.

Pity it's all sitting in my download queue :-(

Dropped back to 1 WU per card and with max button pushing only getting enough to keep on average 2 out of 3 cards running.

Shorties and APs, never a good combination.


Kevin


ID: 1181128 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1181129 - Posted: 28 Dec 2011, 12:01:00 UTC - in response to Message 1181127.  
Last modified: 28 Dec 2011, 12:11:23 UTC

I keep getting "not requesting any tasks."

And why would that be a server problem when your client doesn't ask for work?

Gruß,
Gundolf


I don't know what's going on. I have uninstalled and reinstalled. I have reset the project. I have tried different BOINC versions. Nothing works. All I get when I request tasks is the "not requesting tasks" message. Every other BOINC project works fine. It's only SETI I can't get anything from, even though it shows it has hundreds of thousands of tasks ready to send out.

If your host isn't asking for work from Seti, then either you've got No New Tasks set, you've got the Seti Project Suspended, you've got one or more Seti Tasks Suspended, or your Cache is Full already.
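Written out as a checklist, that looks something like the sketch below. This is not BOINC's actual work-fetch logic, just the four conditions listed above turned into a function. The `ProjectState` fields are hypothetical names for things you would read off the Projects and Tasks tabs by eye.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    """Hypothetical snapshot of one project's settings on a host."""
    no_new_tasks: bool = False
    project_suspended: bool = False
    suspended_tasks: int = 0
    cache_full: bool = False


def reasons_not_requesting(state):
    """Return the listed conditions that would stop the client asking for work."""
    reasons = []
    if state.no_new_tasks:
        reasons.append("No New Tasks is set")
    if state.project_suspended:
        reasons.append("project is suspended")
    if state.suspended_tasks > 0:
        reasons.append(f"{state.suspended_tasks} task(s) suspended")
    if state.cache_full:
        reasons.append("cache is already full")
    return reasons


# Example: a host with one suspended task and nothing else wrong.
for reason in reasons_not_requesting(ProjectState(suspended_tasks=1)):
    print("not requesting tasks because:", reason)
```

An empty list means none of the four applies, and the cause is something else.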

Claggy
ID: 1181129 · Report as offensive
Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1181130 - Posted: 28 Dec 2011, 12:08:12 UTC - in response to Message 1181128.  

Shorties and APs, never a good combination.


We're after the Goldilocks zone.
ID: 1181130 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.