Panic Mode On (31) Server problems

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 982629 - Posted: 23 Mar 2010, 1:47:16 UTC - in response to Message 982449.  

Quoting Message 982449:
I blame the Berkeley HVAC tech who pushed the reset button on the air conditioner and said "oh it's working, everything's ok"

The AC unit tripped its circuit protection for a reason.
Until that reason is discovered, the tech who made the decision to let it ride has put the entire project's hardware in jeopardy.



Our A/C at work does that once or twice a year & they just reset it. Never mind that we have lost $5-6,000 in machines already from the cooling going down :/
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 982818 - Posted: 23 Mar 2010, 23:00:08 UTC

Up, down, up, down, up ... it's going swell. :-)
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65779
Credit: 55,293,173
RAC: 49
United States
Message 982950 - Posted: 24 Mar 2010, 3:15:00 UTC - in response to Message 982818.  
Last modified: 24 Mar 2010, 3:15:18 UTC

Jord wrote:
Up, down, up, down, up ... it's going swell. :-)

Any more and I swear some might get seasick (motion sickness). :D
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 983338 - Posted: 25 Mar 2010, 1:15:07 UTC

I am seeing something that is very "freaky", to say the least. Each morning, when I look at my pending credits and record the numbers, I also look at Fragment to see what network traffic is doing. It is part of my check of Seti, Seti Beta, Server Status (both) and logging what my machines are doing.

Fragment is owned by UCB, not Seti, but I can point you to a link to view it. I will not inline the graphic. The freaky part is that you are getting uploads, downloads and scheduler requests through without the link being maxed out. That is scary!

Uploads/scheduler requests were maxed at 10 meg (you couldn't fit a packet in edgewise), and downloads dominated the bandwidth.

The Daily look
Daily

The Weekly look
Weekly

The Monthly look
Monthly

Please take a moment to bookmark the URL so you can peek whenever you want. Please do not make a live link to Fragment.

Regards


Please consider a Donation to the Seti Project.

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 983363 - Posted: 25 Mar 2010, 2:12:56 UTC - in response to Message 983338.  

Pappa wrote:
...
Please take a moment to bookmark the URL so you can peek whenever you want. Please do not make a live link to Fragment.



I have had a bookmark to Fragment for at least the last 4 years on my computers.

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 983381 - Posted: 25 Mar 2010, 2:48:45 UTC - in response to Message 983338.  

Pappa wrote:
I am seeing something that is very "freaky" to say the least. As I look at my pending credits once each morning and record numbers I look at Fragment to see what Network Traffic is doing. It is part of my check of Seti, Seti Beta, Server Status (both) and logging what my machines are doing.
...
What I am seeing which is the freaky part, is that You are getting Uploads, Downloads and Scheduler requests through Without being MAXED out. That is Scary!

Uploads/Scheduler requests were maxed at 10 meg (can't fit a packet in edgewise).... Downloads dominated the bandwidth.
...

I've always assumed the fairly frequent times when the sum of in and out exceeded 100 MBits/second were prima facie evidence the link is full duplex. You've made several posts which imply it is half duplex, how sure are you of that information?
                                                             Joe
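Joe's argument can be written as a one-line check. The helper below is a hypothetical sketch, not anything from the Seti servers: it assumes a 100 Mbit/s link and treats the in and out readings from a traffic graph such as Fragment as averages over the same interval. On a half-duplex link both directions share one channel, so their sum can never exceed the link rate; a reading above that rate rules half duplex out.

```python
def rules_out_half_duplex(in_mbit_s: float, out_mbit_s: float, link_mbit_s: float = 100.0) -> bool:
    """Return True if a traffic reading is impossible on a half-duplex link.

    Half duplex: both directions share one channel, so in + out <= link rate.
    Full duplex: each direction may use the full link rate independently.
    The 100 Mbit/s default is an assumption taken from the thread.
    """
    if in_mbit_s < 0 or out_mbit_s < 0:
        raise ValueError("traffic readings must be non-negative")
    return in_mbit_s + out_mbit_s > link_mbit_s


# Hypothetical readings, for illustration only.
print(rules_out_half_duplex(in_mbit_s=15.0, out_mbit_s=95.0))  # True: 110 > 100
print(rules_out_half_duplex(in_mbit_s=10.0, out_mbit_s=80.0))  # False: 90 fits either way
```

The test is one-sided: a sum above the link rate is evidence for full duplex, but a sum below it proves nothing either way, which is why Joe calls it prima facie evidence.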
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 983411 - Posted: 25 Mar 2010, 5:31:22 UTC - in response to Message 983381.  

Josef W. Segur wrote:
...
I've always assumed the fairly frequent times when the sum of in and out exceeded 100 MBits/second were prima facie evidence the link is full duplex. You've made several posts which imply it is half duplex, how sure are you of that information?


Joe, et al

I suspect that something changed at some point. It has been the cause of many problems that were blamed on Seti. If that is true, it had the Seti staff looking for problems that were not theirs.

So, without going back over what I stated about "half duplex", it is evident that something has changed or been fixed. Only a few would fully recognize your statement (or its implications).

Regards




Please consider a Donation to the Seti Project.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 983435 - Posted: 25 Mar 2010, 6:36:50 UTC - in response to Message 983338.  
Last modified: 25 Mar 2010, 6:43:02 UTC

Pappa wrote:
Uploads/Scheduler requests were maxed at 10 meg (can't fit a packet in edgewise)... Downloads dominated the bandwidth.

When?
Currently uploads are at 13.6 Mb/s; the maximum was 17.7 Mb/s.
Normally they're around 10 Mb/s, or 13-14 Mb/s if there is a shorty storm and downloads are maxed out.

For the last few weeks there has been some sort of problem with the backend, whether the download server, the scheduler or something else in the system. Even after outages the maximum bandwidth hasn't been reached, and even after the recovery there's still more upload traffic than usual.
One of the symptoms is the "SETI@home Message from server: (Project has no jobs available)" message. Usually you only get that when there's heavy load on the downloads, but I get several of those messages before I finally get some downloads allocated.
For whatever reason, the scheduler isn't able to meet the demand for downloads, hence those messages, the lower-than-usual download traffic after outages and the higher-than-usual upload traffic at all times.
Grant
Darwin NT
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 983570 - Posted: 25 Mar 2010, 17:24:26 UTC - in response to Message 983435.  

Back in January, we were having this discussion about what is in the scheduler "Feeder Cycle."

The message "SETI@home Message from server: (Project has no jobs available)" means that when your machine phoned home and asked for work, there was none to be had during what was left of the Feeder Cycle.

This forum message should put you into the middle of that discussion: "OMG please fix the scheduler".

Regards

Please consider a Donation to the Seti Project.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 983575 - Posted: 25 Mar 2010, 17:50:36 UTC - in response to Message 983570.  

Pappa wrote:
The message "SETI@home Message from server: (Project has no jobs available)" means that when your machine phoned home and asked for work, there was none to be had during what was left of the Feeder Cycle.

And the fact is that those messages are much, much more prevalent now than at any time in the past.
The low level of outbound traffic and the high level of inbound traffic indicate some sort of problem, most likely the result of clients having to make repeated attempts to get work.
Grant
Darwin NT
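Grant's reasoning, that repeated attempts inflate inbound scheduler traffic while downloads stay low, can be put into rough numbers. A minimal sketch, assuming each scheduler request independently succeeds with some probability p (the probabilities and fetch rate below are hypothetical, not measured): the number of tries per successful fetch is then geometric with mean 1/p, so scheduler requests scale by that factor while the amount of work downloaded stays the same.

```python
def expected_attempts(success_prob: float) -> float:
    """Mean number of scheduler requests per successful work fetch.

    Assumes each request succeeds independently with probability success_prob,
    so the number of tries is geometrically distributed with mean 1 / success_prob.
    """
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    return 1.0 / success_prob


def scheduler_requests_per_hour(fetches_per_hour: float, success_prob: float) -> float:
    """Inbound scheduler requests needed to deliver fetches_per_hour successful fetches."""
    return fetches_per_hour * expected_attempts(success_prob)


# Hypothetical: the same 1000 successful fetches per hour at two success rates.
for p in (1.0, 0.25):
    print(f"p={p}: {scheduler_requests_per_hour(1000, p):.0f} requests/hour")
```

Independence is a simplification; clients back off between retries, so real request rates would be lower than this model suggests.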
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 983646 - Posted: 25 Mar 2010, 22:19:51 UTC - in response to Message 983575.  
Last modified: 25 Mar 2010, 22:20:28 UTC

Grant (SSSF) wrote:
And the fact is that those messages are much, much more prevalent now than at any time in the past.
The low level of outbound traffic and the high level of inbound traffic indicate some sort of problem, most likely the result of clients having to make repeated attempts to get work.

There is pretty much exactly one reason for the message: the feeder queue does not have a suitable task.

For example, if all you want is AP, and the slots for AP are empty: no work.

I can think of reasons to do this on purpose: flow control.

I can think of problems (just plain not having enough splitters, or no tapes to split) that would cause this.

Edit: could be more "hungry" users than can be supplied.

I don't think looking at traffic is enough to draw a conclusion.
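The explanation above, that the message appears when the feeder's in-memory queue holds no task suitable for the requesting host, can be illustrated with a toy model. This is a hypothetical sketch, not BOINC's feeder or scheduler code: the class name, the "MB"/"AP" app labels and the refill rule are assumptions chosen to mirror the AP example in the post.

```python
from typing import List, Optional, Set, Union

NO_WORK = "Project has no jobs available"


class FeederSlots:
    """Toy model of a feeder's in-memory task slots (hypothetical; not BOINC's code)."""

    def __init__(self, n_slots: int) -> None:
        # Each slot holds an app label such as "MB" or "AP", or None when empty.
        self.slots: List[Optional[str]] = [None] * n_slots

    def refill(self, ready_to_send: List[str]) -> None:
        """Move tasks from the ready-to-send queue into empty slots, oldest first."""
        for i, task in enumerate(self.slots):
            if task is None and ready_to_send:
                self.slots[i] = ready_to_send.pop(0)

    def request(self, wanted_apps: Set[str], max_tasks: int) -> Union[List[str], str]:
        """Hand out up to max_tasks tasks the host can run, or the no-work message."""
        sent: List[str] = []
        for i, task in enumerate(self.slots):
            if len(sent) == max_tasks:
                break
            if task is not None and task in wanted_apps:
                sent.append(task)
                self.slots[i] = None  # the slot stays empty until the next refill
        return sent if sent else NO_WORK


feeder = FeederSlots(n_slots=4)
feeder.refill(["MB", "MB", "MB", "AP"])
print(feeder.request({"AP"}, max_tasks=10))  # ['AP']
print(feeder.request({"AP"}, max_tasks=10))  # Project has no jobs available
print(feeder.request({"MB"}, max_tasks=2))   # ['MB', 'MB']
```

A handed-out slot stays empty until the next refill, so a burst of requests can drain the cache even while thousands of tasks wait in the ready-to-send queue; that is the "more hungry users than can be supplied" case.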
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 983763 - Posted: 26 Mar 2010, 4:50:31 UTC - in response to Message 983646.  
Last modified: 26 Mar 2010, 4:51:39 UTC

1mp0£173 wrote:
There is pretty much exactly one reason for the message: the feeder queue does not have a suitable task.
...
I don't think looking at traffic is enough to draw a conclusion.


Maybe not, but based on historical data it is odd that it takes several attempts before work is allocated, no matter how light the traffic is. Before the initial Great Upload Incident a few weeks ago (or however long ago it was), the only time you'd see those messages repeatedly was after an outage.
Grant
Darwin NT
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 983789 - Posted: 26 Mar 2010, 6:20:41 UTC

There were statements (by Joe, if I recall correctly) that the current 2-second interval for refreshing the 100 tasks held in memory, ready to be downloaded, is more than enough to max out the download bandwidth. That is, if the servers work OK, we should see maxed bandwidth before the "Project has no jobs available" message starts appearing almost constantly (as I saw yesterday on my host). So if download bandwidth is not maxed and there are a few thousand tasks in the "ready to send" state, as the status page reported at that time, then the server (the feeder?) has problems bringing ready-to-send results into the memory slots for actual downloading. Maybe that 2-second interval was changed recently. Maybe the feeder can't do its job within 2 seconds. But it's apparently the bottleneck here.
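The argument above is a throughput comparison. A back-of-the-envelope sketch, using the figures quoted in the thread (100 slots, a 2-second refill, a 100 Mbit/s link) plus one number that is not in the thread: an assumed multibeam workunit size of roughly 366 KB. If every slot is handed out once per refill, the feeder can release about 50 tasks per second, more than the link can carry; doubling the refill interval flips the comparison.

```python
LINK_MBIT_S = 100.0  # link rate discussed in the thread
SLOTS = 100          # tasks held in memory, per the thread
WU_SIZE_KB = 366.0   # assumed multibeam workunit size; not stated in the thread


def feeder_supply_mbit_s(slots: int, refill_interval_s: float, wu_size_kb: float) -> float:
    """Upper bound on download traffic the feeder can generate, in Mbit/s.

    Assumes every slot is filled and handed out once per refill interval,
    and uses decimal units (1 KB = 1000 bytes, 1 Mbit = 1,000,000 bits).
    """
    tasks_per_s = slots / refill_interval_s
    bits_per_s = tasks_per_s * wu_size_kb * 1000 * 8
    return bits_per_s / 1_000_000


for interval in (2.0, 4.0):
    supply = feeder_supply_mbit_s(SLOTS, interval, WU_SIZE_KB)
    verdict = "can saturate the link" if supply >= LINK_MBIT_S else "feeder is the bottleneck"
    print(f"refill every {interval:.0f} s: about {supply:.0f} Mbit/s, {verdict}")
```

With a 2-second refill this gives about 146 Mbit/s of potential demand, so an unsaturated link plus a "no jobs" message points at the slots not being refilled as often as assumed.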
AndyW
Project Donor
Volunteer tester
Joined: 23 Oct 02
Posts: 5862
Credit: 10,957,677
RAC: 18
United Kingdom
Message 983806 - Posted: 26 Mar 2010, 6:43:34 UTC

Just checked my messages log and have seen this message several times throughout the night, but the last few attempts to collect new work have been successful.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 983818 - Posted: 26 Mar 2010, 7:09:34 UTC - in response to Message 983789.  

Raistmer wrote:
Maybe that 2-second interval was changed recently. Maybe the feeder can't do its job within 2 seconds. But it's apparently the bottleneck here.

Or some other process causing the feeder to slow down.
Whether it's related or not, since the first Upload Incident the assimilators have been struggling to keep up. Before that Incident, once they'd caught up after an outage they had no problem keeping the queue close to nothing, whereas now there's almost always a large backlog of work to assimilate.
It may or may not be related to the difficulties in getting work on the first request.
Grant
Darwin NT
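The assimilator observation is a queueing question: a backlog stays near zero only while results are assimilated at least as fast as they arrive. A minimal fluid-model sketch, with hypothetical rates (the thread gives no real figures), shows how even a small shortfall accumulates into a large queue over a day.

```python
def backlog_after(hours: float, start: float, arrivals_per_h: float, assimilated_per_h: float) -> float:
    """Results waiting for assimilation after `hours`, in a simple fluid model.

    The backlog changes at a constant net rate (arrivals minus assimilation)
    and cannot drop below zero.
    """
    return float(max(0.0, start + (arrivals_per_h - assimilated_per_h) * hours))


# Hypothetical rates: assimilators 5% slower than arrivals vs. slightly faster.
print(backlog_after(24, start=0, arrivals_per_h=20_000, assimilated_per_h=19_000))  # 24000.0
print(backlog_after(24, start=5_000, arrivals_per_h=20_000, assimilated_per_h=21_000))  # 0.0
```

The model ignores bursts after outages; it only shows why "caught up and stays at zero" and "always a large backlog" correspond to the service rate being just above or just below the arrival rate.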
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 983823 - Posted: 26 Mar 2010, 7:15:22 UTC

One thing I have noticed when getting new tasks is that for some time there used to be around 60 or so delivered at a time.
Now they only seem to come down 20 or so at a time at most, when the cruncher's cache is empty.

Dave
kittyman
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 983842 - Posted: 26 Mar 2010, 7:50:06 UTC - in response to Message 983823.  

FiveHamlet wrote:
One thing I have noticed when getting new tasks is that for some time there used to be around 60 or so delivered at a time.
Now they only seem to come down 20 or so at a time at most, when the cruncher's cache is empty.

It always used to be 20 at a time, I think...
It was only in the recent past that work requests would generate large numbers of downloads in a single request, if I am not mistaken.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 983858 - Posted: 26 Mar 2010, 8:56:45 UTC - in response to Message 983823.  
Last modified: 26 Mar 2010, 8:57:20 UTC

FiveHamlet wrote:
One thing I have noticed when getting new tasks is that for some time there used to be around 60 or so delivered at a time.
Now they only seem to come down 20 or so at a time at most, when the cruncher's cache is empty.

Yes, same here. When I had to fill my cache manually (pressing the update button over and over because of "no work available" messages all the time), the biggest batch I got was 29. Once it was 27; mostly around 20 or fewer. Either I was very unlucky and always hit the end of those "100 tasks" (although requests were issued almost constantly), or there were no longer "100" tasks in memory, but far fewer.
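The two explanations above can be compared with a small model of one feeder cycle. This is a hypothetical sketch, not the real scheduler: it assumes requests are served in arrival order and each takes as many tasks as it asks for until the in-memory supply runs out. With all 100 slots filled, an early request can get a large batch; if only about 25 tasks reach memory, no request can get more than 25, however it is timed.

```python
from typing import List


def batches_in_one_cycle(tasks_in_memory: int, requested: List[int]) -> List[int]:
    """Tasks granted to each request during a single feeder cycle.

    tasks_in_memory: tasks available right after the refill.
    requested: how many tasks each host asks for, in arrival order.
    A grant of 0 corresponds to a "no work available" reply.
    """
    remaining = tasks_in_memory
    granted = []
    for wanted in requested:
        got = min(wanted, remaining)
        granted.append(got)
        remaining -= got
    return granted


print(batches_in_one_cycle(100, [60, 60, 60]))  # [60, 40, 0]
print(batches_in_one_cycle(25, [60, 60, 60]))   # [25, 0, 0]
```

Under this model, a ceiling of about 29 observed across many constantly repeated requests fits a short supply in memory better than bad timing, since bad timing alone would occasionally let an early request through with a large batch.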
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 983901 - Posted: 26 Mar 2010, 12:27:53 UTC - in response to Message 983842.  

kittyman wrote:
It always used to be 20 at a time, I think...
It was only in the recent past that work requests would generate large numbers of downloads in a single request, if I am not mistaken.

Not sure when it changed. However, when I started running S@H again last June or July, I would get 40-60 tasks at a time on some machines.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Lint trap
Joined: 30 May 03
Posts: 871
Credit: 28,092,319
RAC: 0
United States
Message 983976 - Posted: 26 Mar 2010, 15:50:19 UTC - in response to Message 983858.  

Raistmer wrote:
Yes, same here. When I had to fill my cache manually (pressing the update button over and over because of "no work available" messages all the time), the biggest batch I got was 29. Once it was 27; mostly around 20 or fewer.
...


Looking at my work fetch results, it does look like something has changed recently. The chart only includes requests that got 10 or more downloads.

[Chart not available: work fetch results, requests with 10 or more downloads only.]

Martin


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.