Panic Mode On (16) Server problems

1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 901855 - Posted: 30 May 2009, 23:53:21 UTC - in response to Message 901847.  


Even though there's plenty of MB work ready to send, all I'm getting are
31/05/2009 8:22:51 SETI@home Message from server: (Project has no jobs available)
messages.

Matt made a comment the other day that was kind-of interesting.

The scheduler has a small queue of 100 work units that are available for assignment. Apparently, space is reserved for each application, so if there are 40 slots reserved for AP (and I don't know, it could be divided equally) then the queue is only 60 Multibeam.

... and if the queue can't be replenished faster than the scheduler assigns work, then we get where we are today: lots of "no jobs available" messages because the feeder can't quite keep up.
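To make the feeder/scheduler race concrete, here is a rough Python sketch. It assumes the 100-slot queue is split 40 AP / 60 MB (the 40 is only the guess from the paragraph above), and the refill and request rates are made-up numbers, not measurements from the project.

```python
def simulate(slots, refill_per_tick, requests_per_tick, ticks):
    """Count requests turned away when a fixed-size queue drains faster than it refills."""
    available = slots  # start with a full queue
    served = refused = 0
    for _ in range(ticks):
        # Scheduler hands out work first; anything beyond what is queued is refused.
        given = min(available, requests_per_tick)
        served += given
        refused += requests_per_tick - given
        available -= given
        # Feeder tops the queue back up, but never past its slot count.
        available = min(slots, available + refill_per_tick)
    return served, refused


TOTAL_SLOTS = 100
AP_RESERVED = 40                       # hypothetical split, see above
MB_SLOTS = TOTAL_SLOTS - AP_RESERVED   # 60 slots left for Multibeam

# Hypothetical load: 70 requests per tick against a feeder that adds 50 per tick.
mb_served, mb_refused = simulate(MB_SLOTS, refill_per_tick=50,
                                 requests_per_tick=70, ticks=10)
print(mb_served, mb_refused)
```

Every refused request in this toy model is one "no jobs available" message, even though there may be plenty of work sitting in the database behind the feeder.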

Whatever..........the kitties seem to have the caches filled up.............

Which is ultimately the point. It's nice when your cache is full, but the only time you should worry (if ever) is when it is empty.

If the queue is low, BOINC will get work eventually.

Waiting is.
ID: 901855 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 901856 - Posted: 30 May 2009, 23:57:12 UTC - in response to Message 901852.  


Even though there's plenty of MB work ready to send, all I'm getting are
31/05/2009 8:22:51 SETI@home Message from server: (Project has no jobs available)
messages.

Matt made a comment the other day that was kind-of interesting.

The scheduler has a small queue of 100 work units that are available for assignment. Apparently, space is reserved for each application, so if there are 40 slots reserved for AP (and I don't know, it could be divided equally) then the queue is only 60 Multibeam.

... and if the queue can't be replenished faster than the scheduler assigns work, then we get where we are today: lots of "no jobs available" messages because the feeder can't quite keep up.

Indeed. And so far as we can tell, it is "equally" (BOINC server default), despite Josef's sterling efforts to tell the staff:

a) this isn't optimal
b) it is configurable

There are glimmers of hope that the penny may be beginning to drop in the 2nd paragraph of message 900643. But how long it will be before that understanding is translated into config files, history alone will tell.

I often find myself asking "are words like fault applicable in this case?"

One side effect (intended or otherwise) is to smooth out downloads. If work units were assigned twice as quickly, you'd see twice as many simultaneous connections to the download server, and bandwidth use would have more peaks (which tend to exceed bandwidth anyway).

I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc.
ID: 901856 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 901858 - Posted: 31 May 2009, 0:05:58 UTC - in response to Message 901856.  

I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc.

The joys of optimising a system. Remove one bottleneck to find or create another. Rinse & repeat till the system is about right, or it's reached its breaking point.
Then upgrade the system & start all over again, with different bottlenecks & solutions.
Grant
Darwin NT
ID: 901858 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14673
Credit: 200,643,578
RAC: 874
United Kingdom
Message 901864 - Posted: 31 May 2009, 0:15:13 UTC - in response to Message 901856.  
Last modified: 31 May 2009, 0:25:10 UTC


Even though there's plenty of MB work ready to send, all I'm getting are
31/05/2009 8:22:51 SETI@home Message from server: (Project has no jobs available)
messages.

Matt made a comment the other day that was kind-of interesting.

The scheduler has a small queue of 100 work units that are available for assignment. Apparently, space is reserved for each application, so if there are 40 slots reserved for AP (and I don't know, it could be divided equally) then the queue is only 60 Multibeam.

... and if the queue can't be replenished faster than the scheduler assigns work, then we get where we are today: lots of "no jobs available" messages because the feeder can't quite keep up.

Indeed. And so far as we can tell, it is "equally" (BOINC server default), despite Josef's sterling efforts to tell the staff:

a) this isn't optimal
b) it is configurable

There are glimmers of hope that the penny may be beginning to drop in the 2nd paragraph of message 900643. But how long it will be before that understanding is translated into config files, history alone will tell.

I often find myself asking "are words like fault applicable in this case?"

One side effect (intended or otherwise) is to smooth out downloads. If work units were assigned twice as quickly, you'd see twice as many simultaneous connections to the download server, and bandwidth use would have more peaks (which tend to exceed bandwidth anyway).

I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc.

Who said anything about 'fault'? Searching this thread only throws it up as a component of 'default'. But you may be trying to steer us on to another train of thought.

One of the problems I perceive with the whole BOINC/project setup is that it is huge and complex. Like the committee of blind people trying to describe an elephant, each participant tends to see only the part nearest to them.

Matt Lebofsky is brilliant at managing the hotch-potch of servers, operating systems, databases and network appliances that inhabit his closet. But he doesn't actually understand BOINC very well (no criticism: just acknowledgement of his speciality). In this case, don't forget that a balance-swing towards MB would result in shorter, though admittedly more numerous, downloads. I think that's well within Matt's skillset to manage.

Edit - after all these years, I still can't spell Matt's surname properly. And I know what it feels like.
ID: 901864 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 901867 - Posted: 31 May 2009, 0:22:34 UTC

Don't make the kitties mad................
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 901867 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66202
Credit: 55,293,173
RAC: 49
United States
Message 901868 - Posted: 31 May 2009, 0:32:41 UTC - in response to Message 901867.  

Don't make the kitties mad................

That's not possible Mark, Yer not a Joker, As Mad is not the right word, Angry is. Mad means Crazy and so many people always use the word Mad to say that they're Angry when to those Who know the real meaning It says Crazy. And I don't think Yer Crazy. Someone else that I know of, Maybe, Maybe not.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 901868 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 901879 - Posted: 31 May 2009, 1:36:52 UTC - in response to Message 901868.  
Last modified: 31 May 2009, 1:46:54 UTC

Don't make the kitties mad................

That's not possible Mark, Yer not a Joker, As Mad is not the right word, Angry is. Mad means Crazy and so many people always use the word Mad to say that they're Angry when to those Who know the real meaning It says Crazy. And I don't think Yer Crazy. Someone else that I know of, Maybe, Maybe not.


Not meaning to be pedantic, but mad as in crazy is just as legitimate an English definition as mad as in angry is.

If not, then there's at least two generations of Sesame Streeters who were erroneously told, "...it's not bad to get mad!". :-)

<edit> Back on topic; Yeah, this replica server has been trying hard all day long to go 'castors up' off and on. :-(

Alinator
ID: 901879 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 901887 - Posted: 31 May 2009, 1:58:18 UTC

Some observations on the Feeder weighting:

In addition to Matt's comments, I had an email exchange with Eric; his response was also favorable to shifting to more slots for MB.

The ap_splitter processes very seldom produce more than 1 result per second. The Feeder is supposed to refill the AP_v5 slots in the shared memory for the Scheduler every 2 seconds. Unless there's a queue of "ready to send" AP_v5 results, guess how many slots can possibly be filled.
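A back-of-the-envelope version of that question in Python. The function name is made up for illustration, and the 50 reserved AP slots are an assumption (an even split of 100); the 1 result/sec and 2-second figures are the ones quoted above.

```python
def max_slots_filled(reserved_slots, backlog, results_per_second, refill_interval_s):
    """Upper bound on how many reserved slots the Feeder can fill in one pass."""
    # New results the splitters could have produced since the previous pass.
    fresh = int(results_per_second * refill_interval_s)
    return min(reserved_slots, backlog + fresh)


# No "ready to send" backlog, splitters at ~1 result/sec, one Feeder pass every 2 seconds.
print(max_slots_filled(reserved_slots=50, backlog=0,
                       results_per_second=1, refill_interval_s=2))
```

With no backlog the bound is about 2 slots per pass, so most of the reserved AP_v5 slots would sit empty however many are set aside.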

Ned's thought that getting the number of MB slots back to something approaching the 100 available before AP was released might cause download difficulties is certainly a risk. Yet only delivering about 45 MBits/sec of work when AP_v5 isn't available doesn't make sense to me; the goal ought to be getting the MB side caught up. That shared memory mechanism does have possibilities as a limiter to keep downloads from saturating, but I'd rather see it implemented as a deliberate throttle a project could use when needed.

Having over 90% MB should mean that hosts with default preferences get mostly MB work, leaving more of the AP_v5 work available for those who really want it. I don't think turtles eat cat food unless they're really hungry...

When 5.05+ comes out of Beta and the optimized AP advantage shrinks, I expect those who focus on credits to be unhappy. Once the server-side credit multiplier adapts, I think credit rates for optimized MB and AP may be about equal.
                                                                  Joe
ID: 901887 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 901892 - Posted: 31 May 2009, 2:16:17 UTC - in response to Message 901858.  

I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc.

The joys of optimising a system. Remove one bottleneck to find or create another. Rinse & repeat till the system is about right, or it's reached its breaking point.
Then upgrade the system & start all over again, with different bottlenecks & solutions.

I've been working on a list server that is so incredibly efficient at generating outbound mail that it efficiently flooded the underlying SMTP server.

Slowing down the list server actually made messages go out faster.
ID: 901892 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 901893 - Posted: 31 May 2009, 2:23:12 UTC - in response to Message 901887.  



Having over 90% MB should mean that hosts with default preferences get mostly MB work, leaving more of the AP_v5 work available for those who really want it. I don't think turtles eat cat food unless they're really hungry...

When 5.05+ comes out of Beta and the optimized AP advantage shrinks, I expect those who focus on credits to be unhappy. Once the server-side credit multiplier adapts, I think credit rates for optimized MB and AP may be about equal.
                                                                  Joe


Do not make fun of the kitties.........

Whoever the 'turtles' are.........they eat
'cat food'......whenever and wherever they want to.......SIR.......meeeeeeeeowwwwwwwwwwwwwwwwwwwwwwwwwwww.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 901893 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 901895 - Posted: 31 May 2009, 2:29:10 UTC - in response to Message 901864.  
Last modified: 31 May 2009, 2:30:18 UTC


I often find myself asking "are words like fault applicable in this case?"

One side effect (intended or otherwise) is to smooth out downloads. If work units were assigned twice as quickly, you'd see twice as many simultaneous connections to the download server, and bandwidth use would have more peaks (which tend to exceed bandwidth anyway).

I suspect that too much tuning here could make things worse. The inefficiencies here just allow more time for uploads/downloads/assimilation/etc.

Who said anything about 'fault'? Searching this thread only throws it up as a component of 'default'. But you may be trying to steer us on to another train of thought.

One of my recurring themes is that we have to look at this whole thing as one system -- starting at the recorder on the telescope, and ending in the science database.

The system includes the BOINC software running on each volunteer's PC.

The goal should be to maximize throughput from end to end.

My (sadly, long) experience with that is that when you have a chain A->B->C->D and you make "B" truly optimal, it tends to suck "A" dry, and flood "C" -- neither of which is optimal when you look at the "ABCD" system and not at each process.

I've had thoughts like "why only 100 slots" in the scheduler ready queue, followed by "why not have one queue per application" or "why not just make the queue bigger."

.... and then I realize that would mean the scheduler would never turn a system away hungry, and all of those hosts will instantly want to download the assigned files, and now we have a huge spike in bandwidth.

The BOINC servers tell the BOINC client "wait 11 seconds before calling again" and if that was 11 minutes, it would slow down all of the clients hungrily demanding work. Slow down the connection rate and you reduce the load on the scheduler. Requests complete faster because there is more bandwidth per connection.
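As a rough illustration of that point in Python: the 11-second and 11-minute delays are the ones mentioned above, but the number of hungry clients is a made-up figure chosen only to keep the arithmetic round.

```python
def requests_per_second(hungry_clients, retry_delay_s):
    """Average scheduler request rate if every hungry client retries once per delay."""
    return hungry_clients / retry_delay_s


HUNGRY_CLIENTS = 66_000  # hypothetical, not a project statistic

print(requests_per_second(HUNGRY_CLIENTS, 11))        # retry every 11 seconds
print(requests_per_second(HUNGRY_CLIENTS, 11 * 60))   # retry every 11 minutes
```

Stretching the delay by a factor of 60 cuts the average request rate by the same factor, which is the whole argument: the same clients, far fewer simultaneous connections.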

Slowing work assignment means fewer failed attempts.

... and as long as our clients are not empty, there will be no wasted CPU cycles.

I know it is a paradox, but slowing down would very likely make things a lot faster.
ID: 901895 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 901898 - Posted: 31 May 2009, 2:42:53 UTC - in response to Message 901887.  


Ned's thought that getting the number of MB slots back to something approaching the 100 available before AP was released might cause download difficulties is certainly a risk. Yet only delivering about 45 MBits/sec of work when AP_v5 isn't available doesn't make sense to me; the goal ought to be getting the MB side caught up. That shared memory mechanism does have possibilities as a limiter to keep downloads from saturating, but I'd rather see it implemented as a deliberate throttle a project could use when needed.

It seems to me that the only reason to reserve slots in the queue is to make sure that those who say "only multibeam" will have a good chance of getting only multibeam and those who say "only astropulse" will have a good chance of getting only astropulse.

For that to work, you have to always have a little bit of each available.

The problem (like in so many other things) comes when predicting the future.

I've read elsewhere that right now the Astropulse splitters have processed all of the waiting tapes, and there is nothing more to split. If there are 50 Astropulse "slots" and 0 work, then there are always 50 empty slots.

What I'd probably code: try to fill 50 slots with AP, try to fill 50 slots with MB, then fill whatever is still open with whatever you have.
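Roughly, in Python. This is only a sketch of the idea, not how the BOINC feeder is actually written; the function name is invented, and giving leftover slots to MB before AP is an arbitrary choice.

```python
def fill_slots(ap_ready, mb_ready, total_slots=100, ap_share=50):
    """Reserve a share per application, then hand leftover slots to whatever work exists."""
    mb_share = total_slots - ap_share
    ap = min(ap_ready, ap_share)   # first pass: AP up to its share
    mb = min(mb_ready, mb_share)   # first pass: MB up to its share
    free = total_slots - ap - mb   # slots still open after both passes

    # Second pass: give open slots to any remaining work, MB first (arbitrary order).
    extra_mb = min(mb_ready - mb, free)
    mb += extra_mb
    free -= extra_mb
    ap += min(ap_ready - ap, free)
    return {"AP": ap, "MB": mb}


# No Astropulse work at all: every slot ends up holding Multibeam.
print(fill_slots(ap_ready=0, mb_ready=1000))
```

When both applications have plenty of work the split stays 50/50, so the "only multibeam" and "only astropulse" hosts still find something waiting for them.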

Plan "B" would be to keep count of what is being sent (AP or MB) and try to follow the balance -- if you ran out of AP you'd be sending 100% MB and all slots would be MB -- maybe giving the "minority" project a few more slots than strictly needed so it would tend to increase when that work was available.
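Plan "B" could look something like this in Python. Again a sketch under assumptions: the function name and the `min_slots` floor (standing in for "a few more slots than strictly needed") are invented, and the value 5 is arbitrary.

```python
def slots_from_history(sent_ap, sent_mb, total_slots=100, min_slots=5):
    """Split slots in proportion to recently sent work, with a floor per application."""
    sent_total = sent_ap + sent_mb
    if sent_total == 0:
        ap = total_slots // 2  # no history yet: fall back to an even split
    else:
        ap = round(total_slots * sent_ap / sent_total)
    # Keep a few slots for each application so the minority one can recover
    # as soon as its work becomes available again.
    ap = max(min_slots, min(total_slots - min_slots, ap))
    return {"AP": ap, "MB": total_slots - ap}


# AP has run dry, so almost everything goes to MB, with a small AP reserve left open.
print(slots_from_history(sent_ap=0, sent_mb=1000))
```

The floor is what lets the balance drift back: once AP work reappears, those few slots start sending it, the sent counts shift, and the AP share grows on later passes.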

I'm sure there are other factors we haven't thought about (yet).
ID: 901898 · Report as offensive
Profile Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 901900 - Posted: 31 May 2009, 2:46:10 UTC - in response to Message 901895.  
Last modified: 31 May 2009, 2:47:34 UTC


One of my recurring themes is that we have to look at this whole thing as one system -- starting at the recorder on the telescope, and ending in the science database.

The system includes the BOINC software running on each volunteer's PC.

The goal should be to maximize throughput from end to end.

My (sadly, long) experience with that is that when you have a chain A->B->C->D and you make "B" truly optimal, it tends to suck "A" dry, and flood "C" -- neither of which is optimal when you look at the "ABCD" system and not at each process.

I've had thoughts like "why only 100 slots" in the scheduler ready queue, followed by "why not have one queue per application" or "why not just make the queue bigger."

.... and then I realize that would mean the scheduler would never turn a system away hungry, and all of those hosts will instantly want to download the assigned files, and now we have a huge spike in bandwidth.

The BOINC servers tell the BOINC client "wait 11 seconds before calling again" and if that was 11 minutes, it would slow down all of the clients hungrily demanding work. Slow down the connection rate and you reduce the load on the scheduler. Requests complete faster because there is more bandwidth per connection.

Slowing work assignment means fewer failed attempts.

... and as long as our clients are not empty, there will be no wasted CPU cycles.

I know it is a paradox, but slowing down would very likely make things a lot faster.

Seems all very logical. I have a similar problem with my Tesla card. If 2 threads are fed to it with "high" priority, it will effectively drown out the other system processes, causing a decrease in total throughput.
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 901900 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 901901 - Posted: 31 May 2009, 2:51:56 UTC

When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 901901 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 901902 - Posted: 31 May 2009, 3:05:15 UTC - in response to Message 901901.  

When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............

Mark,

I know the kitties expect the bowl to be kept completely full at all times, but as long as the bowl is not empty, they'll be able to eat.

It's only a problem when the kitties start to starve.

Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible.

-- Ned
ID: 901902 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 901903 - Posted: 31 May 2009, 3:10:27 UTC - in response to Message 901902.  

When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............

Mark,

I know the kitties expect the bowl to be kept completely full at all times, but as long as the bowl is not empty, they'll be able to eat.

It's only a problem when the kitties start to starve.

Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible.

-- Ned

Ned......don't try to coddle me..........
I only look after my kitties' best interests.........not yours or anybody else's.

Got it???
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 901903 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 901904 - Posted: 31 May 2009, 3:12:42 UTC - in response to Message 901903.  
Last modified: 31 May 2009, 3:14:46 UTC

When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............

Mark,

I know the kitties expect the bowl to be kept completely full at all times, but as long as the bowl is not empty, they'll be able to eat.

It's only a problem when the kitties start to starve.

Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible.

-- Ned

Ned......don't try to coddle me..........
I only look after my kitties' best interests.........not yours or anybody else's.

Got it???

Sorry Ned.......that was rather rude.
But to the point.......I did not mean to be so brash.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 901904 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 901913 - Posted: 31 May 2009, 3:32:52 UTC - in response to Message 901904.  
Last modified: 31 May 2009, 3:33:12 UTC

When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............

Mark,

I know the kitties expect the bowl to be kept completely full at all times, but as long as the bowl is not empty, they'll be able to eat.

It's only a problem when the kitties start to starve.

Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible.

-- Ned

Ned......don't try to coddle me..........
I only look after my kitties' best interests.........not yours or anybody else's.

Got it???

Sorry Ned.......that was rather rude.
But to the point.......I did not mean to be so brash.

The project should feed all the kitties, not just the ones who yowl at the bowl. They can't cater to some "fat cats" and let the rest starve.

Got it?
ID: 901913 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66202
Credit: 55,293,173
RAC: 49
United States
Message 901926 - Posted: 31 May 2009, 4:21:16 UTC - in response to Message 901879.  

Don't make the kitties mad................

That's not possible Mark, Yer not a Joker, As Mad is not the right word, Angry is. Mad means Crazy and so many people always use the word Mad to say that they're Angry when to those Who know the real meaning It says Crazy. And I don't think Yer Crazy. Someone else that I know of, Maybe, Maybe not.


Not meaning to be pedantic, but mad as in crazy is just as legitimate an English definition as mad as in angry is.

If not, then there's at least two generations of Sesame Streeters who were erroneously told, "...it's not bad to get mad!". :-)

<edit> Back on topic; Yeah, this replica server has been trying hard all day long to go 'castors up' off and on. :-(

Alinator

Are You sure the Server didn't get bit? Maybe the server needs a Rabies shot. ;)
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 901926 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66202
Credit: 55,293,173
RAC: 49
United States
Message 901927 - Posted: 31 May 2009, 4:22:28 UTC - in response to Message 901913.  

When all else fails...........and the kitties cannot get what they want for their kibble bowl............you will have lost one cruncher............

Mark,

I know the kitties expect the bowl to be kept completely full at all times, but as long as the bowl is not empty, they'll be able to eat.

It's only a problem when the kitties start to starve.

Also, please realize that a big part of this discussion is how to best keep food in the bowl -- and in every other bowl -- as much as possible.

-- Ned

Ned......don't try to coddle me..........
I only look after my kitties' best interests.........not yours or anybody else's.

Got it???

Sorry Ned.......that was rather rude.
But to the point.......I did not mean to be so brash.

The project should feed all the kitties, not just the ones who yowl at the bowl. They can't cater to some "fat cats" and let the rest starve.

Got it?

Nope, Equal rations to all.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 901927 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.