quick server side update

Message boards : Technical News : quick server side update

Jeff Cobb Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 122
Credit: 40,367
RAC: 0
United States
Message 1014300 - Posted: 10 Jul 2010, 15:01:29 UTC

Things are looking OK from this end. When the project was brought online yesterday, none of the public-facing servers were dropping TCP connections except the upload server, and TCP drops on the upload server went to zero within about three hours.

The BOINC database is keeping up. It was doing ~1000 queries per second most of the day yesterday; it's down to about half that now. Hiding those two threads (job limits and outage schedule) really helped. I'm not sure why those queries were hanging around so much. The number of posts? Waves of popularity?

The assimilators suddenly decided to start crashing on vader: a general protection exception in libc. I need to track that down. In the meantime, I moved the assimilators to bambi, where they appear to run fine, except for the known, occasional memory leak, which can (and did) bring a machine to its knees. Another thing to track down. In the meantime (there are too many "meantimes"), I put an assimilator restarter in place on bambi. This method has been working well on vader. The assimilator queue is now draining.
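A minimal sketch of what a restarter like this might look like, assuming the assimilator can be launched as an ordinary command that exits with a nonzero status when it crashes. The function `run_with_restarts` and its parameters are hypothetical, not the project's actual script; the optional `max_runtime_s` recycles a process that has run too long, which is one way to contain a slow memory leak.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, max_restarts=5, delay_s=1.0, max_runtime_s=None):
    """Run cmd, relaunching it whenever it crashes or exceeds max_runtime_s.

    Returns the total number of launches. A clean exit (status 0) stops the
    loop; any other outcome triggers a restart until max_restarts is used up.
    """
    launches = 0
    while True:
        launches += 1
        try:
            returncode = subprocess.run(cmd, timeout=max_runtime_s).returncode
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising, so a process
            # that ran too long is already gone; treat it like a crash.
            returncode = None
        if returncode == 0:
            return launches
        if launches > max_restarts:
            return launches  # give up and leave it for a human to look at
        time.sleep(delay_s)


if __name__ == "__main__":
    # Demo: a child that always fails is launched once, restarted twice, then abandoned.
    always_crash = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_restarts(always_crash, max_restarts=2, delay_s=0))
```

Capping the number of restarts keeps a child that dies immediately on every launch from spinning forever; a real deployment would also log each exit so the crash can be tracked down.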

Others have reported it, but I will report it again here. The job limits we started and ran with yesterday were:

CPU 5 per processor
GPU 40 per processor
total (global) limit : 140

About an hour ago, I raised them just a bit, to 6, 48 and 150. I will remove all limits on Monday.
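To see how the three numbers interact, including why a host whose CPU cache already exceeds the global limit gets nothing at all, not even GPU work, here is a simplified model. It is a sketch under assumptions, not the BOINC scheduler's code: limits count tasks in progress, the per-CPU and per-GPU limits scale with the number of devices, the global limit caps the host's total, and CPU room is granted before GPU room. `Limits` and `sendable` are hypothetical names, and the example host is made up.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    per_cpu: int     # tasks in progress allowed per CPU core
    per_gpu: int     # tasks in progress allowed per GPU
    global_cap: int  # backstop on the host's total tasks in progress


def sendable(limits, n_cpus, n_gpus, cpu_in_progress, gpu_in_progress):
    """Return (cpu_tasks, gpu_tasks) the host could still be sent."""
    total_room = max(0, limits.global_cap - cpu_in_progress - gpu_in_progress)
    cpu_room = max(0, limits.per_cpu * n_cpus - cpu_in_progress)
    gpu_room = max(0, limits.per_gpu * n_gpus - gpu_in_progress)
    # Both device types draw on the same global room; CPU is served first here.
    cpu_room = min(cpu_room, total_room)
    gpu_room = min(gpu_room, total_room - cpu_room)
    return cpu_room, gpu_room


if __name__ == "__main__":
    # Hypothetical host: 4 cores, 2 GPUs, 200 CPU tasks cached before the limits.
    print(sendable(Limits(6, 48, 150), 4, 2, 200, 0))   # (0, 0): GPUs starve
    print(sendable(Limits(6, 48, 1000), 4, 2, 200, 0))  # (0, 96): GPUs fed again
```

In this model the per-device limits alone would never block the GPUs; only the shared global cap does, which is why raising it (rather than the per-device numbers) is what frees up GPU work for such a host.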

We will go for a better mix of files (and angle ranges) going into next week's server run.


ID: 1014300
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1014306 - Posted: 10 Jul 2010, 15:15:35 UTC - in response to Message 1014300.  
Last modified: 10 Jul 2010, 15:16:10 UTC



Others have reported it, but I will report it again here. The job limits we started and ran with yesterday were:

CPU 5 per processor
GPU 40 per processor
total (global) limit : 140

About an hour ago, I raised them just a bit, to 6, 48 and 150. I will remove all limits on Monday.

We will go for a better mix of files (and angle ranges) going into next week's server run.

Jeff.....
Could you consider removing the global limit and just work with the per device limits?
I have 2 rigs with more CPU work cached than the global limit allows, so they can't get any work for their GPUs, which are now just idling and consuming power.
Others have reported the same problem.
With the per-device limits in place, I fail to see why the global limit is necessary.

Thanks,
Mark
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1014306
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1014307 - Posted: 10 Jul 2010, 15:25:49 UTC - in response to Message 1014300.  

Thanks for the update, Jeff. Things are working fine for me. VLARs are the biggest problem, but this batch of work doesn't seem to have many, so I don't have to move much around. I am still trying to work down the VLARs from last week, and they are all I have on my CPUs. Even so, I still have plenty of room for my GPU to get its fill.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1014307
Jeff Cobb Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 122
Credit: 40,367
RAC: 0
United States
Message 1014309 - Posted: 10 Jul 2010, 15:38:32 UTC - in response to Message 1014306.  



Jeff.....
Could you consider removing the global limit and just work with the per device limits?
I have 2 rigs with more CPU work cached than the global limit allows, so they can't get any work for their GPUs, which are now just idling and consuming power.
Others have reported the same problem.
With the per-device limits in place, I fail to see why the global limit is necessary.

Thanks,
Mark


The global limit is just a backstop in case the per proc limits somehow go awry. We want the non-crunchers to get a WU in edgewise. But the global limit probably does not have to be so low. I just raised it to 1000. Let's see what the cricket graph shows as a result of that.

I think the cricket shape we are trying for is that of a bathtub: high during the Friday opening and the Monday queue filling, but less than max on the weekend. How deep the tub should be is where the tuning comes in.
ID: 1014309
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1014315 - Posted: 10 Jul 2010, 15:46:38 UTC - in response to Message 1014309.  



The global limit is just a backstop in case the per proc limits somehow go awry. We want the non-crunchers to get a WU in edgewise. But the global limit probably does not have to be so low. I just raised it to 1000. Let's see what the cricket graph shows as a result of that.

I think the cricket shape we are trying for is that of a bathtub: high during the Friday opening and the Monday queue filling, but less than max on the weekend. How deep the tub should be is where the tuning comes in.

Thank you so much Jeff!!!
That did the trick. Both GPUs have now received some WU allocations and are happily back at work.

Thanks again!
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1014315
Andre Howard
Volunteer tester
Joined: 16 May 99
Posts: 124
Credit: 217,463,217
RAC: 0
United States
Message 1014323 - Posted: 10 Jul 2010, 15:52:06 UTC

Thanks, Jeff. GPUs finally have some work.

ID: 1014323
Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1014360 - Posted: 10 Jul 2010, 16:38:17 UTC

Jeff.........

Seems to me that downloads are going well. I hope you loosen the restrictions for CPUs and GPUs again later today and again tomorrow, so that some caches are partially or fully filled before you release all restrictions on Monday.

Having everyone try to fill their caches on Monday just doesn't work, because of the server load.

Boinc....Boinc....Boinc....Boinc....
ID: 1014360
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1014367 - Posted: 10 Jul 2010, 16:59:14 UTC - in response to Message 1014360.  

Jeff.........

Seems to me that downloads are going well. I hope you loosen the restrictions for CPUs and GPUs again later today and again tomorrow, so that some caches are partially or fully filled before you release all restrictions on Monday.

Having everyone try to fill their caches on Monday just doesn't work, because of the server load.

In my case, the load caused my E8500 PC to be reported as detached, losing what cache I had, and when I then tried for fresh work, I ended up with about 140 ghost WUs.
I've just got a couple more WUs to do, then I can excise them.

Claggy
ID: 1014367
Nicolas
Joined: 30 Mar 05
Posts: 161
Credit: 12,985
RAC: 0
Argentina
Message 1014396 - Posted: 10 Jul 2010, 18:30:16 UTC - in response to Message 1014300.  

The BOINC database is keeping up. It was doing ~1000 queries per second most of the day yesterday; it's down to about half that now. Hiding those two threads (job limits and outage schedule) really helped. I'm not sure why those queries were hanging around so much. The number of posts? Waves of popularity?

How many posts did the threads have? What query was hanging?

Contribute to the Wiki!
ID: 1014396
ront
Joined: 25 Aug 01
Posts: 77
Credit: 386,336
RAC: 0
United States
Message 1014428 - Posted: 10 Jul 2010, 19:35:17 UTC

Good Afternoon,

Can anyone tell me how to get more than 2 WUs at a time? I have my "dingus" set to "10 days" and I still get only 2. If it is not an "AP" (Astropulse), then I am out of business by the beginning of the second workday.

Your counsel would be most appreciated.

ront
ID: 1014428
AllenIN
Volunteer tester
Joined: 5 Dec 00
Posts: 292
Credit: 58,297,005
RAC: 311
United States
Message 1014430 - Posted: 10 Jul 2010, 19:44:27 UTC - in response to Message 1014309.  

Jeff,

I don't know much about the configuration of the SETI db, but right now I only get 10 WUs at a time, and when the servers shut down, that only leaves me a day and a half of work for my dual processor. I have to sit doing nothing for the other day and a half. Is there anything I can do about that?

Allen

ID: 1014430
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 1014440 - Posted: 10 Jul 2010, 20:07:33 UTC - in response to Message 1014428.  

Good Afternoon,

Can anyone tell me how to get more than 2 WUs at a time? I have my "dingus" set to "10 days" and I still get only 2. If it is not an "AP" (Astropulse), then I am out of business by the beginning of the second workday.

Your counsel would be most appreciated.

ront

Hi Ront,

As Jeff posted in the first message, SETI@home currently has work unit limits:

CPU 5 per processor
GPU 40 per processor
total (global) limit : 140

About an hour ago, I raised them just a bit, to 6, 48 and 150. I will remove all limits on Monday.

We will go for a better mix of files (and angle ranges) going into next week's server run.


From Jeff's post, which I quoted, this is why you're only getting a few WUs. On a single-core computer you can have up to 5 for your CPU, on a dual core 10, and so on. If you run a GPU, you can have up to 40 per GPU. This is to keep the servers from being slammed and bottlenecked. Hope that helps a little; someone else will probably come along and explain it further.
ID: 1014440
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1014444 - Posted: 10 Jul 2010, 20:22:55 UTC - in response to Message 1014440.  

Jeff posted over in the Number Crunching forum that he raised the global limit to 1000 a few hours ago, and I believe he raised the CPU limit to 8 per core and the GPU limit to 50 per GPU.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1014444
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1014451 - Posted: 10 Jul 2010, 20:42:14 UTC - in response to Message 1014444.  
Last modified: 10 Jul 2010, 20:45:05 UTC

The server stats being down makes it difficult to judge things, but the network traffic graphs show 2 very solid periods of traffic, and since then it's been very quiet (apart from a couple of very small bursts).

I was thinking that next week it would be worth doing away with the overall limit from the beginning, keeping the CPU limit at 5 per processor but dropping the GPU limit to 20.
This way those with multiple GPUs will still be able to get work for all their processors, and those with just a slower CPU will also be able to get work. After the initial burst, raise the CPU limit to 10 and leave the GPUs at 20. After the next burst, raise the CPUs to 20 and the GPUs to 40.
With a bit of luck, this will give 3 periods of maximum bandwidth, each shorter than the previous one, while still keeping all systems busy, even those with multiple GPUs. After the 3rd surge it would probably be OK to do away with the host limits completely, or at the very least triple them (i.e. 60 for the CPU, 120 for the GPU) and do away with the limits after that surge of traffic.
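Grant's staged plan can be written down as a small state machine. This is a sketch under assumptions: the thresholds `busy_mbps` and `quiet_mbps` are hypothetical stand-ins for eyeballing the network traffic graph, a stage only advances after a surge has been seen and has then died down, and the fourth stage uses the "at the very least triple" option before all limits are dropped. As proposed, no stage has a global limit.

```python
# (per-CPU limit, per-GPU limit) for each stage; None means "no limit".
STAGES = [
    (5, 20),
    (10, 20),
    (20, 40),
    (60, 120),
    (None, None),
]


def ramp(readings_mbps, busy_mbps, quiet_mbps):
    """Yield the active stage index after each traffic sample.

    A stage advances only once the link has been busy (a surge happened)
    and has since dropped below quiet_mbps, so several quiet samples in a
    row right after a bump cannot skip stages.
    """
    stage, surged = 0, False
    for mbps in readings_mbps:
        if mbps >= busy_mbps:
            surged = True
        elif mbps < quiet_mbps and surged and stage < len(STAGES) - 1:
            stage, surged = stage + 1, False
        yield stage


if __name__ == "__main__":
    samples = [95, 20, 20, 95, 30, 94, 25, 96, 10]  # made-up Mbit/s readings
    for mbps, stage in zip(samples, ramp(samples, busy_mbps=80, quiet_mbps=50)):
        print(mbps, STAGES[stage])
```

Requiring a surge before each step is the design choice that matters here: right after limits are raised there is a lag before hosts contact the scheduler, and without the `surged` flag a quiet sample in that lag would skip straight to the next stage.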
Grant
Darwin NT
ID: 1014451
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1014453 - Posted: 10 Jul 2010, 20:46:50 UTC - in response to Message 1014451.  

I hate to agree with Grant, but I will.
Janice
ID: 1014453
JohnDK Crowdfunding Project Donor, Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1014476 - Posted: 10 Jul 2010, 21:36:24 UTC

Can these limit changes be done remotely?
ID: 1014476
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1014486 - Posted: 10 Jul 2010, 22:40:12 UTC

Network traffic seems pretty tame again...
Time for another bump in the limits?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1014486
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 1014487 - Posted: 10 Jul 2010, 22:43:43 UTC - in response to Message 1014486.  

Network traffic seems pretty tame again...
Time for another bump in the limits?


Thinking the same thing myself; it's been over 15 hours now with very little traffic.
I was thinking of removing the overall limit, doubling the existing CPU & GPU limits, and seeing how long the resulting surge lasts.
Grant
Darwin NT
ID: 1014487
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1014491 - Posted: 10 Jul 2010, 22:57:44 UTC - in response to Message 1014487.  

Thinking the same thing myself; it's been over 15 hours now with very little traffic.
I was thinking of removing the overall limit, doubling the existing CPU & GPU limits, and seeing how long the resulting surge lasts.

Yup...seems a shame to waste the bandwidth now and have a 24-hour cram session come Monday. And what if the servers crash then?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1014491
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1014532 - Posted: 11 Jul 2010, 2:16:15 UTC - in response to Message 1014430.  
Last modified: 11 Jul 2010, 2:19:03 UTC

Jeff,

I don't know much about the configuration of the SETI db, but right now I only get 10 WUs at a time, and when the servers shut down, that only leaves me a day and a half of work for my dual processor. I have to sit doing nothing for the other day and a half. Is there anything I can do about that?

Allen


Allen:

To help prevent database/server crashes due to overload after the 3-day outage, there are TEMPORARY limits on downloads, based on how many WUs your computers already have in progress. Last time I checked, they were 8 per CPU core, 48 per GPU, and 1000 per host. Since (per the public data on your S@H profile page) you have more than 8 WUs in progress for each of your CPUs, that's all you can get right now. You may get more as you complete WUs, or as Jeff raises the limits.

Jeff has been raising the limits periodically since the servers came up on Friday morning, and he has said he will remove all temporary limits by Monday morning. Then you should be able to load up for the next 3-day outage.
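The arithmetic behind Allen's question can be worked through with his own figures. This is a sketch assuming every task takes about the same time and both cores stay busy: if 10 tasks last about 36 hours, roughly 20 are needed to ride out a 72-hour outage, while a cap of 16 tasks (8 per core on a dual-core machine) covers about 57.6 hours. The helper names are hypothetical.

```python
import math


def tasks_to_cover(outage_h, cached_tasks, cached_cover_h):
    """Tasks needed to stay busy for outage_h hours, given that cached_tasks
    lasted cached_cover_h hours on the same host (equal runtimes assumed)."""
    # Multiply before dividing so whole-number inputs stay exact.
    return math.ceil(outage_h * cached_tasks / cached_cover_h)


def hours_covered(tasks, cached_tasks, cached_cover_h):
    """Hours a cache of `tasks` should last at the same throughput."""
    return tasks * cached_cover_h / cached_tasks


if __name__ == "__main__":
    # Allen's figures: 10 tasks last about 1.5 days (36 h) on a dual-core host.
    print(tasks_to_cover(72, 10, 36))    # 20 tasks for a 3-day outage
    print(hours_covered(2 * 8, 10, 36))  # 57.6 h with a cap of 8 per core
```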

More current info is available in several threads running in the Number Crunching section of the Forum.
ID: 1014532
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.