quick server side update


Message boards : Technical News : quick server side update

Jeff Cobb
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 110
Credit: 40,367
RAC: 0
United States
Message 1014300 - Posted: 10 Jul 2010, 15:01:29 UTC

Things are looking OK from this end. When the project was brought online yesterday, none of the public-facing servers were dropping TCP connections, with the exception of the upload server. TCP drops on the upload server went to zero in about three hours.

The BOINC database is keeping up. It was doing ~1000 queries per second most of the day yesterday. It's down to about half of that now. Hiding those two threads (job limits and outage schedule) really helped. I'm not sure why those queries were hanging around so much. Number of posts? Waves of popularity?

The assimilators suddenly decided to start crashing on vader: a general protection exception in libc. I need to track that down. In the meantime, I moved the assimilators to bambi, where they appear to run fine, except for the known, occasional memory leak, which can, and did, bring a machine to its knees. Another thing to track down. In the meantime (there are too many "meantimes"), I put an assimilator restarter in place on bambi. This method has been working well on vader. The assimilator queue is now draining.
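
For readers wondering what a "restarter" amounts to, here is a minimal sketch of the general idea: a supervisor loop that relaunches a process whenever it dies. This is not the script running on bambi; the function name `run_with_restarts` and its parameters are hypothetical, and the restart cap and delay are assumptions added so the loop cannot spin forever.

```python
import subprocess
import time
from typing import List


def run_with_restarts(cmd: List[str], max_restarts: int = 5,
                      delay_s: float = 1.0) -> int:
    """Run cmd and relaunch it each time it exits with a non-zero status.

    Returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        # subprocess.call blocks until the child exits and returns its status.
        # A crash by signal shows up as a negative (hence non-zero) status.
        status = subprocess.call(cmd)
        if status == 0:
            return restarts      # clean exit: nothing left to supervise
        if restarts >= max_restarts:
            return restarts      # give up instead of looping forever
        restarts += 1
        time.sleep(delay_s)      # brief pause before relaunching
```

A real restarter would likely also log each crash, since a restart loop hides the underlying fault rather than fixing it.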

Others have reported it, but I will report it again here. The job limits we started, and ran with, yesterday were:

CPU 5 per processor
GPU 40 per processor
total (global) limit : 140

About an hour ago, I upped it just a bit to 6, 48, 150. I will remove all limits on Monday.

We will go for a better mix of files (and angle ranges) going into next week's server run.


____________

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38682
Credit: 573,454,634
RAC: 545,575
United States
Message 1014306 - Posted: 10 Jul 2010, 15:15:35 UTC - in response to Message 1014300.
Last modified: 10 Jul 2010, 15:16:10 UTC



Others have reported it, but I will report it again here. The job limits we started, and ran with, yesterday were:

CPU 5 per processor
GPU 40 per processor
total (global) limit : 140

About an hour ago, I upped it just a bit to 6, 48, 150. I will remove all limits on Monday.

We will go for a better mix of files (and angle ranges) going into next week's server run.

Jeff.....
Could you consider removing the global limit and just work with the per device limits?
I have 2 rigs that have work cached for the CPU in excess of the global limit, so it prevents them from getting any work for the GPUs, which are now just idling and consuming power.
Others have reported the same problem.
With the per device limit in place, I fail to see why the global limit is necessary.

Thanks,
Mark
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,203,464
RAC: 11,911
United States
Message 1014307 - Posted: 10 Jul 2010, 15:25:49 UTC - in response to Message 1014300.

Thanks for the update, Jeff. Things are working fine for me. VLARs are the biggest problem, but this batch of work doesn't seem to have many, so I don't have to move much around. I am still trying to bring down the VLARs from last week, and they are all I have on my CPUs. Even so, I still have plenty of room for my GPU to get its fill.
____________


PROUD MEMBER OF Team Starfire World BOINC

Jeff Cobb
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 110
Credit: 40,367
RAC: 0
United States
Message 1014309 - Posted: 10 Jul 2010, 15:38:32 UTC - in response to Message 1014306.



Jeff.....
Could you consider removing the global limit and just work with the per device limits?
I have 2 rigs that have work cached for the CPU in excess of the global limit, so it prevents them from getting any work for the GPUs, which are now just idling and consuming power.
Others have reported the same problem.
With the per device limit in place, I fail to see why the global limit is necessary.

Thanks,
Mark


The global limit is just a backstop in case the per proc limits somehow go awry. We want the non-crunchers to get a WU in edgewise. But the global limit probably does not have to be so low. I just raised it to 1000. Let's see what the cricket graph shows as a result of that.

I think the cricket shape we are trying for is that of a bath tub. High during the Friday opening and the Monday queue filling but less than max on the weekend. How deep that tub is is where the tuning comes in.
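
For anyone puzzling over how the per-processor limits and the global limit interact, here is a rough sketch. It is not the actual BOINC scheduler code: the names `Limits`, `Host` and `sendable` are made up for illustration, and serving GPU requests first out of the shared global room is an assumption. It does reproduce the situation Mark described, where a CPU cache larger than the global limit blocks all GPU work.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Limits:
    cpu_per_proc: int   # max CPU tasks in progress per CPU
    gpu_per_proc: int   # max GPU tasks in progress per GPU
    global_max: int     # max tasks in progress per host, all devices


@dataclass
class Host:
    ncpus: int
    ngpus: int
    cpu_in_progress: int
    gpu_in_progress: int


def sendable(host: Host, limits: Limits) -> Tuple[int, int]:
    """Return (cpu_tasks, gpu_tasks) the host could still be sent."""
    total = host.cpu_in_progress + host.gpu_in_progress
    global_room = max(0, limits.global_max - total)
    cpu_room = max(0, limits.cpu_per_proc * host.ncpus - host.cpu_in_progress)
    gpu_room = max(0, limits.gpu_per_proc * host.ngpus - host.gpu_in_progress)
    # Assumption: GPU requests draw on the shared global room first.
    gpu = min(gpu_room, global_room)
    cpu = min(cpu_room, global_room - gpu)
    return cpu, gpu


# A host that cached 200 CPU tasks before the limits went in, with 2 idle GPUs.
crowded = Host(ncpus=8, ngpus=2, cpu_in_progress=200, gpu_in_progress=0)
print(sendable(crowded, Limits(6, 48, 150)))    # (0, 0): global limit blocks the GPUs
print(sendable(crowded, Limits(6, 48, 1000)))   # (0, 96): GPUs can fill up again
```

With the global limit well above anything the per-processor limits allow, it only acts as a backstop, which is the intent.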
____________

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38682
Credit: 573,454,634
RAC: 545,575
United States
Message 1014315 - Posted: 10 Jul 2010, 15:46:38 UTC - in response to Message 1014309.



The global limit is just a backstop in case the per proc limits somehow go awry. We want the non-crunchers to get a WU in edgewise. But the global limit probably does not have to be so low. I just raised it to 1000. Let's see what the cricket graph shows as a result of that.

I think the cricket shape we are trying for is that of a bath tub. High during the Friday opening and the Monday queue filling but less than max on the weekend. How deep that tub is is where the tuning comes in.

Thank you so much Jeff!!!
That did the trick. Both GPUs have now received some WU allocations and are happily back at work.

Thanks again!
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Andre Howard
Volunteer tester
Joined: 16 May 99
Posts: 119
Credit: 151,485,294
RAC: 83,207
United States
Message 1014323 - Posted: 10 Jul 2010, 15:52:06 UTC

Thanks, Jeff. GPUs finally have some work.
____________

Geek@Play
Project donor
Volunteer tester
Joined: 31 Jul 01
Posts: 2464
Credit: 85,425,554
RAC: 20,211
United States
Message 1014360 - Posted: 10 Jul 2010, 16:38:17 UTC

Jeff.........

Seems to me that downloads are going well. I would hope that you loosen up the restrictions for CPUs and GPUs again later today and again tomorrow, so that some of the caches would be partially filled, or maybe filled, before you release all restrictions on Monday.

With everyone trying to fill the caches on Monday, that's just not working due to server loading.

____________
Boinc....Boinc....Boinc....Boinc....

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4058
Credit: 32,807,749
RAC: 5,290
United Kingdom
Message 1014367 - Posted: 10 Jul 2010, 16:59:14 UTC - in response to Message 1014360.

Jeff.........

Seems to me that downloads are going well. I would hope that you loosen up the restrictions for CPUs and GPUs again later today and again tomorrow, so that some of the caches would be partially filled, or maybe filled, before you release all restrictions on Monday.

With everyone trying to fill the caches on Monday, that's just not working due to server loading.

In my case, the load caused my E8500 PC to be reported as detached, losing what cache I had, and then when I tried for fresh work, I ended up with about 140 ghost WUs.
I've just got a couple more WUs to do, then I can excise them.

Claggy

Nicolas
Joined: 30 Mar 05
Posts: 160
Credit: 10,335
RAC: 0
Argentina
Message 1014396 - Posted: 10 Jul 2010, 18:30:16 UTC - in response to Message 1014300.

The BOINC database is keeping up. It was doing ~1000 queries per second most of the day yesterday. It's down to about half of that now. Hiding those two threads (job limits and outage schedule) really helped. I'm not sure why those queries were hanging around so much. Number of posts? Waves of popularity?

How many posts did the threads have? What query was hanging?
____________

Contribute to the Wiki!

ront
Joined: 25 Aug 01
Posts: 77
Credit: 386,336
RAC: 0
United States
Message 1014428 - Posted: 10 Jul 2010, 19:35:17 UTC

Good Afternoon,

Can anyone tell me how to get more than 2 WUs at a time? I have my "dingus" set to "10 days" and I still get only 2. If it is not an "AP", then I am out of business by the beginning of the 2nd workday.

Your counsel would be most appreciated.

ront
____________

AllenIN
Joined: 5 Dec 00
Posts: 159
Credit: 12,919,550
RAC: 14,196
United States
Message 1014430 - Posted: 10 Jul 2010, 19:44:27 UTC - in response to Message 1014309.

Jeff,

I don't know much about the configuration of the SETI db, but right now I only get 10 WUs at a time, and when the servers shut down, that only leaves me a day and a half of work for my dual processor. I have to sit doing nothing for the other day and a half. Is there anything that I can do about that?

Allen

____________

KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9463
Credit: 3,087,193
RAC: 2,299
United States
Message 1014440 - Posted: 10 Jul 2010, 20:07:33 UTC - in response to Message 1014428.

Good Afternoon,

Can anyone tell me how to get more than 2 WUs at a time? I have my "dingus" set to "10 days" and I still get only 2. If it is not an "AP", then I am out of business by the beginning of the 2nd workday.

Your counsel would be most appreciated.

ront

Hi Ront,

As Jeff posted in the 1st message, currently SETI@home has a work unit limit.

CPU 5 per processor
GPU 40 per processor
total (global) limit : 140

About an hour ago, I upped it just a bit to 6, 48, 150. I will remove all limits on Monday.

We will go for a better mix of files (and angle ranges) going into next week's server run.


From Jeff's post, which I quoted, this is the reason you're only getting a few WUs. On a single-core computer, you can have up to 5 for your CPU; on a dual core, 10; and so on. If you run a GPU, you can have up to 40 per processor. This is to keep the servers from being slammed and bottlenecked. Hope that helps a little; someone else will probably come around and explain it further.
____________

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,203,464
RAC: 11,911
United States
Message 1014444 - Posted: 10 Jul 2010, 20:22:55 UTC - in response to Message 1014440.

Jeff posted over in the NC forum that he raised the global limit to 1000 a few hours ago, and I believe he raised the CPU limit to 8 per core and the GPU limit to 50 per GPU.
____________


PROUD MEMBER OF Team Starfire World BOINC

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,533,143
RAC: 48,570
Australia
Message 1014451 - Posted: 10 Jul 2010, 20:42:14 UTC - in response to Message 1014444.
Last modified: 10 Jul 2010, 20:45:05 UTC

The server stats going down makes it difficult to judge things, but looking at the network traffic graphs it shows that there were 2 very solid periods of traffic, and since then it's been very quiet (apart from a couple of very small bursts).

I was thinking next week it would be worth doing away with the overall limit from the beginning, keeping the CPU limit at 5 per processor, but dropping the GPU limit to 20.
This way those with multiple GPUs will still be able to get work for all their processors, and those with just a slower CPU will also be able to get work. After the initial burst, up the CPU limit to 10, the GPUs to 20. After the next burst up the CPUs to 20, the GPUs to 40.
With a bit of luck, this will give 3 periods of maximum bandwidth, but each one for less time than the previous one & still keeping all systems busy, even those with multiple GPUs. After the 3rd surge it would probably be OK to do away with the host limits completely, or at the very least triple them (i.e. 60 for the CPU, 120 for the GPU) & do away with the limit after that surge of traffic.
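
To put rough numbers on the staged plan above, here is a small sketch of how much work a single host could hold at each stage. The stage values are the ones given in this post (the GPU figure is 20 in both of the first two stages, as written); the names `STAGES` and `host_cap` and the example 4-CPU, 2-GPU host are just for illustration.

```python
# (cpu_per_processor, gpu_per_processor) for each stage, no overall limit.
STAGES = [
    (5, 20),    # opening burst
    (10, 20),   # after the initial burst
    (20, 40),   # after the next burst
]


def host_cap(stage: int, ncpus: int, ngpus: int) -> int:
    """Max tasks in progress for one host at the given stage."""
    cpu_limit, gpu_limit = STAGES[stage]
    return cpu_limit * ncpus + gpu_limit * ngpus


# Example host: 4 CPU cores and 2 GPUs.
for stage in range(len(STAGES)):
    print(stage, host_cap(stage, ncpus=4, ngpus=2))
# prints 0 60, then 1 80, then 2 160
```

Each step adds a smaller top-up for hosts that already filled up in the previous stage, which is why each surge should be shorter than the one before it.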
____________
Grant
Darwin NT.

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,630,854
RAC: 163
United States
Message 1014453 - Posted: 10 Jul 2010, 20:46:50 UTC - in response to Message 1014451.

I hate to agree with Grant, but I will.
____________

Janice

JohnDK
Project donor
Volunteer tester
Joined: 28 May 00
Posts: 836
Credit: 42,239,640
RAC: 68,851
Denmark
Message 1014476 - Posted: 10 Jul 2010, 21:36:24 UTC

Can these limit changes be done remotely?

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38682
Credit: 573,454,634
RAC: 545,575
United States
Message 1014486 - Posted: 10 Jul 2010, 22:40:12 UTC

Network traffic seems pretty tame again...
Time for another bump in the limits?
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,533,143
RAC: 48,570
Australia
Message 1014487 - Posted: 10 Jul 2010, 22:43:43 UTC - in response to Message 1014486.

Network traffic seems pretty tame again...
Time for another bump in the limits?


Thinking the same thing myself- been over 15 hours now with very little traffic.
I was thinking of removing the overall limit & doubling the existing CPU & GPU limits & seeing how long the resulting surge lasts.
____________
Grant
Darwin NT.

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 38682
Credit: 573,454,634
RAC: 545,575
United States
Message 1014491 - Posted: 10 Jul 2010, 22:57:44 UTC - in response to Message 1014487.

Network traffic seems pretty tame again...
Time for another bump in the limits?


Thinking the same thing myself- been over 15 hours now with very little traffic.
I was thinking of removing the overall limit & doubling the existing CPU & GPU limits & seeing how long the resulting surge lasts.

Yup...seems a shame to waste the bandwidth now and have a 24 hour cram session come Monday. And what if the servers crash then?
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6092
Credit: 663,953
RAC: 1,235
United States
Message 1014532 - Posted: 11 Jul 2010, 2:16:15 UTC - in response to Message 1014430.
Last modified: 11 Jul 2010, 2:19:03 UTC

Jeff,

I don't know much about the configuration of the SETI db, but right now I only get 10 WUs at a time, and when the servers shut down, that only leaves me a day and a half of work for my dual processor. I have to sit doing nothing for the other day and a half. Is there anything that I can do about that?

Allen


Allen:

To help prevent database/server crashes due to overload after the 3-day outage, there are TEMPORARY limits in place for downloads, based on how many WUs your computers already have in progress. Last time I checked, they were at 8 per cpu core, 48 per gpu, and 1000 per host. Since (per the public data on your S@H profile page) you have more than 8 WUs in progress on each of your cpu's, that's all you can get right now. You may get some more as you complete WUs, or as Jeff raises the limits.

Jeff has been raising the limits periodically since the servers came up on Friday morning, and he has said he will remove all temporary limits by Monday morning. Then you should be able to load up for the next 3-Day outage.

More current info is available in several threads running in the Number Crunching section of the Forum.


Copyright © 2014 University of California