Message boards :
Technical News :
quick server side update
Author | Message |
---|---|
Jeff Cobb Joined: 1 Mar 99 Posts: 122 Credit: 40,367 RAC: 0 |
Things are looking OK from this end. When the project was brought on line yesterday, none of the public-facing servers were dropping TCP connections, with the exception of the upload server. TCP drops on the upload server went to zero in about three hours. The boinc database is keeping up. It was doing ~1000 queries per second most of the day yesterday. It's down to about half of that now. Hiding those two threads (job limits and outage schedule) really helped. I'm not sure why those queries were hanging around so much. Number of posts? Waves of popularity? The assimilators suddenly decided to start crashing on vader - a general protection exception in libc. I need to track that down. In the meantime, I moved the assimilators to bambi, where they appear to run fine, except for the known, occasional memory leak which can, and did, bring a machine to its knees. Another thing to track down. In the meantime (there are too many "meantimes"), I put an assimilator restarter in place on bambi. This method has been working well on vader. The assimilator queue is now draining. Others have reported it, but I will report it again here. The job limits we started, and ran with, yesterday were: CPU: 5 per processor; GPU: 40 per processor; total (global) limit: 140. About an hour ago, I upped them just a bit to 6, 48, 150. I will remove all limits on Monday. We will go for a better mix of files (and angle ranges) going into next week's server run. |
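The restarter Jeff mentions isn't published anywhere, but the idea - rerun the daemon whenever it dies abnormally - is simple enough to sketch. Everything below is a guess at the shape of such a script; `sah_assimilator` and its flags are hypothetical names, not the real binary:

```shell
#!/bin/sh
# Sketch of a daemon restarter: rerun a command until it exits cleanly.
# This is an illustration only -- the actual restarter on vader/bambi
# is not public, and "sah_assimilator" below is a made-up binary name.
keep_alive() {
    until "$@"; do
        echo "$(date): '$1' died (status $?), restarting in 5s" >&2
        sleep 5
    done
    echo "$(date): '$1' exited cleanly" >&2
}

# Example (hypothetical): keep_alive ./sah_assimilator -app setiathome_v7 -d 2
```

Pairing something like this with a cron-driven memory check would also paper over the known assimilator leak until the real bug is found.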
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Jeff..... Could you consider removing the global limit and just work with the per device limits? I have 2 rigs that have work cached for the CPU in excess of the global limit, so it prevents them from getting any work for the GPUs, which are now just idling and consuming power. Others have reported the same problem. With the per device limit in place, I fail to see why the global limit is necessary. Thanks, Mark "Time is simply the mechanism that keeps everything from happening all at once." |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Thanks for the update Jeff. Things are working fine for me. VLARs are the biggest problem, but this batch of work doesn't seem to have many, so I don't have to move much around. I am still trying to work through the VLARs from last week and they are all I have on my CPUs. Even so, I still have plenty of room for my GPU to get its fill. PROUD MEMBER OF Team Starfire World BOINC |
Jeff Cobb Joined: 1 Mar 99 Posts: 122 Credit: 40,367 RAC: 0 |
Jeff..... The global limit is just a backstop in case the per-processor limits somehow go awry. We want the non-crunchers to be able to get a WU in edgewise. But the global limit probably does not have to be so low. I just raised it to 1000. Let's see what the cricket graph shows as a result. I think the cricket shape we are trying for is that of a bathtub: high during the Friday opening and the Monday queue filling, but less than max on the weekend. How deep that tub is, is where the tuning comes in. |
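To see why the low global cap was starving GPUs on big hosts (kittyman's case above), a toy calculation helps. Only the limits (40 per GPU, global caps of 140 and 1000) come from this thread; the 160 cached CPU tasks are a made-up figure for a large multi-core rig, and this is not the actual BOINC scheduler logic, just the arithmetic:

```shell
#!/bin/sh
# Toy model: GPU work allowance = min(per-GPU allowance, global headroom).
# Numbers for the host are hypothetical; only the limits are from the thread.
cpu_in_progress=160   # imagined big rig whose CPU cache already exceeds 140
gpus=2; per_gpu=40

for global in 140 1000; do
    headroom=$(( global - cpu_in_progress ))
    [ "$headroom" -lt 0 ] && headroom=0
    gpu_allow=$(( gpus * per_gpu ))
    [ "$gpu_allow" -gt "$headroom" ] && gpu_allow=$headroom
    echo "global=$global -> GPU tasks allowed: $gpu_allow"
done
```

With the global cap at 140 the host gets zero GPU tasks; at 1000 the per-device limits take over and both GPUs fill, which matches what kittyman reports below.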
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Jeff..... Thank you so much Jeff!!! That did the trick. Both GPUs have now received some WU allocations and are happily back at work. Thanks again! "Time is simply the mechanism that keeps everything from happening all at once." |
Andre Howard Joined: 16 May 99 Posts: 124 Credit: 217,463,217 RAC: 0 |
Thanks Jeff, GPUs finally have some work. |
Geek@Play Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Jeff......... Seems to me that downloads are going well. I would hope that you loosen up the restrictions for CPUs and GPUs again later today and again tomorrow, so that caches would be at least partially filled before you release all restrictions on Monday. With everyone trying to fill their caches on Monday, that's just not working due to server loading. Boinc....Boinc....Boinc....Boinc.... |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Jeff......... In my case, the load caused my E8500 PC to be reported as detached, losing what cache I had, and then when I tried for fresh work, I ended up with about 140 ghost WUs. I've just got a couple more WUs to do, then I can excise them. Claggy |
Nicolas Joined: 30 Mar 05 Posts: 161 Credit: 12,985 RAC: 0 |
The boinc database is keeping up. It was doing ~1000 queries per second most of the day yesterday. It's down to about half of that now. Hiding those two threads (job limits and outage schedule) really helped. I'm not sure why those queries were hanging around so much. Number of posts? Waves of popularity? How many posts did the threads have? Which query was hanging? Contribute to the Wiki! |
ront Joined: 25 Aug 01 Posts: 77 Credit: 386,336 RAC: 0 |
Good Afternoon, Can anyone tell me how to get more than 2 WUs at a time? I have my "dingus" set to "10 days" and I still get only 2. If it is not an "AP", then I am out of business by the beginning of the 2nd workday. Your counsel would be most appreciated. ront |
AllenIN Joined: 5 Dec 00 Posts: 292 Credit: 58,297,005 RAC: 311 |
Jeff, I don't know much about the configuration of the SETI db, but right now I only get 10 WUs at a time, and when the servers shut down, that only leaves me a day and a half of work for my dual processor. I have to sit doing nothing for the other day and a half. Anything I can do about that? Allen |
KB7RZF Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2 |
Good Afternoon, Hi Ront, As Jeff posted in the 1st message, currently SETI@home has a work unit limit.
From Jeff's post, which I quoted, this is the reason you're only getting a few WUs. On a single-core computer, you can have up to 5 for your CPU; on a dual-core, 10; and so on. If you run a GPU, you can have up to 40 per processor. This is to help keep the servers from being slammed into bottlenecks. Hope that helps a little; someone else will probably come around and explain it further. |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Jeff posted over in NC forum that he has raised the global limit to 1000 a few hours ago and I believe he raised the CPU limit to 8 per core and the GPU limit to 50 per. PROUD MEMBER OF Team Starfire World BOINC |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
The server stats going down makes it difficult to judge things, but looking at the network traffic graphs it shows that there were 2 very solid periods of traffic, and since then it's been very quiet (apart from a couple of very small bursts). I was thinking next week it would be worth doing away with the overall limit from the beginning, keeping the CPU limit at 5 per processor, but dropping the GPU limit to 20. This way those with multiple GPUs will still be able to get work for all their processors, and those with just a slower CPU will also be able to get work. After the initial burst, up the CPU limit to 10 and the GPUs to 20. After the next burst, up the CPUs to 20 and the GPUs to 40. With a bit of luck, this will give 3 periods of maximum bandwidth, but each one for less time than the previous one, while still keeping all systems busy - even those with multiple GPUs. After the 3rd surge it would probably be OK to do away with the host limits completely, or at the very least triple them (ie 60 for the CPU, 120 for the GPU) and do away with the limit after that surge of traffic. Grant Darwin NT |
soft^spirit Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
I hate to agree with Grant, but I will. Janice |
JohnDK Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
Can these limit changes be done remotely? |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Network traffic seems pretty tame again... Time for another bump in the limits? "Time is simply the mechanism that keeps everything from happening all at once." |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Network traffic seems pretty tame again... Thinking the same thing myself - it's been over 15 hours now with very little traffic. I was thinking of removing the overall limit, doubling the existing CPU and GPU limits, and seeing how long the resulting surge lasts. Grant Darwin NT |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Network traffic seems pretty tame again... Yup...seems a shame to waste the bandwidth now and have a 24 hour cram session come Monday. And what if the servers crash then? "Time is simply the mechanism that keeps everything from happening all at once." |
Donald L. Johnson Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
Jeff, Allen: To help prevent database/server crashes due to overload after the 3-day outage, there are TEMPORARY limits in place for downloads, based on how many WUs your computers already have in progress. Last time I checked, they were at 8 per cpu core, 48 per gpu, and 1000 per host. Since (per the public data on your S@H profile page) you have more than 8 WUs in progress on each of your cpu's, that's all you can get right now. You may get some more as you complete WUs, or as Jeff raises the limits. Jeff has been raising the limits periodically since the servers came up on Friday morning, and he has said he will remove all temporary limits by Monday morning. Then you should be able to load up for the next 3-Day outage. More current info is available in several threads running in the Number Crunching section of the Forum. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.