Panic Mode On (63) Server problems?
Author | Message |
---|---|
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
And now uploads appear to be becoming rather iffy. Grant Darwin NT |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Regarding stuck WUs that have been around for a long period of time: do those have to be manually kicked by one of the staff? I know it can be a painful and cumbersome task. I was thinking maybe we could make a thread for listing as many as can be found, and then some sort of shell script could be put together once there's a list of task IDs? Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Richard1949 Joined: 20 Oct 99 Posts: 18 Credit: 232,635 RAC: 0 |
"I doubt that SETI is playing favorites. But on the other hand I have not seen anyone attempt to explain why different host computers can get all the work they want and others get nothing. Since the bandwidth is maxed out anyway there seems to be no interest in pursuing this problem." -------------------------------------------------- That's what I thought, that they would send out work randomly. But now I wonder. Seems the same people over and over get all they want while others can't get so much as one WU. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
"I doubt that SETI is playing favorites." I've seen the same thing on my own network (not related to the HE issues). Single-core machine had no problem building a 10-day cache while the [at the time] quad-core machine was struggling to keep 2-3 days of cache. I know I'm not the only one that sees/saw things like that. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"I doubt that SETI is playing favorites." What is so odd about that? All things being equal, the quad core is going to do 4 times as much work as the single core would. So it would have to get and successfully download 4 times the amount of work just to stay even, much less build its cache. So when up/download and work requests are not flowing well, it's going to be the first one to feel the pain. Same goes for GPU hosting rigs... the faster they are right now, the worse off they are. What makes it really tough on the big rigs is that when things are working OK, as in between shorty storms, they are not now allowed to build a large enough cache to carry them through the times when comms tighten up. That's what really sux for us right now. "Time is simply the mechanism that keeps everything from happening all at once." |
Jon Joined: 12 Aug 09 Posts: 157 Credit: 139,063,241 RAC: 0 |
You said it bro... Jon |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
That's why I have been stumping for weeks now for the Admins and Devs to address the BOINC code problems, get them behind us, and get the dang limits lifted. "Time is simply the mechanism that keeps everything from happening all at once." |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"Shouldn't we be in the middle of the usual Tuesday outage by now? Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?" That could be the case, but I am not certain. If they are on hiatus for the whole week, they may let things limp along as they are, other than possibly what could be addressed remotely. Or, possibly an outage later in the week. Just dunno for sure. "Time is simply the mechanism that keeps everything from happening all at once." |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
"What is so odd about that? All things being equal, the quad core is going to do 4 times as much work as the single core would. So it would have to get and successfully download 4 times the amount of work just to stay even, much less build its cache. So when up/download and work requests are not flowing well, it's going to be the first one to feel the pain." I do agree; however, the part that I forgot was that the single-core machine would get at least one task about 95% of the time it asked for work. The quad-core machine would have about a 10% success rate. The slow machine would get its ~50 MBs in less than 10 requests, but the quad would have to ask for work 50+ times to get maybe 75. Something else I'm pondering is whether there is any way to speed up the refill rate for the feeder. I've heard that it fills up every two seconds. I wonder if that can be dropped to 1 second, if it's even possible? That may alleviate a lot of those "project has no tasks available" messages when server status shows 200,000+ waiting to be assigned. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"What is so odd about that? All things being equal, the quad core is going to do 4 times as much work as the single core would. So it would have to get and successfully download 4 times the amount of work just to stay even, much less build its cache. So when up/download and work requests are not flowing well, it's going to be the first one to feel the pain." I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that has to happen first, then scheduler or other server-based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed. "Time is simply the mechanism that keeps everything from happening all at once." |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
"I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that has to happen first, then scheduler or other server-based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed." That is true. And I've stated a few times that if we can get more bandwidth, it may create a whole new pile of problems all by itself by allowing more successful contacts to the database. It's one of those things where we'll just have to wait and see what happens and have some contingency plans lined up for some of the possible scenarios. However, the good news is that with all of the enterprise-class networking equipment that is in place, we can get an actual gigabit link but still rate-limit it to 100 Mbit or 150 Mbit, whatever seems to allow the smoothest data transfer while keeping the database from getting DDoS'ed. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"I think optimizing the scheduler is a moot point until such time as there is bandwidth available to support it. My view is that has to happen first, then scheduler or other server-based bottlenecks can be addressed as they are identified. You can schedule all the work you want, but if the hosts cannot get it downloaded, it cannot be processed." Well, if you peruse the information in the GPUUG fundraising thread, you will see that many hardware upgrades are well on their way to being completed, with more to come. As far as I know, we still do not have a real path in place for upgrading the bandwidth, other than having the project's pleas fall on the deaf ears of the Berk IT admins. "Time is simply the mechanism that keeps everything from happening all at once." |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"Shouldn't we be in the middle of the usual Tuesday outage by now? Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?" Hmm, the UC Berkeley Academic Calendar shows Monday, Tuesday, Thursday, and Friday as "Academic and Administrative Holiday". Joe |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
"Shouldn't we be in the middle of the usual Tuesday outage by now? Maybe they'll skip the outage this time, because staff is on leave during Christmas/New Year?" Ahhh... So it looks like some of our indentured servants may be in the lab tomorrow for an outage party. "Time is simply the mechanism that keeps everything from happening all at once." |
Richard1949 Joined: 20 Oct 99 Posts: 18 Credit: 232,635 RAC: 0 |
"I do agree, however the part that I forgot was that the single-core machine would get at least one task about 95% of the time it asked for work." --------------------------------------------------- I can't even get anything for my single core machine. |
Richard1949 Joined: 20 Oct 99 Posts: 18 Credit: 232,635 RAC: 0 |
"Something else I'm pondering is if there is any way to speed up the refill rate for the feeder. I've heard that it fills up every two seconds. I wonder if that can be dropped to 1 second if it's even possible? That may alleviate a lot of those "project has no tasks available" messages when server status shows 200,000+ waiting to be assigned." ---------------------------------------------- I keep getting "not requesting any tasks." |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
15 min to download 1 WU is a bit of a PITA when it takes less than 3 min to do 2. Grant Darwin NT |
Wiggo Joined: 24 Jan 00 Posts: 36828 Credit: 261,360,520 RAC: 489 |
Personally, I still put the current problems on the connection itself between the USA side of our undersea cable and HE, as using a proxy here quickly clears any backlogs that occur. Cheers. |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
"I keep getting 'not requesting any tasks.'" And why would that be a server problem when your client doesn't ask for work? Gruß, Gundolf |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Yeah, that certainly would appear to be that undersea cable. I noticed in my messages tab last night that between "starting download" and "finished download" for an AP, 19 seconds elapsed (~430 KB/sec). Of course it was a B3_P1 WU, so it took 24 seconds to error out once processing started. Go figure. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
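
The download figures quoted in the thread (15 min to fetch one WU against less than 3 min to crunch two, and a 19-second AP download at roughly 430 KB/sec) can be sanity-checked with a little arithmetic. What follows is only a rough sketch in Python: the function names are made up for illustration, and it assumes the quoted rates are steady averages, which they probably are not on a congested link.

```python
def transfer_size_kb(rate_kb_per_s: float, seconds: float) -> float:
    """Size implied by an average transfer rate held for a given duration."""
    return rate_kb_per_s * seconds


def parallel_downloads_needed(download_min_per_wu: float,
                              crunch_min: float,
                              wus_per_crunch: int) -> float:
    """Concurrent downloads needed so work arrives as fast as it is consumed.

    The host finishes `wus_per_crunch` WUs every `crunch_min` minutes, so it
    uses up one WU every crunch_min / wus_per_crunch minutes. Each download
    stream delivers one WU every `download_min_per_wu` minutes.
    """
    minutes_per_wu = crunch_min / wus_per_crunch
    return download_min_per_wu / minutes_per_wu


# Figures taken from the posts above; everything else is an assumption.
ap_size_kb = transfer_size_kb(430, 19)          # AP download: ~430 KB/sec for 19 s
streams = parallel_downloads_needed(15, 3, 2)   # 15 min per WU, 2 WUs per 3 min

print(f"Implied AP file size: {ap_size_kb} KB")
print(f"Parallel downloads needed to break even: {streams}")
```

On those numbers the AP file works out to about 8 MB, and a host like Grant's would need ten downloads in flight at once just to break even. Since 3 min is an upper bound on its crunch time, ten is a floor rather than an exact figure.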