Message boards :
Number crunching :
Panic Mode On (97) Server Problems?
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Meow! The kitties have 157 to play with............... "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Oh, I got 1 AP. Nice. With each crime and every kindness we birth our future. |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
Meow! The kitties have 157 to play with............... Thinking about it, maybe a week and a half was a bit generous. The servers have definitely been keeping the ready-to-send buffer full for the last 3 or so days. I had forgotten about when we almost ran out of work. |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Meow! The kitties have 157 to play with............... The GPUs got a bit shy, but with 9 rigs going, the CPU buffer is always doing pretty well. I can run the GPUs dry and still have days worth of work on the CPUs. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Cactus Bob Joined: 19 May 99 Posts: 209 Credit: 10,924,287 RAC: 29 |
Well, I got 14 of the AP puppies, the most I have ever had. Mind you, I just returned a couple of months ago after being AWOL for several years. This is on 1 machine, so 157 on 9 machines seems close. I haven't tweaked anything to get more APs, but maybe it would be worth a shot. May the AP force be with you. Bob ------------------------ Sig files are overrated, maybe Sometimes I wonder, what happened to all the people I gave directions to? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SETI@home classic workunits 4,321 SETI@home classic CPU time 22,169 hours |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
My APs are buried in cache. But, I'll turn them in sooner than most. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
JaundicedEye Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1 |
I managed to snag 80 APs before they ran out. I disagree that workload distribution doesn't belong on this thread. It should be the function of the scheduling server to distribute work efficiently, and if that is not occurring, for whatever reason, it's germane to this thread. Perhaps a re-think of the scheduling program to take into account the number of errors produced by a user and apportion the work units accordingly, i.e. 0 errors = full distribution, 25% error rate = 25% reduction in WUs delivered to those machines, etc. Just an idea. "Sour Grapes make a bitter Whine." <(0)> |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
What I think: if you get an error, you should get a test file (with known results), and until you return a valid result for the "test" you don't get more work, you just get "test" files. Send in a valid test WU, then you get work. Basically: prove to me you can provide scientific results; I don't want crap! |
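Brent's proposal is essentially a two-state probation gate per host. A minimal sketch, with invented names (`next_work`, `report_result`, the `needs_test` flag); nothing here reflects real BOINC server code:

```python
def next_work(host: dict) -> str:
    """Decide what kind of workunit a host should receive next."""
    if host.get("needs_test"):
        return "test_wu"  # known-result workunit only, until the host passes
    return "real_wu"

def report_result(host: dict, kind: str, valid: bool) -> None:
    """Update host state from a returned result."""
    if not valid:
        host["needs_test"] = True   # any error puts the host on probation
    elif kind == "test_wu":
        host["needs_test"] = False  # passed the test: real work again

host = {}
report_result(host, "real_wu", valid=False)  # host errors out
assert next_work(host) == "test_wu"          # now it only gets test files
report_result(host, "test_wu", valid=True)   # proves it can compute
assert next_work(host) == "real_wu"          # back to real work
```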
wiesel111 Joined: 5 Jan 08 Posts: 9 Credit: 1,227,675 RAC: 1 |
Wow, I got an AP from computer 7567951. I did some checking and found computers 7567951, 7567912, 7568646, 7568596 and 7567941. The computer information is identical right down to the floating point and integer speeds. Hmmm, interesting. 21 + "7568637". Here's a small update on the identical computers: one was a duplicate (7568508), but I found some more: 7572890, 7567931, 7568597, 7568503, 7569519, 7572867. After the new APs from today I found 8 more (7567911, 7567922, 7567938, 7568510, 7568518, 7568644, 7569479, 7572925). All 35 identical, in chronological order of building: 7567886, 7567911, 7567912, 7567913, 7567922, 7567924, 7567931, 7567938, 7567941, 7567951, 7568499, 7568501, 7568503, 7568504, 7568508, 7568510, 7568511, 7568512, 7568513, 7568516, 7568518, 7568519, 7568596, 7568597, 7568637, 7568640, 7568642, 7568643, 7568644, 7568646, 7569479, 7569519, 7572867, 7572890, 7572925 |
HAL9000 Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I don't know if this can be considered exploiting the system, but it's not really a new concept (I've known about it for several years, myself). I'm not sure it has any effect. Looking over the logs from while I was asleep, I managed 92 new APs while they were going out, and I had set a 10 day cache for my venues. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Seahawk Joined: 8 Jan 08 Posts: 937 Credit: 8,157,029 RAC: 5 |
I got 10 wu_0 APs buried in 190 MB. It's been so long since I've seen a wu_0 that I had to look several times to make sure. I used to be a cruncher like you, then I took an arrow to the knee. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I'm not sure it has any effect. Looking over the logs while I was asleep. I managed 92 new AP while they were going out & I set a 10 day cache for my venues. But did you already have something in your cache (MBs)? I had more success getting something on an empty cache by requesting a small amount of work at a time, rather than 2.6M seconds. I remember when there were 10+ tapes available, it often took 10-20 "no work available" replies before I'd start to get some APs, but once they started coming in, the cache would fill moderately quickly. I managed to snag 80 APs before they ran out. I disagree about workload distribution not being on this thread. It should be the function of the scheduling server to distribute work efficiently. If that is not occurring for whatever reason it's germane to this thread. Here's an idea: how about the quota system? It's already in place; it just needs to have the minimum value be able to go down to 1 and, more importantly, be enforced (I'm about 98% sure it is not enforced at all presently). But that's a rant I've had a few times in another thread already. I'll stop now. Linux laptop record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
kittyman Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I'm pretty sure that some older incarnations of BOINC do not work with the limits in place on the servers at all. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I think I've heard that. Old builds like 5.10.45 can get thousands of WUs presently, and I think it is because those really old builds don't send a list of what they have back to the server during scheduler contacts. I think that started in the 6.x series; I know 6.2.19 sends its list during contact. But other than that, I don't think the quota system is enforced. Shortly after APv7 was released, my "max tasks per day" was in the mid-40s, and I got close to 70 APs in the course of an hour or two. If the quota system were enforced, I shouldn't have been able to get more than "max tasks per day." That's what I'm basing my theory off of, so I could be wrong. Linux laptop record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... I don't think the quota system is enforced. Shortly after APv7 was released, my "max tasks per day" was in the mid-40s, and I got close to 70 APs in the course of an hour or two. If the quota system was enforced.. I shouldn't have been able to get more than "max tasks per day." That's what I'm basing my theory off of, so.. I could be wrong. If that was a single-core system, perhaps you're right. Otherwise, the "max tasks per day" is multiplied by the number of cores for CPU tasks, and for GPU tasks is further multiplied by the project's "gpu_multiplier" setting. I fully agree that the quota system is woefully inadequate for hosts which turn in a lot of results as successfully processed that are subsequently found to be invalid. I can remember at least two extended discussions on the boinc_dev mailing list where various arguments were presented concerning possible adjustments; those were probably before GPU processing became a major factor. In any case, IMO it would take concern from multiple projects to convince Dr. Anderson a change should be made. Joe |
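The arithmetic Joe describes explains the mid-40s quota allowing ~70 APs. A rough sketch of the multiplication, assuming CPU and GPU allowances simply add; the function name is invented and the real BOINC scheduler logic (in C++) has more conditions than this:

```python
def effective_daily_quota(max_tasks_per_day: int, cores: int,
                          gpus: int = 0, gpu_multiplier: int = 1) -> int:
    """Rough per-host daily task allowance, per the scheme described above:
    "max tasks per day" times cores for CPU work, and times
    gpus * gpu_multiplier for GPU work."""
    cpu_quota = max_tasks_per_day * cores
    gpu_quota = max_tasks_per_day * gpus * gpu_multiplier
    return cpu_quota + gpu_quota

# A "max tasks per day" of 45 on even a dual-core CPU-only host already
# allows 90 tasks, which covers the ~70 APs observed in the post above.
print(effective_daily_quota(45, 2))  # -> 90
```

This also shows why a floor of 33 does little for multi-GPU hosts: the multipliers can inflate even the minimum into the hundreds.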
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
... I don't think the quota system is enforced. Shortly after APv7 was released, my "max tasks per day" was in the mid-40s, and I got close to 70 APs in the course of an hour or two. If the quota system was enforced.. I shouldn't have been able to get more than "max tasks per day." That's what I'm basing my theory off of, so.. I could be wrong. Okay, yeah, that's kind of what I thought was going on in the other thread where I was discussing/ranting about this. The quotas should be per application, not per device/cores. That makes it even scarier for those runaway machines that spew out thousands upon thousands of -9 overflow tasks, because the quota will not go below 33, and if they have multiple GPUs times gpu_multiplier, it is entirely useless to even have a quota system in the first place. Thanks for clearing that one up; I've been trying to wrap my head around the problem to try to suggest reasonable solutions, and I needed more information. So those are my suggestions: make the quotas per application, not per device, and let them go down to 1. That won't fix the problem on runaway machines, but it will do some serious damage control and essentially keep them from being a problem. Linux laptop record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
Ulrich Metzner Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
As of 9 Jun 2015, 11:30:04 UTC, "Replica seconds behind master" shows Offline 0m :( Aloha, Uli |
WezH Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
After the new ap's from today I found 8 more (7567911, 7567922, 7567938, 7568510, 7568518, 7568644,7569479, 7572925) The last 5 are not identical to the first 30. Anyway, the first 30 identical computers (feels like a classroom) currently have 537 AP units in progress, almost 8% of all workunits in the field. I did see 20 valid WUs and 558 WUs with 203 (0xcb) EXIT_ABORTED_VIA_GUI. Is there a problem with the GTX 660 driver, ver 350.12, on every machine? Or what? |
JaundicedEye Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1 |
Is there a problem with GTX660 driver, ver 350.12 on every machine? From what I have seen on the threads..... YES. "Sour Grapes make a bitter Whine." <(0)> |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Ver 350.12 is OpenCL 1.2. If you want to crunch APs with this version, you need to modify your apps with the ones provided by Raistmer. Otherwise you need to roll back to an earlier version that still has OpenCL 1.1; I think that is ver 347.88. Zalster |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.