Message boards :
Number crunching :
Panic Mode On (84) Server Problems?
Author | Message |
---|---|
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
I think it's more a feeder problem since still over 300,000 V7 are ready to send. The question then becomes: do we split a feeder or feed a splitter? |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30870 Credit: 53,134,872 RAC: 32 |
Right now, neither is getting work except for a (very) few APs overnight. I7-3820 is about to run out of work; Fermibox2 still has a bunch to do. About to run out of work is normal with BOINC 7.x. Unless you change the "connect every" setting to a long value, it drains the queue until it is nearly dry before it asks for more work. E.g., set to the default of 10 times a day, the queue is drained to 1/10 of a day's work. Set to every 2 days, the queue is drained to 2 days' work. Yeah, kind of works backwards. I don't suggest too big a value for the extra work fill either. We haven't had a long outage since they moved to the co-lo, and too big a number here simply runs you into the max tasks limit and you don't get it anyway. |
kittyman Send message Joined: 9 Jul 00 Posts: 51470 Credit: 1,018,363,574 RAC: 1,004 |
And I see the 'mystery bandwidth' has continued. The kitties have been losing cache all night. Not a usual thingy since the move to the colo. Hopefully they'll sort it all during today's outage. "Time is simply the mechanism that keeps everything from happening all at once." |
Fred E. Send message Joined: 22 Jul 99 Posts: 768 Credit: 24,140,697 RAC: 0 |
Guess something got fixed. My first request after maintenance brought me back to the limits. Hope it holds up. And I see the 'mystery bandwidth' has continued. Don't know what to think of that. Maybe they throttled back the Lab-to-CoLo transfers? Another Fred Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop. |
rob smith Send message Joined: 7 Mar 03 Posts: 22385 Credit: 416,307,556 RAC: 380 |
Oh dear, I see 20jn12ac is still hanging around, stuck with loads of errors :-( Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
@Gary - THANKS! That explains why I7-3820 (my new cruncher) wasn't even asking for WUs except when nearly empty. I changed the settings per your suggestion, and the machine promptly loaded up to the limit. |
Wiggo Send message Joined: 24 Jan 00 Posts: 36014 Credit: 261,360,520 RAC: 489 |
Oh dear, I see 20jn12ac is still hanging around, stuck with loads of errors :-( It certainly isn't giving in, is it? Cheers. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13821 Credit: 208,696,464 RAC: 304 |
Splitters still having issues. Ready-to-send buffer continues to shrink with the splitters unable to produce enough work. Could run out of work yet again, all within 24 hours of last running out. Grant Darwin NT |
kittyman Send message Joined: 9 Jul 00 Posts: 51470 Credit: 1,018,363,574 RAC: 1,004 |
Splitters still having issues. Ready-to-send buffer continues to shrink with the splitters unable to produce enough work. Could run out of work yet again, all within 24 hours of last running out. Well, that stuck dataset 20jn12ac is not helping anything. Tying up one MB splitter that is not producing WUs. Eric said that Matt had restarted it. It apparently is still in terminal fail mode. I sent Eric another message, and we'll have to wait until tomorrow to see if they kick it again or just boot the sucker. "Time is simply the mechanism that keeps everything from happening all at once." |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Was just looking through my single-core machine's recently-reported v7 WUs and saw a cool one. 1265536732 The beginnings of an inconclusive train. I guess build 1846 and stock CPU app didn't quite agree with each other..? It's obvious cuda32 just completely got it wrong. It'll be interesting to see what happens when cuda50 gives it a whirl. edit: and then I looked into it more.. the cuda32 attempt is a runaway machine. hostid 6721035. PMed to inform them and pointed to NC to ask for solutions for fixing it. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
Was just looking through my single-core machine's recently-reported v7 WUs and saw a cool one. 1265536732 Wish we'd get behind why some hosts don't print stderr... bloody annoying that. A person who won't read has no advantage over one who can't read. (Mark Twain) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13821 Credit: 208,696,464 RAC: 304 |
I only just noticed it, but it looks like we hit another record after today's outage. 691Mb/s download traffic. Grant Darwin NT |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Was just looking through my single-core machine's recently-reported v7 WUs and saw a cool one. 1265536732 And the cuda50 task came back and agreed with both CPU apps. Looks like the runaway host was doing the thing the server is set to do.. send a few (thousand?) tasks to each type of app and see which works best. 32 obviously didn't work, 42 isn't working either, and 50 isn't showing any promise, either. Bad drivers, or a bad card is what I'm thinking. Or it could just be some other environment misconfiguration. Hard telling. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30870 Credit: 53,134,872 RAC: 32 |
@Gary - THANKS! That explains why I7-3820 (my new cruncher) wasn't even asking for WUs except when nearly empty. I changed the settings per your suggestion, and the machine promptly loaded up to the limit. Welcome. I think a lot of people are getting caught by this and don't realize it. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13821 Credit: 208,696,464 RAC: 304 |
Network traffic graph is interesting at the moment; it's got to be the flattest (while the system is running) that I can recall. Grant Darwin NT |
kittyman Send message Joined: 9 Jul 00 Posts: 51470 Credit: 1,018,363,574 RAC: 1,004 |
Isn't it time now to finally kick that 20jn12ac file? It's been sitting there, stuck for over a week now, holding up one splitter from doing useful splitting. I poked Eric about it again.... Dunno if he's in the lab right now or not. "Time is simply the mechanism that keeps everything from happening all at once." |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13821 Credit: 208,696,464 RAC: 304 |
Network traffic graph is interesting at the moment; it's got to be the flattest (while the system is running) that I can recall. At least until the huge spike went off the 24hr graph. Still, while up & down there aren't any huge spikes or dips in it. Grant Darwin NT |
rob smith Send message Joined: 7 Mar 03 Posts: 22385 Credit: 416,307,556 RAC: 380 |
And while they are loading some new tapes perhaps they can do something about poor old 20jn12ac which has been stuck at this: 20jn12ac 50.20 GB (13) (done) For days. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
And while they are loading some new tapes perhaps they can do something about poor old 20jn12ac which has been stuck at this: They have done something about it. There has not been a splitter running on that file for several days. Still listed on the page, but not being processed. Donald Infernal Optimist / Submariner, retired |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Good job, anonymous wingmate. I appreciate you aborting your AP on the CPU and making too many errors to validate. workunit 1266348384 Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
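
The work-fetch behaviour Gary describes above (the client waits until the queue drops below the "connect every" value, then asks for enough to refill to that value plus the extra buffer) can be sketched in a few lines. This is only an illustration of the description in the thread, not the BOINC client's actual code; the function name and parameters are hypothetical, and the max tasks limit mentioned in the post is deliberately left out.

```python
def request_size_days(buffered_days: float,
                      min_days: float,
                      extra_days: float) -> float:
    """Days of work to ask for, under the behaviour described in the thread.

    buffered_days: work currently queued on the host, in days (hypothetical input)
    min_days:      the "connect every" value, expressed in days
    extra_days:    the additional work buffer, in days
    """
    # No request is made while the queue is still at or above the low mark.
    if buffered_days >= min_days:
        return 0.0
    # Below the low mark: top the queue back up to min + extra.
    return (min_days + extra_days) - buffered_days


if __name__ == "__main__":
    # Default-like setting: connect ~10 times a day (0.1 days), small extra buffer.
    print(request_size_days(0.5, 0.1, 0.25))   # queue above 0.1 days: no request
    # Connect every 2 days: a request fires as soon as the queue dips under 2 days.
    print(request_size_days(1.5, 2.0, 0.25))
```

With a small "connect every" value the host sits idle on the request side until the queue is almost empty, which matches what Cruncher-American saw on the I7-3820; raising it moves the low mark up so requests start earlier.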
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.