Message boards :
Number crunching :
Panic Mode On (101) Server Problems?
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Drat. My lowly single-core machine picked up a re-send for one of the terminally-ill MBs. There goes the consecutive valid count. http://setiathome.berkeley.edu/workunit.php?wuid=1954413233 edit: Question: if the auto-corr config values (as mentioned by Richard) are zero instead of the values they should be... then theoretically, couldn't one just open the WU in a hex editor and put those values back to something non-zero so it would crunch properly? Surely it's not that simple of a fix, though... edit2: I think I just understood from re-reading... that's from the output result file. So then it would still have to be something in the header of the WU itself that decides it can't run auto-corr, right? Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
Gotta love the perversity of chance. Like how my Pentium D just sucked down APs, but the quad-core Xeon, nope ... |
rob smith Send message Joined: 7 Mar 03 Posts: 22202 Credit: 416,307,556 RAC: 380 |
Well it looks as if the splitters are behaving themselves just now.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Well it looks as if the splitters are behaving themselves just now.... I had been thinking that their splitting rate has been higher than it's been for some time, pretty much since the PFB splitters came in. (Generally, with anything less than 5 splitters running, output is barely 27/s; with multiple splitters on the one channel, even less. So far all 7 splitters have been running on only 4 (even down to 2) files & still they're pumping out the work.) I just didn't want to tempt fate. To add to the perversity of chance, now that so many caches are pretty much empty, 90% of the work I've been getting has been shorties. Although there are some GPU WUs I'll keep an eye on. 04mr11ae Estimated completion times for longer running GPU WUs are usually not much more than 35min. These ones are all around 45min. Grant Darwin NT |
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
It was a really nice flow lately; lots of APs gave me a nice RAC, but now it's over again. No APs, no MBs, lots of invalid tasks >>>>> had to power down my cruncher once again. Just wish this project would be a bit more stable. |
Darth Beaver Send message Joined: 20 Aug 99 Posts: 6728 Credit: 21,443,075 RAC: 3 |
NX-01, you can probably blame Matt; he was probably the sucker left to do the programming changes. It's called passing the buck hehehehehe. Sorry Matt, couldn't resist that one :-) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
edit: Question: if the auto-corr config values (as mentioned by Richard) are zero instead of the values they should be... then theoretically, couldn't one just open the WU in a hex editor and put those values back to something non-zero so it would crunch properly? Surely it's not that simple of a fix though... Yes, the autocorr settings I quoted were lifted from the downloaded data file before it was crunched. They could be edited between downloading and crunching, so that the proper analysis was done and reported. But there are two flies in that ointment - one potential, one certain. Potential: editing the WU data file would change its MD5 checksum. I think that's only checked as the download completes, but it might get checked when the task is launched as well (it probably should be). BOINC would be within its rights to reject the file for tampering. Certain: unless you could be certain that your wingmate had also edited the data, your result would be different from all the others, and would fail validation. |
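The integrity check Richard describes boils down to hashing the data file and comparing against the checksum the server supplied at download time. A minimal sketch in Python (the function names here are mine for illustration; BOINC's actual client is C++, and compares against the <md5_cksum> it records in client_state.xml):

```python
import hashlib

def md5_of_file(path: str) -> str:
    """Compute the MD5 digest of a file, reading in 64 KiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected_md5: str) -> bool:
    """True if the file still matches the checksum recorded at download.

    Any edit to the workunit data (e.g. patching the autocorr fields
    in a hex editor) changes the digest and makes this check fail.
    """
    return md5_of_file(path) == expected_md5
```

Whether the client re-runs this check at task launch, as Richard suggests it probably should, is the open question; the download-time check alone would not catch an edit made after the file arrived.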
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
NX-01 you can probably blame Matt he probably was the sucker left to do the programming changes It's called passing the buck hehehehehe Don't they test changes on beta anymore before they go live on main? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Don't they test changes on beta anymore before they go live on main? Posted earlier in this very thread, Just so you know we're working on the splitter problem - a new bit of splitter code was put into play yesterday. It was working well enough in beta, but apparently it still wasn't ready for prime time. We have some debugging and cleaning up to do but we'll be back soon enough with more workunits.... - Matt Grant Darwin NT |
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
Well, that's strange then. BTW: Everything was running fine before, at least for me, so I wonder what problem they are trying to fix with the new code? (Sorry if the answer is already in this thread somewhere.) |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Well, that's strange then. They've been working slowly behind the scenes for most of this year, preparing the entire processing chain (telescope --> recorder --> splitter --> application(s) --> validator --> assimilator) to handle observations made at the Green Bank observatory. The new splitters are dual-purpose, designed to handle either Arecibo or Green Bank data as required. Edit - none of that is particularly new, I'm just repeating what Matt has posted in Technical News. See, for example, Jun 23 2015 and Aug 31 2015. |
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
Ok, thx Richard, hope they can fix everything soon. |
Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
But there are two flies in that ointment - one potential, one certain. The MD5s are easy enough... just make the change to the header, re-MD5 the file, and put the new hash into client_state. If that MD5 is cross-checked with the scheduler upon contact (I would hope that it is), then I can see that being a problem. As far as wingmates... I know there's hardly any guarantee that random wingmates would ever respond, and it would be even less likely that anyone who does respond would know how to fix their WU the same way you did. In the case of some of these WUs, you just need two of them out of the total of 10 to match. So if 6 or 7 of the wingmates never respond to PMs about it... you just need one out of the total of 9 to respond and know how to do this. Of course, this is all hypothetical at best anyway, because I believe this totally falls into the category of tampering, which is not only unethical but also prohibited. I was just wondering if there was technically something that could be done on the client side to fix these broken WUs, and I suppose I already got my answer... theoretically, yes; realistically, no. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Caches are slowly filling, only another 300,000 to go. Returned-per-hour is right up there, over 100,000/hr for the last 6 hours. Hopefully some of the new files will give a few more longer running WUs. Help reduce the load a bit. Grant Darwin NT |
Starman Send message Joined: 15 May 99 Posts: 204 Credit: 81,351,915 RAC: 25 |
I'm still getting an unusually high number of invalids. Just me, or are others getting them as well? Thanks |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
There are a lot of invalid tasks floating through the system due to a coding error, which I believe has been fixed. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
There are a lot of invalid tasks floating through the system due to a coding error, which I believe has been fixed. I haven't noticed any errors on the WUs since they had a play with the splitter code to sort it out. Although I've already had several _9s on my systems, those automatic error WUs will be floating around for months. Grant Darwin NT |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
I also got a lot more invalids today. Most of them still have no autocorr. With each crime and every kindness we birth our future. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I'm still getting an unusually high number of invalids. Since the original splitter problem seems to have been fixed, everything you get from now on for those WUs will be resends, tasks _2 thru _9. Whereas a "good" WU only requires 2 hosts to put it to bed, these suckers need 5 times that many, all of it just wasted host processing. In the absence of any action by the admins to block the resends and stop wasting resources, a lot of those WUs will be circling the drain for many weeks to come. After getting stuck with about 30 invalids on my xw9400 in the initial wave, I've since managed to abort about 150 of those garbage tasks before they could run, freeing up a lot of processing time for actual productive work! |
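For a rough picture of why these WUs are so wasteful, here is a toy sketch of the quorum behaviour Jeff describes (a minimum quorum of 2 matching results, up to 10 tasks issued in total). The function and the exact-match rule are simplifications of mine, not BOINC's actual validator code:

```python
from collections import Counter

def needs_resend(results, min_quorum=2, max_total=10):
    """Decide whether a workunit needs another task issued.

    results: outcomes returned so far (None = host never replied).
    The WU validates once min_quorum results agree; it is abandoned
    once max_total tasks have been issued without reaching a quorum.
    """
    counts = Counter(r for r in results if r is not None)
    if counts and counts.most_common(1)[0][1] >= min_quorum:
        return False  # quorum reached, WU validates
    return len(results) < max_total  # otherwise resend, up to task _9

# A healthy WU: the first two hosts agree, job done.
# A broken WU: every result differs, so resends _2 through _9 all get
# issued and crunched for nothing before the WU finally gives up.
```

Under these assumptions a good WU stops at 2 tasks, while a terminally broken one burns through all 10, which is the 5x waste mentioned above.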
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I also got a lot more invalids today. Resends (or the dregs of your cache, depending on how fast you process work), will take months to clear them all out. And probably 90% of all your current Inconclusives will end up being Invalid as well. Grant Darwin NT |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.