Panic Mode On (101) Server Problems?
ChrisD Send message Joined: 25 Sep 99 Posts: 158 Credit: 2,496,342 RAC: 0 |
LOL, how many Commodore cassettes does it take to hold 50GB? Anyone still have a Commodore to verify? :) As far as I remember data was written at 1200 baud and each block was written twice for error correction. (Maybe I am wrong, it might have been 600 baud only. Anyone still have that manual?) At 50 bytes/sec a 60 min cassette will hold 175 kilobytes. Where can we store 285,715 cassettes :) :) ChrisD |
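A quick sanity check of the cassette arithmetic above. The framing-overhead and duplication factors are assumptions chosen to land on the post's 50 bytes/sec figure, so this is a back-of-envelope sketch, not a statement about the actual Datasette format:

```python
# Back-of-envelope check of the Commodore cassette arithmetic.
# Assumed: 1200 baud, ~12 bits per byte on tape (framing overhead,
# assumption), and each block written twice for error correction.

BAUD = 1200                  # nominal tape data rate (from the post)
BITS_PER_BYTE_ON_TAPE = 12   # framing/overhead factor (assumption)
DUPLICATION = 2              # each block written twice (from the post)

bytes_per_sec = BAUD / BITS_PER_BYTE_ON_TAPE / DUPLICATION   # 50 B/s
tape_seconds = 60 * 60                                       # 60-min cassette
tape_bytes = bytes_per_sec * tape_seconds                    # ~175 KiB

data_bytes = 50 * 10**9                                      # the 50 GB in question
cassettes = data_bytes / tape_bytes
print(f"{tape_bytes / 1024:.0f} KiB per cassette, {cassettes:,.0f} cassettes")
```

The post's 285,715 comes from rounding the capacity down to a flat 175,000 bytes before dividing; either way it is a few hundred thousand cassettes.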
WezH Send message Joined: 19 Aug 99 Posts: 576 Credit: 67,033,957 RAC: 95 |
...that was the event I was thinking about - totally frustrating, but at least this time the splitters are running out of tapes very rapidly.... Hmm... a 90-minute tape (45 minutes on each side) will hold on the order of 150 kilobytes on each side if no compression or fast loader is used. More than I can carry :D |
Mike Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80 |
Just noticed one of my latest downloads has an estimated run time of 10,776 hours. It had been running for 12 minutes with only 0.001% completed. I noticed that also. I'm still at work so can't look any closer atm. We had such an issue at beta not that long ago. All invalids have no autocorr section. With each crime and every kindness we birth our future. |
Mike Send message Joined: 17 Feb 01 Posts: 34255 Credit: 79,922,639 RAC: 80 |
This Task has autocorr but also ran 100% on CPU- http://setiathome.berkeley.edu/result.php?resultid=4496130911 With each crime and every kindness we birth our future. |
Gene Send message Joined: 26 Apr 99 Posts: 150 Credit: 48,393,279 RAC: 118 |
21no11aa batch of work throwing "triplets >30" kind of error. So far this morning, I've had 12 tasks end with computation error. They are all from the 21no11aa.994.18891.5.12.xx batch. Doesn't seem to matter whether they're for CPU or GPU. I picked one GPU failed work unit and reran it (in a "benchmark" sandbox) as a CPU task; it ended with the same stderr failure. There are 8 more of these in the work buffer. They only take a few seconds to exit so I'll just let them pass through in turn. Some wingmen I can find are also showing the same triplet count error, but the tasks are being reissued to reach a quorum. /EDIT: The autocorr count is missing in the stderr result. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Just so you know we're working on the splitter problem - a new bit of splitter code was put into play yesterday. It was working well enough in beta, but apparently it still wasn't ready for prime time. We have some debugging and cleaning up to do but we'll be back soon enough with more workunits.... - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66 |
It was working well enough in beta, but apparently it still wasn't ready for prime time That is why it is called the weekly outrage. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Thanks Matt... |
Darth Beaver Send message Joined: 20 Aug 99 Posts: 6728 Credit: 21,443,075 RAC: 3 |
Hello Houston, are you there!! Houston, Ozzie 1 here, are you reading us!!! Houston, hello!! We're having difficulty reading you, Houston, are you there!!!! As the crew start to panic: "What's happened down there? We haven't heard from them in hours," says one crew member. Another says, "Oh no, WW3 has started, that's why we can't hear them. Houston has been hit with a nuke, aaaaaaaaaaaaaaahhhhhhhhhhhhh, we're doomed!" Hope you can get things sorted soon. I'm out of GPU work and I'll be out of CPU work in a few more hours. Anyway, fingers crossed you can fix the problem soon |
Wild6-NJ Send message Joined: 4 Aug 99 Posts: 43 Credit: 100,336,791 RAC: 140 |
Hello Houston, are you there!! Houston, Ozzie 1 here, are you reading us!!! Houston, hello!! We're having difficulty reading you, Houston, are you there!!!! (Apologies to the vegans out there) |
Dr Grey Send message Joined: 27 May 99 Posts: 154 Credit: 104,147,344 RAC: 21 |
|
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Still, can't say we didn't try Yeah, that's the same thing that happened back in January. Even though the splitter gets fixed, if they don't do anything to block resends for those WUs, new tasks just keep getting created and sent back out again until the WU maxes out with 10 Invalids, doing nothing but wasting host resources along the way. Very irritating! Back in January and February, I managed to abort most of the ones I received that I could identify. I'll probably start doing it again shortly with these. The thing is, that earlier batch all came from one original file, as I recall, whereas this time there seem to be multiple source files. |
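The resend behaviour Jeff describes can be modelled as a toy loop: a workunit from a bad tape fails on every host, so the scheduler keeps reissuing it until a result limit is hit. The limit name and value (`max_total_results = 10`) mirror BOINC's workunit settings, but this is an illustration of the effect, not BOINC's actual scheduler code:

```python
# Toy model of workunit resends: a broken WU never validates, so new
# tasks keep getting created until the total-results cap is reached,
# wasting host time along the way.

def run_workunit(is_broken, max_total_results=10, quorum=2):
    """Return (valid_results, total_results_sent) for one workunit."""
    valid = total = 0
    while valid < quorum and total < max_total_results:
        total += 1            # scheduler issues a task to another host
        if not is_broken:
            valid += 1        # healthy WU: each result validates
    return valid, total

print(run_workunit(is_broken=False))   # healthy: quorum reached after 2 tasks
print(run_workunit(is_broken=True))    # broken: burns all 10 results, 0 valid
```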
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Woke up this morning to find 55 Invalids. Notice also that the Server Status shows 5 splitters running, but the Splitter Status shows only 3, all working on the last remaining file. Work in progress has dropped by around 1 million. It's going to take a very long time to recover from this outage once the splitters are sorted out. Grant Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I got my first batch from the new tape about half an hour ago. All shorties, and with the new reduced file size (which I suspect is deliberate - it doesn't seem to be a problem by itself). But at least the v7 processing seems to be working properly for this batch. I've also seen a couple of changes made to the splitters, to make it less likely they'll lose their configuration data, and to shut them down automatically if it all goes wrong. Time will tell. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
More odd WUs. 26oc11ac.1324.4157.12.12.89_1 26oc11ac.1324.4157.12.12.107_1 26oc11ac.1324.4157.12.12.65_0 26oc11ac.1324.4157.12.12.59_0 26oc11ac.1324.4157.12.12.71_1 26oc11ac.1324.4157.12.12.77_1 26oc11ac.1324.4157.12.12.83_1 26oc11ac.1324.4157.12.12.234_1 26oc11ac.1324.4157.12.12.113_1 26oc11ac.1324.4157.12.12.240_1 26oc11ac.1324.4157.12.12.47_0 26oc11ac.1324.4157.12.12.41_1 All shorties. They start running, % Progress counts up till they get to about 5%, then it resets to zero. Elapsed time continues to run, Progress just sits on 0%. Aborted all. Grant Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
At least you let them run long enough to display Find triplets Cuda kernel encountered too many triplets, or bins above threshold, reprocessing this PoT on CPU... in stderr.txt I gave Jason one of those (run to completion, so we could be sure it wasn't the "too many triplets" half of that information message), but he hasn't commented on the alternative threshold levels yet. Reporting pseudo-progress until the first checkpoint is standard for your v7.6.6 client. It's supposed to reassure you that something is happening. Edit: 26oc11ac? that tape was split some 14 hours ago, while you were asleep. Since then, we've had several hours without work, and now new tapes with new splitters. I'll reserve judgement until the morning. |
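The pseudo-progress behaviour Richard describes can be sketched as follows: until the science app writes its first checkpoint, the client displays a simulated, elapsed-time-based fraction; once a checkpoint reports a real fraction done, that takes over. This mimics the visible symptom (progress climbs, then snaps back to the checkpointed value); the 5% ceiling is an assumption taken from the symptom, and none of this is the actual BOINC client source:

```python
# Sketch of pre-checkpoint "pseudo-progress" display. Before the first
# checkpoint, show a simulated fraction that creeps toward a small cap;
# afterwards, show the app's real checkpointed fraction.

def displayed_progress(elapsed_s, estimated_s, checkpoint_fraction=None):
    if checkpoint_fraction is not None:
        return checkpoint_fraction        # real progress from the app
    cap = 0.05                            # assumed ~5% ceiling, per the symptom
    # exponential creep toward (but never reaching) the cap
    return cap * (1 - 0.5 ** (elapsed_s / (0.1 * estimated_s)))

# Climbs while no checkpoint exists...
print(displayed_progress(60, 720))
# ...then snaps to the (zero) checkpointed fraction, as Grant observed.
print(displayed_progress(400, 720, checkpoint_fraction=0.0))
```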
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Reporting pseudo-progress until the first checkpoint is standard for your v7.6.6 client. It's supposed to reassure you that something is happening. It certainly does. Watching time ticking away with no progress being made is... unsettling. Especially since the day before I had a WU that ran for 30min with progress stuck at 0.001% and the estimated run time had climbed to 10,776 hours. WUs in question, 14jl11ac.12197.15609.3.12.158_0 14jl11ac.12197.15609.3.12.156_1 The next WUs I got ran OK, but were very, very, very short. 08ap11ae.30787.24607.9.12.242_1 08ap11ae.30787.24607.9.12.88_1 08ap11ae.30787.24607.9.12.248_0 2 to GPU, 1 to CPU. GPU estimated run times were under 3 min, took 1:43 (usual shorty estimate 12min) CPU estimated run time about 35min, 10% done in 3 min (usual shorty estimate 1hr 40m). Completed OK. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Another couple of odd WUs. 16se11ab.25031.20517.5.12.238_3 23oc11ah.12765.24804.6.12.19_3 GPU WU, don't know what the estimated run times were, but they completed in just over 3min 30s. Usual time to completion for GPU shorties is 13-16min. The result of no autocorrelation? Even so, before it was introduced shortie WUs (running 2 at a time) took way longer than 3min 30 to process. Should be out of GPU work on this system in the next 30min or so. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Just noticed some more anomalies, these ones VLARs. Usual runtime on this system is 4-4.5hrs. Estimated run times for these VLARs- 1hr 50min- 2hr 2min. Would normally take most of the day to get to them. Will suspend other work & see how they go. 16oc11aa.20967.14169.7.12.90.vlar_0 29ap11ad2518.14791.8.12.129.vlar_2 21no11aa.31868.24617.13.12.60.vlar_2 EDIT- All of the 16oc11aa WUs ran for 4 secs & then finished. Same with 26oc11ac & 26ap11ab WUs. 21no11aa, one completed after 2min 20s, others still running. Other WUs still running. And another of the running, but no Progress WUs. 13min and counting, 0.000% done, estimated time remaining- 23,664hrs & climbing. Aborted. 26ap11ab.9472.85225.14.12.42_2 Grant Darwin NT |
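The runaway "time remaining" figure follows from the stuck progress: with the fraction done pinned at (effectively) zero, any extrapolation of elapsed time over fraction diverges. A naive linear extrapolation is shown below; BOINC's real estimator also applies per-host correction factors, so the exact 23,664-hour figure isn't reproduced here:

```python
# Why "time remaining" explodes when progress is stuck near zero:
# a linear extrapolation of elapsed / fraction_done diverges.

def naive_remaining_hours(elapsed_min, fraction_done):
    if fraction_done <= 0:
        return float("inf")              # no progress: unbounded estimate
    total_min = elapsed_min / fraction_done
    return (total_min - elapsed_min) / 60

print(naive_remaining_hours(13, 0.0))        # stuck at 0.000% done
print(naive_remaining_hours(13, 0.00001))    # 0.001% done: still enormous
```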
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
Got to love the perversity of chance. I've got 2 systems, a Core 2 Duo & an i7. Naturally the i7 can do a lot more work than the C2D. With the present lack of work, the C2D gets work every 45min or so. The i7, every 2 (or more) hours. Grant Darwin NT |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.