1)
Message boards :
Number crunching :
I QUIT!
(Message 915689)
Posted 8 Jul 2009 by Bob
Post:
Quote of Bill Walker: "Are you sure you're Canadian? you(sic) over simplify(sic) complicated arguements(sic) like a German, or an American."

OK, time for a mid-thread review. The message quoted here is the most, um, 'interesting' of the bunch, and it does help me to draw a serious conclusion...... Seriously, that must be some strong beer and Schnapps you've got in The Great White North!

Bob
2)
Message boards :
Number crunching :
I'm quitting Seti 'cause of increased power cost coming from "Cap and Trade"
(Message 912376)
Posted 28 Jun 2009 by Bob
Post:
... Not to bash on those who choose to go the extra mile, but the project never expected anyone to build entire farms and shoulder more responsibility (power consumption) than anyone else.

Well put, Ozzfan! I can vouch for your last paragraph. Quite often, our competitive nature turns a hobby into something that feels like a job... A difficult job that takes over too much of our life.

Bob
3)
Message boards :
Number crunching :
CLOSED** SETI/BOINC Milestones (tm) XVII **CLOSED
(Message 908926)
Posted 18 Jun 2009 by Bob
Post:
Hey Byron! Yeah, SETI@home is the ultimate test for any computer. The next one is going to get tortured just like the others. It will be fun.

Hey everybody with milestones - You found SETI@home, you found the forums, you found the Milestones thread, you experienced a personal milestone, and you posted it. Isn't that a milestone in itself? It took me 4 years before I stumbled over to the forums. Doh.

Bob
4)
Message boards :
Number crunching :
CLOSED** SETI/BOINC Milestones (tm) XVII **CLOSED
(Message 908790)
Posted 18 Jun 2009 by Bob
Post:
I retired the Top Host that served so well for so long. It was a wonderful machine.

Yep, that ExecPC BBS. Still available via Telnet at bbs.execpc.com. My good friend Curt Shambeau is still running it in his basement with an original Novell file server and 80386 hardware. Yes, 80386. He's going for the duration record.

I birthed it in 1983, I left in 1998. We spanned the era from 110 baud to broadband, Arpanet to Internet. Wow, what a trip. Past employees are now re-forming the collective on Facebook.

Anecdotes? Oh my. Visits from the FBI (sometimes with concealed weapons) to ask me if I am illegally downloading HBO movies from satellites with "those dozens of 2400 baud modems"... Death threats from users gone crazy... Employees and/or users coming together in marriage... The employee party where they drank no beer, just Mountain Dew, and slam-danced themselves dizzy (and someone made our company logo out of Spam)... The international file compression competition... The Million Caller Milestone competitions... We used up all of the phone lines in an entire industrial park before we went to fiber... And the intensity of working with 100 brilliant employees plus thousands of blazingly intelligent users (peaked at 85,000 active customers during my tour of duty), all for the same cause - to advance mankind's progress through new and better ways to share knowledge and information. It was fun.

The good people who run SETI@home and the loyal users of SETI@home are, for me, a wonderful deja vu of the experience of creating and running ExecPC. That is one of the biggest compliments I can give. Thanks, OzzFan.

Bob
5)
Message boards :
Number crunching :
CLOSED** SETI/BOINC Milestones (tm) XVII **CLOSED
(Message 908014)
Posted 16 Jun 2009 by Bob
Post:
I retired the Top Host that served so well for so long. It was a wonderful machine. Check out my profile here for a picture of the computer, plus some technical data.

Bob
6)
Message boards :
Number crunching :
6.6.36 Released - FYI
(Message 907896)
Posted 15 Jun 2009 by Bob
Post:
The scheduler in 6.6.36 seems to be totally broken. Starting a machine from scratch last night I've received probably 300 6.03s. Meanwhile my GPUs have been idle for 16 hours. S@H only on the machine in question.

Funny, the night before that I received about 300 608s and only two 603s. Then got some 603s, one took a looong time to run, it adjusted the Duration Correction Factor (DCF), that put the system into 'hurry up' mode (EDF), that created a string of "Waiting to run", that over-ran the GPU memory, that locked up the computer, that influenced me to detach it from the project. Now the computer is sitting in the penalty box. :)

The host in question will be allowed to play again, but only after it and BOINC decide to get along. Waiting patiently for BOINC 6.10.xx, which should have multiple DCFs, one for each class of task on that host.

Just a note of support: IMHO, SETI@home and BOINC are the most ambitious projects of their type in all the world. All research projects have periods of instability and adaptation. We are in the middle of one of those times. It is the nature of the beast. Bleeding edge is painful but exciting. Just an opinion from a SETI@home fan...

Bob
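A minimal sketch of the chain described above: one long-running task inflates the shared Duration Correction Factor, every queued estimate gets multiplied by it, and the client can flip into earliest-deadline-first mode. The numbers and the update rates below are assumptions for illustration, not the real client's constants.

```python
# Illustrative only: how one overrun can skew every estimate on the host.
# BOINC's real DCF update is asymmetric (rises fast, falls slowly); the exact
# rates and numbers here are made up.

def update_dcf(dcf, estimated_min, actual_min):
    ratio = actual_min / estimated_min
    if ratio > dcf:
        return ratio                      # jump up quickly after an overrun
    return dcf + 0.1 * (ratio - dcf)      # drift back down slowly

dcf = 1.0
cuda_raw_estimate = 8.0                   # minutes per CUDA task before correction

# A near-VLAR task estimated at 60 minutes actually takes 170:
dcf = update_dcf(dcf, estimated_min=60.0, actual_min=170.0)

print(round(dcf, 2))                      # ~2.83
print(round(cuda_raw_estimate * dcf, 1))  # ~22.7 min "To completion" per CUDA task
# Multiply that across a few hundred queued tasks and the client may decide
# deadlines are at risk, switch to EDF, and start preempting.
```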
7)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 906190)
Posted 11 Jun 2009 by Bob
Post:
1) AP tasks influencing the anticipated running times of CUDA tasks is a design flaw in BOINC which we're going to have to live with until at least BOINC v6.10

Thanks, Richard, that sums it up nicely. The Lunatics Unified Installer was so much fun to play with, I was hoping to avoid doing the fpops/flops process on another computer. Now that I've done the fpops/flops thing, "To completion" times have settled down, so the odds of EDF mode will decrease. I have also decreased the CUDA cache to 2 days.

Funny, isn't it, that the answer for another BOINC-exception-case on SETI@home once again is "keep your cache under 2 or 3 days"?

Bob
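For anyone curious about the "fpops/flops process" mentioned above: the usual idea, as a hedged sketch, is to derive a realistic flops rate for each app version from tasks that have already finished on the host, so the client's estimate (roughly rsc_fpops_est divided by flops) starts out near the true runtime instead of relying on benchmarks. The helper below is hypothetical and the numbers are made up; it is not part of any official tool.

```python
# Hypothetical helper (not part of any official tool): estimate a sensible
# flops rate for one app version from tasks that already finished on this
# host. With a realistic rate, the client's estimate (rsc_fpops_est / flops)
# starts out close to the true runtime, so EDF is less likely to trigger.

def suggested_flops(completed):
    """completed: list of (rsc_fpops_est, elapsed_seconds) for finished tasks."""
    rates = [fpops / secs for fpops, secs in completed if secs > 0]
    return sum(rates) / len(rates)

# Made-up history for three CUDA tasks:
history = [(27.9e12, 480.0), (28.1e12, 510.0), (28.0e12, 495.0)]
flops = suggested_flops(history)
print(f"suggested flops: {flops:.3e}")                          # roughly 5.7e10
print(f"new estimate for a 28e12-fpops task: {28e12 / flops / 60:.1f} min")
```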
8)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 905949)
Posted 10 Jun 2009 by Bob
Post:
Fred said: "... When one EDF CUDA WU finished another started and the partner EDF CUDA WU continued to completion. When a CPU MB WU completed and uploaded, there was no effect on the CUDA WU's in flight. And when I gave up and restored the original settings (including removing the config flags) and allowed the pre-empted CUDA WU's to complete, they did so without error (no -5!!)."

The only difference I can point out with my situation is as follows: Whenever the "Waiting to run" problem happened, I had a queue of waiting CPU tasks that was way beyond my "Additional work buffer" setting. On one computer I had 5 days of CPU tasks waiting, with "Additional work buffer" set to 1.6 days. The other computer had 10+ days of CPU tasks waiting, with "Additional work buffer" set to 4.5 days.

Perhaps BOINC (or whatever) only gets confused when the CPU task queue holds extra days of work compared to the GPU queue?

Bob
9)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 905800)
Posted 10 Jun 2009 by Bob
Post:
Fred, is there an imbalance in cache size between your 608 vs. 603+AP queue on your system? I mean, do you have an intentional difference in cache size, achieved by filling up one, then going back to a shorter queue for running? On mine there is a full 10 days of AP for the CPU, and half that duration of MB for CUDA. I'm wondering if BOINC is confused by the cache size contrast?

That is it! That is exactly what I just observed. I turned the host back on for 5 minutes and observed the following:
1. An AP task completed.
2. This influenced the CUDA tasks to nearly triple their "To completion" estimates, from 8 min to 22 min.
3. EDF mode arrived, and CUDA task hijacking began.

Yesterday, after restoring 10+ days of AP, and running ONLY AP (no CUDA), my system put the 8 running AP tasks into "Running high priority", with no ill effect. The problem appears only with EDF on CUDA, possibly only while running CUDA in conjunction with AP or 603 on the CPU.

Troubling is the fact that completing one AP task on the CPU can so heavily influence the "time to completion" estimate for CUDA tasks. If that is how the current BOINC works, that does not seem appropriate.

Bob
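The jump reported above is consistent with a single project-wide correction factor: if the AP task ran roughly 2.75 times longer than the client expected, the same multiplier lands on every CUDA estimate. A tiny back-of-the-envelope check, using assumed numbers only:

```python
# Assumed numbers: one shared correction factor rescales CPU and GPU estimates alike.
before = 8.0    # CUDA "To completion" in minutes, before the AP task finished
after = 22.0    # CUDA "To completion" in minutes, just after

implied_multiplier = after / before
print(round(implied_multiplier, 2))   # 2.75 - the AP task presumably overran its
                                      # estimate by about that factor
```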
10)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 905783)
Posted 10 Jun 2009 by Bob
Post:
If it's anything like mine (running v6.6.28, .31, or .33), then once the GPU is in EDF mode (and it doesn't show "high priority" in BM) then for every 2 tasks that start, one will be left in "waiting to run" when the next 2 start, until it comes out of EDF mode. I have recently noticed that with v6.6.33, when the GPU is in EDF mode, the completion of a CPU task will put *both* GPU tasks into "waiting to run" and start a new CPU task and 2 new CUDA tasks. The only way to avoid all this seems to be to stay out of EDF mode on the CUDA, which I do manage to do most of the time with my 3-day cache.

I think that is exactly what happened. It looked like the first "waiting to run" was proper: later deadline, single task suspended. As you say, the next time (minutes later) a GPU task (typically a shorty) started, it caused two running GPU tasks to revert to "waiting to run". After that, suspension happened in multiples of two. Keep in mind we are running two GPUs. Let us not forget that Questor is experiencing this with a single GPU, so he must be getting singles of "waiting to run" as it happens.

It does NOT happen when I run ONLY CPU or ONLY GPU. I can beat the heck out of such a setup and it never fails. As soon as I add a task to the other side of the computer, it eventually has this issue.

Fred, is there an imbalance in cache size between your 608 vs. 603+AP queue on your system? I mean, do you have an intentional difference in cache size, achieved by filling up one, then going back to a shorter queue for running? On mine there is a full 10 days of AP for the CPU, and half that duration of MB for CUDA. I'm wondering if BOINC is confused by the cache size contrast?

Note: Don't anyone get too upset about the big AP cache here - a 6.6.31 "waiting to run" crash took most of my hard drive with it last week. I finally got it running long enough to retrieve the 'lost' AP units yesterday.

Bob
11)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 905763)
Posted 10 Jun 2009 by Bob
Post:
Waiting to Run issue: Hoo Hah! It just created another seven! This thing is a "Waiting to run" factory!

Bob

(Edit: changed one to two. Oh yeah.)
(Edit again: changed two to seven. Wow.)
(Last edit: I'm shutting this computer off. Ready for debugging.)
12)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 905761)
Posted 10 Jun 2009 by Bob
Post:
Waiting to Run issue: I have a system here that created 12 "waiting to run" tasks last night (and crashed as a result). I rebooted it, aborted the 12 hung tasks, and it just created 3 more in the past few minutes. If it is not too tough to do, I'll volunteer it for debugging.

(Thanks to everyone who responded so far. No obvious pattern has emerged re. this problem.)

Bob
13)
Message boards :
Number crunching :
Warning once again when upgrading Boinc to newer versions
(Message 905751)
Posted 10 Jun 2009 by Bob
Post:
I've had all of those problems with both 6.6.31 and 6.6.33. 12 tasks were "waiting to run", and this system crashed out last night.

The odd part is that Vyper is NOT running any tasks on the CPU (is that correct?), and I thought this problem only appeared in the CPU+GPU situation. There is no good pattern to the problem yet.

Bob
14)
Message boards :
Number crunching :
Panic Mode On (16) Server problems
(Message 905537)
Posted 9 Jun 2009 by Bob
Post:
Here I wait
Spinning Atom CPU
In an idle state
Not a WU to do!

Entropy encroaching
Little processor that could
Time to do some coaxing
Kick the servers! Kick them good!

Hee hee.
15)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 904018)
Posted 5 Jun 2009 by Bob
Post:
Re. the "waiting to run" issue, where tasks say "waiting to run" and never get restarted... It happens to Fred W when he is running CUDA tasks plus either AP or MB on his CPU. Same for me.

Is everyone who is experiencing this problem running tasks on their CPU as well as on CUDA? If so, this might be a very good clue about the bug.

Bob
16)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 903682)
Posted 4 Jun 2009 by Bob
Post:
Bob, you may be thinking of the way they have narrowed the range on the VLARs. I've got a few that would have been -6'd before but are now deemed to be acceptable to run. Raistmer was a bit on the cautious side when he first figured the range.

Good point. I'll watch for a near-VLAR to see if the duration correction factor goes out of whack and forces high priority. Also, I'll try starting with a cache of 0.1 days to decrease the odds of a long-running WU skewing durations too far.

Bob
17)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 903680)
Posted 4 Jun 2009 by Bob
Post:
From Richard: I miss the old "Accessible view" option in BOINC - where tasks re-sorted to 'natural' order.

Points:
1. I run only SETI@home, no other projects at this time.
2. Apparent EDF mode (preempting) still has NONE of my tasks saying "High Priority".
3. Preempted tasks usually have a deadline EARLIER THAN the tasks that replaced them.
4. "Waiting to run" tasks never get restarted.
5. I don't think my duration correction factor ever got skewed enough (from near-VLAR runtime influence) to force EDF. But I will double check this.
6. I DID have AP running on the CPU on both computers. This might be a factor.

Before I volunteer the big system for sacrificial testing, I'll try running it as CUDA-only, with no AP. This will eliminate some obvious questions in order to purify the test environment. As soon as it sees the first "waiting to run", you can tell me how to torture the computer in any way you like. Then we will at least know it is not related to AP on the same system. I'm ready to retire it and take it to storage, but if it can help out with the problem, let's do it.

Bob
18)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 903640)
Posted 4 Jun 2009 by Bob
Post:
Perhaps it is a bad GPU. I assumed it was overloaded with resident "wait to run" tasks; maybe it is actually bad memory. I will remove that card and try some test runs without it.

Bob
19)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 903630)
Posted 4 Jun 2009 by Bob
Post:
From Fred W: My system with two single-core CUDA cards (two GTX285) was creating "wait to run" tasks at a rate comparable, card per card, with my 6xGTX295 system.

Fred's observation is alarming. Since the preempting is also happening on my single-core GPU system, it goes back to my question: Why is the preempting happening in the first place? In other words, with a short cache (1.6 days), and plenty of time for all tasks to complete before deadline, why does a WU go EDF in the first place? Is the WU born and flagged that way before I get it? I thought EDF was a calculated state based on the immediate context within the local host?

Bob
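For what it's worth, EDF is indeed a state the client computes locally; workunits are not flagged for it by the server. Roughly speaking (this is a simplified sketch with assumed names and margin, not the client's actual code, which runs a fuller round-robin simulation per resource), the client walks the queue in deadline order using its current runtime estimates and switches to earliest-deadline-first if anything looks like it would finish late. Inflated estimates are enough to trip it even when the real runtimes would leave days to spare.

```python
# Simplified sketch of a deadline-risk check (assumed names and margin; the
# real BOINC client does a more detailed round-robin simulation per resource).

def deadline_at_risk(tasks, now, margin=0.9):
    """tasks: (estimated_remaining_sec, deadline_sec) pairs for one resource.
    Walk them in deadline order and see whether any would finish too late."""
    t = now
    for remaining, deadline in sorted(tasks, key=lambda task: task[1]):
        t += remaining
        if t > now + margin * (deadline - now):
            return True          # -> switch to earliest-deadline-first
    return False

# Made-up queue: 300 CUDA tasks with 5-day deadlines. At 8 minutes each they
# fit easily; after a DCF jump to ~22-minute estimates, the check trips.
day = 86400.0
print(deadline_at_risk([(8 * 60, 5 * day)] * 300, now=0.0))    # False
print(deadline_at_risk([(22 * 60, 5 * day)] * 300, now=0.0))   # True
```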
20)
Message boards :
Number crunching :
BOINC v6.6.31 available
(Message 903612)
Posted 4 Jun 2009 by Bob
Post:
The VLARs don't seem to be a problem. The autokill has been working OK, I think. ... Yes, that was after the upgrade. Still, it does not make sense that tasks with later deadlines are doing the preempting, and the QX9770 system was only running a cache of 1.6 days.

I have not kept up on the pre-empt situation as discussed in the forums here. It has been surprising how much it happens on my systems since upgrades above BOINC 6.6.21. It finally got bad enough that both systems are now detached and turned off... out of necessity - they just don't work anymore with today's configurations of software and WUs.

Last night I reset the system before going to bed. Unfortunately, the big system (6xGTX295) got so many preempts during my 6 hours of sleep, well, it died.

Thanks for the input, John. I'm guessing the nature of the 6xGTX295 exaggerates such problems, and the end result (failure) arrives sooner. That's why I tried building the system for SETI with only two GPUs. The problems are more subtle, yet it was also creating 3 or 4 "waiting to run" tasks per day for the past couple of days.

Question: What is the decision process that makes BOINC preempt in this situation? Is this a bug, or is there some logic to it?

Bob