Longer MB tasks are here

Message boards : Number crunching : Longer MB tasks are here

Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920679 - Posted: 23 Jul 2009, 14:49:59 UTC - in response to Message 920607.  

No, actually it is a step forward. It reduces the load without making everyone give up optimized apps, and it does more science while remaining backwards compatible.

I hope you're correct on that. How about those that use the VLAR killer for their GPUs? If all these are classed as VLAR, they'll be continuously killing them and downloading more work; no letup in the (down)load then.


Jord et al

Lunatics has produced a Non VlarKill version that I have tested and used in Seti Beta.
It can be found here: Windows Seti@Home apps
I am also aware that there is an issue with the basic stock app that causes some unpredictable things to happen with a VLAR. As it was coded (ported) mostly by Nvidia, it is tough to pin down exactly what is happening. The Lunatics crew is working on defining what needs to be done, and the intent is to provide feedback when that is complete. I for one will be working to ensure that Eric logs into Lunatics when it is announced.



Please consider a Donation to the Seti Project.

ID: 920679
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 920680 - Posted: 23 Jul 2009, 14:50:09 UTC

I have a mix on my old P4. I'm running Lunatics Unified ver. 0.2. The new estimated times for the longer ones are about 7.5 hours. My first one is 47% done and shows 3.5 hours running and 3.5 hours to go.
The old WUs took anywhere from 3 to 4.5 hours to run.
So I don't think that is too bad. I'm glad I'm running optimized apps though; otherwise the P4 would be running 14 hours on the new work.
My Mac hasn't got any new work units yet, so I don't know what will happen there.

Old James
ID: 920680
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920682 - Posted: 23 Jul 2009, 14:57:03 UTC - in response to Message 920660.  

Zen

What this sounds like is that your preferences are not set to write checkpoints. Because of that, when BOINC is shut down for whatever reason, the work that was incomplete is gone. This should be an easy fix: go to Computing preferences and ensure there is a value set for "Write to disk at most every 60 seconds"; then at most you would only lose 60 seconds. If it still continues after that, there would be hardware things to look at.



Currently, anyone running an optimized application will continue to work, and it "should" cause no ill effects (errors). If you see a large number of workunit errors, report them here in Number Crunching.

Regards


I don't know if anyone else has experienced a problem with the new work units or not, but I have. I got a short unit with the other longer MB files I downloaded this morning. It was about 25% completed when I shut BOINC down temporarily. When I restarted BOINC the task started again, but from 0 percent complete.

From my perspective this is a major flaw with the new work units. I stop and restart BOINC on all of my computers from time to time, not to mention power interruptions and restarting the computer itself. If I'm going to lose work in progress each time, it becomes counter productive. Last night during a storm I lost power to my computers five different times. If I had been running the longer work units and they zeroed out when stopped, I would have lost 20 or more hours of actual computing time.

In the past when stopping BOINC or my computer I have lost a few seconds of computing time on work units in progress. I don't mind running longer work units, I do mind losing work in progress.


Please consider a Donation to the Seti Project.

ID: 920682
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 920687 - Posted: 23 Jul 2009, 15:11:50 UTC - in response to Message 920682.  
Last modified: 23 Jul 2009, 15:14:09 UTC

...then at most you would only lose 60 seconds...
Actually Al, I think there was a recent change in BOINC that multiplies this by the number of cores (possibly + GPUs?), to space out the write frequency on many-core machines so that the overall write interval is obeyed, instead of applying it per app. So that would be 60 secs x n per running application, n being some constant figure derived from the detected number of CPUs (or possibly the total compute device count).
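A rough illustration of that arithmetic, as a Python sketch (the names, and the assumption that n is simply the detected device count, are illustrative rather than taken from the BOINC source):

# Illustrative only: worst-case checkpoint loss when the client spreads
# its disk writes across running tasks (assumption: n = CPUs + GPUs).
def worst_case_loss(write_interval_s=60, n_cpus=1, n_gpus=0):
    n = n_cpus + n_gpus              # total compute devices (assumption)
    return write_interval_s * n      # seconds a single task could lose

# e.g. a quad core plus one GPU: up to 300 seconds lost per task, not 60
print(worst_case_loss(60, 4, 1))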
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 920687
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920707 - Posted: 23 Jul 2009, 16:39:49 UTC - in response to Message 920687.  

...then at most you would only lose 60 seconds...
Actually Al, I think there was a recent change in BOINC that multiplies this by the number of cores (possibly + GPUs?), to space out the write frequency on many-core machines so that the overall write interval is obeyed, instead of applying it per app. So that would be 60 secs x n per running application, n being some constant figure derived from the detected number of CPUs (or possibly the total compute device count).


Jason, now that you mention it, I do recall something passing through my email. While I do not recall the final result, it will vary with which version of BOINC the user has installed and how many CPUs there are.

Having no checkpoint places work at risk during power hits.

One of the tests in BOINC Alpha is to shut down BOINC while WUs are being processed and uploads/downloads are in progress, to ensure nothing is lost.


Please consider a Donation to the Seti Project.

ID: 920707
Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 920708 - Posted: 23 Jul 2009, 16:46:13 UTC

If we are to get more credit for these longer WUs (mine took 4.5 hrs), how can we check, once the pending credit has gone and the task is no longer showing?
ID: 920708
Wandering Willie
Volunteer tester

Joined: 19 Aug 99
Posts: 136
Credit: 2,127,073
RAC: 0
United Kingdom
Message 920713 - Posted: 23 Jul 2009, 16:55:51 UTC

Rough guide from my results on Beta: winged 6.08 - 6.08, 138+
Rough guide from my results on Beta: winged 6.08 - 6.03, 113+

Michael
ID: 920713
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920721 - Posted: 23 Jul 2009, 17:08:16 UTC - in response to Message 920708.  

Patience....

If we are to get more credit for these longer WUs (mine took 4.5 hrs), how can we check, once the pending credit has gone and the task is no longer showing?


That was the basis of my phone conversation with Eric and of the request to write what I wrote before making the thread "sticky."

I am aware of other "adjustments" happening while defining the "whole load" that the addition of users and computers has caused. So while the major hurdle of breaking the upload/download cycle "appears" to be solved, cleanup still has to happen.

Some of the horror story "may" show when it is time for Matt to do the Tech News.



Please consider a Donation to the Seti Project.

ID: 920721
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 920732 - Posted: 23 Jul 2009, 17:41:17 UTC - in response to Message 920607.  
Last modified: 23 Jul 2009, 17:45:21 UTC

I hope you're correct on that. How about those that use the VLAR killer for their GPUs? If all these are classed as VLAR, they'll be continuously killing them and downloading more work; no letup in the (down)load then.

Well, if this becomes a big problem, something along the lines of

if (task-error = VLAR-kill) => set host_daily_quota = -1

added to the scheduling server should work nicely to stop the runaway hosts...
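A minimal sketch of that idea in Python (purely illustrative: the names and the quota handling below are assumptions, not the actual BOINC scheduler code):

# Illustrative pseudologic only -- not the real scheduler.
VLAR_KILL = "VLAR-kill"              # hypothetical error tag from the client

class Host:
    def __init__(self, max_results_day=100):
        self.max_results_day = max_results_day

def handle_reported_task(host, task_error):
    # A host that reports a VLAR-kill abort gets its daily quota cut,
    # so it cannot keep draining the feeder with killed tasks.
    if task_error == VLAR_KILL:
        host.max_results_day = -1    # the "-1 quota" suggested above
    return host.max_results_day

host = Host()
print(handle_reported_task(host, "VLAR-kill"))   # -> -1: no more work today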
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 920732
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20304
Credit: 7,508,002
RAC: 20
United Kingdom
Message 920734 - Posted: 23 Jul 2009, 18:12:06 UTC - in response to Message 920721.  

... "adjustments" happening while [re]defining the "whole load" the addtion of Users and Computers have caused. ...

Does this mean that Berkeley have abandoned (or delayed) the hope of s@h crunching through the Arecibo WUs in near "real time"?

Looking forward to Matt's summary of what got tweaked and in what way!

Happy crunchin',

Regards,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 920734
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20304
Credit: 7,508,002
RAC: 20
United Kingdom
Message 920735 - Posted: 23 Jul 2009, 18:15:47 UTC - in response to Message 920732.  

if (task-error = VLAR-kill) => set host_daily_quota = -1

added to scheduling-server should work nicely to stop the run-away hosts...

That's already taken care of by the way aborted WUs decrease the number of new WUs that can be downloaded.

However, there may be good cause to rejig the arithmetic restricting the number of new WUs allowed per good returned result.

Best of all would be for the CUDA VLAR problem to be fixed, or at least for the Boinc servers to not send out VLARs for processing by CUDA in the first place.

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 920735
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 920736 - Posted: 23 Jul 2009, 18:20:18 UTC - in response to Message 920735.  

Best of all would be for the CUDA VLAR problem to be fixed, or at least for the Boinc servers to not send out VLARs for processing by CUDA in the first place.

This may be whistling in the wind, but we have yet to see what effect CUDA v2.3 has on VLAR crunch times...

(Note: To me the "glass is always half-full" :)

F.
ID: 920736
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920744 - Posted: 23 Jul 2009, 18:43:05 UTC - in response to Message 920734.  

... "adjustments" happening while [re]defining the "whole load" the addtion of Users and Computers have caused. ...

Does this mean that Berkeley have abandoned (or delayed) the hope of s@h crunching through the Arecibo WUs in near "real time"?

Looking forward to Matt's summary of what got tweaked and in what way!

Happy crunchin',

Regards,
Martin


I did not ask about progress on getting Arecibo back to active (fresh work). As to what the live data rate would be: the most I "recall" seeing was 5 disk images loaded from the same day, so in that case it would be pulling 250 gigabytes of data.

Eric did state part of the problem here http://setiathome.berkeley.edu/forum_thread.php?id=54707&nowrap=true#920335



Please consider a Donation to the Seti Project.

ID: 920744
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 920753 - Posted: 23 Jul 2009, 19:34:37 UTC

I'm still thinking the quota should be -1 for a bad task and +2 for a good task, instead of *2. That way a host has to prove itself reliable over a longer period of time to get its quota back up to 100/CPU/day, instead of killing 50 tasks, returning one good one, and being back at 100.
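A quick Python sketch of the difference (the 100/CPU/day cap, the floor of 1, and the numbers are illustrative assumptions, not server code):

# Compare the current "*2 per good result" recovery with a "+2" recovery.
def recover(quota, good_results, rule):
    for _ in range(good_results):
        quota = min(100, quota * 2 if rule == "double" else quota + 2)
    return quota

start = max(1, 100 - 50)              # quota after 50 killed tasks, floored at 1
print(recover(start, 1, "double"))    # *2 rule: one good task and it is back to 100
print(recover(start, 25, "add2"))     # +2 rule: 25 good tasks to climb back to 100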
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 920753
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920770 - Posted: 23 Jul 2009, 20:36:21 UTC

What I am seeing in Seti Beta

Run times are hard to compare as both machines are AMD, one an Opteron 175 and the other an X2-6000.

Stock
AR ------ Claimed Credit
0.357023 - 121.54
0.380619 - 113.98
0.408166 - 81.70
0.415195 - 80.31
0.467143 - 75.08
0.693202 - 59.24
1.233490 - 29.80
1.511305 - 19.11
1.472896 - 19.32
8.152810 - 24.97

Please consider a Donation to the Seti Project.

ID: 920770
SmartWombat
Joined: 9 Jan 04
Posts: 64
Credit: 6,577,011
RAC: 0
United Kingdom
Message 920778 - Posted: 23 Jul 2009, 20:55:42 UTC - in response to Message 920679.  

Lunatics has produced a Non VlarKill version that I have tested and used in Seti Beta.

I don't want to push away the VLAR units to other people, but I do want the faster calculation of the optimised apps.

I look forward to an installer that can let me have the optimised apps but without the VLAR killer.

I am quite happy to process the VLAR WUs, but I want to do it faster!
Paul

ID: 920778
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 920812 - Posted: 23 Jul 2009, 22:28:00 UTC - in response to Message 920778.  

Lunatics has produced a Non VlarKill version that I have tested and used in Seti Beta.

I don't want to push away the VLAR units to other people, but I do want the faster calculation of the optimised apps.

I look forward to an installer that can let me have the optimised apps but without the VLAR killer.

I am quite happy to process the VLAR WUs, but I want to do it faster!

It will slow down your host. If you want to process VLARs on the GPU, download the corresponding V12 app (it should be available at Lunatics), without the VLAR-kill mod, and update your app_info.xml for it.


And regarding VLAR-kill and AR: yes, the increase in sensitivity should not change a task's AR value, so the same app can be used; no upgrade is required.
But I highly recommend updating to the CUDA 2.3 runtime DLLs.
nVidia greatly increased CUFFT library performance, and FFT takes a pretty big share of the processing time in MB.
ID: 920812
Marius
Volunteer tester

Joined: 11 Mar 00
Posts: 12
Credit: 16,655,085
RAC: 0
Netherlands
Message 920816 - Posted: 23 Jul 2009, 22:36:04 UTC - in response to Message 920732.  
Last modified: 23 Jul 2009, 22:38:19 UTC

I hope you're correct on that. How about those that use the VLAR killer for their GPUs? If all these are classed as VLAR, they'll be continuously killing them and downloading more work; no letup in the (down)load then.

Well, if this becomes a big problem, something along the lines of

if (task-error = VLAR-kill) => set host_daily_quota = -1

added to the scheduling server should work nicely to stop the runaway hosts...


That's one way to look at it ;) IMO the server just needs to give the VLARs to CPU hosts instead of GPUs, but the server needs to be programmed for that, which is more work, and I can imagine that doesn't have much priority at this moment.

In the meantime, just use an additional tool to move VLARs to the CPU and avoid the VLAR-kill.
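Conceptually such a tool just applies an angle-range cutoff like this (a Python sketch; the 0.12 threshold and the names are assumptions, not any particular tool's code):

# Conceptual sketch only; real rebranding tools edit the BOINC client state.
# This just shows the decision they make for each multibeam task.
VLAR_THRESHOLD = 0.12                 # assumed cutoff below which CUDA struggles

def assign_device(angle_range):
    return "cpu" if angle_range < VLAR_THRESHOLD else "gpu"

for ar in (0.05, 0.41, 1.50):
    print(ar, "->", assign_device(ar))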
ID: 920816
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 920826 - Posted: 23 Jul 2009, 22:55:37 UTC - in response to Message 920816.  

Indeed, VLAR-kill is only a "safety fuse" now.
Everyone who really cares about his host's productivity should use a CPU<->GPU rebranding tool.
ID: 920826
Zen

Joined: 25 May 99
Posts: 9
Credit: 3,659,629
RAC: 0
United States
Message 920842 - Posted: 23 Jul 2009, 23:28:46 UTC - in response to Message 920682.  

Zen

What this sounds like is that your preferences are not set to write checkpoints. Because of that, when BOINC is shut down for whatever reason, the work that was incomplete is gone. This should be an easy fix: go to Computing preferences and ensure there is a value set for "Write to disk at most every 60 seconds"; then at most you would only lose 60 seconds. If it still continues after that, there would be hardware things to look at.

My preferences have been set at "Write to disk at most every 60 seconds". I think it may be the default when BOINC is installed. I haven't had a problem with losing work previously. The computer I ran that particular work unit on is one (the first one) I built about 9 months ago. I'm not a techno-genius, but I built it under the supervision of one. It's a moderate AMD dual core that is slightly over-clocked. I don't have a CUDA graphics card, the OS is quite stable and all other hardware is functioning within normal parameters. I virus scan daily with updated dat files. I defrag and scan my hard drive for errors on the first of every month. That being said, if you guys think I have a hardware incompatibility issue, I'd be grateful if you could offer some suggestions on where to start looking. (Please keep in mind I may have to have my techno genius translate, because quite frankly I don't always understand the finer points on these boards.)

Oh, I have been running one of the optimized applications from the Lunatics site on this machine since I brought it online for Seti. I will confess that I'm not exactly sure what the new "Unified Installer" from Lunatics is supposed to help with. Perhaps one of you would be kind enough to explain it to me? Also, I still run BOINC 6.4.7 because the newer versions lock up my computers. I think the newest versions of BOINC were written assuming a faster connection speed than I am capable of on dial-up. (Even though faster speeds sound nice, living on the farm is very, very nice.)

This morning is the first time I've lost all progress on a work unit when shutting down BOINC. The work unit that was running on the other core at the time lost a few seconds, while the one in question had been running at least 30 minutes. This leads me to believe the problem is some kind of compatibility issue with the new work units, or perhaps something was changed when they were created. Whatever it is, I would appreciate any and all help in finding a fix.

Regards and Gratitude
Zen


ID: 920842