Longer MB tasks are here

Message boards : Number crunching : Longer MB tasks are here

Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920679 - Posted: 23 Jul 2009, 14:49:59 UTC - in response to Message 920607.  

No, actually it is a step forward. It reduces the load without making everyone give up optimized apps, and it does more science while remaining backwards compatible.

I hope you're correct on that. How about those that use the VLAR killer for their GPUs? If all these are classed as VLAR, they'll be continuously killing them and downloading more work; no letup in the (down)load then.


Jord et al

Lunatics has produced a Non VlarKill version that I have tested and used in Seti Beta.
It can be found here: Windows Seti@Home apps
I am also aware that there is an issue with the basic stock app that causes some unpredictable things to happen with a VLAR. As it was coded (ported) mostly by Nvidia, it is tough to pin down exactly what is happening. The Lunatics crew is working on defining what needs to be done, and the intent is to provide feedback when that is complete. I for one will be working to ensure that Eric logs into Lunatics when it is announced.



Please consider a Donation to the Seti Project.

ID: 920679
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 920680 - Posted: 23 Jul 2009, 14:50:09 UTC

I have a mix on my old P4. I'm running Lunatics Unified ver. 0.2. The new estimated times for the longer ones are about 7.5 hours. My first one is 47% done and shows 3.5 hours running and 3.5 hours to go.
The old WUs took anywhere from 3 to 4.5 hours to run.
So I don't think that is too bad. I'm glad I'm running optimized apps though; otherwise the P4 would be running 14 hours on the new work.
My Mac hasn't got any new work units yet, so I don't know what will happen there.

Old James
ID: 920680
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920682 - Posted: 23 Jul 2009, 14:57:03 UTC - in response to Message 920660.  

Zen

What this sounds like is that your preferences are not set to write checkpoints. Because of that, when BOINC is shut down for whatever reason, the work that was incomplete is gone. This should be an easy fix: go to Computing preferences and ensure there is a value set for "Write to disk at most every 60 seconds"; then at most you would only lose 60 seconds. If it still continues after that, there would be hardware things to look at.



Currently, anyone running an optimized application will continue to work, and it "should" cause no ill effects (errors). If you see a large number of workunit errors, report them here in Number Crunching.

Regards


I don't know if anyone else has experienced a problem with the new work units or not, but I have. I got a short unit with the other longer MB files I downloaded this morning. It was about 25% completed when I shut BOINC down temporarily. When I restarted BOINC the task started again, but from 0 percent complete.

From my perspective this is a major flaw with the new work units. I stop and restart BOINC on all of my computers from time to time, not to mention power interruptions and restarting the computer itself. If I'm going to lose work in progress each time, it becomes counter productive. Last night during a storm I lost power to my computers five different times. If I had been running the longer work units and they zeroed out when stopped, I would have lost 20 or more hours of actual computing time.

In the past when stopping BOINC or my computer I have lost a few seconds of computing time on work units in progress. I don't mind running longer work units, I do mind losing work in progress.


Please consider a Donation to the Seti Project.

ID: 920682
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 920687 - Posted: 23 Jul 2009, 15:11:50 UTC - in response to Message 920682.  
Last modified: 23 Jul 2009, 15:14:09 UTC

...then at most you would only lose 60 seconds...
Actually Al, I think there was a recent change in BOINC that multiplies this by the number of cores (possibly + GPUs?), to space out the write frequency on many-core machines so that the overall write interval is obeyed, instead of applying it per app. So that would be 60 secs x n per running application, n being some constant figure derived from the detected number of CPUs (or possibly the total compute device count).
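A rough illustration of that arithmetic, as a Python sketch (the names, and the assumption that n is simply the detected device count, are illustrative rather than taken from the BOINC source):

# Illustrative only: worst-case checkpoint loss when the client spreads
# its disk writes across running tasks (assumption: n = CPUs + GPUs).
def worst_case_loss(write_interval_s=60, n_cpus=1, n_gpus=0):
    n = n_cpus + n_gpus              # total compute devices (assumption)
    return write_interval_s * n      # seconds a single task could lose

# e.g. a quad core plus one GPU: up to 300 seconds lost per task, not 60
print(worst_case_loss(60, 4, 1))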
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 920687
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920707 - Posted: 23 Jul 2009, 16:39:49 UTC - in response to Message 920687.  

...then at most you would only lose 60 seconds...
Actually Al, I think there was a recent change in BOINC that multiplies this by the number of cores (possibly + GPUs?), to space out the write frequency on many-core machines so that the overall write interval is obeyed, instead of applying it per app. So that would be 60 secs x n per running application, n being some constant figure derived from the detected number of CPUs (or possibly the total compute device count).


Jason, now that you mention it, I do recall something passing through my email. While I do not recall the final result, it will vary with which version of BOINC the user has installed and how many CPUs there are.

Having no checkpoint places work at risk during power hits.

One of the tests in BOINC Alpha is to shut down BOINC while WUs are being processed and uploads/downloads are in progress, to ensure nothing is lost.


Please consider a Donation to the Seti Project.

ID: 920707
Profile [B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 920708 - Posted: 23 Jul 2009, 16:46:13 UTC

If we are to get more credit for these longer WUs (mine took 4.5 hrs), how can we check, once the pending credit has gone and the task is no longer showing?
ID: 920708
Wandering Willie
Volunteer tester

Joined: 19 Aug 99
Posts: 136
Credit: 2,127,073
RAC: 0
United Kingdom
Message 920713 - Posted: 23 Jul 2009, 16:55:51 UTC

Rough guide from my results on Beta: winged 6.08 - 6.08, 138+
Rough guide from my results on Beta: winged 6.08 - 6.03, 113+

Michael
ID: 920713
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920721 - Posted: 23 Jul 2009, 17:08:16 UTC - in response to Message 920708.  

Patience....

If we are to get more credit for these longer WUs (mine took 4.5 hrs), how can we check, once the pending credit has gone and the task is no longer showing?


That was the basis of my phone conversation with Eric and of the request to write what I wrote before making the thread "sticky."

I am aware of other "adjustments" happening while defining the "whole load" that the addition of users and computers has caused. So while the major hurdle of breaking the upload/download cycle "appears" to be solved, cleanup still has to happen.

Some of the horror story "may" show when it is time for Matt to do the Tech News.



Please consider a Donation to the Seti Project.

ID: 920721
Ingleside
Volunteer developer

Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 920732 - Posted: 23 Jul 2009, 17:41:17 UTC - in response to Message 920607.  
Last modified: 23 Jul 2009, 17:45:21 UTC

I hope you're correct on that. How about those that use the VLAR killer for their GPUs? If all these are classed as VLAR, they'll be continuously killing them and downloading more work; no letup in the (down)load then.

Well, if this becomes a big problem, something along the lines of

if (task-error = VLAR-kill) => set host_daily_quota = -1

added to the scheduling server should work nicely to stop the runaway hosts...
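A minimal sketch of that idea in Python (purely illustrative: the names and the quota handling below are assumptions, not the actual BOINC scheduler code):

# Illustrative pseudologic only -- not the real scheduler.
VLAR_KILL = "VLAR-kill"              # hypothetical error tag from the client

class Host:
    def __init__(self, max_results_day=100):
        self.max_results_day = max_results_day

def handle_reported_task(host, task_error):
    # A host that reports a VLAR-kill abort gets its daily quota cut,
    # so it cannot keep draining the feeder with killed tasks.
    if task_error == VLAR_KILL:
        host.max_results_day = -1    # the "-1 quota" suggested above
    return host.max_results_day

host = Host()
print(handle_reported_task(host, "VLAR-kill"))   # -> -1: no more work today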
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 920732
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20304
Credit: 7,508,002
RAC: 20
United Kingdom
Message 920734 - Posted: 23 Jul 2009, 18:12:06 UTC - in response to Message 920721.  

... "adjustments" happening while [re]defining the "whole load" the addtion of Users and Computers have caused. ...

Does this mean that Berkeley have abandoned (or delayed) the hope of s@h crunching through the Arecibo WUs in near "real time"?

Looking forward to Matt's summary of what got tweaked and in what way!

Happy crunchin',

Regards,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 920734
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20304
Credit: 7,508,002
RAC: 20
United Kingdom
Message 920735 - Posted: 23 Jul 2009, 18:15:47 UTC - in response to Message 920732.  

if (task-error = VLAR-kill) => set host_daily_quota = -1

added to scheduling-server should work nicely to stop the run-away hosts...

That's already taken care of by the way aborted WUs decrease the number of new WUs that can be downloaded.

However, there may be good cause to rejig the arithmetic restricting the number of new WUs allowed per good returned result.

Best of all would be for the CUDA VLAR problem to be fixed, or at least for the Boinc servers to not send out VLARs for processing by CUDA in the first place.

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 920735
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 920736 - Posted: 23 Jul 2009, 18:20:18 UTC - in response to Message 920735.  

Best of all would be for the CUDA VLAR problem to be fixed, or at least for the Boinc servers to not send out VLARs for processing by CUDA in the first place.

This may be whistling in the wind, but we have yet to see what effect CUDA v2.3 has on VLAR crunch times...

(Note: To me the "glass is always half-full" :)

F.
ID: 920736
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920744 - Posted: 23 Jul 2009, 18:43:05 UTC - in response to Message 920734.  

... "adjustments" happening while [re]defining the "whole load" the addtion of Users and Computers have caused. ...

Does this mean that Berkeley have abandoned (or delayed) the hope of s@h crunching through the Arecibo WUs in near "real time"?

Looking forward to Matt's summary of what got tweaked and in what way!

Happy crunchin',

Regards,
Martin


I did not ask about progress on getting Arecibo back to active (fresh work). As to what the live data rate would be: the most I "recall" seeing was 5 disk images loaded from the same day, so in that case it would be pulling 250 gigabytes of data.

Eric did state part of the problem here http://setiathome.berkeley.edu/forum_thread.php?id=54707&nowrap=true#920335



Please consider a Donation to the Seti Project.

ID: 920744
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 920753 - Posted: 23 Jul 2009, 19:34:37 UTC

I'm still thinking the quota should be -1 for a bad task and +2 for a good task, instead of *2. That way a host has to prove itself reliable over a longer period of time to get its quota back up to 100/CPU/day, instead of killing 50 tasks, returning one good one, and being back at 100.
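A quick Python sketch of the difference (the 100/CPU/day cap, the floor of 1, and the numbers are illustrative assumptions, not server code):

# Compare the current "*2 per good result" recovery with a "+2" recovery.
def recover(quota, good_results, rule):
    for _ in range(good_results):
        quota = min(100, quota * 2 if rule == "double" else quota + 2)
    return quota

start = max(1, 100 - 50)              # quota after 50 killed tasks, floored at 1
print(recover(start, 1, "double"))    # *2 rule: one good task and it is back to 100
print(recover(start, 25, "add2"))     # +2 rule: 25 good tasks to climb back to 100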
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 920753
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 920770 - Posted: 23 Jul 2009, 20:36:21 UTC

What I am seeing in Seti Beta

Run times are hard to compare as both machines are AMD, one an Opteron 175 and the other an X2-6000.

Stock
AR ------ Claimed Credit
0.357023 - 121.54
0.380619 - 113.98
0.408166 - 81.70
0.415195 - 80.31
0.467143 - 75.08
0.693202 - 59.24
1.233490 - 29.80
1.511305 - 19.11
1.472896 - 19.32
8.152810 - 24.97

Please consider a Donation to the Seti Project.

ID: 920770
SmartWombat
Joined: 9 Jan 04
Posts: 64
Credit: 6,577,011
RAC: 0
United Kingdom
Message 920778 - Posted: 23 Jul 2009, 20:55:42 UTC - in response to Message 920679.  

Lunatics has produced a Non VlarKill version that I have tested and used in Seti Beta.

I don't want to push away the VLAR units to other people, but I do want the faster calculation of the optimised apps.

I look forward to an installer that can let me have the optimised apps but without the VLAR killer.

I am quite happy to process the VLAR WUs, but I want to do it faster!
Paul

ID: 920778
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 920812 - Posted: 23 Jul 2009, 22:28:00 UTC - in response to Message 920778.  

Lunatics has produced a Non VlarKill version that I have tested and used in Seti Beta.

I don't want to push away the VLAR units to other people, but I do want the faster calculation of the optimised apps.

I look forward to an installer that can let me have the optimised apps but without the VLAR killer.

I am quite happy to process the VLAR WUs, but I want to do it faster!

It will slow down your host. If you want to process VLARs on the GPU, download the corresponding V12 app (it should be available at Lunatics), without the VLAR-kill mod, and update your app_info.xml for it.


And regarding VLAR-kill and AR: yes, the increase in sensitivity should not change a task's AR value, so the same app can be used; no upgrade is required.
But I highly recommend updating to the CUDA 2.3 runtime DLLs.
nVidia greatly increased CUFFT library performance, and FFT takes a pretty big share of the processing time in MB.
ID: 920812
Marius
Volunteer tester

Joined: 11 Mar 00
Posts: 12
Credit: 16,655,085
RAC: 0
Netherlands
Message 920816 - Posted: 23 Jul 2009, 22:36:04 UTC - in response to Message 920732.  
Last modified: 23 Jul 2009, 22:38:19 UTC

I hope you're correct on that. How about those that use the VLAR killer for their GPUs? If all these are classed as VLAR, they'll be continuously killing them and downloading more work; no letup in the (down)load then.

Well, if this becomes a big problem, something along the lines of

if (task-error = VLAR-kill) => set host_daily_quota = -1

added to the scheduling server should work nicely to stop the runaway hosts...


That's one way to look at it ;) IMO the server just needs to give the VLARs to CPU hosts instead of GPUs, but the server needs to be programmed for that, which is more work, and I can imagine that doesn't have much priority at this moment.

In the meantime, just use an additional tool to move VLARs to the CPU and avoid the VLAR-kill.
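Conceptually such a tool just applies an angle-range cutoff like this (a Python sketch; the 0.12 threshold and the names are assumptions, not any particular tool's code):

# Conceptual sketch only; real rebranding tools edit the BOINC client state.
# This just shows the decision they make for each multibeam task.
VLAR_THRESHOLD = 0.12                 # assumed cutoff below which CUDA struggles

def assign_device(angle_range):
    return "cpu" if angle_range < VLAR_THRESHOLD else "gpu"

for ar in (0.05, 0.41, 1.50):
    print(ar, "->", assign_device(ar))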
ID: 920816
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 920826 - Posted: 23 Jul 2009, 22:55:37 UTC - in response to Message 920816.  

Indeed, VLAR-kill is only a "safety fuse" now.
Everyone who really cares about his host's productivity should use a CPU<->GPU rebranding tool.
ID: 920826
Zen

Joined: 25 May 99
Posts: 9
Credit: 3,659,629
RAC: 0
United States
Message 920842 - Posted: 23 Jul 2009, 23:28:46 UTC - in response to Message 920682.  

Zen

What this sounds like is that your preferences are not set to write checkpoints. Because of that, when BOINC is shut down for whatever reason, the work that was incomplete is gone. This should be an easy fix: go to Computing preferences and ensure there is a value set for "Write to disk at most every 60 seconds"; then at most you would only lose 60 seconds. If it still continues after that, there would be hardware things to look at.

My preferences have been set at "Write to disk at most every 60 seconds". I think it may be the default when BOINC is installed. I haven't had a problem with losing work previously. The computer I ran that particular work unit on is one (the first one) I built about 9 months ago. I'm not a techno-genius, but I built it under the supervision of one. It's a moderate AMD dual core that is slightly over-clocked. I don't have a CUDA graphics card, the OS is quite stable and all other hardware is functioning within normal parameters. I virus scan daily with updated dat files. I defrag and scan my hard drive for errors on the first of every month. That being said, if you guys think I have a hardware incompatibility issue, I'd be grateful if you could offer some suggestions on where to start looking. (Please keep in mind I may have to have my techno genius translate, because quite frankly I don't always understand the finer points on these boards.)

Oh, I have been running one of the optimized applications from the Lunatics site on this machine since I brought it online for Seti. I will confess that I'm not exactly sure what the new "Unified Installer" from Lunatics is supposed to help with. Perhaps one of you would be kind enough to explain it to me? Also, I still run BOINC 6.4.7 because the newer versions lock up my computers. I think the newest versions of BOINC were written assuming a faster connection speed than I am capable of on dial-up. (Even though faster speeds sound nice, living on the farm is very, very nice.)

This morning is the first time I've lost all progress on a work unit when shutting down BOINC. The work unit that was running on the other core at the time lost a few seconds, while the one in question had been running at least 30 minutes. This leads me to believe the problem is some kind of compatibility issue with the new work units, or perhaps something was changed when they were created. Whatever it is, I would appreciate any and all help in finding a fix.

Regards and Gratitude
Zen


ID: 920842