Open Beta test: SoG for NVidia, Lunatics v0.45 - Beta6 (RC again)
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I suspect Richard would be quite happy if someone else were to release a Lunatics SoG-specific update installer.

I have created a few ghosts myself, and I am sure not all were due to my mistakes. But this process, while fiddly, does work to recover them. I received a detailed step-by-step approach to recovering lost tasks on SETI hosts from another very helpful SETIzen. It works, but I encountered several snags in trying to implement it - such as an immediate update foiling the attempt to recover ghosted WUs while uploading the completed task. So I have added a few extra steps.

1. Suspend network activity and set No New Tasks (NNT).
2. Wait until you have a single task that has finished and is ready to upload.
3. With the network suspended, execute a project update to start the timer until the next update; this will give you a 5-minute window.
4. After the update sequence finishes, resume network activity and allow the task to upload, then suspend the network again before an update can start.
5. Suspend processing.
6. Make a backup copy of the BOINC data directory.* (see NOTE, below)
7. Resume network activity, then "update" the project to report the waiting task (although, with NNT set, that will probably happen automatically).
8. Suspend network activity again, then exit BOINC completely.
9. Restore the BOINC directory backup.* (see NOTE, below)
10. Restart BOINC; processing should resume - if not, start it.
11. Increase your work buffer to accommodate your "lost" tasks. (BOINC won't send any tasks if it thinks your buffer is already "full".)
12. Check that the 5-minute window has passed since the scheduler request in Step 4 completed, set "Allow new tasks", then resume network activity.
13. If BOINC doesn't automatically initiate a scheduler request, hit "Update" to once again report the same waiting task. This "should" result in one or more "lost" tasks being re-sent.

[I find that regardless of how many ghosted tasks are assigned to you, the system will only resend about 20 at a time.]

NOTE: As was also suggested in that thread, it is probably sufficient to simply back up and restore client_state.xml (and perhaps client_state_prev.xml), but I like to be sure I'm keeping everything in sync, so I back up the entire BOINC data directory. It's just as simple, really.

Happy Crunching

Stephen :) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
With v0.45 installed, is there a way to download a newer app used on Main, then edit the app_info.xml from v0.45 accordingly?
Very, very carefully.

Stephen |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
It would make good sense if the workunit deadlines were cut if the aim is to keep the database small.

A shorter time limit does NOT preclude slower machines from taking part; it merely requires them to contact the servers more often. And while a 1-week limit may seem a bit short for some, the deadline can certainly be less than the 4 to 6 weeks currently used - maybe half would be a good move. A healthy and active cruncher, no matter how slow, rarely needs more than 2 weeks to process a WU, and the unhealthy ones should not hold a task in limbo for more than 3 weeks (and even that is too long).

My beliefs anyway ...

Stephen :) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
You have to do EVERYTHING right ... cross your fingers ... and hold your tongue just so to use the method you describe. In my case of 600 "ghosted" tasks, the process would take forever at 20 tasks retrieved per sequence. Also, when you are constantly at your server-defined limits because you process work so quickly, you would have to run your onboard tasks down to nil after setting NNT to even begin the sequence. I found out the hard way that what you would think is the simple task of restoring an old backup of client_state.xml has its own dangers. I lost 5 years of work history at Einstein because the backup restoration caused a change in cross-project ID, and a new computer Host ID was created at Einstein because BOINC detected an unexpected change in <rpc_seqno>. This also caused a "phantom" Host ID at Einstein and the loss of 13 million credits, because they are attached to the old Host ID. If you try to merge the old and new hosts, BOINC always keeps the newest Host ID. That fiasco is why I will never edit app_info manually anymore. I'll just have to wait for Richard to crank out a new Lunatics installer with r3584 in it, or whatever Raistmer is working on in his latest revision. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Consider an ARM6 phone crunching only when charging. http://setiathome.berkeley.edu/host_app_versions.php?hostid=7435236 Average turnaround time: 25.85 days. That is, ~4 weeks. No need to touch deadlines. If the "97%" complete "in just one or two days", then fine - those tasks will be validated and deleted from the BOINC database faster. They don't even need to be kept a week, just those 2 days. But the others still have a chance to participate. P.S. What needs to be touched instead is the quota system, which fails to catch hyper-fast broken GPU hosts. Such a host can produce thousands of broken results per day - an amount that slow crunchers would not accumulate in a whole year! SETI apps news We're not gonna fight them. We're gonna transcend them. |
Dr Grey Send message Joined: 27 May 99 Posts: 154 Credit: 104,147,344 RAC: 21 |
Consider an ARM6 phone crunching only when charging. I cannot see the benefit to the project of catering for such low-capability devices. Do they generate publicity or revenue? Even my Raspberry Pi, with a RAC of 87.46 (!), still has a turnaround of 2.74 days. That is about 1/500 of what my PC produces. The only reason I keep it connected is that it makes me happy that way - a bit like keeping a hamster with a wheel. Is it right to keep the database inflated just to enable folks to keep their pet devices warm? What is the real benefit here? I agree on your last point. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The only reason I keep it connected is because it makes me happy that way You answered your own question. Is it right to keep the database inflated just to enable folks to keep their pet devices warm? What is the real benefit here? "Inflated database" is quite a relative term. As long as it works OK, it's not "inflated". And as I already said, the influence of slow hosts on that perceived "inflation" is over-estimated. So yes, it is right to provide the ability to help SETI on everything that can compute. That's the right spirit. And spirit is much more valuable than revenue - as it should be, even in a "free market" world :P SETI apps news We're not gonna fight them. We're gonna transcend them. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
If we adopt the present intentions of the technology vendors, more than 95% of our hardware compute capacity is immediately defunct. It would be nice if we could all have shiny new Kaby Lakes and GTX 1080s, but it's not realistically going to happen ... especially when the gains represent a poor return for the money. They need to do a lot better before we can say goodbye to stuff that works. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Dr Grey Send message Joined: 27 May 99 Posts: 154 Credit: 104,147,344 RAC: 21 |
Nevertheless, time moves on. Looking back over the last few years, SETI compute power has increased by 5-10% each year. 1080s will soon be old hat with the 1080 Ti arriving, AMD's Ryzen will hopefully perk up the CPU market, pushing Intel towards adopting 10 nm more quickly, and I'm reading that we can hope for even better SETI apps in the near future. Older devices will continue to be retired, and all this will impact the average turnaround time, moving analysis ever closer from near time to real time. The current deadlines for completion will continue to grow to be further away from the time taken for the bulk of canonical results to be achieved. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The current deadlines for completion will continue to grow to be further away from the time taken for the bulk of canonical results to be achieved. 1) Deadlines will not grow. They will remain the same. 2) So what? Why does this matter? Have you ever observed a new version rollout where BOINC mis-predicts the estimated time to completion? With a shorter deadline, the percentage of killed tasks would be even higher. So, is all this just to make credit get accounted faster for the over-competitive ones who can't just leave a host as it is, and who watch their credit rise instead of spending time on optimization? It will be accounted eventually - that's all that matters. Regarding real-time processing: we do only pre-processing. The final processing is Nebula, and it requires moving the data into Atlas. This SETI search isn't a real-time one by design. Its sensitivity comes from accumulating data over a few observations spanning years. Real-time processing is just a self-imposed goal, not really required for this kind of search. It would be good to complete the search faster, but cutting off some processing power will not make it faster; it will make it slower. The mean validation time of a particular result doesn't matter. There are millions of such results. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Dr Grey Send message Joined: 27 May 99 Posts: 154 Credit: 104,147,344 RAC: 21 |
The current deadlines for completion will continue to grow to be further away from the time taken for the bulk of canonical results to be achieved. 1) As average turnaround shrinks and deadlines remain the same, the time between them grows. 2) Why bother optimising the process? To decrease entries from slow-to-verify workunits. From my understanding, this would allow a greater cache size without impacting the database size. If older, unverified workunits sitting in the database - those that would otherwise be reduced by shortening the deadlines - have little impact on the backend system, then I have little to argue about. But it would be interesting to know what proportion that would be. I could do that by looking at my own cache, I guess. I'll go and make a coffee and see if I can produce some figures. On your last point, though, I agree the potential computational benefit to be gained is minimal. It would arise from those very high-end machines that run dry during an outage. Looking at Petri's record breaker, his average turnaround is 0.16 days - a little under 4 hours. That means he's probably running dry halfway into an 8-hour outage, about once a week, or around 2.5% of the time. Everyone else will be less than this. However, it could be argued that enabling a larger cache to secure that 2.5% performance gain from Petri, at the expense of our slower tail, would be better for the computational output of the project, as that 2.5% could be substantially larger than the tail's output. But that would put us at odds with an egalitarian ideal. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
This implies the assumption that maintenance time depends directly on database size, and that database size depends directly (or largely) on the mean deadline time. The second assumption looks very unjustified for the reasons listed earlier. The first assumption requires some proof too. And the third assumption is that shortening the deadlines would automatically lead to a rise in the per-host task limit. Hardly ... EDIT: also, there are means to increase the number of cached tasks to survive an outage (if the operator is really interested in that). There would be no technical means to extend a deadline once it has been shortened, though. So, still no-no. EDIT2: From another thread:
Very true. And such a host imposes a much heavier load on the database than all the slow hosts together, simply because a GPU can produce a faulty result in a couple of SECONDS. Sending, let's say, 1 valid result (~30 min on an average modern device) per 10 such invalids will sustain a steady flow of results - how many per day? That's the real danger, not deadlines ... SETI apps news We're not gonna fight them. We're gonna transcend them. |
Dr Grey Send message Joined: 27 May 99 Posts: 154 Credit: 104,147,344 RAC: 21 |
This implies the assumption that maintenance time depends directly on database size. Not sure all those implications directly follow, but nevertheless I've taken a look at my older pendings and can prove myself wrong ... 80% of the pendings are less than 2 weeks old. 12% are older than 4 weeks and 6% are older than 6 weeks, with the oldest being about 8 weeks old. Current deadlines appear to be about 7.5 weeks, although some appear to be 3 weeks and I'm not sure why that is. All my pendings sit within about one 7.5-week deadline period, suggesting that workunits requiring more than one resend are extremely rare, and that when resends happen they turn around in a couple of days on average. According to these figures, anyway, trying to reclaim space by reducing deadlines from 7.5 to 4 weeks would gain only about 12% of the space. That would be just 90 workunits based on my pendings, and probably not worth it. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Adding: With v8, some subtle precision refinements became effective, designed to converge the dominant CPU and proper GPU applications cross-platform, along with easing prior discrepancies that affected 64-bit CPU builds vs the x86 original. While the desired effect was achieved, yielding inconclusive (i.e. resend) to pending ratios better than 5% at the large scale, an unintended consequence was that 'rogue' machines and/or applications became more visible. That's not necessarily a bad thing by any means, just unexpected. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I've no idea who this Dr Grey is, but what I do know is that I would back Jason and Raistmer against his technical ability any day. Have a look at his team name. I've got no qualms about anyone questioning my work, Raistmer's, or that of the probably 30+ other people involved. It's been a long road, and there's a long way to go. The only thing I take issue with is the fairly recent drive to abandon working hardware & applications in the name of progress. That's simply a dead-end approach, because it discards the most time-proven parts for unknowns. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Dr Grey Send message Joined: 27 May 99 Posts: 154 Credit: 104,147,344 RAC: 21 |
I've no idea who this Dr Grey is, but what I do know is that I would back Jason and Raistmer against his technical ability any day. Have a look at his team name. Thanks for the vote of confidence in my technical ability. For the record, I have none, but I don't see any harm in challenging the status quo and offering up ideas that are worth discussing. Do you? |
Mike Send message Joined: 17 Feb 01 Posts: 34362 Credit: 79,922,639 RAC: 80 |
First of all, it is Richard's decision when he will release a new installer. From my testing point of view, and as part of the installer crew, the work is done. But we shouldn't forget that at this time of year Richard might have taken a few days off, which is well deserved IMHO. The whole Lunatics team works in their spare time, and a lot of work has been finished so far. So another few days or weeks doesn't really matter. Happy New Year @all. With each crime and every kindness we birth our future. |
kittyman Send message Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
First of all, it is Richard's decision when he will release a new installer. Richard has always been very generous in donating his time and services to the project, and does a great job. No worries - he shall get to it when he gets to it. Happy New Year to all from the kitty crew. Meow! "Time is simply the mechanism that keeps everything from happening all at once." |
AMDave Send message Joined: 9 Mar 01 Posts: 234 Credit: 11,671,730 RAC: 0 |
With v0.45 installed, is there a way to download a newer app used on Main, then edit the app_info.xml from v0.45 accordingly?
How about copying the requisite files from the Seti Beta directory to the Seti Main directory? I think those files are:
♦ MultiBeam_Kernels_r3584.cl
♦ MultiBeam_Kernels_r3584.cl_GeForceGTX950.bin_V7_SoG_35900
♦ r3584_IntelRCoreTMi76700KCPU400GHz_x86.wisdom
♦ setiathome_8.22_windows_intelx86__opencl_nvidia_SoG.exe |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
With v0.45 installed, is there a way to download a newer app used on Main, then edit the app_info.xml from v0.45 accordingly?
How about copying the requisite files from the Seti Beta directory to the Seti Main directory? I think those files are ...
Two of those files:
MultiBeam_Kernels_r3584.cl_GeForceGTX950.bin_V7_SoG_35900
r3584_IntelRCoreTMi76700KCPU400GHz_x86.wisdom
are created when you run the app. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.