Upgrading Cuda App - BOINC Trashing Units
Author | Message |
---|---|
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
Just tried to install my GTX470. As usual with a change of app, BOINC immediately trashed all the units on the machine, including the completed units waiting to upload. It does this because it sees that the units have been assigned to a particular version of the app, and if it can't find that version, in this case V6.09, it deletes the WU files. It even deleted the GPU units I had moved to the CPU prior to fitting the new card. The app_info file is good; I triple-checked that before I started BOINC. Is there any workaround for this? At least this time I remembered to make a backup of the BOINC and BOINC_DATA folders, so that everything was there when I rolled back to the old cards :-S T.A. |
Tim Norton Send message Joined: 2 Jun 99 Posts: 835 Credit: 33,540,164 RAC: 0 |
I have also been caught out by this and lost a few units, fortunately only ones in my cache, not completed ones :) It would be nice if BOINC actually asked you what you wanted to do rather than deleting everything, but that would mean being user-friendly rather than functional. I have not found a workaround other than backing up your directories, or else finishing all the work you want to do and aborting the rest before fitting new hardware or applying an optimised app. Sometimes you forget and leave a poor wingman hanging. It would be good if you could mark a WU as "oops, sorry" so it could be sent out again quickly. Not a real help, but my two pennies' worth :) Tim |
hiamps Send message Joined: 23 May 99 Posts: 4292 Credit: 72,971,319 RAC: 0 |
I always suspend boinc when updating... Official Abuser of Boinc Buttons... And no good credit hound! |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
I always suspend boinc when updating... I did all the right things: suspended BOINC and networking. But the problem occurs at the initial startup. I think that when it's reading the client_state file, it compares what's in the client_state file with the app_info. If the app version for a WU listed in the client_state doesn't agree with the version in the app_info, BOINC says "bye, bye" to the WU, even if it's been completed. T.A. |
TheFreshPrince a.k.a. BlueTooth76 Send message Joined: 4 Jun 99 Posts: 210 Credit: 10,315,944 RAC: 0 |
Before you make any changes: - shut down BOINC - back up both BOINC folders - disconnect from the internet - test whether the change works - if it works, connect to the internet again - if it doesn't work, delete both BOINC folders, restore the ones from the backup and try again :) Rig name: "x6Crunchy" OS: Win 7 x64 MB: Asus M4N98TD EVO CPU: AMD X6 1055T 2.8(1,2v) GPU: 2x Asus GTX560ti Member of: Dutch Power Cows |
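The backup step in the checklist above could be scripted along these lines. This is only a sketch: the function name `backup_boinc_dirs` is hypothetical, the folder locations are whatever your own install uses, and it assumes BOINC has already been shut down so that nothing is being written while the copy runs.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_dirs(source_dirs, backup_root):
    """Copy each source directory into a new timestamped folder under backup_root.

    Returns the path of the timestamped folder that was created.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target_root = Path(backup_root) / f"boinc-backup-{stamp}"
    # Refuse to reuse an existing folder so an older backup is never overwritten.
    target_root.mkdir(parents=True, exist_ok=False)
    for src in map(Path, source_dirs):
        if not src.is_dir():
            raise FileNotFoundError(f"not a directory: {src}")
        # Each source keeps its own folder name inside the backup.
        shutil.copytree(src, target_root / src.name)
    return target_root
```

Calling it with the program folder and the data folder (for example the BOINC and BOINC_DATA folders mentioned earlier in the thread) gives one dated copy to restore from if the new app_info turns out to be wrong.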
perryjay Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
Guys, pardon this old non-Fermi character butting in but I think I might see a problem or two. He is installing a new 470 card in place of "older" (probably non-Fermi) cards. He has an app_info with 6.09 but probably doesn't have an entry for 6.10 Fermi, so of course it will trash all his 6.09s. Try reading up on what needs to be done here: http://setiathome.berkeley.edu/forum_thread.php?id=60079 PROUD MEMBER OF Team Starfire World BOINC |
Questor Send message Joined: 3 Sep 04 Posts: 471 Credit: 230,506,401 RAC: 157 |
Guys, pardon this old non-Fermi character butting in but I think I might see a problem or two. As long as the new Fermi app is referenced in app_info.xml as 609 then it won't matter: the workunit details will match the existing 609 WUs. OTOH, if he has put in an entry as 610 for the Fermi card in app_info and removed the 609 reference, then all 609 WUs will be trashed because no apps match. Possibly post app_info.xml? It will be easier to analyze what the problem might be (an example <workunit> and <result> section for CPU and GPU would help as well). You also have to watch that platform entries match, especially between CPU and GPU when rescheduling. You might need a couple of very similar-looking entries in app_info to cope with the possible combinations, especially during the transition between the cards. John. GPU Users Group |
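The matching rule described above can be sketched as a small check. This is an illustration, not BOINC's actual code: it assumes tasks are keyed on the version number and plan class only (it ignores the platform entry mentioned above), the helper names are hypothetical, and the list of task versions is typed in by hand rather than read from client_state.xml.

```python
import xml.etree.ElementTree as ET


def offered_versions(app_info_xml):
    """Return the set of (version_num, plan_class) pairs declared in app_info.xml text."""
    root = ET.fromstring(app_info_xml)
    pairs = set()
    for av in root.iter("app_version"):
        version = int(av.findtext("version_num", default="0").strip())
        # CPU entries have no <plan_class>; treat that as an empty string.
        plan_class = (av.findtext("plan_class") or "").strip()
        pairs.add((version, plan_class))
    return pairs


def orphaned_tasks(task_versions, app_info_xml):
    """List the task versions that no <app_version> entry would match."""
    offered = offered_versions(app_info_xml)
    return [tv for tv in task_versions if tv not in offered]


# An app_info that only declares the 610 Fermi app, as in the "OTOH" case.
FERMI_ONLY = """
<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <plan_class>cuda_fermi</plan_class>
  </app_version>
</app_info>
"""

print(orphaned_tasks([(609, "cuda"), (610, "cuda_fermi")], FERMI_ONLY))
# prints [(609, 'cuda')]
```

Under those assumptions, any cached task that shows up in the returned list is one that would have no matching app after the change, which is the situation in which the 609 WUs get trashed.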
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
Update. With some app_info fiddling I've managed to stop it deleting V6.09 files and it is actually crunching GPU units on all cards. But there is a bug in it where it doesn't see AK-v8_win_SSSE3x.exe entries and so deletes the CPU files. This bug was in the original app_info file I copied from the Fermi thread. The AK-v8 file is in the SAH directory and as far as I can tell the entries for it are the same as in the original V6.09 app_info which works. The error message is "State file error missing application AK_v8_win_ssse3x.exe" The file is posted below. One other bug is that it is trying to run 3 tasks on all cards not just the Fermi. What sort of entry do I need to limit the non-fermi cards to one unit at a time? <app_info> <app> <name>setiathome_enhanced</name> </app> <file_info> <file_name>AK_v8_win_SSSE3x.exe</file_name> <executable/> </file_info> <file_info> <name>setiathome_6.09_windows_intelx86__cuda23.exe</name> <executable/> </file_info> <file_info> <name>cudart.dll</name> <executable/> </file_info> <file_info> <name>cufft.dll</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>603</version_num> <platform>windows_intelx86</platform> <flops>6051935388.55510675</flops> <file_ref> <file_name>AK_v8_win_SSSE3x.exe</file_name> <main_program/> </file_ref> </app_version> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>603</version_num> <platform>windows_x86</platform> <file_ref> <file_name>AK_v8_win_SSSE3x.exe</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome_enhanced</name> </app> <file_info> <name>libfftw3f-3-1-1a_upx.dll</name> <executable/> </file_info> <file_info> <name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</name> <executable/> </file_info> <file_info> <name>cudart32_30_14.dll</name> <executable/> </file_info> <file_info> <name>cufft32_30_14.dll</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> 
<version_num>610</version_num> <avg_ncpus>0.200000</avg_ncpus> <max_ncpus>0.200000</max_ncpus> <flops>57462450464</flops> <plan_class>cuda_fermi</plan_class> <file_ref> <file_name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>cufft32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>609</version_num> <platform>windows_intelx86</platform> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.05</max_ncpus> <flops>320000000000</flops> <plan_class>cuda</plan_class> <file_ref> <file_name>setiathome_6.09_windows_intelx86__cuda23.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart.dll</file_name> </file_ref> <file_ref> <file_name>cufft.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> <coproc> <coproc> <type>CUDA</type> <count>0.33</count> </coproc> </app_version> </app_info> TIA Brodo |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Update. Let's see, there's an extra <app_version> section for CPU, a missing </app_version> tag for the cuda_fermi section which led to the original trashing of cuda23 tasks. The following has stuff needing deletion in red and needed additions in green: <app_info> <app> <name>setiathome_enhanced</name> </app> <file_info> <file_name>AK_v8_win_SSSE3x.exe</file_name> <executable/> </file_info> <file_info> <name>setiathome_6.09_windows_intelx86__cuda23.exe</name> <executable/> </file_info> <file_info> <name>cudart.dll</name> <executable/> </file_info> <file_info> <name>cufft.dll</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>603</version_num> <platform>windows_intelx86</platform> <flops>6051935388.55510675</flops> <file_ref> <file_name>AK_v8_win_SSSE3x.exe</file_name> <main_program/> </file_ref> </app_version> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>603</version_num> <platform>windows_x86</platform> <file_ref> <file_name>AK_v8_win_SSSE3x.exe</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome_enhanced</name> </app> <file_info> <name>libfftw3f-3-1-1a_upx.dll</name> <executable/> </file_info> <file_info> <name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</name> <executable/> </file_info> <file_info> <name>cudart32_30_14.dll</name> <executable/> </file_info> <file_info> <name>cufft32_30_14.dll</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>610</version_num> <avg_ncpus>0.200000</avg_ncpus> <max_ncpus>0.200000</max_ncpus> <flops>57462450464</flops> <plan_class>cuda_fermi</plan_class> <file_ref> <file_name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>cufft32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> </app_version> 
<app_version> <app_name>setiathome_enhanced</app_name> <version_num>609</version_num> <platform>windows_intelx86</platform> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>0.05</max_ncpus> <flops>320000000000</flops> <plan_class>cuda</plan_class> <file_ref> <file_name>setiathome_6.09_windows_intelx86__cuda23.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart.dll</file_name> </file_ref> <file_ref> <file_name>cufft.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> <coproc> <coproc> <type>CUDA</type> <count>0.33</count> </coproc> </app_version> </app_info> Joe |
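A missing closing tag like the one Joe points out is the sort of mistake a parser catches quickly. Below is a minimal sketch of such a check; the function name `check_app_info` is hypothetical, and it assumes that `<file_info>` entries carry a `<name>` child while `<file_ref>` entries carry `<file_name>`, which is how most entries in the posted file are written. Note that the first `<file_info>` in the posted file uses `<file_name>` rather than `<name>`, so this check would flag it; whether that explains the "missing application file" error reported later in the thread is an assumption, not something the thread confirms.

```python
import xml.etree.ElementTree as ET


def check_app_info(text):
    """Return a list of human-readable problems found in app_info.xml text."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Unbalanced tags, such as a missing </app_version>, end up here.
        return [f"not well-formed XML: {err}"]

    problems = []
    declared = set()
    for fi in root.iter("file_info"):
        name = fi.findtext("name")
        if name is None:
            problems.append("<file_info> without a <name> child")
        else:
            declared.add(name.strip())

    # Every file an app_version refers to should have been declared above.
    for av in root.iter("app_version"):
        for ref in av.iter("file_ref"):
            fname = (ref.findtext("file_name") or "").strip()
            if fname not in declared:
                problems.append(f"<file_ref> {fname} has no matching <file_info>")
    return problems
```

Running it on the text of app_info.xml before restarting BOINC returns an empty list when the tags balance and every referenced file is declared, and otherwise a list of messages to work through.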
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
In addition to Joe's points, I'd update BOINC to 6.10.58. It won't delete files that aren't in the app_info, and it should correctly state the Fermi's GFLOPS, as well as bringing numerous other fixes and improvements. I'm not so sure about having both Fermi and Cuda23 apps in the app_info; I doubt that's going to work properly. Claggy |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
Sorry Joe Still getting this error 24/07/2010 06:16:55 SETI@home [error] State file error: missing application file AK_v8_win_SSSE3x.exe And I copied your file twice just to make sure :-( Any ideas about the... <type>CUDA</type> <count>0.33</count> entry so it doesn't run multiple WU's on the 2xx series GPU's ? @ Claggy I don't think it's using the V6.09 app on any GPU. I get this in the start up 24/07/2010 06:16:55 NVIDIA GPU 0: GeForce GTX 470 (driver version 25896, CUDA version 3010, compute capability 2.0, 1280MB, 272 GFLOPS peak) 24/07/2010 06:16:55 NVIDIA GPU 1: GeForce GTS 250 (driver version 25896, CUDA version 3010, compute capability 1.1, 512MB, 471 GFLOPS peak) 24/07/2010 06:16:55 NVIDIA GPU 2: GeForce GTS 250 (driver version 25896, CUDA version 3010, compute capability 1.1, 512MB, 442 GFLOPS peak) But work is suspended atm till I can upload and report the finished units. Then I'll play a bit more. Regards T.A, |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
And I copied your file twice just to make sure :-( And did you remove the red sections? ;-) Gruß, Gundolf |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
And I copied your file twice just to make sure :-( Yes, I changed them to purple :-P |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Sorry Joe You're also missing the following from the <plan_class>cuda_fermi</plan_class> <app_version> section: <coproc> <type>CUDA</type> <count>0.33</count> </coproc> Put it straight before the green </app_version> Claggy |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
You're also missing the following from the <plan_class>cuda_fermi</plan_class> <app_version> section: As an additional entry or move it up from the bottom ?? |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
You're also missing the following from the <plan_class>cuda_fermi</plan_class> <app_version> section: An additional entry, Claggy |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Sorry Joe Hmm, the 0.2 and later installers have the newer AK_v8b_win_SSSE3x.exe file. Make sure the file is there and double check whether it's the v8 or v8b version, make the app_info.xml match... Edit: As to running multiple instances on the Fermi and only single instances on the others, I doubt it's possible. Having the <count>0.33</count> only in the cuda_fermi app_version might possibly help, but with the mix of cards I think BOINC will launch any of the cuda tasks on any of the cards. The inability to deal with the details for different card types is why BOINC's default is to use only the most capable card and any which are a close match. The <use_all_gpus> option allows human judgement to override the default, but in this case you'll have to decide whether the increased productivity on the GTX 470 more than offsets the decreased productivity on the other cards. Additional edit: It looks like the other cards don't have sufficient memory for multiple tasks, so IMO you'll either have to settle for single instances on the GTX 470 or remove the other cards. Joe |
Terror Australis Send message Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
Well, I gave up trying to get that app_info working. There's a bug there somewhere that nobody was able to spot. I made another by taking my working V6.09 app_info and carefully going through it, substituting the V6.10 components for the originals. It worked straight off the bat. I'll post it in the Fermi thread for anyone who's interested. I gave up on the idea of running the 470 and the 285 in the same box: while the 285 could handle 2 threads at once quite well, it choked on 3, taking around 45 minutes per unit compared to the 470's 20 minutes. With 1 or 2 threads they ran neck and neck. Reasonably impressed with the 470 now that I've got it working and stable. Running 3 threads at stock speed it takes around 20 to 25 minutes per unit; in other words, it's the equivalent of 3 GTS250s, and my machine with 3 GTS250s runs a slightly higher RAC than machines with 1 GTX295. I found the way to stop units being trashed was to use Reschedule to move all the units to the CPU, then run S@NL-Fred's tool over the lot. After that you can use either Fred's tool or Reschedule 1.9 to move them back to the GPU. Don't ask me why it works, it just does :-) Thanks to all those who offered help and advice along the way. The Terror |