Running SETI@home on an nVidia Fermi GPU
Author | Message |
---|---|
John Joined: 21 May 99 Posts: 51 Credit: 5,667,907 RAC: 0 |
How do I tell the difference between Fermi credits pending and CPU credits pending? Host ID 5012909 |
Joined: 15 Mar 01 Posts: 1011 Credit: 230,314,058 RAC: 0 |
I accidentally dumped "rationed" MP work units by using Reschedule to move work from the CPU to the GPU; BOINC then dumped the work since there wasn't an application for it. Can't I put the Fermi app down as 608 too, or would that bork the science? |
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
[quote]Only 6.10 is safe for Fermi. As to renaming the 6.10 app and the name in the app_info file, I don't see any harm in that, unless one's a Muppet. :D[/quote] Can you clarify this please? I'm a muppet!!! I'm presuming that I rename the 6.10 app to something else and then edit the app_info to reflect the new name? I'm guessing it has to be the 6.08 version? But I don't want to screw up anything else... The Fermi part of the app_info is a couple of posts below, edited by Claggy. |
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
Can you clarify this please? I'm a muppet!!! I'm guessing that I want to replace any mention of the 6.10 app in my app_info with MB_608_CUDA_V12_VLARkill_FPLim2048.exe, and then ReScheduler would work? The Fermi part of the app_info is a couple of posts below, edited by Claggy; I have taken this and had a go. Have I done this right? If this app_info is correct, I will rename the exe to MB_608_CUDA_V12_VLARkill_FPLim2048.exe to reflect the changes I have made. <app> <name>setiathome_enhanced</name> </app> <file_info> <name>MB_608_CUDA_V12_VLARkill_FPLim2048.exe</name> <executable/> </file_info> <file_info> <name>cudart32_30_14.dll</name> <executable/> </file_info> <file_info> <name>cufft32_30_14.dll</name> <executable/> </file_info> <file_info> <name>libfftw3f-3-1-1a_upx.dll</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>608</version_num> <avg_ncpus>0.200000</avg_ncpus> <max_ncpus>0.200000</max_ncpus> <platform>windows_intelx86</platform> <flops>57462450464</flops> <plan_class>cuda_fermi</plan_class> <file_ref> <file_name>MB_608_CUDA_V12_VLARkill_FPLim2048.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>cufft32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> <coproc> <type>CUDA</type> <count>1</count> </coproc> </app_version> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>608</version_num> <avg_ncpus>0.200000</avg_ncpus> <max_ncpus>0.200000</max_ncpus> <platform>windows_x86_64</platform> <flops>57462450464</flops> <plan_class>cuda_fermi</plan_class> <file_ref> <file_name>MB_608_CUDA_V12_VLARkill_FPLim2048.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>cufft32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> <coproc> <type>CUDA</type> <count>1</count> </coproc> </app_version> </app_info> |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
STOP!!! Please don't guess. Please take the time and trouble to get it right. I'm only just catching up after last night. First, please identify which host you're talking about, while I catch up on the previous discussion. Then we can work out the proper way to proceed. But renaming executables isn't it. |
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
Damn, that red STOP is scary... I haven't changed a thing yet; I was waiting until those more knowledgeable than me had replied to see if I was right. My host ID is 5424775 |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
Two general observations, prompted by MadMaC's post but applicable to everyone. 1) Please don't rename executable files. They would still work, but the name has no impact on the sort of issues we're discussing here. What it would do is cause major confusion if you ever had to ask for help here in the future. If you post a fragment of app_info saying that you're using "this_app", but you're actually using "that_app" under another name, we'll all end up getting into even more bother than we are already. 2) The ReScheduler tool was written by Marius. He posted it in a discussion thread at Lunatics, but that's the limit of Lunatics' involvement. Unfortunately, shortly after he wrote and posted it, he became very busy at work and couldn't stay around to provide support. It's a brilliant tool, and many people are - rightly - grateful to Marius for writing it. But no-one except Marius really knows, in detail, exactly how it works. It's a black box, with no source code. So adapting it to work in new situations, which Marius didn't plan for (and no reason why he should), requires experimentation and observation. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
[quote]Damn, that red STOP is scary...[/quote] OK, that's your GTX 470. So, for the time being, the application you have to use is setiathome_6.10_windows_intelx86__cuda_fermi.exe. If you've ever renamed any files, I suggest you download a fresh copy so you can be sure it does exactly what it says on the tin.... Now, to ReScheduler. We had a conversation about this a week ago, in this very thread: You: message 1001516; Me: message 1001523; You: message 1001578. You had problems, but you didn't actually post the results of the experiment I suggested. Can you remember what they were? |
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
Well, I can remember trying to add a block into my app_info and starting up BOINC, and it trashed all my tasks (or discarded them, to be more accurate)... Just tried it again and the same has happened. This seems a bit strange, as I only moved 11% of my tasks to the GPU, so there should have been some left. I'd post the entire client_state.xml, but it was 847K; now it is 7K and all my tasks have gone. BOINC was shut down after I suspended the GPU. I think the problem is down to me either not identifying the tasks correctly in client_state.xml, or having made an error somewhere else. Either way, I will have to wait now until I can get some more work and then try again. Damn - will have to post an apology to my wingmen... Off to a meeting now, but will check back in when I get back to my desk... |
ToxicTBag Joined: 5 Feb 10 Posts: 101 Credit: 57,197,902 RAC: 0 |
First, a big thank you to all who are contributing to the thread and aiming for a greater understanding of how to get their own and others' configurations to work with SETI. My setup is as follows: machine 1, 3x GTX 470; machine 2, 1x GTX 480; machine 3, 2x GTX 295 and 1x GTX 260. Please correct me if I am wrong. The state of the machines is as follows. Machines 1 and 2 were running 6.10.56 with opt Lunatics .36 and had not downloaded work in 3 days, along with the 100 quota message. Machine 3 was running 6.10.56 with opt .36 and had work but was not receiving new work, along with the 100 message and an app_info error message (though it carried on working). When the upload servers began working yesterday, 1 and 2 uploaded all their work (not a lot). Number 3 had over 1500 completed WUs and began uploading; after it had uploaded around 1000 of the tasks I got a message that my password was not valid. I tried a restart more than once, and when it did connect it said I was not attached to any projects. I checked my computers tab in BOINC and it had uploaded a lot of the work but showed a lot as client detached. That is my situation at present; I would like to ask the following. On machines 1, 2 and 3, which have downloaded a small amount of work now, should I just install the 6.10.56 app with no opt .36, use ReScheduler to catch VLARs etc., and wait and see what happens once the present cache is cleared? Apologies for the long-winded question, but I have read a lot of threads and have been given a lot of advice but do not seem to be getting anywhere. I am not very good with editing the app_info and would appreciate any advice the thread has to offer. Thank you in advance. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
The process was meant to be: 1) Stop BOINC 2) ReSchedule tasks 3) Examine client_state 4) Modify app_info 5) Start BOINC You will lose work if you allow #5 without successfully completing #3 and #4. In general, I would urge people not to post app_info in a sticky 'information' thread like this - it just confuses other readers in the future, who may have difficulty picking out the "working" advice from the "faulty" requests for help. Start another thread, which can fade off the first page when the problem is solved, or at a pinch send someone a PM. And never post the whole of client_state! Apart from the size, it's got security information in it. In this case, all we need is a matched <workunit> / <result> pair for a rescheduled job. |
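For anyone unsure what a "matched pair" means, here is a rough sketch. It assumes the usual client_state.xml layout, in which a `<result>` points back at its workunit through `<wu_name>`. `EXAMPLE_WU` and `EXAMPLE_WU_0` are placeholder names, not real tasks, and most fields are left out for brevity.

```xml
<!-- Placeholder names only; a real client_state.xml carries many more fields. -->
<workunit>
    <name>EXAMPLE_WU</name>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
</workunit>

<result>
    <name>EXAMPLE_WU_0</name>
    <!-- This must equal the <name> of the workunit above for the pair to match. -->
    <wu_name>EXAMPLE_WU</wu_name>
    <platform>windows_intelx86</platform>
    <version_num>610</version_num>
    <plan_class>cuda</plan_class>
</result>
```

Searching the file for the `<wu_name>` value of a result should land on the one workunit that belongs to it.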
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
[quote]On machines 1, 2 and 3 which have downloaded a small amount of work now should i just install 6.10.56 app with no opt .36 and use rescheduler to catch vlars etc and wait and see what happens once present cache is cleared?[/quote] Much of your question relates to the general server problems, which are better covered in the other threads. But since machines 1 and 2 have Fermi cards: make sure you have already downloaded both Fermi (v6.10) work and CPU (v6.03) work, so you have the necessary equipment to handle rescheduled tasks. Then, use ReScheduler only to move VLAR work from GPU to CPU. Don't even attempt to move work in the opposite direction until we get MadMaC's problem sorted out, and we can draw up more general rules for other people to follow. I can't work it out without MadMaC's co-operation, because many of his problems are related to his 64-bit operating system, and I don't have one here to practice on. |
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
[quote]In this case, all we need is a matched <workunit> / <result> pair for a rescheduled job.[/quote] Is there an easy way to match the ID of a rescheduled WU? One of the reasons I started up BOINC in suspended mode was to see if I could get a workunit ID to search for in the client_state.xml file. Once I have downloaded some work I will post back here - it might not be until tomorrow, the way the servers are going, though. Edit - I have some CPU work, so I will suspend BOINC, shut it down, and try to go through the XML file and see what I can find. I will not start BOINC up at all. Interesting ReScheduler logs: --------------------------- Reschedule version 1.9 Time: 15-06-2010 14:20:10 User testing for a reschedule CPU tasks: 128 (0 VLAR, 66 VHAR) GPU tasks: 0 (0 VLAR, 0 VHAR) Reschedule needed because not enough GPU units --------------------------- Reschedule version 1.9 Time: 15-06-2010 14:20:17 User forced a reschedule Boinc applications setiathome_enhanced 603 windows_intelx86 No SETI cuda application found And it doesn't move any units - I'm going to look, but I know my Fermi app is there... |
ToxicTBag Joined: 5 Feb 10 Posts: 101 Credit: 57,197,902 RAC: 0 |
[quote]But since machines 1 and 2 have fermi cards: Make sure you have already downloaded both Fermi (v6.10) work and CPU (v6.03) work, so you have the necessary equipment to handle rescheduled tasks. Then, use ReScheduler only to move VLAR work from GPU, to CPU. Don't even attempt to move work in the opposite direction until we get MadMaC's problem sorted out, and we can draw up more general rules for other people to follow. I can't work it out without MadMaC's co-operation, because many of his problems are related to his 64-bit operating system, and I don't have one here to practice on.[/quote] Thank you for taking the time to reply, Richard, very much appreciated. I will continue to watch carefully for progress on these problems. TTBag |
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
Thanks for the clarification on this, VW - will keep this as plan B. I'm going to work with Richard to see if we can get ReScheduler working with Fermi, so that we can come up with a process for doing this; I know I'm not the only one who needs it working, as the normal scheduler does not work that well for Fermi yet. I have shut down BOINC and used ReScheduler to move 6 tasks across, but I am having trouble locating the 6 tasks I have moved in the client_state.xml file. What field am I looking for? I can see 4 active tasks, lots of results, and I'm guessing that I want to look at the workunit field and version number? I can see an app_version field which gives the relevant info on the filename: <file_name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</file_name> Then a section below which is a mirror of my app_info... Then I get to the workunit section - sample below: <workunit> <name>05no09ag.3059.6207.13.10.168</name> <app_name>setiathome_enhanced</app_name> (I'm guessing that this is where the error is) <version_num>610</version_num> <rsc_fpops_est>157714230303466.000000</rsc_fpops_est> <rsc_fpops_bound>1577142303034660.000000</rsc_fpops_bound> <rsc_memory_bound>33554432.000000</rsc_memory_bound> <rsc_disk_bound>33554432.000000</rsc_disk_bound> <file_ref> <file_name>05no09ag.3059.6207.13.10.168</file_name> <open_name>work_unit.sah</open_name> </file_ref> </workunit> Then I get down to the result field: <result> <name>05no09ag.19879.25021.12.10.248_0</name> <final_cpu_time>0.000000</final_cpu_time> <final_elapsed_time>0.000000</final_elapsed_time> <exit_status>0</exit_status> <state>2</state> <platform>windows_intelx86</platform> <version_num>610</version_num> <plan_class>cuda</plan_class> <wu_name>05no09ag.19879.25021.12.10.248</wu_name> <report_deadline>1280546919.000000</report_deadline> <received_time>1276603771.589111</received_time> <file_ref> <file_name>05no09ag.19879.25021.12.10.248_0_0</file_name> <open_name>result.sah</open_name> </file_ref> </result> That's all I can find. Any help from anyone on anything else I should be looking for? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
I was just in the middle of typing an answer to this when maintenance hit, 30 minutes before I was expecting it. They were in a hurry to get started! (And I was never so relieved to see maintenance start - went straight out for a long walk in the sunshine. I needed that!) [quote]Now I just redid this app_info, As I used My normal app_info as a template, The only lines that are important to alter are as follows:[/quote] That's absolutely right - the name doesn't matter. But I'd still strongly urge people not to rename MissPiggy_608.exe to KermitTheFrog_610.exe - that would cause endless confusion for us (and them!) if you/they ever need help/counselling.... I was going on to talk about the significance of <plan_class>, but I see MadMaC's post answers my next question, so I'll move on to that one. |
Joined: 4 Aug 99 Posts: 102 Credit: 3,051,091 RAC: 0 |
MadMaC, [quote]<app_name>setiathome_enhanced</app_name> Im guessing that this is where the error is[/quote] No, this line of the <workunit> section is correct. It's a multibeam workunit. [quote]<version_num>610</version_num>[/quote] Here, for the Fermi-oriented app 6.10, the <result> must also have <plan_class>cuda_fermi</plan_class>. Rescheduler 1.9 doesn't know about the new version 6.10 and this new plan_class: at compilation time these new things didn't exist... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
[quote]<app_name>setiathome_enhanced</app_name> Im guessing that this is where the error is[/quote] No, that's not it - leave that one well alone. Every current workunit will be either "setiathome_enhanced" or "astropulse_v505" - no other choice. The problem is actually in the "result" section, in its <plan_class> line. That's the one I was going to talk to Victor about - the extra important line, apart from <version_num>, involved in re-branding. You have obviously set up ReScheduler to use version 610 for the re-branded (new) CUDA tasks - which is right. But (so far as I know) Marius hasn't provided any mechanism for modifying the <plan_class>. One work-around would be to do a global search-and-replace on client_state.xml - or at least the SETI@home part of it (don't change any sections belonging to other projects!). Usual warnings: take backups, have BOINC fully shut down, use Notepad or another plain-text editor only. The change would be: replace <plan_class>cuda</plan_class> with <plan_class>cuda_fermi</plan_class>. But I don't recommend it: you would have to repeat the change every time you re-scheduled work from CPU to GPU. Far better to modify your app_info. Look back to post #2. Copy the whole first section (<app_version> ... </app_version>) again, and paste it as a whole duplicate section into app_info (keep it inside the <app_info> ... </app_info> tags, but clear of everything else). Then, in one of the duplicate app_version blocks, change <plan_class>cuda_fermi</plan_class> to <plan_class>cuda</plan_class>. That should give you the necessary tool for processing the results modified by ReScheduler. Let us know how you get on. |
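As a rough sketch of the duplicate block described above: it assumes the file names, `<flops>` value and 32-bit `<platform>` from the Fermi app_info fragments posted in this thread, copied whole, with only `<plan_class>` changed. Treat it as an illustration to compare against your own app_info rather than something to paste in blindly.

```xml
<!-- Duplicate of the existing 32-bit cuda_fermi block; only <plan_class> differs. -->
<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <avg_ncpus>0.200000</avg_ncpus>
    <max_ncpus>0.200000</max_ncpus>
    <platform>windows_intelx86</platform>
    <flops>57462450464</flops>
    <!-- Was cuda_fermi; cuda is what the rescheduled results carry. -->
    <plan_class>cuda</plan_class>
    <file_ref>
        <file_name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>cudart32_30_14.dll</file_name>
    </file_ref>
    <file_ref>
        <file_name>cufft32_30_14.dll</file_name>
    </file_ref>
    <file_ref>
        <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
    </file_ref>
    <coproc>
        <type>CUDA</type>
        <count>1</count>
    </coproc>
</app_version>
```

The executable is deliberately the same Fermi build in both blocks; the extra block only gives BOINC an entry to pair with tasks marked `cuda`.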
MadMaC Joined: 4 Apr 01 Posts: 201 Credit: 47,158,217 RAC: 0 |
OK - this failed, as BOINC reported errors: 16/06/2010 10:26:56 SETI@home [error] No application found for task: windows_intelx86 610 cuda; discarding 16/06/2010 10:26:56 SETI@home [error] No application found for task: windows_intelx86 610 cuda; discarding 16/06/2010 10:26:56 SETI@home [error] No application found for task: windows_intelx86 610 cuda; discarding 16/06/2010 10:26:56 SETI@home [error] No application found for task: windows_intelx86 610 cuda; discarding 16/06/2010 10:26:56 SETI@home [error] No application found for task: windows_intelx86 610 cuda; discarding 16/06/2010 10:26:56 SETI@home [error] No application found for task: windows_intelx86 610 cuda; discarding 16/06/2010 10:26:56 SETI@home URL http://setiathome.berkeley.edu/; Computer ID 5424775; resource share 100 The last section is the duplicated one that I added; I'm guessing that I did it wrong... Everything was inside the app_info and I have checked to make sure that setiathome_6.10_windows_intelx86__cuda_fermi.exe is where it should be. <app_info> <app> <name>setiathome_enhanced</name> </app> <file_info> <name>AK_v8b_win_SSE3_AMD.exe</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>603</version_num> <platform>windows_intelx86</platform> <file_ref> <file_name>AK_v8b_win_SSE3_AMD.exe</file_name> <main_program/> </file_ref> </app_version> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>603</version_num> <platform>windows_x86_64</platform> <file_ref> <file_name>AK_v8b_win_SSE3_AMD.exe</file_name> <main_program/> </file_ref> </app_version> <app> <name>astropulse_v505</name> </app> <file_info> <name>ap_5.05r409_SSE.exe</name> <executable/> </file_info> <app_version> <app_name>astropulse_v505</app_name> <version_num>505</version_num> <platform>windows_intelx86</platform> <file_ref> <file_name>ap_5.05r409_SSE.exe</file_name> <main_program/> </file_ref> </app_version> <app_version> <app_name>astropulse_v505</app_name> <version_num>505</version_num> <platform>windows_x86_64</platform> <file_ref> <file_name>ap_5.05r409_SSE.exe</file_name> <main_program/> </file_ref> </app_version> <app> <name>setiathome_enhanced</name> </app> <file_info> <name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</name> <executable/> </file_info> <file_info> <name>cudart32_30_14.dll</name> <executable/> </file_info> <file_info> <name>cufft32_30_14.dll</name> <executable/> </file_info> <file_info> <name>libfftw3f-3-1-1a_upx.dll</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>610</version_num> <avg_ncpus>0.200000</avg_ncpus> <max_ncpus>0.200000</max_ncpus> <platform>windows_intelx86</platform> <flops>57462450464</flops> <plan_class>cuda_fermi</plan_class> <file_ref> <file_name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>cufft32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> <coproc> <type>CUDA</type> <count>1</count> </coproc> </app_version> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>610</version_num> <avg_ncpus>0.200000</avg_ncpus> <max_ncpus>0.200000</max_ncpus> <platform>windows_x86_64</platform> <flops>57462450464</flops> <plan_class>cuda_fermi</plan_class> <file_ref> <file_name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>cufft32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> <coproc> <type>CUDA</type> <count>1</count> </coproc> </app_version> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>610</version_num> <avg_ncpus>0.200000</avg_ncpus> <max_ncpus>0.200000</max_ncpus> <plan_class>cuda</plan_class> <file_ref> <file_name>setiathome_6.10_windows_intelx86__cuda_fermi.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>cudart32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>cufft32_30_14.dll</file_name> </file_ref> <file_ref> <file_name>libfftw3f-3-1-1a_upx.dll</file_name> </file_ref> <coproc> <type>CUDA</type> <count>1</count> </coproc> </app_version> </app_info> |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
[quote]OK - think I have done this right...[/quote] Ah - I'd forgotten you'd already modified the file with those <platform> tags for 64-bit Windows. Look at the app_info you've posted, and try to get used to seeing the 'shape' of the file: separate sections, each with a <tag> ... </tag> enclosing it. You've now got three <app_version> sections. You can probably get away with that, as the sample (rebranded) <result> you posted had the vital clue: <platform>windows_intelx86</platform> I suggest you add that to the new (third) <app_version> block, so it matches the first one. And then check again for any other differences I've missed, before you run it! |
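One way to picture the 'shape' is as an outline. The sketch below is a summary only, not a working file: each comment stands in for the full block posted earlier in the thread, and the last Fermi entry assumes the `<platform>` line has been added as suggested.

```xml
<app_info>
    <!-- CPU multibeam -->
    <app><!-- setiathome_enhanced --></app>
    <file_info><!-- AK_v8b_win_SSE3_AMD.exe --></file_info>
    <app_version><!-- 603, windows_intelx86 --></app_version>
    <app_version><!-- 603, windows_x86_64 --></app_version>

    <!-- CPU Astropulse -->
    <app><!-- astropulse_v505 --></app>
    <file_info><!-- ap_5.05r409_SSE.exe --></file_info>
    <app_version><!-- 505, windows_intelx86 --></app_version>
    <app_version><!-- 505, windows_x86_64 --></app_version>

    <!-- Fermi GPU multibeam: one exe, three DLLs, three app_version blocks -->
    <app><!-- setiathome_enhanced --></app>
    <file_info><!-- setiathome_6.10_windows_intelx86__cuda_fermi.exe --></file_info>
    <file_info><!-- cudart32_30_14.dll --></file_info>
    <file_info><!-- cufft32_30_14.dll --></file_info>
    <file_info><!-- libfftw3f-3-1-1a_upx.dll --></file_info>
    <app_version><!-- 610, windows_intelx86, cuda_fermi --></app_version>
    <app_version><!-- 610, windows_x86_64, cuda_fermi --></app_version>
    <app_version><!-- 610, windows_intelx86, cuda: the new block for rescheduled tasks --></app_version>
</app_info>
```

Read this way, the error message "windows_intelx86 610 cuda" names exactly the platform, version and plan class that the third Fermi block is meant to supply.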