Issue running Seti projects
Author | Message |
---|---|
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: "Try these options:" Ageless, maybe you don't like my conversation style, but look: with the number of characters you spent writing what I should do, you could have explained it yourself, don't you think? OK, I will: there is a readme file in the project directory that explains these parameters. There is also another file in the project directory whose name looks like mb_cmdline_win_x86_SSE2_OpenCL_ATI.txt (I don't run stock currently, so the name is approximate). Copy/paste the line with the parameters into that file and restart the app (or BOINC as a whole). SETI apps news We're not gonna fight them. We're gonna transcend them. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Sorry, Ageless, I pointed Raistmer to this thread because I can't see why this problem occurs. Raistmer gives short hints and maybe expects me to elaborate on them if the user has questions ("How do I do that?") ;) If Raistmer can better spend his time and thinking on finding why/how this problem occurs, I'm fine with 'expanding' his short hints. My 'hints' to him were: - What happens in the calculations at about 20% progress? - Is the app stuck waiting for something? (According to the user, it is not using the CPU or GPU at that moment.) - Or does it call some external function that never returns? - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Most probably one of the long-running kernel sequences occurs. All other possibilities are much less probable for now. I'm waiting for the OP to check the system log and see how many driver restarts he actually had. For now, the best explanation of the "stuck" task is a driver restart at ~20% and an app freeze because of the invalidated OpenCL context. And yes, with an invalidated context, the first call to the OpenCL runtime after invalidation will never return. It's a very nasty "feature" of the runtime. It would be much better if some error were returned to the app... but we have what we have. |
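The hang described here (a runtime call that never returns after the context is invalidated) cannot be undone from the host, but it can at least be detected instead of freezing silently. A minimal Python sketch, assuming nothing about the real app or the OpenCL API: it runs a possibly blocking call in a daemon thread and reports a timeout. `call_with_watchdog` and the stand-in blocking call are hypothetical illustrations, not code from the SETI app.

```python
import threading
from typing import Any, Callable, Tuple


def call_with_watchdog(fn: Callable[[], Any], timeout_s: float) -> Tuple[bool, Any]:
    """Run fn() in a daemon thread and return (finished, result).

    If fn() has not returned within timeout_s seconds, return (False, None).
    Python threads cannot be forcibly killed, so a hung call is simply left
    behind in its daemon thread, much like a runtime call that never returns.
    """
    result = {}

    def worker() -> None:
        result["value"] = fn()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        return False, None  # still blocked: treat as a hang
    return True, result.get("value")


if __name__ == "__main__":
    never_set = threading.Event()
    # Hypothetical stand-in for a call that blocks forever after a driver reset.
    print(call_with_watchdog(never_set.wait, timeout_s=0.5))
    print(call_with_watchdog(lambda: 42, timeout_s=0.5))
```

A real app would still need to exit and restart after a detected hang, since the invalidated context cannot be reused.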
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Quote: "I'm awaiting OP to check system log and see how many driver restarts he actually had." @TJ13 To do this: Option 1 - use the 'standard' way: Control Panel -> Administrative Tools -> Event Viewer -> System (on Windows 8 it may be somewhat different; I did this on XP). But I prefer Option 2 - use MyEventViewer: http://www.nirsoft.net/utils/my_event_viewer.html @Raistmer Can you supply a search string (for 'Find') to easily locate the driver restarts (in 'Event Viewer' or MyEventViewer)? |
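As a hedged sketch of the kind of filter asked for here: on typical Windows systems a display-driver reset ("Display driver ... stopped responding and has successfully recovered") is logged in the System log with Source "Display" and Event ID 4101, so Event ID 4101 is a reasonable first search. The Python below counts such entries per day in a CSV export of the System log. The column names ("Date and Time", "Source", "Event ID") are an assumption about Event Viewer's CSV export and may differ for MyEventViewer.

```python
import csv
import io
from collections import Counter

# Assumption: driver resets appear as Source "Display", Event ID 4101,
# and the export uses Event Viewer-style column names.
TDR_SOURCE = "Display"
TDR_EVENT_ID = "4101"


def driver_restarts_per_day(csv_text: str) -> Counter:
    """Count assumed driver-reset events per calendar day in a CSV export."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Source") == TDR_SOURCE and row.get("Event ID") == TDR_EVENT_ID:
            # "Date and Time" looks like "5/29/2013 10:00:00 PM"; keep the date part.
            day = row.get("Date and Time", "").split(" ")[0]
            counts[day] += 1
    return counts
```

Usage: save the System log as CSV, read the file into a string, and pass it to `driver_restarts_per_day`; the per-day totals show when the restarts started.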
TJ13 Joined: 13 Mar 06 Posts: 35 Credit: 7,970,069 RAC: 6 |
Apologies, I was at work all day and only now got back to this post. I'm currently running the MB_bench_208.cmd BilBg gave; it hasn't finished one yet, although it started about 12 minutes ago. It did successfully snooze/pause BOINC in Windows 8. It also turns out I've had a lot of driver failures, more than I could reasonably count. The first one was on May 29th, and I've had several while doing tests over the past few days. Edit: as I was posting, the first one finished. Started at 00:42:28.424, ended at 00:55:27.889. At this rate I'll probably let it run all night and have results in the morning. |
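The two timestamps give the run length directly. A small Python sketch of the arithmetic, taking the end time as 00:55:27.889; `elapsed` is a hypothetical helper that also handles a run crossing midnight:

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Duration between two HH:MM:SS.mmm timestamps on the same or next day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):  # the run crossed midnight
        delta += timedelta(days=1)
    return delta


print(elapsed("00:42:28.424", "00:55:27.889"))  # 0:12:59.465000
```

So the first run took just under 13 minutes.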
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Quote: "Edit: as I post it the first one finished." "The first one" is the CPU app/result. The second will be the GPU app (which may hang again at 20%). |
TJ13 Joined: 13 Mar 06 Posts: 35 Credit: 7,970,069 RAC: 6 |
Well, it finished. There is a file, but it's really long and I'm not sure what's relevant. (Note: I'm going to bed now, so don't expect a response for a few hours.) |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
You can go to this site, paste the full text, and give us the resulting link: http://pastebin.com/ (you can do it without an account, but you need an account to be able to edit and delete your pastes). Example - my last Knabench test: http://pastebin.com/xA0pY8qn If the test really finished OK, then there is an easy solution for you: switch to the apps included in the Lunatics Installer (MB7_win_x86_SSE_OpenCL_ATi_HD5_r1843.exe is exactly from it). (Links to the Lunatics Installer "How to" will come later, after we know ATi_HD5_r1843 really finished OK; you can find the info about the Lunatics Installer yourself, but better wait for confirmation.) @Raistmer What is the difference between the stock 'SSEx Win32 Build 1831 (USE_OPENCL_HD5xxx)' (I don't know the name of the distributed .exe) http://setiathome.berkeley.edu/result.php?resultid=3021008523 and MB7_win_x86_SSE_OpenCL_ATi_HD5_r1843.exe? |
TJ13 Joined: 13 Mar 06 Posts: 35 Credit: 7,970,069 RAC: 6 |
http://pastebin.com/ZznGkDRP That's the full file. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
All looks good to me: R2: .\ref\ref-setiathome_7.00_windows_intelx86.exe-PG0395.wu.res Result : Strongly similar, Q= 99.95% R2: .\ref\ref-setiathome_7.00_windows_intelx86.exe-PG0395_v7.wu.res Result : Strongly similar, Q= 99.95% R2: .\ref\ref-setiathome_7.00_windows_intelx86.exe-PG1327.wu.res Result : Strongly similar, Q= 99.14% R2: .\ref\ref-setiathome_7.00_windows_intelx86.exe-PG1327_v7.wu.res Result : Strongly similar, Q= 99.14% (and the 8 'called boinc_finish' lines also show no app hang) ============= If you still have this 'hang at 20%' problem, you may go for 'Updated Installers, v0.41 for Windows' http://setiathome.berkeley.edu/forum_thread.php?id=71867&postid=1375943#1375943 (the only downside is that the apps will not update automatically from the SETI servers in the future, so [Subscribe] to the above thread to be notified if something new is posted). First read the link 'this thread': "Please see this thread for full release notes. READ THEM before you install." Then use either of the 2 download links: "... on Arkayn's site, Crunchers Anonymous HERE" or the one in Mike's post. You need the file: Lunatics_Win64_v0.41_setup.exe (the Installer will issue a warning that "BOINC is still running"; I prefer to exit BOINC manually first). You need to select all apps (types of tasks) that you use or want to use (CPU, ATI, AstroPulse, 'MultiBeam v7' == SETI@home v7). For CPU, AVX will be fastest if it can be selected. (You don't have CPU tasks now; is that incidental or your choice?) For ATI, select MB7_win_x86_SSE_OpenCL_ATi_HD5_r1843. Also make sure you have the proper selections per your choice here: http://setiathome.berkeley.edu/prefs.php?subset=project Use CPU yes/no Use ATI GPU yes/no ... SETI@home Enhanced: yes/no SETI@home v7: yes/no AstroPulse v6: yes/no If no work for selected applications is available, accept work from other applications? yes/no !
If some app (type of task) is not selected (is unchecked) in 'SETI@home preferences' OR in the Lunatics Installer, you will not receive that type of task (i.e. the app has to be selected in both places ('SETI@home preferences' and the Lunatics Installer) to be 'active'). |
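The 'all looks good' check at the top of this post can be automated. A hedged Python sketch: it extracts the `R2: ... Result : <verdict>, Q= <percent>%` lines quoted above and counts 'called boinc_finish' lines. Only those quoted line shapes come from the post; the rest of the Knabench log format and the hypothetical `expected_runs` parameter are assumptions.

```python
import re
from typing import List, Tuple

# Matches result lines shaped like the ones quoted in the post:
# R2: .\ref\ref-...-PG0395.wu.res Result : Strongly similar, Q= 99.95%
RESULT_RE = re.compile(r"R2:\s*(\S+)\s+Result\s*:\s*([^,]+),\s*Q=\s*([\d.]+)%")


def summarize(log_text: str) -> Tuple[List[Tuple[str, str, float]], int]:
    """Return ([(reference, verdict, Q percent), ...], number of boinc_finish lines)."""
    results = [(ref, verdict.strip(), float(q))
               for ref, verdict, q in RESULT_RE.findall(log_text)]
    finishes = log_text.count("called boinc_finish")
    return results, finishes


def looks_ok(log_text: str, expected_runs: int) -> bool:
    """All comparisons 'Strongly similar' and every run reached boinc_finish."""
    results, finishes = summarize(log_text)
    all_similar = bool(results) and all(v == "Strongly similar" for _, v, _ in results)
    return all_similar and finishes == expected_runs
```

A missing 'called boinc_finish' line would point to a run that never completed, which is the hang symptom this thread is chasing.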
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Nothing that could explain the difference in the hang-at-20% behavior. More detailed info can be found in the revision logs in the sources (there were a number of changes). https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Regarding r1843's ability to run shortened tasks: it proves nothing. A shortened task is definitely not the same as reaching 20% of a full-length task. What about running with the proposed options? [It's known that an A-10 per se can run the stock app; I have a similar APU. Just the right config is needed to avoid driver restarts.] |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Quote: "What about running with proposed options ?" @TJ13 To do this (if you still have this 'hang at 20%' problem): Go to: C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\ (Easy way to open this directory: press Win+R ([Win] is the key between the left [Ctrl] and [Alt]), paste the above path (C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\) and hit Enter.) Find and edit the file (it has zero length by default): mb_cmdline_win_x86_SSE_OpenCL_ATi_HD5.txt Paste the following line into it and save: -period_iterations_num 40 -sbs 128 -hp For the new options/settings to take effect, the app has to be restarted; any of the following will do this: [Suspend]/[Resume] the stuck task (always wait 5-10 sec after any Suspend; when you Suspend the stuck task, a new task of the same kind will start) [Suspend]/[Resume] the project (SETI@home) Exit/start BOINC Restart Windows (do this if you had a 'driver restart': after the "driver failing and restarting successfully" message, OpenCL can no longer work, so always restart Windows if this happens) Check in SIV what the CPU and GPU temperatures are during the computations. (For the CPU it shows both internal and external temperatures; the internal sensor is often very wrong at low temperatures, when the CPU is idle.) (Too-high temperatures can cause a 'driver restart', but usually at random points in the calculation.) |
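The file edit above can also be scripted. A minimal Python sketch, assuming the default BOINC data directory and the file name given in the post; `write_app_options` is a hypothetical helper. It overwrites the file with exactly the one suggested line, matching "paste the following line ... and save" on a zero-length file. The app still has to be restarted afterwards for the options to take effect.

```python
from pathlib import Path

# Paths and options as given in the post (default BOINC data directory assumed).
PROJECT_DIR = Path(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")
CMDLINE_FILE = "mb_cmdline_win_x86_SSE_OpenCL_ATi_HD5.txt"
OPTIONS = "-period_iterations_num 40 -sbs 128 -hp"


def write_app_options(project_dir: Path, options: str = OPTIONS) -> Path:
    """Overwrite the app's command-line file with a single line of options."""
    target = project_dir / CMDLINE_FILE
    target.write_text(options, encoding="ascii")
    return target


if __name__ == "__main__":
    print(write_app_options(PROJECT_DIR))
```

To drop `-hp` later (as suggested further down the thread if it causes lag), pass a different `options` string.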
TJ13 Joined: 13 Mar 06 Posts: 35 Credit: 7,970,069 RAC: 6 |
Well, I did what you said to do earlier, and I've received new tasks and finished them, but I've also received new tasks that got stuck at 20% again. I've narrowed it down to tasks whose Application is SETI@home v7 7.03 (opencl_ati_sah). AstroPulse and 7.00 applications run fine. I'm not sure whether it's something I'm doing or just the 7.03 app not working, but I figured I'd ask. Edit: I did what you said in the most recent post and restarted the laptop. It worked for a little bit, then I got the "driver failed and successfully restarted" message again, and then it stopped showing progress. SpeedFan says that HD1 is currently at 39°C and the GPU is at 54°C. I have a cooling fan under the laptop running at full speed, so I have no idea if there's anything I can do about the laptop temperature. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Quote: "I've narrowed it down to having issues with tasks that are Application SETI@home v7 7.03 (opencl_ati_sah)" Yes, we've known that from the beginning, but why it happens on your computer is still unknown. Quote: "SpeedFan says that the HD1 is currently at 39* C and GPU is at 54* C" HD1 is probably the hard disk temperature, which is irrelevant to this problem (39°C is in the safe range according to Google (in 2007): http://research.google.com/pubs/pub32774.html PDF: http://research.google.com/archive/disk_failures.pdf ). GPU at 54°C is also good (if measured while the GPU task was progressing, i.e. while GPU load in SIV was high, not after it hung at 20%). Since the GPU is in the same package (chip) as the CPU, I suppose the CPU temperature is not much different (you didn't say what SIV shows for it). Your 'Aborted by user' task using r1843 doesn't show the changed options: http://setiathome.berkeley.edu/result.php?resultid=3021008858 Used GPU device parameters are: Number of compute units: 6 Single buffer allocation size: 64MB max WG size: 256 period_iterations_num=20 |
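A quick way to check whether the options were picked up is to look for the `period_iterations_num=` value in the task's stderr, as done here (20 shown, where the suggested line sets 40). A Python sketch; the regex assumes the `name=value` form in the quoted output, and the hypothetical helpers below simply compare against the value from the suggested command line.

```python
import re
from typing import Optional


def reported_period_iterations(stderr_text: str) -> Optional[int]:
    """Return the period_iterations_num value the app reports, or None if absent."""
    m = re.search(r"period_iterations_num\s*=\s*(\d+)", stderr_text)
    return int(m.group(1)) if m else None


def options_applied(stderr_text: str, expected: int = 40) -> bool:
    """True if the app reports the period_iterations_num we asked for."""
    return reported_period_iterations(stderr_text) == expected
```

For the aborted task above, `reported_period_iterations` would return 20, so `options_applied` is False.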
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
@Raistmer Do you think running with -v 2 will give you some hint, or is a special debug build needed? I downloaded one of TJ13's failed WUs: http://setiathome.berkeley.edu/result.php?resultid=3021008523 and ran it successfully (my GPU needed 26 minutes to reach and pass 20% progress). I used this line in mb_cmdline.txt: -period_iterations_num 40 -sbs 128 -v 3 (-hp causes too much lag for me, so I omitted it; and, unfortunately for me, -v 3 == -v 2; I was trying to cheat with an undocumented feature ;) that works for some other apps). The results: http://pastebin.com/qb2G0P77 |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Is this connected with the problem here (and is that why you want to make calls shorter with -period_iterations_num 40)? Timeout Detection and Recovery of GPUs http://msdn.microsoft.com/en-us/windows/hardware/gg487368.aspx TdrDelay: REG_DWORD. The number of seconds that the GPU is allowed to delay the preempt request from the scheduler. This is effectively the timeout threshold. The default value is 2. |
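To check what the timeout actually is on a given machine: per Microsoft's TDR documentation, TdrDelay lives under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`, and when the value is absent the default of 2 seconds applies. A read-only Python sketch using the standard `winreg` module (Windows only; elsewhere it just returns the default). The helper names are hypothetical, and nothing here changes the registry.

```python
import sys
from typing import Optional

TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
DEFAULT_TDR_DELAY_S = 2  # documented default when TdrDelay is not set


def effective_tdr_delay(raw_value: Optional[int]) -> int:
    """TdrDelay in seconds; an absent value means the documented default."""
    return DEFAULT_TDR_DELAY_S if raw_value is None else raw_value


def read_tdr_delay() -> int:
    """Read TdrDelay on Windows; on other platforms (or if unset) use the default."""
    if sys.platform != "win32":
        return effective_tdr_delay(None)
    import winreg  # standard library, Windows only
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return effective_tdr_delay(int(value))
    except FileNotFoundError:  # key or value not present
        return effective_tdr_delay(None)


if __name__ == "__main__":
    print(f"TdrDelay = {read_tdr_delay()} s")
```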
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
-v <level> will just enable more and more additional output as the app evolves. So -v 10 (for example) can be used too, but it will show only what I found useful at some stage of development. There is no special plan for the -v <level> option; new info is added only when it's needed. -hp: if it increases lag... well, then it's worth removing it from the config line. On my Trinity-based desktop I also saw lag increase with CPU cores freed (but it correlated with increased throughput on that host). So, to avoid driver restarts, it's also worth trying with a fully loaded CPU.... There are no single kernels in the app that run that long (~2 seconds), so the lags and driver restarts are caused not by a single kernel or memory transfer but by a sequence of commands. So some slowdown (such as additional output) can fix this issue too. For example, every attempt to catch the exact place of the driver restart with a debug (OCL_VERBOSE) build gave nothing: the driver restart just doesn't occur if a sync is performed after each OpenCL command. Increasing the Windows watchdog timer value, or disabling it completely, could help with these driver restarts too. There are 2 types of MB OpenCL app that should be available for this particular host: Windows/x86 7.03 (opencl_ati5_sah) 30 May 2013, 0:18:19 UTC and Windows/x86 7.03 (opencl_ati_sah). It would be interesting to know whether both cause driver restarts or not. EDIT: the last cited result was received with USE_OPENCL_HD5xxx, that is, the opencl_ati_sah plan class app. It's worth checking whether the other app causes driver restarts too. (With the quota limited for the opencl_ati_sah plan class, the SETI scheduler will start to issue work for another compatible plan class...) |
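The point that no single kernel is long, but an unsynchronized sequence of commands can be, can be illustrated with a toy model. The Python below is a simplification under stated assumptions, not how the driver or the app actually schedules work: it assumes the GPU runs queued commands back to back until the host synchronizes, and that the watchdog fires when one such busy stretch exceeds TdrDelay (2 s by default). The helper names and command durations are hypothetical.

```python
from typing import List, Sequence


def busy_stretches(durations_s: Sequence[float], sync_every: int) -> List[float]:
    """Group queued commands into batches of sync_every; return each batch's total time.

    Toy model: the GPU is assumed to run a batch back to back until the host syncs.
    """
    if sync_every < 1:
        raise ValueError("sync_every must be >= 1")
    return [sum(durations_s[i:i + sync_every])
            for i in range(0, len(durations_s), sync_every)]


def would_trip_watchdog(durations_s: Sequence[float], sync_every: int,
                        tdr_delay_s: float = 2.0) -> bool:
    """True if any uninterrupted busy stretch exceeds the watchdog timeout."""
    return max(busy_stretches(durations_s, sync_every), default=0.0) > tdr_delay_s
```

For example, 30 hypothetical 0.1 s kernels queued without a sync form one 3 s stretch and would trip a 2 s watchdog, while syncing after every command (`sync_every=1`) keeps each stretch at 0.1 s, consistent with the observation that the driver restart does not occur when syncing after each OpenCL command.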
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
@TJ13 Your host info shows that you are able to run the anonymous platform: SETI@home Enhanced 6.03 windows_intelx86 Try re-running the Lunatics installer and choosing the non-HD5 app for MultiBeam this time. Does it fix the issue? EDIT: One of the last results finished OK and has this stderr: <core_client_version>7.0.64</core_client_version> So it's the non-HD5 app with the proposed command-line options. It looks like this config is able to finish work. Or do you still have driver restarts on other tasks? If yes, try removing -hp from the config line. |
TJ13 Joined: 13 Mar 06 Posts: 35 Credit: 7,970,069 RAC: 6 |
Tried this and still experienced the problem.
I'm only having the problem on SETI@home v7 7.03 (opencl_ati_sah). Every other type of task has run successfully, if that's what you're asking. Edit: Apparently trying a fresh project got it well above the 20% threshold, but the one stuck at 20% is still stuck. I'll check back on it later to see if it's just luck or fixed for the other projects as well. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.