GPU tasks failing
Author | Message |
---|---|
Darrell Joined: 14 Mar 03 Posts: 267 Credit: 1,418,681 RAC: 0 |
Yes EOL, Boinc will learn, but it will take a while. Since you have so many ati5-cat132 tasks still in your cache that we know will fail, let's try to get them running. I have not upgraded to Windows 10, so I am going to assume that the file manager is still called Explorer and the basic text editor is still called Notepad. I will also assume that you are running BoincManager in the advanced view. Now we need to find the empty command line text file in the Seti project directory. In a typical Windows 7 install, this directory was located at: C:\ProgramData\BOINC\projects\setiathome.berkeley.edu Once you have found the directory using Explorer, scroll down and find the command line file for the ati5-cat132. The file name should look something like: mb_cmdline-8.22_windows_intel__opencl_ati5_cat132.txt Once you have found the file, double-click it to open it in Notepad. Type the following parameters on the first line. DO NOT hit the Enter key. -v -sbs 64 Save the file and exit Notepad. The changes to the command line file will not affect the task currently executing, but will take effect on the next one that starts, be it a new task or a postponed task. Open BoincManager to the Tasks tab. If tasks show "Suspended, Computer is in use" in the status column, please click on "Activity" and under GPU select "Use GPU always". Observe what happens to the GPU tasks and let me know. Edit: Looking further, your ati5_SoG_cat132 tasks are also failing, so please put the same parameters into their command line text file: mb_cmdline-8.22_windows_intel__opencl_ati5_SoG_cat132.txt ... and still I fear, and still I dare not laugh at the Mad Man! Queen - The Prophet's Song |
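For anyone who would rather not rely on Notepad for the step above, here is a minimal Python sketch that writes the parameters as one line with no trailing Enter. The helper name `write_cmdline` is made up for this illustration, and the path in the usage comment is just the Windows 7 example location from the post, so treat both as assumptions and adjust to your own install.

```python
def write_cmdline(path, params):
    """Write params to the command line file as one line, no trailing newline.

    Returns the raw bytes read back from disk so the result can be checked.
    """
    line = params.strip()  # drop any accidental Enter at the start or end
    if "\n" in line or "\r" in line:
        raise ValueError("parameters must fit on a single line")
    # newline="" stops Python from translating or adding line endings
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(line)
    with open(path, "rb") as f:
        return f.read()


# Hypothetical usage (the path is an assumption taken from the post above):
# write_cmdline(
#     r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu"
#     r"\mb_cmdline-8.22_windows_intel__opencl_ati5_cat132.txt",
#     "-v -sbs 64",
# )
```

Reading the bytes back is only a sanity check: if the returned value ends in `\r` or `\n`, something re-added a line ending.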
Darrell Joined: 14 Mar 03 Posts: 267 Credit: 1,418,681 RAC: 0 |
Hi Dave and Grant, if you look closely at her task results you will find that the ati_cat132 tasks run and complete successfully with the preference settings EOL uses. The ati5_SoG_cat132 and ati5_cat132 tasks are the ones that are being postponed due to a lack of memory. After 100 postponements, the task errors out. This is the problem that EOL originally asked for help with. Hopefully, with the right command line parameters, we can get the tasks to run. Once this problem is solved, we can offer EOL ideas on how to get the system running more smoothly. If we are unsuccessful in getting the tasks to run, EOL will have to abort any ati5_cat132 tasks until the server learns to send only ati_cat132 tasks. Or EOL will have to use an app_config or app_info file to tell the server to only send the ati_cat132 tasks. Edit: Good catch Dave, hate it when things get posted while typing a long reply:
If EOL is also running a Rosetta task or a Seti CPU task, and using the screensaver, which only runs when the computer is not in use, this would definitely be the cause of the lack of memory for the Seti GPU ati5_cat132 tasks. The ati_cat132 tasks are probably just squeezing in with the amount that is available. ... and still I fear, and still I dare not laugh at the Mad Man! Queen - The Prophet's Song |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
I would suggest running the Lunatics Installer and choosing the non-HD5 version to see if this helps. The HD 5400 only has a work group size of 128 instead of 256. We have seen quite a few people having issues with those cards. You could also try to tune the wg size via the command line. Check for files **_comandline****.txt in your project folder and add the following line. -oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128 Save as text. Make sure to edit all versions of them. Hope this helps. With each crime and every kindness we birth our future. |
eol Joined: 25 Oct 11 Posts: 8 Credit: 652,881 RAC: 1 |
Thank you all for the insight. 1. I paused all computations for Rosetta. The screen saver has always been set to blank screen, so I do not think the issue had to do with the Rosetta screensaver. 2. Following the comments of one of you, I found this inside the stderr.txt file in the BOINC/seti directory: 20:18:00 (9980): Can't open init data file - running in standalone mode 20:18:00 (9980): Can't open init data file - running in standalone mode Not using mb_cmdline.txt-file, using commandline options. Priority of worker thread raised successfully Priority of process adjusted successfully, below normal priority class used 20:18:00 (9980): Can't open init data file - running in standalone mode WARNING: init_data.xml missing OpenCL platform detected: Advanced Micro Devices, Inc. WARNING: BOINC supplied wrong platform! BOINC assigns device 0 1 slot of 64 used for this instance WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities Info: CPU affinity mask used: 2; system mask is 3 SETI@home error -5 Can't open file (work_unit.sah) in read_wu_state() errno=2 File: ..\worker.cpp Line: 136 3. The **_comandline****.txt files in my BOINC data directory are all 0 bytes and empty. Is this normal? |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
3. The **_comandline****.txt files in my BOINC data directory are all 0 bytes and empty. Is this normal? Yes, that's normal. With each crime and every kindness we birth our future. |
eol Joined: 25 Oct 11 Posts: 8 Credit: 652,881 RAC: 1 |
OK, 1. I closed BOINC, added the line "-v -sbs 64" as advised to both mb_cmdline files (ati5 and ati5 SoG), saved, relaunched BOINC, and continue to have the same problem. 2. I closed BOINC again, added the line "-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128" below the line from (1) in both mb_cmdline files (ati5 and ati5 SoG), saved, relaunched BOINC, and continue to have the same problem. |
rob smith Joined: 7 Mar 03 Posts: 22224 Credit: 416,307,556 RAC: 380 |
As far as I'm aware the command line must be a single line, and MUST NOT end with a carriage return (which is all too easy to add by mistake/force of habit). Mike is the command line guru so will probably put us both right! Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
There is no sign of oclfft planning in your stderr. Make sure it's a single line. With each crime and every kindness we birth our future. |
eol Joined: 25 Oct 11 Posts: 8 Credit: 652,881 RAC: 1 |
So it should look like this?:
-v -sbs 64 -oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128
The one I made looked like this:
-v -sbs 64
-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128
Unfortunately, as of today I will be away from the culprit computer, so I will have to ask for your patience before I can give you feedback (probably 10-15 days). Thank you all for the time and effort you dedicated to the problem! |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
So it should look like this?: Yes, but make sure there's a space between each command. With each crime and every kindness we birth our future. |
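The single-line rule discussed above (one line, a space between each parameter, no carriage return at the end) can be sketched in a few lines of Python. The function name `to_single_line` is hypothetical; it only tidies text and knows nothing about which parameters the SETI applications accept.

```python
def to_single_line(text):
    """Collapse command line text onto one line.

    str.split() with no arguments splits on any run of whitespace,
    including \r and \n, so joining the pieces with single spaces
    removes line breaks and leaves exactly one space between parameters.
    """
    return " ".join(text.split())


# The two-line version from the thread, with Windows line endings:
two_lines = (
    "-v -sbs 64\r\n"
    "-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128\r\n"
)
print(to_single_line(two_lines))
```

The printed result is the one-line form that Mike confirmed, ready to be pasted back into the mb_cmdline file.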
Kevin Morgan Joined: 1 Jul 99 Posts: 1 Credit: 2,765,186 RAC: 3 |
Hi, I've had some long-standing issues running 'opencl_ati5_cat132' and 'opencl_ati5_SoG_cat132' on a HD5450 under Windows 10, and I eventually went for the easy option and upgraded to a HD6450 (on which I have had no problems running, completing and validating work units using these applications). However, I recently revisited the problem and found the following: 1). Ran 'opencl_ati5_cat132' on HD5450; Result: Failed, Task Postponed. 2). Inserted '-spike_fft_thresh 2048 -tune 1 2 1 16' <WITH NO CARRIAGE RETURN> into file 'C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.22_windows_intel__opencl_ati5_sah.txt'; Result: Success, Work Unit ran, completed without errors and was validated. I don't think it's a memory problem, as both my HD5450 and HD6450 have 1 GB of memory with 991 MB of OpenCL Available RAM. The last three numbers in the '-tune' command line parameter, when multiplied together, should not exceed the MAX WORK GROUP SIZE (listed when you run 'clinfo') for the card. As Mike says above, for the HD5450 it is 128 and for the HD6450 it is 256. I have not tried all the permutations of these 3 numbers to see which work and which don't, but you could if you wanted to and had the time. You might like to look at C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\setiathome-8.22_windows_intelx86__opencl_ati5_sah_README_OPENCL.txt for more information. 3). Ran 'opencl_ati5_SoG_cat132' on HD5450; Result: Failed, Task Postponed. ERROR: OpenCL kernel/call 'Enqueueing kernel:pc_triplet_find_cl' call failed (-54) in file ..\analyzePoT.cpp near line 1393. 4). Inserted '-spike_fft_thresh 2048 -tune 1 2 1 16' <WITH NO CARRIAGE RETURN> into file 'C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.22_windows_intel__opencl_ati5_SoG.txt'; Result: Failed, Task Postponed. ERROR: OpenCL kernel/call 'clEnqueueNDRangeKernel(cq,Spike_logging_HD5_kernel_cl)' call failed (-55) in file ..\analyzeFuncs.cpp near line 3530.
In fact, I tried most permutations of the last three numbers that, when multiplied together, equaled 128, but nothing worked. I am wondering whether this application requires a MAX WORK GROUP SIZE greater than 128 to work. Can anyone here say for certain? If I get time I might try some of the command line parameters suggested by Mike above for the 'opencl_ati5_SoG_cat132' to see if they work for a HD5450. Regards, Kevin. |
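Kevin's rule of thumb for '-tune' (the product of the last three numbers should not exceed the card's MAX WORK GROUP SIZE) is easy to check or enumerate with a short script. The sketch below is only an illustration: the function names are made up, and restricting candidates to powers of two is an assumption of this example, not something stated in the thread or confirmed by the application README. For reference, in the standard OpenCL headers the error codes -54 and -55 from the post above are CL_INVALID_WORK_GROUP_SIZE and CL_INVALID_WORK_ITEM_SIZE, which at least fits the suspicion that work group sizing is involved.

```python
from itertools import product


def fits(x, y, z, max_wg_size):
    """True if the product of the three -tune sizes stays within the limit."""
    return x * y * z <= max_wg_size


def tune_triples(max_wg_size, exact=False):
    """List ordered (x, y, z) triples of powers of two within the limit.

    With exact=True only triples whose product equals max_wg_size are kept.
    Using powers of two as candidates is an assumption of this sketch.
    """
    sizes = []
    n = 1
    while n <= max_wg_size:
        sizes.append(n)
        n *= 2
    triples = []
    for x, y, z in product(sizes, repeat=3):
        total = x * y * z
        if total == max_wg_size or (not exact and total < max_wg_size):
            triples.append((x, y, z))
    return triples


# Kevin's working setting for the ati5 app was "-tune 1 2 1 16":
print(fits(2, 1, 16, 128))

# Candidate "-tune 1 x y z" strings for a card limited to 128:
for x, y, z in tune_triples(128, exact=True)[:5]:
    print(f"-tune 1 {x} {y} {z}")
```

Under these assumptions there are 36 ordered triples with a product of exactly 128, and 120 with a product of at most 128, so trying them all by hand is tedious but feasible.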
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.