GPU tasks failing

Profile Darrell
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1889027 - Posted: 10 Sep 2017, 11:40:06 UTC - in response to Message 1889001.  
Last modified: 10 Sep 2017, 12:22:59 UTC


Sorry for the "detour", but I thought it would be better to be clear on this.
Hopefully, BOINC will eventually figure out which tasks this specific PC can handle, or the problems will disappear as they came (I am dreaming, right?)

Yes EOL, BOINC will learn, but it will take a while. Since you still have so many ati5_cat132 tasks in your cache that we know will fail, let's try to get them running.
I have not upgraded to Windows 10, so I am going to assume that the file manager is still called Explorer and the basic text editor is still called Notepad.
I will also assume that you are running BOINC Manager in the Advanced view.

Now we need to find the empty command line text file in the SETI@home project directory. In a typical Windows 7 install, this directory is located at:
C:\ProgramData\BOINC\projects\setiathome.berkeley.edu
Once you have found the directory using Explorer, scroll down and find the command line file for ati5_cat132. The file name should look something like:
mb_cmdline-8.22_windows_intel__opencl_ati5_cat132.txt
Once you have found the file, double-click it to open it in Notepad. Type the following parameters on the first line. DO NOT press the Enter key afterwards.
-v -sbs 64
Save the file and exit Notepad.
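
For anyone who would rather script the edit than use Notepad, here is a minimal sketch, assuming Python is installed on the machine. The function name `write_cmdline` is made up for illustration; the directory, file name and parameters are the ones given above. It writes the parameters on one line and adds no line break at the end.

```python
from pathlib import Path


def write_cmdline(path, params):
    """Write the parameters as one line with no trailing newline."""
    line = " ".join(params)
    # newline="" stops Python from translating or adding line endings.
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(line)
    return line


if __name__ == "__main__":
    # Typical location from the post above; adjust if your BOINC data directory differs.
    project_dir = Path(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")
    target = project_dir / "mb_cmdline-8.22_windows_intel__opencl_ati5_cat132.txt"
    if target.exists():
        write_cmdline(target, ["-v", "-sbs", "64"])
```

The script only touches the file if it already exists, so a wrong path does nothing rather than creating a stray file.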

The changes to the command line file will not affect the task currently executing, but will take effect on the next one that starts, be it a new task or a postponed task. Open BOINC Manager to the Tasks tab. If tasks say "Suspended, Computer is in use" in the Status column, please click on "Activity" and under GPU select "Use GPU always". Observe what happens to the GPU tasks and let me know.

Edit: Looking further, your ati5_SoG_cat132 tasks are also failing, so please put the same parameters into their command line text file:
mb_cmdline-8.22_windows_intel__opencl_ati5_SoG_cat132.txt
... and still I fear, and still I dare not laugh at the Mad Man!

Queen - The Prophet's Song
ID: 1889027
Profile Darrell
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,418,681
RAC: 0
United States
Message 1889033 - Posted: 10 Sep 2017, 12:06:17 UTC
Last modified: 10 Sep 2017, 12:48:47 UTC

Hi Dave and Grant, if you look closely at her task results you will find that the ati_cat132 tasks run and complete successfully with the preference settings EOL uses. The ati5_SoG_cat132 and ati5_cat132 tasks are the ones that are being postponed due to a lack of memory. After 100 postponements, the task errors out. This is the problem that EOL originally asked for help with. Hopefully, with the right command line parameters, we can get the tasks to run. Once this problem is solved, we can offer EOL ideas on how to get the system running more smoothly.

If we are unsuccessful in getting the tasks to run, EOL will have to abort any ati5_cat132 tasks until the server learns to send only ati_cat132 tasks. Alternatively, EOL will have to use an app_config or app_info file to tell the server to send only the ati_cat132 tasks.

Edit: Good catch Dave, hate it when things get posted while typing a long reply:

Looking at your stats I see you are using Rosetta as well, is this a new project on this PC? Please try suspending Rosetta as it has a graphics display and maybe this is preventing SETI running on your GPU.

If EOL is also running a Rosetta task or a SETI CPU task, and using the screensaver (which only runs when the computer is not in use), this would definitely be the cause of the lack of memory for the SETI GPU ati5_cat132 tasks. The ati_cat132 tasks are probably just squeezing in with the amount that is available.
ID: 1889033
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1889038 - Posted: 10 Sep 2017, 12:29:04 UTC
Last modified: 10 Sep 2017, 13:06:04 UTC

I would suggest running the Lunatics Installer and choosing the non-HD5 version to see if this helps.

The HD 5400 only has a work group size of 128 instead of 256.
We have seen quite a few people having issues with those cards.

You could also try to tune the work group size via the command line.
Check for the mb_cmdline*.txt files in your project folder and add the following line.

-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128

Save as text.
Make sure to edit all versions of them.
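
If there are several command line files, a short script can apply the same line to all of them. This is only a sketch under assumptions: the glob pattern `mb_cmdline*.txt` is inferred from the file names quoted earlier in this thread, and the function names `find_cmdline_files` and `set_all_cmdlines` are made up for illustration. Note that it overwrites whatever is in each file, which is fine when the files are empty.

```python
from pathlib import Path


def find_cmdline_files(project_dir):
    """Return every mb_cmdline*.txt file in the project directory, sorted by name."""
    return sorted(Path(project_dir).glob("mb_cmdline*.txt"))


def set_all_cmdlines(project_dir, line):
    """Overwrite every command line file with the same single line.

    The line is stripped first, so no trailing line break is written.
    Returns the names of the files that were written.
    """
    changed = []
    for path in find_cmdline_files(project_dir):
        with open(path, "w", encoding="ascii", newline="") as f:
            f.write(line.strip())
        changed.append(path.name)
    return changed


if __name__ == "__main__":
    # Typical BOINC data location on Windows; adjust for your install.
    project = r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu"
    params = "-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128"
    for name in set_all_cmdlines(project, params):
        print("updated", name)
```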

Hope this helps.


With each crime and every kindness we birth our future.
ID: 1889038
eol

Joined: 25 Oct 11
Posts: 8
Credit: 652,881
RAC: 1
Greece
Message 1889091 - Posted: 10 Sep 2017, 17:24:37 UTC

Thank you all for the insight,
1. I paused all computations for Rosetta. The screensaver has always been set to a blank screen, so I do not think the issues had to do with the Rosetta screensaver.

2. Following the comments of one of you, I found this inside the stderr.txt file in the BOINC/seti directory:

20:18:00 (9980): Can't open init data file - running in standalone mode
20:18:00 (9980): Can't open init data file - running in standalone mode
Not using mb_cmdline.txt-file, using commandline options.
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
20:18:00 (9980): Can't open init data file - running in standalone mode
WARNING: init_data.xml missing
OpenCL platform detected: Advanced Micro Devices, Inc.
WARNING: BOINC supplied wrong platform!
BOINC assigns device 0
1 slot of 64 used for this instance
WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities
Info: CPU affinity mask used: 2; system mask is 3
SETI@home error -5 Can't open file
(work_unit.sah) in read_wu_state() errno=2

File: ..\worker.cpp
Line: 136

3. The mb_cmdline*.txt files in my BOINC data directory are all 0 bytes and empty. Is this normal?
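
To pick the warnings and errors out of a stderr.txt like the excerpt in item 2, a small filter can help. This is only a sketch: the function name `flag_stderr_lines` and the choice of markers ("WARNING:", "error", "can't open") are assumptions based on the lines quoted above, not an official list of what the application prints.

```python
def flag_stderr_lines(text):
    """Return the lines of a stderr log that look like warnings or errors."""
    flagged = []
    for line in text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if (
            lowered.startswith("warning:")
            or "error" in lowered
            or "can't open" in lowered
        ):
            flagged.append(stripped)
    return flagged


if __name__ == "__main__":
    sample = (
        "WARNING: init_data.xml missing\n"
        "BOINC assigns device 0\n"
        "SETI@home error -5 Can't open file\n"
    )
    for line in flag_stderr_lines(sample):
        print(line)
```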
ID: 1889091
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1889121 - Posted: 10 Sep 2017, 20:18:38 UTC

3. The mb_cmdline*.txt files in my BOINC data directory are all 0 bytes and empty. Is this normal?


Yes, that's normal.


ID: 1889121
eol

Joined: 25 Oct 11
Posts: 8
Credit: 652,881
RAC: 1
Greece
Message 1889127 - Posted: 10 Sep 2017, 21:21:06 UTC - in response to Message 1889121.  

OK,
1. I closed BOINC, added the line "-v -sbs 64" as advised to both mb_cmdline files (ati5 sah and SoG), saved them, and relaunched BOINC. I continue to have the same problem.
2. I closed BOINC again, added the line "-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128" below the line from (1) in both mb_cmdline files (ati5 sah and SoG), saved them, and relaunched BOINC. I continue to have the same problem.
ID: 1889127
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22224
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1889202 - Posted: 11 Sep 2017, 9:40:52 UTC

As far as I'm aware, the command line must be a single line and MUST NOT end with a carriage return (which is all too easy to add by mistake or force of habit). Mike is the command line guru, so he will probably put us both right!
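
A quick way to check a file against those two rules is sketched below. It assumes the rules are exactly as stated (one line, no trailing carriage return or line feed); the function name `cmdline_problems` is made up for illustration. It works on the raw bytes so that Windows line endings are not hidden by text-mode translation.

```python
def cmdline_problems(raw):
    """List formatting problems in the raw bytes of a command line file."""
    problems = []
    # A trailing CR or LF means Enter was pressed after the parameters.
    if raw.endswith((b"\r", b"\n")):
        problems.append("ends with a line break")
    # More than one line remaining after trimming means the parameters were split.
    if len(raw.strip().splitlines()) > 1:
        problems.append("spans more than one line")
    return problems


if __name__ == "__main__":
    print(cmdline_problems(b"-v -sbs 64"))
    print(cmdline_problems(b"-v -sbs 64\r\n"))
```

To check a real file, read it with `open(path, "rb").read()` and pass the result in. An empty file reports no problems.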
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1889202
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1889213 - Posted: 11 Sep 2017, 12:09:53 UTC

There is no sign of oclFFT planning in your stderr.

Make sure it's a single line.


ID: 1889213
eol

Joined: 25 Oct 11
Posts: 8
Credit: 652,881
RAC: 1
Greece
Message 1889271 - Posted: 11 Sep 2017, 17:06:45 UTC - in response to Message 1889213.  

So it should look like this?
-v -sbs 64 -oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128

The one i made looked like this:
-v -sbs 64
-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128

Unfortunately, as of today I will be away from the culprit computer, so I will have to ask for your patience before I can give you feedback (probably 10-15 days).
Thank you all for the time and effort you dedicated to the problem!
ID: 1889271
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1889316 - Posted: 11 Sep 2017, 20:39:02 UTC - in response to Message 1889271.  

So it should look like this?
-v -sbs 64 -oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128

The one i made looked like this:
-v -sbs 64
-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128

Unfortunately, as of today I will be away from the culprit computer, so I will have to ask for your patience before I can give you feedback (probably 10-15 days).
Thank you all for the time and effort you dedicated to the problem!


Yes, but make sure there's a space between each command.
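
Turning the two-line version into the one-line version can be done mechanically. The sketch below assumes the parameters only need to be separated by single spaces; the function name `merge_to_single_line` is made up for illustration.

```python
def merge_to_single_line(text):
    """Join all parameters onto one line, separated by single spaces."""
    # str.split() with no argument splits on any run of whitespace,
    # including line breaks, so the result has no newline and exactly
    # one space between tokens.
    return " ".join(text.split())


if __name__ == "__main__":
    two_lines = (
        "-v -sbs 64\r\n"
        "-oclfft_tune_gr 128 -oclfft_tune_lr 8 -oclfft_tune_wg 128\r\n"
    )
    print(merge_to_single_line(two_lines))
```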


ID: 1889316
Kevin Morgan

Joined: 1 Jul 99
Posts: 1
Credit: 2,765,186
RAC: 3
United Kingdom
Message 1895276 - Posted: 14 Oct 2017, 12:11:49 UTC

Hi,

I've had some long-standing issues running 'opencl_ati5_cat132' and 'opencl_ati5_SoG_cat132' on an HD5450 under Windows 10, and I eventually went for the easy option and upgraded to an HD6450 (on which I have had no problems running, completing and validating work units using these applications). However, I recently revisited the problem and found the following:

1). Ran 'opencl_ati5_cat132' on HD5450;
Result: Failed, Task Postponed.

2). Inserted '-spike_fft_thresh 2048 -tune 1 2 1 16' <WITH NO CARRIAGE RETURN> into file 'C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.22_windows_intel__opencl_ati5_sah.txt';
Result: Success, Work Unit ran, completed without errors and was validated.
I don't think it's a memory problem, as both my HD5450 and HD6450 have 1 GB of memory with 991 MB of OpenCL Available RAM.
The last three numbers in the '-tune' command line parameter, when multiplied together, should not exceed the MAX WORK GROUP SIZE (listed when you run 'clinfo') for the card. As Mike says above, for the HD5450 it is 128 and for the HD6450 it is 256. I have not tried all the permutations of these three numbers to see which work and which don't, but you could if you wanted to and had the time. You might like to look at C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\setiathome-8.22_windows_intelx86__opencl_ati5_sah_README_OPENCL.txt for more information.

3). Ran 'opencl_ati5_SoG_cat132' on HD5450;
Result: Failed, Task Postponed.
ERROR: OpenCL kernel/call 'Enqueueing kernel:pc_triplet_find_cl' call failed (-54) in file ..\analyzePoT.cpp near line 1393.

4). Inserted '-spike_fft_thresh 2048 -tune 1 2 1 16' <WITH NO CARRIAGE RETURN> into file 'C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.22_windows_intel__opencl_ati5_SoG.txt';
Result: Failed, Task Postponed.
ERROR: OpenCL kernel/call 'clEnqueueNDRangeKernel(cq,Spike_logging_HD5_kernel_cl)' call failed (-55) in file ..\analyzeFuncs.cpp near line 3530.
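
For reference, the two numbers in those messages are standard OpenCL error codes from the Khronos `cl.h` header: -54 is `CL_INVALID_WORK_GROUP_SIZE` and -55 is `CL_INVALID_WORK_ITEM_SIZE`. Both concern work group or work item sizing, which fits the suspicion that the card's limit of 128 is involved, though it does not prove it. The sketch below pulls the code out of a log line; the regular expression is an assumption based only on the two messages quoted above, and the function name is made up for illustration.

```python
import re

# Names as defined in the Khronos OpenCL header cl.h.
OPENCL_ERROR_NAMES = {
    -54: "CL_INVALID_WORK_GROUP_SIZE",
    -55: "CL_INVALID_WORK_ITEM_SIZE",
}


def opencl_error_name(log_line):
    """Return the OpenCL error name for a 'call failed (N)' log line, or None."""
    match = re.search(r"call failed \((-\d+)\)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    # Codes outside the small table above are reported as unknown.
    return OPENCL_ERROR_NAMES.get(code, "unknown code %d" % code)


if __name__ == "__main__":
    line = "ERROR: OpenCL kernel/call 'Enqueueing kernel:pc_triplet_find_cl' call failed (-54)"
    print(opencl_error_name(line))
```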

In fact, I tried most permutations of the last three numbers that, when multiplied together, equalled 128, but nothing worked. I am wondering whether this application requires a MAX WORK GROUP SIZE greater than 128 to work. Can anyone here say for certain?
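
For anyone who wants to try the permutations systematically, the arithmetic can be sketched as follows. It assumes, as described above, that '-tune' is followed by four numbers and that the last three are the ones multiplied together; both function names are made up for illustration. It only lists candidates and says nothing about which of them the application will actually accept.

```python
def tune_product(cmdline):
    """Return the product of the last three numbers after -tune, or None if absent."""
    tokens = cmdline.split()
    if "-tune" not in tokens:
        return None
    start = tokens.index("-tune") + 1
    values = [int(t) for t in tokens[start:start + 4]]
    if len(values) != 4:
        raise ValueError("-tune expects four numbers")
    _, x, y, z = values
    return x * y * z


def triples_with_product(target):
    """All ordered (x, y, z) of positive integers with x * y * z == target."""
    result = []
    for x in range(1, target + 1):
        if target % x:
            continue
        rest = target // x
        for y in range(1, rest + 1):
            if rest % y == 0:
                result.append((x, y, rest // y))
    return result


if __name__ == "__main__":
    max_work_group_size = 128  # HD5450 value reported in this thread
    product = tune_product("-spike_fft_thresh 2048 -tune 1 2 1 16")
    print(product, product <= max_work_group_size)
    for x, y, z in triples_with_product(max_work_group_size):
        print("-tune 1 %d %d %d" % (x, y, z))
```

The loop at the end prints one candidate '-tune' setting per triple, keeping the first number at 1 as in the example above.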

If I get time, I might try some of the command line parameters suggested by Mike above for 'opencl_ati5_SoG_cat132' to see if they work on an HD5450.

regards,

Kevin.
ID: 1895276


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.