GPU tasks failing

Message boards : Number crunching : GPU tasks failing

eol
Joined: 25 Oct 11
Posts: 8
Credit: 180,973
RAC: 1,313
Greece
Message 1887978 - Posted: 5 Sep 2017, 11:30:20 UTC

Dear all,
I am getting continuous errors while computing SETI@home v8 8.22 (opencl_ati5_cat132) WUs. The PC in question is this one:

http://setiathome.berkeley.edu/results.php?hostid=8335697&offset=0&show_names=0&state=6&appid=

By clicking on a WU I see that others who processed it successfully did so using a different application. I searched the forums and found that there was a problem with some WUs back in January, which has since been solved.

I tried removing and reinstalling BOINC and resetting the program, without success. Right now all the GPU tasks run for 15-30 sec and then get postponed.

I have not changed anything on my setup, it just started to behave like this....

Any ideas?
Thank you for your time and patience
ID: 1887978

Darrell (Project Donor)
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,375,441
RAC: 542
United States
Message 1887991 - Posted: 5 Sep 2017, 23:03:54 UTC - in response to Message 1887978.  



> I have not changed anything on my setup, it just started to behave like this....
>
> Any ideas?
> Thank you for your time and patience


From this statement, my guess would be that Windows 10 installed a new driver version that isn't quite compatible with your card's OpenCL capabilities. You might try rolling back the driver if possible.
... and still I fear, and still I dare not laugh at the Mad Man!

Queen - The Prophet's Song
ID: 1887991

eol
Joined: 25 Oct 11
Posts: 8
Credit: 180,973
RAC: 1,313
Greece
Message 1888316 - Posted: 7 Sep 2017, 6:40:26 UTC - in response to Message 1887991.  

Dear Darrell,
Thank you for the prompt reply.

Unfortunately, this is not the case:
I just checked, and I am using the same video card drivers, dating back to 2015, that I was using two weeks ago with no problems.
ID: 1888316

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8894
Credit: 115,369,168
RAC: 70,245
Australia
Message 1888321 - Posted: 7 Sep 2017, 7:06:51 UTC - in response to Message 1888316.  

> Dear Darrell,
> Thank you for the prompt reply,
>
> Unfortunately, this is not the case:
> I just checked, I am using the same drivers from the video card that i was using 2 weeks ago with no problems that date back to 2015....

Win10 can do nasty things to video drivers when it does its updates. The last Win10 update I had didn't change the video driver, but it did stop my GPU from being able to crunch. Reinstalling the very same driver I was using fixed the issue.
Grant
Darwin NT
ID: 1888321

eol
Joined: 25 Oct 11
Posts: 8
Credit: 180,973
RAC: 1,313
Greece
Message 1888427 - Posted: 7 Sep 2017, 17:51:21 UTC
Last modified: 7 Sep 2017, 17:51:52 UTC

Gentlemen,
thank you all for the insight. I have completely removed all ATI software using their tool and reinstalled the latest drivers: still no luck.
I get the same postponed messages, and errors at the end.

Is it normal that the failing WUs have been correctly crunched by different apps, and only mine with opencl_ati5_cat132 are failing?
For example:
http://setiathome.berkeley.edu/workunit.php?wuid=2665486201
http://setiathome.berkeley.edu/workunit.php?wuid=2665486164

Below is a copy of the event log. Do you notice anything strange?

7/9/2017 8:35:15 μμ | | cc_config.xml not found - using defaults
7/9/2017 8:35:15 μμ | | Starting BOINC client version 7.6.33 for windows_x86_64
7/9/2017 8:35:15 μμ | | log flags: file_xfer, sched_ops, task
7/9/2017 8:35:15 μμ | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
7/9/2017 8:35:15 μμ | | Data directory: E:\boinc data
7/9/2017 8:35:15 μμ | | Running under account bgeor
7/9/2017 8:35:18 μμ | | CAL: ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (CAL version 1.4.1848, 1024MB, 991MB available, 208 GFLOPS peak)
7/9/2017 8:35:18 μμ | | OpenCL: AMD/ATI GPU 0: ATI Radeon HD 5400/R5 210 series (Cedar) (driver version 1800.8 (VM), device version OpenCL 1.2 AMD-APP (1800.8), 1024MB, 991MB available, 208 GFLOPS peak)
7/9/2017 8:35:18 μμ | | Host name: DESKTOP-6CH48EJ
7/9/2017 8:35:18 μμ | | Processor: 2 GenuineIntel Intel(R) Celeron(R) CPU E3500 @ 2.70GHz [Family 6 Model 23 Stepping 10]
7/9/2017 8:35:18 μμ | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 syscall nx lm vmx tm2 pbe
7/9/2017 8:35:18 μμ | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.15063.00)
7/9/2017 8:35:18 μμ | | Memory: 4.00 GB physical, 8.00 GB virtual
7/9/2017 8:35:18 μμ | | Disk: 110.38 GB total, 99.61 GB free
7/9/2017 8:35:18 μμ | | Local time is UTC +3 hours
7/9/2017 8:35:18 μμ | | VirtualBox version: 5.0.18
7/9/2017 8:35:18 μμ | Rosetta@home | URL http://boinc.bakerlab.org/rosetta/; Computer ID 3254523; resource share 100
7/9/2017 8:35:18 μμ | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8335697; resource share 100
7/9/2017 8:35:18 μμ | SETI@home | General prefs: from SETI@home (last modified 18-Apr-2012 01:23:36)
7/9/2017 8:35:18 μμ | SETI@home | Host location: none
7/9/2017 8:35:18 μμ | SETI@home | General prefs: using your defaults
7/9/2017 8:35:18 μμ | | Reading preferences override file
7/9/2017 8:35:18 μμ | | Preferences:
7/9/2017 8:35:18 μμ | | max memory usage when active: 2047.62MB
7/9/2017 8:35:18 μμ | | max memory usage when idle: 3685.71MB
7/9/2017 8:35:24 μμ | | max disk usage: 55.19GB
7/9/2017 8:35:24 μμ | | don't use GPU while active
7/9/2017 8:35:24 μμ | | suspend work if non-BOINC CPU load exceeds 25%
7/9/2017 8:35:24 μμ | | (to change preferences, visit a project web site or select Preferences in the Manager)
7/9/2017 8:35:24 μμ | | Suspending GPU computation - computer is in use
7/9/2017 8:35:42 μμ | | Suspending computation - CPU is busy
7/9/2017 8:36:34 μμ | | Resuming computation
7/9/2017 8:36:55 μμ | | Suspending computation - CPU is busy
7/9/2017 8:37:05 μμ | | Resuming computation
7/9/2017 8:37:46 μμ | | Suspending computation - CPU is busy
7/9/2017 8:38:27 μμ | | Resuming computation
7/9/2017 8:38:37 μμ | | Resuming GPU computation
7/9/2017 8:38:48 μμ | | Suspending computation - CPU is busy
7/9/2017 8:38:58 μμ | | Resuming computation
7/9/2017 8:39:28 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:39:47 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:40:08 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:40:20 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:40:39 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:40:59 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:41:12 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:41:16 μμ | | Suspending GPU computation - computer is in use
7/9/2017 8:41:24 μμ | | Resuming GPU computation
7/9/2017 8:41:50 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:42:04 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:42:23 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:42:34 μμ | | Suspending computation - CPU is busy
7/9/2017 8:43:16 μμ | | Resuming computation
7/9/2017 8:43:17 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:43:26 μμ | | Suspending computation - CPU is busy
7/9/2017 8:43:37 μμ | | Resuming computation
7/9/2017 8:43:43 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:43:47 μμ | | Suspending computation - CPU is busy
7/9/2017 8:43:57 μμ | | Resuming computation
7/9/2017 8:44:10 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:44:33 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:44:48 μμ | | Suspending computation - CPU is busy
7/9/2017 8:44:49 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:45:09 μμ | | Resuming computation
7/9/2017 8:45:30 μμ | | Suspending computation - CPU is busy
7/9/2017 8:45:40 μμ | | Resuming computation
7/9/2017 8:45:43 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:46:06 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:46:31 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:46:48 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:46:53 μμ | | Suspending computation - CPU is busy
7/9/2017 8:47:13 μμ | | Resuming computation
7/9/2017 8:47:14 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:47:34 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:47:49 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:48:11 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:48:34 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:48:49 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:49:14 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:49:17 μμ | | Suspending computation - CPU is busy
7/9/2017 8:49:27 μμ | | Resuming computation
7/9/2017 8:49:37 μμ | | Suspending computation - CPU is busy
7/9/2017 8:49:47 μμ | | Resuming computation
7/9/2017 8:49:48 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:49:57 μμ | | Suspending computation - CPU is busy
7/9/2017 8:50:05 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:50:08 μμ | | Resuming computation
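
A long log like the one above is easier to judge once the events are counted. The following is a minimal sketch in Python, assuming each line has the three '|'-separated fields seen above (timestamp, project, message). The sample lines are copied from the log; the function name and the category labels are made up for illustration.

```python
from collections import Counter

# A few lines copied from the event log above, used as sample input.
SAMPLE_LOG = """\
7/9/2017 8:38:58 μμ | | Resuming computation
7/9/2017 8:39:28 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:39:47 μμ | SETI@home | task postponed 30.000000 sec:
7/9/2017 8:41:16 μμ | | Suspending GPU computation - computer is in use
7/9/2017 8:42:34 μμ | | Suspending computation - CPU is busy
"""

def tally_events(log_text):
    """Count event-log messages by kind (labels are illustrative only)."""
    counts = Counter()
    for line in log_text.splitlines():
        # Split into timestamp, project, message; skip anything else.
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) < 3:
            continue
        message = parts[2]
        if message.startswith("task postponed"):
            counts["postponed"] += 1
        elif message.startswith("Suspending GPU computation"):
            counts["gpu_suspended"] += 1
        elif message.startswith("Suspending computation"):
            counts["cpu_suspended"] += 1
        elif message.startswith("Resuming"):
            counts["resumed"] += 1
    return counts

for kind, number in sorted(tally_events(SAMPLE_LOG).items()):
    print(kind, number)
```

Pasting the full log into `SAMPLE_LOG` would show how the postponements compare in number with the "CPU is busy" suspensions.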
ID: 1888427

Darrell (Project Donor)
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,375,441
RAC: 542
United States
Message 1888457 - Posted: 7 Sep 2017, 20:09:04 UTC

Which driver version for your card is installed?

For a test, please go to the advanced view in BoincManager, click on Activity, and try setting the GPU to run always instead of based on preferences. I doubt this will work, but it is worth a try.

I think it is a driver problem, in that the driver may be too advanced for that card, which is why I asked for the version number, e.g. 13.1.2 or 15.7.1.

If the installed driver is 16.1.1 (i.e. a Crimson version) or higher, then that is the problem, as the card is incapable of using those drivers.
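
The version check described above can be written down as a small comparison. This is only a sketch in Python: the 16.1.1 threshold is the one named in this post, while the function names are invented for illustration and the check assumes purely numeric dotted versions.

```python
def parse_version(text):
    """Turn a dotted version such as '15.7.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Threshold named in the post above: 16.1.1 or higher is a problem.
THRESHOLD = parse_version("16.1.1")

def is_too_new(catalyst_version):
    """True if the version is at or above the 16.1.1 threshold."""
    return parse_version(catalyst_version) >= THRESHOLD

print(is_too_new("13.1.2"))  # False
print(is_too_new("15.7.1"))  # False
print(is_too_new("16.1.1"))  # True
```

Comparing tuples of integers avoids the trap of comparing version strings character by character.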
... and still I fear, and still I dare not laugh at the Mad Man!

Queen - The Prophet's Song
ID: 1888457

eol
Joined: 25 Oct 11
Posts: 8
Credit: 180,973
RAC: 1,313
Greece
Message 1888492 - Posted: 7 Sep 2017, 22:56:42 UTC - in response to Message 1888457.  

Thanks for the feedback.
1. I have already tried setting the GPU to always run, but the problem remains the same.

2. Drivers:
driver version: 15.20.1062.1004-150803a1-187674C
Catalyst version 15.7.1
2D version 8.01.01.1500
Direct3D 9.14.10.01128
OpenGL 6.14.10.13399
Mantle 9.1.10.0077
Mantle API not available
AMD Catalyst Control Center 2015.0804.21.41908

3. I just finished computation of WU 2668388672 (http://setiathome.berkeley.edu/workunit.php?wuid=2668388672), which used the app SETI@home v8 v8.22 (opencl_ati_cat132) windows_intelx86 with the GPU.

As far as I can tell, all the failed WUs were using the "SETI@home v8 v8.22 (opencl_ati5_cat132) windows_intelx86" app, and I also have one task with the "SETI@home v8 v8.22 (opencl_ati5_SoG_cat132) windows_intelx86" app which is exhibiting the same postponed behaviour (it will eventually fail).
Is there any way to block these tasks from downloading and download only "SETI@home v8 v8.22 (opencl_ati_cat132) windows_intelx86" tasks?

Thank you for your time and patience.

ID: 1888492

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8894
Credit: 115,369,168
RAC: 70,245
Australia
Message 1888518 - Posted: 8 Sep 2017, 2:07:19 UTC - in response to Message 1888492.  
Last modified: 8 Sep 2017, 2:21:44 UTC

> Thanx for the feedback,
> 1. I have already tried to set GPU to always run, but the problem remains the same,

In your account, under Computing Preferences, make sure that
Use at most 100 % of the CPUs
Use at most 100 % of CPU time
are both at 100%.
Under "When to suspend", make sure "Suspend when computer is in use" is not selected, and make sure "Suspend GPU computing when computer is in use" is not selected.

And I would set "Suspend when non-BOINC CPU usage is above" to blank, so that processing does not suspend when other programmes are running. If you really feel the need to suspend, then set it to something like 90%.
With your current setting, opening your email programme will stop processing. Closing your email programme will stop processing. Opening your browser will stop processing; just clicking on some links will stop processing. Opening Windows Explorer would most likely stop processing.
EDIT: if you are using local preferences, you either need to change the values there, or change them on the web page and click on "Use web Prefs" in the BOINC Manager.
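
For anyone using local preferences, the settings above could also be written as a preferences override file (the event log earlier in the thread shows the client reading one). The sketch below builds such a file's contents in Python. The element names are assumptions about BOINC's global_prefs_override.xml format and should be checked against the file your own client writes; the value 0 for "no CPU-usage limit" is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed element names for BOINC's global_prefs_override.xml.
# Verify against your own client's file before relying on them.
SUGGESTED_PREFS = {
    "run_if_user_active": "1",      # keep computing while the computer is in use
    "run_gpu_if_user_active": "1",  # keep GPU computing while the computer is in use
    "suspend_cpu_usage": "0",       # assumed: 0 means no non-BOINC CPU usage limit
    "max_ncpus_pct": "100",         # use at most 100 % of the CPUs
    "cpu_usage_limit": "100",       # use at most 100 % of CPU time
}

def build_override(prefs):
    """Return the override settings as an XML string."""
    root = ET.Element("global_preferences")
    for name, value in prefs.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

print(build_override(SUGGESTED_PREFS))
```

Changing the same values through the BOINC Manager dialogs remains the safer route; the script only makes the set of settings explicit.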


Looking at the readme file for the Lunatics installer, it appears that for your hardware, driver version 11.12 is the minimum requirement, and driver restarts with v13.1 are generally sorted by upgrading to 13.4.
I would suggest trying one of the older drivers, as it appears that running the problem applications with your hardware should be possible without issue.
If not, I wouldn't bother with GPU processing on that video card, as the processing time for the WU that did validate is over 3 hours. Even on the CPU it would probably process faster.
Grant
Darwin NT
ID: 1888518

Gene (Project Donor)
Joined: 26 Apr 99
Posts: 92
Credit: 13,027,665
RAC: 22,736
United States
Message 1888644 - Posted: 8 Sep 2017, 16:34:43 UTC

Re: GPU tasks "postponed" ...
Linux 4.9, Boinc 7.6.33, setiathome_8.22_x86_64-pc-linux-gnu__opencl_nvidia_SoG
Nvidia Linux driver 375.66

I am seeing a lot of events, logged in the stderr file as:
ERROR: OpenCL Kernel/call clGetEventProfilingInf call failed (-7) in file \
../../src/GPU_lock.cpp near line 550

Waiting 30 seconds before restart...

The task goes to "postponed" state when this happens; during that time, of course,
boincmgr starts another GPU task to fill the "empty" time. The "postponed" task,
after the 30-second delay, goes to "waiting to run" state. When its turn to run
comes around it restarts from the last checkpoint (which is likely t=0 for a newly
started work unit) and may, BUT NOT ALWAYS, encounter the same error. I have
saved a copy of a work unit that did this many times but eventually completed and
was validated. [27mr08an.12676.23385.7.34.199] but probably scrolled out of the
database by now. I am running that work unit in benchmarkV3 now, with other
application versions (8.00 cpu, 8.10_sah, 8.10_SoG) and I'll post to this thread if I
discover anything useful...
For my 750Ti a task runs about 14 seconds before the error, and with no checkpoint
saved it restarts from the beginning. On one occasion I had a task encounter this
error at 12+ minutes (90% progress) but resumed from its last checkpoint when it
ran again and completed (apparently) normally.

I have not seen this error in the 8.10...SoG version and I only recently began using
the 8.22...SoG version. So, I think it is strictly an 8.22 thing, maybe a particular
Nvidia driver issue(?).
ID: 1888644

Gene (Project Donor)
Joined: 26 Apr 99
Posts: 92
Credit: 13,027,665
RAC: 22,736
United States
Message 1888665 - Posted: 8 Sep 2017, 18:28:33 UTC

Re: Benchmark on work unit that is "postponed"... due to clGetEventProfilingInf ERROR

O.K., ran the work unit mentioned in the previous post in benchmarkV3 with these results:
(Noted that the work unit IS the 4-bit (720K file size) format.)
(NO command line options for benchmark runs. Sort of eliminates that as a potential
trouble-maker.)
  Benchmark run   Elapsed time   Result
       1           10 seconds    ERROR as described previously
       2            9 seconds    ERROR as described previously
       3           10 seconds    ERROR as described previously
       4          861 seconds    normal completion, Q=99.99% ref.app 8.00 cpu

Hopefully this gives some insight to those who are more knowledgeable about the inner
workings of the "cl" drivers and kernels.
ID: 1888665

Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2638
Credit: 48,823,770
RAC: 137,388
Australia
Message 1888667 - Posted: 8 Sep 2017, 19:06:56 UTC - in response to Message 1888427.  

. . Hi EOL

. . Since your log shows the repeated message "CPU is busy" you might look for something that is running on, and hogging, the CPU causing the task to become suspended. If you are running a virus checker then that might be the culprit. You might try stopping and restarting the virus checker. I have had one do something similar.

Stephen

.
ID: 1888667

Bill G (Project Donor)
Joined: 1 Jun 01
Posts: 653
Credit: 111,819,504
RAC: 101,096
United States
Message 1888706 - Posted: 8 Sep 2017, 22:35:02 UTC - in response to Message 1888427.  
Last modified: 8 Sep 2017, 22:35:51 UTC

> Gentlemen,
> ...............
> below is a copy of the event diary do you notice anything strange?
>
> 7/9/2017 8:35:18 μμ | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.15063.00)
> 7/9/2017 8:35:18 μμ | | Memory: 4.00 GB physical, 8.00 GB virtual
> ..................
> 7/9/2017 8:35:24 μμ | | don't use GPU while active
> 7/9/2017 8:35:24 μμ | | suspend work if non-BOINC CPU load exceeds 25%
> ..................:
> 7/9/2017 8:41:16 μμ | | Suspending GPU computation - computer is in use
> ....................

I would suggest that with W10 it is best to have a bit more memory than you do. It will run with your 4 GB, but it would run more smoothly with a bit more.
Second: you are telling the system to suspend when non-BOINC CPU load reaches only 25%. I would remove that restriction and see what happens. It would certainly cause the suspensions.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1888706

Darrell (Project Donor)
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,375,441
RAC: 542
United States
Message 1888757 - Posted: 9 Sep 2017, 1:53:42 UTC - in response to Message 1888706.  

Hi Bill, I didn't notice that, but the operating system installed and is working. So what we are dealing with here is an old two-core system with an older GPU that has just 1 GB (1024 MB) of video memory, of which 991 MB is available. The missing 33 MB is probably being used to drive the monitor. We have an old system trying to run a new, bloated operating system: doable, but not smoothly. From his preference settings, which limit the processing used by Boinc and do not use the GPU when the computer is in use, it is obvious that the system is not meant to be a dedicated cruncher, and our task is to help him get it running smoothly.

Looking at what the Boinc system (his computer and the Seti server) has sent his computer, we can see that the CPU task for SETI@home v8 v8.08 (alt) windows_x86_64 completed in 189.6 minutes, or 3.16 hours, of running time. It was started and stopped many times due to the preference settings, but it completed and validated. He also has a SETI@home v8 v8.05 windows_x86_64 CPU task that hasn't returned yet but should be ok. The problem he originally posted about is the GPU tasks that were sent and are failing. He has many tasks for SETI@home v8 v8.22 (opencl_ati5_cat132) windows_intelx86, all of which are failing, and these are the tasks that are generating the "postponed for 30 secs" messages. The line in the task results: ERROR: OpenCL kernel/call 'Enqueueing kernel:pc_triplet_find_cl' call failed (-54) in file ..\analyzePoT.cpp near line 1393 indicates that the task is running out of OpenCL memory.

I know this because, since upgrading to an MSI RX480 Armor 8 GB OC card, I get the same sort of message if I set the -sbs value in the task command line file too high. My card has 8 GB of video memory, but because the Seti OpenCL apps are only 32-bit apps, only 4 GB is seen by the apps, and AMD limits the OpenCL amount to 3 GB. On my card any -sbs setting of 1536 or higher causes the task to be postponed due to lack of memory, just like EOL's. After 100 postponements, Boinc fails the task and it returns with Exit Status: -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS.
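
For reference, the numeric codes quoted in this thread can be looked up by name. The sketch below is a small Python lookup using names as defined in the Khronos OpenCL header (CL/cl.h); the table is deliberately partial and the helper names are made up for illustration. Note that the header's name for -54 refers to the work-group size rather than to an allocation failure, so the link to memory rests on the -sbs experience described above and not on the code by itself.

```python
# A few status codes as named in the Khronos OpenCL header (CL/cl.h).
# Partial table, for illustration only.
OPENCL_STATUS = {
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -7: "CL_PROFILING_INFO_NOT_AVAILABLE",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}

def describe_opencl_status(code):
    """Return the header name for a status code, if it is in the table."""
    return OPENCL_STATUS.get(code, "unknown (not in this short table)")

def as_unsigned32(value):
    """Show a signed 32-bit exit status in the hex form printed by Boinc."""
    return f"0x{value & 0xFFFFFFFF:08X}"

print(describe_opencl_status(-54))  # code in the ati5-cat132 task results
print(describe_opencl_status(-7))   # code in the Linux/Nvidia stderr report
print(as_unsigned32(-226))          # 0xFFFFFF1E
```

The last line confirms that -226 and 0xFFFFFF1E are the same exit status written two ways.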

He has also been sent some SETI@home v8 v8.22 (opencl_ati_cat132) windows_intelx86 GPU tasks that are running ok and completing. So he could do nothing, and eventually the Boinc system will figure out not to send the ati5-cat132 tasks and only send the ati-cat132 tasks. It may be possible to work out the correct command line parameters to put into the ati5-cat132 command line file to allow those tasks to run.
... and still I fear, and still I dare not laugh at the Mad Man!

Queen - The Prophet's Song
ID: 1888757

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8894
Credit: 115,369,168
RAC: 70,245
Australia
Message 1888760 - Posted: 9 Sep 2017, 2:10:27 UTC - in response to Message 1888757.  

> From his setting of the preferences of limiting processing used by Boinc and by not using the GPU when the computer is in use, it is obvious that the system is not to be a dedicated cruncher, and our task is to help him get it running smoothly.

And having a more sensible value for those settings would go a long way to helping things along.
As I mentioned in a previous post, leaving "Suspend when non-BOINC CPU usage is above" blank and making sure "Suspend GPU computing when computer is in use" is not selected would be the best starting point. If those settings actually impact system usability, then setting "Suspend when non-BOINC CPU usage is above" to 90% would be the first thing to try.
I've run BOINC Seti on several systems over the years, some with plenty of resources, others with barely any, and it's only when playing videos that I need to suspend GPU processing to get smooth playback. CPU crunching has never had an impact, as the application's process priority is set so low that it gives way to pretty much everything else that needs CPU time.
Grant
Darwin NT
ID: 1888760

Darrell (Project Donor)
Volunteer tester
Joined: 14 Mar 03
Posts: 267
Credit: 1,375,441
RAC: 542
United States
Message 1888777 - Posted: 9 Sep 2017, 6:27:38 UTC - in response to Message 1888760.  
Last modified: 9 Sep 2017, 6:56:08 UTC

Yes, Grant, you are right that having sensible preference settings will make things smoother, even on his limited-resource computer. But EOL is also having a problem with the ati5-cat132 app, in that it doesn't have enough memory on the video card with the app's default settings. The Windows desktop manager, the console host manager, and Catalyst Control all use various amounts of video memory, which take away from the available OpenCL memory. And since I have not used Windows 10, I wouldn't be surprised if new processes had been added that also use video memory and compete with the OpenCL memory.

It is the error message inside each failed ati5-cat132 task result that shows the graphics card under Windows 10 does not have enough memory to run the ati5-cat132 app. Each "postponed for 30 secs" in the Event Log is caused by a different ati5-cat132 task. After 100 postponements the app fails the task and returns it with an error status of "too many exits". Boinc does not switch between GPU tasks every X number of minutes like it does with CPU tasks. A GPU task runs until it finishes, or fails, or, in this case, is postponed.

It might be possible to get the ati5-cat132 app to run by playing with the -sbs setting in the command line file, but even if it does run, and even with some of the other tuning parameters added, I don't think we can get it to be faster than the new Seti CPU app. This is shown in EOL's results listing, in that the ati-cat132 GPU app task that completed successfully took about 14,000 seconds to run, while the new Seti CPU app task only took 11,000 seconds. Until EOL can upgrade the GPU, a decision has to be made: to run two fast CPU tasks, or one fast CPU task and one not-so-fast GPU task.
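
The two options can be compared with some quick arithmetic. The sketch below uses the run times quoted above and assumes, purely for illustration, that tasks run back to back for a full day and that a GPU task does not slow down the CPU task running beside it; neither assumption comes from the thread.

```python
SECONDS_PER_DAY = 86_400
CPU_TASK_SECONDS = 11_000  # CPU app run time quoted above
GPU_TASK_SECONDS = 14_000  # ati-cat132 GPU app run time quoted above

def tasks_per_day(seconds_per_task, slots=1):
    """Tasks finished per day of continuous crunching (an idealisation)."""
    return slots * SECONDS_PER_DAY / seconds_per_task

# Option 1: both cores run CPU tasks.
two_cpu = tasks_per_day(CPU_TASK_SECONDS, slots=2)

# Option 2: one CPU task plus one GPU task.
cpu_plus_gpu = tasks_per_day(CPU_TASK_SECONDS) + tasks_per_day(GPU_TASK_SECONDS)

print(f"two CPU tasks:     {two_cpu:.1f} tasks/day")
print(f"one CPU + one GPU: {cpu_plus_gpu:.1f} tasks/day")
```

On these figures the two-CPU option comes out ahead, and the gap would widen if the GPU task also takes CPU time from its neighbour.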
... and still I fear, and still I dare not laugh at the Mad Man!

Queen - The Prophet's Song
ID: 1888777

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8894
Credit: 115,369,168
RAC: 70,245
Australia
Message 1888784 - Posted: 9 Sep 2017, 7:17:32 UTC - in response to Message 1888777.  

> Until EOL can upgrade the GPU, a decision has to be made, to run two fast CPU tasks, or one fast CPU task and one not-so-fast GPU task.

Which would be 2 CPU, no GPU IMHO.
Not only would it process the work faster, it would also use a lot less power to do so (even though the video card uses bugger all, it takes so long to crunch that it uses more power per WU).
Grant
Darwin NT
ID: 1888784

eol
Joined: 25 Oct 11
Posts: 8
Credit: 180,973
RAC: 1,313
Greece
Message 1889001 - Posted: 10 Sep 2017, 8:58:11 UTC

Dear all,
I would like to thank you for the time dedicated to the issues I am encountering.
Nonetheless, I feel the urge to clear up some of the points raised. As some of you pointed out, my intention was to use BOINC projects as originally intended, i.e. to use the computer when idle, not to create a supercomputer crunching numbers 24/7.
This is a completely different discussion, which would take us away from the problems I am encountering (for example: are you really helping to combat climate change when you leave a PC always on? Has anyone calculated the cost/benefit of leaving a computer always on? What are the CO2 emissions? etc.).

Unfortunately, upgrading any part of the PC is out of the question right now, since a new GPU would require a new PSU, more memory would require changing the motherboard, etc., with more environmental implications along the way, and frankly the system as it is covers our home needs (web surfing, word processing, limited video editing & some older games). So, all in all, would I really be making this world a better place if I trashed a working computer that covers my needs in order to crunch numbers?

Although I agree with the points raised regarding % of CPU usage and the other BOINC settings, I must add that this is not my issue.
My issue is that the GPU tasks mentioned above are failing no matter what the settings are, while no problems were encountered when the system was resurrected and put together approx. two months ago.

For the record, I never had any problems running BOINC on my work notebook with 4 GB of RAM, no dedicated GPU and Win10. Regarding the system that is having the problems, all I can tell you is that dedicated GPU memory usage was never near the maximum, even when it was crunching GPU tasks.

Sorry for the "detour", but I thought it would be better to be clear on this.
Hopefully BOINC will eventually figure out which tasks this PC can handle, or the problems will disappear as they came (I am dreaming, right?).
ID: 1889001

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8894
Credit: 115,369,168
RAC: 70,245
Australia
Message 1889005 - Posted: 10 Sep 2017, 9:51:08 UTC

None of the suggestions were about creating a super 24/7 cruncher, just about making better use of what you have available.
The fact is that in most cases, even when a computer is being used, the actual amount of CPU use is generally very low. However, with the settings you have, even that little-to-no use results in processing being stopped, then started, then stopped, then started, then stopped, then started, etc. And if processing is stopped less than 1 minute after the last time it was stopped, the processing that was done is lost, as the default checkpoint interval is 60 seconds.
Setting "Suspend work if non-BOINC CPU load exceeds 25%" to a more reasonable 85%, and not suspending processing while the system is in use, just makes better use of the system while you have it on, for however many hours that might be. No need to leave it running 24/7; just put it to use while it is on.
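
To put a number on the lost work, the sketch below takes five run lengths read off the event log posted earlier in the thread (the gaps between a "Resuming computation" line and the next "Suspending computation" line) and applies the 60-second checkpoint interval. It assumes, as described above, that only work up to the last checkpoint survives a suspension, and additionally assumes that the checkpoint timer restarts on every resume; the function name is made up.

```python
CHECKPOINT_INTERVAL = 60  # seconds, the default mentioned above

# Seconds of running between a resume and the next suspend, taken from
# the event log earlier in the thread (8:36:34 to 8:43:26).
RUN_LENGTHS = [21, 41, 21, 216, 10]

def kept_and_lost(run_lengths, interval=CHECKPOINT_INTERVAL):
    """Split run time into work kept at checkpoints and work lost.

    Assumes the checkpoint timer restarts at each resume and that
    anything after the last checkpoint in a run is discarded.
    """
    kept = sum((run // interval) * interval for run in run_lengths)
    lost = sum(run_lengths) - kept
    return kept, lost

kept, lost = kept_and_lost(RUN_LENGTHS)
print(f"kept {kept} s, lost {lost} s of {sum(RUN_LENGTHS)} s")
# prints: kept 180 s, lost 129 s of 309 s
```

Under these assumptions four of the five runs save nothing at all, which is the stop-start pattern described above.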

If your concern is about power usage, then you would be better off disabling the use of that GPU altogether: for as little power as it consumes, it takes so long to process work that it uses more power than the CPU would to process a WU.
Grant
Darwin NT
ID: 1889005

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,460,951
RAC: 24,345
United Kingdom
Message 1889022 - Posted: 10 Sep 2017, 11:29:28 UTC - in response to Message 1889001.  
Last modified: 10 Sep 2017, 11:30:13 UTC

> Dear all,
>
> My issue is that the GPU tasks mentioned above are failing no matter what the settings are while no problems were encountered when the system was resurected and put together aprox. two months ago.

What has changed in the last two months? If I understand your comments correctly, GPU tasks used to work and now they do not. If so, what has changed? E.g. are you using graphics-intensive applications now, or have you installed a new graphics driver?
ID: 1889022

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,460,951
RAC: 24,345
United Kingdom
Message 1889025 - Posted: 10 Sep 2017, 11:37:29 UTC - in response to Message 1889022.  
Last modified: 10 Sep 2017, 11:39:14 UTC


> What has changed in the last two months? If I understand your comments correctly GPU tasks used to work now they do not. If so what has changed, e.g. are you using graphics intensive applications now, installed a new graphics driver etc.

Looking at your stats I see you are using Rosetta as well; is this a new project on this PC? Please try suspending Rosetta, as it has a graphics display and maybe this is preventing SETI from running on your GPU.
ID: 1889025


 