GPU task stuck - cannot process anymore GPU work

David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,424,704
RAC: 24,850
United Kingdom
Message 1888225 - Posted: 6 Sep 2017, 23:13:10 UTC

I have an odd state: BOINC Manager reports GPU task as 100% progress but status of aborted. If I select this task the only option I have available is Suspend. Any ideas how to clear this task please as it is stopping any more GPU work.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8893
Credit: 115,256,325
RAC: 70,810
Australia
Message 1888272 - Posted: 7 Sep 2017, 4:00:45 UTC - in response to Message 1888225.  

I have an odd state: BOINC Manager reports GPU task as 100% progress but status of aborted. If I select this task the only option I have available is Suspend. Any ideas how to clear this task please as it is stopping any more GPU work.

Exiting & restarting BOINC doesn't help? Manually clicking on Update?
Grant
Darwin NT
David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,424,704
RAC: 24,850
United Kingdom
Message 1888297 - Posted: 7 Sep 2017, 5:34:16 UTC - in response to Message 1888272.  
Last modified: 7 Sep 2017, 5:34:53 UTC

Unfortunately, restarting BOINC does not help. Running Update is a little odd: first the work unit disappears and then it reappears. It looks like I will have to do a project reset, which would mean losing my stats in BOINC Manager. Does anybody have any other suggestions, please?
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8893
Credit: 115,256,325
RAC: 70,810
Australia
Message 1888305 - Posted: 7 Sep 2017, 5:59:35 UTC

And when suspended, there's still no option to abort?

I'd suggest exiting BOINC (again), restart it, then post the messages from the Event log here. Generally you are only blocked from getting new work if current work is suspended, or if you have uploads & downloads accumulating.
Are other WUs completing & being reported without issue?
Grant
Darwin NT
David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,424,704
RAC: 24,850
United Kingdom
Message 1888318 - Posted: 7 Sep 2017, 6:43:02 UTC - in response to Message 1888305.  

I had a close look at the event log. I think BOINC Manager is downloading new GPU work one unit at a time and instantly aborting each one, which explains what I see when doing an update: the aborted task disappears and then reappears.

07/09/2017 00:18:14 | SETI@home | Sending scheduler request: To fetch work.
07/09/2017 00:18:14 | SETI@home | Reporting 1 completed tasks
07/09/2017 00:18:14 | SETI@home | Requesting new tasks for NVIDIA GPU
07/09/2017 00:18:17 | SETI@home | Scheduler request completed: got 1 new tasks
07/09/2017 00:18:17 | SETI@home | [error] Missing coprocessor for task 16jn08ab.30762.83900.6.33.16_1; aborting

What can be causing the missing coprocessor error message?
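
For anyone sifting a longer event log for entries like the one above, here is a minimal Python sketch that pulls out the `[error]` lines. It assumes the `date time | project | message` layout seen in the lines quoted in this post and a day-first date; `parse_line` and `find_errors` are names made up for this example, not part of BOINC.

```python
import re
from datetime import datetime

# Layout inferred from the event log lines quoted in this thread (an assumption):
#   DD/MM/YYYY HH:MM:SS | project | message
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| ([^|]*) \| (.*)$")


def parse_line(line):
    """Return (timestamp, project, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Day-first date is assumed because the log comes from a UK machine.
    stamp = datetime.strptime(m.group(1), "%d/%m/%Y %H:%M:%S")
    return stamp, m.group(2).strip(), m.group(3).strip()


def find_errors(lines):
    """Collect parsed entries whose message starts with the [error] tag."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p[2].startswith("[error]")]


sample = [
    "07/09/2017 00:18:14 | SETI@home | Requesting new tasks for NVIDIA GPU",
    "07/09/2017 00:18:17 | SETI@home | Scheduler request completed: got 1 new tasks",
    "07/09/2017 00:18:17 | SETI@home | [error] Missing coprocessor for task 16jn08ab.30762.83900.6.33.16_1; aborting",
]

errors = find_errors(sample)
for stamp, project, message in errors:
    print(stamp.isoformat(), project, message)
```

Lines that do not fit the assumed layout are skipped rather than raising, so a pasted log with stray text should still go through.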
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 1824
Credit: 107,626,093
RAC: 461,462
Canada
Message 1888319 - Posted: 7 Sep 2017, 6:59:13 UTC - in response to Message 1888318.  

It seems you have 2 threads on this issue ... You moved account (data folder) to a different computer?
https://setiathome.berkeley.edu/forum_thread.php?id=81898
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8893
Credit: 115,256,325
RAC: 70,810
Australia
Message 1888320 - Posted: 7 Sep 2017, 7:00:31 UTC - in response to Message 1888318.  
Last modified: 7 Sep 2017, 7:03:04 UTC

07/09/2017 00:18:17 | SETI@home | [error] Missing coprocessor for task 16jn08ab.30762.83900.6.33.16_1; aborting
What can be causing the missing coprocessor error message?

With your computer hidden, I can only offer wild guesses.
If you're running Win10, it has a nasty habit of updating video drivers, with ones that don't work when it comes to GPU computing.
Downloading and installing the current driver from the GPU manufacturer's web site (NVIDIA) should sort it out, as long as that is the type of card actually in that system.

EDIT- posting the full log from when BOINC starts to when it first requests, and gets, and then errors out the work would be helpful.
Grant
Darwin NT
David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,424,704
RAC: 24,850
United Kingdom
Message 1888322 - Posted: 7 Sep 2017, 7:08:26 UTC - in response to Message 1888319.  
Last modified: 7 Sep 2017, 7:30:16 UTC

It seems you have 2 threads on this issue ... You moved account (data folder) to a different computer?
https://setiathome.berkeley.edu/forum_thread.php?id=81898


I tried to move to a different user account on the same computer, but that did not work. I wanted to run BOINC under a separate user account from my general one (the idea was that it would be more secure), but in Windows the GPU is only available to the user account on the console. I have since gone back to running BOINC on my main user account but now have a new problem. I don't think the two are related, as I detached and reattached all projects in BOINC Manager when I tried moving user accounts.
David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,424,704
RAC: 24,850
United Kingdom
Message 1888323 - Posted: 7 Sep 2017, 7:16:43 UTC - in response to Message 1888320.  
Last modified: 7 Sep 2017, 7:17:03 UTC

07/09/2017 00:18:17 | SETI@home | [error] Missing coprocessor for task 16jn08ab.30762.83900.6.33.16_1; aborting
What can be causing the missing coprocessor error message?

With your computer hidden, I can only offer wild guesses.
If you're running Win10, it has a nasty habit of updating video drivers, with ones that don't work when it comes to GPU computing.
Downloading and installing the current driver from the GPU manufacturer's web site (NVIDIA) should sort it out, as long as that is the type of card actually in that system.

EDIT- posting the full log from when BOINC starts to when it first requests, and gets, and then errors out the work would be helpful.


Alas, I am running Windows 10. I have suspended new work, and once the currently running tasks are finished I will try a system reboot to see if this clears things up. If that fails I will update the Nvidia driver. I think all those aborted GPU tasks have now killed me for the day, as I am now getting "SETI@home | This computer has finished a daily quota of 1 tasks".
rob smith
Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 15209
Credit: 252,573,501
RAC: 325,532
United Kingdom
Message 1888324 - Posted: 7 Sep 2017, 8:16:12 UTC

If Windows has decided to do an update then a re-boot will do nothing for your problem. Try, as has already been suggested, installing the drivers from the Nvidia website. Before doing this, make sure to turn any Windows driver update options OFF. For the installation, do a "clean installation", which is hidden under the advanced option on one of the early pages.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,424,704
RAC: 24,850
United Kingdom
Message 1888327 - Posted: 7 Sep 2017, 9:25:11 UTC - in response to Message 1888324.  
Last modified: 7 Sep 2017, 9:25:29 UTC

If Windows has decided to do an update then a re-boot will do nothing for your problem. Try, as has already been suggested, installing the drivers from the Nvidia website. Before doing this, make sure to turn any Windows driver update options OFF. For the installation, do a "clean installation", which is hidden under the advanced option on one of the early pages.


Rebooting has cleared the GPU aborted work unit issue. Windows has not installed any updates since 29th August, and BOINC had been working fine up until last night.

BOINC manager is now not downloading new GPU work units.

07/09/2017 10:17:58 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
07/09/2017 10:18:00 | SETI@home | Scheduler request completed: got 0 new tasks
07/09/2017 10:18:00 | SETI@home | No tasks sent
07/09/2017 10:18:00 | SETI@home | No tasks are available for AstroPulse v7
07/09/2017 10:18:00 | SETI@home | No tasks are available for SETI@home v8
07/09/2017 10:18:00 | SETI@home | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
07/09/2017 10:18:00 | SETI@home | Tasks for Intel GPU are available, but your preferences are set to not accept them
07/09/2017 10:18:00 | SETI@home | This computer has finished a daily quota of 1 tasks
07/09/2017 10:18:00 | SETI@home | This computer has reached a limit on tasks in progress
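
The replies above give several different reasons for "got 0 new tasks". A rough Python sketch for sorting them follows; the phrases are copied from the log lines in this post, while the category wording and the `classify` helper are this sketch's own assumptions, not BOINC terminology.

```python
# Phrases are taken from the scheduler replies quoted in this post.
# The descriptions on the right are informal labels made up for this sketch.
REASONS = [
    ("No tasks are available for", "no work of this type available right now"),
    ("preferences are set to not accept them", "work exists but preferences exclude it"),
    ("finished a daily quota", "daily task quota used up"),
    ("reached a limit on tasks in progress", "limit on tasks in progress reached"),
]


def classify(message):
    """Return an informal reason for a scheduler reply line, or None if unrecognised."""
    for phrase, reason in REASONS:
        if phrase in message:
            return reason
    return None


replies = [
    "No tasks are available for SETI@home v8",
    "Tasks for Intel GPU are available, but your preferences are set to not accept them",
    "This computer has finished a daily quota of 1 tasks",
    "This computer has reached a limit on tasks in progress",
]

for reply in replies:
    print(f"{classify(reply)}: {reply}")
```

Only the last two categories describe a restriction placed on this computer; the first two are about what the server has and what the preferences allow.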


What do the line items in red mean please? I can see work units available on the server status web page.
rob smith
Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 15209
Credit: 252,573,501
RAC: 325,532
United Kingdom
Message 1888329 - Posted: 7 Sep 2017, 9:42:54 UTC

Basically the bits you have highlighted are the result of having had so many errors in the last few hours the servers have decided that your computer has a problem, so to prevent you trashing any more they have cut your ration to a very low number.
If you have any tasks in progress (with your computer hidden we can't see the top level status of your work queues) as they complete you will find that you start to get new work. Additionally, in a few hours (I can't remember how many) the restriction will start to lift and more work will flow in your direction.

Thanks for highlighting the text you are asking about; it makes it so much easier to see your problem. Some folks just send a several-hundred-line dump and expect those trying to help to find the line in question!
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8893
Credit: 115,256,325
RAC: 70,810
Australia
Message 1888330 - Posted: 7 Sep 2017, 9:47:27 UTC - in response to Message 1888327.  

Rebooting has cleared the GPU aborted work unit issue.

Maybe.
Won't know for sure till it downloads another WU & processes it.
Grant
Darwin NT
David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,424,704
RAC: 24,850
United Kingdom
Message 1888331 - Posted: 7 Sep 2017, 9:59:29 UTC - in response to Message 1888330.  

Rebooting has cleared the GPU aborted work unit issue.

Maybe.
Won't know for sure till it downloads another WU & processes it.


Very true, I'll have to wait a while before SETI@home will send me any more GPU work units. Plenty of CPU work units to keep me going.

I'll post an update when I get my next GPU work unit.
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 49057
Credit: 878,615,446
RAC: 201,699
United States
Message 1888333 - Posted: 7 Sep 2017, 10:03:56 UTC - in response to Message 1888331.  

Rebooting has cleared the GPU aborted work unit issue.

Maybe.
Won't know for sure till it downloads another WU & processes it.


Very true, I'll have to wait a while before SETI@home will send me any more GPU work units. Plenty of CPU work units to keep me going.

I'll post an update when I get my next GPU work unit.

If I am not mistaken, you will get one per day until 10 consecutive good WUs are returned.
Unless you have a bunch in pending, in which case the first 10 that validate would do the trick as well.
I think that's how it works.
A kitty keeps loneliness away.
More meowing, less hissing. I speak meow, do you?

Have made friends in this life.
Most were cats.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8893
Credit: 115,256,325
RAC: 70,810
Australia
Message 1888334 - Posted: 7 Sep 2017, 10:10:52 UTC - in response to Message 1888333.  

If I am not mistaken, you will get one per day until 10 consecutive good WUs are returned.
Unless you have a bunch in pending, in which case the first 10 that validate would do the trick as well.
I think that's how it works.

Pretty sure the 10 valid WUs is for the runtime estimates when you first start crunching.

For the limits due to errors, 1 valid WU gets your daily quota doubled to 2, another good WU gets that doubled to 4, and so on.
Basically you have to clobber a lot of WUs to get down to that limit, but if you've got a few Pending, it doesn't take long to lift that limit if the work you do get is Validated.
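
To make the doubling concrete, here is a small Python sketch of the recovery as described above, where each validated WU doubles the daily quota. It models only this description, not the actual scheduler code; `quota_history` and the ceiling `HYPOTHETICAL_CAP` are invented for illustration, since the real maximum is not stated in this thread.

```python
def quota_history(start_quota, valid_results, cap):
    """Return the quota after each validated result, doubling each time up to cap."""
    history = [start_quota]
    for _ in range(valid_results):
        history.append(min(history[-1] * 2, cap))
    return history


# Made-up ceiling for illustration only; the real project limit is not given here.
HYPOTHETICAL_CAP = 100

history = quota_history(start_quota=1, valid_results=8, cap=HYPOTHETICAL_CAP)
print(history)
```

Under these assumptions a quota of 1 passes 50 after six validated WUs, which fits the point that the limit lifts quickly once good results come in.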
Grant
Darwin NT
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 49057
Credit: 878,615,446
RAC: 201,699
United States
Message 1888335 - Posted: 7 Sep 2017, 10:12:13 UTC - in response to Message 1888334.  

If I am not mistaken, you will get one per day until 10 consecutive good WUs are returned.
Unless you have a bunch in pending, in which case the first 10 that validate would do the trick as well.
I think that's how it works.

Pretty sure the 10 valid WUs is for the runtime estimates when you first start crunching.

For the limits due to errors, 1 valid WU gets your daily quota doubled to 2, another good WU gets that doubled to 4, and so on.
Basically you have to clobber a lot of WUs to get down to that limit, but if you've got a few Pending, it doesn't take long to lift that limit if the work you do get is Validated.

Ahh....you might be correct. Been a long long time since I had to think about such things here.
A kitty keeps loneliness away.
More meowing, less hissing. I speak meow, do you?

Have made friends in this life.
Most were cats.
David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 680
Credit: 2,424,704
RAC: 24,850
United Kingdom
Message 1888364 - Posted: 7 Sep 2017, 12:37:51 UTC - in response to Message 1888335.  

BOINC manager keeps downloading one new CPU workunit each time a CPU workunit finishes.

Maybe it is just by chance that they are CPU work units, or is the scheduler clever enough to differentiate between GPU and CPU work units? I.e., am I still being blocked for GPU work units?

I have lots of CPU work units so I have put SETI@home into "do not allow new tasks" mode until a large number of CPU work units have been cleared. Maybe when I then allow work units again I will get a GPU work unit?
rob smith
Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 15209
Credit: 252,573,501
RAC: 325,532
United Kingdom
Message 1888366 - Posted: 7 Sep 2017, 12:42:32 UTC

It doesn't always work like that :-(
You may have to wait until tomorrow before you get any tasks for the GPU. If possible, watch the first few very carefully (I know that it is not always possible to do so due to other commitments).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8893
Credit: 115,256,325
RAC: 70,810
Australia
Message 1888375 - Posted: 7 Sep 2017, 13:26:00 UTC - in response to Message 1888364.  

I have lots of CPU work units so I have put SETI@home into "do not allow new tasks" mode until a large number of CPU work units have been cleared. Maybe when I then allow work units again I will get a GPU work unit?

Best just to let it return and receive work as it can get it. Generally the more you fiddle, the longer it takes things to settle down.
Once again, having your computers hidden makes things difficult to help with.
If you've got plenty of Pending or some Inconclusive work, as they Validate then the number of WUs you can receive will increase.
Grant
Darwin NT


 