-9 result overflow



Vipin Palazhi
Joined: 29 Feb 08
Posts: 247
Credit: 103,078,124
RAC: 51,599
India
Message 1310995 - Posted: 28 Nov 2012, 11:46:45 UTC

After the scheduler crash, one of my rigs has slowly started to receive GPU work again, and I have observed that one of the GPUs has been consistently completing its work in about 15 seconds with -9 result overflow, as seen in the task list. The other seems to be crunching fine. Would that mean there is something wrong with the hardware, or has it got something to do with the settings? Any help would be greatly appreciated.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,698,563
RAC: 20,014
United Kingdom
Message 1311005 - Posted: 28 Nov 2012, 12:51:12 UTC - in response to Message 1310995.

Probably neither. After a crash, the GPU can be left in an inconsistent state, with memory corruption.

The best thing to do is to reboot the computer, as soon as possible. That normally restores normal service.

Vipin Palazhi
Joined: 29 Feb 08
Posts: 247
Credit: 103,078,124
RAC: 51,599
India
Message 1311013 - Posted: 28 Nov 2012, 13:22:13 UTC - in response to Message 1311005.


I hope that is indeed the case. I have rebooted the system, now to wait and see. Thanks Richard.

N9JFE David S
Volunteer tester
Joined: 4 Oct 99
Posts: 10739
Credit: 13,446,407
RAC: 13,855
United States
Message 1311035 - Posted: 28 Nov 2012, 14:31:48 UTC - in response to Message 1311005.

Uh, Richard, he said the scheduler crashed, not his rig.

Restarting won't hurt, though.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Vipin Palazhi
Joined: 29 Feb 08
Posts: 247
Credit: 103,078,124
RAC: 51,599
India
Message 1311059 - Posted: 28 Nov 2012, 15:59:40 UTC
Last modified: 28 Nov 2012, 16:59:40 UTC

Well, I tried restarting, but no luck. One GPU (card 2) is going through the CUDA WUs in 15 seconds each, while the other (GPU 1) is crunching fine. I put card 2 in another rig and the display was all faded out, so I am now guessing that the card itself is giving out. I have taken it off the 'production line' until I can figure this out over the weekend.

Edit: The event log says 'no usable GPUs found'.

Copyright © 2014 University of California