inconsistent gpu usage

ThomasRiley

Joined: 19 May 13
Posts: 31
Credit: 1,434,222
RAC: 0
United States
Message 1773945 - Posted: 25 Mar 2016, 15:59:40 UTC
Last modified: 25 Mar 2016, 16:02:53 UTC

Well, I've developed another problem on this computer:
https://setiathome.berkeley.edu/show_host_detail.php?hostid=7960082

My GPU usage will drop to 0 percent and at the same time the clock speed on all 4 cards will drop to 157 MHz. The usage will usually pick back up within 5-20 seconds, but sometimes a single card at random will not.

I'll log in with TeamViewer to find that a GPU has been crunching a single task for 3 hours, stuck the whole time with 8 minutes to go, with the GPU utilization at 0% and a clock speed of 157 MHz.

So far I've tried:

Disabling ULPS by editing the registry

Reseating and cleaning the connectors of the card with 97% isopropyl alcohol

Reinstalling the drivers

Reinstalling Boinc

Reinstalling Lunatics

Resetting my project.

Enabling CrossfireX

Disabling CrossfireX - physically disconnecting the cable.

Changing power option to high performance

Locking clock speed with MSI Afterburner

Locking clock speed with CCC

I don't play video games on this computer; I downloaded them for testing purposes. Games play just fine though, with no issues. It's like BOINC keeps dropping my cards. My cards seem to have no difficulty holding their speeds in other applications.




Here is my start log

3/25/2016 11:15:29 AM | | cc_config.xml not found - using defaults
3/25/2016 11:15:30 AM | | Starting BOINC client version 7.6.22 for windows_x86_64
3/25/2016 11:15:30 AM | | log flags: file_xfer, sched_ops, task
3/25/2016 11:15:30 AM | | Libraries: libcurl/7.45.0 OpenSSL/1.0.2d zlib/1.2.8
3/25/2016 11:15:30 AM | | Data directory: C:\ProgramData\BOINC
3/25/2016 11:15:30 AM | | Running under account Thomas
3/25/2016 11:16:06 AM | | CAL: ATI GPU 0: ATI Radeon HD 5800/5900 series (Cypress/Hemlock) (CAL version 1.4.1848, 1024MB, 991MB available, 4640 GFLOPS peak)
3/25/2016 11:16:06 AM | | CAL: ATI GPU 1: ATI Radeon HD 5800/5900 series (Cypress/Hemlock) (CAL version 1.4.1848, 1024MB, 991MB available, 4640 GFLOPS peak)
3/25/2016 11:16:06 AM | | CAL: ATI GPU 2: ATI Radeon HD 5800/5900 series (Cypress/Hemlock) (CAL version 1.4.1848, 1024MB, 991MB available, 4640 GFLOPS peak)
3/25/2016 11:16:06 AM | | CAL: ATI GPU 3: ATI Radeon HD 5800/5900 series (Cypress/Hemlock) (CAL version 1.4.1848, 1024MB, 991MB available, 4640 GFLOPS peak)
3/25/2016 11:16:06 AM | | OpenCL: AMD/ATI GPU 0: ATI Radeon HD 5800/5900 series (Cypress/Hemlock) (driver version 1800.8 (VM), device version OpenCL 1.2 AMD-APP (1800.8), 1024MB, 991MB available, 4640 GFLOPS peak)
3/25/2016 11:16:06 AM | | OpenCL: AMD/ATI GPU 1: ATI Radeon HD 5800/5900 series (Cypress/Hemlock) (driver version 1800.8 (VM), device version OpenCL 1.2 AMD-APP (1800.8), 1024MB, 991MB available, 4640 GFLOPS peak)
3/25/2016 11:16:06 AM | | OpenCL: AMD/ATI GPU 2: ATI Radeon HD 5800/5900 series (Cypress/Hemlock) (driver version 1800.8 (VM), device version OpenCL 1.2 AMD-APP (1800.8), 1024MB, 991MB available, 4640 GFLOPS peak)
3/25/2016 11:16:06 AM | | OpenCL: AMD/ATI GPU 3: ATI Radeon HD 5800/5900 series (Cypress/Hemlock) (driver version 1800.8 (VM), device version OpenCL 1.2 AMD-APP (1800.8), 1024MB, 991MB available, 4640 GFLOPS peak)
3/25/2016 11:16:06 AM | | OpenCL CPU: AMD FX(tm)-8350 Eight-Core Processor (OpenCL driver vendor: Advanced Micro Devices, Inc., driver version 1800.8 (sse2,avx,fma4), device version OpenCL 1.2 AMD-APP (1800.8))
3/25/2016 11:16:06 AM | SETI@home | Found app_info.xml; using anonymous platform
3/25/2016 11:16:07 AM | | Host name: SETIKILLER
3/25/2016 11:16:07 AM | | Processor: 8 AuthenticAMD AMD FX(tm)-8350 Eight-Core Processor [Family 21 Model 2 Stepping 0]
3/25/2016 11:16:07 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt lwp fma4 tce tbm topx page1gb rdtscp bmi1
3/25/2016 11:16:07 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.10240.00)
3/25/2016 11:16:07 AM | | Memory: 7.95 GB physical, 9.20 GB virtual
3/25/2016 11:16:07 AM | | Disk: 1.82 TB total, 214.08 GB free
3/25/2016 11:16:07 AM | | Local time is UTC -4 hours
3/25/2016 11:16:07 AM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7960082; resource share 100
3/25/2016 11:16:08 AM | SETI@home | General prefs: from SETI@home (last modified 20-Feb-2016 19:41:09)
3/25/2016 11:16:08 AM | SETI@home | Host location: none
3/25/2016 11:16:08 AM | SETI@home | General prefs: using your defaults
3/25/2016 11:16:08 AM | | Reading preferences override file
3/25/2016 11:16:08 AM | | Preferences:
3/25/2016 11:16:08 AM | | max memory usage when active: 4070.14MB
3/25/2016 11:16:08 AM | | max memory usage when idle: 7326.26MB
3/25/2016 11:16:13 AM | | max disk usage: 214.21GB
3/25/2016 11:16:13 AM | | max CPUs used: 4
3/25/2016 11:16:13 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:21 AM | SETI@home | Sending scheduler request: To report completed tasks.
3/25/2016 11:16:21 AM | SETI@home | Reporting 3 completed tasks
3/25/2016 11:16:21 AM | SETI@home | Requesting new tasks for AMD/ATI GPU
3/25/2016 11:16:24 AM | SETI@home | Scheduler request completed: got 2 new tasks
3/25/2016 11:16:27 AM | SETI@home | Started download of 20my10aa.29524.1356.4.31.57
3/25/2016 11:16:27 AM | SETI@home | Started download of 20my10aa.29524.1356.4.31.38
3/25/2016 11:16:30 AM | SETI@home | Finished download of 20my10aa.29524.1356.4.31.57
3/25/2016 11:16:30 AM | SETI@home | Finished download of 20my10aa.29524.1356.4.31.38




I'm not sure how to stop this. I only use this computer for SETI at the moment and have no objections if there is a better operating system that will run the HD 5970s and handle BOINC with more stability.

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15183
Credit: 4,362,181
RAC: 3
Netherlands
Message 1773981 - Posted: 25 Mar 2016, 18:20:04 UTC - in response to Message 1773945.  
Last modified: 25 Mar 2016, 18:24:59 UTC

"It's like BOINC keeps dropping my cards."

BOINC doesn't use your GPUs. Only the project's GPU science applications use your GPUs.

But do you ever read the messages log before or after you post it?

3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file

These show you have problems.
Whenever BOINC starts, it does a sanity check between the files it handles. In this case it checks whether the project and task name match between the client_state.xml file and the boinc_task_state.xml file in the different slot directories. When it finds these don't match, you'll get the above error.

This error can happen when:
1) Your computer automatically reboots upon Windows errors, while BOINC is still writing state files to disk.
2) Your computer forcibly automatically reboots after Windows Update, while BOINC is still writing state files to disk.
3) You forcibly reboot your computer, while BOINC is writing state files to disk.
4) You have hard drive / SSD problems.

Errors like these will stop those tasks from running until the problem has been fixed. As you can imagine, running a corrupt task isn't in the best interest of anyone.
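
To see how many of these errors a pasted log contains, a small script can tally them. This is only a minimal sketch in Python, assuming the event log keeps the "timestamp | project | message" layout shown in the start log above; count_errors and sample are illustrative names, not part of BOINC.

```python
from collections import Counter


def count_errors(log_text):
    """Tally '[error]' messages per (project, message) in a pasted BOINC event log."""
    counts = Counter()
    for line in log_text.splitlines():
        # Each log line is assumed to look like: timestamp | project | message
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # skip anything that is not a log line
        _timestamp, project, message = parts
        if message.startswith("[error]"):
            counts[(project, message)] += 1
    return counts


# Lines copied from the start log posted in this thread.
sample = """\
3/25/2016 11:16:13 AM | | max CPUs used: 4
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:13 AM | SETI@home | [error] no project URL in task state file
3/25/2016 11:16:21 AM | SETI@home | Sending scheduler request: To report completed tasks.
"""

for (project, message), n in count_errors(sample).items():
    print(f"{n} x {project}: {message}")
```

Running it again on the log from a later BOINC start would show whether the count has gone down after the disk checks.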

Coupled to your 'inconsistent GPU usage', I'd stop running BOINC for now and first run a system file checker on that system.
Start->in Search type CMD.exe and wait for Windows to show it in the pane, right-click on it, choose 'run as administrator'.
In the command line window type sfc /scannow and hit Enter.
Allow this to run to completion.

If any errors, please let me know what the error is.
You can right-click anywhere on the black of the command line window and choose 'Mark'. Next, hold Shift and navigate with the arrow keys to select all the text in this window. To copy the text, press Enter.
Then paste that text in an answer window here.

(A warning: if you get a message about the CBS log, reading that file requires Notepad to be opened as administrator, so there's no real need to try to open it and post its contents here.)

Next run a complete CHKDSK on that drive: open CMD.exe as administrator again, as described above. In the window that opens, type chkdsk c: /f /r and hit Enter. When it tells you it cannot run the disk check now and offers to run it at the next reboot, press Y (and possibly Enter again).

Next exit all programs and reboot the computer. When you see the message "This disk needs to be checked for consistency", allow the timer to run out so the check starts. You'd best go do something else now, as depending on the drive size this will take an hour or more.

When it finds errors on the drive, it will try to fix them automatically, or otherwise mark those sectors as bad.

Allow that to run to completion and boot back into Windows. If you want to read the Chkdsk log back in Windows, follow these instructions (you're looking for the log created under Wininit).

ThomasRiley
Message 1774011 - Posted: 25 Mar 2016, 20:57:33 UTC - in response to Message 1773981.  
Last modified: 25 Mar 2016, 21:31:08 UTC

"But do you ever read the messages log before or after you post it?"

I saw the errors; that's why I posted the log. I just didn't know what they meant. I tried googling it and did a search through the forums, but I'm not very good at finding things. :/

"Coupled to your 'inconsistent GPU usage', I'd stop running BOINC for now and first run a system file checker on that system."

Done,

"type sfc /scannow and hit Enter.
Allow this to run to completion.

If any errors, please let me know what the error is. "

Finished, with no errors

"type chkdsk c: /f /r and hit Enter"

I've been stuck on "scanning and repairing drive C: 11%" for about 2 hours. No change in %, it's just stuck at 11.


I guess I burned up another hard drive. I've got the computer mounted on a test bench and the hard drive is mounted upside down under it. I've had the HDD heat up from 40 to 65°C. The power supply is venting its heat almost directly on top of it.

I fried my first one Monday, and added a second fan to try to cool this one better.

"test bench images"
http://imgur.com/a/WB8E7


I've been building a new supercomputer case to house 3 of my 4 SETI work units, with the additional space reserved for the next computer I'll be adding to the group. My computer room has been getting extremely hot: 85 degrees or so midday, and it's 40°F here. I've been leaving the window open to keep the temps down, but it's not helping much.

http://imgur.com/a/wFuKM

I'm going to leave my unit on for another 2 hours, and if it doesn't change from 11% I'm going to force a power off and restart the scan.

I've done scans like this before and I know they hang, but they don't usually take more than 4 hours to complete for me.

Jord
Volunteer tester
Message 1774020 - Posted: 25 Mar 2016, 21:57:17 UTC - in response to Message 1774011.  

"I've been stuck on scanning and repairing drive c -11% for about 2 hours. No change in %, it's just stuck at 11."

This is normal. Allow it to continue, don't restart the computer.

What this check does is:
1. Check for errors on the drive.
2. Check the file indexes, etc.
3. Check the file security descriptors.
4. Check all files on the drive.
5. Check all free space on the drive.

It's stuck at 11% because you're only partway through the check, probably in the stage that checks the files. That percentage will go up as soon as it reaches the free space check.

Don't ever reboot the system when it's doing a check disk, that can lead to disk problems fast.

ThomasRiley
Message 1774022 - Posted: 25 Mar 2016, 22:02:56 UTC - in response to Message 1774020.  

How long should I wait then? It's been stuck at 11% for nearly 4 hours now.

Jord
Volunteer tester
Message 1774033 - Posted: 25 Mar 2016, 22:45:10 UTC - in response to Message 1774022.  
Last modified: 25 Mar 2016, 22:45:34 UTC

As long as the count of the other things is still going up, it's really not stuck.

I sat for a couple of hours looking at my HDD being at 18% and was fazed as well. But in the end it worked out. The checking of the individual files also depends on how big they are. If you have mostly files of 4KB on the drive, checking goes really fast. If they're 1GB and more, it'll go slow.

ThomasRiley
Message 1774076 - Posted: 26 Mar 2016, 4:00:35 UTC - in response to Message 1774033.  
Last modified: 26 Mar 2016, 4:03:10 UTC

Finally finished

It only took 9 hours.



Checking file system on C:
The type of the file system is NTFS.
Volume label is ServerStorage2TB.

A disk check has been scheduled.
Windows will now check the disk.

Stage 1: Examining basic file system structure ...
375552 file records processed. File verification completed.
3135 large file records processed. 0 bad file records processed.
Stage 2: Examining file name linkage ...
440316 index entries processed. Index verification completed.
0 unindexed files scanned. 0 unindexed files recovered to lost and found.
Stage 3: Examining security descriptors ...
Cleaning up 1305 unused index entries from index $SII of file 0x9.
Cleaning up 1305 unused index entries from index $SDH of file 0x9.
Cleaning up 1305 unused security descriptors.
Security descriptor verification completed.
32383 data files processed. CHKDSK is verifying Usn Journal...
35107592 USN bytes processed. Usn Journal verification completed.

Stage 4: Looking for bad clusters in user file data ...
375536 files processed. File data verification completed.

Stage 5: Looking for bad, free clusters ...
55850291 free clusters processed. Free space verification is complete.
CHKDSK discovered free space marked as allocated in the volume bitmap.

Windows has made corrections to the file system.
No further action is required.

1953410047 KB total disk space.
1729355180 KB in 188628 files.
116448 KB in 32384 indexes.
0 KB in bad sectors.
537251 KB in use by the system.
65536 KB occupied by the log file.
223401168 KB available on disk.

4096 bytes in each allocation unit.
488352511 total allocation units on disk.
55850292 allocation units available on disk.

Internal Info:
00 bb 05 00 5f 5f 03 00 70 88 06 00 00 00 00 00 ....__..p.......
83 18 00 00 2f 00 00 00 00 00 00 00 00 00 00 00 ..../...........

Windows has finished checking your disk.
Please wait while your computer restarts.
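
The space figures in a report like this can be cross-checked with a little arithmetic. Below is a rough Python sketch that uses only the numbers from the CHKDSK output above; the variable names are made up for illustration.

```python
# Figures copied from the CHKDSK report above (sizes in KB unless noted).
total_kb = 1953410047
in_files_kb = 1729355180
in_indexes_kb = 116448
bad_sectors_kb = 0
system_kb = 537251
available_kb = 223401168

bytes_per_unit = 4096
free_units = 55850292

# The used and free categories should add up to the total disk space.
accounted_kb = in_files_kb + in_indexes_kb + bad_sectors_kb + system_kb + available_kb
print(accounted_kb == total_kb)

# Free allocation units times the unit size should match "available on disk".
free_kb = free_units * bytes_per_unit // 1024
print(free_kb == available_kb)

# Free space in binary gigabytes (1 GB taken as 1024**3 bytes here).
print(round(free_units * bytes_per_unit / 1024**3, 2))
```

Both comparisons hold for the posted figures, and the total is reached without adding the 65536 KB log file separately. The report also lists 0 KB in bad sectors.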



_______________________________________________________________________

It will take me a bit to see if BOINC is still messed up, as the problem isn't constant.


Well all that down time waiting on the scan gave me some time to work on my cluster case.


Jord
Volunteer tester
Message 1774153 - Posted: 26 Mar 2016, 14:44:22 UTC - in response to Message 1774076.  


Stage 3: Examining security descriptors ...
Cleaning up 1305 unused index entries from index $SII of file 0x9.
Cleaning up 1305 unused index entries from index $SDH of file 0x9.
Cleaning up 1305 unused security descriptors.
Security descriptor verification completed.

Keep an eye out for these. If check disk starts running more often when you boot or reboot the computer, and Windows seems to be throwing away tons of index entries, and you notice that it's throwing away program files as well, then you may have:
1) a corrupt Windows installation.
2) a corrupt hard drive (file allocation table).

1) can be fixed by reinstalling Windows, perhaps through a repair (in-place) install. If this is a Windows 10 upgrade from a previous Windows version, you may want to go this route. An in-place repair installation will leave your user account and all installed programs intact and only reinstalls Windows itself. You do need a Windows DVD or ISO on a USB stick for this.

2) can be fixed by a reformat. Problem is you'll lose Windows and have to clean install it afterwards. If problems still continue, a new hard drive is needed.

ThomasRiley
Message 1774271 - Posted: 26 Mar 2016, 22:28:17 UTC - in response to Message 1774153.  
Last modified: 26 Mar 2016, 22:32:05 UTC

Something is still not right.

I have a picture this time.



While I was sleeping, device(0) went to 0% usage with a core clock of 157 MHz and didn't jump back up to 725 MHz and pick the task back up.

Notice other tasks keep dropping to 0% usage and then picking back up.


ThomasRiley
Message 1774276 - Posted: 26 Mar 2016, 22:44:35 UTC
Last modified: 26 Mar 2016, 22:59:20 UTC

I'VE GOT IT.

I know what's doing it now.

I was watching it while eating dinner when the screen flickered for a second and I got a "driver has stopped responding and has recovered" message at the corner of the screen.

It looks like the driver is incompatible with SETI?
I've reinstalled it multiple times and have verified OpenCL is definitely working.

I'm running driver 15.7.1.

Is there a more stable OS I can set this up in, or am I going to have to keep playing with drivers until BOINC magically plays nice?
I don't game and I don't do anything other than SETI@home on it.
If there is an operating system and driver combo I can try, I'm more than willing to do it.


Jord
Volunteer tester
Message 1774294 - Posted: 26 Mar 2016, 23:22:08 UTC - in response to Message 1774276.  

"It looks like the driver is incompatible with seti?"

Not necessarily. The driver can stop responding due to a myriad of possibilities. One simple fix I find when it does that is to increase the time that the timeout detection and recovery takes.

Go to Start and type regedit in the Search box. In the results, double-click regedit.exe.

Browse to and then click "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers".

Click New in the Edit menu list. From the drop-down menu, select QWORD (64-bit) value for the registry value for 64 bit Windows; for 32 bit Windows, select DWORD (32-bit) value. Then type TdrDelay as the Name and click Enter. Double-click TdrDelay, add 8 for the Value data and click OK.

This allows the timeout detection and recovery to take 8 seconds to try to recover the GPU, instead of the default 2 seconds.
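
The same change can also be written down as a .reg file, so it can be reviewed before it is imported. The Python sketch below only builds the file text and does not touch the registry. tdr_delay_reg is a made-up helper name. It assumes a DWORD value, which is the type Microsoft's TDR registry documentation lists for TdrDelay, rather than the QWORD mentioned above; treat that choice as an assumption and check it against your own Windows version.

```python
def tdr_delay_reg(seconds=8):
    """Return the text of a .reg file that sets TdrDelay to `seconds`."""
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("seconds must fit in a 32-bit unsigned value")
    key = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # Assumption: stored as a 32-bit DWORD, written as 8 hex digits.
        f'"TdrDelay"=dword:{seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


print(tdr_delay_reg(8))
```

The printed text could be saved as, say, tdr.reg and imported with administrator rights. A reboot is generally needed before a new value takes effect.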

ThomasRiley
Message 1774301 - Posted: 26 Mar 2016, 23:43:08 UTC - in response to Message 1774294.  

Like this?



I restarted BOINC and it still crashed.

I'm going to reboot.

Jord
Volunteer tester
Message 1774302 - Posted: 26 Mar 2016, 23:48:27 UTC - in response to Message 1774301.  

Are you absolutely sure all GPUs are OK?
That none of them have a hardware problem? Or that the motherboard can run 4 of these GPUs at the same time, without it needing a new BIOS update (for instance)?

Is it always the same GPU that starts having this problem?
Is it always the same GPU which has the driver crash?

Are these GPUs also used for the monitor?

ThomasRiley
Message 1774317 - Posted: 27 Mar 2016, 0:16:57 UTC - in response to Message 1774302.  
Last modified: 27 Mar 2016, 0:39:47 UTC

"Are you absolutely sure all GPUs are OK?"

They should be; they pass when I stress test them with FurMark.


"the motherboard can run 4 of these GPUs at the same time"

I'm running an MSI 970 motherboard and quad CrossFire seems to work with other applications. I do have quad CrossFire physically disabled right now though.



"Is it always the same GPU which has the driver crash? "

Not always, but it's mainly device zero.



"Are these GPUs also used for the monitor?"

Yes device zero is also the gpu more than likely responsible for the monitor display.


________________________________________________________________

Upon rebooting





_________________________________________________________________

I'm going to try this

How to disable ULPS:

1. Regedit > select HKEY_LOCAL_MACHINE -> SYSTEM
2. Hit Ctrl+F (= find), type in enableulps and hit Enter.
3. For each instance you find (F3 for the next search), change the value from 1 to 0.
4. Reboot the PC for the changes to take effect.


_______________________________________________________________

Well, it seemed to last longer before it crashed, but it still failed in the end.

Back to the beginning I suppose.

ThomasRiley
Message 1774362 - Posted: 27 Mar 2016, 5:04:51 UTC
Last modified: 27 Mar 2016, 5:15:07 UTC

OK, so I moved my system to my cluster case to test for fitment.


I hooked everything up and did a complete clean install of the driver. I tried 15.7; I was running 15.7.1 before.

I hooked the CrossFire cable back up to the GPUs and enabled quad CrossFire.

All 4 GPUs hovered around the 380 MHz range for a minute or so, then the driver crashed and device(0) went to 157 MHz while the other 3 GPUs went up to 725 MHz.

I switched the video cards when I moved the system, so the first card is now the second card and vice versa. So it's not the GPUs that are causing the issues.

With the cards switched and device(0) still being the one crashing, it has to be something else I'm overlooking.

Device(0) is more than likely the one driving the display port I'm using for my monitor.

Could BOINC not want to play nicely with my drivers and my primary display?

ThomasRiley
Message 1774415 - Posted: 27 Mar 2016, 10:57:34 UTC

EUREKA

I seem to have fixed it.

I installed Windows 7 Ultimate and driver 14.12, enabled quad CrossFire, and boom.

It's been up for an hour so far with no driver crashes and all 8 tasks running perfectly. Actually, my estimated times for my GPU tasks have gone down by 2 minutes?


Jord
Volunteer tester
Message 1774424 - Posted: 27 Mar 2016, 12:27:26 UTC - in response to Message 1774415.  

So Windows 10 isn't all that good.
Perhaps it cannot work with 4 GPUs.

But here's another thought, if it's always device 0 that does it, no matter which of the GPUs is device 0, you may be looking at the slot the videocards are in. This for future reference if it starts happening again -knocks wood it won't. :)

ThomasRiley
Message 1774498 - Posted: 27 Mar 2016, 18:20:41 UTC - in response to Message 1774424.  
Last modified: 27 Mar 2016, 18:50:44 UTC

Well I forgot to knock on wood.

Dammit



I'm not sure if the video driver is crashing since I didn't see it this time.

3 of the 4 tasks dropped to 0% usage this time.


_________________________________________________



Is my GPU usage supposed to look like this?

My GPUs aren't crashing right away; I've been sitting here at least 30 minutes and they are chugging along nicely.

Could my PSU be the problem?

I tried enabling OC Genie in the BIOS, but the computer black-screens under load, so I've been running the FX-8350 at stock clocks.

It's a 900 W off-brand PSU I bought on eBay.

Jord
Volunteer tester
Message 1774511 - Posted: 27 Mar 2016, 18:50:15 UTC - in response to Message 1774498.  
Last modified: 27 Mar 2016, 18:50:28 UTC

a) GPU heat?
b) motherboard, around slot 0, how do the capacitors on there look?
c) the motherboard has 2x PCIe x16 and 2x PCIe x1 slots, plus 2x PCI. How did you fit the 4 GPUs in there? Are they on all the PCIe slots, or did you use risers?
d) could it be an interrupt problem?

ThomasRiley
Message 1774516 - Posted: 27 Mar 2016, 19:01:32 UTC - in response to Message 1774511.  
Last modified: 27 Mar 2016, 19:02:41 UTC

"a) GPU heat?"

Fans are always at 100%.
----Card A 64°C max - haven't repasted it yet
----Card B 55°C max - repasted with Arctic MX-4

"b) motherboard, around slot 0, how do the capacitors on there look?"

Solid black caps, nothing looks out of place, no bad smells, and none of the high-pitched noise a cap makes when it's leaking.

"c) the motherboard has 2x PCIe x16 and 2x PCIe x1 slots, plus 2x PCI. How did you fit the 4 GPUs in there? Are they on all the PCIe slots, or did you use risers?"

They are HD 5970s, which are dual-GPU boards, so two cards give 4 GPUs.

My GTX 980 (top) and an HD 5970 (bottom).


"d) could it be an interrupt problem?"

Not sure. How do I check?