Panic Mode On (102) Server Problems?

Message boards : Number crunching : Panic Mode On (102) Server Problems?

Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1782588 - Posted: 26 Apr 2016, 6:34:34 UTC - in response to Message 1782547.  
Last modified: 26 Apr 2016, 6:53:17 UTC

Well, the card worked great up until v8 got into my queue, and that's when the whole shebang started.

So I'm more inclined to blame either BOINC or the Lunatics app.


Oh, anything's possible. BOINC and the applications have known quirks (though none that I know of manifest like that yet).

Simply confirming it behaves exactly the same way at 200 MHz lower will confirm your suggestion that your factory-overclocked GPU is (probably) stable, and that the problem likely lies elsewhere. Then it's easy enough to put the clock back again and remove the applications involved.

But first, there are many components involved here other than just applications and GPUs. Given that the increased precision in v8 raises the CPU demand (no GPU code changes at all, actually), are you quite sure that Pentium edition processor is enough to feed a 780?

If running multiple tasks and pegging the CPU cores, you may have better luck dialling the (CPU) load down a notch, reducing instance count (if applicable), and raising the GPU app process priority by one level.

FWIW, my 3 GHz Core2Duo does manage to fill the GTX 980, but it has never been quick enough for full loading and pegging the CPUs as well.

[Edit:]
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully


With that 2.6 GHz CPU, a free CPU core for 1-3 GPU instances, blocks per SM 8, periods per launch 200, and processpriority normal would, IMO, give the CPU a bit more breathing room and load the GPU a bit more, while still not pushing that GPU. There's always room to raise or lower the settings as well, though, if the opposite effect is noticed.
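To make the suggested change concrete, here is a minimal Python sketch that reads the current values out of the stderr excerpt above and lists what would change. The dictionary key names and helper functions are made up for this illustration and are not the application's actual configuration keys; only the values come from the post.

```python
import re

# Excerpt copied from the stderr output quoted above.
STDERR_EXCERPT = """\
SETI@home using CUDA accelerated device GeForce GTX 780
pulsefind: blocks per SM 4 (Fermi or newer default)
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
"""

# Values suggested in the post. The key names are hypothetical labels
# chosen for this sketch, not real config-file keys.
SUGGESTED = {"blocks_per_sm": 8, "periods_per_launch": 200, "priority": "NORMAL"}


def parse_settings(text):
    """Extract the three tuning values from a stderr excerpt."""
    settings = {}
    m = re.search(r"blocks per SM (\d+)", text)
    if m:
        settings["blocks_per_sm"] = int(m.group(1))
    m = re.search(r"periods per launch (\d+)", text)
    if m:
        settings["periods_per_launch"] = int(m.group(1))
    m = re.search(r"Priority of process set to (\w+)", text)
    if m:
        settings["priority"] = m.group(1)
    return settings


def diff_settings(current, suggested):
    """Map each setting that differs to a (current, suggested) pair."""
    return {
        key: (current.get(key), value)
        for key, value in suggested.items()
        if current.get(key) != value
    }


if __name__ == "__main__":
    current = parse_settings(STDERR_EXCERPT)
    for name, (old, new) in diff_settings(current, SUGGESTED).items():
        print(f"{name}: {old} -> {new}")
```

Running the same comparison on a later stderr excerpt would show whether the new values were actually picked up after a restart.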
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1782588 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1782601 - Posted: 26 Apr 2016, 7:56:22 UTC - in response to Message 1782431.  

The card is at factory default. I have no idea why nobody reads it; I have said it many times: IT IS AT DEFAULT.

The problem is that it's not at the factory default.

Base Clock: 967 MHz
Boost Clock: 1020 MHz

Those are the factory defaults; higher than the reference card defaults, but still much lower than your card is reporting.
Grant
Darwin NT
ID: 1782601 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1782607 - Posted: 26 Apr 2016, 8:37:59 UTC - in response to Message 1782582.  
Last modified: 26 Apr 2016, 8:38:13 UTC

Sorry if I caused confusion. It looks like we have two different issues on two different boards being discussed. All the 750ti stuff I mentioned is moot in relation to the board in question, a 780.


Yeah I twigged into that bit :)

Blood alcohol level was dangerously low at the time ... :)
ID: 1782607 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1782608 - Posted: 26 Apr 2016, 8:39:32 UTC - in response to Message 1782601.  

The card is at factory default. I have no idea why nobody reads it; I have said it many times: IT IS AT DEFAULT.

The problem is that it's not at the factory default.

Base Clock: 967 MHz
Boost Clock: 1020 MHz

Those are the factory defaults; higher than the reference card defaults, but still much lower than your card is reporting.

Are we sure? Whose board is it? There's a bunch of folks who make these ...
ID: 1782608 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1782609 - Posted: 26 Apr 2016, 8:39:41 UTC - in response to Message 1782607.  

Sorry if I caused confusion. It looks like we have two different issues on two different boards being discussed. All the 750ti stuff I mentioned is moot in relation to the board in question, a 780.


Yeah I twigged into that bit :)

Blood alcohol level was dangerously low at the time ... :)


That's a Panicworthy situation, one I plan to rectify here shortly :D
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1782609 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1782611 - Posted: 26 Apr 2016, 8:54:54 UTC - in response to Message 1782601.  
Last modified: 26 Apr 2016, 8:57:29 UTC

The card is at factory default. I have no idea why nobody reads it; I have said it many times: IT IS AT DEFAULT.

The problem is that it's not at the factory default.

Base Clock: 967 MHz
Boost Clock: 1020 MHz

Those are the factory defaults; higher than the reference card defaults, but still much lower than your card is reporting.

There's a lot of variation in those numbers depending on which model and whose board ... in some cases as much as 100 MHz.
Whose 780 is it? Which submodel?
ID: 1782611 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1782612 - Posted: 26 Apr 2016, 8:55:42 UTC - in response to Message 1782609.  

Sorry if I caused confusion. It looks like we have two different issues on two different boards being discussed. All the 750ti stuff I mentioned is moot in relation to the board in question, a 780.


Yeah I twigged into that bit :)

Blood alcohol level was dangerously low at the time ... :)


That's a Panicworthy situation, one I plan to rectify here shortly :D

Solved ... :)
ID: 1782612 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1782621 - Posted: 26 Apr 2016, 9:28:23 UTC - in response to Message 1782611.  

Whose 780 is it? Which submodel??

If it's a GTX xxx SC it's an EVGA.

The GTX 780 SC specs I gave are from their web site for that card.
http://www.evga.com/Products/Specs/GPU.aspx?pn=C811FD6B-7E4F-4C02-9839-9C581D7B338C

The GTX 780 Ti SC specs are
Base Clock: 1006 MHz
Boost Clock: 1072 MHz
http://www.evga.com/Products/Specs/GPU.aspx?pn=E4755EA0-12DF-4F22-ABA3-650C494D83DC

The GTX 770 SC specs are
Base Clock: 1111 MHz
Boost Clock: 1163 MHz
http://www.evga.com/Products/Specs/GPU.aspx?pn=FA93EFC9-A44B-448C-BCEA-AFF0B5BD7AEB


And the GTX 980 SC Gaming card.
Its specs are
Base Clock: 1241 MHz
Boost Clock: 1342 MHz
http://www.evga.com/Products/Specs/GPU.aspx?pn=472a80bc-9a78-45b1-8560-6fa1916330e8


Zombu2 is adamant it's a GTX 780 SC, and the clock speed is 1201 MHz.
And that's what's being reported in his stderr output.

However, that clock speed doesn't match up with the official specs for the GTX 780 SC, and it doesn't match up with the next 3 closest models either.
Hence my curiosity as to what GPU-Z reports, whether the PCIe clocks are on spec, and whether Precision X is running at all on that system.
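The comparison above can be sketched in a few lines of Python. The table holds only the base and boost figures quoted in this thread; the 5 MHz tolerance and the function names are assumptions made for this illustration. Actual boost behaviour can exceed the listed boost clock, so this is a rough sanity check and not a verdict on the card.

```python
# (base, boost) clocks in MHz, as quoted from the EVGA spec pages in this thread.
EVGA_SPECS = {
    "GTX 780 SC": (967, 1020),
    "GTX 780 Ti SC": (1006, 1072),
    "GTX 770 SC": (1111, 1163),
    "GTX 980 SC": (1241, 1342),
}


def matching_models(reported_mhz, specs, tolerance_mhz=5):
    """Return models whose base or boost clock is within tolerance of the reported clock."""
    return [
        name
        for name, (base, boost) in specs.items()
        if abs(reported_mhz - base) <= tolerance_mhz
        or abs(reported_mhz - boost) <= tolerance_mhz
    ]


def excess_over_boost(reported_mhz, model, specs):
    """Return how far (in MHz) the reported clock sits above the listed boost clock."""
    return reported_mhz - specs[model][1]


if __name__ == "__main__":
    reported = 1201  # clock rate reported in the stderr output
    print("Models matching", reported, "MHz:", matching_models(reported, EVGA_SPECS))
    print("MHz above GTX 780 SC boost:", excess_over_boost(reported, "GTX 780 SC", EVGA_SPECS))
```

With the figures from this thread, 1201 MHz matches none of the four listed cards and sits 181 MHz above the GTX 780 SC boost clock, which is why the readings from other tools are worth checking.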
Grant
Darwin NT
ID: 1782621 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1782623 - Posted: 26 Apr 2016, 9:34:16 UTC - in response to Message 1782621.  

Hence my curiosity as to what GPU-Z reports, whether the PCIe clocks are on spec, and whether Precision X is running at all on that system.

Gotcha. Guess I missed where it showed SC in stderr. My $.02 on Precision X 16 is that it gets so funky you end up not having any idea where you really are ... Thanks for clarifying ...
ID: 1782623 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1782624 - Posted: 26 Apr 2016, 9:37:03 UTC

There is also an EVGA GTX780 Classified, which is what I am running on this rig right now.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1782624 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1782627 - Posted: 26 Apr 2016, 9:40:33 UTC - in response to Message 1782624.  

There is also an EVGA GTX780 Classified, which is what I am running on this rig right now.


They're certainly built for 1200+. Yours isn't spitting out any invalids yet, is it? (I appreciate you haven't had it long, so it could be difficult to tell.)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1782627 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1782629 - Posted: 26 Apr 2016, 9:43:43 UTC - in response to Message 1782623.  

Gotcha. Guess I missed where it showed SC in stderr.

Nah, you didn't miss it. I'm going on what Zombu2 posted earlier.
Grant
Darwin NT
ID: 1782629 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1782631 - Posted: 26 Apr 2016, 9:46:10 UTC - in response to Message 1782627.  

There is also an EVGA GTX780 Classified, which is what I am running on this rig right now.


They're certainly built for 1200+. Yours isn't spitting out any invalids yet, is it? (I appreciate you haven't had it long, so it could be difficult to tell.)

None so far, and it's been OC'd to 1228 MHz since last night.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1782631 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1782632 - Posted: 26 Apr 2016, 9:47:02 UTC - in response to Message 1782624.  

There is also an EVGA GTX780 Classified, which is what I am running on this rig right now.


Base Clock: 993 MHz
Boost Clock: 1046 MHz
http://www.evga.com/Products/Specs/GPU.aspx?pn=F1A51A74-30C8-4135-8456-969ED9B0F77A
Grant
Darwin NT
ID: 1782632 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1782638 - Posted: 26 Apr 2016, 9:55:47 UTC
Last modified: 26 Apr 2016, 9:56:52 UTC

Now 1241, +100 MHz.
I don't think it's even breaking a sweat yet.
58°C with dual fans at full speed.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1782638 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1782640 - Posted: 26 Apr 2016, 10:10:31 UTC - in response to Message 1782638.  

Now 1241, +100 MHz.
I don't think it's even breaking a sweat yet.
58°C with dual fans at full speed.



They have cherry-picked parts (GPU die, VRAM chips, capacitors, etc.), beefed-up power circuitry and thermals, and special firmware.

I wouldn't be surprised if you can pass 1400 with volts etc., and end up faster than at least reference 980s (even if at some relative power/thermal disadvantage, it should be able to take more).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1782640 · Report as offensive
Profile Zombu2
Volunteer tester

Send message
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1782646 - Posted: 26 Apr 2016, 10:30:56 UTC - in response to Message 1782557.  
Last modified: 26 Apr 2016, 10:32:26 UTC

I looked at your last AP and it shows the clock rate as 1019, which is about where it should be. I then went to the same time frame as that AP and found the CUDA tasks around a "normal" clock rate. Looking ahead and backwards from that point, it appears that when the machine is restarted it uses a different clock rate. It will use that rate until it is again restarted. I looked at other 780s and noticed their rate was pretty consistent. So, the question is: what is changing your clock rate after a reboot?

Looking as far back as possible it seems it was working fine with Version 8 CUDA;
Received 29 Feb 2016, 16:04:27 UTC, GPU current clockRate = 1123 MHz
1123/24 seems to be a consistent rate on a few machines.
This is where the trouble begins, Received 19 Apr 2016, 3:23:31 UTC, GPU current clockRate = 1215 MHz
That task began as 1123 and after a restart it was 1215.
It continued as 1215 until it was restarted here at 1097;
Received 21 Apr 2016, 19:35:13 UTC, GPU current clockRate = 1097 MHz
Then it worked fine until it was restarted here, https://setiathome.berkeley.edu/result.php?resultid=4879467266
Until the next restart it was bad news while clocked at 1201, https://setiathome.berkeley.edu/result.php?resultid=4879629574
Here it was restarted at 1136, https://setiathome.berkeley.edu/result.php?resultid=4883014586
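The restart-by-restart comparison above can be reproduced with a short Python sketch. It assumes the stderr lines have already been collected into a list; the three sample lines are the dated entries quoted above, and the function names are made up for this illustration.

```python
import re

# Pattern for the clock line as it appears in the quoted stderr output.
CLOCK_RE = re.compile(r"GPU current clockRate = (\d+) MHz")

# The three dated entries quoted above, in chronological order.
LOG_LINES = [
    "Received 29 Feb 2016, 16:04:27 UTC, GPU current clockRate = 1123 MHz",
    "Received 19 Apr 2016, 3:23:31 UTC, GPU current clockRate = 1215 MHz",
    "Received 21 Apr 2016, 19:35:13 UTC, GPU current clockRate = 1097 MHz",
]


def clock_rates(lines):
    """Return the clock rate (MHz) from every line that reports one."""
    rates = []
    for line in lines:
        match = CLOCK_RE.search(line)
        if match:
            rates.append(int(match.group(1)))
    return rates


def clock_changes(rates):
    """Return (previous, current, delta) for each consecutive change in clock rate."""
    return [(a, b, b - a) for a, b in zip(rates, rates[1:]) if a != b]


if __name__ == "__main__":
    rates = clock_rates(LOG_LINES)
    for previous, current, delta in clock_changes(rates):
        print(f"{previous} MHz -> {current} MHz ({delta:+d} MHz)")
    print("Spread:", max(rates) - min(rates), "MHz")
```

Lining each change up against the task results would show whether the invalids cluster at the higher clock rates, as the entries above suggest.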


Now here is something I can go on ....
Indeed, it is quite weird that it would do that ... I wonder if Precision is going haywire ... I checked everything (KBoost, overboost, overvoltage, etc.): everything is off and at default except the fan, which is running at 100%. I'm going to reboot it a couple of times and see if that changes it, then do a couple of power cycles (turn on, shut down, pull power from it) and reinstall a clean driver.

@Jason_gee
The early Titan cards are based on the 780 chip, so I bet if you get a good one you could ... gotta look at the ASIC.
I came down with a bad case of i don't give a crap
ID: 1782646 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1782678 - Posted: 26 Apr 2016, 14:07:13 UTC - in response to Message 1782646.  
Last modified: 26 Apr 2016, 14:10:04 UTC

I looked at your last AP and it shows the clock rate as 1019, which is about where it should be. I then went to the same time frame as that AP and found the CUDA tasks around a "normal" clock rate. Looking ahead and backwards from that point, it appears that when the machine is restarted it uses a different clock rate. It will use that rate until it is again restarted. I looked at other 780s and noticed their rate was pretty consistent. So, the question is: what is changing your clock rate after a reboot?

Looking as far back as possible it seems it was working fine with Version 8 CUDA;
Received 29 Feb 2016, 16:04:27 UTC, GPU current clockRate = 1123 MHz
1123/24 seems to be a consistent rate on a few machines.
This is where the trouble begins, Received 19 Apr 2016, 3:23:31 UTC, GPU current clockRate = 1215 MHz
That task began as 1123 and after a restart it was 1215.
It continued as 1215 until it was restarted here at 1097;
Received 21 Apr 2016, 19:35:13 UTC, GPU current clockRate = 1097 MHz
Then it worked fine until it was restarted here, https://setiathome.berkeley.edu/result.php?resultid=4879467266
Until the next restart it was bad news while clocked at 1201, https://setiathome.berkeley.edu/result.php?resultid=4879629574
Here it was restarted at 1136, https://setiathome.berkeley.edu/result.php?resultid=4883014586


Now here is something I can go on ....
Indeed, it is quite weird that it would do that ... I wonder if Precision is going haywire ... I checked everything (KBoost, overboost, overvoltage, etc.): everything is off and at default except the fan, which is running at 100%. I'm going to reboot it a couple of times and see if that changes it, then do a couple of power cycles (turn on, shut down, pull power from it) and reinstall a clean driver.

@Jason_gee
The early Titan cards are based on the 780 chip, so I bet if you get a good one you could ... gotta look at the ASIC.

Based on the figures others have posted and what I am seeing with my 750 Ti FTW running well over the spec'd clock rate, I would guess that EVGA has not implemented a clock cap for the boost. It is likely only being throttled by thermal and voltage limits.
While my 750 Ti FTW is running at 1345 MHz, GPU-Z reports "GPU performance is being limited due to insufficient voltage". If you have a tool that allows you to adjust the voltage, you could try lowering it until the clock is closer to the value in the specs.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1782678 · Report as offensive
Profile betreger Project Donor
Avatar

Send message
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1782693 - Posted: 26 Apr 2016, 22:03:09 UTC
Last modified: 26 Apr 2016, 22:04:23 UTC

The SSP shows we are still down, but I was able to report 15 tasks and get 13 in return, and the message boards are up. Now the SSP shows it is up; maybe the SSP is a bit lazy.
ID: 1782693 · Report as offensive
Profile Zombu2
Volunteer tester

Send message
Joined: 24 Feb 01
Posts: 1615
Credit: 49,315,423
RAC: 0
United States
Message 1782701 - Posted: 26 Apr 2016, 22:16:36 UTC
Last modified: 26 Apr 2016, 22:28:31 UTC

Well, it seems the issue fixed itself: not a single invalid since the 24th, and I have not touched the configs, since I was quite busy the last couple of days.

EDIT:
It seems I might have found the issue while looking at the logs.
It seems v7 tried to run v8 tasks .... Why, I dunno, but I stumbled over it by accident.
I came down with a bad case of i don't give a crap
ID: 1782701 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.