Panic Mode On (102) Server Problems?
Author | Message |
---|---|
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I'm about 600 tasks short of a full boat here myself at the moment. Kitties are not getting their proper ration of kibbles. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
It appears that the boys in the lab have done something to prevent downloads of (Nvidia) GPU WUs. If not all, then at least the GB ones and there are none available from Arecibo. Downloads have been working normal for me so I do not think I have done anything to prevent them from downloading. I have currently 100 CPU WUs in progress but only 80 GPU WUs despite asking for them. As we know, VLAR tasks are not sent to GPUs. GBT data is expected to be mostly VLARs. If we are also on data sets from Arecibo that generate mostly VLARs, then there will be few tasks for GPUs. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[ |
Zombu2 Send message Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
"First thing I would do with that GPU is set it back to the Nvidia defaults:" The card is at factory default. I got no idea why nobody reads it. I have said it many times: IT IS AT DEFAULT. I came down with a bad case of i don't give a crap |
rob smith Send message Joined: 7 Mar 03 Posts: 22186 Credit: 416,307,556 RAC: 380 |
...because you keep on having the same problem :-( The other thing you might consider is evicting the dust bunnies - they breed when we aren't looking. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Zombu2 Send message Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
"...because you keep on having the same problem :-(" Yeah, dust bunnies do sneak in, but this machine is blown out on a weekly basis, and so are all the other machines. I have been running the MSI burn-in test now for 6 hours with no artifacts, driver crashes, or anything else for that matter. The card is running at a nice 60C. I came down with a bad case of i don't give a crap |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
So... when you combine 4 of these guppi's, do you get a gouldfish then? ;-) |
Zombu2 Send message Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
Maybe a dead one with 3 eyes I came down with a bad case of i don't give a crap |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
"All i can tell you is that none of the cards are overclocked they all on default" I think it's really hard to know for sure what those core clock readings really mean. One of the cards on my T7400 is an NVIDIA reference GTX780. It runs at default settings, no overclocking. Looking at the Graphics Card tab in GPU-Z, the GPU Clock = 863 MHz and the Boost Clock = 902 MHz. However, looking at the Sensors tab, the GPU Core Clock speed is shown as 1019 MHz. That's the same as what shows up in the Stderr for a Cuda50 task. Precision X and Open Hardware Monitor also report the 1019 MHz value. I remember an exchange with Jason a couple of months ago that touched on this sort of discrepancy, and I don't think any conclusions were reached. Oh, and my GTX780 doesn't appear to be throwing any errors at all, even on MESSIER031 tasks, so I would think you do have a hardware problem to deal with. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I think the suspicions that something is triggering an excessive boost clock (either some software issue, firmware/hardware, or a manual setting in Precision etc.) are likely right. A >20% overclock on those isn't out of the realm of possibility (factory, auto boost or not), though you'd expect to need watercooling, beefed-up power, and increased GPU voltages to achieve it, rather than an out-of-the-box boost clock doing it with fan cooling. The application reading is taken right in the middle of a computationally intense portion of code that is sufficient to trigger the (normal) boost functionality. It uses the same API that is available to GPU-Z and Precision, so they should read the same peak (unless there is something wrong). The eventual boost value is determined by a complex array of GPU internal sensors (I think about 23 different metrics, IIRC) and a curve in the firmware set by the manufacturer, with overrides by the monitoring/OC software (like Precision and others). Aside from the possible software/settings issues, it's possible that the particular GPU model, being factory OC'd, has an aggressive/optimistic boost curve, that one or more sensors is dicey, or that some other element of the GPU is weak. It'd be impossible to know which, if any, of these is to blame, other than by manually reducing the clock offset so that boost drops to factory or even reference card spec. If results/normal operation come good, then you can just say 'it was software, GPU manufacturer, or something else, but it works now'. It is [very] unlikely the MHz reading is incorrect, so getting that frequency to drop to normal levels will be the thing to prove something has been forcing the clock inappropriately high (assuming it comes good). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
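To put the ">20%" figure in context, here is a rough sketch of the arithmetic, using only the numbers Jeff Buck reported earlier in the thread for a reference GTX780 at default settings. The function name `percent_over` is made up for illustration, and which reference clock the 20% is measured against is an assumption; the post does not say.

```python
def percent_over(observed_mhz: float, rated_mhz: float) -> float:
    """Return how far observed_mhz sits above rated_mhz, in percent."""
    return (observed_mhz - rated_mhz) / rated_mhz * 100.0


# Figures reported in the thread for a reference GTX780 (no overclock).
BASE_MHZ = 863       # GPU-Z "GPU Clock"
BOOST_MHZ = 902      # GPU-Z "Boost Clock"
OBSERVED_MHZ = 1019  # GPU-Z Sensors tab and Cuda50 stderr

over_base = percent_over(OBSERVED_MHZ, BASE_MHZ)
over_boost = percent_over(OBSERVED_MHZ, BOOST_MHZ)

print(f"{over_base:.1f}% above base clock")
print(f"{over_boost:.1f}% above rated boost clock")
```

Measured this way, a healthy reference card under load already reads well above its rated boost, so a raw MHz value only means something when compared with the same card's usual loaded reading.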
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
My 750Ti's are factory clocked at 1228; someone told me they should be at 1200. It never runs invalids though. My problem is, when I run AP tasks (and am using the computer) it frequently crashes :( If I'm not using the computer it runs all night with no problems. It is on my to-do list to try downclocking during the next AP run. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"My 750Ti's are factory clocked at 1228, someone told me they should be at 1200. It never runs invalids though." Could still possibly be a similar situation: an aggressive/optimistic factory boost curve. The reasons factories do this relate to marketing/competition for what is sold as a gaming card, high volume in the 750ti case. (HPC Tesla compute devices are built, binned and specced differently, with no vendor tweaks allowed; IIRC still only by nVidia themselves.) Under stress (such as when using the host while crunching) every GPU+host will behave a bit differently. Assuming available application settings have already been explored (reducing pressure), one possible mechanism for AP or MB crashes under contention with the user/display could be GPU memory or PCIe saturation, inducing driver crashes through OS timeouts in the latter case, or excessive memory errors in the first case. If reducing the GPU memory+core boost offsets doesn't help, then sometimes a small voltage bump can be all that's needed (if temps/cooling and power allow). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
"My 750Ti's are factory clocked at 1228, someone told me they should be at 1200. It never runs invalids though." All depends on whose board and which flavor of 750ti. I have 4 machines, each with 2x EVGA 750ti. 6 are SCs, 2 are FTWs. Listed as machine, card, clock (MHz): 1, SC(0), 1320; 1, SC(1), 1333; 2, FTW(0), 1360; 2, SC(1), 1306; 3, FTW(0), 1345; 3, SC(1), 1320; 4, SC(0), 1333; 4, SC(1), 1333 |
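As a quick way to compare the two card flavors, the clock list in the post above can be grouped by model. This is only a sketch: the column meanings (machine number, card model with an index in parentheses, clock in MHz) are inferred from the post, and the variable names are made up.

```python
from collections import defaultdict
from statistics import mean

# Rows as posted: machine, card(index), clock. Column meanings are inferred.
ROWS = """\
1 , SC(0) ,1320
1 , SC(1) ,1333
2 , FTW(0),1360
2 , SC(1) ,1306
3 , FTW(0),1345
3 , SC(1) ,1320
4 , SC(0) ,1333
4 , SC(1) ,1333
"""

clocks_by_model = defaultdict(list)
for line in ROWS.splitlines():
    machine, card, clock = (field.strip() for field in line.split(","))
    model = card.split("(")[0]  # "SC(0)" -> "SC"
    clocks_by_model[model].append(int(clock))

for model, clocks in sorted(clocks_by_model.items()):
    print(f"{model}: n={len(clocks)} min={min(clocks)} "
          f"max={max(clocks)} mean={mean(clocks):.1f} MHz")
```

Even within one model the figures differ by a few tens of MHz from card to card, which fits the point that there is no single "correct" 750ti clock.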
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Those clocks suggest the previous example 750ti isn't clocked all that high compared to many. Freeing CPUs (effectively reducing contention) seems logical, as does adjusting settings for the crashing AP app. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Zombu2 Send message Joined: 24 Feb 01 Posts: 1615 Credit: 49,315,423 RAC: 0 |
Well, the card worked great up until v8 got into my queue, and that's when the whole shebang started, so I'm more inclined to blame either BOINC or the Lunatics app. I came down with a bad case of i don't give a crap |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
@Jason Yeah, I have tried shutting down cores and only running 1 AP task, and removing/changing the command line, without success. It only seems to freeze/lock up when I move the mouse. I did finally get my iGPU working (it was a stubborn bugger) so I want to try using that as my main display, and turn off the 750. That should relieve some strain. If that doesn't work then it's downclocking time, or maybe power as you suggested. It would be nice to have a reliable supply of APs for testing; it's just frustrating that it works great for the few hours I'm here, next time it crashes, and then I forget what I changed before I ever see another task. So in the meantime, I have been trying to let them kick in at night. |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
I think there's no argument with the fact that V8 is more demanding of resources than V7 was. I know I had to dial it back a bit on my weakest machine. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I looked at your last AP and it shows the clock rate as 1019, which is about where it should be. I then went to the same time frame as that AP and found the CUDA tasks around a "normal" clock rate. Looking ahead and backwards from that point, it appears that when the machine is restarted it uses a different clock rate. It will use that rate until it is again restarted. I looked at other 780s and noticed their rate was pretty consistent. So, the question is: what is changing your clock rate after a reboot? Looking as far back as possible, it seems it was working fine with Version 8 CUDA: Received 29 Feb 2016, 16:04:27 UTC, GPU current clockRate = 1123 MHz. 1123/24 seems to be a consistent rate on a few machines. This is where the trouble begins: Received 19 Apr 2016, 3:23:31 UTC, GPU current clockRate = 1215 MHz. That task began at 1123 and after a restart it was 1215. It continued at 1215 until it was restarted here at 1097: Received 21 Apr 2016, 19:35:13 UTC, GPU current clockRate = 1097 MHz. Then it worked fine until it was restarted here: https://setiathome.berkeley.edu/result.php?resultid=4879467266 Until the next restart it was bad news while clocked at 1201: https://setiathome.berkeley.edu/result.php?resultid=4879629574 Here it was restarted at 1136: https://setiathome.berkeley.edu/result.php?resultid=4883014586 |
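The kind of comparison described in the post above could be automated roughly as follows. This is a sketch under stated assumptions: the `GPU current clockRate = N MHz` text is taken from the post, but pairing it on one line with the "Received" timestamp, the 1123 MHz baseline, and the 5% tolerance are all choices made for illustration, not anything the project defines.

```python
import re

# Sample lines copied from the post; the one-line layout is an assumption.
SAMPLE = """\
Received 29 Feb 2016, 16:04:27 UTC, GPU current clockRate = 1123 MHz
Received 19 Apr 2016, 3:23:31 UTC, GPU current clockRate = 1215 MHz
Received 21 Apr 2016, 19:35:13 UTC, GPU current clockRate = 1097 MHz
"""

CLOCK_RE = re.compile(r"GPU current clockRate = (\d+) MHz")


def extract_clocks(text):
    """Return every reported clock rate (MHz) found in text, in order."""
    return [int(match.group(1)) for match in CLOCK_RE.finditer(text)]


def flag_outliers(clocks, baseline_mhz, tolerance=0.05):
    """Return clocks that differ from baseline_mhz by more than tolerance."""
    return [c for c in clocks
            if abs(c - baseline_mhz) / baseline_mhz > tolerance]


clocks = extract_clocks(SAMPLE)
suspect = flag_outliers(clocks, baseline_mhz=1123)  # hypothetical baseline
print("clocks:", clocks)
print("suspect:", suspect)
```

With these illustrative settings only the 1215 MHz reading is flagged, which matches where the post says the trouble began. A different baseline or tolerance would give a different split.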
Jimbocous Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
Sorry if I caused confusion. It looks like we have two different issues on two different boards being discussed. All the 750ti stuff I mentioned is moot in relation to the board in question, a 780. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
@TBar, to my knowledge (unless something changed), the AP OpenCL app does not use the NVAPI detection that SETI CUDA MB, GPU-Z, and Precision-X do, but instead reports standard figures obtained before the application even initialises the device, so it's not a measurement. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Sorry if I caused confusion. It looks like we have two different issues on two different boards being discussed. All the 750ti stuff I mentioned is moot in relation to the board in question, a 780." Yeah I twigged into that bit :) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.