Panic Mode On (94) Server Problems?

Message boards : Number crunching : Panic Mode On (94) Server Problems?

Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1632076 - Posted: 24 Jan 2015, 6:07:37 UTC - in response to Message 1632065.  

My GPUs run above 30C: GPU0 runs at 50-54C on standard tasks and GPU1 at 45-49C on the same tasks. Both run higher with AP, GPU0 at 60C and GPU1 at 55C.

Mine generally run at 60°C, sometimes around 65°C when the temperature indoors gets up into the high 30s. That's with the fan speeds at no more than 75%.
From memory, on the older card the temps were often around 70°C with the fans running at 85% of their maximum.

Hi Grant,

Well my fans have a steep curve set with afterburner, so they run variable but pretty fast mostly.

When the temps indoors get to 30C I'll move to Alaska :-) Temps tend to max out at about 26C outdoors, less indoors. In summer that is:-) Winter temps run from -4C to +14C usually, but these days who knows what they will reach.

Fortunately I live in the SE, up norf temps have been below -14C.

I'm not too happy running GPUs at 60+C. The max is supposed to be 70C+, but that's an intermittent rating, and given the heat gremlins seem to lurve my rigs I like to keep things coolish.

Cheers,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1632076 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1632091 - Posted: 24 Jan 2015, 6:44:39 UTC - in response to Message 1632076.  

I'm not too happy running GPUs at 60+C. The max is supposed to be 70C+,

The maximum operating temperature for the GTX 970 GPU is 98°C, so even 70°C isn't particularly close to that.

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-970/specifications
Grant
Darwin NT
ID: 1632091 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1632092 - Posted: 24 Jan 2015, 6:47:39 UTC - in response to Message 1632091.  

Running nothing but APs my 980s are around 61-62C..

With MBs they run 65-66C. But I have 4 stacked on top of each other (only 2 are that high, the others are at 45&60)
ID: 1632092 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1632170 - Posted: 24 Jan 2015, 10:25:30 UTC
Last modified: 24 Jan 2015, 10:29:34 UTC

I'll be glad when there aren't any more of those 19ap11ad WUs left.

EDIT- the forums are getting sluggish & I'm getting sticky downloads again. Hopefully things will settle down again.

The AP Results awaiting validation & WU awaiting assimilation are still slowly but steadily clearing.
Grant
Darwin NT
ID: 1632170 · Report as offensive
Profile cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1632244 - Posted: 24 Jan 2015, 16:52:48 UTC - in response to Message 1632091.  

I'm not too happy running GPUs at 60+C. The max is supposed to be 70C+,

The maximum operating temperature for the GTX 970 GPU is 98°C, so even 70°C isn't particularly close to that.

http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-970/specifications

Hi Grant,

I notice NVidia doesn't mention for how long at 98C, my guess is about 60 secs.
Care to put it to the test? Because I most certainly will not:-)

I tend to the view that marketing info will put the best face on almost anything, so if 1 particular GPU chip managed to survive at 98C for a few seconds, that's the figure that will be quoted, with no mention of the other 200 chips that died at a lot less:-)

Anyway it's moot, my GPUs will never be allowed to go too much over 60C for any length of time.

I also notice that if NVI is used to restore P2 clock speeds when doing compute work, that also pushes temps up a bit, but not sufficiently to be a problem if the tasks complete fairly fast, say under 25 mins.

Regards,
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1632244 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1632285 - Posted: 24 Jan 2015, 18:16:02 UTC

Rig down for 7+ hours....
Now that just won't do.

Reboot.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1632285 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1632351 - Posted: 24 Jan 2015, 21:00:19 UTC - in response to Message 1632347.  
Last modified: 24 Jan 2015, 21:05:10 UTC

Oh, oh, oh. Something is struggling now. Not good, not good at all...

I'm hoping that the purging of the AP workunits that were assimilated yesterday has started.

Edit: My total AP valid has just dropped about 150. So looks like purging is now occurring!
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1632351 · Report as offensive
Profile Julie
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 34054
Credit: 18,883,157
RAC: 18
Belgium
Message 1632357 - Posted: 24 Jan 2015, 21:26:26 UTC

Forum pages are running slowly.
rOZZ
Music
Pictures
ID: 1632357 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1632360 - Posted: 24 Jan 2015, 21:30:56 UTC - in response to Message 1632358.  

Oh, oh, oh. Something is struggling now. Not good, not good at all...

I'm hoping that the purging of the AP workunits that were assimilated yesterday has started.

Edit: My total AP valid has just dropped about 150. So looks like purging is now occurring!

Well, I can't even get to my task pages. Everything task related is not loading for the moment.

Luckily I had mine open & refreshing it only took 4 or 5 min.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1632360 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1632363 - Posted: 24 Jan 2015, 21:49:52 UTC - in response to Message 1632347.  
Last modified: 24 Jan 2015, 21:50:08 UTC

Oh, oh, oh. Something is struggling now. Not good, not good at all...

Network traffic has plummeted.
For the last half hour Scheduler requests have been failing with "Scheduler request failed: Couldn't connect to server", with the occasional one getting through.
The AP clean up has also slowed to a stop, even though everything that was green on the Server Status page is still that way.
Things are coming to a halt again.
Grant
Darwin NT
ID: 1632363 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1632369 - Posted: 24 Jan 2015, 22:00:56 UTC - in response to Message 1632244.  

I notice NVidia doesn't mention for how long at 98C, my guess is about 60 secs.
Care to put it to the test? Because I most certainly will not:-)

Not necessary- they have stated it. Failure of a video card to run with the core at that temperature would be a breach of consumer protection legislation. Big fines to follow.
Of course, just because it should be able to run at that temperature doesn't mean it would be good to do so for days on end, or even for a minute or two.
As for putting it to the test- when the fans on my previous card died it ran at around 90-92°C for over a week till I noticed the problem & lubricated the fans. It was still alive when I retired it.

Anyway it's moot, my GPUs will never be allowed to go too much over 60C for any length of time.

You've chosen a particular number for no good reason I can see, but that's your choice to do so. I've had CPUs & video cards running at temperatures touching 80°C on occasion for over 7 years with no failures, so I've no concerns with them running at 70°C or less. 75°C is the particular number I choose to limit them to.

I also notice that if NVI is used to restore P2 clock speeds when doing compute work, that also pushes temps up a bit, but not sufficiently to be a problem if the tasks complete fairly fast, say under 25 mins.

On both of my systems from being idle to maximum load the temperatures max out within 5 minutes.
Grant
Darwin NT
ID: 1632369 · Report as offensive
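
For anyone who wants to watch a self-chosen ceiling like the 75°C figure mentioned above, here is a minimal sketch. It assumes an NVIDIA card with the `nvidia-smi` tool on the PATH. The function names and the 75°C default are illustrative only and do not come from any poster's actual setup.

```python
import subprocess
from typing import List


def parse_temps(smi_output: str) -> List[int]:
    """Turn nvidia-smi output (one integer per line, one line per GPU) into a list."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_over_limit(temps: List[int], limit_c: int = 75) -> List[int]:
    """Return the indices of GPUs running hotter than limit_c (an assumed ceiling)."""
    return [index for index, temp in enumerate(temps) if temp > limit_c]


def read_gpu_temps() -> List[int]:
    """Ask nvidia-smi for the current core temperature of every GPU, in Celsius."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_temps(result.stdout)
```

On a machine with the NVIDIA driver installed, `gpus_over_limit(read_gpu_temps())` should give an empty list while every card sits at or below the limit. Run it from a scheduled job if you want periodic checks.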
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1632371 - Posted: 24 Jan 2015, 22:02:35 UTC - in response to Message 1632363.  

Oh, oh, oh. Something is struggling now. Not good, not good at all...

Network traffic has plummeted.
For the last half hour Scheduler requests have been failing with "Scheduler request failed: Couldn't connect to server", with the occasional one getting through.
The AP clean up has also slowed to a stop, even though everything that was green on the Server Status page is still that way.
Things are coming to a halt again.


Network traffic has now rebounded- 200mb/s before it died to over 300mb/s now.
Hopefully the AP clean up has restarted as well.
Grant
Darwin NT
ID: 1632371 · Report as offensive
Admiral Gloval
Joined: 31 Mar 13
Posts: 20357
Credit: 5,308,449
RAC: 0
United States
Message 1632387 - Posted: 24 Jan 2015, 22:54:47 UTC

I've got a quick question. V7 7.03 (opencl_ati5_nocal): is that an AP or MB task? I just dropped an R7 260X in my rig and these started to run.

ID: 1632387 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1632395 - Posted: 24 Jan 2015, 23:23:03 UTC - in response to Message 1632387.  

I've got a quick question. V7 7.03 (opencl_ati5_nocal): is that an AP or MB task? I just dropped an R7 260X in my rig and these started to run.


That's an MB task for the GPU; AP should be ati_100.
ID: 1632395 · Report as offensive
Admiral Gloval
Joined: 31 Mar 13
Posts: 20357
Credit: 5,308,449
RAC: 0
United States
Message 1632400 - Posted: 24 Jan 2015, 23:40:10 UTC - in response to Message 1632395.  

I've got a quick question. V7 7.03 (opencl_ati5_nocal): is that an AP or MB task? I just dropped an R7 260X in my rig and these started to run.


That's an MB task for the GPU; AP should be ati_100.

Thanks.

ID: 1632400 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1632404 - Posted: 24 Jan 2015, 23:57:13 UTC - in response to Message 1632351.  

Oh, oh, oh. Something is struggling now. Not good, not good at all...

I'm hoping that the purging of the AP workunits that were assimilated yesterday has started.

Edit: My total AP valid has just dropped about 150. So looks like purging is now occurring!

I can confirm that it is definitely happening.

An hour ago, I had 321 valids listed, and now there are 264.

I'm thinking the backlog should all be cleared up before the Tuesday DB maintenance. Should make for a smaller, more efficient backup and better performance after defragging the DB.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1632404 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1632470 - Posted: 25 Jan 2015, 5:21:49 UTC - in response to Message 1632219.  

They really should turn on the AP v6 assimilator too. I still have 6 AP v6 tasks that need to be assimilated.

I'm sure there are still thousands of AP v6 tasks in need of assimilation.

Edit, added:

Not good:

5471 SETI@home 2015-01-24 16:21:58 Sending scheduler request: To fetch work.
5472 SETI@home 2015-01-24 16:21:58 Requesting new tasks for GPU
5473 SETI@home 2015-01-24 16:22:41 Scheduler request failed: HTTP internal server error

I have 26 AP v6 tasks that are stuck in limbo & will require the clean up script to be set free.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1632470 · Report as offensive
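
If you want to pull failures like the one quoted above out of a longer event log, a rough sketch is below. It assumes every line has the layout seen in the excerpt (sequence number, project name, date, time, message). The `LogEntry` type and both function names are made up for illustration and are not part of BOINC.

```python
import re
from typing import List, NamedTuple, Optional


class LogEntry(NamedTuple):
    seq: int
    project: str
    timestamp: str
    message: str


# Assumed layout: sequence number, project name, "YYYY-MM-DD HH:MM:SS", message.
LINE_RE = re.compile(
    r"^(\d+)\s+(.+?)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$"
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one pasted log line into its fields, or return None if it does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    seq, project, timestamp, message = match.groups()
    return LogEntry(int(seq), project, timestamp, message)


def scheduler_failures(lines: List[str]) -> List[LogEntry]:
    """Keep only the entries whose message reports a failed scheduler request."""
    entries = (parse_line(line) for line in lines)
    return [
        entry
        for entry in entries
        if entry is not None and entry.message.startswith("Scheduler request failed")
    ]
```

Fed the three lines quoted in the post, `scheduler_failures` would keep only the last one, the HTTP internal server error.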
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1632486 - Posted: 25 Jan 2015, 6:32:46 UTC - in response to Message 1632472.  
Last modified: 25 Jan 2015, 6:35:49 UTC

And now, all AP assimilators went RED, Not Running.

And the graphs are reflecting that- now flatlined instead of trending down.
The fix AP assimilator 1 is still running so hopefully it'll be able to get them up again.

EDIT- I notice a couple of PFB splitters are down now as well. Even with the 1 that's disabled there should be 5 still running, but the splitter status only shows 4.
Will have to see what happens when the ready-to-send buffer drops to 280,00 or lower.
Grant
Darwin NT
ID: 1632486 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1632487 - Posted: 25 Jan 2015, 6:39:34 UTC
Last modified: 25 Jan 2015, 7:34:53 UTC

Apparently the (non)Assimilation Process is 'Losing' Files. I've seen a few of these Errors pop up recently;
Too many errors (may have bug)
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>ap_19oc14ab_B3_P0_00157_20150101_12321.wu</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>

Task        Computer  Sent                      Time reported             Status                     Run time   CPU time   Credit  Application
3885062387  6703788   1 Jan 2015, 10:22:51 UTC  2 Jan 2015, 12:30:52 UTC  Completed, can't validate  6,531.81   635.91     0.00    AstroPulse v7 v7.04 (opencl_ati_mac)
3885062388  7289488   1 Jan 2015, 10:22:51 UTC  25 Jan 2015, 1:59:26 UTC  Completed, can't validate  12,055.76  11,927.81  0.00    AstroPulse v7 v7.05 (opencl_nvidia_100)
3932675796  7430874   25 Jan 2015, 1:59:34 UTC  25 Jan 2015, 2:04:42 UTC  Error while downloading    0.00       0.00       ---     AstroPulse v7 Anonymous platform (NVIDIA GPU)
3932682909  7258715   25 Jan 2015, 2:04:50 UTC  25 Jan 2015, 2:09:58 UTC  Error while downloading    0.00       0.00       ---     AstroPulse v7 Anonymous platform (ATI GPU)
3932691704  7251930   25 Jan 2015, 2:10:06 UTC  25 Jan 2015, 2:15:18 UTC  Error while downloading    0.00       0.00       ---     AstroPulse v7 v7.04 (opencl_ati_mac)
3932700678  3143580   25 Jan 2015, 2:15:24 UTC  25 Jan 2015, 3:16:49 UTC  Error while downloading    0.00       0.00       ---     AstroPulse v7 v7.04 (opencl_ati_mac)
3932788685  7237681   25 Jan 2015, 3:16:54 UTC  25 Jan 2015, 3:22:04 UTC  Error while downloading    0.00       0.00       ---     AstroPulse v7 v7.04 (sse2)
3932794148  2301233   25 Jan 2015, 3:22:11 UTC  25 Jan 2015, 4:22:38 UTC  Error while downloading    0.00       0.00       ---     AstroPulse v7 v7.04 (opencl_ati_mac)

Of course this resets your Consecutive valid tasks to ZERO and reduces your Max tasks per day...

Here's another one,
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>ap_01dc10aa_B0_P0_00207_20150104_07283.wu</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
ID: 1632487 · Report as offensive
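
The error blocks quoted above are small XML fragments, so the file name and error code can be pulled out with a few lines of Python. This is only a sketch: it assumes the text is pasted exactly as shown, wrapped in a single `<message>` element, and `parse_transfer_errors` is a hypothetical helper, not something BOINC provides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_transfer_errors(message_xml: str) -> List[Dict[str, object]]:
    """Extract each <file_xfer_error> from a pasted <message> block."""
    root = ET.fromstring(message_xml.strip())
    errors = []
    for node in root.iter("file_xfer_error"):
        code_text = (node.findtext("error_code") or "").strip()
        errors.append({
            "file_name": (node.findtext("file_name") or "").strip(),
            # The code field looks like "-224 (permanent HTTP error)"; keep the number.
            "error_code": int(code_text.split()[0]) if code_text else None,
            "error_message": (node.findtext("error_message") or "").strip(),
        })
    return errors
```

Grouping the results by `file_name` makes it easy to see when the same workunit file has failed for several hosts in a row, as in the table above.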
Darth Beaver Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1632523 - Posted: 25 Jan 2015, 12:50:10 UTC

Is anybody getting cookies from Google that allow heaps of ads and turn your links GREEN?

I am only getting this problem when I come to the Berkeley home page or log in through the browsers. First it started happening in Chrome, then I had to use Explorer and delete cookies. It really stuffs things up, some sort of malware or something, and it only happens when I use the BOINC client.

If I don't open BOINC it does not happen.
ID: 1632523 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.