Panic Mode On (80) Server Problems?

Message boards : Number crunching : Panic Mode On (80) Server Problems?

zoom314
Joined: 30 Nov 03
Posts: 45737
Credit: 36,374,263
RAC: 8,161
Message 1323267 - Posted: 1 Jan 2013, 16:16:15 UTC - in response to Message 1323258.

Oh, great.
A new year, the coldest night of the winter so far here (-3f), and the servers go down.
GPUs are starting to cool down already.

Meowsigh.

But, Happy New Year anyway!


Here is hoping all have a Happy New Year.

Hopefully the Kitties have all your machines running Einstein for their backup project to get that heat flowing.

Maybe we can get Einstein to 1 Petaflop during the outages.

Started a couple of rigs back on Einstein, but they don't generate as much heat as optimized Seti.
Had to start the furnace this morning for the first time this year. (Was -6f here this morning.)

I know what you mean there, Mark; it's 27F (-3C) outside here. Happy New Year.
____________

Team kizb
Joined: 8 Mar 01
Posts: 219
Credit: 3,709,162
RAC: 0
Germany
Message 1323277 - Posted: 1 Jan 2013, 16:26:45 UTC
Last modified: 1 Jan 2013, 16:29:04 UTC

Got my rigs set up on Einstein as well now. Can you run more than one WU at a time like with S@H?

And is there an optimized app for it?
____________
My Computers:
Blue Offline
Green Offline
Red Offline

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 578
Credit: 130,662,059
RAC: 112,646
United Kingdom
Message 1323284 - Posted: 1 Jan 2013, 16:44:46 UTC - in response to Message 1323258.


Had to start the furnace this morning for the first time this year. (Was -6f here this morning.)

Posted: 1 Jan 2013 | 15:51:12 UTC

Well, duh!
____________

zoom314
Joined: 30 Nov 03
Posts: 45737
Credit: 36,374,263
RAC: 8,161
Message 1323289 - Posted: 1 Jan 2013, 16:52:25 UTC - in response to Message 1323277.
Last modified: 1 Jan 2013, 16:53:14 UTC

Got my rigs setup on Einstein as well now. Can you run more than one WU at a time like with S@H?

And is there an optimized app for it?

1. Yes.
2. No.

On the 590 I process Einstein WUs 2 per GPU, and each one takes about 36-37 minutes to crunch through. You might not want to do more than 1 per GPU, since it's faster to do 1 at a time, but if your card has 1.5GB to 2GB of memory you could try it. To do so, change the following setting in your Einstein preferences from 1.0 to 0.5:

GPU utilization factor of BRP apps
DANGEROUS! Only touch this if you are absolutely sure of what you are doing!
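As a rough sketch of what that setting does, assuming the client treats the factor as the fraction of one GPU that each task reserves, the number of tasks a GPU runs at once is how many times the factor fits into 1.0. The helper names are hypothetical (not part of BOINC or Einstein@Home), and 36.5 minutes is simply the midpoint of the 36-37 minutes quoted above.

```python
import math


def tasks_per_gpu(utilization_factor: float) -> int:
    """Hypothetical helper: concurrent tasks on one GPU for a given factor.

    Assumes each task reserves `utilization_factor` of a GPU, so the card
    holds floor(1 / factor) tasks at the same time.
    """
    if not 0 < utilization_factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    # The tiny epsilon guards against float round-off just below a whole number.
    return math.floor(1 / utilization_factor + 1e-9)


def tasks_per_hour(utilization_factor: float, minutes_per_task: float) -> float:
    """Throughput of one GPU when every concurrent task takes minutes_per_task."""
    return tasks_per_gpu(utilization_factor) * 60 / minutes_per_task


print(tasks_per_gpu(1.0))  # 1
print(tasks_per_gpu(0.5))  # 2
print(tasks_per_hour(0.5, 36.5))
```

Running 2 at a time works out to 120 / 36.5, or roughly 3.3 tasks per hour per GPU. Whether 1 at a time is really faster depends on the single-task time, which isn't given here: it would only win on throughput if one task alone finished in under about 18 minutes (36.5 / 2).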

____________

Team kizb
Joined: 8 Mar 01
Posts: 219
Credit: 3,709,162
RAC: 0
Germany
Message 1323293 - Posted: 1 Jan 2013, 16:59:48 UTC - in response to Message 1323289.

Sounds like I should just let my 295s do 1 at a time then. Thanks for the information.
____________
My Computers:
Blue Offline
Green Offline
Red Offline

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38150
Credit: 555,520,686
RAC: 604,627
United States
Message 1323296 - Posted: 1 Jan 2013, 17:01:34 UTC - in response to Message 1323284.


Had to start the furnace this morning for the first time this year. (Was -6f here this morning.)

Posted: 1 Jan 2013 | 15:51:12 UTC

Well, duh!

Well, duh?
If my GPUs were not out of Seti work, I would not have had to.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 12039
Credit: 6,366,189
RAC: 8,597
United States
Message 1323300 - Posted: 1 Jan 2013, 17:05:25 UTC

You might want to check this discussion out next time there is an outage here and you are feeling lost because you can't find the post button.
http://boinc.berkeley.edu/dev/forum_thread.php?id=8105
Of course, this is also on a computer at the SSL, so it will be down during the Jan 4-6 power outage, but perhaps for less time than Seti is down.
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 578
Credit: 130,662,059
RAC: 112,646
United Kingdom
Message 1323304 - Posted: 1 Jan 2013, 17:29:10 UTC - in response to Message 1323296.


Had to start the furnace this morning for the first time this year. (Was -6f here this morning.)

Posted: 1 Jan 2013 | 15:51:12 UTC

Well, duh!

Well, duh?
If my GPUs were not out of Seti work, I would not have had to.

But it was also the first chance you had to do it this year...
____________

Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 30984
Credit: 11,171,087
RAC: 19,925
United Kingdom
Message 1323305 - Posted: 1 Jan 2013, 17:34:34 UTC

Well, duh? If my GPUs were not out of Seti work, I would not have had to.

Oh c'mon Mark, -6F is damn cold, even kitties need a bit of warmth :-)

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38150
Credit: 555,520,686
RAC: 604,627
United States
Message 1323306 - Posted: 1 Jan 2013, 17:35:40 UTC - in response to Message 1323304.


Had to start the furnace this morning for the first time this year. (Was -6f here this morning.)

Posted: 1 Jan 2013 | 15:51:12 UTC

Well, duh!

Well, duh?
If my GPUs were not out of Seti work, I would not have had to.

But it was also the first chance you had to do it this year...

LOL...got me there. I didn't realize I was being so punny.
I should have said it was the first time this WINTER I had to run the furnace.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38150
Credit: 555,520,686
RAC: 604,627
United States
Message 1323308 - Posted: 1 Jan 2013, 17:36:46 UTC - in response to Message 1323305.
Last modified: 1 Jan 2013, 17:38:10 UTC

Well, duh? If my GPUs were not out of Seti work, I would not have had to.

Oh c'mon Mark, -6F is damn cold, even kitties need a bit of warmth :-)

No worries there.
They're normally snoozing on the nice warm waterbed. The temp controller is set to 82F, so they can stay as cozy as can be.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38150
Credit: 555,520,686
RAC: 604,627
United States
Message 1323349 - Posted: 1 Jan 2013, 19:03:03 UTC

Hmmmmmmmm.....
Seems to be a disturbance in the uploads on the Cricket graph.
Either something else fell over in the server closet, or..., could it be..., the Lone Ranger poking about on New Year's Day???
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 787,262,308
RAC: 57,808
United States
Message 1323354 - Posted: 1 Jan 2013, 19:18:00 UTC - in response to Message 1323308.

A water bed....
I wonder if that would be a large enough thermal mass to act as a radiator for a water cooling system?
____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38150
Credit: 555,520,686
RAC: 604,627
United States
Message 1323355 - Posted: 1 Jan 2013, 19:21:42 UTC - in response to Message 1323354.

A water bed....
I wonder if that would be a large enough thermal mass to act as a radiator for a water cooling system?

I'm sure it could be, although you would not want the water to get up to 82F for the most efficient water cooling.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 578
Credit: 130,662,059
RAC: 112,646
United Kingdom
Message 1323381 - Posted: 1 Jan 2013, 21:59:30 UTC - in response to Message 1323349.

Hmmmmmmmm.....
Seems to be a disturbance in the uploads on the Cricket graph.
Either something else fell over in the server closet, or..., could it be..., the Lone Ranger poking about on New Year's Day???

Three hours later -- it seems like someone's in the control room but comments here have dried up. May we live in interesting times...
____________

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 0
United Kingdom
Message 1323383 - Posted: 1 Jan 2013, 22:05:17 UTC

Most of Synergy's processes are not running,
gone all red,
and as for the Cricket graph, even the thin blue line has fallen off the page.
"We have been normalized":
much the same as assimilated, but less work gets done.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1177
Credit: 40,834,828
RAC: 114,295
United States
Message 1323390 - Posted: 1 Jan 2013, 22:24:29 UTC

I've got an experiment I'd really like to try, but I'm scared... :-)

My nVidia card has been out of work since early this morning; however, I have all these CPU 603s lying around. I checked: the files labeled 609 (cuda23) are identical to the ones labeled 603. So I added this entry to the end of my app_info file:

<app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <avg_ncpus>0.040000</avg_ncpus>
    <max_ncpus>0.100000</max_ncpus>
    <coproc>
        <type>CUDA</type>
        <count>1</count>
    </coproc>
    <file_ref>
        <file_name>Lunatics_x41g_win32_cuda32.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>cudart32_32_16.dll</file_name>
    </file_ref>
    <file_ref>
        <file_name>cufft32_32_16.dll</file_name>
    </file_ref>
</app_version>

After restarting BOINC I received this:
1/1/2013 2:42:04 PM | SETI@home | Found app_info.xml; using anonymous platform
1/1/2013 2:42:04 PM | SETI@home | [error] State file error: duplicate app version: setiathome_enhanced windows_intelx86 603

It appears this might work if I removed my CPU 603 entry from the top of my app_info file. Then again, I might lose all my 603 files, including the ones waiting to be reported. I have quite a few 603s already uploaded and waiting to be reported...

I also have a few of those nasty VLARs that I've been suspending lest the nVidia card try working on one. So, what do you think? If I suspend CPU work and VLARs, stop BOINC, remove the CPU entry, then restart, will I have success or will I lose all my 603 files? I was hoping to at least report the completions before trying this; however, I'm growing impatient.
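The error line suggests the client rejected the second entry because it matched one it had already read. The sketch below scans an app_info.xml and reports entries that share the same app name, version number and plan class. Treating that triple as the identity of an app version is an assumption drawn from the error text, not from BOINC's source; the function name is hypothetical, and the sample is a trimmed mock-up of the two entries described above (the CPU executable name `cpu_app.exe` is made up, and a real file also carries `<app>` and `<file_info>` sections).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_app_versions(app_info_xml: str) -> list[tuple[str, str, str]]:
    """Hypothetical checker: list (app_name, version_num, plan_class) keys
    that appear in more than one <app_version> entry.

    Assumption: two entries count as the same version when these three
    fields match; a missing <plan_class> is treated as an empty string.
    """
    root = ET.fromstring(app_info_xml)
    keys = [
        (
            av.findtext("app_name", "").strip(),
            av.findtext("version_num", "").strip(),
            av.findtext("plan_class", "").strip(),
        )
        for av in root.iter("app_version")
    ]
    return [key for key, n in Counter(keys).items() if n > 1]


SAMPLE = """
<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <file_ref><file_name>cpu_app.exe</file_name><main_program/></file_ref>
  </app_version>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <coproc><type>CUDA</type><count>1</count></coproc>
    <file_ref><file_name>Lunatics_x41g_win32_cuda32.exe</file_name><main_program/></file_ref>
  </app_version>
</app_info>
"""

print(duplicate_app_versions(SAMPLE))
# [('setiathome_enhanced', '603', '')]
```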

Chris S
Volunteer tester
Joined: 19 Nov 00
Posts: 30984
Credit: 11,171,087
RAC: 19,925
United Kingdom
Message 1323400 - Posted: 1 Jan 2013, 22:48:21 UTC

The feeder and the transitioners are not running...

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 12039
Credit: 6,366,189
RAC: 8,597
United States
Message 1323405 - Posted: 1 Jan 2013, 23:16:07 UTC - in response to Message 1323400.

The feeder and the transitioners are not running...


The Cardinal is beating the Badger at the Rose Bowl. As Eric is a Badger, you had better root for them or it is likely to stay offline for a while. ;-)

____________

TBar
Volunteer tester
Joined: 22 May 99
Posts: 1177
Credit: 40,834,828
RAC: 114,295
United States
Message 1323413 - Posted: 1 Jan 2013, 23:56:49 UTC - in response to Message 1323390.

Success!

I chose a safer route and merely moved the CPU entry to the bottom of my app_info file, below the nVidia entry. I then suspended all remaining 603s and restarted BOINC. Once again I received:
1/1/2013 6:12:00 PM | SETI@home | Found app_info.xml; using anonymous platform
1/1/2013 6:12:00 PM | SETI@home | [error] State file error: duplicate app version: setiathome_enhanced windows_intelx86 603

I then resumed one non-VLAR 603, and the nVidia app started the task. I then resumed another non-VLAR 603 to see if a CPU would start the task; it didn't. The first 603, with an estimate of 2 hours, finished in 27 minutes. It's on the 2nd 603 now. The only downside I see is that this might have a negative effect on the CPU estimated times for the 603s. The upside is that your nVidia card now has twice its imposed limit of 100 units.

There has to be something I'm missing...
:-)
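One way to read that result: if the client keeps the first `<app_version>` it parses for a given key and drops any later duplicate (an assumption that fits the repeated log line and the CPU not picking up the second task, not something confirmed from BOINC's source), then putting the CUDA entry first binds every 603 task to the GPU build. The model below is hypothetical and reduces each entry to a small dict; the last line just works out the speed-up relative to the 2-hour estimate.

```python
def select_app_versions(entries):
    """Hypothetical model: keep the first entry per (app_name, version_num,
    plan_class) key and collect later duplicates as dropped.

    This is an assumed first-wins rule, not BOINC's actual code.
    """
    kept, dropped = {}, []
    for entry in entries:
        key = (entry["app_name"], entry["version_num"], entry.get("plan_class", ""))
        if key in kept:
            dropped.append(entry)  # would be reported as "duplicate app version"
        else:
            kept[key] = entry
    return kept, dropped


cpu = {"app_name": "setiathome_enhanced", "version_num": 603, "device": "CPU"}
gpu = {"app_name": "setiathome_enhanced", "version_num": 603, "device": "CUDA"}
key = ("setiathome_enhanced", 603, "")

before, _ = select_app_versions([cpu, gpu])  # original order: CPU entry on top
after, _ = select_app_versions([gpu, cpu])   # CPU entry moved to the bottom
print(before[key]["device"])  # CPU
print(after[key]["device"])   # CUDA

# Speed-up of the first task relative to its 2-hour (120-minute) estimate.
print(round(120 / 27, 2))  # 4.44
```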

Copyright © 2014 University of California