Panic Mode On (103) Server Problems?

JLDun
Volunteer tester
Joined: 21 Apr 06
Posts: 573
Credit: 196,101
RAC: 0
United States
Message 1793984 - Posted: 6 Jun 2016, 15:36:58 UTC

Rough day coming up?

The SSP at 15:30:05 UTC shows "Replica Behind Master: 0s" but "As Of: 32m".
ID: 1793984 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1793993 - Posted: 6 Jun 2016, 15:59:19 UTC - in response to Message 1786530.  

What are your run times with those?

Between 7.5 and 9 minutes per VLAR, depending on the AR.
_\|/_
U r s
ID: 1793993 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1793997 - Posted: 6 Jun 2016, 16:18:14 UTC - in response to Message 1791140.  

They work on my Linux box with an Opteron 1210 CPU and an AMD HD 7770.
Tullio

So the atiapu WUs don't have to be run only on an APU?

These days the on die APU is generally a cut down version of a low end standalone video card.
Being low end the application would run on most standalone cards of that architecture type without issue, although one released for the standalone card would run faster with the default installation without tweaking the settings.


That's all true, but an APU also has one important difference from a standalone card: it shares memory with the CPU. Physically shares.
The APU build is an attempt to make use of that.
ID: 1793997 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1794000 - Posted: 6 Jun 2016, 16:32:29 UTC - in response to Message 1793997.  
Last modified: 6 Jun 2016, 16:33:20 UTC

I am running two CPU SETI@home 8.05 tasks on my Opteron 1210, plus two GPU 8.10 tasks on the HD 7770. I have built an app_config.xml file with gpu 0.50 and cpu 0.3. It runs nicely on the GPU, but CPU only tasks have a glitch. Sometimes they do not make any checkpoint and seem to go on forever, so I have to abort them.
Tullio
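
For anyone wanting to reproduce a setup like the one described above, here is a minimal sketch that writes out an app_config.xml with those two values. The app name `setiathome_v8` is an assumption taken from the stderr header quoted elsewhere in this thread, and `build_app_config` is a made-up helper name; check the app names your own client reports before using it. `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text for one app with the given GPU/CPU fractions.

    gpu_usage is the fraction of a GPU budgeted per task (0.5 -> two tasks per GPU);
    cpu_usage is the fraction of a CPU core budgeted for each GPU task.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, see note above
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print in place (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Values from the post: gpu 0.50 and cpu 0.3
    print(build_app_config("setiathome_v8", 0.5, 0.3))
```

The printed text would go into app_config.xml in the project's directory under the BOINC data directory; the client picks it up after a restart or after re-reading its config files.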
ID: 1794000 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1794006 - Posted: 6 Jun 2016, 16:37:20 UTC - in response to Message 1794000.  
Last modified: 6 Jun 2016, 16:39:09 UTC

I am running two CPU SETI@home 8.05 tasks on my Opteron 1210, plus two GPU 8.10 tasks on the HD 7770. I have built an app_config.xml file with gpu 0.50 and cpu 0.3. It runs nicely on the GPU, but CPU only tasks have a glitch. Sometimes they do not make any checkpoint and seem to go on forever, so I have to abort them.
Tullio

Those are the Linux ones.

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
aborted by user
</message>
<stderr_txt>
setiathome_v8 8.00 Revision: 3335 g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is : 0.426377
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_vGetPowerSpectrum 0.000510 0.00000

</stderr_txt>
]]>
It seems to get stuck in the inner benchmark.
ID: 1794006 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1794008 - Posted: 6 Jun 2016, 16:42:12 UTC - in response to Message 1794000.  
Last modified: 6 Jun 2016, 16:44:46 UTC

...but CPU only tasks have a glitch. Sometimes they do not make any checkpoint and seem to go on forever, so I have to abort them.
Tullio


If this is on one of your Linux machines, I have the same problem only on this computer. However, you may not have to abort the stuck work units. When I quit the BOINC Manager (including terminating the apps) and relaunch BOINC, they complete normally; I checked them and they are valid and get credited. I need to do this a couple of times a day to clear one or two stuck ones. I was wondering when I would see someone else with the same issue.

Are you using the AVX MB client?
ID: 1794008 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1794010 - Posted: 6 Jun 2016, 16:48:22 UTC - in response to Message 1794008.  

You do not have to restart BOINC; just suspend/resume the task and it will run fine on the next run.
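
A rough sketch of doing that suspend/resume from a script instead of the Manager, using the `boinccmd --task <url> <task_name> <op>` form. The project URL and task name below are placeholders, and `task_command` / `bounce_task` are hypothetical helper names; whether this actually clears a stalled task is not guaranteed.

```python
import subprocess
import time
from typing import List


def task_command(project_url: str, task_name: str, operation: str) -> List[str]:
    """Build the boinccmd argument list for a single task operation."""
    if operation not in {"suspend", "resume"}:
        raise ValueError(f"unsupported operation: {operation!r}")
    return ["boinccmd", "--task", project_url, task_name, operation]


def bounce_task(project_url: str, task_name: str, pause_seconds: float = 5.0) -> None:
    """Suspend a task, wait briefly, then resume it (requires boinccmd on PATH)."""
    subprocess.run(task_command(project_url, task_name, "suspend"), check=True)
    time.sleep(pause_seconds)  # arbitrary pause so the client can stop the app
    subprocess.run(task_command(project_url, task_name, "resume"), check=True)


if __name__ == "__main__":
    # Placeholder values: substitute your project's master URL and the stuck task's name.
    print(" ".join(task_command("http://project.example/", "example_task_0", "suspend")))
```

Calling `bounce_task(...)` with real values would run both commands against the local client; the `__main__` section only prints the command so nothing is touched by accident.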
ID: 1794010 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1794016 - Posted: 6 Jun 2016, 17:03:52 UTC - in response to Message 1794008.  

No, I am using a SSE2 client.
Tullio
ID: 1794016 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1794017 - Posted: 6 Jun 2016, 17:08:22 UTC - in response to Message 1794016.  

No, I am using a SSE2 client.


Thanks... that was the only thing I did differently on this machine compared with several others, so I have no idea why it's the only one with this problem.
ID: 1794017 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1794102 - Posted: 6 Jun 2016, 22:31:23 UTC - in response to Message 1794010.  
Last modified: 6 Jun 2016, 22:31:42 UTC

You do not have to restart BOINC; just suspend/resume the task and it will run fine on the next run.


Didn't work. Um... Linux!
ID: 1794102 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1794122 - Posted: 6 Jun 2016, 23:51:35 UTC - in response to Message 1794102.  

It happens only on the 64-bit SuSE Leap 42.1, not on the 32-bit SuSE 13.2 running MB tasks.
Tullio
ID: 1794122 · Report as offensive
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1794137 - Posted: 7 Jun 2016, 0:33:11 UTC - in response to Message 1794122.  

It happens only on the 64-bit SuSE Leap 42.1, not on the 32-bit SuSE 13.2 running MB tasks.
Tullio

This might not be related to the distro, but to running too many tasks on the GPU and CPU.
Your Opteron 1210 is only a dual core, so it could help to reduce the number of CPU tasks to one, while running tasks on your GPU.
On 32bit Linux you don't see that problem, because there are no GPU tasks running.

How can you see if the CPU resource is overcommitted?
When running tasks continuously (for example 24/7) on a dedicated host with not much else running, check the difference between CPU runtime and elapsed time for your CPU tasks.
If it is more than a few hundred seconds, better check your setup. (example task from tullio's task list)
Your results show a ten times higher difference between elapsed time and CPU runtime.
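
The check described above can be sketched in a few lines. The 300-second threshold is only an assumption standing in for "a few hundred seconds", the function names are made up for illustration, and the sample numbers are invented, not taken from any real task.

```python
def runtime_gap(elapsed_seconds: float, cpu_seconds: float) -> float:
    """Difference between wall-clock (elapsed) time and CPU time for one task."""
    return elapsed_seconds - cpu_seconds


def looks_overcommitted(elapsed_seconds: float, cpu_seconds: float,
                        threshold_seconds: float = 300.0) -> bool:
    """True when the task waited for the CPU longer than the assumed threshold."""
    return runtime_gap(elapsed_seconds, cpu_seconds) > threshold_seconds


if __name__ == "__main__":
    # Invented example values (seconds), as read from a task's result page.
    for elapsed, cpu in [(9000.0, 8900.0), (12000.0, 9000.0)]:
        gap = runtime_gap(elapsed, cpu)
        flag = "check your setup" if looks_overcommitted(elapsed, cpu) else "looks fine"
        print(f"elapsed={elapsed:.0f}s cpu={cpu:.0f}s gap={gap:.0f}s -> {flag}")
```

This only makes sense for CPU tasks on a host that crunches more or less continuously; a machine that is also used for other work will show a gap for unrelated reasons.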
_\|/_
U r s
ID: 1794137 · Report as offensive
BetelgeuseFive Project Donor
Volunteer tester

Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 1794172 - Posted: 7 Jun 2016, 6:36:59 UTC

Just noticed that validated tasks are not being removed. Usually they are removed after approximately 24 hours, but I now have lots of tasks (both MB and AP) that were validated over 48 hours ago and are still visible. I do not see anything on the server status page that could explain this. Has anyone else noticed this?

Tom
ID: 1794172 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1794780 - Posted: 9 Jun 2016, 19:12:54 UTC

Interesting - scheduler (only) down for maintenance. I wonder what that straw in the wind portends?
ID: 1794780 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1794782 - Posted: 9 Jun 2016, 19:23:26 UTC - in response to Message 1794780.  

A lot more than just the scheduler is down.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1794782 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22149
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1794783 - Posted: 9 Jun 2016, 19:27:50 UTC

SSP is glowing orange :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1794783 · Report as offensive
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1794793 - Posted: 9 Jun 2016, 20:02:58 UTC - in response to Message 1794785.  

Lol. Well, maybe it's Tuesday in Sweden, but over here it's Thursday!

ID: 1794793 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1794794 - Posted: 9 Jun 2016, 20:21:29 UTC

It looks as though they just loaded a horde of BLC2 files. From my records, the BLC2 tasks were about 50% longer/slower than the current BLC6/7 tasks. So I suggest you gird your loins: the slow ones are coming back.
ID: 1794794 · Report as offensive
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1795167 - Posted: 10 Jun 2016, 18:04:12 UTC

AP split, AP database down -> All errors?

Sleepy
ID: 1795167 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1795223 - Posted: 10 Jun 2016, 22:11:42 UTC - in response to Message 1795169.  

AP split, AP database down -> All errors?

Sleepy

Well, who expects any AP's anyhow? :-)

Or maybe AP DB is down because these tapes have already been split for SAH V6/7 and AP, and are simply being rerun for SAH v8? Dunno ...
ID: 1795223 · Report as offensive


 