Panic Mode On (98) Server Problems?

Message boards : Number crunching : Panic Mode On (98) Server Problems?

kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1699061 - Posted: 6 Jul 2015, 19:38:41 UTC - in response to Message 1699056.  

Uh, oh...
I see the SSP has not updated in over an hour, and the loading of stats seems to be crawling again.

I am going to assume that boinc.berkeley.edu suddenly being offline as well is unrelated.

And....my bad, folks.
I just recovered my daily driver after being down for several days, and the clock somehow got off by an hour.
Might be a false alarm, although for some reason, I seem to be losing cache.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1699061 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1699085 - Posted: 6 Jul 2015, 20:24:42 UTC - in response to Message 1699056.  

I am going to assume that boinc.berkeley.edu suddenly being offline as well is unrelated.

It isn't down for me at this time, but I have seen database errors, and pages claiming the whole thing was down. Then 2 seconds later it would load normally. Something or someone is hitting the domain hard, has been doing that for a couple of days now.
ID: 1699085 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1699092 - Posted: 6 Jul 2015, 20:38:59 UTC - in response to Message 1699085.  
Last modified: 6 Jul 2015, 20:40:17 UTC

I am going to assume that boinc.berkeley.edu suddenly being offline as well is unrelated.

It isn't down for me at this time, but I have seen database errors, and pages claiming the whole thing was down. Then 2 seconds later it would load normally. Something or someone is hitting the domain hard, has been doing that for a couple of days now.

I've seen the site hiccup before. It had been down for a while, and the external site-check places were indicating it was down as well.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1699092 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1699145 - Posted: 7 Jul 2015, 0:44:15 UTC
Last modified: 7 Jul 2015, 0:45:13 UTC

http://setiathome.berkeley.edu/gpu_list.php

this is what I got

The following lists show the most productive GPU models on different platforms. Relative speeds, measured by average elapsed time of tasks, are shown in parentheses. Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 208
NVIDIA
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175 No GPU tasks reported Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 209
ATI/AMD
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175 No GPU tasks reported Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 210
Intel
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175 No GPU tasks reported Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 211

Generated --- 

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1699145 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1699169 - Posted: 7 Jul 2015, 2:30:23 UTC - in response to Message 1699145.  

http://setiathome.berkeley.edu/gpu_list.php

this is what I got

The following lists show the most productive GPU models on different platforms. Relative speeds, measured by average elapsed time of tasks, are shown in parentheses. Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 208
NVIDIA
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175 No GPU tasks reported Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 209
ATI/AMD
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175 No GPU tasks reported Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 210
Intel
Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 175 No GPU tasks reported Notice: Trying to get property of non-object in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 211

Generated --- 

And the notice at the top:
Notice: unserialize(): Error at offset 24625 of 45995 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/user/gpu_list.php on line 200

That's a PHP error that normally indicates the data being read is corrupt.
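
For anyone wondering how one corrupt file produces a whole page of notices: in PHP, unserialize() returns false when it fails, and code that then reads properties off that result triggers the "non-object" notices seen above. The usual defence is to check the decode step before using its result. Below is a minimal sketch of that idea in Python rather than the project's PHP. It uses JSON as a stand-in for the serialized cache, and every name in it (load_gpu_cache, render_gpu_list, the sample GPU entry) is hypothetical, not taken from gpu_list.php.

```python
import json


def load_gpu_cache(raw: str):
    """Return the decoded cache, or None if the stored text is corrupt."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # A truncated or damaged cache ends up here instead of
        # leaking a half-usable value to the page renderer.
        return None


def render_gpu_list(raw: str) -> str:
    """Render the list, falling back to a short message on bad data."""
    data = load_gpu_cache(raw)
    if data is None:
        return "GPU list temporarily unavailable (cache unreadable)"
    lines = []
    for vendor, models in data.items():
        lines.append(vendor)
        lines.extend(f"  {name} ({speed})" for name, speed in models)
    return "\n".join(lines)


# Hypothetical sample data: one vendor with one made-up model.
good = json.dumps({"NVIDIA": [["Hypothetical GPU A", 1.0]]})
# Simulate corruption by cutting the stored text in half.
truncated = good[: len(good) // 2]

print(render_gpu_list(good))
print(render_gpu_list(truncated))
```

With a guard like this the page would show one fallback line instead of a notice per vendor section, though whether the real script can be changed that simply is a guess on my part.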
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1699169 · Report as offensive
Admiral Gloval
Joined: 31 Mar 13
Posts: 21210
Credit: 5,308,449
RAC: 0
United States
Message 1699216 - Posted: 7 Jul 2015, 4:57:46 UTC - in response to Message 1699061.  
Last modified: 7 Jul 2015, 4:59:23 UTC

Uh, oh...
I see the SSP has not updated in over an hour, and the loading of stats seems to be crawling again.

I am going to assume that boinc.berkeley.edu suddenly being offline as well is unrelated.

And....my bad, folks.
I just recovered my daily driver after being down for several days, and the clock somehow got off by an hour.
Might be a false alarm, although for some reason, I seem to be losing cache.

I noticed that too. I clicked the update button over a couple of time cycles, and a nice number of WUs fell from the tree.

ID: 1699216 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1699268 - Posted: 7 Jul 2015, 11:59:20 UTC - in response to Message 1698898.  

How the hell can this validate?

Stderr output
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>

</stderr_txt>
]]>


http://setiathome.berkeley.edu/workunit.php?wuid=1819553789

I have more than a few tasks doing this

And I haven't figured out yet why I'm shooting blanks on MB GPU tasks :((


I think I see a pattern to this now, and it only seems to affect MB.

- new task starts
- pending upload waiting
- cpu hits load limit (90%)
- cpu shuts tasks down until upload is done
- then everything fires up again

I'm seeing this over and over if 2 things are happening at once.


2015-07-07 5:09:20 AM | SETI@home | Starting task 31dc12ad.30837.459788.438086664208.12.109_2
2015-07-07 5:09:21 AM | | Suspending computation - CPU is busy
2015-07-07 5:09:22 AM | SETI@home | Started upload of 13ja15ab.7158.1193235.438086664198.12.242_2_0
2015-07-07 5:09:24 AM | SETI@home | Finished upload of 13ja15ab.7158.1193235.438086664198.12.242_2_0
2015-07-07 5:09:31 AM | | Resuming computation
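
To check how often and for how long this happens, the event log can be scanned for suspend/resume pairs. The following is a rough sketch, assuming the log lines keep the "timestamp | project | message" shape shown above and that the timestamps parse under an English locale. The function names are my own and nothing here is part of BOINC itself.

```python
from datetime import datetime

# The excerpt from above, used as sample input.
LOG = """\
2015-07-07 5:09:20 AM | SETI@home | Starting task 31dc12ad.30837.459788.438086664208.12.109_2
2015-07-07 5:09:21 AM | | Suspending computation - CPU is busy
2015-07-07 5:09:22 AM | SETI@home | Started upload of 13ja15ab.7158.1193235.438086664198.12.242_2_0
2015-07-07 5:09:24 AM | SETI@home | Finished upload of 13ja15ab.7158.1193235.438086664198.12.242_2_0
2015-07-07 5:09:31 AM | | Resuming computation
"""


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # %I is the 12-hour clock and %p the AM/PM marker (locale dependent).
    when = datetime.strptime(stamp, "%Y-%m-%d %I:%M:%S %p")
    return when, project, message


def suspensions(log_text):
    """Return (start_time, seconds) for each suspend/resume pair."""
    pauses = []
    suspended_at = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, _project, message = parse_line(line)
        if message.startswith("Suspending computation"):
            suspended_at = when
        elif message.startswith("Resuming computation") and suspended_at is not None:
            pauses.append((suspended_at, (when - suspended_at).total_seconds()))
            suspended_at = None
    return pauses


for start, seconds in suspensions(LOG):
    print(f"suspended at {start:%H:%M:%S} for {seconds:.0f} s")
```

Run over a full day's log, this would show whether the pauses line up with uploads every time or only sometimes.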


I have removed the 'suspend when busy' setting, and freed a core when running MB, and it's still complaining .... GRRRRR

Funny thing is my SLOW AMD is just fine, it's my i5 that is doing it .. GRRRR
ID: 1699268 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1699292 - Posted: 7 Jul 2015, 13:58:26 UTC - in response to Message 1699268.  

I'm wondering ... I had report tasks immediately set on my i5. (I changed it)

So does BOINC think the slot is empty after it is uploaded and just assume it has free access to the slot files to overwrite? Thus blanking them out?
ID: 1699292 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1699312 - Posted: 7 Jul 2015, 14:48:47 UTC - in response to Message 1699292.  
Last modified: 7 Jul 2015, 14:55:43 UTC

I'm wondering ... I had report tasks immediately set on my i5. (I changed it)

So does BOINC think the slot is empty after it is uploaded and just assume it has free access to the slot files to overwrite? Thus blanking them out?

No. We went into this in some detail (Ivan will confirm) while we were fixing LHC's CMS-dev application and the problems it was causing at other projects. Enable the <slot_debug> logging flag, and you will see that the slots are cleansed both when the old task finishes, and again before the new task starts. The only problem would be when stderr.txt grows to be more than 4 GB in size. If that happens, upgrade to BOINC v7.6.2, which can handle it.

Edit: and before any of that happens, stderr.txt for the old task is read back from disk and copied into the statefile structure in memory. From there, it is written out periodically into client_state.xml, just in case BOINC is restarted before the report is accepted. That way, there's a persistent record so the report can be retried after maintenance is complete.
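
For anyone who wants to try the logging flag, here is a small sketch that builds the XML for it. It assumes the usual cc_config.xml layout (a cc_config root containing a log_flags section) and only prints the text; the build_cc_config helper is a made-up name. If you already have a cc_config.xml in the BOINC data directory, merge the flag into it by hand rather than overwriting the file.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build cc_config XML text with each named log flag set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


# Only slot_debug comes from the post above; add other flags as needed.
print(build_cc_config(["slot_debug"]))
```

The client has to re-read its config files (or be restarted) before the extra slot messages show up in the event log.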
ID: 1699312 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1699318 - Posted: 7 Jul 2015, 19:54:17 UTC - in response to Message 1699312.  

and we're back
ID: 1699318 · Report as offensive
Admiral Gloval
Joined: 31 Mar 13
Posts: 21210
Credit: 5,308,449
RAC: 0
United States
Message 1699336 - Posted: 7 Jul 2015, 20:56:24 UTC

Is there any advantage to raising the CPU% used in ati5_nocal other than doing the wu faster?

ID: 1699336 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1699355 - Posted: 7 Jul 2015, 21:31:50 UTC - in response to Message 1699336.  

That is really an open-ended question ... Yes, freeing up a core does help GPU performance, but would that core have done more work on its own than the GPU gained?

My i5 doesn't seem to boost the GPU enough to warrant a core shut down for 2 MB tasks. If I was running 3 or 4, then yes.

If CPU heat is an issue, then shut down a core.

I think I do see an improvement in AP GPU tasks with 1 core shut down, but it's hard to tell since those buggers are hard to get your hands on.

One hint ... stop your network communication, run it for a few hours, make a change, run for a few hours ... it will give you a good idea of how your run times are changing.
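
The trade-off above can be put into rough numbers once you have run times from a before/after test like that. The sketch below is only back-of-the-envelope arithmetic: it assumes CPU and GPU tasks are the same kind of work so they can be counted together, and the run times in it are invented for illustration, not measurements from any host in this thread.

```python
def tasks_per_hour(cpu_cores_busy, cpu_task_minutes, gpu_tasks_at_once, gpu_task_minutes):
    """Combined throughput of the CPU cores and the GPU, in tasks per hour."""
    cpu_rate = cpu_cores_busy * 60.0 / cpu_task_minutes
    gpu_rate = gpu_tasks_at_once * 60.0 / gpu_task_minutes
    return cpu_rate + gpu_rate


# Hypothetical quad-core host running 2 GPU tasks at once.
# All 4 cores crunching, GPU tasks take 20 minutes each:
all_cores = tasks_per_hour(4, 120.0, 2, 20.0)
# One core freed, GPU tasks speed up to 18 minutes each:
one_free = tasks_per_hour(3, 120.0, 2, 18.0)

print(f"all cores busy: {all_cores:.2f} tasks/hour")
print(f"one core free:  {one_free:.2f} tasks/hour")
```

With these made-up figures freeing the core comes out slightly ahead; plug in your own measured times and the answer can easily go the other way, which is why it depends so much on the CPU/GPU combination.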
ID: 1699355 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34365
Credit: 79,922,639
RAC: 80
Germany
Message 1699359 - Posted: 7 Jul 2015, 21:41:08 UTC

It always depends on the CPU/GPU combo.
With a high-end GPU you always benefit from freeing a CPU core.


With each crime and every kindness we birth our future.
ID: 1699359 · Report as offensive
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1699557 - Posted: 8 Jul 2015, 16:42:51 UTC
Last modified: 8 Jul 2015, 16:43:10 UTC

For nearly 24 hours I haven't been getting *ANY* work for the CPU, only work for the GPU!
I didn't change anything, I don't get it... :/
Aloha, Uli

ID: 1699557 · Report as offensive
Profile Dirk Sadowski
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1699564 - Posted: 8 Jul 2015, 16:51:24 UTC - in response to Message 1699557.  
Last modified: 8 Jul 2015, 16:52:23 UTC

For nearly 24 hours I haven't been getting *ANY* work for the CPU, only work for the GPU!
I didn't change anything, I don't get it... :/

Has the 'tasks in progress' limit been reached for the GPU (or the WU cache you set)?
ID: 1699564 · Report as offensive
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1699576 - Posted: 8 Jul 2015, 17:11:16 UTC - in response to Message 1699564.  

Has the 'tasks in progress' limit been reached for the GPU (or the WU cache you set)?
I have 2 CPUs and 2 GPUs and 139 tasks in progress.
This is what I get:

08/07/2015 18:57:40 | SETI@home | Computation for task 19ja15ab.29055.20324.438086664197.12.134_1 finished
08/07/2015 18:57:40 | SETI@home | Starting task 19ja15ab.29055.20324.438086664197.12.199_1
08/07/2015 18:57:40 | SETI@home | [cpu_sched] Starting task 19ja15ab.29055.20324.438086664197.12.199_1 using setiathome_v7 version 700 (cuda50) in slot 2
08/07/2015 18:57:42 | SETI@home | Started upload of 19ja15ab.29055.20324.438086664197.12.134_1_0
08/07/2015 18:57:48 | SETI@home | Finished upload of 19ja15ab.29055.20324.438086664197.12.134_1_0
08/07/2015 18:57:50 | SETI@home | Sending scheduler request: To report completed tasks.
08/07/2015 18:57:50 | SETI@home | Reporting 1 completed tasks
08/07/2015 18:57:50 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
08/07/2015 18:57:53 | SETI@home | Scheduler request completed: got 1 new tasks
08/07/2015 18:57:55 | SETI@home | Started download of 10fe15aa.9888.9065.438086664197.12.123
08/07/2015 18:57:59 | SETI@home | Finished download of 10fe15aa.9888.9065.438086664197.12.123
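
To see the pattern over a longer stretch than one excerpt, the request lines can be paired with the scheduler's replies. This is a rough sketch, assuming the "Requesting new tasks for ..." and "Scheduler request completed: got N new tasks" wording shown above. The summarize_requests name is made up, and note that these two lines alone do not say which resource a delivered task is for.

```python
import re

# The scheduler lines from the excerpt above, used as sample input.
LOG = """\
08/07/2015 18:57:50 | SETI@home | Sending scheduler request: To report completed tasks.
08/07/2015 18:57:50 | SETI@home | Reporting 1 completed tasks
08/07/2015 18:57:50 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
08/07/2015 18:57:53 | SETI@home | Scheduler request completed: got 1 new tasks
"""

REQUEST = re.compile(r"Requesting new tasks for (.+)$")
RESULT = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def summarize_requests(log_text):
    """Return (resources_requested, tasks_received) for each request."""
    summary = []
    pending = None
    for line in log_text.splitlines():
        # The message is everything after the second "|" separator.
        message = line.split("|", 2)[-1].strip()
        match = REQUEST.match(message)
        if match:
            pending = match.group(1)
            continue
        match = RESULT.match(message)
        if match and pending is not None:
            summary.append((pending, int(match.group(1))))
            pending = None
    return summary


for resources, count in summarize_requests(LOG):
    print(f"asked for {resources}: got {count}")
```

Feeding it the whole event log gives a quick list of how many requests that included the CPU came back with nothing.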

Of course the single downloaded WU is for the GPU!? :(
So my CPU crunches Milkyway, the backup project.
Things get worse and worse...
Aloha, Uli

ID: 1699576 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1699579 - Posted: 8 Jul 2015, 17:23:22 UTC - in response to Message 1699015.  

I attribute it to BOINC 7.2.33 which seems to be the best version I've come across.

Well, take a look at my active hosts and see which BOINC version I'm running....on all of them. :^) I still average a couple of truncations a day, I think, though as I mentioned, only(!) about 5 a month end up Invalid.

I gave it some more thought and realized sometime between using the Commode version and the regular version I had moved the BOINC Data folder to a 2nd hard drive. Maybe having the Data folder on a different HD than the OS makes a difference? I dunno, just trying to guess why I don't seem to be having the problem anymore...
ID: 1699579 · Report as offensive
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1699582 - Posted: 8 Jul 2015, 17:29:48 UTC

08/07/2015 19:14:55 | SETI@home | Finished upload of 19ja15ab.29055.20324.438086664197.12.199_1_0
08/07/2015 19:19:52 | SETI@home | Sending scheduler request: To report completed tasks.
08/07/2015 19:19:52 | SETI@home | Reporting 1 completed tasks
08/07/2015 19:19:52 | SETI@home | Requesting new tasks for CPU
08/07/2015 19:19:55 | SETI@home | Scheduler request completed: got 0 new tasks
08/07/2015 19:19:55 | SETI@home | No tasks sent
08/07/2015 19:19:55 | SETI@home | No tasks are available for AstroPulse v7
08/07/2015 19:25:15 | Milkyway@Home | Computation for task de_80_DR8_Rev_8_5_00004_1434551187_10506855_0 finished
08/07/2015 19:25:15 | Milkyway@Home | Sending scheduler request: To report completed tasks.
Look at this: I have selected to download other WUs if AP is not available, which obviously works for the GPU but *NOT* for the CPU. This is server-side, *NOT* on my end!
Aloha, Uli

ID: 1699582 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1699589 - Posted: 8 Jul 2015, 18:18:07 UTC

Don't look now, but we got a new dataset splitting off some AP work..........
Meow alert.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1699589 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1699595 - Posted: 8 Jul 2015, 18:57:26 UTC - in response to Message 1699579.  

I attribute it to BOINC 7.2.33 which seems to be the best version I've come across.

Well, take a look at my active hosts and see which BOINC version I'm running....on all of them. :^) I still average a couple of truncations a day, I think, though as I mentioned, only(!) about 5 a month end up Invalid.

I gave it some more thought and realized sometime between using the Commode version and the regular version I had moved the BOINC Data folder to a 2nd hard drive. Maybe having the Data folder on a different HD than the OS makes a difference? I dunno, just trying to guess why I don't seem to be having the problem anymore...

Do you have a different setting for "write caching" on the 2nd drive? I remember speculating that write caching might be a factor, back when we were initially trying to diagnose the problem. Unfortunately, after trying it with the setting both enabled (my default) and disabled, I didn't find any difference in behavior so I pretty much discarded that theory. Maybe there's another drive setting that could be in play.
ID: 1699595 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.