Panic Mode On (98) Server Problems?

Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1699911 - Posted: 9 Jul 2015, 19:05:24 UTC - in response to Message 1699901.  

Well, initially I'll try just adding <slot_debug> to cc_config under 7.2.33 and see if anything interesting pops. I might be upgrading that xw9400 from XP to Win 7 this weekend and, if it doesn't go as smoothly as I'd like, I might have to reinstall everything else, too, in which case perhaps I'll go ahead and try v7.6.2.

Heh, I see I'll have to increase the size of the log file on the xw9400 when I start it back up again tonight (it's only running on a limited schedule). My daily driver just finished its first task with <slot_debug> turned on and it spewed out an extra 21 lines of messages. The xw9400 typically has 16 tasks running concurrently and would probably blow through that log file in a matter of hours with the current allocation.
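For anyone following along, here is a minimal sketch of what that cc_config.xml change might look like, written as a small Python script that builds the file contents. It assumes the flag lives under <log_flags> and that the log size limit is the <max_stdout_file_size> option (in bytes) under <options>. Both are from memory of the BOINC client configuration documentation rather than from this thread, so check them against your client version. The 8 MB figure and the function name build_cc_config are only illustrative.

```python
import xml.etree.ElementTree as ET


def build_cc_config(slot_debug=True, max_stdout_bytes=None):
    """Return cc_config.xml contents as a string (hypothetical helper)."""
    root = ET.Element("cc_config")

    # Logging flags: 1 turns the extra slot messages on, 0 turns them off.
    log_flags = ET.SubElement(root, "log_flags")
    ET.SubElement(log_flags, "slot_debug").text = "1" if slot_debug else "0"

    # Assumed option name for the event log size limit, in bytes.
    if max_stdout_bytes is not None:
        options = ET.SubElement(root, "options")
        ET.SubElement(options, "max_stdout_file_size").text = str(max_stdout_bytes)

    # Pretty-print where available (ET.indent exists from Python 3.9 on).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Example value only: allow roughly 8 MB of log before it is rotated.
config_text = build_cc_config(slot_debug=True, max_stdout_bytes=8 * 1024 * 1024)
print(config_text)
```

The printed XML would go into cc_config.xml in the BOINC data directory, after which the client needs to re-read its config files or be restarted.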

BTW, regarding the slot juggling that's occurring on Keith's machine, when we first started kicking this around a year and a half ago, in Strange Invalid MB Overflow tasks with truncated Stderr outputs..., it initially seemed like that slot juggling was consistent. After more observation, however, it seemed some truncations did occur without changing slots for the succeeding task, so there did seem to be some exceptions to that "rule".
ID: 1699911 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1699915 - Posted: 9 Jul 2015, 19:21:04 UTC - in response to Message 1699911.  

Well, initially I'll try just adding <slot_debug> to cc_config under 7.2.33 and see if anything interesting pops. I might be upgrading that xw9400 from XP to Win 7 this weekend and, if it doesn't go as smoothly as I'd like, I might have to reinstall everything else, too, in which case perhaps I'll go ahead and try v7.6.2.

Heh, I see I'll have to increase the size of the log file on the xw9400 when I start it back up again tonight (it's only running on a limited schedule). My daily driver just finished its first task with <slot_debug> turned on and it spewed out an extra 21 lines of messages. The xw9400 typically has 16 tasks running concurrently and would probably blow through that log file in a matter of hours with the current allocation.

BTW, regarding the slot juggling that's occurring on Keith's machine, when we first started kicking this around a year and a half ago, in Strange Invalid MB Overflow tasks with truncated Stderr outputs..., it initially seemed like that slot juggling was consistent. After more observation, however, it seemed some truncations did occur without changing slots for the succeeding task, so there did seem to be some exceptions to that "rule".

Yes, the second of the example logs I attached to the boinc_dev report exhibited that behaviour. In the last iteration round this problem (What causes a Blank stderr?), I posted the flowchart:

1) Delete everything in the slot folder on task exit
2) Delete everything - i.e. anything remaining - in the slot before reuse
3) Don't reuse the slot if (2) fails

In the first example today, BOINC couldn't delete stderr.txt at either step (1) or step (2) - so a different, higher-numbered, slot was used for the next task. When that task finished, BOINC retried the original slot again - and found the file could now be deleted, so the original slot was reused.

In the second example, BOINC couldn't delete stderr.txt at step (1), but succeeded at step (2) - so there was no need to take the escape route at step (3).
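The three steps above can be sketched roughly as follows. This is only an illustration of the flowchart, not BOINC's actual code: the function names clean_slot and pick_slot, the busy set, and the max_slots limit are all made up for the example.

```python
import os
import shutil
import tempfile


def clean_slot(slot_dir):
    """Steps 1 and 2: try to delete everything in the slot directory.

    Returns True only if the directory ends up empty.
    """
    for name in os.listdir(slot_dir):
        path = os.path.join(slot_dir, name)
        try:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        except OSError:
            # For example, stderr.txt is still held open by another process.
            pass
    return not os.listdir(slot_dir)


def pick_slot(slots_root, busy=None, max_slots=64):
    """Return the lowest-numbered free slot that could be fully emptied."""
    busy = busy or set()
    for n in range(max_slots):
        if n in busy:
            continue  # a task is still running in this slot
        slot_dir = os.path.join(slots_root, str(n))
        os.makedirs(slot_dir, exist_ok=True)
        if clean_slot(slot_dir):
            return slot_dir
        # Step 3: leftovers could not be removed, so try the next slot.
    raise RuntimeError("no usable slot found")


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "0"))
    with open(os.path.join(demo_root, "0", "stderr.txt"), "w") as f:
        f.write("leftover output\n")
    print("next task uses slot", os.path.basename(pick_slot(demo_root)))
```

Because the scan starts from slot 0 every time, a slot that was skipped once gets retried for the next task, which matches the first example: the original slot was reused as soon as stderr.txt could be deleted.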
ID: 1699915 · Report as offensive
Speedy
Volunteer tester
Avatar

Send message
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1699982 - Posted: 9 Jul 2015, 23:37:52 UTC
Last modified: 9 Jul 2015, 23:53:24 UTC

I feel 2 June AC may be stuck: it has been sitting at (14) on the AP side for at least the last hour, more like 2, holding a splitter up.

Edit: tape has now been cleared
ID: 1699982 · Report as offensive
OTS
Volunteer tester

Send message
Joined: 6 Jan 08
Posts: 371
Credit: 20,533,537
RAC: 0
United States
Message 1699988 - Posted: 10 Jul 2015, 0:12:36 UTC - in response to Message 1699756.  
Last modified: 10 Jul 2015, 0:13:50 UTC

(...)
It looks like your PC now got AP WUs for the CPU...
Did you change something in the app_info.xml file? Is there still a fine SETI entry for CPU?

It seems it has sorted itself out. The only thing I did was to select all apps and submit the changes. Then I waited a few minutes and selected AP7 only, with other apps allowed if AP7 is not available.
It is like Windows sometimes: an option is selected but doesn't work. So you unselect it and hit "ok". Then you select the option again and hit "ok", and "magically" the option now works...


That might be the case or it might be something else. I have noted that when I have the cache set for 8 days and AP only, if I select a ½ day cache and also choose to accept both APs and MBs at the same time, I seem to get a big slug of MBs that I really did not want, as if the cache were still set at 8 days. If, on the other hand, I set the cache for ½ day, force a client update, and only then ask for both MBs and APs, I get just half a day's worth of MBs. I have been attributing that behavior to the possibility that the server and client interact and decide what needs to be downloaded before the new configuration I asked for is noted. It appears there are certain configurations that only seem to work after the second client contact. Of course I could be adding 2 and 2 and getting 5, but that is what I have found.
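To put rough numbers on that guess, here is a toy calculation. It is not BOINC's real work-fetch logic: it simply assumes the amount of work asked for is the cache setting minus whatever is already queued, and that the first contact after a preference change is still computed with the old setting. The function name work_request_seconds is made up for the example.

```python
SECONDS_PER_DAY = 86400


def work_request_seconds(cache_days, queued_seconds):
    """Toy model: ask for enough work to fill the cache, never less than zero."""
    return max(0.0, cache_days * SECONDS_PER_DAY - queued_seconds)


# Assume an empty MB queue at the moment both preferences are changed together.
queued = 0.0

# First contact: the request is still sized with the old 8-day cache.
first_contact = work_request_seconds(8, queued)

# Later contact: the new half-day cache has been noted by then.
later_contact = work_request_seconds(0.5, queued)

print(first_contact / SECONDS_PER_DAY, "days requested with the stale setting")
print(later_contact / SECONDS_PER_DAY, "days requested with the new setting")
```

If the hypothesis holds, that would be the difference between the big slug of MBs and the half day's worth, and it is why forcing an update between the two changes seems to help.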
ID: 1699988 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 36400
Credit: 261,360,520
RAC: 489
Australia
Message 1700114 - Posted: 10 Jul 2015, 9:01:14 UTC

I take it that there must be a VLAR storm going on seeing as I'm finding it hard to keep my GPU's cache full?

OTOH my CPUs have 169 APs between them so far. :-)

Cheers.
ID: 1700114 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1700604 - Posted: 11 Jul 2015, 21:06:09 UTC

AP assimilators appear to have called a halt.
3,600 & growing rapidly.
Grant
Darwin NT
ID: 1700604 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1700611 - Posted: 11 Jul 2015, 21:39:53 UTC

I noticed that too, I have a feeling they are merging databases.
ID: 1700611 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1701279 - Posted: 14 Jul 2015, 2:37:50 UTC

I hate to be a KillJoy, however, has anyone looked at the SSP recently?
It appears ALL the splitters except 2 MB splitters are Dead, All the AP cache is gone, the MB cache is rapidly falling, and the Current result creation rate is in the pits.

It may be time to Panic...
ID: 1701279 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1701282 - Posted: 14 Jul 2015, 2:41:12 UTC - in response to Message 1701279.  

Yes, I noticed it about 6 hours ago but was trying to avoid panic by not saying anything. There was a surplus of APs at the time. I had hoped they had just turned them off for some maintenance, but it looks like I was wrong, AGAIN!!! lol
ID: 1701282 · Report as offensive
Profile Cactus Bob
Avatar

Send message
Joined: 19 May 99
Posts: 209
Credit: 10,924,287
RAC: 29
Canada
Message 1701298 - Posted: 14 Jul 2015, 3:39:57 UTC
Last modified: 14 Jul 2015, 4:12:51 UTC

Ya, I also noticed the creation rate was under 1 but hoped it wasn't a trend. OH well!!!

PANIC !!!!!

Bob

edit: The good news (for me) is I have 200 APs. I just crunched my last MB.

edit 2: arrgghh !! No tasks available for Astropulse V7. Down to 194 now --- sniff
Sometimes I wonder, what happened to all the people I gave directions to?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SETI@home classic workunits 4,321
SETI@home classic CPU time 22,169 hours
ID: 1701298 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22457
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1701326 - Posted: 14 Jul 2015, 5:09:06 UTC

...at least it's only a few hours to the weekly outrage, when hopefully they will kick the splitters into life again.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1701326 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1701358 - Posted: 14 Jul 2015, 6:44:39 UTC - in response to Message 1701326.  

Ready to send = 0.
Splitter output = 0.

I knew it had to happen; every time my RAC recovers & is almost back to its usual levels, things fall over to clobber it again.
Grant
Darwin NT
ID: 1701358 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 36400
Credit: 261,360,520
RAC: 489
Australia
Message 1701376 - Posted: 14 Jul 2015, 7:47:16 UTC

That would be right, just as my main rig has almost caught my 2nd rig again. :-(

Oh well, my GPU backup project is ready to go so I won't get cold. ;-)

Cheers.
ID: 1701376 · Report as offensive
Admiral Gloval
Avatar

Send message
Joined: 31 Mar 13
Posts: 21116
Credit: 5,308,449
RAC: 0
United States
Message 1701409 - Posted: 14 Jul 2015, 12:04:32 UTC

GPU is almost dry. Have enough CPU to make it to shutdown. After that, my CPU is going to start sucking air.

ID: 1701409 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 36400
Credit: 261,360,520
RAC: 489
Australia
Message 1701412 - Posted: 14 Jul 2015, 12:18:12 UTC

My main rig's GPUs will be out of work by the time the outrage gets started; the other one's GPUs may be lucky and make it through.

Cheers.
ID: 1701412 · Report as offensive
Profile JaundicedEye
Avatar

Send message
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1701439 - Posted: 14 Jul 2015, 19:28:04 UTC

Woke up this AM, pushed the power button on my main rig only to hear the lovely chorus of Beeeep......Beep,Beep,Beep,Beep. !!!!!!!!GRRRRRR. It's only 15 months old for God's sake! Reseated the GTX 750 and the memory, no change. No display, and 4 beeps. Reseated all again, rebooted and got a display, went into the BIOS and found the message "CPU Fan Error".

I replaced the Hyper 212 heat sink and fan and all is good. It was making noise yesterday and I guess a bearing went in the fan motor, the blade would not spin freely. After replacing the entire unit, I discovered the little clip-on mounts that attach the fan to the heat sink were removable and a standard 120mm fan could be replaced without removing the sink...........!!!!!!!!!!!Grrrrrrrrrr.

Oh well, it's fixed and SIV says the CPU temps are running between 55-62C.

And that's the reason I don't fix other people's PCs anymore, it's too frustrating for a man my age.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1701439 · Report as offensive
Phil Burden

Send message
Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1701448 - Posted: 14 Jul 2015, 20:05:31 UTC
Last modified: 14 Jul 2015, 20:06:10 UTC

Well, although Main is back up, downloads don't appear to be working properly.
They're taking an age to come down..

P.
ID: 1701448 · Report as offensive
Admiral Gloval
Avatar

Send message
Joined: 31 Mar 13
Posts: 21116
Credit: 5,308,449
RAC: 0
United States
Message 1701458 - Posted: 14 Jul 2015, 20:33:08 UTC

Any interesting project with short WUs? My backup is Climate prediction. I do not want to get a data set from them that will take up to a week to finish, let alone the three I know they will give me.

ID: 1701458 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1701463 - Posted: 14 Jul 2015, 20:41:47 UTC - in response to Message 1701458.  

Any interesting project with short WUs? My backup is Climate prediction. I do not want to get a data set from them that will take up to a week to finish, let alone the three I know they will give me.

My main backup projects are
For CPU:
MilkyWay@Home or Asteroids@home
For ATI GPUs:
PrimeGrid or Collatz Conjecture
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1701463 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1701486 - Posted: 14 Jul 2015, 21:32:15 UTC - in response to Message 1701439.  

I replaced the Hyper 212 heat sink and fan and all is good. It was making noise yesterday and I guess a bearing went in the fan motor, the blade would not spin freely. After replacing the entire unit, I discovered the little clip-on mounts that attach the fan to the heat sink were removable and a standard 120mm fan could be replaced without removing the sink...........!!!!!!!!!!!Grrrrrrrrrr.

You can take that fan that won't spin and bring it back to life most likely. Peel the sticker back on the back-side of it and there may or may not be a rubber seal/plug under the sticker. Being a 120mm fan, it can easily take 2-3 drops of motor oil (like you put in the engine of your car). Add the oil, wipe up any mess you made on the surface the sticker sticks to, put the rubber plug back in and put the sticker back on. Spin the fan by hand a bit and you'll feel it loosen up. Then it should be a usable fan again.

The rear exhaust fan on my case has been oiled twice. I haven't had to oil it in almost 3 years now though, so I expect that it is probably going to be due before long.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 1701486 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.