Panic Mode On (102) Server Problems?

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1768816 - Posted: 1 Mar 2016, 13:29:19 UTC - in response to Message 1761851.  

Before Felipe asks his monthly question ... ;)

Looks like we reprocessed just over 50 of the old 2011 tapes in February, and started on the 2010 backlog - I saw 21 of those. We also did 76 brand new 2015 tapes, making 150 in all - rather fewer than we normally get through. Still, it was a short month, and v8 processing is slower (because more precise) than before.

Recorded      TOTAL   Processed with SaH v7/8    Processed with SaH v6 only
                      (since launch June 2013)   (derived)

2007            350      4                        346
2008            916    874                         42
2009            548    456                         92
2010            762    159                        603
2011           1148   1082                         66
2012            846    819                         27
2013            590    585                          5
2014            260    260                        n/a
2015            292    292                        n/a

Grand total    5712   4531                       1181
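
The "derived" column is simply the total minus the v7/8 count for each recording year. A quick Python check of that arithmetic, using only the figures in the table above (the variable names are just for illustration):

```python
# Tape counts per recording year, copied from the table above:
# year -> (total recorded, processed with SaH v7/8)
tapes = {
    2007: (350, 4),
    2008: (916, 874),
    2009: (548, 456),
    2010: (762, 159),
    2011: (1148, 1082),
    2012: (846, 819),
    2013: (590, 585),
    2014: (260, 260),
    2015: (292, 292),
}

# "v6 only" is derived as total minus v7/8.
# For 2014 and 2015 this comes out as 0, which the table shows as n/a.
v6_only = {year: total - v78 for year, (total, v78) in tapes.items()}

grand_total = sum(total for total, _ in tapes.values())
grand_v78 = sum(v78 for _, v78 in tapes.values())
grand_v6 = sum(v6_only.values())

print(grand_total, grand_v78, grand_v6)
```

The three sums agree with the "Grand total" row.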
ID: 1768816 · Report as offensive
Filipe

Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1768826 - Posted: 1 Mar 2016, 15:23:35 UTC
Last modified: 1 Mar 2016, 15:27:38 UTC

Before Felipe asks his monthly question ... ;)


Hello Richard. I just came to see if you had posted your monthly update. As there is not much official information, your input is really valuable for us crunchers.


Thank you once again for taking the time to update the history of processed data for us.

It actually helps a lot in keeping up interest in the project.

Filipe
ID: 1768826 · Report as offensive
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1768868 - Posted: 1 Mar 2016, 23:22:28 UTC

Back after the outage, but:

Replica seconds behind master 20,549

...is rising! :O
Aloha, Uli

ID: 1768868 · Report as offensive
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1768878 - Posted: 1 Mar 2016, 23:55:25 UTC

See:

Replica seconds behind master 22,349

:(
Aloha, Uli
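
A rough Python sketch of what the two readings above imply, assuming (and this is only an assumption) that each figure was read from the status page at about the time its post was made:

```python
from datetime import datetime

# Post times (UTC) and the "seconds behind master" figures quoted above.
# Assumption: each reading was taken roughly when the post was written.
t1 = datetime(2016, 3, 1, 23, 22, 28)
t2 = datetime(2016, 3, 1, 23, 55, 25)
lag1, lag2 = 20_549, 22_349

elapsed = (t2 - t1).total_seconds()   # wall-clock seconds between the posts
growth = lag2 - lag1                  # extra seconds of lag accumulated
rate = growth / elapsed               # seconds of lag gained per second

# Master time the replica actually replayed during that interval.
replayed = elapsed - growth

print(f"elapsed={elapsed:.0f}s growth={growth}s rate={rate:.2f} replayed={replayed:.0f}s")
```

Under that assumption the lag grew by 1,800 s in 1,977 s, so the replica was replaying only about 177 s of master activity in roughly 33 minutes.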

ID: 1768878 · Report as offensive
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1768892 - Posted: 2 Mar 2016, 0:56:40 UTC

Panic Mode off......replica now 10,261 behind.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1768892 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1768943 - Posted: 2 Mar 2016, 5:54:45 UTC

It was behind after an outrage and went further behind for a bit? :-O

That's just typical of what happens these days after an outrage. ;-)

Cheers.
ID: 1768943 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1769348 - Posted: 3 Mar 2016, 23:04:24 UTC - in response to Message 1768943.  

I've noticed a spike in inconclusives, now around 6.4% (it was around 4%).

There seem to be 2 main modes:
1. My GPUs v x86_64-apple-darwin systems, mostly against the opencl_nvidia_mac & opencl_intel_gpu_sah applications.
2. My CPUs v windows-intelx86 CUDA42.

Mixed in with both 1 & 2 are several results where the numbers of spikes, pulses etc. match, but which have been judged not similar enough to validate.


I notice that a lot of my inconclusives come in groups: 2-5 WUs that I've done have all also been done by the same other machine.
So the main cause of the sudden high number of inconclusives appears to be that when people get WUs they come as a group, so you get systems checking multiple WUs against each other. Also, when a group of WUs is completed and then comes up as inconclusive, they are released for re-processing as a bunch, so when they are re-allocated they all go to the one system.

I expect once this batch of inconclusives clears, my inconclusive rate will drop back down to around 4%, till the next time a batch goes to a machine.


I notice that the splitters still tend to bunch up on particular files.
If each splitter worked on just the 1 file, this clumping of WUs would be less of an issue and the allocation of WUs would tend to be more random; there would be fewer cases of 2 machines comparing multiple WUs against each other.
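
A toy Python simulation of the clumping idea above. It is a deliberately crude model and not how the BOINC scheduler really hands out work: the host count, WU count, batch size and function names are all invented for illustration, and the "batched" case simply assumes each group of consecutive WUs goes to the same two hosts.

```python
import random
from collections import Counter

def allocate_random(n_wus, n_hosts, rng):
    """Each WU goes to an independently chosen pair of distinct hosts."""
    return [tuple(sorted(rng.sample(range(n_hosts), 2))) for _ in range(n_wus)]

def allocate_in_batches(n_wus, n_hosts, batch_size, rng):
    """Consecutive WUs are handed out as a group to the same pair of hosts."""
    assignments = []
    while len(assignments) < n_wus:
        pair = tuple(sorted(rng.sample(range(n_hosts), 2)))
        assignments.extend([pair] * min(batch_size, n_wus - len(assignments)))
    return assignments

def fraction_in_repeat_pairings(assignments):
    """Fraction of WUs whose two hosts also share at least one other WU."""
    per_pair = Counter(assignments)
    repeated = sum(count for count in per_pair.values() if count >= 2)
    return repeated / len(assignments)

rng = random.Random(2016)              # fixed seed so the run is repeatable
N_WUS, N_HOSTS, BATCH = 2000, 1000, 4  # made-up sizes

random_frac = fraction_in_repeat_pairings(allocate_random(N_WUS, N_HOSTS, rng))
batch_frac = fraction_in_repeat_pairings(
    allocate_in_batches(N_WUS, N_HOSTS, BATCH, rng)
)

print(f"random: {random_frac:.3f}  batched: {batch_frac:.3f}")
```

In this model every batched WU is checked by a pair of hosts that also share other WUs, while with random allocation that is rare. So a single host that tends to produce inconclusives would hit its wingman in bunches rather than one WU at a time.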
Grant
Darwin NT
ID: 1769348 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1769356 - Posted: 3 Mar 2016, 23:38:40 UTC - in response to Message 1769348.  

Both those OS X apps have a tendency to spit out inconclusives. There is a beta OS X CUDA app (http://www.arkayn.us/forum/index.php?topic=191.msg4411;topicseen#new) that fixes a lot of that and is twice as fast. Maybe some of your wingmen will see this message...

Most of the Intel iGPUs just barely work on OS X for this. Some work fine and others just spew inconclusives. The same chip can be fine on one machine and not on another.

Chris
ID: 1769356 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1769360 - Posted: 3 Mar 2016, 23:58:28 UTC - in response to Message 1769356.  

Most of the Intel iGPUs just barely work on OS X for this. Some work fine and others just spew inconclusives. The same chip can be fine on one machine and not on another.

I don't think they're much better on Windows machines.
It appears that Intel really need to sort out their drivers; they've given them OpenCL support but it would appear they haven't done much work in actually checking that their output is even remotely accurate.
Grant
Darwin NT
ID: 1769360 · Report as offensive
Cruncher-American · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1769518 - Posted: 4 Mar 2016, 17:59:22 UTC
Last modified: 4 Mar 2016, 18:12:15 UTC

Question: since last evening sometime, I noticed a gradual decline in the number of tasks on my 2 machines, which normally have 600 WUs on them at any time. It's not like they have been totally shut out - they have been getting WUs in dribs and drabs, just not enough to keep stable or fill up.

Any idea why this might be happening? Is anyone else noticing similar? All I did was switch from stock to Lunatics 44, but that shouldn't affect requests for new work, should it...or does it give me fewer until the new apps are figured out by the servers on my machines?

Thanks for any advice...

EDIT: I just noticed that at least some of the new v8s d/l for GPU have about 1/10 the GFLOPs in the estimate that they had before. Huh????
ID: 1769518 · Report as offensive
betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1769542 - Posted: 4 Mar 2016, 19:39:10 UTC - in response to Message 1769518.  

or does it give me fewer until the new apps are figured out by the servers on my machines?

IIRC, until the server sees how long the new apps run, it vastly overestimates the run times; hence you will get fewer work units.
ID: 1769542 · Report as offensive
Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1769545 - Posted: 4 Mar 2016, 19:54:50 UTC - in response to Message 1769542.  

It will take 10 completed tasks to average and then things will go back to the way they were. Just have to wait for that 11th one.
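
For anyone curious, here is a rough Python sketch of the behaviour described in the last two posts. It is not BOINC's actual scheduler code: the function names, the 10,000 s default estimate, the 1,000 s real runtime and the one-day cache are made-up numbers for illustration. Only the "more than 10 completed tasks" threshold comes from the post above.

```python
def estimated_runtime(default_estimate, completed_runtimes, min_samples=10):
    """Return the runtime estimate used when deciding how much work to send.

    Until more than `min_samples` tasks have completed, fall back to the
    conservative default; after that, use the host's own average.
    """
    if len(completed_runtimes) <= min_samples:
        return default_estimate
    return sum(completed_runtimes) / len(completed_runtimes)


def tasks_to_fill_cache(cache_seconds, estimate):
    """How many tasks fit into the requested cache at the given estimate."""
    return int(cache_seconds // estimate)


DEFAULT_ESTIMATE = 10_000.0   # hypothetical over-estimate for a new app (s)
REAL_RUNTIME = 1_000.0        # hypothetical actual runtime per task (s)
CACHE_SECONDS = 86_400        # hypothetical one-day work cache

results = {}
for n_done in (0, 10, 11):
    history = [REAL_RUNTIME] * n_done
    estimate = estimated_runtime(DEFAULT_ESTIMATE, history)
    results[n_done] = tasks_to_fill_cache(CACHE_SECONDS, estimate)
    print(n_done, estimate, results[n_done])
```

With these made-up numbers the cache holds only 8 tasks while the default estimate is in use, and jumps to 86 once the 11th result is in.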

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1769545 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1771734 - Posted: 15 Mar 2016, 15:08:03 UTC

Now that was an Uncalled for Pre-Outrage Panic...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1771734 · Report as offensive
betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1771741 - Posted: 15 Mar 2016, 15:40:34 UTC - in response to Message 1771734.  

Now that was an Uncalled for Pre-Outrage Panic...

As an added bonus Einstein also was down at 7 am PDT.
ID: 1771741 · Report as offensive
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1771964 - Posted: 16 Mar 2016, 19:41:09 UTC

Has the resend lost tasks feature been turned off? I lost one WU while trying to get stock apps to run on NativeBOINC, and the servers haven't sent it back to me so far...
ID: 1771964 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1772056 - Posted: 17 Mar 2016, 4:01:31 UTC - in response to Message 1771964.  

Has the resend lost tasks feature been turned off?

From what I can recall it's been disabled here at Seti@home for some time.
As for other projects, no idea.
Grant
Darwin NT
ID: 1772056 · Report as offensive
Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1772087 - Posted: 17 Mar 2016, 8:04:26 UTC - in response to Message 1772056.  

OK, then I know... thanks.
ID: 1772087 · Report as offensive
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1774155 - Posted: 26 Mar 2016, 15:06:52 UTC - in response to Message 1774151.  

Did you notice that the Astropulse science database on marvin is disabled?

9 days without any panic, but now that AP's are being split again, I think it would be a good idea to start the AP assimilators. They've been in state "Not running" for a day or so now. The number of Workunits waiting for assimilation for AP is growing and growing, and with new AP's being split and crunched again, if the AP assimilators aren't running, something will break, sooner or later.

Donald
Infernal Optimist / Submariner, retired
ID: 1774155 · Report as offensive
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1774169 - Posted: 26 Mar 2016, 16:14:01 UTC - in response to Message 1774155.  
Last modified: 26 Mar 2016, 16:15:15 UTC

And many errors in the splitting process, probably for the very same reason...

Cheers.
ID: 1774169 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1774250 - Posted: 26 Mar 2016, 20:35:01 UTC - in response to Message 1774238.  

Did you notice that the Astropulse science database on marvin is disabled?

No, I forgot to open my eyes that wide :-)
Geeze, help me, I'm blind.
Stoopid me....



Damn, all those AP's NOT being produced, my poor gpu is starving ;-)

Back to Einstein methinks, for now.

P.
ID: 1774250 · Report as offensive


 