Panic Mode On (88) Server Problems?

Profile Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1511740 - Posted: 3 May 2014, 11:18:22 UTC - in response to Message 1511672.  
Last modified: 3 May 2014, 11:20:41 UTC

Aah, lovely verbiage... being American, I am, of course, confounded by English.

One of my favourite phrases, brought to you courtesy of the United States Air Force:
The (mechanism in question) experienced catastrophic non-linear structural exasperation leading to energetic disassembly.
Use it carefully.

Cheers
Member of the 20 Year Club



ID: 1511740 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1511746 - Posted: 3 May 2014, 11:48:18 UTC - in response to Message 1511740.  

One of my favourite phrases, brought to you courtesy of the United States Air Force:
The (mechanism in question) experienced catastrophic non-linear structural exasperation leading to energetic disassembly.


Lol, funny part is that makes absolute sense to me. Either the structural member in question was over-exerted, or the observer was a cunning linguist :-O
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1511746 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1511803 - Posted: 3 May 2014, 15:59:42 UTC - in response to Message 1511746.  

One of my favourite phrases, brought to you courtesy of the United States Air Force:
The (mechanism in question) experienced catastrophic non-linear structural exasperation leading to energetic disassembly.

Lol, funny part is that makes absolute sense to me. Either the structural member in question was over-exerted, or the observer was a cunning linguist :-O

Or both (8{)
Donald
Infernal Optimist / Submariner, retired
ID: 1511803 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1511858 - Posted: 3 May 2014, 17:42:15 UTC

All this credit talk.. I figure this is a decent moment to pop in with a recent observation I've made.

So I'm still in the habit of keeping my spreadsheet of all the APs I've crunched, and aside from creditNew's obviously random nature per-task, I did notice something not-so-obvious.

So most of my APs take ~44,000 seconds to complete, and depending on blanking, pulses found, interference from the Cosmic Microwave Background.. the typical credit ends up being between 650 and 730, with a strong bias toward the upper 600s. Every now and then, for reasons unknown to me, a task ends up stretching out to 48,000 seconds or even into the low 50,000s. Just one task does that, and the other two complete in a normal amount of time. After reporting, that task turns out not to have had high blanking.. in fact, it usually has zero blanking, and also zero pulses found.

The interesting new observation is that this long-running task gets the typical credit granted..but then the next few end up being granted a bit less than expected (high 400s, low-mid 500s, typically). This only lasts for 2-3 validations and then goes back to the "normal" range.

I understand granted credit takes the recent average processing rate into consideration somehow, so it isn't unexpected to get less credit/second.. but the task that runs long gets normal credit and the next few get a bit less than they should. I assume that indicates the long task becomes the new "baseline" so to speak, and the next couple of tasks technically had less processing, therefore they should get less credits.
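The "long task becomes the new baseline" guess above can be put into a toy model. This is only a sketch of that hypothesis, not how CreditNew actually computes credit: the function name `simulate_credit`, the nominal credit of 700, the window of 3 tasks and the "pay relative to the longest recent runtime" rule are all assumptions made up for illustration.

```python
from collections import deque


def simulate_credit(runtimes, nominal_credit=700.0, window=3):
    """Toy model: pay each task relative to the longest runtime in a short window.

    The window includes the task itself, so a long task is paid in full,
    while the shorter tasks that follow it are paid less until the long
    task drops out of the window.
    """
    recent = deque(maxlen=window)  # hypothetical per-host history
    granted = []
    for runtime in runtimes:
        recent.append(runtime)
        baseline = max(recent)  # the assumed "baseline" runtime
        granted.append(round(nominal_credit * runtime / baseline, 1))
    return granted


# Five normal ~44,000 s tasks with one 50,000 s task in the middle.
runtimes = [44_000, 44_000, 50_000, 44_000, 44_000, 44_000]
print(simulate_credit(runtimes))
```

In this toy model the long task itself is paid the normal amount, the next two are paid 12% less, and then credit returns to normal, which is the same shape as the pattern described above (the real dips reported are larger).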

That's just another example of the oscillation that happens within one kind of task (not just when you switch from MB-only to AP-only). As far as I can tell, there isn't really much oscillation in RAC that I can discern. I did notice a long time ago that occasionally, if I get shorted on credits for a task (like say.. down in the 300s), some time within the next 10 tasks or so, I'll get one that grants 900+ to balance the overall average out.

Just my observations.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1511858 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1511943 - Posted: 3 May 2014, 22:24:05 UTC - in response to Message 1511858.  
Last modified: 3 May 2014, 22:29:57 UTC

...That's just another example of the oscillation that happens within one kind of task (not just when you switch from MB-only to AP-only). As far as I can tell, there isn't really much oscillation in RAC that I can discern.


Yeah even filtering specifically to CPU MB shorties you see the oscillations. I believe the uncounted blanking plus the extra running time is enough to push it off balance. With MB the inherent variability of the tasks is one small contributor, though not that huge compared to the other issues. The bigger difference between MB and AP here is that stock CPU has AVX as well, so the perturbations are larger by giving the scale a harder shove (downwards) more frequently.

[Eric pointed out recently that the pfc_scales were going below 1, which theoretically is impossible unless applications use magic. Of course they don't use magic, but instead there are logic holes in the credit system that completely ignore SSE and AVX vectorisation]
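A rough way to see how a scale could drop below 1 without magic, as a toy calculation only. The function `toy_scale`, the 3 GHz clock, the 50% efficiency and the "one flop per cycle" scalar peak are illustrative assumptions, not values or formulas taken from the BOINC code; the only hardware fact relied on is that a 256-bit AVX register holds 8 single-precision floats.

```python
def toy_scale(actual_flops, elapsed_s, counted_peak_flops_per_s):
    """Hypothetical scale: counted peak work divided by work actually done.

    If the counted peak is a true upper bound, this can never be below 1.
    """
    counted_peak_work = elapsed_s * counted_peak_flops_per_s
    return counted_peak_work / actual_flops


CLOCK_HZ = 3e9      # assumed CPU clock
AVX_LANES = 8       # single-precision floats per 256-bit AVX register
ELAPSED_S = 100.0   # assumed task runtime
EFFICIENCY = 0.5    # assumed fraction of the true SIMD peak the app achieves

# Work the vectorised application really performs in that time.
actual = ELAPSED_S * CLOCK_HZ * AVX_LANES * EFFICIENCY

# Peak counted as scalar only (SIMD ignored) versus peak counted with SIMD.
scale_ignoring_simd = toy_scale(actual, ELAPSED_S, CLOCK_HZ * 1)
scale_counting_simd = toy_scale(actual, ELAPSED_S, CLOCK_HZ * AVX_LANES)

print(scale_ignoring_simd, scale_counting_simd)
```

With the scalar-only peak the application appears to do four times more work than the "maximum", so the scale lands well under 1; once the SIMD lanes are counted in the peak, the same run gives a scale above 1 again.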
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1511943 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1511956 - Posted: 3 May 2014, 22:53:33 UTC - in response to Message 1511943.  

there are logic holes in the credit system

Holes you could fit Olympus Mons through IMHO.
The abortion that is Credit New.
Grant
Darwin NT
ID: 1511956 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1511986 - Posted: 3 May 2014, 23:54:07 UTC - in response to Message 1511956.  
Last modified: 3 May 2014, 23:57:10 UTC

there are logic holes in the credit system

Holes you could fit Olympus Mons through IMHO.
The abortion that is Credit New.


LoL. In some respects true (which is what makes it funny :P), but on the other hand the major issues are simple stability ones, so quite fixable without removing the intended benefits of self-scaling.

Having verified the overall high-level design 'intent' as read from the code (which does not match the whitepaper documentation), the things we're testing are lower-level stability issues:

#1) coarse scaling error: caused primarily by omission of SIMD from the operation 'counts'. This causes drift, mostly downward, and AP-MB discrepancy. Fixable by taking SIMD and other future parallelism into account.
#2) oscillations (short term): caused by using sampled averages with no damping. That's basically the same problem DAC chips in cheap $5 CD players have. Fixing #1 then applying damping fixes this. The CD player equivalent modification would be the use of a much better DAC, with signal conditioning, resulting in a professional grade CD player.
#3) long term drift &/or slow response (too many tasks needed to respond to hardware change). Caused by averaging/filtering blurring out change, probably from using too many samples for averaging, which would have been done in a misguided attempt to address #2. After fixing #1 and #2, add a delta term and fine-tune.
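To make the damping idea in #2 concrete, here is a minimal sketch using exponential smoothing as a stand-in. It is not the patch being tested and not CreditNew code: the function names `ema` and `swing`, the alternating 0.8/1.2 samples and the smoothing factor 0.1 are all assumptions chosen for illustration.

```python
def ema(samples, alpha, start):
    """Exponentially smoothed estimate: move a fraction alpha toward each sample."""
    estimate = start
    history = []
    for sample in samples:
        estimate += alpha * (sample - estimate)
        history.append(estimate)
    return history


def swing(values):
    """Peak-to-peak size of the oscillation in a sequence."""
    return max(values) - min(values)


# Assumed noisy scale samples bouncing around a true value of 1.0.
samples = [0.8, 1.2] * 20

undamped = samples                       # estimate simply follows the latest sample
damped = ema(samples, alpha=0.1, start=1.0)

print("undamped swing:", swing(undamped))
print("damped swing:  ", swing(damped[-10:]))
```

The damped estimate still wobbles, but by a small fraction of the raw swing. The trade-off is the one described in #3: a smaller `alpha` damps harder but also takes more tasks to follow a real change, which is where an extra delta term would come in.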

So yeah, it's a cluster, but fixable. Biggest challenge now is organising people across 4 continents, to get it all systematically tested for a well tuned/tested patch ( via albert@home , then probably beta here after). A bit like herding cats at the moment, and nothing to do with coding... just have to go with it.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1511986 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1511989 - Posted: 4 May 2014, 0:03:48 UTC - in response to Message 1511986.  

Jason, I am encouraged by your post; I find these people who keep slamming Credit New with no solutions to be dispiriting. You and others are at least working on it. If mended, BOINC will benefit and hopefully the madness with credit on some projects will cease.
ID: 1511989 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1511991 - Posted: 4 May 2014, 0:09:40 UTC - in response to Message 1511986.  

So yeah, it's a cluster, but fixable. Biggest challenge now is organising people across 4 continents, to get it all systematically tested for a well tuned/tested patch ( via albert@home , then probably beta here after). A bit like herding cats at the moment, and nothing to do with coding... just have to go with it.

Good luck, with that & with new apps for Maxwell GPUs (not so subtle hint :-))
Grant
Darwin NT
ID: 1511991 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1511996 - Posted: 4 May 2014, 0:17:17 UTC - in response to Message 1511989.  

Jason, I am encouraged by your post; I find these people who keep slamming Credit New with no solutions to be dispiriting. You and others are at least working on it. If mended, BOINC will benefit and hopefully the madness with credit on some projects will cease.


Yeah, it's a bit of a case where the overall design intent (as read from code) is fine, just naive implementation. So the detractors are 'right' and so are the ones who don't lean that way... which makes for a good recipe for mayhem, lol.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1511996 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1511997 - Posted: 4 May 2014, 0:19:28 UTC - in response to Message 1511991.  

So yeah, it's a cluster, but fixable. Biggest challenge now is organising people across 4 continents, to get it all systematically tested for a well tuned/tested patch ( via albert@home , then probably beta here after). A bit like herding cats at the moment, and nothing to do with coding... just have to go with it.

Good luck, with that & with new apps for Maxwell GPUs (not so subtle hint :-))


Had some interesting insights there, and started some 'special tools' to help speed development. Basic idea revolves around building the infrastructure such that when the new full Maxwells come out, the apps optimise themselves. Mobile phone apps do it, so why not us? lol.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1511997 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1511998 - Posted: 4 May 2014, 0:25:57 UTC

Thank you Jason for telling us what is going on behind the scenes. My hat is off to you and the others who are fighting windmills:).

Old James
ID: 1511998 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1512004 - Posted: 4 May 2014, 0:37:59 UTC - in response to Message 1511996.  

So the detractors are 'right' and so are the ones who don't lean that way... which makes for a good recipe for mayhem, lol.

This project in some ways reminds me of "The amateur electrical hour" by the Firesign Theatre, from the late 1960s and early 1970s.
ID: 1512004 · Report as offensive
Thomas
Volunteer tester

Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1512087 - Posted: 4 May 2014, 4:48:01 UTC - in response to Message 1511998.  

Thank you Jason for telling us what is going on behind the scenes. My hat is off to you and the others who are fighting windmills:).

+1 :)
ID: 1512087 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1512102 - Posted: 4 May 2014, 5:16:27 UTC - in response to Message 1512087.  

Thank you Jason for telling us what is going on behind the scenes. My hat is off to you and the others who are fighting windmills:).

+1 :)


Any resemblance between myself and Don Quixote is entirely coincidental.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1512102 · Report as offensive
Thomas
Volunteer tester

Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1512106 - Posted: 4 May 2014, 5:43:16 UTC - in response to Message 1512102.  

Thank you Jason for telling us what is going on behind the scenes. My hat is off to you and the others who are fighting windmills:).

+1 :)


Any resemblance between myself and Don Quixote is entirely coincidental.

Excellent Jason ! And who plays the role of Sancho Panza ? :p
ID: 1512106 · Report as offensive
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1512151 - Posted: 4 May 2014, 9:08:14 UTC

Maybe Williams... LOL...
ID: 1512151 · Report as offensive
FeK9

Joined: 20 May 99
Posts: 40
Credit: 61,229,677
RAC: 26
South Africa
Message 1512154 - Posted: 4 May 2014, 9:32:01 UTC

And 'El Rucio'.. :)
Noli tangere circulos meos...
ID: 1512154 · Report as offensive
Eric Findley
Joined: 28 Mar 03
Posts: 72
Credit: 8,674,945
RAC: 0
United States
Message 1512250 - Posted: 4 May 2014, 18:35:44 UTC

Anyone know what happened to the BOINC stats page? My current credit went from 2,993,272 down to 17,631 on SETI.
ID: 1512250 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.