Panic Mode On (98) Server Problems?

Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 33186
Credit: 79,922,639
RAC: 80
Germany
Message 1703709 - Posted: 21 Jul 2015, 12:39:40 UTC

Richard, please.
With each crime and every kindness we birth our future.
ID: 1703709 · Report as offensive
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1703723 - Posted: 21 Jul 2015, 13:30:19 UTC

Maybe, just maybe, they should do their own "credit" system, like vLHC has for example.
ID: 1703723 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703727 - Posted: 21 Jul 2015, 13:35:55 UTC - in response to Message 1703723.  
Last modified: 21 Jul 2015, 13:36:48 UTC

Maybe, just maybe, they should do their own "credit" system, like vLHC has for example.


Well, that's the illusion, that it's about credit. It's actually about the task estimates the servers make, which control how many tasks you get, when, and of what type. Best numbers collected so far (incidentally by Richard) seem to suggest > +/- 30% variability on a rapid timescale under best conditions. I suspect that'll work for watering orange trees just fine, but afaik I'm less predictable than an orange tree, so better established engineering methods are needed.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703727 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1703730 - Posted: 21 Jul 2015, 13:43:55 UTC - in response to Message 1703727.  

Maybe, just maybe, they should do their own "credit" system, like vLHC has for example.

Well, that's the illusion, that it's about credit. It's actually about the task estimates the servers make, which control how many tasks you get, when, and of what type. Best numbers collected so far (incidentally by Richard) seem to suggest > +/- 30% variability on a rapid timescale under best conditions. I suspect that'll work for watering orange trees just fine, but afaik I'm less predictable than an orange tree, so better established engineering methods are needed.

When did I collect estimated runtimes? I'd forgotten that.

Outturns, yes. But estimates would be tricky.
ID: 1703730 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703734 - Posted: 21 Jul 2015, 13:48:50 UTC - in response to Message 1703730.  

Maybe, just maybe, they should do their own "credit" system, like vLHC has for example.

Well, that's the illusion, that it's about credit. It's actually about the task estimates the servers make, which control how many tasks you get, when, and of what type. Best numbers collected so far (incidentally by Richard) seem to suggest > +/- 30% variability on a rapid timescale under best conditions. I suspect that'll work for watering orange trees just fine, but afaik I'm less predictable than an orange tree, so better established engineering methods are needed.

When did I collect estimated runtimes? I'd forgotten that.

Outturns, yes. But estimates would be tricky.


What's an "outturn"? You may or may not be describing a number literally representable in engineering terms. My closest English-language description would be 'best estimate', whether it be an exact figure or some noise function. In the cases you presented at the time, they were Gaussian in appearance, though most recently they appear log-normal, as Eric had projected.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703734 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 33186
Credit: 79,922,639
RAC: 80
Germany
Message 1703735 - Posted: 21 Jul 2015, 13:51:38 UTC - in response to Message 1703727.  
Last modified: 21 Jul 2015, 13:52:05 UTC

Maybe, just maybe, they should do their own "credit" system, like vLHC has for example.


Well, that's the illusion, that it's about credit. It's actually about the task estimates the servers make, which control how many tasks you get, when, and of what type. Best numbers collected so far (incidentally by Richard) seem to suggest > +/- 30% variability on a rapid timescale under best conditions. I suspect that'll work for watering orange trees just fine, but afaik I'm less predictable than an orange tree, so better established engineering methods are needed.


Or just a little bit of cheating.
I'm using some sort of mirroring technique to fix this.
With each crime and every kindness we birth our future.
ID: 1703735 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703736 - Posted: 21 Jul 2015, 13:54:16 UTC - in response to Message 1703734.  

Maybe, just maybe, they should do their own "credit" system, like vLHC has for example.

Well, that's the illusion, that it's about credit. It's actually about the task estimates the servers make, which control how many tasks you get, when, and of what type. Best numbers collected so far (incidentally by Richard) seem to suggest > +/- 30% variability on a rapid timescale under best conditions. I suspect that'll work for watering orange trees just fine, but afaik I'm less predictable than an orange tree, so better established engineering methods are needed.

When did I collect estimated runtimes? I'd forgotten that.

Outturns, yes. But estimates would be tricky.


What's an "outturn"? You may or may not be describing a number literally representable in engineering terms. My closest English-language description would be 'best estimate', whether it be an exact figure or some noise function. In the cases you presented at the time, they were Gaussian in appearance, though most recently they appear log-normal, as Eric had projected.


Never mind, I get it. Yeah, the numbers you yield (I guess outturns) stimulate the system. The predictive behaviour is the domain of creditnew, and engineering. That should cover it, I think.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703736 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703737 - Posted: 21 Jul 2015, 13:55:50 UTC - in response to Message 1703735.  

Maybe, just maybe, they should do their own "credit" system, like vLHC has for example.


Well, that's the illusion, that it's about credit. It's actually about the task estimates the servers make, which control how many tasks you get, when, and of what type. Best numbers collected so far (incidentally by Richard) seem to suggest > +/- 30% variability on a rapid timescale under best conditions. I suspect that'll work for watering orange trees just fine, but afaik I'm less predictable than an orange tree, so better established engineering methods are needed.


Or just a little bit of cheating.
I'm using some sort of mirroring technique to fix this.


Yep, I think Raistmer might agree there is no black or white here too :)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703737 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1703739 - Posted: 21 Jul 2015, 14:11:08 UTC - in response to Message 1703734.  

What's an "outturn" ?

How things turn out after the event, compared to what you estimated before you started. Perhaps more familiar in a financial control context: "It cost how much ??? - you said it would only be tuppence-halfpenny".
ID: 1703739 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703741 - Posted: 21 Jul 2015, 14:16:24 UTC - in response to Message 1703739.  
Last modified: 21 Jul 2015, 14:19:56 UTC

What's an "outturn" ?

How things turn out after the event, compared to what you estimated before you started. Perhaps more familiar in a financial control context: "It cost how much ??? - you said it would only be tuppence-halfpenny".


So estimate quality? OK, yep. Under our controlled Albert conditions, you saw > 37% variation in awarded credit, despite relatively constant runtimes. That reflects directly on the remaining variability, the estimate quality. I don't know for certain, as far as certainty goes in these times, but I'm pretty sure that you could predict the runtimes and proper credit heuristically on the back of an envelope at least as well as, or better than, a robot should be able to do. That's an algorithm, better than creditnew, achieving the intended purpose.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703741 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14456
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1703742 - Posted: 21 Jul 2015, 14:25:11 UTC - in response to Message 1703741.  

What's an "outturn" ?

How things turn out after the event, compared to what you estimated before you started. Perhaps more familiar in a financial control context: "It cost how much ??? - you said it would only be tuppence-halfpenny".

So estimate quality? OK, yep. Under our controlled Albert conditions, you saw > 37% variation in awarded credit, despite relatively constant runtimes. That reflects directly on the remaining variability, the estimate quality. I don't know for certain, as far as certainty goes in these times, but I'm pretty sure that you could predict the runtimes and proper credit heuristically on the back of an envelope at least as well as, or better than, a robot should be able to do. That's an algorithm, better than creditnew, achieving the intended purpose.

Ah, credit - yes (that's an outturn). I thought you were guiding us to concentrate on runtime estimates instead - which would be easier to do at Albert, where tasks were essentially identical. Initial runtime estimates here are further complicated (for MultiBeam only) by the AR --> fpops calibration curve only being fine-tuned (and not very fine, even then) for stock CPU apps.
ID: 1703742 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5516
Credit: 528,817,460
RAC: 242
United States
Message 1703765 - Posted: 21 Jul 2015, 19:03:36 UTC - in response to Message 1703742.  
Last modified: 21 Jul 2015, 19:04:40 UTC

And the free fall continues, still struggling to get any work for 2 of my computers. Maybe 20 here and there but nowhere near a full cache...

edit..

Ok, looks like 1 struck gold, the other still is struggling
ID: 1703765 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703785 - Posted: 21 Jul 2015, 20:52:09 UTC - in response to Message 1703742.  
Last modified: 21 Jul 2015, 21:25:29 UTC

Ah, credit - yes (that's an outturn). I thought you were guiding us to concentrate on runtime estimates instead - which would be easier to do at Albert, where tasks were essentially identical. Initial runtime estimates here are further complicated (for MultiBeam only) by the AR --> fpops calibration curve only being fine-tuned (and not very fine, even then) for stock CPU apps.


Well yes, that too. At the risk of seeming very Californian, I'll wave my arms wide and say "It's all connected.... dude", lol

[Edit:] to be fair on the system, well-tuned closed-loop control seems to be capable of giving pretty good estimates (sometimes to the second) without the addition of a transfer function by workunit parameters, which is also a reflection on the base estimate quality (which for MB does include a transfer function by AR already). I suspect with good tuning and that kind of compensation added it could seem downright spooky to a casual observer, but isn't really more complex. It'd just reflect a deeper understanding of the system, to a point that it might not be ready for yet.
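
As a rough illustration of the closed-loop idea above (not BOINC's or creditnew's actual code), the sketch below scales a base runtime estimate by a correction factor that is nudged toward the observed actual/base ratio after each finished task. The class name `ClosedLoopEstimator`, the `gain` parameter, and all the numbers are assumptions made purely for illustration.

```python
class ClosedLoopEstimator:
    """Hypothetical closed-loop corrector for runtime estimates."""

    def __init__(self, gain=0.2):
        self.gain = gain          # how strongly each outturn moves the correction
        self.correction = 1.0     # multiplier applied to the base estimate

    def estimate(self, base_seconds):
        # Corrected estimate = base estimate scaled by the learned factor.
        return base_seconds * self.correction

    def update(self, base_seconds, actual_seconds):
        # Feed back the outturn: move the factor a fraction of the way
        # toward the ratio that would have made the estimate exact.
        observed_ratio = actual_seconds / base_seconds
        self.correction += self.gain * (observed_ratio - self.correction)


estimator = ClosedLoopEstimator(gain=0.2)
# Assumed scenario: tasks consistently take twice the base estimate.
for _ in range(50):
    estimator.update(base_seconds=100.0, actual_seconds=200.0)
print(estimator.estimate(100.0))
```

A small gain reacts slowly but smooths noisy outturns; a large gain tracks changes quickly but passes the noise straight through. That trade-off is the "tuning" part.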
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703785 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1703799 - Posted: 21 Jul 2015, 21:51:53 UTC - in response to Message 1703526.  

Now I'd be happy to get VLARs to my NVIDIA cards.
Is there a way to say in app_info.xml that my cards could take a try? (2000 seconds one at a time)
(Fake they are ATI/AMD ...)


you could pretend they are CPU instead.
And revive my old @teammod@ to process both CPU and GPU apps via CPU-only BOINC scheduling. Good old days before BOINC even knew what a GPU is :DDD


I thought of that. I remember seeing a bit of code that determined what GPU to use by looking at which one had the most free memory or something similar.

I once coded a solution that kept a variable in shared (CPU) memory, a counter incremented modulo N (N = number of GPUs), to send the next task to that GPU.
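
A minimal sketch of the two selection ideas just mentioned, assuming nothing about the original code: `pick_most_free` chooses the device index with the most free memory from a list of hypothetical free-byte figures, and `RoundRobin` hands out device indices with a counter incremented modulo N. A `threading.Lock` stands in here for the shared-memory variable; sharing the counter between separate processes would need a real cross-process mechanism instead.

```python
import threading


def pick_most_free(free_bytes):
    """Return the index of the device reporting the most free memory."""
    return max(range(len(free_bytes)), key=lambda i: free_bytes[i])


class RoundRobin:
    """Hand out device indices 0..N-1 in turn (counter modulo N)."""

    def __init__(self, num_gpus):
        self.num_gpus = num_gpus
        self._next = 0
        self._lock = threading.Lock()  # protects the shared counter

    def take(self):
        with self._lock:
            slot = self._next
            self._next = (slot + 1) % self.num_gpus
        return slot


# Hypothetical free-memory readings for three cards, in bytes.
print(pick_most_free([100, 900, 400]))
picker = RoundRobin(num_gpus=3)
print([picker.take() for _ in range(5)])
```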

But none of that is necessary since the normal MB work is flowing in again.

I used the down-time to upgrade my Linux (Ubuntu) version to a newer one.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1703799 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703802 - Posted: 21 Jul 2015, 21:59:54 UTC - in response to Message 1703799.  

I thought of that. I remember seeing a bit of code that determined what GPU to use by looking at which one had the most free memory or something similar.
That's cool :) I think adaptive and heterogeneous things are still a bit scary for lots of people, but to me if it saves micromanaging so I have more special beer time, then it's good.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703802 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1703811 - Posted: 21 Jul 2015, 22:16:32 UTC - in response to Message 1703487.  

My version still has some accuracy problems.


Still didn't find the complete story there, though have the full team winding up to put each bit in and see what breaks (watch that Github soon). The Chirp explains a little, but not all of the issue. It'll be interesting what falls out the next few weeks in the background.


My answer is off topic. Not a server problem. I'll pm.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1703811 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1703812 - Posted: 21 Jul 2015, 22:17:37 UTC - in response to Message 1703811.  

My version still has some accuracy problems.


Still didn't find the complete story there, though have the full team winding up to put each bit in and see what breaks (watch that Github soon). The Chirp explains a little, but not all of the issue. It'll be interesting what falls out the next few weeks in the background.


My answer is off topic. Not a server problem. I'll pm.


cheers! could save some work :)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1703812 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1703841 - Posted: 22 Jul 2015, 1:47:20 UTC
Last modified: 22 Jul 2015, 1:56:31 UTC

So it appears that we're getting closer to another new milestone of sorts. 2^32 tasks. I know the DB has had a few adjustments and tweaks over the years to deal with these special numbers, but I'm wondering if it is already able to handle this one. [edit: if memory serves me correctly, I think I recall 2^31-1 was a problem and Matt changed that field in the DB from being a signed 4-byte integer over to being an unsigned long (8-byte) integer, meaning 2^64 will be the next time that is a problem.]

It hasn't been created yet, but the time is drawing near. Let's see who the lucky person ends up being. I'm hedging my bet that it's going to be some incredibly slow machine, or someone who loads up a cache full of WUs on that machine's first-ever contact and is never heard from again.

Result 4294967296
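
For reference, the integer limits being discussed can be checked with a few lines of arithmetic. This is only a sketch: the helper `fits` and the constant names are made up for illustration, and which column type the project database actually uses is not confirmed here.

```python
INT32_MAX = 2**31 - 1    # largest signed 4-byte integer
UINT32_MAX = 2**32 - 1   # largest unsigned 4-byte integer
UINT64_MAX = 2**64 - 1   # largest unsigned 8-byte integer

RESULT_ID = 4294967296   # the milestone ID above, exactly 2**32


def fits(value, max_value):
    """True if a non-negative ID can be stored in a field with this maximum."""
    return 0 <= value <= max_value


print(fits(RESULT_ID, INT32_MAX))
print(fits(RESULT_ID, UINT32_MAX))
print(fits(RESULT_ID, UINT64_MAX))
```

So ID 2^32 is the first value that no 4-byte field can hold, signed or unsigned, while an 8-byte field has room to spare.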
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1703841 · Report as offensive
Bill Butler
Joined: 26 Aug 03
Posts: 101
Credit: 4,270,697
RAC: 0
United States
Message 1703993 - Posted: 22 Jul 2015, 15:23:51 UTC - in response to Message 1703841.  

So it appears that we're getting closer to another new milestone of sorts. 2^32 tasks.

Hey Cosmic_Ocean, I am trying to keep up with you!
How did you find we are getting close to that number before losing count?

In round numbers I am reading that the Master Science Data Base is stuffed with 1.76 * 10^9 workunits. Also, for convenience I note that 2^32 ~= 4.29 * 10^9.

Dividing 1.76 / 4.29 the result is ~ 41% full.

This is not yet ominous. But maybe this is not what you are talking about.
"It is often darkest just before it turns completely black."
ID: 1703993 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6533
Credit: 196,805,888
RAC: 57
United States
Message 1704037 - Posted: 22 Jul 2015, 16:37:08 UTC - in response to Message 1703993.  
Last modified: 22 Jul 2015, 16:40:03 UTC

So it appears that we're getting closer to another new milestone of sorts. 2^32 tasks.

Hey Cosmic_Ocean, I am trying to keep up with you!
How did you find we are getting close to that number before losing count?

In round numbers I am reading that the Master Science Data Base is stuffed with 1.76 * 10^9 workunits. Also, for convenience I note that 2^32 ~= 4.29 * 10^9.

Dividing 1.76 / 4.29 the result is ~ 41% full.

This is not yet ominous. But maybe this is not what you are talking about.

When you are viewing your tasks, look at the far-left column labeled "Tasks". The number there reflects how many tasks have been generated. Note that the number of tasks exceeds the number of workunits by a minimum of 2 to 1: there are at least 2 tasks per workunit, and up to 10.
That integer is what tends to be the issue. I would have to reread Matt's previous posts, but I recall that they must define the integer length in the table. Previously we ran into a number larger than the table could accept, so work generation came to a stop until they modified the table to accept the larger integer, which seems to be done by creating a new table and then copying the data into it.
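
Putting the two posts' numbers together as back-of-envelope arithmetic (a sketch only: the 1.76 * 10^9 workunit figure comes from the science database count quoted above, and treating it as directly comparable to the task ID counter is an assumption):

```python
LIMIT = 2**32                 # 4,294,967,296
WORKUNITS = 1.76e9            # round figure quoted above
MIN_TASKS_PER_WORKUNIT = 2    # "at least 2 tasks per workunit"

# Fraction of 2^32 used if you count workunits.
workunit_fraction = WORKUNITS / LIMIT

# Lower bound on the fraction used if you count tasks instead.
task_fraction_lower_bound = WORKUNITS * MIN_TASKS_PER_WORKUNIT / LIMIT

print(f"workunits: {workunit_fraction:.0%} of 2^32")
print(f"tasks:     at least {task_fraction_lower_bound:.0%} of 2^32")
```

Counting workunits gives roughly 41%, but with two or more tasks per workunit the task counter is at least about twice as far along, which is why the task ID gets to 2^32 long before the workunit count does.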
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1704037 · Report as offensive


 
©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.