Rescheduling - final attempt

William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1479590 - Posted: 20 Feb 2014, 12:11:08 UTC

Ok, guys. You're going to get another chance to discuss this controversial topic.
Nicely, civilly and EMOTION-FREE. I mean it.
I don't want anything that smacks even remotely of flaming, insults, accusations or similar.
You are perfectly entitled to your various opinions, but please state them rationally.
This thread is going to see hard moderation. If you can't turn off your emotions before posting, don't post. You'll only get modded.

I reserve the right to have any posts removed that don't fit my criteria of an emotion-free debate. I realise this is difficult for some people. You will just have to try very hard if you wish to contribute.

I reserve the right to have this thread closed if it turns out that you just cannot discuss this topic civilly and without getting upset.

Think of it as a panel discussion. People who disturb the proceedings will be escorted off the premises. If the panel comes to blows, the proceedings will be halted without further ado.

Before everybody starts talking, can we please have one post from each side, in favour and against, laying out your position and reasoning in a few words.

First person to use the c-word owes me a drink ;)
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1479590 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1479599 - Posted: 20 Feb 2014, 12:52:22 UTC - in response to Message 1479590.  
Last modified: 20 Feb 2014, 13:06:37 UTC

First person to use the c-word owes me a drink ;)

LOL - I'm doomed... :)

First of all, we need to separate "good" and "bad" reschedulers.

The first group is made up of those who need rescheduling to keep their hosts working, for several real reasons. The best example is when no MB apps were available for their OS: they have no option other than to reschedule their jobs. That is good for the project, since it keeps more hosts crunching.

There are several others who really know how to reschedule and use the rescheduler just to store more WUs on their hosts, while still crunching each WU on the right device (GPU WUs on the GPU only). In theory this still has no impact on the project.

From my POV both groups are "good reschedulers", as long as they stay within the 200 WU limit of course, and if you look at it impartially, what they are doing is good for the project (applause to them).

And there is a third group: those who reschedule only to obtain more RAC, the ones already being called "cheaters". They are the ones we need to worry about.

Their hosts could crunch MB with no problems, so why are they stockpiling so many AP WUs and artificially creating a cache of 15 or more days, when that is not needed after COLO? That I need to understand better, but my only guess is that it is to obtain more RAC.

Some of them (mostly anonymous) may even return all the crunched WUs in time, so they could say "we are doing no harm to the project". But they forget one point: we live in a community, and they have no more right than anyone else to download WUs. The top 3 anonymous rescheduling hosts alone hold in their caches about 2% of the entire project's available WUs and produce only 0.1% of the project's output. That can't be right!

And there is the 100+100 WU limit. If the limits are in place, they are there for a reason: to keep the DB as small as possible (or at least that is what was reported). So building large caches that exceed these limits is clearly against the project rules, and therefore clearly wrong. In my opinion there is no excuse for breaking these limits (I agree these limits s@#@!! and I have already asked several times on this forum to raise them a little, but that is for another thread), so anyone who breaks them is wrong, no matter if it is for a good cause. Rules are rules and are there to be followed.

Just imagine (I'm not saying anyone is going to do it) if any of the top crunchers decided to do the same: just one like me could accumulate 2-5% of the entire project's AP WUs, and some could accumulate up to 10% (just do the math). So where would this end? But we are lucky that almost all the top crunchers are not cheating. That is the nice part, congrats to all.

Hope I was able to start a peaceful exchange of words.
ID: 1479599 · Report as offensive
FeK9

Joined: 20 May 99
Posts: 40
Credit: 61,229,677
RAC: 26
South Africa
Message 1479637 - Posted: 20 Feb 2014, 15:55:43 UTC - in response to Message 1479599.  

Here in ZA we don't use the c-word, we use the d-word instead... :)

Jokes aside, I agree with 'juan BFP'. What the third group is doing is not 'Cool'... :(

Noli tangere circulos meos...
ID: 1479637 · Report as offensive
betreger (Project Donor)
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1479670 - Posted: 20 Feb 2014, 17:39:09 UTC
Last modified: 20 Feb 2014, 17:54:27 UTC

I see 2 science projects competing: SETI vs BOINC (computer science). BOINC is a great idea, and if they can get credit new fixed it will be an even better one. A problem I see with rescheduling is that it messes up the validity of the credits, so the BOINC developers don't get data as good as they could to fix credit new.
So the question is: should the science of SETI take precedence over BOINC development?
ID: 1479670 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1479672 - Posted: 20 Feb 2014, 17:43:30 UTC

I'd like to repeat a few worthy concepts from the previous threads that would attempt to remove the need to Reschedule.
1) Slow down the AstroPulse Splitters to lengthen the time AP tasks are available. This could be accomplished by simply changing one AP splitter to MB. If you haven't noticed, the MB splitters are once again falling behind and the ready to send number will soon be Zero.
2) Do not load more than around 200-300 channels in the splitters at once. This will shorten the periods when AstroPulses are not available. It will also help in times when the MB splitters fall behind, as they are currently. The last time the MB ready to send was approaching Zero bringing more AP channels on line solved the problem. Just as it was before, there are currently around 250 MB channels remaining. If fewer channels had been loaded, it would be time to load more channels containing APs which would help the MB splitters maintain a buffer.
3) Change the limits to allow more GPU tasks than CPU tasks. For most hosts, 100/100 is not realistic. Ratios such as 150/50 would better suit actual workflow. Some Dual core hosts would be fine with 20 CPU tasks and 180 GPU tasks.
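To make point 3 concrete, here is a minimal sketch of splitting the same 200-task total unevenly between GPU and CPU. The function name and the dictionary layout are hypothetical, chosen only for illustration; this is not how the project server actually stores its limits.

```python
def split_task_limit(total_limit, gpu_tasks):
    """Split a fixed per-host task limit between GPU and CPU (illustrative only)."""
    if not 0 <= gpu_tasks <= total_limit:
        raise ValueError("gpu_tasks must be between 0 and total_limit")
    return {"gpu": gpu_tasks, "cpu": total_limit - gpu_tasks}


# The current 100/100 split and the two alternatives mentioned in the post,
# all keeping the same 200-task total per host.
for gpu in (100, 150, 180):
    print(split_task_limit(200, gpu))
```

The total number of tasks per host stays the same in every case, so a change like this would not by itself make the database any larger.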
My 2¢.
ID: 1479672 · Report as offensive
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1479677 - Posted: 20 Feb 2014, 18:03:55 UTC

I haven't joined in the previous rescheduler discussions, or even read very much of them, because I didn't think I had any interest in the topic. However, when the GPUs on my top cruncher ran out of work during this week's Tuesday outage (because of all the shorties in the queue), I realized that I do have an interest. While the GPUs burned through their allotted 100 tasks in about 3.5 hours, the CPUs (which never push the 100 task limit to begin with) still had over 30 tasks ready to start. If I could've reassigned those tasks to the GPUs, it would have kept them crunching for close to another hour, just enough to last through the outage.
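As a rough check on the arithmetic above, here is a small sketch. It assumes the reassigned CPU tasks would run on the GPUs at the same average rate as the 100 tasks already processed, which the post does not state; the function name is made up for the example.

```python
def extra_runtime_hours(tasks_done, hours_taken, extra_tasks):
    """Estimate how long extra_tasks would last at the observed average rate."""
    tasks_per_hour = tasks_done / hours_taken
    return extra_tasks / tasks_per_hour


# Figures from the post: 100 GPU tasks in about 3.5 hours, about 30 CPU tasks left.
print(round(extra_runtime_hours(100, 3.5, 30), 2))  # 1.05
```

Under that assumption the 30 leftover tasks come to roughly one more hour of GPU work, in line with the "close to another hour" estimate.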

It's not about APs vs. MBs, or exceeding the mandated limits, or even about buying William a drink (which I see betreger has already volunteered to do), it's just about rescheduling tasks in a way that would have helped maximize my contribution to S@H by keeping my GPUs from going idle.
ID: 1479677 · Report as offensive
betreger (Project Donor)
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1479688 - Posted: 20 Feb 2014, 18:32:36 UTC - in response to Message 1479677.  

even about buying William a drink (which I see betreger has already volunteered to do)

Huh?
ID: 1479688 · Report as offensive
Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1479693 - Posted: 20 Feb 2014, 18:40:26 UTC - in response to Message 1479688.  

even about buying William a drink (which I see betreger has already volunteered to do)

Huh?

First person to use the c-word owes me a drink ;)

Perhaps I didn't understand William correctly, but I thought the c-word he referred to showed up 3 times in your post. :^)
ID: 1479693 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1479703 - Posted: 20 Feb 2014, 18:52:19 UTC - in response to Message 1479688.  
Last modified: 20 Feb 2014, 19:01:36 UTC

Don't worry, it's probably one of those virtual drinks.
I'll take care of it for you;



There, problem solved.

:-)
ID: 1479703 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1479714 - Posted: 20 Feb 2014, 19:07:24 UTC - in response to Message 1479703.  

Come over to Rocky's in the Café and get all the free drinks you want, mixed by a roo and served by a koala.

In fact, I'm going to repost this nice image there.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1479714 · Report as offensive
shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1479746 - Posted: 20 Feb 2014, 20:01:10 UTC

I wanted to post this in Raistmer's thread but then it got locked, so I sent him this PM yesterday. (Now that this thread has opened up, I'm posting it here.)

Since you are making a lot of sense, I had a quick think to figure out why no-one understands what you are saying. I may have found one of the reasons....

BOINC's inability to enforce limits on a per-app basis. You know this well, but it hasn't been mentioned? (Didn't follow Juan's thread)

Example:
SETI@home v7 7.00 windows_intelx86 : 50 tasks
SETI@home v7 7.00 windows_intelx86 (cuda23): 200 tasks
AstroPulse v6 6.04 windows_intelx86 (opencl_nvidia_100): 100 tasks
etc.

Of course I just made the numbers up. They are not important. The fact that they should be treated differently is what is important. You and Jason (I think) have complained about this before but it was probably over on Beta... Maybe you should explain it here?
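A minimal sketch of what a per-app limit check could look like, using the same made-up numbers. The table and the function are hypothetical illustrations of the idea, not BOINC's actual configuration format or scheduler code.

```python
# Hypothetical per-app-version limits, copied from the made-up example above.
PER_APP_LIMITS = {
    "SETI@home v7 7.00 windows_intelx86": 50,
    "SETI@home v7 7.00 windows_intelx86 (cuda23)": 200,
    "AstroPulse v6 6.04 windows_intelx86 (opencl_nvidia_100)": 100,
}


def can_fetch_more(app_version, in_progress):
    """Return True if the host is still below the limit for this app version.

    in_progress maps app version name -> number of tasks the host holds now.
    """
    limit = PER_APP_LIMITS.get(app_version)
    if limit is None:
        return False  # unknown app version: send nothing in this sketch
    return in_progress.get(app_version, 0) < limit


host = {"SETI@home v7 7.00 windows_intelx86": 50}
print(can_fetch_more("SETI@home v7 7.00 windows_intelx86", host))            # False
print(can_fetch_more("SETI@home v7 7.00 windows_intelx86 (cuda23)", host))   # True
```

Each app version is counted on its own here, so a host that is full on one app can still fetch work for another.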

Sorry if I misunderstood something!
ID: 1479746 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34747
Credit: 261,360,520
RAC: 489
Australia
Message 1479753 - Posted: 20 Feb 2014, 20:13:02 UTC
Last modified: 20 Feb 2014, 20:16:32 UTC


From my POV both groups are "good reschedulers", as long as they stay within the 200 WU limit of course, and if you look at it impartially, what they are doing is good for the project (applause to them).

If they stayed within the limits there would not be any of this friction.

The thing is that they're not. 1 rig has been regularly seen with over 1000 APs in progress, 2 others regularly with over 700 APs and a couple with over 300 APs, so far.

Cheers.
ID: 1479753 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1479804 - Posted: 20 Feb 2014, 21:25:59 UTC - in response to Message 1479753.  


I think that Juan's point is that a lot of them DO stay within the limits. It's only a few bad apples that are tarnishing the reputation of the rest. (Kind of like the debate over gun control.)
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1479804 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1479821 - Posted: 20 Feb 2014, 22:17:25 UTC
Last modified: 20 Feb 2014, 22:24:47 UTC

Yes, if they stay within the limits and just use the rescheduler to keep their hosts running, in my personal opinion they are not doing anything bad to the project. On the contrary, they just add more computing power to the project, which is good. It is simply a better use of the resources each one has available, like those of us who use optimized apps instead of stock. So I have nothing against that, quite the opposite.

The big argument against it was the credit mess it could cause, but the credit system is already screwed and we all know that by now, so screwing it up a little more will have little impact on the project itself. Do not forget that some examples posted on the forums show that the credit mess from rescheduling is not confirmed, at least for AP WUs.

I am just not sure the bad apples are really so few. I found some more examples just in the top 40 hosts. But the quantity does not matter: always remember that just one rotten apple can spoil an entire bag of good apples.
ID: 1479821 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1479824 - Posted: 20 Feb 2014, 22:20:35 UTC - in response to Message 1479804.  


I think that Juan's point is that a lot of them DO stay within the limits. It's only a few bad apples that are tarnishing the reputation of the rest. (Kind of like the debate over gun control.)

And I believe it was Raistmer's point as well, although something may have got lost in translation.

From what I have read here most seem to be in agreement: rescheduling within limits is OK.

Outside of that it is bad for the project and considered cheating, and there should perhaps be a way of stopping it.

So what is the way forward?
ID: 1479824 · Report as offensive
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1479826 - Posted: 20 Feb 2014, 22:26:46 UTC

I'm left wondering if there's a way to solve this through the BOINC framework. Would it be possible to modify the BOINC client to look at the entire cache of workunits as a single pool, and assign work to whatever resource needs it at the moment it's ready to be worked on?

I understand that this kind of change would be much larger than my simple question, involving lots of re-designing of several of BOINC's calculations for work-fetch, etc., but it could be done. Is this something that should be suggested to the developers (David) as a goal to work toward or something to implement in the next major release?
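A toy sketch of the single-pool idea, only to show the shape of it. It assumes every task in the cache can run on any resource, which is not true of the real client (tasks are tied to an app version), and the names are invented for the example.

```python
from collections import deque


def assign_from_pool(pool, idle_resources):
    """Hand the next queued task to each idle resource, in order.

    pool is a deque of task names shared by all devices; idle_resources is a
    list of device names that are ready for work right now.
    """
    assignments = {}
    for resource in idle_resources:
        if not pool:
            break  # nothing left to hand out
        assignments[resource] = pool.popleft()
    return assignments


pool = deque(["wu1", "wu2", "wu3"])
print(assign_from_pool(pool, ["gpu0", "cpu0"]))  # {'gpu0': 'wu1', 'cpu0': 'wu2'}
print(list(pool))                                # ['wu3']
```

The work-fetch and runtime-estimate calculations mentioned above are left out entirely, and they are where most of the real redesign effort would go.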
ID: 1479826 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1479827 - Posted: 20 Feb 2014, 22:32:37 UTC - in response to Message 1479826.  
Last modified: 20 Feb 2014, 22:33:16 UTC

I don't believe it's too hard to implement: since the servers actually know how many WUs each host has pending, blocking cheating is simple.

Think about it: the @¨&@%!!! limits use almost the same function, so you just need to block the download of new WUs for a host that has already exceeded the 200 WU limit in its pending cache. Some tweaks could be needed to handle crashed hosts, etc., but deep down it's a very simple function.
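A minimal sketch of the kind of check described here. The function name and the way the in-progress count is passed in are assumptions for illustration; this is not the actual BOINC scheduler code, and it ignores the crashed-host case mentioned above.

```python
WU_LIMIT = 200  # 100 CPU + 100 GPU, the per-host limit discussed in this thread


def tasks_to_send(in_progress, requested, limit=WU_LIMIT):
    """Return how many new tasks a host may receive.

    in_progress counts every task the server believes the host holds,
    regardless of which device it was originally assigned to.
    """
    headroom = max(0, limit - in_progress)
    return min(requested, headroom)


print(tasks_to_send(in_progress=1000, requested=50))  # 0
print(tasks_to_send(in_progress=180, requested=50))   # 20
```

The count is taken over the whole host instead of per device, so moving tasks between CPU and GPU would not open up extra room.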
ID: 1479827 · Report as offensive
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1479828 - Posted: 20 Feb 2014, 22:33:22 UTC

My experience with making suggestions to David is that they come back and bite you in the face!
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1479828 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1479829 - Posted: 20 Feb 2014, 22:34:38 UTC - in response to Message 1479828.  
Last modified: 20 Feb 2014, 22:36:20 UTC

My experience with making suggestions to David is that they come back and bite you in the face!

Looks like we are doomed x 2!!! So the best I can do now is go smoke a c-word and drink some beers. Cheers.
ID: 1479829 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1479832 - Posted: 20 Feb 2014, 22:41:13 UTC - in response to Message 1479827.  

Right, I'll craft a lengthy post detailing the way I see the whole situation. It'll take a while to present the things needed in a diplomatic, non-inflammatory way, but I think it'll help at this point, and explain why I've stayed out of this particular circus (until now)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1479832 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.