To Whomever this concerns: Backoffs and error -6

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887387 - Posted: 22 Apr 2009, 21:39:34 UTC - in response to Message 887375.  

This thread is starting to teach me a few things about computing that I don't know. I guess it's making me want to learn more and consider upgrading/trying an optimized app. I'm a rookie at that kind of thing (Hell, rookies would know more), so I guess I'm asking where to read more about it? I've read the sticky thread on optimized apps, and as it states there, optimized apps are for advanced users only, so I guess I'm looking for the Optimizing Apps for Dummies Handbook! Any suggestions, or am I an old dog who's just pipe dreaming?

Thanks

Can I offer you The Illustrated Guide to Installing an optimised application? It's a bit elderly, and doesn't cover the topics covered in this thread, but the basic principles are still relevant and may give you a background glimpse into what the other threads are talking about.
ID: 887387
Rick B

Joined: 6 Mar 01
Posts: 299
Credit: 1,532,791
RAC: 0
Canada
Message 887395 - Posted: 22 Apr 2009, 21:49:28 UTC - in response to Message 887387.  

Can I offer you The Illustrated Guide to Installing an optimised application? It's a bit elderly, and doesn't cover the topics covered in this thread, but the basic principles are still relevant and may give you a background glimpse into what the other threads are talking about.

Thanks, I will start there.
ID: 887395
MarkJ Crowdfunding Project Donor
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 887399 - Posted: 22 Apr 2009, 21:59:55 UTC

I wonder whether it's possible to get the scheduler (on the server) to send VLARs only in response to a CPU work request. That way they would get branded as 6.03s and run on the CPU. The "normal" ones could be sent in response to either a CPU or CUDA work request. This would also allow those with only CPUs to actually get some MBs, seeing as they nearly always get gobbled up by the CUDA hosts.

I understand that it means more work for the scheduler, but it could solve a couple of "features" of the current set-up.
BOINC blog
ID: 887399
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 887400 - Posted: 22 Apr 2009, 22:04:27 UTC - in response to Message 887399.  

I wonder whether it's possible to get the scheduler (on the server) to send VLARs only in response to a CPU work request. That way they would get branded as 6.03s and run on the CPU. The "normal" ones could be sent in response to either a CPU or CUDA work request. This would also allow those with only CPUs to actually get some MBs, seeing as they nearly always get gobbled up by the CUDA hosts.

I understand that it means more work for the scheduler, but it could solve a couple of "features" of the current set-up.

I doubt you realise just how much more work that would be for the scheduler. I've not looked at the code, but I imagine the scheduler merely receives WUs and sends them out without looking at the contents at all. The extra code (and time) to investigate the AR of every WU to be sent out, and then direct them accordingly, would be a killer.

F.
ID: 887400
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887416 - Posted: 22 Apr 2009, 23:05:48 UTC - in response to Message 887400.  

I wonder whether it's possible to get the scheduler (on the server) to send VLARs only in response to a CPU work request. That way they would get branded as 6.03s and run on the CPU. The "normal" ones could be sent in response to either a CPU or CUDA work request. This would also allow those with only CPUs to actually get some MBs, seeing as they nearly always get gobbled up by the CUDA hosts.

I understand that it means more work for the scheduler, but it could solve a couple of "features" of the current set-up.

I doubt you realise just how much more work that would be for the scheduler. I've not looked at the code, but I imagine the scheduler merely receives WUs and sends them out without looking at the contents at all. The extra code (and time) to investigate the AR of every WU to be sent out, and then direct them accordingly, would be a killer.

F.

Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.
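Something like this, just to illustrate the shape of the test (illustrative Python only, not project code; the cutoff value is a made-up placeholder, and it assumes VLAR work sits at the high end of the fpops estimates - if that assumption is wrong, the comparison simply flips):

    # Illustration only: treat rsc_fpops_est as a proxy for angle range.
    VLAR_FPOPS_CUTOFF = 30e12          # placeholder boundary, not a real project value

    def ok_for_cuda_request(rsc_fpops_est):
        # The scheduler already has this number for its runtime estimate;
        # VLAR-looking work would only be allocated against CPU requests.
        return rsc_fpops_est < VLAR_FPOPS_CUTOFF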
ID: 887416
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 887421 - Posted: 22 Apr 2009, 23:24:27 UTC - in response to Message 887416.  

Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.

Nice one (again). I stand corrected. Just goes to prove you are never too old to learn :))

F.
ID: 887421
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 887426 - Posted: 23 Apr 2009, 0:02:53 UTC - in response to Message 887416.  


Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.

... and if I'm reading Raistmer's script correctly, it's the version number (603 or 608) that decides what app can do the work.

The splitters would know things like the fpops_est; don't they also write the version required?

... that would make the scheduler have to deal with three different apps, but it seems that it'd work.
ID: 887426
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887434 - Posted: 23 Apr 2009, 0:19:21 UTC - in response to Message 887426.  


Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.

... and if I'm reading Raistmer's script correctly, it's the version number (603 or 608) that decides what app can do the work.

The splitters would know things like the fpops_est; don't they also write the version required?

... that would make the scheduler have to deal with three different apps, but it seems that it'd work.

Well, the scheduler is handling the MB/CPU and MB/CUDA cases already, so there's nothing extra there. And for v6.6 clients, it's actually the <plan_class> which determines where the processing takes place: the version number is irrelevant. (It's perfectly possible to write an app_info which specifies version 608 for CPU work, and version 608 plus <plan_class>CUDA for the GPU. I've done it, and it works throughout the cycle - fetch, crunch, report. But I strongly suggest we stick with the current convention of using 603 to refer exclusively to CPU processing, and 608 exclusively to CUDA processing. Less scope for confusion that way.)
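For anyone following along at home, the sort of app_info.xml being described looks roughly like this - a minimal sketch only: the executable names are placeholders for whatever optimised builds you actually install, the CUDA build also needs its supporting DLLs listed as additional <file_info>/<file_ref> entries, and the plan_class the stock client expects is lower-case "cuda":

    <app_info>
        <app>
            <name>setiathome_enhanced</name>
        </app>
        <file_info>
            <name>MB_CPU_app.exe</name>               <!-- placeholder name -->
            <executable/>
        </file_info>
        <app_version>
            <app_name>setiathome_enhanced</app_name>
            <version_num>603</version_num>            <!-- CPU, by convention -->
            <file_ref>
                <file_name>MB_CPU_app.exe</file_name>
                <main_program/>
            </file_ref>
        </app_version>
        <file_info>
            <name>MB_CUDA_app.exe</name>              <!-- placeholder name -->
            <executable/>
        </file_info>
        <app_version>
            <app_name>setiathome_enhanced</app_name>
            <version_num>608</version_num>            <!-- CUDA, by convention -->
            <plan_class>cuda</plan_class>
            <file_ref>
                <file_name>MB_CUDA_app.exe</file_name>
                <main_program/>
            </file_ref>
        </app_version>
    </app_info>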

The point is, all the handling mechanisms are already in place. No further server stress there. The only change would be one further conditional in the allocate/don't allocate test.
ID: 887434
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65755
Credit: 55,293,173
RAC: 49
United States
Message 887460 - Posted: 23 Apr 2009, 2:07:43 UTC - in response to Message 887434.  
Last modified: 23 Apr 2009, 2:08:14 UTC


Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.

... and if I'm reading Raistmer's script correctly, it's the version number (603 or 608) that decides what app can do the work.

The splitters would know things like the fpops_est; don't they also write the version required?

... that would make the scheduler have to deal with three different apps, but it seems that it'd work.

Well, the scheduler is handling the MB/CPU and MB/CUDA cases already, so there's nothing extra there. And for v6.6 clients, it's actually the <plan_class> which determines where the processing takes place: the version number is irrelevant. (It's perfectly possible to write an app_info which specifies version 608 for CPU work, and version 608 plus <plan_class>CUDA for the GPU. I've done it, and it works throughout the cycle - fetch, crunch, report. But I strongly suggest we stick with the current convention of using 603 to refer exclusively to CPU processing, and 608 exclusively to CUDA processing. Less scope for confusion that way.)

The point is, all the handling mechanisms are already in place. No further server stress there. The only change would be one further conditional in the allocate/don't allocate test.

I agree; I use 603 (CPU) and 608 (CUDA) exclusively. Of course, doing six tasks is rather interesting, as I had to drop the cache from 3 days to 0 days and set BOINC here on the PC to NNT to get down to nearly 2 or 3 days' worth of work, because I'm having trouble reporting with 6.6.20. Uploading works as usual, but before I set BOINC to NNT it wasn't reporting any work at all, and the days in the caches go up and down because of the CPU (an Intel Q9300) and a GTX295, depending on which uploads at any moment in time: one will shorten it and the other seems to stretch it out. Right now it's at 5 days, 22 hours and some minutes/seconds; earlier it was at about 6 days and 22 hours, and later it might be higher again.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887460
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 887463 - Posted: 23 Apr 2009, 2:09:34 UTC - in response to Message 887426.  


Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.

... and if I'm reading Raistmer's script correctly, it's the version number (603 or 608) that decides what app can do the work.

The splitters would know things like the fpops_est; don't they also write the version required?

... that would make the scheduler have to deal with three different apps, but it seems that it'd work.

The scheduler is already dealing with three different apps, setiathome_enhanced, astropulse, and astropulse_v5.

Adding a fourth is possible but probably not the best way to go. BOINC already has a mechanism for steering work based on other considerations, Homogeneous Redundancy. The criteria by which hosts are judged to be in a particular HR class can be selected by the project, and the splitters (also project specific) set what HR class should be used for each WU they produce.

The only version setting for work is the <min_version> required for all tasks of a type; IOW they could require 6.08 for all setiathome_enhanced work but that would not be helpful.
                                                              Joe
ID: 887463
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65755
Credit: 55,293,173
RAC: 49
United States
Message 887622 - Posted: 23 Apr 2009, 16:04:42 UTC
Last modified: 23 Apr 2009, 16:07:52 UTC

Nvidia, from what I've read, has left the SETI closet, departed for other tasks, and left SETI hanging in the wind. VLAR will remain a problem, and there are 3 potential solutions:

1. There is the fpops idea someone here put up to keep VLARs out of 6.08. But so far I think the server guys don't want to do that either, even though they could, and it would be ideal, as then the VLAR killer wouldn't be needed anymore.

2. One some HATE (the VLAR killer, which generates an error -6 and the communication delays that come with -6 errors).

3. One that has to be done while BOINC is shut down and when one has fewer than 500 WUs in one's cache, or else it won't work; it has to be done every day and is in a Perl script (no automation or integration yet). Lovely.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887622
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887629 - Posted: 23 Apr 2009, 16:34:10 UTC - in response to Message 887622.  

Nvidia, from what I've read, has left the SETI closet, departed for other tasks, and left SETI hanging in the wind. VLAR will remain a problem, and there are 3 potential solutions:

1. There is the fpops idea someone here put up to keep VLARs out of 6.08. But so far I think the server guys don't want to do that either, even though they could, and it would be ideal, as then the VLAR killer wouldn't be needed anymore.

2. One some HATE (the VLAR killer, which generates an error -6).

3. One that has to be done while BOINC is shut down and when one has fewer than 500 WUs in one's cache, or else it won't work; it has to be done over and over again every day and is a Perl script (no automation or integration yet). Lovely.

That's about the size of it. I agree, not pretty.

The trouble is, getting anything changed.

(0) requires us to re-engineer a strategic decision by a major manufacturer during a recession. Not easy, although I'll keep nagging them whenever the opportunity presents itself.

(1) is closer to home, but still requires us to beg for favours - we can't control the project staff. I wonder whether Blurf has any feedback from the second post in this thread yet?

(2) is ugly, and I can think of at least one person (apart from me) who hates it!

Which leaves (3). That will always require BOINC to be shut down, but the shut down/restart can be made automatic (even put on a timer/scheduler) and will only take 10 seconds or less. The 500 limit is just something Raistmer's put in as a precaution in the particular optimisation he's decided to pursue - it isn't needed in the simpler case of only kicking VLARs into the long grass (CPU).

The beauty of (3) is that it is entirely within the control of the volunteer community: anybody can write a script. The drawback is that we need to find a volunteer from within our own ranks with the skills and motivation to write a good script.
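To make (3) concrete: whatever language the script ends up in, its core job is just to flip queued VLAR-looking tasks from the 608 (CUDA) app to the 603 (CPU) app in client_state.xml while BOINC is stopped. A bare-bones illustration in Python - not Raistmer's or Fred's actual code; the path, the cutoff value and the <plan_class> handling are all assumptions to check against your own installation:

    # Illustration only - run while BOINC is fully shut down, and keep the backup.
    import re, shutil

    CLIENT_STATE = r"C:\ProgramData\BOINC\client_state.xml"   # assumed data directory
    VLAR_FPOPS_CUTOFF = 30e12                                  # placeholder boundary

    shutil.copyfile(CLIENT_STATE, CLIENT_STATE + ".bak")
    with open(CLIENT_STATE, encoding="utf-8") as f:
        text = f.read()

    # Workunits whose fpops estimate falls on the VLAR side of the cutoff.
    vlar = set()
    for wu in re.findall(r"<workunit>.*?</workunit>", text, re.S):
        name = re.search(r"<name>(.*?)</name>", wu).group(1)
        fpops = float(re.search(r"<rsc_fpops_est>(.*?)</rsc_fpops_est>", wu).group(1))
        if fpops >= VLAR_FPOPS_CUTOFF:
            vlar.add(name)

    # Rebrand the matching results from 608/cuda to 603/CPU.
    # (A real script would also leave tasks that have already started alone.)
    def rebrand(m):
        block = m.group(0)
        if re.search(r"<wu_name>(.*?)</wu_name>", block).group(1) in vlar:
            block = block.replace("<version_num>608</version_num>",
                                  "<version_num>603</version_num>")
            # How <plan_class> should be handled may differ by client version.
            block = re.sub(r"[ \t]*<plan_class>cuda</plan_class>\r?\n?", "", block)
        return block

    with open(CLIENT_STATE, "w", encoding="utf-8") as f:
        f.write(re.sub(r"<result>.*?</result>", rebrand, text, flags=re.S))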

I had a PM overnight from a potential volunteer with claimed vbs skills who might be interested in helping Fred's baby grow up into a big, strong adult. I'm waiting for re-contact with an email address so I can send him the file attachments [edit - just arrived - I'll send them as soon as I've finished this]. And Jason has pointed out some tools which may help to make it more robust and easy to use.

SJ, do you run your BOINC as a service, or under a user account? Part of the difficulty with automation is handling both cases at the same time. We could maybe work on an interim one for just your setup first, and get you to test it?
ID: 887629
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 887633 - Posted: 23 Apr 2009, 16:45:35 UTC - in response to Message 887629.  

I had a PM overnight from a potential volunteer with claimed vbs skills who might be interested in helping Fred's baby grow up into a big, strong adult. I'm waiting for re-contact with an email address so I can send him the file attachments [edit - just arrived - I'll send them as soon as I've finished this]. And Jason has pointed out some tools which may help to make it more robust and easy to use.

The hard part in this case is the design and "proof of concept."

Turning that idea into a WinApp (or a Linux app.) is trivially easy, on the order of two or three minutes.

Of course, whenever a programmer says how long something will take, you double the number and go to the next higher unit.

It shouldn't be that hard to use the BOINC API to tell BOINC to stop, process the files, and let it restart.
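In outline it is only a few lines - for example (Python sketch; the paths are assumptions, boinccmd picks up the GUI RPC password from gui_rpc_auth.cfg when run from the data directory, and a service install would restart via the service control manager instead):

    import os, subprocess, time

    BOINC_DIR = r"C:\Program Files\BOINC"     # assumed install directory
    DATA_DIR  = r"C:\ProgramData\BOINC"       # assumed data directory

    # Ask the running client to shut down cleanly.
    subprocess.run([os.path.join(BOINC_DIR, "boinccmd.exe"), "--quit"],
                   cwd=DATA_DIR, check=True)
    time.sleep(10)   # crude - a real script would poll until the client has exited

    # ... process the files here (e.g. the rebranding step sketched earlier) ...

    # Restart the client (user install; "net start BOINC" or similar for a service).
    subprocess.Popen([os.path.join(BOINC_DIR, "boinc.exe")], cwd=DATA_DIR)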

ID: 887633
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887656 - Posted: 23 Apr 2009, 17:29:02 UTC - in response to Message 887633.  

I had a PM overnight from a potential volunteer with claimed vbs skills who might be interested in helping Fred's baby grow up into a big, strong adult. I'm waiting for re-contact with an email address so I can send him the file attachments [edit - just arrived - I'll send them as soon as I've finished this]. And Jason has pointed out some tools which may help to make it more robust and easy to use.

The hard part in this case is the design and "proof of concept."

Turning that idea into a WinApp (or a Linux app.) is trivially easy, on the order of two or three minutes.

Of course, whenever a programmer says how long something will take, you double the number and go to the next higher unit.

It shouldn't be that hard to use the BOINC API to tell BOINC to stop, process the files, and let it restart.

Pitfalls, pitfalls.

Stopping BOINC is easy - "boinccmd --quit". But where is boinccmd? Possibly on another drive. I don't know how to get back from the data drive/path to the binary drive/path without querying the registry. (You need to start in the data directory to have access to client_state.xml.) And before that, I don't even know how to determine whether I'm dealing with a service install or a user install unless I installed it myself. Ah - lightbulb moment as I type - registry again.

I agree the first hard part is concept and design. We're lucky - Fred has already done that, and it's been 100% reliable in the time I've been using it. But there's a second hard part: generalising and armour-plating it to run reliably on everybody else's computer, with the minimum of installation and the maximum of flexibility. Only then do you get to do the easy bit, which as you say takes minutes, or even seconds, to generate an executable form.
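For the record, the registry lookups should be scriptable too - a sketch, assuming the key and value names below are the ones the Windows installer actually writes (verify on your own machine; on 64-bit systems they may sit under WOW6432Node) and that a service install registers a service called "BOINC":

    import subprocess, winreg

    def boinc_paths():
        # Install and data directories as recorded by the BOINC installer.
        key = winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE,
            r"SOFTWARE\Space Sciences Laboratory, U.C. Berkeley\BOINC Setup")
        install_dir = winreg.QueryValueEx(key, "INSTALLDIR")[0]
        data_dir = winreg.QueryValueEx(key, "DATADIR")[0]
        return install_dir, data_dir

    def is_service_install():
        # True if Windows knows about a service called "BOINC".
        return subprocess.run(["sc", "query", "BOINC"],
                              capture_output=True).returncode == 0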
ID: 887656
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 887658 - Posted: 23 Apr 2009, 17:32:14 UTC - in response to Message 887463.  


Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.

... and if I'm reading Raistmer's script correctly, it's the version number (603 or 608) that decides what app can do the work.

The splitters would know things like the fpops_est; don't they also write the version required?

... that would make the scheduler have to deal with three different apps, but it seems that it'd work.

The scheduler is already dealing with three different apps, setiathome_enhanced, astropulse, and astropulse_v5.

Adding a fourth is possible but probably not the best way to go. BOINC already has a mechanism for steering work based on other considerations, Homogeneous Redundancy. The criteria by which hosts are judged to be in a particular HR class can be selected by the project, and the splitters (also project specific) set what HR class should be used for each WU they produce.

The only version setting for work is the <min_version> required for all tasks of a type; IOW they could require 6.08 for all setiathome_enhanced work but that would not be helpful.
                                                              Joe


Agreed, but isn't the show stopper here that HR is a DB resource 'hog' which they can't 'afford' here?

I seem to recall that being the reason the idea of using it has been shot down in the past.

Alinator
ID: 887658
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887660 - Posted: 23 Apr 2009, 17:39:34 UTC - in response to Message 887658.  


Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.

... and if I'm reading Raistmer's script correctly, it's the version number (603 or 608) that decides what app can do the work.

The splitters would know things like the fpops_est; don't they also write the version required?

... that would make the scheduler have to deal with three different apps, but it seems that it'd work.

The scheduler is already dealing with three different apps, setiathome_enhanced, astropulse, and astropulse_v5.

Adding a fourth is possible but probably not the best way to go. BOINC already has a mechanism for steering work based on other considerations, Homogeneous Redundancy. The criteria by which hosts are judged to be in a particular HR class can be selected by the project, and the splitters (also project specific) set what HR class should be used for each WU they produce.

The only version setting for work is the <min_version> required for all tasks of a type; IOW they could require 6.08 for all setiathome_enhanced work but that would not be helpful.
                                                              Joe


Agreed, but isn't the show stopper here that HR is a DB resource 'hog' which they can't 'afford' here?

I seem to recall that being the reason the idea of using it has been shot down in the past.

Alinator

What I put forward to Eric was a bespoke scheduler with the additional logic test hard-coded into it: acknowledging that this is programming-resource costly and a maintainer's nightmare. But I hoped that it would only be needed for a few weeks, starting two months ago - as soon as nVidia solved the VLAR problem, it could have been stripped out again and reverted to standard BOINC compatibility.

Now that nVidia have bailed / failed, the goalposts have moved.
ID: 887660
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 887661 - Posted: 23 Apr 2009, 17:42:04 UTC - in response to Message 887656.  

I don't even know how to determine whether I'm dealing with a service install or a user install unless I installed it myself. Ah - lightbulb moment as I type - registry again.


@Richard,
That bit is also relatively trivial without querying the registry directly - I have the code for that bit.

F.
ID: 887661
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14652
Credit: 200,643,578
RAC: 874
United Kingdom
Message 887666 - Posted: 23 Apr 2009, 18:00:58 UTC - in response to Message 887661.  

I don't even know how to determine whether I'm dealing with a service install or a user install unless I installed it myself. Ah - lightbulb moment as I type - registry again.

@Richard,
That bit is also relatively trivial without querying the registry directly - I have the code for that bit.

F.

Do you have an updated version you could share? I've just put some debug flags in the copy I have - it detects Process and Manager correctly, but not Service. (I hadn't looked before, while just testing for personal use - but it becomes more important now.)
ID: 887666
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65755
Credit: 55,293,173
RAC: 49
United States
Message 887670 - Posted: 23 Apr 2009, 18:10:56 UTC - in response to Message 887629.  

Nvidia, from what I've read, has left the SETI closet, departed for other tasks, and left SETI hanging in the wind. VLAR will remain a problem, and there are 3 potential solutions:

1. There is the fpops idea someone here put up to keep VLARs out of 6.08. But so far I think the server guys don't want to do that either, even though they could, and it would be ideal, as then the VLAR killer wouldn't be needed anymore.

2. One some HATE (the VLAR killer, which generates an error -6).

3. One that has to be done while BOINC is shut down and when one has fewer than 500 WUs in one's cache, or else it won't work; it has to be done over and over again every day and is a Perl script (no automation or integration yet). Lovely.

That's about the size of it. I agree, not pretty.

The trouble is, getting anything changed.

(0) requires us to re-engineer a strategic decision by a major manufacturer during a recession. Not easy, although I'll keep nagging them whenever the opportunity presents itself.

(1) is closer to home, but still requires us to beg for favours - we can't control the project staff. I wonder whether Blurf has any feedback from the second post in this thread yet?

(2) is ugly, and I can think of at least one person (apart from me) who hates it!

Which leaves (3). That will always require BOINC to be shut down, but the shut down/restart can be made automatic (even put on a timer/scheduler) and will only take 10 seconds or less. The 500 limit is just something Raistmer's put in as a precaution in the particular optimisation he's decided to pursue - it isn't needed in the simpler case of only kicking VLARs into the long grass (CPU).

The beauty of (3) is that it is entirely within the control of the volunteer community: anybody can write a script. The drawback is that we need to find a volunteer from within our own ranks with the skills and motivation to write a good script.

I had a PM overnight from a potential volunteer with claimed vbs skills who might be interested in helping Fred's baby grow up into a big, strong adult. I'm waiting for re-contact with an email address so I can send him the file attachments [edit - just arrived - I'll send them as soon as I've finished this]. And Jason has pointed out some tools which may help to make it more robust and easy to use.

SJ, do you run your BOINC as a service, or under a user account? Part of the difficulty with automation is handling both cases at the same time. We could maybe work on an interim one for just your setup first, and get you to test it?

BOINC is set up under a user account, as running BOINC as a service doesn't interest me.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887670
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65755
Credit: 55,293,173
RAC: 49
United States
Message 887671 - Posted: 23 Apr 2009, 18:15:16 UTC - in response to Message 887660.  


Ah, but you know from your own script that you don't have to look at the actual AR - the fpops_est is enough for a selection. And fpops_est is already processed by the scheduler, because it's needed to calculate how long a candidate task will run, and hence how many tasks are needed to fill the client's "nnn seconds" request.

That's the basis of the suggestion I put to Eric on 6 February, anyway. It may well be flawed.

... and if I'm reading Raistmer's script correctly, it's the version number (603 or 608) that decides what app can do the work.

The splitters would know things like the fpops_est; don't they also write the version required?

... that would make the scheduler have to deal with three different apps, but it seems that it'd work.

The scheduler is already dealing with three different apps, setiathome_enhanced, astropulse, and astropulse_v5.

Adding a fourth is possible but probably not the best way to go. BOINC already has a mechanism for steering work based on other considerations, Homogeneous Redundancy. The criteria by which hosts are judged to be in a particular HR class can be selected by the project, and the splitters (also project specific) set what HR class should be used for each WU they produce.

The only version setting for work is the <min_version> required for all tasks of a type; IOW they could require 6.08 for all setiathome_enhanced work but that would not be helpful.
                                                              Joe


Agreed, but isn't the show stopper here that HR is a DB resource 'hog' which they can't 'afford' here?

I seem to recall that being the reason the idea of using it has been shot down in the past.

Alinator

What I put forward to Eric was a bespoke scheduler with the additional logic test hard-coded into it: acknowledging that this is programming-resource costly and a maintainer's nightmare. But I hoped that it would only be needed for a few weeks, starting two months ago - as soon as nVidia solved the VLAR problem, it could have been stripped out again and reverted to standard BOINC compatibility.

Now that nVidia have bailed / failed, the goalposts have moved.

Desertion. One problem left and they go poof. Sigh.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 887671