41)
Message boards :
Number crunching :
Getting rid of "stale" work units?
(Message 37642)
Posted: 17 Oct 2004 by Bart Barenbrug Post: Ned asked:
> Has anyone simply tried manually deleting the units that have/will expire before you can process them and leaving the rest. Just wonder if this would work?
Not quite, but after installing a newer boinc version to a new directory instead of the one where my old boinc client was located, I did notice that I lost my cache of WUs as a result. So the new client started by downloading a whole new batch of WUs. Wasting the WUs in the old cache seemed a bit of a shame (leaving others waiting a long time for credit), and since they were for the same version of the seti client, it seemed to me that I should be able to merge the two caches. I did this by moving the contents of the old project directory to the new one, along with the `slots' directory holding the work units that were being processed by the old boinc client (renaming the subdirectories to higher slot numbers not used by the new client and adjusting the entries in the old client_state.xml accordingly), and then merging the client_state.xml files by hand. A lot of work (I'll be sure to install future boinc clients to the same directory as the old one), with the risk that a mistake would trash the newly cached WUs as well (so I made a copy of the whole new boinc directory before starting), so only do this if you really know what you're doing.
In short: it does seem possible to edit the client_state.xml file (when boinc is not running, of course) and get away with it, so I guess deleting WUs from there is an option too (though be sure to keep the file_info, workunit and result entries in correspondence: delete all of them for the same work unit, or none). Maybe just deleting the WU file from the project subdirectory will also do the trick, as the seti application will have no choice but to report an error when it gets told to start that WU (since it won't have a WU file to load anymore), so boinc can move on quickly.
This is probably what boincview is doing.
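As a rough illustration of the correspondence mentioned above (element names as found in BOINC's client_state.xml; the work unit name and the elided parts are made up for the example), deleting a work unit by hand would mean removing all three of these entries together:

```xml
<!-- Illustrative fragment of client_state.xml; the names are hypothetical. -->
<file_info>
    <name>example_wu_123.wu</name>
    <!-- ... -->
</file_info>
<workunit>
    <name>example_wu_123</name>
    <file_ref>
        <file_name>example_wu_123.wu</file_name>
    </file_ref>
    <!-- ... -->
</workunit>
<result>
    <name>example_wu_123_2</name>
    <wu_name>example_wu_123</wu_name>
    <!-- ... -->
</result>
```

Deleting the file_info but leaving the workunit and result entries (or any other partial deletion) would leave the client with dangling references.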
42)
Message boards :
Number crunching :
more power - less credit
(Message 37633)
Posted: 17 Oct 2004 by Bart Barenbrug Post: Let me first try to rephrase what I understand from Benher's post (to see if I got it), and then get to my own post (which addresses a slightly different, though related, issue). This is my understanding of Benher's post:
1) use the current way of predicting how long a work unit will take to compute.
2) do the actual computation and measure the time.
3) divide the two to derive a scale factor which you can apply to
__a) improve the next predictions, to be able to better predict how long a WU will take
__b) use also as a scale factor for the claimed credit
Imho 3b doesn't solve much (but correct me if I'm wrong), since the multiplier only says something about the accuracy of the prediction, so using the multiplier for claimed credit only brings this inaccuracy into the claimed credit instead of solely relying on the actual time measured. It would only help if we knew that the prediction was a more accurate measure of the work done than the actual time spent, but I doubt that that is the case. It would be nice, though, to use the multiplier to improve the prediction.
Further explanation of my post: I'm actually addressing a slightly different issue, namely that there now exist benchmark numbers for floating point computation and for integer computation, but these aren't taken into account in a meaningful way. The new formula I propose weighs the two benchmark numbers according to the amount of floating point versus integer computation done for a typical work unit in a given project, and improves the credit system on that point alone. In particular, it may improve the balance between the credits claimed for the same work unit processed on different systems (which can now be quite far apart). Let me illustrate with an example. Suppose there is a boinc project where a work unit needs 80 floating point operations and 20 integer operations. For this project the value of "p" in my formula would therefore be 0.8.
Suppose also that I have a client that can do 1 floating point operation per second and 5 integer operations per second (these are the two benchmark numbers now used, host.p_fpops and host.p_iops, also denoted f and i in my formulae). That would mean my computer would take 80 seconds to do the floating point work (80/1) and 4 seconds to process the integer operations (20/5), so the total time reported would be 84 seconds. The current way of determining claimed credit just averages the two benchmark scores and applies that to the time spent (with a few constants applied too). The benchmark average is 3 (the average of 1 and 5), which multiplied by the time spent on the work unit results in 252 (=3*84) operations worth of credit. The actual work in the example was only 100 operations (80+20), and another client may come up with a different claimed credit altogether: suppose client 2 can do 4 floating point operations per second and 2 integer operations per second. The total time spent by this client will be 80/4 + 20/2 = 30 seconds, and this is what will be used to compute claimed credit. Under the current way of working, this time spent will be multiplied by the average of the two benchmarks (3 again), now resulting in a claimed credit of 90 (=3*30) operations worth of credit. Quite a big difference compared to the 252 claimed by client 1. So machines having different ratios of floating point versus integer speed claim credits that can be quite different for the same work unit. What my formula does is take all this into account. For the example above, the project's "p" would be 0.8. For client 1 we have f=1, i=5 and t=84, whereas for client 2 we have f=4, i=2 and t=30. Putting that into my formula w=(f*i/(p*i+(1-p)*f))*t yields the original 100 operations of that work unit for both clients, so they claim the same amount of credit.
Or to put it differently, the factor f*i/(p*i+(1-p)*f) yields 1.19 operations worth of credit per second for client 1 (which multiplied by the 84 seconds gives the 100 operations), and 3.33 operations worth of credit per second for client 2. This factor is so much higher for client 2 because it can process floating point so much faster, and that matters since floating point is the major part of a work unit's work for this project. Another project may be much more reliant on integer computation. Let's say that a work unit for this second project needs 10 floating point operations and 90 integer operations. My two clients from before would report 10/1+90/5=28 seconds and 10/4+90/2=47.5 seconds of time spent respectively, resulting in claimed credits of 84 (=28*3) and 142.5 (=47.5*3) respectively under the current way of computing credit. So now client 2 claims more credit (whereas for the first project it was client 1). Again, my formula recovers the original 100 operations for both clients, yielding the same amount of credit claimed. Note that for this project the value of p is 0.1, so the two benchmark values get combined differently into the "operations worth of credit per second" for this project. Here client 1 gets 3.57 operations worth of credit for every second spent, and client 2 only 2.1. That is because client 1 is so much better at integer computations, which is what matters for this second project. So while this improved formula would not solve every issue with the credit system, it should reduce the imbalance in claimed credits between clients that have different ratios of floating point versus integer speed. And it also provides a means to differentiate between the kinds of work (in terms of integer versus floating point) done by different projects.
43)
Message boards :
Number crunching :
more power - less credit
(Message 37482)
Posted: 17 Oct 2004 by Bart Barenbrug Post: fabs(host.p_fpops)/1e9 + fabs(host.p_iops)/1e9 shows that Dhrystones and Whetstones are simply averaged. So basically half of the time is treated as floating point work and the other half as integer work. For example, if your computer actually spent say 80% of its time on floating point computations and only 20% on integer operations, 30% of your floating point work would be credited according to your integer benchmark results. This mostly affects machines that have quite different scores for floating point and integer. It also depends on how the work for a work unit is distributed over floating point and integer operations, and this is where different boinc projects may differ. So a solution would be to assign a ratio to each project indicating how much of the operations spent on a work unit are floating point operations versus integer operations. If, for example, your average seti WU used (and this is hypothetical) mostly floating point FFTs (let's say 80% floating point and 20% integer work), but CPDN used mostly integer (let's say 70% integer and 30% floating point), you would have a project setting of 0.8 for seti and 0.3 for CPDN (this would be the "magic number" suggested earlier). Taking this weight into account (how, see below) still doesn't account for variations in the composition of a work unit's computation (in terms of floating point and integer work) within a project (due to angle rate issues etc.), but at least it could provide a way to weigh the work more precisely, at least in the aspect of floating point versus integer, without being hampered by differences between projects (which are now accounted for in the ratio). As credits are computed currently, there might as well be just one benchmark value (the average of the floating point and integer measurements): the distinction between them is not really used at the moment. Let's call the parameter project.f_weight.
At first glance, the formula above might then become something like: (project.f_weight * fabs(host.p_fpops) + (1.0-project.f_weight) * fabs(host.p_iops))/1e9 where project.f_weight is a number between 0 and 1 (leaving 1.0-project.f_weight as the integer weight). But this doesn't work, since the slower a computer is at, for example, floating point, the more of its time is actually spent in floating point computations. Looking at this in a little more detail: if a work unit needs a certain number of operations, say w, and a certain part p of those (p between 0 and 1; this is the project setting project.f_weight) are floating point operations, then we know that p*w operations are floating point operations and (1-p)*w are integer operations. p can be determined by profiling the computation of a number of work units for a given project and seeing how many operations are floating point and how many are integer. We also have our benchmark results: f (=host.p_fpops, telling us how many floating point operations the client can perform per second) and i (=host.p_iops, giving the number of integer operations per second). We then know that the time spent on floating point operations is p*w/f and the time spent on integer operations is (1-p)*w/i. So the total time spent on the work unit is w*(p/f+(1-p)/i). When a result comes back, it is actually the value of w that we want to know, and the total time spent that we do know. At least that's my assumption: that w is a better indicator of the amount of science done than the time spent on the work unit (and this is debatable, I think). So let the computation time that gets reported back for a work unit be t. Then from w*(p/f+(1-p)/i)=t we can derive w=(f*i/(p*i+(1-p)*f))*t. So the factor to compute, from a given time t, the number of operations w performed is: f*i/(p*i+(1-p)*f) which is what could be used in the formula to compute host.credit_per_cpu_sec instead of just averaging host.p_fpops and host.p_iops.
This does make things a bit more complicated though (especially if you would also want to weigh in network time etc.), and it doesn't account for CPUs possibly being able to perform floating point and integer operations in parallel, but at least it would use the two benchmark numbers in a more meaningful way than they are used currently.
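As a sketch of the proposed change (the function name and the f_weight parameter are illustrative, not actual BOINC identifiers), the weighted factor derived above could replace the plain sum of the two benchmarks:

```c
#include <math.h>

/* Proposed "operations worth of credit per second", in units of 1e9 ops.
   f_weight is the project's fraction of floating point operations (p).
   Derivation: from w*(p/f + (1-p)/i) = t it follows that
   w/t = f*i / (p*i + (1-p)*f). */
double weighted_credit_per_cpu_sec(double p_fpops, double p_iops, double f_weight) {
    double f = fabs(p_fpops);
    double i = fabs(p_iops);
    double p = f_weight;
    return f * i / (p * i + (1.0 - p) * f) / 1e9;
}
```

For a client with f = 1e9 and i = 5e9 operations per second on a project with p = 0.8, this gives about 1.19, i.e. 1.19e9 operations worth of credit per second.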
44)
Message boards :
Number crunching :
Three successful results but no credits - still a queue?
(Message 37461)
Posted: 17 Oct 2004 by Bart Barenbrug Post: Here's another one: work unit 837922. Three successful results (plus one unsuccessful). Credit pending, and weirdest of all: validate state Initial for all results. Same for work unit 822548: three successful results, but validate state Initial for all three. And there are more of those: work units 578816, 542976, 1303958, 955841, 1210644, and 888674. Basically all my pending results from August and earlier have at least three successful results, but a validate state of Initial for all of them.
45)
Message boards :
Number crunching :
Much slower processing with SETI@home ver 4.5
(Message 32005)
Posted: 2 Oct 2004 by Bart Barenbrug Post: Same here: the 360K WUs used to complete in around 4 hours 20 minutes, but now they're taking well over 6 hours (on Windows on my P4-2.6HT, running 2 WUs at the same time both before and after the upgrade).
46)
Message boards :
Number crunching :
New Units - but no crunching
(Message 19713)
Posted: 30 Aug 2004 by Bart Barenbrug Post: I'm not on dial-up (I'm using a cable modem). And I'm happy to report I just received some 20 WUs, and my computer has started crunching again. It just took a bit of time for it to find a scheduler it could connect to. (And for completeness: I didn't actually uninstall my previous install of boinc 4.05 in the control panel, but only manually renamed the directory; I can always throw away the old dir.)
47)
Message boards :
Number crunching :
New Units - but no crunching
(Message 19701)
Posted: 30 Aug 2004 by Bart Barenbrug Post: I reinstalled boinc_4.05_windows_intelx86.exe (available from many boinc projects). I first renamed my c:\program files\boinc directory to something different as a backup (which was good, since the newly installed boinc asked for the project url and my account key, and I could copy-n-paste those from the previous account_setiathome.berkeley.edu.xml). I'm still waiting for schedulers to respond; nothing since my last post. I'll just let it do its thing and see tomorrow whether it's gotten some work. I assume the schedulers are swamped right now, with new clients being installed and reset everywhere.
48)
Message boards :
Number crunching :
New Units - but no crunching
(Message 19691)
Posted: 30 Aug 2004 by Bart Barenbrug Post: Reinstalled as well. Got a good seti client and one WU, which it finished within minutes. Now I get "no work from project" (and a lot of "No schedulers responded"). And I had to merge in the newly created computer (no biggie). So I guess I'm back to waiting, but at least I have good client software now. (I hope all this resetting doesn't trash too many of the WUs that were prepared to give us a good start with the new boinc...)
49)
Message boards :
Number crunching :
New Units - but no crunching
(Message 19654)
Posted: 30 Aug 2004 by Bart Barenbrug Post: Well, I reset, but all the WUs that were waiting on the seti software are now gone, and all I get is a "daily quota exceeded" message, so it's not downloading any WUs, nor the seti client software (for me it wasn't only the pdb that was bad/missing: my whole seti project directory under boinc is empty now, no exe or banner either).
50)
Message boards :
Number crunching :
Just a SETI Classic observation
(Message 12781)
Posted: 27 Jul 2004 by Bart Barenbrug Post:
> I am not sure I understand your answer here. You start off by saying that you could cheat and end by describing some of the mechanisms that will prevent that.
You're right: I wasn't clear on that. Overall, cheating does indeed not pay. It would be easy to make a client that claims more credit than it should, but the server will not grant it.
51)
Message boards :
Number crunching :
Just a SETI Classic observation
(Message 12694)
Posted: 27 Jul 2004 by Bart Barenbrug Post:
> But, so far as I can tell, you cannot cheat BOINC and its computations.
Why not? If someone really wants to: the source is available, so you can make a client that reports higher benchmark scores or longer computation times etc. But that only counts towards claimed credit, and the median operation performed by the validator server-side ensures that if the credit claimed for one of the results of a work unit has been "artificially" increased, the actual granted credit will be one of the others. So a cheater won't get rewarded. Unless very many people cheat and there is more than one "artificially increased" claimed credit score for one work unit; but should that happen, more sanity checks on claimed credits can be put into the server-side software. The point is to prevent what so often happens: a few bad apples ruining it for the rest. The median operation gives reasonable protection against that.
52)
Message boards :
Number crunching :
Getting Boinc to immediately return WUs that are complete
(Message 11812)
Posted: 24 Jul 2004 by Bart Barenbrug Post: Thanks. I didn't know that that option was already implemented. Just wondering, though: will this put more strain on the schedulers (since they are contacted more often)? I guess that since the latest upgrades the schedulers aren't so much the problem anymore (it's the transitioners that have to catch up), but putting more strain on the server side in general seems to be something that has to be done with care (if it really increases the server load) until everything is running smoothly. Bart
©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.