Posts by Jeff Buck

1) Message boards : Number crunching : 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED (Message 1907608)
Posted 17 hours ago by Jeff Buck
Post:
Grant (SSSF) wrote:
On my 32bit Vista system I'm running 344.11. Anything higher, even with CUDA50, and I got driver restarts. I even tried just 1 GTX 750Ti, reserved a core for it and just used the defaults for SoG, but still got too many driver restarts.
With CUDA50 I'm running 2 WUs at a time (although it makes the Arecibo WUs take 3 times as long if run with a GBT WU), but at least there are no restarts. I've lost entire caches to restarts.
:-/
I'm running the 353.62 driver, which I think is as low as I can go with SoG. I'm pretty sure that's about what I was running with Cuda50, also, but I don't remember if I tried to go lower or not. Probably not.

Bernie Vine wrote:
So as both machines are getting a bit long in the tooth now I think I will let them retire gracefully.
Maybe it would be a good time to try experimenting with Linux. Since I've been running Linux on 3 of my other boxes, I've also considered switching over my problem child, but I'd want to keep the Windows partition to make it dual-boot, so I'd have to put a larger HD in it first. Just not enough room on the current drive for another 30GB partition.
2) Message boards : Number crunching : 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED (Message 1907589)
Posted 18 hours ago by Jeff Buck
Post:
32bit OS?
I'd always thought of putting a C2Quad in mine, thinking that would help (I've got 2 GTX 750Tis in there), but it's looking like it boils down to the 32bit OS and the slow clock speed. A 64bit OS would probably allow the system to use more of the RAM, with less resource contention, given the larger available address space.
On mine, the harder the GPUs work, the less time the CPU has to process WUs, and the higher the system resources taken up with Interrupts & DPCs (up to 15% when I tried to run SoG on it).
Yeah, 32-bit Vista on an old HP dc7700 with a C2D E7500. I did switch over to SoG (from Cuda50) to see if that would help, but there really wasn't any discernible change in the driver crashes or task hangs. I stuck with SoG, though, because I still get better overall production out of it. It can sometimes go several days without any hangs, then get a couple in the same day. Just no obvious pattern. That machine is mainly a crunch-only box. Once in a great while I use it for streaming (through an HDMI connection to my plasma TV), but on those occasions I shut down BOINC first.
3) Message boards : Number crunching : 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED (Message 1907583)
Posted 19 hours ago by Jeff Buck
Post:
For me, even bumping the TdrDelay all the way up to 16 (0x00000010) didn't help, or at least didn't help much. I just figure that once I moved from the 550Ti to the 750Ti, the speed and efficiency of the newer GPU just became too much for the old C2D and/or the MB bus to handle consistently. I just looked at that system and found that it's had 3 driver crashes in the last 17+ hours, though none of them caused any tasks to hang. I've never tried to figure out if there's a pattern of any kind. It just didn't seem to be worth the effort.
4) Message boards : Number crunching : 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED (Message 1907532)
Posted 1 day ago by Jeff Buck
Post:
Have you been having any video driver crashes on those machines? I have one machine that gets those quite frequently. The driver recovers, but the side effect is that GPU tasks that are running when the driver crashes sometimes seem to hang, though not always. The elapsed time increases but the progress does not. The only way I manage to avoid getting the timeouts that you're experiencing is to check on that machine a couple times a day to see if any tasks are hung. If they are, I just suspend those individual tasks briefly, then resume. When they start running again, the elapsed time drops back to a more normal figure and everything runs along just fine (unless there's another driver crash).

Interestingly, I never used to have that problem when I was running a 550Ti in that machine. It only started when I switched to a 750Ti a year or so ago, and continues today with a 960. At the time I installed the 750Ti, I was running Cuda, so the first thing I did when the driver crashes started was to switch to SoG. No change. After that, I tried newer drivers, older drivers, TDR delay increases, etc., etc., with no discernible effect, so I've just learned to live with it. I generally manage to avoid the errors, but I often end up with several hours of lost processing time before I discover that a task is hung.
5) Message boards : Number crunching : Task Deadline Discussion (Message 1906435)
Posted 6 days ago by Jeff Buck
Post:
Is he using an old version of BOINC?
Is he suffering bad communications and so piling up the ghosts? (I had a rogue router that resulted in a couple of thousand ghosts on an inaccessible cruncher.)
Is he trying to reschedule and failing?
Is he trying to bunker for some reason or other?

And probably a few more if I put my mind to it.
I think, from the earlier discussion, that it should already be clear that this host is doing nothing but downloading tasks that all eventually time out. That doesn't appear to represent a user who's actively doing anything. And he's on BOINC 7.8.3.

The question is why the scheduler is ignoring the "Max tasks per day" value, which has correctly been reduced to '1' for all the active applications.
6) Message boards : Number crunching : Task Deadline Discussion (Message 1906348)
Posted 6 days ago by Jeff Buck
Post:
Since quota management seems to have become as much a part of this discussion as task deadlines, I thought it might be appropriate to use that host with 6,000+ tasks again to try to pose a basic question. Inasmuch as that host is generating nothing but timeout errors, the "Max tasks per day" for each active application has dropped to 1, as it should. According to the Application Details page for the host, there appear to be 5 currently active applications, so even with that strange 8x multiplier, the host should theoretically be limited to no more than 40 tasks per day. Yet it seems to be steadily downloading about 3 times that many.

My question is simply, how? Does anyone know what other factor is in play? Here are the details for the 5 active apps:
SETI@home v8 8.00 windows_intelx86
Number of tasks completed 	1923
Max tasks per day 	1
Number of tasks today 	15
Consecutive valid tasks 	0
Average processing rate 	11.24 GFLOPS
Average turnaround time 	24.99 days

SETI@home v8 8.00 windows_intelx86 (cuda42)
Number of tasks completed 	6144
Max tasks per day 	1
Number of tasks today 	42
Consecutive valid tasks 	0
Average processing rate 	78.79 GFLOPS
Average turnaround time 	43.39 days

SETI@home v8 8.00 windows_intelx86 (cuda50)
Number of tasks completed 	14954
Max tasks per day 	1
Number of tasks today 	49
Consecutive valid tasks 	0
Average processing rate 	96.93 GFLOPS
Average turnaround time 	52.35 days

SETI@home v8 8.05 windows_x86_64
Number of tasks completed 	69
Max tasks per day 	1
Number of tasks today 	16
Consecutive valid tasks 	0
Average processing rate 	10.99 GFLOPS
Average turnaround time 	27.12 days

SETI@home v8 8.08 windows_x86_64 (alt)
Number of tasks completed 	0
Max tasks per day 	1
Number of tasks today 	17
Consecutive valid tasks 	0
Average turnaround time 	31.19 days

Clearly, in every case, the number of tasks sent to that host each day bears no obvious relationship to the "Max tasks per day" figure. So, what the heck is the scheduler thinking???
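
For reference, here's the back-of-the-envelope arithmetic as I understand it. The 8x multiplier is only my recollection, so treat this as a sketch rather than anything authoritative:

    # Back-of-the-envelope check of the quota for that host.
    # The 8x multiplier is my recollection, not anything documented.
    QUOTA_MULTIPLIER = 8   # assumed

    apps = {
        "8.00 windows_intelx86":          {"max_per_day": 1, "sent_today": 15},
        "8.00 windows_intelx86 (cuda42)": {"max_per_day": 1, "sent_today": 42},
        "8.00 windows_intelx86 (cuda50)": {"max_per_day": 1, "sent_today": 49},
        "8.05 windows_x86_64":            {"max_per_day": 1, "sent_today": 16},
        "8.08 windows_x86_64 (alt)":      {"max_per_day": 1, "sent_today": 17},
    }

    expected = sum(a["max_per_day"] * QUOTA_MULTIPLIER for a in apps.values())  # 40
    observed = sum(a["sent_today"] for a in apps.values())                      # 139
    print(f"theoretical ceiling: {expected}, actually sent today: {observed}")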
7) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1906253)
Posted 6 days ago by Jeff Buck
Post:
No, those don't seem to have anything to do with the app. I see them on my machines from time to time, usually associated with a sudden system shutdown and restart, often due to a power outage. I don't know that there's anything you can do, unless the shutdown was voluntary, in which case exiting BOINC before shutting down would probably avoid them.

EDIT: Then there are Invalids like this one, https://setiathome.berkeley.edu/result.php?resultid=6214250418, where the Special App is very much at fault. All those Spikes reported after the restart are phantoms, not reported by your wingmen. The only thing you can do from your end is to either restart as seldom as possible, or change your checkpoint interval to a value that's greater than your normal task run time. That forces tasks to always restart from the beginning. I don't really subscribe to that approach, because overall I would lose more processing time on all the restarted tasks than I would gain from avoiding the Invalids. I keep my checkpoint interval at 120 seconds. However, that option is there if you want to choose that route.
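
For anyone who does want to try the long-checkpoint route, the knob involved is the "write to disk at most every N seconds" computing preference, which, as far as I know, corresponds to <disk_interval> in a global_prefs_override.xml (or the equivalent field in BOINC Manager's computing preferences). Something like this, where the 7200 is purely an illustrative value:

    <global_preferences>
       <!-- Checkpoint no more often than every N seconds. A value longer than a
            normal task's run time (7200 here is just an illustration) means tasks
            never checkpoint and always restart from the beginning. -->
       <disk_interval>7200.0</disk_interval>
    </global_preferences>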

EDIT2: And I see you have another type of Invalid that originates with the Special App, https://setiathome.berkeley.edu/result.php?resultid=6213403484. The newer versions changed the processing sequence such that on some overflow tasks, the Special App reports all (or almost all) 30 signals as Triplets, while all other apps report all (or mostly all) Pulses. There's nothing at all you can do about these.
8) Message boards : Number crunching : Task Deadline Discussion (Message 1906124)
Posted 7 days ago by Jeff Buck
Post:
Brent Norman wrote:
That should probably be ... if(#InProgress/#Devices>ServerLimit) then CheckForResends.

The Reached Limit message would be coming from the client saying,
I have 100 CPU tasks and want more for 10 days, and I have 90 GPU tasks and want more for 10 days.
Server says, none for CPU, here is 10 GPU tasks ..
or (as we often see)
You have reached your limit (for CPU[as defined server limit]) and Server has No Tasks Available (for your GPU) ... a message for each device.

It must be the client reporting how many tasks it has, for the server to know how many more will reach the ServerLimit, and the server takes that at face value without checking whether you have 6000 InProgress.

Then of course there is the time limit constraint formula in that mix too.

We know the server currently has the ability to check for resends since that is how we recover ghosts - by forcing an upload Failure/TimeOut. That must set a server side flag for that ID - I know we're out of sync, so I better check ...

It might be interesting to track what Team the people with 6000 tasks are from to see if there is a link to a 'Group' using modified code ...

This is off topic from deadlines, but is also a major cause of bloating.
Generally speaking, I think that's accurate. I believe, though, that the client only indirectly tells the server how many tasks it has on hand. It lets the server figure that out from the <other_result> sections in the scheduler request. Take a look at your sched_request_setiathome.berkeley.edu.xml file. I don't think there's a hard count in there anywhere, but each <other_result> section carries an app_version and, optionally, a plan_class. It appears to be up to the server to piece it all together, which it seems to do without reference to the "In Progress" count in the DB.
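
From memory, the relevant piece of that file looks roughly like the sketch below. The field names are as I recall them and the values are just placeholders, so check your own sched_request file for the exact layout:

    <other_results>
        <other_result>
            <name>TASK_NAME_PLACEHOLDER</name>           <!-- one entry per task on the host -->
            <app_version>APP_VERSION_PLACEHOLDER</app_version>
            <plan_class>cuda50</plan_class>              <!-- optional; absent for some apps -->
        </other_result>
    </other_results>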

As far as those with thousands of tasks go, it's likely a mix of those who do it intentionally and those who are oblivious, such as the one in my earlier example. That one's just a host with a serious ghosting problem that the owner likely isn't even aware of.

Whether "ghosts" are a major cause of bloating is also an open question. Without figures to back it up, the scale of the problem is difficult to judge. But we do know it exists and some hosts stand out more than others.
9) Message boards : Number crunching : Task Deadline Discussion (Message 1906099)
Posted 7 days ago by Jeff Buck
Post:
Brent Norman wrote:
I guess the simplest test for a need to check would be ... if(#InProgress/#Devices>100, CheckForResends, NormalRequest). That should definitely cut down server load for checking, since those values should(??) already be used in the calculation for # to send.
I'm not sure. The Scheduler certainly looks to see if the host has reached its limit on tasks in progress, but I assume that's based on the tasks carried in the scheduler request. However, I believe the Scheduler primarily looks at the number of seconds of work requested and just decrements that figure as it pulls tasks off the feeder until either the request is satisfied or the task limit is reached. I don't know that it ever needs to see how many tasks are already recorded as "In Progress" in the DB for the host. Somebody like Richard could probably pin that down better than I can.
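
In rough pseudocode, my understanding (and it is only my understanding, not the real server code) is something like this:

    # Sketch of how I understand the scheduler fills a work request.
    # All names are invented for illustration.
    def fill_request(seconds_requested, tasks_on_host, per_host_limit, feeder):
        # tasks_on_host comes from the request itself (the <other_result> list),
        # not from an "In progress" count in the DB.
        seconds_left = seconds_requested
        sent = []
        while seconds_left > 0 and len(tasks_on_host) + len(sent) < per_host_limit:
            task = feeder.next_task()               # next candidate off the feeder
            if task is None:                        # feeder empty: "no tasks available"
                break
            sent.append(task)
            seconds_left -= task.estimated_runtime  # decrement the requested seconds
        return sent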
10) Message boards : Number crunching : Task Deadline Discussion (Message 1906097)
Posted 7 days ago by Jeff Buck
Post:
betreger wrote:
Once those values settle down

They may never.
On one host of mine they never settle down. It runs 45% reserved E@H on a GTX 1060, and all the CPU on E@H as well. The rest of the time it runs S@H on the GPU. The CPU and GPU run times seem to be averaged between the CPU and GPU per project. It does not take into account the app or the processor. The GPU can process at the rate of 4 work units per hour, while the CPU takes a bit less than 6 hours per work unit. I see estimated run times for the GPU vary between 1 and 2.5 hrs. When it reports a CPU task, the estimated run time for the cached GPU work goes way up, and BOINC often thinks my 1.5 day cache is full even though it is not.
In addition to factoring in project share, the average turnaround time also depends on how many hours a day a system runs. If it's running 24/7, then the average should be close to actual. But if a machine only crunches, for instance, an average of 12 hours per day, then the average will be double the actual. In other words, if the average actual run time is 6 hours, the turnaround will average out to .5 days because the host is only returning 2 tasks per 24-hours, due to the 12-hour per day downtime. That sort of analogy probably also applies if you subtract the number of hours an alternate project is running from the base 24-hour day. It should, however, differentiate between CPU and GPU apps, just as it does on the Application Details page, but hey, some BOINC calculations are just inexplicable. That's all there is to it!
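
As a quick sketch of that arithmetic, using the numbers from my example:

    # Rough illustration of why part-time crunching inflates average turnaround.
    run_time_hours   = 6.0    # actual crunch time per task
    hours_per_day_on = 12.0   # host only crunches half the day
    duty_cycle = hours_per_day_on / 24.0                        # 0.5
    effective_turnaround_days = (run_time_hours / duty_cycle) / 24.0  # 12 hours = 0.5 days
    tasks_returned_per_day = hours_per_day_on / run_time_hours        # 2 tasks per 24 hours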
11) Message boards : Number crunching : Task Deadline Discussion (Message 1906090)
Posted 7 days ago by Jeff Buck
Post:
Brent Norman wrote:
OK, I'll revisit an old subject ... turn resend lost tasks back on. I remember it was said it was causing excess database load, which is why it is off, but I forget what the bottleneck was - hardware or software. Would SSDs in the array help, or is it the overall system that can't handle the throughput?

Same goes for the daily, monthly, yearly stats and other things. They just keep shutting things off as the database size increases. So what is needed to fix the real problems with the huge database?
I think the issue would likely be with the query that would necessarily run against the DB for every scheduler request just to determine if any lost tasks were present for the host making the request. The scheduler would have to be able to match up every task listed in the "<other_results>" section of the scheduler request with the tasks that are stored on the DB for that host, in order to determine if there are any "lost" ones that need to be resent. That probably wasn't a big deal back when an average host only carried a few dozen tasks (or less). But with so many hosts now carrying task limits for multiple GPUs, I shudder to think what the overhead would be every 5 minutes when my 4-GPU host sends in a scheduler request. Multiply that by all hosts out there carrying their task limits and I can understand why it might bring the DB and server to its knees.

However, by having that feature turned off, it allows ghosts to accumulate in the DB, though how many that might be is anybody's guess. I, for one, try to be responsible in retrieving ghosts if I know about them, and especially if I'm responsible for having created them in the first place.....which I usually am. :^(

On the other hand, I've got to think that there must be some middle ground, where the lost task resend process could be turned back on for short periods when database access is near its low point, just to see if a significant number of lost tasks can find their way home again. Alternatively, just comparing a count of the <other_results> in the scheduler request with what the server thinks should be there could be a low-impact preliminary test. Only if a discrepancy was found would a full comparison be necessary.
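
Something along these lines is what I have in mind for that cheap preliminary test. It's only a sketch with made-up names, not real scheduler code:

    # Sketch of a cheap "do we even need to look for ghosts?" pre-check.
    # count_in_progress() and tasks_in_progress() are hypothetical DB lookups.
    def find_lost_tasks(host_id, other_results, db):
        client_count = len(other_results)                 # tasks the host says it has
        server_count = db.count_in_progress(host_id)      # tasks the DB says it has
        if client_count == server_count:
            return []                                     # nothing obviously missing; skip the full check
        client_names = {r.name for r in other_results}
        return [t for t in db.tasks_in_progress(host_id)  # full comparison only on a mismatch
                if t.name not in client_names]

A count match could, of course, hide offsetting discrepancies, but as a first pass it would weed out the vast majority of requests that don't need the full name-by-name comparison.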
12) Message boards : Number crunching : Task Deadline Discussion (Message 1906087)
Posted 7 days ago by Jeff Buck
Post:
ausymark wrote:
I'm going to suggest what I think might be a simple system that may help matters.
...
...
So does any of the above have merit and is it simple enough to be easily implemented?

Cheers

Mark
Yes, it definitely has merit, but I don't think any changes relating to quota management are likely to be easily implemented.

The existing system attempts to do some of what you suggest. If you look at the "Applications details" for a given host (for instance, yours would be here), you'll already find fields for "Average turnaround time" and "Max tasks per day", for each application that can be sent to that host.

In general, I suspect your proposal would probably be more restrictive on new hosts, but perhaps overly generous for existing ones. For new hosts, the initial value for "Max tasks per day" is 33. I think that number is much too high as it is, notwithstanding the fact that the value in that field is somewhat misleading. As I recall, the project applies a multiplier to that value to calculate the true maximum, and my recollection is that the multiplier is 8. (If anyone can point to someplace that would verify or refute that number, please do.)

That initial setting is then incremented by 1 for every task that is completed and validated, so it can get to be quite a huge number for reliable hosts. Theoretically, the number is decreased each time a task is marked as an Error or Invalid (the latter being somewhat questionable). It can never drop lower than 1 but, with the multiplier, and with the scheduler having multiple applications to choose from for hosts running stock, that number is kind of meaningless.

Even for hosts with huge "Max tasks" numbers, there are already two limiting factors in play. One is the maximum of 100 CPU tasks per host and 100 tasks per GPU. The other is the "Store at least nn days of work" and "Store up to an additional nn days of work" combination, controllable through user/host Preference settings. Each of these has a maximum value of 10 days which, even in extreme cases, should limit a host to no more work than it can process in 20 days. If that value is less than the maximum allowed by the other limits, that's where it should be capped. Otherwise, the other limits should come into play and, in the case of the most productive hosts, should be far less than 20 days. So, given all that, why a host with a greater than 20-day average turnaround is able to download more than 1 or 2 tasks at a time is beyond me. :^)
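
Pulling those pieces together, my mental model of the limits looks something like the sketch below. The 8x multiplier and the exact penalty applied on errors are my own assumptions, not anything pulled from the server code:

    # Mental model of the quota/cache limits described above.
    QUOTA_MULTIPLIER = 8          # assumed, from my recollection
    PER_DEVICE_LIMIT = 100        # 100 CPU tasks per host, plus 100 per GPU
    MAX_BUFFER_DAYS  = 10 + 10    # "store at least" + "store up to an additional", each capped at 10

    def update_max_tasks_per_day(current, outcome):
        if outcome == "valid":
            return current + 1             # +1 for every completed, validated task
        if outcome in ("error", "invalid"):
            return max(1, current - 1)     # decreased on errors (exact penalty assumed), never below 1
        return current

    def daily_ceiling(max_tasks_per_day, active_apps):
        return max_tasks_per_day * QUOTA_MULTIPLIER * active_apps

    # A brand-new stock host with 5 usable apps: 33 * 8 * 5 = 1320 tasks per day
    # in theory, long before the per-device or buffer-days limits come into play.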
13) Message boards : Number crunching : Task Deadline Discussion (Message 1906076)
Posted 7 days ago by Jeff Buck
Post:
See my belated edit to my last post, as well.
14) Message boards : Number crunching : Task Deadline Discussion (Message 1906072)
Posted 7 days ago by Jeff Buck
Post:
You make the case that each time a task has to be resent, it adds to the DB, I would ask...How?

It depends how the database is set up.
If it just changes a value in an existing field, then the only effect on the load would be indexing (if it's an indexed field). There would be no change in the number of records, so no change in the size of the database. If it requires a new record to be added, then that would add to the database size, as well as a re-indexing load (on multiple fields, not just a single one).
Well, a resent task really is a new task, so it requires a separate entry, BUT.....it doesn't add to the DB any more than a newly split task does. And if there's a resend ready to go, it just delays the need for another new one to be split.

EDIT: Keep in mind that the actual WU data, the files that we download, are stored outside the DB in those "fanout" folders, so a task resend would, I assume, simply point to an existing file for the WU. Which brings to mind another point. As long as a task is being held hostage by a laggardly host, that WU data file is still taking up disk space.
15) Message boards : Number crunching : Task Deadline Discussion (Message 1906054)
Posted 7 days ago by Jeff Buck
Post:
Somehow, I'm not getting my point across about reduction in DB size, and I'm running out of ideas about how to present it better, but I honestly don't see how shortening deadlines could fail to achieve that goal. Perhaps someone else could present the case better, but I'll take one more brief crack at it.

I do see, though, that we have different ideas about what constitutes "DB load". I'm looking at it from the standpoint of the overhead of DB access, whether that's updates, queries, backups, or the like. All of those should have reduced overhead when there are fewer entries, and correspondingly smaller indexes, in the DB. You've referred to DB load as "DB records change per second", which I see as only part of the equation.

Now, I do agree that better quota management could ultimately prove to have a greater impact than shorter deadlines, but the two are not mutually exclusive. It's just that the current long deadlines appear to be unnecessary and shortening them should require much less effort than attacking the quota management issue, at least if it's approached as a simple across-the-board percentage reduction. If somebody has the willingness, skill, and time to have a go at it, then I'm all for it!

So, out of the 8 listed types of deadline miss, a clear positive impact from deadline shortening could be achieved only on type 2. On most of the others the impact will be negative.
No, in the absence of any changes to quota management, the first 4 of my categories would absolutely benefit from shorter deadlines, and I believe those are the 4 that are responsible for the highest percentage of timeouts. Just looking at that small sample of 20 examples that I posted earlier, at least 12 out of the 20 fall into categories 2 through 4. The sample is too small for a truly valid statistical analysis (hence the absence of any true "drive-bys" in the sample), but I think it should still be at least generally representative. (If anyone else would like to take a crack at extracting a larger sample, the full list of timeouts for my October wingmen is available here; I'm kind of getting burned out on the manual analysis.) In addition, category 7 should also see improvement, but whether shorter deadlines might increase timeouts for that group is uncertain, so I consider that one to be up in the air. To some extent, that may also be the case for Category 5.

Let me address what I think is one additional flaw in your argument, although it probably overlaps a bit with what I've said before about "resent" tasks versus "replacement" tasks. You make the case that each time a task has to be resent, it adds to the DB. I would ask...How? The host that receives that task is not receiving it as an extra task, on top of any newly split tasks it receives. The receiving host still only gets a given number of tasks for any given time frame (assuming optimal scheduler performance). They could conceivably be all newly split or all resends but are, of course, most commonly some mix of the two. Redistributing those timed out tasks does not increase the number of tasks that any given host, or the project as a whole, can process, any more than newly split tasks do. Nor does it increase the DB size.

Now, let me try using that host with 6,000+ tasks to see if I can better explain why shorter deadlines would help with that category of hosts. I believe the same logic applies to the more general "ghost" category and others, as well.

At the moment, that host has 6,248 tasks listed. Of those, 5,780 are shown as "In progress" while 468 are already "Errors" (all timed out as far as I can see). If we can assume that tasks are timing out at about the same 48.85-day average deadline that I identified in my first post in this thread (feel free to come up with a more specific average for this host, if anyone so chooses), then those "In progress" tasks are timing out at the rate of about 118 per day, likely about the same rate that new ones are being downloaded.

So, what happens if the average deadline is shortened by 20%, to 39.08 days? The host still will be downloading about 118 new tasks per day, but a shorter deadline will only permit it to accumulate about 4,624 tasks before they start timing out. Since every "In progress" task represents a Workunit which averages, IIRC, about 2.1 tasks per WU, that should make for a DB reduction of 1,156 WUs and about 2,427 task entries in the DB for this one host alone. Admittedly, this is an extreme case, but I've run across a number of those over time and the cumulative effect of many more less extreme cases is likely to be even more significant.
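
For anyone who wants to check the numbers or plug in different assumptions, the arithmetic is simply:

    # The arithmetic behind the numbers above; every input is just an estimate.
    in_progress       = 5780       # tasks currently "In progress" on that host
    avg_deadline_days = 48.85      # average deadline from my earlier sample
    tasks_per_wu      = 2.1        # rough average task entries per workunit, as recalled

    timeout_rate   = in_progress / avg_deadline_days   # ~118 tasks timing out (and re-downloaded) per day
    short_deadline = avg_deadline_days * 0.80           # 20% reduction -> ~39.08 days
    new_backlog    = timeout_rate * short_deadline      # ~4,624 tasks held at any one time
    wu_reduction   = in_progress - new_backlog          # ~1,156 fewer workunits pinned in the DB
    task_reduction = wu_reduction * tasks_per_wu        # ~2,427 fewer task entries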

Now, would better quota management result in an even bigger benefit to DB size? Yes, yes, yes.....but what would be the relative effort to achieve it versus a simple reduction in deadline length?
16) Message boards : Number crunching : Task Deadline Discussion (Message 1905971)
Posted 7 days ago by Jeff Buck
Post:
What I don't understand is how the second host was able to get so many tasks. Some kind of rescheduling ?

Tom
It's likely all those tasks are "ghosts", which the servers think were successfully sent to that host but which never completely made it into the BOINC task list on the host. BOINC probably currently shows no tasks in progress. Therefore, BOINC keeps requesting new tasks to fill the bucket but, since those also turn into ghosts, from BOINC's perspective the bucket remains empty. Since those tasks eventually are recorded as errors when they time out, the quota management system should, theoretically, be able to throttle such a host, but it doesn't. One of the problems that I can see in this case is that, since the host is running stock, there are multiple usable CPU and GPU apps in play, with separate "max tasks per day" limits for each, and the scheduler just keeps trying several of them every day.

As was suggested in an earlier post by Wiggo, one possible cause of "ghosts" like this may be an overly aggressive AV program, possibly combined with some twists added by Win10. If the user isn't monitoring BOINC, he's likely not even aware that anything is wrong.
17) Message boards : Number crunching : Task Deadline Discussion (Message 1905968)
Posted 7 days ago by Jeff Buck
Post:
Where is the flaw?
I think there are two, actually. The first is assuming that the "bad" host is always sent a new task to replace each one that times out. I believe that happens only in a limited number of circumstances, which I tried to enumerate in my earlier post. In the case of a "ghost", that replacement is sent long before the deadline, anyway.

The second, I think, is the 24-hour purge cycle for validated WUs. In your example, even a 6-day deadline is an extremely short one, compared to either the existing deadlines or my proposed reduction. Having a purge cycle that represents such a high percentage of the deadline window likely inflates your DB occupancy numbers. Since I found that a deadline of slightly over 48 days was the average in my sample, a 1 day purge cycle would represent about 2.1% of that average. So, if you reduce the 48 days to 6, wouldn't it also be necessary to reduce the 24-hour purge cycle to about 3 hours, and use that for both your "long" and "short" deadline calculations?

To carry that further, my recommendation for a 20% reduction in deadlines would make a 4.8 day "short" deadline more appropriate than the 2 day one in your example, as well.
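
Just to put numbers on that proportionality argument:

    # Scaling the purge cycle in proportion to the deadline, using the figures above.
    avg_deadline_days = 48.85
    purge_cycle_days  = 1.0
    ratio = purge_cycle_days / avg_deadline_days             # ~0.021, i.e. about 2.1%

    example_deadline_days = 6.0
    scaled_purge_hours = example_deadline_days * ratio * 24  # ~3 hours, not 24
    my_short_deadline  = example_deadline_days * 0.80        # 4.8 days for a 20% reduction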
18) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1905807)
Posted 8 days ago by Jeff Buck
Post:
Here's an unusual Inconclusive where the Special App, running on one of my machines, reported 2 Gaussians which the SoG app did not.

Workunit 2764926755 (14fe07ab.5355.16841.5.32.110)
Task 6205405922 (S=18, A=1, P=0, T=4, G=7, BS=27.00169, BG=3.570737) x41p_zi3v, Cuda 9.00 special
Task 6205405923 (S=18, A=1, P=0, T=4, G=5, BS=27.00167, BG=3.570736) v8.22 (opencl_nvidia_SoG) windows_intelx86

The 7 Gaussians reported by the Special App are:
Gaussian: peak=3.506042, mean=0.5093156, ChiSq=1.364717, time=96.47, d_freq=1421071421.02,
	score=3.369742, null_hyp=2.403931, chirp=-30.175, fft_len=16k
Gaussian: peak=3.570737, mean=0.5002306, ChiSq=1.32693, time=98.15, d_freq=1421071370.4,
	score=4.532304, null_hyp=2.442002, chirp=-30.175, fft_len=16k
Gaussian: peak=3.376572, mean=0.5267106, ChiSq=1.407839, time=62.91, d_freq=1421075607.9,
	score=1.498692, null_hyp=2.332992, chirp=51.563, fft_len=16k
Gaussian: peak=3.437077, mean=0.5095716, ChiSq=1.392497, time=64.59, d_freq=1421075694.4,
	score=3.369557, null_hyp=2.420066, chirp=51.563, fft_len=16k
Gaussian: peak=3.405566, mean=0.5179273, ChiSq=1.419629, time=66.27, d_freq=1421075780.91,
	score=2.323027, null_hyp=2.382857, chirp=51.563, fft_len=16k
Gaussian: peak=3.506042, mean=0.5093156, ChiSq=1.364717, time=96.47, d_freq=1421069540.57,
	score=3.369742, null_hyp=2.403931, chirp=51.563, fft_len=16k
Gaussian: peak=3.570737, mean=0.5002306, ChiSq=1.32693, time=98.15, d_freq=1421069627.08,
	score=4.532304, null_hyp=2.442002, chirp=51.563, fft_len=16k

The SoG app also reported the first 5 of these. What makes the 2 extra reported by the Special App rather intriguing is that almost all the reported values, except for d_freq and chirp, are identical to the first 2 reported.

Following the Gaussians, each app reported 10 more Spikes which, for the SoG app, brought the total signals to 28. However, because of the 2 extra Gaussians, the Special App just barely reached the 30-signal threshold to become a -9 overflow.
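
Summing the counts from the two result headers shows just how close it was:

    # Signal totals from the two result headers above (S, A, P, T, G).
    special_total = 18 + 1 + 0 + 4 + 7   # = 30, which hits the -9 overflow threshold
    sog_total     = 18 + 1 + 0 + 4 + 5   # = 28, which stays under it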

It looks like the stock Cuda42 app has been assigned the tiebreaker.
19) Message boards : Number crunching : Task Deadline Discussion (Message 1905675)
Posted 8 days ago by Jeff Buck
Post:

As Raistmer has said, if you reduce the deadline you automatically increase the number of resends...
It seems to me that there's only a very tiny grain of truth to that. Again, using my data, out of 901 tasks still hanging around past the 80% mark, only 23 were eventually validated. The other 878 had to be resent no matter what. Those do not increase the number of resends.

Actually they do. Both never-returning hosts, as I said earlier, _and_ those who regularly miss the deadline.
If you trash 1 task per 3 weeks, that's 3 times fewer than if you trash 1 task each week. Over a year you will get 3 times more resends from such hosts, plus an increase in the number of such hosts just due to the increase in processing power required to finish in time.

"The other 878 had to be resent no matter what." You consider this a one-time event, but actually it should be considered a recurring event!
I'm sorry, Raistmer, but I think your argument is flawed. For one thing, I think there's a fundamental difference between tasks being resent, which is what happens to all timed out tasks, and tasks being replaced on the hosts where the originals timed out, which only happens some of the time.

There are several identifiable categories of hosts which miss the deadlines.

1) "Drive-bys": Those users who visit the project because something they've read somewhere makes it seem interesting. They sign up, download a bunch of tasks, then immediately change their minds for some reason, and just simply drive away, leaving all their downloaded tasks to eventually time out. The timed-out tasks will never be replaced. A reduction in deadline times would simply clear out those WUs and their wingmens' pendings that much sooner.

2) Hosts that have died, or been abruptly replaced, or simply been shut down by established users for any number of reasons: Again, like in group 1, timed out tasks will not be replaced. Shorter deadlines will help.

3) Hosts with "ghosts": Those who have accumulated some number of tasks which still exist in the DB but, for whatever reason, no longer reside on the host and are unknown to the scheduler. These tasks have already been replaced on the host, but they can't be resent to other hosts until they time out, thus tying up DB space and resources unnecessarily. Shorter deadlines would certainly help with these, but so, in some cases, would more responsible users, or periodic temporary reactivation of the "lost" task resend mechanism, or even a web site method for users to trigger either resending or abandonment of such tasks.

4) Hosts who steadily download large numbers of tasks without successfully completing any (such as the one with 6,000+ in the queue that I noted in earlier posts): These hosts are likely generating ghosts, so the daily downloads constitute a continuous, immediate replacement stream, long before the task deadlines are ever reached. Shorter deadlines would have a significant impact, though better quota management and some sort of user notification would be even more important areas to address.

5) Hosts making sporadic contact: There are some with high turnaround times who also seem to be able to download an excessive number of tasks. They still seem to be doing some processing, occasionally returning batches of completed tasks, but they also have large numbers of time-outs. Reduced deadlines may or may not have an impact on those hosts, whose issues are probably better addressed through improved quota management.

6) Hosts with temporary issues: There are certainly some who miss their deadlines for a short while, then resume normal processing. Perhaps the users went on extended vacations or business trips, took a while to replace a failed part in their PC, needed to cut their electric bill for awhile, etc. In some cases, shorter deadlines may cause tasks on these hosts to time out once they resume operations, even though those tasks might still be successfully completed. I think these hosts represent a small portion of the user base, but certainly shouldn't be ignored in the discussion.

7) Hosts running multiple projects: Depending on resource shares, and BOINC's ability to manage same, some of these hosts seem to push the deadline limits time and again, sometimes even exceeding them. I don't think there's a clear answer as to whether, or how, shorter deadlines might affect these hosts. I tend to believe that many of them will still achieve "just in time" task completion no matter what the deadlines, simply because of the way the resource shares get managed and BOINC's tendency to resort to high priority execution to just barely squeak by. Certainly a category that might warrant further research.

8) Slow, low-volume, low-power, and/or part-time hosts: "Slow" is a relative term in today's processing environment. While Android devices have typically been held up as poster children for the "Seti@home for even the lowliest" cruncher argument, I didn't see them as an issue in the samples I posted earlier, but perhaps they can be found. And older, slower, inefficient hosts can realistically only be accommodated by the project for so long. Reference my earlier post about my Win98 machine. I think, though, based on my sample analysis at the beginning of this discussion, that these hosts really aren't currently an issue. However, if one of their tasks does time out then, yes, a replacement will be sent, so shortening deadlines could have a negative impact. Same for the part-timers.

I've probably overlooked a category or two but, from my perspective (and based on the examples from my earlier posts), shorter deadlines will result in speedier resends, little, if any, increase in replacement turnover, and, above all, a more streamlined and efficient DB.
20) Message boards : Number crunching : Task Deadline Discussion (Message 1905645)
Posted 8 days ago by Jeff Buck
Post:
That's the right approach IMO, especially because nothing really prevents doing this manually on each host by introducing "virtual devices" to BOINC (either by rescheduling CPU <-> GPU, or by running multiple BOINC instances, or even by creating some app_info.xml-based additional "accelerators" (not tested, but seems possible)).
Or take Petri's approach by fooling BOINC with his "[16] NVIDIA GeForce GTX 1080 Tu " GPUs.

But these simply address the temporary Tuesday outage drought that high-volume crunchers face. In fact, all of these approaches actually inflate the size of the DB, either temporarily (as with my own, and others', stockpiling through rescheduling on Monday, which returns to normal by the end of the outage) or more permanently (as with Petri's virtual GPUs). It's really a very separate issue from the deadlines.

