Message boards :
Number crunching :
The Server Issues / Outages Thread - Panic Mode On! (118)
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
> Why are we processing data that is 10 years old now?

Because some of the old data hasn't been processed yet, or had problems when it was originally processed. Of the 13 Arecibo data files currently loaded for splitting, 1 is from 2011; the other 12 are from this year.

Grant
Darwin NT
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
It helps slow down the return rate, so the servers have an easier load. Also, the search algorithm is much more powerful and sensitive than it was 10 years ago, because current hardware can handle it, so we can look at old data with much greater precision. We might have missed the needle in the haystack ten years ago.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
> It helps slow down the return rate so the servers have an easier load.

Only if it's VLAR work. At present we've got Arecibo shorties & noise bombs, along with BLC35 noise bombs, going through. That will put the return rate through the roof when people start getting some new work regularly.

Grant
Darwin NT
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
My Linux system has finally managed to pick up some work, and keeps getting slightly more than it returns on those occasions it can get work. It may end up at the server-side limits in the next hour or three.

Grant
Darwin NT
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
Does anyone have an explanation for this phenomenon: on the SSP the 'Current result creation rate' has stayed consistently way higher than 'Results received in last hour' (when converted to the same units), but the 'Results ready to send' buffer does not grow.
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
> Does anyone have an explanation for this phenomenon:

Lots & lots & lots (& lots & lots) of extremely empty caches. When the In-progress numbers get back up to around 6 million, the deficit will be filled & the Ready-to-send buffer can grow (or 7 million, depending on the work mix & the server-side limit settings).

Grant
Darwin NT
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0
> Does anyone have an explanation for this phenomenon:

Because the "Current result creation rate" is the value at the moment the server status page generator looks at it. Also, people are filling up their queues... I have graphs that illustrate this: creation rate, results received in last hour, RTS, and in-progress.
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
This is weird. I have had my queues full for nearly 10 hours now, and I'm spoofing my GPU count, so my queue filling should have taken longer than for non-spoofing hosts, which should be the majority.

> Because the "Current result creation rate" is the value when the server status page generator looks at it. Also people are filling up their queues...

I don't think the creation rate is an instant value for a single point in time. There's some kind of exponentially decaying average there, because the value decays to zero slowly after the splitters have stopped.
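Ville's hypothesis here — that the displayed rate is some kind of exponentially decaying average — would produce exactly the slow decay he describes. A minimal sketch, purely illustrative: the function name, smoothing factor, and sample values are all invented, and nothing here is taken from the actual SSP code.

```python
# Hypothetical illustration: if the SSP smoothed splitter output with an
# exponential moving average, the displayed rate would fall off gradually
# after the splitters stop, instead of dropping straight to zero.

def ema_series(samples, alpha=0.5):
    """Exponential moving average: each update blends the new sample
    with the previous average; alpha is the smoothing factor."""
    avg = 0.0
    out = []
    for s in samples:
        avg = alpha * s + (1 - alpha) * avg
        out.append(avg)
    return out

# Splitters produce 40 WU/s for four samples, then stop entirely:
# the averaged value keeps shrinking but never hits zero in one step.
print(ema_series([40, 40, 40, 40, 0, 0, 0, 0]))
```

Richard's counter-argument below fits the data just as well, so this is only one of the two plausible explanations.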
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14013 Credit: 208,696,464 RAC: 304
> This is weird. I have had my queues full for nearly 10 hours now and I'm spoofing my gpu count so my queue filling should have taken longer than non spoofing hosts that should be the majority.

You're lucky. My Linux host still doesn't have a full cache. The Windows one is generally hitting the server-side limits when it does manage to get work.

> I don't think the creation rate is an instant value for a single point in time.

It is. That is its value at the time it was checked. But the output from the splitters is all over the place at the best of times, depending on what's going on with the rest of the servers. There was a period at one stage where the output from the splitters was showing as 0 for a couple of hours, with the Ready-to-send buffer effectively at zero, yet work In-progress wasn't declining. It was most likely a case of the splitters spluttering: brief periods of high output, and much longer periods (during which their output was sampled) when they weren't producing anything.

> There's some kind of exponentially decaying average there because the value decays to zero slowly after the splitters have stopped.

That's just the result of the time gap between samples, and the values reported and then being graphed. More frequent sampling (if it were possible) would result in a different shape to the graph.

Grant
Darwin NT
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
> I don't think the creation rate is an instant value for a single point in time. There's some kind of exponentially decaying average there because the value decays to zero slowly after the splitters have stopped.

Well, I've only just started refilling my caches after a night's sleep ;-)

On the creation rate: I think it is a snapshot at a moment in time. It's hard to read an exponential decay into the figures, because the page only updates once every ten (sometimes twenty) minutes. My belief (though with no proof) is that the splitters 'finish what they're doing' when given the instruction to pause: if that means finishing the current track, each individual splitter might actually stop at a different time, giving the impression of a gradual decay.
Kiska Send message Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0
> That's just the result of the time gap between samples, and the values reported and then being graphed. More frequent sampling (if it were possible) would result in a different shape to the graph.

Unfortunately, more frequent sampling requires a) the SSP to be updated more frequently than every 15 mins, and b) my munin instance to update more often than once per 4 minutes. a) would mean more load on the servers and b) would mean more load on my munin instance, but it's updating once per 4 minutes anyway, so the SSP is the bigger issue.
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
> I'm spoofing my gpu count so my queue filling should have taken longer than non spoofing hosts that should be the majority.

It depends on how your spoofing works and how many WUs your host produces per minute. In my case (I use a high GPU count) the problem is due to the shorties storm: my host (not a very fast one) produces about 60 WUs each 5 min, while a cache refill downloads less than 150 WUs each 5 min, so refilling is simply a very, very long task. The 250 WU limit on the feeder worked relatively well in the past when the 100/100 WU cache was still in place; now, with the 150/150 WU limit on all hosts, that number is not enough to feed all the hungry hosts. They need to increase the feeder capacity buffer to handle this. Whether they could do that in a short period of time is another question.
Wiggo Send message Joined: 24 Jan 00 Posts: 38637 Credit: 261,360,520 RAC: 489
Can someone show me when this 250 task limit was implemented on the feeder, as the last increase that I know of took it from 100 to 200. Anyway, since 9am here this morning (10pm yesterday UTC) both my rigs have been either at their cache limits or within 50 tasks of them.

[Edit] In fact, it would again be better to unmount those blc35 files until all those old Arecibo files are finished, as both are producing a huge amount of noise bombs that are creating an extra large load on the servers ATM.

Cheers.
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
> I don't think the creation rate is an instant value for a single point in time.
> It is.

If it is not an exponential average, then it must be an average over some fixed time period. You can't really sample the instant rate of discrete events: the instant value is infinite at the exact points in time when a WU is produced, and zero everywhere else.

The simplest way to calculate this time-windowed rate would be to keep a count of WUs generated, sample this count and the time of the sample every time the SSP is updated, and calculate the rate as the difference between the last two counts divided by the difference between the last two timestamps. In that case the value would be the average rate over the whole period between the second-latest and latest updates. This works fine even when the counter overflows and wraps around, as long as it has sufficiently many bits not to wrap around twice between two consecutive updates.
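The windowed-rate calculation described above can be sketched in a few lines. This is an illustration of the proposal, not the SSP's actual code; the 32-bit counter width, the function name, and all the numbers are assumptions.

```python
# Sketch of a time-windowed rate: sample a monotonically increasing WU
# counter at each page update and divide the count delta by the time delta.
# The modulo makes the difference wrap-safe for a fixed-width counter.

COUNTER_BITS = 32  # assumed counter width

def creation_rate(prev_count, prev_time, count, time):
    """Average WUs per second between the last two samples."""
    delta = (count - prev_count) % (2 ** COUNTER_BITS)  # wrap-safe difference
    return delta / (time - prev_time)

# 18,000 results split during a 900-second gap between updates -> 20 WU/s.
print(creation_rate(1_000_000, 0, 1_018_000, 900))   # 20.0
# Wraparound: the counter passes 2**32 between samples, rate is still right.
print(creation_rate(2**32 - 5_000, 0, 13_000, 900))  # 20.0
```

The modulo trick works in Python because `%` always returns a non-negative result for a positive divisor; in C the same idea is usually written with unsigned arithmetic, where the wraparound is implicit.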
Wiggo Send message Joined: 24 Jan 00 Posts: 38637 Credit: 261,360,520 RAC: 489
> If it is not an exponential average, then it must be an average of some fixed time period. You can't really sample the instant rate of discrete events. The instant value is infinite at the exact points of time when a wu is produced and zero everywhere else.

Ville, what you see on the SSP is just a snapshot that is taken at the time and nothing else, and that could change a lot a few seconds later. ;-)

Cheers.
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462

I am getting "no tasks available" and am not running GPU tasks, so E@H is cheerfully filling that hole. Full load of CPU tasks.

Tom
A proud member of the OFA (Old Farts Association).
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
> I'm spoofing my gpu count so my queue filling should have taken longer than non spoofing hosts that should be the majority.
> Depends on how your spoofing works and how many WU your host produces per minute.

I have only one third of your RAC, but even that puts me into the top 100 setiathome users by RAC according to boincstats.com (*). So we, and anyone bigger than us, should be an insignificant minority.

I run two computers with one real GPU in each. The slower one advertises 2 GPUs and the faster one 4, which gives them both approximately 12-hour queues. This also matches the time the 'unspoofed' CPU queue in the faster computer can span. Sufficient to survive the normal Tuesday downtime. The faster computer processes approximately 1500 tasks per day, so 3 to 4 tasks per scheduler contact.

The long unscheduled Sunday outages last September and October motivated me to download the boinc source and hack it to report fake GPUs. I made it read the number of fake GPUs from a file each time it contacts the scheduler, so that I can adjust the queue size on the fly without restarting boinc. The slower computer was running the stock boinc until last Monday, when I installed my spoofed client on it too, to prepare for the Tuesday downtime that Eric announced beforehand would be very long.

(* Actually my position by RAC has been around 110 or so, but these recent long outages seem to have hurt others more than me, so it now reports me at position 99, which I'm sure is just temporary.)
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
> Ville what you see on the SSP is just a snapshot that is taken at the time and nothing else and that could change a lot a few seconds later. ;-)

There is no such thing as an instantaneous snapshot of the rate of discrete events. The rate of something is the time derivative of the amount of that something, so for the instantaneous rate to exist, the amount must be differentiable. The amount here is the number of tasks produced, which is an integer, and a non-constant integer-valued function is not differentiable. So it is a mathematical impossibility for the instantaneous rate to exist.

It must be the average rate over some time period. I don't know what this time period is, but because the value is not an integer, it must be longer than a second. Using the time between two SSP updates would be the simplest way to implement it, and would also give more useful information than some shorter period. If it is a fixed period shorter than 10000 seconds, then this period can be deduced from a sufficiently large number of the values the rate takes. Can anyone scraping the SSP data provide me with some history of the result creation rate values? As numbers, not a graph.
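The deduction proposed here rests on a simple observation: if each displayed rate is an integer WU count divided by a fixed window of T seconds, then every observed rate multiplied by the true T must come out (nearly) an integer. A hypothetical sketch — the function name, tolerance, and sample values are all invented for illustration, and it only scans whole-second windows for simplicity:

```python
# Hypothetical: recover the fixed averaging window T from observed rates.
# Each rate is assumed to be (integer WU count) / T, so rate * T must be
# within floating-point noise of an integer for the true T (and its multiples).

def candidate_windows(rates, max_period=10_000, tol=1e-6):
    """Return the whole-second window lengths for which every observed
    rate times the window length is (nearly) an integer."""
    hits = []
    for T in range(1, max_period + 1):
        if all(abs(r * T - round(r * T)) < tol for r in rates):
            hits.append(T)
    return hits

# Made-up rates that are all integer multiples of 1/600, i.e. consistent
# with a 600-second window; multiples of the true window also match.
observed = [20.415, 19.0483333333, 21.6716666667]
print(candidate_windows(observed)[:3])  # -> [600, 1200, 1800]
```

With enough real samples the list of candidates shrinks: only the true window and its multiples survive every new value, and the smallest survivor is the answer — which is exactly why a long history of raw numbers, rather than a graph, is needed.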
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874
> There is no such thing as an instantaneous snapshot of the rate of discrete events.

True, if pedantic. You could probably work out the minimum period (not necessarily an integer number of complete seconds) needed to generate the displayed number of decimal places from an integer number of tasks created. Have fun.
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530
> You could probably work out the minimum period (not necessarily an integer number of complete seconds) needed to generate the displayed number of decimal places from an integer number of tasks created. Have fun.

I need a lot of those values to work it out. If someone provides them, I will try.
©2026 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.