The Server Issues / Outages Thread - Panic Mode On! (118)
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Think about a computer with a 100k RAC suddenly starting to throw invalids, set the trigger at 1% - it needs to throw 1000 before it is trapped." Now you're the one muddling the units! A RAC of 100K - measured in credits, that's what the 'C' stands for - is probably a count of ~1K. So a 1% trigger is only ~10 tasks. Or so a pedant might say. The underlying principle of keeping the maths as simple as possible for a server which has to process 150,000 tasks per hour is still valid. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Why then did YOU mention RAC? - It is YOU who proposed a system based on the RAC of a host." Where did I mention that? The first time in this discussion 'RAC' appears in any of my posts is when I was replying to YOU bringing that up. And the count would stay high forever after the host stops producing invalids. That's why a decay mechanism is needed. If the host produced only invalids, then its invalid score would rise very fast! "Yes, its invalid COUNT would rise, use that as the control, not some derived variable - far simpler, and far more able to catch the event early. Think about a computer with a 100k RAC suddenly starting to throw invalids, set the trigger at 1% - it needs to throw 1000 before it is trapped - now consider a computer with a RAC of 1M (and they do exist) - that figure now becomes 10k - both of which are far more than 'just a few'" RAC would have absolutely nothing to do with it. A higher-RAC host would trigger it faster because it chews through the tasks faster, but any host would trigger it after the exact same number of tasks. If we assume a decay multiplier of 0.999, a limit of 1%, a host producing only invalids and the value starting from zero, then the value would evolve like this: 0.001, 0.001999, 0.002997, 0.003994, 0.004990, 0.005985, 0.006979, 0.007972, 0.008964, 0.009955, 0.010945. So the 11th consecutive invalid result would trigger the trap. |
W-K 666 Joined: 18 May 99 Posts: 19406 Credit: 40,757,560 RAC: 67 |
But a host will probably be producing good results on the CPU and, if fitted, other GPUs. |
bluestar Joined: 5 Sep 12 Posts: 7264 Credit: 2,084,789 RAC: 3 |
Sorry, but noise is a thing only coming our way, for being a radio telescope being pointed at the sky, for detecting radio signals which also could be intelligent in nature. Therefore the blc35_2bit_guppi tasks among these, for only a bit of noise, except also the possible transmission that also could be meant to be. Again just sorry for only the discussion of that of RAC for also credit it should be here, for also the scheduled maintenance happening at times. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"But a host will probably be producing good results on the CPU and if fitted other GPU's." Even those buggy GPUs/drivers only fail on some tasks. Not all of them. |
rob smith Joined: 7 Mar 03 Posts: 22536 Credit: 416,307,556 RAC: 380 |
Thanks Richard - I had a look at that post and thought "there's something wrong there, but what...." Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Thanks Richard - I had a look at that post and thought 'there's something wrong there, but what....'" I was trained - a long, long, time ago - at the Cavendish lab, Cambridge University. Two pieces of teaching stuck: (1) No number has a meaning unless the units are stated. (2) Do every calculation twice. Once, using the most accurate mechanical/electronic device available - and then again, on the back of an envelope or a chalk board. The second only has to be done using order-of-magnitude approximations, but it proves that the decimal point hasn't slipped in the first one. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Sunday, and the replica is falling behind again." This has happened on many weekends, at approximately the same time as the total outages we experienced on many consecutive Sundays last September and October. I wonder if the causes are related. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
The Replica has had its break & is now catching up again. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
A few noisy WUs in the current Arecibo group, and more than the usual number of uploads timing out instantly. Grant Darwin NT |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
The assimilation queue is still going down at a steady rate. If the same rate continues, the backlog will be gone in about 8 days. |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
Did anyone get anything other than noise bombs for ap_29ja16ad? Must have been something going on that day. Edit: only thing I see on any cosmic calendars is the moon being at its apogee. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Did anyone get anything other than noise bombs for ap_29ja16ad? Must have been something going on that day." Haven't done any of them yet. Still in progress. It could be caused by any number of things. The radar could have been on. Terrestrial interference. Any number of transmitting satellites could have passed by in the aperture capture window. etc. etc. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
"It could be caused by any number of things. The radar could have been on. Terrestrial interference. Any number of transmitting satellites could have passed by in the aperture capture window. etc. etc." Thanks, just unusual to get an sse, sse2, and 4 OpenCL_ati_mac WUs, just to have them all bomb out. |
Wiggo Joined: 24 Jan 00 Posts: 36850 Credit: 261,360,520 RAC: 489 |
It's nothing uncommon when the tasks are 100% blanked. "It could be caused by any number of things. The radar could have been on. Terrestrial interference. Any number of transmitting satellites could have passed by in the aperture capture window. etc. etc." "Thanks, just unusual to get an sse, sse2, and 4 OpenCL_ati_mac WUs, just to have them all bomb out." Cheers. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"It could be caused by any number of things. The radar could have been on. Terrestrial interference. Any number of transmitting satellites could have passed by in the aperture capture window. etc. etc." This will become more and more common over time as Elon Musk and his competitors are spamming the low Earth orbit with thousands of internet satellites. |
W-K 666 Joined: 18 May 99 Posts: 19406 Credit: 40,757,560 RAC: 67 |
Milestone: as of 11 Feb 2020, 7:30:05 UTC, "Results returned and awaiting validation" is below 10,000,000. Exact figure: 9,990,238. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Milestone As of 11 Feb 2020, 7:30:05 UTC" Only another 5.2 million to go (and another 2.41 million to go to clear the Assimilation backlog). Grant Darwin NT |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Only another 5.2 million to go (and another 2.41 million to go to clear the Assimilation backlog)." Those are essentially the same thing. Each workunit in the assimilation queue is preventing on average about 2.2 results from transitioning to the 'waiting for db purging' state: 5.2/2.41 ≈ 2.16. |
Zalster Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
and we are back.... |
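
The decaying invalid score that Ville Saari describes earlier in the thread can be sketched in a few lines of Python. This is only an illustration, not the project's server code: the thread gives the decay multiplier (0.999), the limit (1%) and the resulting values, but not the update rule itself. The rule assumed here, `score * decay + (1 - decay)` for each invalid result, is an inference that happens to reproduce the posted sequence. The function names are hypothetical.

```python
import math

DECAY = 0.999  # decay multiplier quoted in the thread
LIMIT = 0.01   # 1% trigger quoted in the thread


def update_score(score, invalid, decay=DECAY):
    """Assumed update rule: decay the old score, then add (1 - decay) for an invalid result."""
    return score * decay + (1.0 - decay) * (1.0 if invalid else 0.0)


def results_until_trap(results, limit=LIMIT, decay=DECAY):
    """Return the 1-based index of the result that pushes the score above the limit, or None."""
    score = 0.0
    for n, invalid in enumerate(results, start=1):
        score = update_score(score, invalid, decay)
        if score > limit:
            return n
    return None


# A host that returns only invalids, starting from a score of zero.
scores = []
score = 0.0
for _ in range(11):
    score = update_score(score, True)
    scores.append(round(score, 6))

trap_at = results_until_trap([True] * 100)

# Closed form for the all-invalid case: the score after n results is 1 - decay**n,
# so the trap fires at the first n with n > log(1 - limit) / log(decay).
closed_form = math.ceil(math.log(1.0 - LIMIT) / math.log(DECAY))

print(scores)
print(trap_at, closed_form)
```

Under these assumptions the score depends only on the sequence of results, not on how fast a host returns them, which is the point made in the post: a faster host reaches the limit sooner in wall-clock time but after the same number of tasks.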