About Deadlines or Database reduction proposals
W-K 666 Joined: 18 May 99 Posts: 19280 Credit: 40,757,560 RAC: 67
"Why are the initial assumptions no longer relevant?"

I mentioned that in the previous post. Don't forget we do have crunchers with a RAC over 100,000 that run some very low performance devices with a RAC of less than one average MB task/week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830
Kissagogo27 Joined: 6 Nov 99 Posts: 716 Credit: 8,032,827 RAC: 62
Regarding Juan's message 2033761 in the Panic Mode (118) thread: here you can find the old SETI classic app and WU: https://setiathome.berkeley.edu/forum_thread.php?id=63288&postid=1081976#1081976
W-K 666 Joined: 18 May 99 Posts: 19280 Credit: 40,757,560 RAC: 67
Thanks Richard. I knew that VLAR as a tag was redundant (I'm blaming low caffeine levels for typing VHAR in my post Richard refers to). And that only uses the GPU. And just for info: of the 6 APs downloaded yesterday between 08:44 and 09:40, four of them (67%) have been validated. https://setiathome.berkeley.edu/results.php?userid=8083616&offset=0&show_names=0&state=0&appid=20
Grant (SSSF) Joined: 19 Aug 99 Posts: 13822 Credit: 208,696,464 RAC: 304
"Don't forget we do have crunchers with a RAC over 100,000 that run some very low performance devices with a RAC of less than one average MB task/week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830"

Yeah, and a 28 day deadline would have no impact on the lowest performing system there being able to contribute, so...?

Grant
Darwin NT
Darrell Wilcox Joined: 11 Nov 99 Posts: 303 Credit: 180,954,940 RAC: 118
I propose: NOT changing the deadline, NOT changing the server software, and still getting a result that satisfies ALL the stakeholders here, users and SETI. It is this:

On the Tuesday maintenance day, when the servers come back up, run a small cron job (assuming the servers are running some UNIX-type O/S) and have it reduce the maximum number of WUs in process per CPU/GPU to around 50(?). [I am making the assumption that the maximum number of WUs in process can be changed programmatically.] 50(?) would allow all but the very biggest computers at least 1 WU for each CPU and GPU. The smallest or slowest could also get 50, one-third of what is now allowed. Run this way until a few minutes BEFORE the maintenance shutdown, when the cron job would increase the maximum in process to 200/300/400(?) and stay in this mode for 5(?) or 10(?) minutes before resetting to a maximum of 50(?). The servers would continue to run for another 30(?) minutes to allow the WU downloads to complete before shutting down.

Note that the big and fast systems would probably be visiting the servers to report and get more WUs frequently, and would likely do so within this once-a-week window. Slow and small computers would probably NOT often visit during this window. The slow and small computers would probably not even notice the changing maximums, and the big boys would be able to fill a large queue just before the servers go down.

No code change, so no programming cost. The sizes of the DBs would be reduced with fewer in-process WUs in the field [need help from someone with the data to confirm this]. Small and slow are happy. Big and fast are happy. Servers are busier with more uploads and requests, but only from the big guns. DB space used is smaller. What's not to like about this idea?

[The numbers followed by (?) would be adjusted as experience is gained to balance the needs and wants of slow, small, big and fast, and server computer systems.]
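To make the proposal concrete, here is a minimal sketch of the cron side, assuming the per-host in-progress limit lives in the BOINC project's config.xml (the <max_wus_in_progress> and <max_wus_in_progress_gpu> options) and that the scheduler picks the new values up without a restart. The file path, the limit values and the crontab times below are illustrative placeholders, not SETI@home's actual settings.

#!/usr/bin/env python3
"""Hypothetical helper for the proposal above: flip the per-device
in-progress limit between a low weekday value and a high value in a
short window just before the Tuesday outage.

Assumes the project's config.xml carries <max_wus_in_progress> and
<max_wus_in_progress_gpu> elements that the scheduler re-reads; the
path and numbers are placeholders, not real SETI@home settings.

Example crontab (times illustrative only):
  # raise the limit shortly before the outage window
  45 16 * * 2  /usr/local/bin/set_wu_limit.py 400
  # drop it back to the low weekday value once the servers return
  0  22 * * 2  /usr/local/bin/set_wu_limit.py 50
"""
import sys
import xml.etree.ElementTree as ET

CONFIG_PATH = "/home/boincadm/projects/sah/config.xml"  # placeholder path


def set_limit(limit: int, config_path: str = CONFIG_PATH) -> None:
    """Rewrite the CPU and GPU in-progress limits in config.xml."""
    tree = ET.parse(config_path)
    root = tree.getroot()
    config = root if root.tag == "config" else root.find("config")
    if config is None:
        raise RuntimeError("no <config> section found in " + config_path)
    for tag in ("max_wus_in_progress", "max_wus_in_progress_gpu"):
        elem = config.find(tag)
        if elem is None:
            elem = ET.SubElement(config, tag)
        elem.text = str(limit)
    tree.write(config_path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    set_limit(int(sys.argv[1]))

Two crontab entries, one to raise the limit before the outage and one to drop it back afterwards, are all the scheduling the idea needs; if the limits turn out not to be adjustable this way, the same script would have to target whatever mechanism the project actually exposes.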
Grant (SSSF) Joined: 19 Aug 99 Posts: 13822 Credit: 208,696,464 RAC: 304
"The sizes of the DBs would be reduced with fewer in-process WUs in the field [need help from someone with the data to confirm this]."

How? The problem isn't the amount of work In progress, it's the backlogs with Validation & Assimilation (and often deletion & purging as well). What about the rest of the week? You're also assuming that everything is working well before & after the outage, and that often isn't the case.

Changing the deadline wouldn't require nearly as much effort as setting up & configuring & tweaking those cron jobs. We've got enough complexity here as it is. Oh, and again, reducing the amount of work In progress isn't going to help clear the existing backlog issue.

Edit: although reducing the amount of work that is In progress for long periods of time will help the database overall, once everything gets back down to more normal levels. The present issue is Validation & Assimilation backlogs, but the amount of work In progress has caused problems of its own in the past.

Grant
Darwin NT
W-K 666 Joined: 18 May 99 Posts: 19280 Credit: 40,757,560 RAC: 67
"Don't forget we do have crunchers with a RAC over 100,000 that run some very low performance devices with a RAC of less than one average MB task/week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830"

"Yeah, and a 28 day deadline would have no impact on the lowest performing system there being able to contribute, so...?"

Only because it is probably on 24/7. It took over 2 hours to find a noise bomb, and the longest task shown is a mid-range AR task, so we can assume a VLAR is going to take twice as long, nearly three days. So if it was only on 2 hours/day, twice the original assumption, it would struggle to complete within the proposed 28 day limit.

I am all for shortening the deadlines, mainly as an aid to clear the tasks that get abandoned*, but we do need to define acceptable limits for the minimum spec of devices and the minimum average time we can expect devices to be on, and say why those limits are in place.

*One of my oldest tasks is https://setiathome.berkeley.edu/workunit.php?wuid=3804671429, created 23 Dec 2019; the original wingman timed out, and it is now out to a host that has only returned one noise bomb in the last 10 days but has 146 tasks In Progress.
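Spelling out the arithmetic behind that estimate: the 72 crunch-hours below is the "nearly three days" VLAR figure quoted above, and the hours-on-per-day values are illustrative assumptions, not measured numbers.

# Rough deadline arithmetic for a slow host, based on the figures above.
VLAR_CRUNCH_HOURS = 72          # "nearly three days" of continuous crunching
PROPOSED_DEADLINE_DAYS = 28

for hours_on_per_day in (24, 4, 2):
    days_needed = VLAR_CRUNCH_HOURS / hours_on_per_day
    verdict = "meets" if days_needed <= PROPOSED_DEADLINE_DAYS else "misses"
    print(f"{hours_on_per_day:>2} h/day -> {days_needed:5.1f} days needed "
          f"({verdict} a {PROPOSED_DEADLINE_DAYS}-day deadline)")

At 2 hours a day such a task needs roughly 36 days, which is why a host like the one linked only fits inside a 28-day deadline if it is on close to 24/7.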
Richard Haselgrove Joined: 4 Jul 99 Posts: 14666 Credit: 200,643,578 RAC: 874
"... the longest task shown is a mid range AR task, so we can assume a VLAR is going to take twice as long ..."

That comparison is only valid for NVidia GPUs. On CPU-like devices, mid-AR and VLAR are very similar.
Darrell Wilcox Joined: 11 Nov 99 Posts: 303 Credit: 180,954,940 RAC: 118
@ Grant (SSSF)

"How? The problem isn't the amount of work In progress, it's the backlogs with Validation & ..."

No disrespect, but the title of this thread is "About Deadlines or Database reduction proposals". My proposal would be to reduce the number of the IN PROCESS WUs, which sit in some part of a DB. How much? I don't have the data to tell, but someone probably has it and can extract the counts to determine how much, and then tell me/us whether this is a good proposal. It certainly is a cheap one to implement.

"What about the rest of the week?"

Well, after the outage, the limit for in process would be set at 50(?) until just before the next outage, when it would be increased for a small window of time. Perhaps post-outage, for some period of time, an even smaller limit would be appropriate. Experience would tell us yes or no.

"Changing the deadline wouldn't require nearly as much effort as setting up & configuring & ..."

I disagree. See this message by Ian for an example of how easy it is. Further, the cron job could be done WITHOUT having to modify server code [see my assumption about changing the in-process limits programmatically]. No programming time, compiling, testing, regression testing, etc. The "tweaking" would occur after some period of running, and could simply require a small change to a parameter of the cron jobs. To test this proposal, it might even be possible to do this BY HAND for a couple of weeks.

"Oh, and again the fact that reducing the amount of work In progress isn't going to help clear the existing backlog issue."

I am NOT trying to address this issue in my proposal.

"The present issue is Validation & Assimilation backlogs"

If this is the case, then I suggest that the title of this thread be changed to reflect that issue, and I will post my proposal in another thread.

Live long and prosper!
Grant (SSSF) Joined: 19 Aug 99 Posts: 13822 Credit: 208,696,464 RAC: 304
"What about the rest of the week?"

"Well, after the outage, the limit for in process would be set at 50(?) until just before the next ..."

The whole point of having a cache is to help people through outages. While increasing the cache level just prior to the weekly outage will help people through it, that will only be the case if things are working OK at that point, and often they aren't. And often problems occur at other times, and it would be nice to be able to get through those outages as well.

I would prefer to actually fix the problem (the hardware is no longer up to the job) than implement yet more workarounds that no one will be happy with.

Grant
Darwin NT
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
. I 'spoke' to Mr Kevvy about moving the off-topic messages from the panic thread to here, just to clean up the mess there, nothing else. He is willing to take care of it if we/someone tags them.

. . OOPS!!!

Stephen :)
Grant (SSSF) Joined: 19 Aug 99 Posts: 13822 Credit: 208,696,464 RAC: 304
"Only because it is probably on 24/7."

If it were on 24/7 it would have a much higher RAC, as it is processing normal non-noisy WUs in under 2 days.

"it took over 2 hours to find a noise bomb,"

If you look at the other systems that processed the WU as well, you'd see that it took an i7 desktop system 5 min to find it, and an i7 laptop 22 min. So it wasn't actually a noise bomb, more of an early overflow.

Grant
Darwin NT
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
"Why are the initial assumptions no longer relevant?"

. . That begs the question ... is that PC running SETI?

Stephen ? ?
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
"So it wasn't actually a noise bomb, more of an early overflow."

Correction: LATE overflow

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
"Thanks Richard. I knew that VLAR as a tag was redundant (I'm blaming low caffeine levels for typing VHAR in my post Richard refers to)"

. . And I am assuming those numbers would be hard to assess. I know the schedulers keep those stats (percentage of time the PC is on, and percentage of that time when SETI is running), but would those stats be available as an overview of all hosts?

Stephen ?
Grant (SSSF) Joined: 19 Aug 99 Posts: 13822 Credit: 208,696,464 RAC: 304
"So it wasn't actually a noise bomb, more of an early overflow."

"Correction: LATE overflow"

For me, a late overflow is one that occurs well after the WU has started, but still a significant time before it was due to end.
Noise bomb - a matter of seconds (extremely early overflow).
Early overflow - at least several minutes in to crunching.
Mid overflow - sometime between a bit before halfway & a bit over halfway (never actually noticed one).
Late overflow - well in to crunching (past halfway), but a significant time before its estimated completion time.

Grant
Darwin NT
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
"it took over 2 hours to find a noise bomb,"

That tells me the task was running for 2 hours before it finally overflowed. That makes it a late overflow by both my definition and yours.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) Joined: 19 Aug 99 Posts: 13822 Credit: 208,696,464 RAC: 304
"it took over 2 hours to find a noise bomb,"

"That tells me the task was running for 2 hours before it finally overflowed."

Not if the estimated running time was 2 days (I don't know what it was, but it was probably over 24 hrs).

It took an i7 desktop system 5 min to bail out. On that system even a shortie would probably take 50 min, so that was an early overflow in my book (even if a shortie takes 30 min, 5 min is early). If it ended with 30 spikes/pulses/whatever after 35 or 40 min (with a 50 min estimated runtime), that would make it a late overflow.

Grant
Darwin NT
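For what it's worth, the informal categories above can be expressed as a rule of thumb based on how far into the estimated run time a task overflows. The function below is only an illustration, with hypothetical thresholds chosen to match the descriptions in this thread, not anything the project defines.

def classify_overflow(elapsed_s: float, estimated_runtime_s: float) -> str:
    """Rough overflow classification by fraction of estimated run time.

    Threshold values are illustrative guesses matching the descriptions
    in this thread, not an official definition.
    """
    fraction = elapsed_s / estimated_runtime_s
    if elapsed_s < 60:       # a matter of seconds
        return "noise bomb (extremely early overflow)"
    if fraction < 0.25:      # several minutes in, still well short of halfway
        return "early overflow"
    if fraction < 0.60:      # around the halfway mark
        return "mid overflow"
    if fraction < 0.90:      # past halfway, but well before the estimate
        return "late overflow"
    return "overflow near the estimated completion"

# The i7 example from the thread: bailed out after 5 minutes of a task
# estimated at roughly 50 minutes.
print(classify_overflow(5 * 60, 50 * 60))

With those thresholds, the i7's 5-minute bailout on a roughly 50-minute task comes out as an early overflow, and a 35 to 40 minute overflow on the same task as a late one, matching the judgements above.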
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
Nobody has answered my question: how do we raise the request to evaluate tightening the deadlines with the SETI powers that be?

Looking at the SSP page we can see:
Results returned and awaiting validation: 0 / 39,153 / 13,810,892
Results out in the field: 0 / 61,160 / 5,333,622

Something must be done..... and quick!
Sirius B Joined: 26 Dec 00 Posts: 24901 Credit: 3,081,182 RAC: 7
I wouldn't be at all surprised if the project admins are aware of the issue and, as such, prioritise issues. It just depends where your question is on the priority list.