About Deadlines or Database reduction proposals

Message boards : Number crunching : About Deadlines or Database reduction proposals

W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2033907 - Posted: 25 Feb 2020, 10:13:36 UTC - in response to Message 2033904.  

Why are the initial assumptions no longer relevant?
I mentioned that in the previous post.


Is it wrong to assume that there are hosts out there that are only switched on for a few hours a day and, because their performance is low, are set to suspend when keyboard or mouse activity is detected?
I know for a fact that my youngest's desktop is not switched on at least 3 days per week and on Saturdays is usually only on for a few hours in the afternoon.
So you're saying we should extend deadlines even further to allow for systems that are very slow & might only run for an hour a week or so?

Personally I'd rather we continue to cater for the vast majority - which are average systems that would only crunch work for a few hours most days - than cater to extreme outliers.
If a system can't return a single WU within 28 days then I really don't see a need to cater for it.
Catering for the widest possible range of hardware and use cases does not mean catering for all possible hardware & use cases. There will always have to be a cutoff point.

Don't forget we do have crunchers with a RAC over 100,000 that run some very low-performance devices with a RAC of less than one average MB task per week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830
ID: 2033907
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 2033908 - Posted: 25 Feb 2020, 10:13:45 UTC

Regarding Juan's message 2033761 from the Panic Mode On (118) thread:

Here you can find the old SETI classic app and WU: https://setiathome.berkeley.edu/forum_thread.php?id=63288&postid=1081976#1081976
ID: 2033908
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2033909 - Posted: 25 Feb 2020, 10:17:43 UTC - in response to Message 2033906.  
Last modified: 25 Feb 2020, 10:18:05 UTC

Thanks Richard. I knew that VLAR as a tag was redundant (I'm blaming low caffeine levels for typing VHAR in the post Richard refers to).
So there is already a correction of deadline for "anticipated run time" in place; all that needs to happen is for the reference value to be adjusted to give a lower deadline.
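A proportional deadline rule of this kind could be sketched as follows; the reference values here are invented for illustration and are not the project's actual numbers:

```python
def deadline_days(anticipated_runtime_hours: float,
                  reference_hours: float = 24.0,
                  reference_deadline_days: float = 28.0) -> float:
    """Deadline scaled in proportion to a task's anticipated run time.

    Both reference values are hypothetical placeholders; lowering
    reference_deadline_days shortens every deadline proportionally,
    which is the adjustment being proposed.
    """
    return reference_deadline_days * (anticipated_runtime_hours / reference_hours)
```

Under this sketch, halving the reference deadline halves every task's deadline while keeping long tasks' deadlines proportionally longer than short ones'.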


But when thinking about how low to take the deadline, remember that W-K 666's host is probably in the top 1000, so not that far down the list, and we are seeing people who only run their computers part-time on SETI.

And that only uses the GPU.

And just for info, of the 6 APs downloaded yesterday between 08:44 and 09:40, four of them (67%) have been validated. https://setiathome.berkeley.edu/results.php?userid=8083616&offset=0&show_names=0&state=0&appid=20
ID: 2033909
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033910 - Posted: 25 Feb 2020, 10:29:08 UTC - in response to Message 2033907.  
Last modified: 25 Feb 2020, 10:29:22 UTC

Don't forget we do have crunchers with a RAC over 100,000 that run some very low-performance devices with a RAC of less than one average MB task per week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830
Yeah, and a 28 day deadline would have no impact on the lowest performing system there being able to contribute, so...?
Grant
Darwin NT
ID: 2033910
Darrell Wilcox Project Donor
Volunteer tester

Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2033911 - Posted: 25 Feb 2020, 10:38:12 UTC

I propose:
NOT changing the deadline,
NOT changing the server software,
and still producing a result that satisfies ALL the stakeholders here, users and SETI.

It is this:

On the Tuesday maintenance day, when the servers come back up, run a small cron job (assuming the servers are running some UNIX-type OS) and have it reduce the maximum number of WUs in process per CPU/GPU to around 50(?). [I am making the assumption that the maximum number of WUs in process can be changed programmatically.]

50(?) would allow all but the very biggest computers at least 1 WU for each CPU and GPU. The smallest or slowest could also get 50, one-third of what is now allowed. Run this way until a few minutes BEFORE the maintenance shutdown, when the cron job would increase the maximum in process to 200/300/400(?) and stay in this mode for 5(?) or 10(?) minutes before resetting to a maximum of 50(?). The servers would continue to run for another 30(?) minutes to allow the WU downloads to complete before shutting down.

Note that the big and fast systems would probably be visiting the servers frequently to report and get more WUs, and would likely do so within this once-a-week window. Slow and small computers would probably NOT often visit during this window.

The slow and small computers would probably not even notice the changing maximums, and the big boys would be able to fill a large queue just before the servers go down.

No code change so no programming cost.

The sizes of the DBs would be reduced with fewer in-process WUs in the field [need help from someone with the data to confirm this].

Small and slow are happy. Big and fast are happy. Servers are busier with more uploads and requests but only from the
big guns. DB space used is smaller.

What's not to like with this idea?

[The numbers followed by (?) would be adjusted as experience is gained to balance the needs and wants of slow, small, big and fast,
and server computer systems.]
ID: 2033911
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033914 - Posted: 25 Feb 2020, 10:59:42 UTC - in response to Message 2033911.  
Last modified: 25 Feb 2020, 11:17:57 UTC

The sizes of the DBs would be reduced with fewer in-process WUs in the field [need help from someone with the data to confirm this].
How? The problem isn't the amount of work In progress, it's the backlogs with Validation & Assimilation (and often deletion & purging as well).
What about the rest of the week? You're also assuming that everything is working well before & after the outage; that often isn't the case.
Changing the deadline wouldn't require nearly as much effort as setting up, configuring & tweaking those cron jobs. We've got enough complexity here as it is.
Oh, and again, reducing the amount of work In progress isn't going to help clear the existing backlog issue.

Edit- although reducing the amount of work that is In progress for long periods of time will help the database overall, once everything gets back down to more normal levels.
The present issue is Validation & Assimilation backlogs, but the amount of work In progress has caused problems of its own in the past.
Grant
Darwin NT
ID: 2033914
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2033915 - Posted: 25 Feb 2020, 11:21:26 UTC - in response to Message 2033910.  

Don't forget we do have crunchers with a RAC over 100,000 that run some very low-performance devices with a RAC of less than one average MB task per week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830
Yeah, and a 28 day deadline would have no impact on the lowest performing system there being able to contribute, so...?

Only because it is probably on 24/7. It took over 2 hours to find a noise bomb, and the longest task shown is a mid-range AR task, so we can assume a VLAR is going to take twice as long, nearly three days. So if it was only on 2 hours/day, twice the original assumption, it would struggle to complete within the proposed 28 day limit.

I am all for shortening the deadlines, mainly as an aid to clear the tasks that get abandoned*, but we do need to define acceptable limits for minimum-spec devices and the minimum average time we can expect devices to be on, and say why those limits are in place.

*One of my oldest tasks is https://setiathome.berkeley.edu/workunit.php?wuid=3804671429, created 23 Dec 2019; the original wingman timed out, and it is now out to a host that has only returned one noise bomb in the last 10 days but has 146 tasks in progress.
ID: 2033915
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2033924 - Posted: 25 Feb 2020, 11:52:36 UTC - in response to Message 2033915.  

... the longest task shown is a mid range AR task, so we can assume a VLAR is going to take twice as long ...
That comparison is only valid for NVidia GPUs. On CPU-like devices, mid-AR and VLAR are very similar.
ID: 2033924
Darrell Wilcox Project Donor
Volunteer tester

Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2033953 - Posted: 26 Feb 2020, 3:26:23 UTC - in response to Message 2033914.  

@ Grant (SSSF)
How? The problem isn't the amount of work In progress, it's the backlogs with Validation &
Assimilation (and often deletion & purging as well).

No disrespect, but the title of this thread is "About Deadlines or Database reduction proposals".
My proposal would reduce the number of IN PROCESS WUs, which sit in some part of
a DB. How much? I don't have the data to tell, but someone probably has it and can extract the
counts to determine how much, and then tell me/us if this is a good proposal. It certainly is a
cheap one to implement.
What about the rest of the week?

Well, after the outage, the in-process limit would be set at 50(?) until just before the next
outage, when it would be increased for a small window of time. Perhaps post-outage an even
smaller limit would be appropriate for some period of time. Experience would tell us yes or no.
Changing the deadline wouldn't require nearly as much effort as setting up, configuring &
tweaking those cron jobs.

I disagree. See this message by Ian for an example of how easy it is.

Further, the cron job could be done WITHOUT having to modify server code [see my
assumption about changing the in-process limits programmatically]. No programming time,
compiling, testing, regression testing, etc. The "tweaking" would occur after some period of
running, and could simply require a small change to a parameter of the cron jobs. To test this
proposal, it might even be possible to do this BY HAND for a couple of weeks.
Oh, and again, reducing the amount of work In progress isn't going to help clear the existing backlog issue.

I am NOT trying to address this issue in my proposal.
The present issue is Validation & Assimilation backlogs

If this is the case, then I suggest that the title of this thread be changed to reflect that issue, and I
will post my proposal in another thread.

Live long and prosper!
ID: 2033953
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033957 - Posted: 26 Feb 2020, 3:41:46 UTC - in response to Message 2033953.  

What about the rest of the week?
Well, after the outage, the in-process limit would be set at 50(?) until just before the next
outage, when it would be increased for a small window of time. Perhaps post-outage an even
smaller limit would be appropriate for some period of time. Experience would tell us yes or no.
The whole point of having a cache is to help people through outages. While increasing the cache limit just prior to the weekly outage will help people through it, that will only be the case if things are working OK at that point, and often they aren't. And problems often occur at other times too, and it would be nice to be able to get through those outages as well.
I would prefer to actually fix the problem (hardware is no longer up to the job) than implement yet more workarounds that no one will be happy with.
Grant
Darwin NT
ID: 2033957
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2033961 - Posted: 26 Feb 2020, 3:47:13 UTC - in response to Message 2033861.  

I 'spoke' to Mr Kevvy about moving the off-topic messages from the panic thread to here, just to clean up the mess, nothing else. He is willing to take care of it if we/someone tags them.

Uggh, that's 3 pages worth of posts. I certainly would not like to have 225 messages about post moves and deletions show up in my PM basket.

Let them lie where they are.


. . OOPS!!!

Stephen

:)
ID: 2033961
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033964 - Posted: 26 Feb 2020, 3:53:11 UTC - in response to Message 2033915.  

Only because it is probably on 24/7.
If it were on 24/7 it would have a much higher RAC as it is processing normal non-noisy WUs in under 2 days.


it took over 2 hours to find a noise bomb,
If you look at the other systems that processed the WU, you'd see that it took an i7 desktop system 5 min to find it as well, and an i7 laptop 22 min.
So it wasn't actually a noise bomb, more of an early overflow.
Grant
Darwin NT
ID: 2033964
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2033966 - Posted: 26 Feb 2020, 3:57:58 UTC - in response to Message 2033903.  

Why are the initial assumptions no longer relevant?
Is it wrong to assume that there are hosts out there that are only switched on for a few hours a day and, because their performance is low, are set to suspend when keyboard or mouse activity is detected?

I know for a fact that my youngest's desktop is not switched on at least 3 days per week and on Saturdays is usually only on for a few hours in the afternoon.


. . That begs the question ... is that PC running SETI?

Stephen

? ?
ID: 2033966
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2033968 - Posted: 26 Feb 2020, 4:01:05 UTC - in response to Message 2033964.  

So it wasn't actually a noise bomb, more of an early overflow.

Correction: LATE overflow
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2033968
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2033969 - Posted: 26 Feb 2020, 4:02:59 UTC - in response to Message 2033906.  

Thanks Richard. I knew that VLAR as a tag was redundant (I'm blaming low caffeine levels for typing VHAR in the post Richard refers to).
So there is already a correction of deadline for "anticipated run time" in place; all that needs to happen is for the reference value to be adjusted to give a lower deadline.
But when thinking about how low to take the deadline, remember that W-K 666's host is probably in the top 1000, so not that far down the list, and we are seeing people who only run their computers part-time on SETI.


. . And I am assuming those numbers would be hard to assess. I know the schedulers keep those stats (percentage of time the PC is on, and percentage of that time when SETI is running), but would those stats be available as an overview of all hosts?

Stephen

?
ID: 2033969
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033971 - Posted: 26 Feb 2020, 4:06:45 UTC - in response to Message 2033968.  
Last modified: 26 Feb 2020, 4:09:59 UTC

So it wasn't actually a noise bomb, more of an early overflow.
Correction: LATE overflow
For me, a late overflow is one that occurs well after the WU has started, but still a significant time before it was due to end.
Noise bomb - a matter of seconds (extremely early overflow).
Early overflow - at least several minutes into crunching.
Mid overflow - somewhere between a bit before halfway & a bit over halfway (never actually noticed one).
Late overflow - well into crunching (past halfway), but a significant time before its estimated completion time.
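The taxonomy above can be written down roughly as a function of elapsed versus estimated runtime; the numeric thresholds here are illustrative guesses, since the post only gives qualitative boundaries:

```python
def classify_overflow(elapsed_s: float, estimated_s: float) -> str:
    """Classify an overflowed task by how far into its estimated
    runtime it got. Thresholds are illustrative, not from the post."""
    frac = elapsed_s / estimated_s
    if elapsed_s < 60:
        return "noise bomb"       # a matter of seconds
    if frac < 0.4:                # several minutes in, well before halfway
        return "early overflow"
    if frac < 0.6:                # around the halfway mark
        return "mid overflow"
    return "late overflow"        # past halfway, before estimated completion
```

Note this is why the same task can be argued into different categories: the label depends on the host's estimated runtime, not just on the wall-clock time to overflow.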
Grant
Darwin NT
ID: 2033971
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2033974 - Posted: 26 Feb 2020, 4:55:38 UTC - in response to Message 2033971.  

it took over 2 hours to find a noise bomb,

That tells me the task was running for 2 hours before it finally overflowed.

That makes it a late overflow by both my definition and yours.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2033974
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033981 - Posted: 26 Feb 2020, 6:11:19 UTC - in response to Message 2033974.  

it took over 2 hours to find a noise bomb,
That tells me the task was running for 2 hours before it finally overflowed.

That makes it a late overflow by both my definition and yours.
Not if the estimated running time was 2 days (I don't know what it was, but it was probably over 24 hrs).
It took an i7 desktop system 5 min to bail out. On that system even a shortie would probably take 50 min, so that was an early overflow in my book (even if a shortie takes 30 min, 5 min is early). If it ended with 30 spikes/pulses/whatever after 35 or 40 min (with a 50 min estimated runtime), that would make it a late overflow.
Grant
Darwin NT
ID: 2033981
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2034017 - Posted: 26 Feb 2020, 13:15:40 UTC

Nobody has answered my question: how do we raise the request to evaluate squeezing the deadlines to the SETI powers that be?

Looking at the SSP page we can see:

Results returned and awaiting validation 0 39,153 13,810,892
Results out in the field 0 61,160 5,333,622

Something must be done..... and quick!
ID: 2034017
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 2034018 - Posted: 26 Feb 2020, 13:20:30 UTC - in response to Message 2034017.  

I wouldn't be at all surprised if the project admins are aware of the issue and, as such, prioritise issues. It just depends where your question is on the priority list.
ID: 2034018



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.