About Deadlines or Database reduction proposals

Message boards : Number crunching : About Deadlines or Database reduction proposals

W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2033907 - Posted: 25 Feb 2020, 10:13:36 UTC - in response to Message 2033904.  

Why are the initial assumptions no longer relevant?
I mentioned that in the previous post.


Is it wrong to assume that there are hosts out there that are only switched on for a few hours a day and, because their performance is low, are set to suspend when keyboard or mouse activity is detected?
I know for a fact that my youngest's desktop is not switched on at least 3 days per week and on Saturdays is usually only on for a few hours in the afternoon.
So you're saying we should extend deadlines even further to allow for systems that are very slow & might only run for an hour a week or so?

Personally I'd rather we continue to cater for the vast majority - which are average systems that would only crunch work for a few hours most days - than cater to extreme outliers.
If a system can't return a single WU within 28 days then I really don't see a need to cater for it.
Catering for the widest possible range of hardware and use cases does not mean catering for all possible hardware & use cases. There will always have to be a cutoff point.

Don't forget we do have crunchers with a RAC over 100,000 that run some very low-performance devices with a RAC of less than one average MB task per week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830
ID: 2033907
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 2033908 - Posted: 25 Feb 2020, 10:13:45 UTC

Regarding Juan's message 2033761 from the Panic Mode On (118) thread:

Here you can find the old SETI classic app and WU: https://setiathome.berkeley.edu/forum_thread.php?id=63288&postid=1081976#1081976
ID: 2033908
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2033909 - Posted: 25 Feb 2020, 10:17:43 UTC - in response to Message 2033906.  
Last modified: 25 Feb 2020, 10:18:05 UTC

Thanks Richard. I knew that VLAR as a tag was redundant (I'm blaming low caffeine levels for typing VHAR in the post Richard refers to).
So there is already a correction of deadline for "anticipated run time" in place; all that needs to happen is for the reference value to be adjusted to give a lower deadline.
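A proportional deadline rule of this kind could be sketched as follows; the reference values here are invented for illustration and are not the project's actual numbers:

```python
def deadline_days(anticipated_runtime_hours: float,
                  reference_hours: float = 24.0,
                  reference_deadline_days: float = 28.0) -> float:
    """Deadline scaled in proportion to a task's anticipated run time.

    Both reference values are hypothetical placeholders; lowering
    reference_deadline_days shortens every deadline proportionally,
    which is the adjustment being proposed.
    """
    return reference_deadline_days * (anticipated_runtime_hours / reference_hours)
```

Under this sketch, halving the reference deadline halves every task's deadline while keeping long tasks' deadlines proportionally longer than short ones'.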


But when thinking about how low to take the deadline, remember that W-K 666's host is probably in the top 1000, so not that far down the list, and we are seeing people who only run their computers part-time on SETI.

And that only uses the GPU.

And just for info, of the 6 APs downloaded yesterday between 08:44 and 09:40, four of them (67%) have been validated. https://setiathome.berkeley.edu/results.php?userid=8083616&offset=0&show_names=0&state=0&appid=20
ID: 2033909
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033910 - Posted: 25 Feb 2020, 10:29:08 UTC - in response to Message 2033907.  
Last modified: 25 Feb 2020, 10:29:22 UTC

Don't forget we do have crunchers with a RAC over 100,000 that run some very low-performance devices with a RAC of less than one average MB task per week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830
Yeah, and a 28 day deadline would have no impact on the lowest performing system there being able to contribute, so...?
Grant
Darwin NT
ID: 2033910
Darrell Wilcox Project Donor
Volunteer tester

Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2033911 - Posted: 25 Feb 2020, 10:38:12 UTC

I propose:
NOT changing the deadline,
NOT changing the server software,
and still producing a result that satisfies ALL the stakeholders here, users and SETI.

It is this:

On the Tuesday maintenance day, when the servers come back up, run a small cron job (assuming the servers are running some UNIX-type OS) and have it reduce the maximum number of WUs in process per CPU/GPU to around 50(?). [I am making the assumption that the maximum number of WUs in process can be changed programmatically.]

50(?) would allow all but the very biggest computers at least 1 WU for each CPU and GPU. The smallest or slowest could also get 50, one-third of what is now allowed. Run this way until a few minutes BEFORE the maintenance shutdown, when the cron job would increase the maximum in process to 200/300/400(?) and stay in this mode for 5(?) or 10(?) minutes before resetting to a maximum of 50(?). The servers would continue to run for another 30(?) minutes to allow the WU downloads to complete before shutting down.

Note that the big and fast systems would probably be visiting the servers frequently to report and get more WUs, and would likely do so within this once-a-week window. Slow and small computers would probably NOT often visit during this window.

The slow and small computers would probably not even notice the changing maximums, and the big boys would be able to fill a large queue just before the servers go down.

No code change so no programming cost.

The sizes of the DBs would be reduced with fewer in-process WUs in the field [need help from someone with the data to confirm this].

Small and slow are happy. Big and fast are happy. Servers are busier with more uploads and requests but only from the
big guns. DB space used is smaller.

What's not to like with this idea?

[The numbers followed by (?) would be adjusted as experience is gained to balance the needs and wants of slow, small, big and fast,
and server computer systems.]
ID: 2033911
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033914 - Posted: 25 Feb 2020, 10:59:42 UTC - in response to Message 2033911.  
Last modified: 25 Feb 2020, 11:17:57 UTC

The sizes of the DBs would be reduced with fewer in-process WUs in the field [need help from someone with the data to confirm this].
How? The problem isn't the amount of work In progress, it's the backlogs with Validation & Assimilation (and often deletion & purging as well).
What about the rest of the week? You're also assuming that everything is working well before & after the outage; that often isn't the case.
Changing the deadline wouldn't require nearly as much effort as setting up, configuring & tweaking those cron jobs. We've got enough complexity here as it is.
Oh, and again, reducing the amount of work In progress isn't going to help clear the existing backlog issue.

Edit- although reducing the amount of work that is In progress for long periods of time will help the database overall, once everything gets back down to more normal levels.
The present issue is Validation & Assimilation backlogs, but the amount of work In progress has caused problems of its own in the past.
Grant
Darwin NT
ID: 2033914
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19064
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2033915 - Posted: 25 Feb 2020, 11:21:26 UTC - in response to Message 2033910.  

Don't forget we do have crunchers with a RAC over 100,000 that run some very low-performance devices with a RAC of less than one average MB task per week, e.g. https://setiathome.berkeley.edu/hosts_user.php?userid=138830
Yeah, and a 28 day deadline would have no impact on the lowest performing system there being able to contribute, so...?

Only because it is probably on 24/7. It took over 2 hours to find a noise bomb, and the longest task shown is a mid-range AR task, so we can assume a VLAR is going to take twice as long, nearly three days. So if it was only on 2 hours/day, twice the original assumption, it would struggle to complete within the proposed 28 day limit.

I am all for shortening the deadlines, mainly as an aid to clear the tasks that get abandoned*, but we do need to define acceptable limits for minimum-spec devices and the minimum average time we can expect devices to be on, and say why those limits are in place.

*One of my oldest tasks is https://setiathome.berkeley.edu/workunit.php?wuid=3804671429, created 23 Dec 2019; the original wingman timed out, and it is now out to a host that has only returned one noise bomb in the last 10 days but has 146 tasks in progress.
ID: 2033915
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2033924 - Posted: 25 Feb 2020, 11:52:36 UTC - in response to Message 2033915.  

... the longest task shown is a mid range AR task, so we can assume a VLAR is going to take twice as long ...
That comparison is only valid for NVidia GPUs. On CPU-like devices, mid-AR and VLAR are very similar.
ID: 2033924
Darrell Wilcox Project Donor
Volunteer tester

Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2033953 - Posted: 26 Feb 2020, 3:26:23 UTC - in response to Message 2033914.  

@ Grant (SSSF)
How? The problem isn't the amount of work In progress, it's the backlogs with Validation &
Assimilation (and often deletion & purging as well).

No disrespect, but the title of this thread is "About Deadlines or Database reduction proposals".
My proposal would reduce the number of IN PROCESS WUs, which sit in some part of
a DB. How much? I don't have the data to tell, but someone probably has it and can extract the
counts to determine how much, and then tell me/us if this is a good proposal. It certainly is a
cheap one to implement.
What about the rest of the week?

Well, after the outage, the in-process limit would be set at 50(?) until just before the next
outage, when it would be increased for a small window of time. Perhaps post-outage an even
smaller limit would be appropriate for some period of time. Experience would tell us yes or no.
Changing the deadline wouldn't require nearly as much effort as setting up, configuring &
tweaking those cron jobs.

I disagree. See this message by Ian for an example of how easy it is.

Further, the cron job could be done WITHOUT having to modify server code [see my
assumption about changing the in-process limits programmatically]. No programming time,
compiling, testing, regression testing, etc. The "tweaking" would occur after some period of
running, and could simply require a small change to a parameter of the cron jobs. To test this
proposal, it might even be possible to do this BY HAND for a couple of weeks.
Oh, and again, reducing the amount of work In progress isn't going to help clear the existing backlog issue.

I am NOT trying to address this issue in my proposal.
The present issue is Validation & Assimilation backlogs

If this is the case, then I suggest that the title of this thread be changed to reflect that issue, and I
will post my proposal in another thread.

Live long and prosper!
ID: 2033953
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033957 - Posted: 26 Feb 2020, 3:41:46 UTC - in response to Message 2033953.  

What about the rest of the week?
Well, after the outage, the in-process limit would be set at 50(?) until just before the next
outage, when it would be increased for a small window of time. Perhaps post-outage an even
smaller limit would be appropriate for some period of time. Experience would tell us yes or no.
The whole point of having a cache is to help people through outages. While increasing the cache limit just prior to the weekly outage will help people through it, that will only be the case if things are working OK at that point, and often they aren't. And problems often occur at other times too, and it would be nice to be able to get through those outages as well.
I would prefer to actually fix the problem (hardware is no longer up to the job) than implement yet more workarounds that no one will be happy with.
Grant
Darwin NT
ID: 2033957
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2033961 - Posted: 26 Feb 2020, 3:47:13 UTC - in response to Message 2033861.  

I 'spoke' to Mr Kevvy about moving the off-topic messages from the panic thread to here, just to clean up the mess, nothing else. He is willing to take care of it if we/someone tags them.

Uggh, that's 3 pages worth of posts. I certainly would not like to have 225 messages about post moves and deletions show up in my PM basket.

Let them lie where they are.


. . OOPS!!!

Stephen

:)
ID: 2033961
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033964 - Posted: 26 Feb 2020, 3:53:11 UTC - in response to Message 2033915.  

Only because it is probably on 24/7.
If it were on 24/7 it would have a much higher RAC as it is processing normal non-noisy WUs in under 2 days.


it took over 2 hours to find a noise bomb,
If you look at the other systems that processed the WU, you'd see that it took an i7 desktop system 5 min to find it as well, and an i7 laptop 22 min.
So it wasn't actually a noise bomb, more of an early overflow.
Grant
Darwin NT
ID: 2033964
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2033966 - Posted: 26 Feb 2020, 3:57:58 UTC - in response to Message 2033903.  

Why are the initial assumptions no longer relevant?
Is it wrong to assume that there are hosts out there that are only switched on for a few hours a day and, because their performance is low, are set to suspend when keyboard or mouse activity is detected?

I know for a fact that my youngest's desktop is not switched on at least 3 days per week and on Saturdays is usually only on for a few hours in the afternoon.


. . That begs the question ... is that PC running SETI?

Stephen

? ?
ID: 2033966
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2033968 - Posted: 26 Feb 2020, 4:01:05 UTC - in response to Message 2033964.  

So it wasn't actually a noise bomb, more of an early overflow.

Correction: LATE overflow
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2033968
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2033969 - Posted: 26 Feb 2020, 4:02:59 UTC - in response to Message 2033906.  

Thanks Richard. I knew that VLAR as a tag was redundant (I'm blaming low caffeine levels for typing VHAR in the post Richard refers to).
So there is already a correction of deadline for "anticipated run time" in place; all that needs to happen is for the reference value to be adjusted to give a lower deadline.
But when thinking about how low to take the deadline, remember that W-K 666's host is probably in the top 1000, so not that far down the list, and we are seeing people who only run their computers part-time on SETI.


. . And I am assuming those numbers would be hard to assess. I know the schedulers keep those stats (percentage of time the PC is on, and percentage of that time when SETI is running), but would those stats be available as an overview of all hosts?

Stephen

?
ID: 2033969
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033971 - Posted: 26 Feb 2020, 4:06:45 UTC - in response to Message 2033968.  
Last modified: 26 Feb 2020, 4:09:59 UTC

So it wasn't actually a noise bomb, more of an early overflow.
Correction: LATE overflow
For me, a late overflow is one that occurs well after the WU has started, but still a significant time before it was due to end.
Noise bomb - a matter of seconds (extremely early overflow).
Early overflow - at least several minutes into crunching.
Mid overflow - somewhere between a bit before halfway & a bit over halfway (never actually noticed one).
Late overflow - well into crunching (past halfway), but a significant time before its estimated completion time.
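The taxonomy above can be written down roughly as a function of elapsed versus estimated runtime; the numeric thresholds here are illustrative guesses, since the post only gives qualitative boundaries:

```python
def classify_overflow(elapsed_s: float, estimated_s: float) -> str:
    """Classify an overflowed task by how far into its estimated
    runtime it got. Thresholds are illustrative, not from the post."""
    frac = elapsed_s / estimated_s
    if elapsed_s < 60:
        return "noise bomb"       # a matter of seconds
    if frac < 0.4:                # several minutes in, well before halfway
        return "early overflow"
    if frac < 0.6:                # around the halfway mark
        return "mid overflow"
    return "late overflow"        # past halfway, before estimated completion
```

Note this is why the same task can be argued into different categories: the label depends on the host's estimated runtime, not just on the wall-clock time to overflow.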
Grant
Darwin NT
ID: 2033971
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2033974 - Posted: 26 Feb 2020, 4:55:38 UTC - in response to Message 2033971.  

it took over 2 hours to find a noise bomb,

That tells me the task was running for 2 hours before it finally overflowed.

That makes it a late overflow by both my definition and yours.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2033974
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2033981 - Posted: 26 Feb 2020, 6:11:19 UTC - in response to Message 2033974.  

it took over 2 hours to find a noise bomb,
That tells me the task was running for 2 hours before it finally overflowed.

That makes it a late overflow by both my definition and yours.
Not if the estimated running time was 2 days (I don't know what it was, but it was probably over 24 hrs).
It took an i7 desktop system 5 min to bail out. On that system even a shortie would probably take 50 min, so that was an early overflow in my book (even if a shortie takes 30 min, 5 min is early). If it ended with 30 spikes/pulses/whatever after 35 or 40 min (with a 50 min estimated runtime), that would make it a late overflow.
Grant
Darwin NT
ID: 2033981
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2034017 - Posted: 26 Feb 2020, 13:15:40 UTC

Nobody has answered my question: how do we raise the request to evaluate squeezing the deadlines to the SETI powers that be?

Looking at the SSP page we can see:

Results returned and awaiting validation 0 39,153 13,810,892
Results out in the field 0 61,160 5,333,622

Something must be done..... and quick!
ID: 2034017
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 2034018 - Posted: 26 Feb 2020, 13:20:30 UTC - in response to Message 2034017.  

I wouldn't be at all surprised if the project admins are aware of the issue and, as such, prioritise issues. It just depends where your question is on the priority list.
ID: 2034018



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.