Message boards :
Number crunching :
The Server Issues / Outages Thread - Panic Mode On! (118)
Ian&Steve C. · Joined: 28 Sep 99 · Posts: 4267 · Credit: 1,282,604,591 · RAC: 6,640
At some point, the project just needs to move on. At this point in time, with the computational power available, it's unreasonable to wait 6+ weeks for someone to return a WU. If they haven't returned it within two weeks, it should be abandoned so someone who's actually willing to do the work can process it. Many other projects have much shorter deadlines, and I don't see anyone (much less the hordes of what is being called "most" users) complaining that they can't participate because of it.

Seti@Home classic workunits: 29,492 · CPU time: 134,419 hours
Stephen "Heretic" ![]() ![]() ![]() ![]() Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 ![]() ![]() |
> It would be grand if the project could meter work allocation out to the host computers based on their ability to return processed work.

. . Sadly that is a problem. But if an index were created for each host based on that host's daily return rate, it could be applied to work assignment. That would take time to construct and would probably be very difficult to incorporate into the current systems, so it is very unlikely. :(

Stephen
< shrug >
Joined: 19 May 99 · Posts: 766 · Credit: 354,398,348 · RAC: 11,693
> Should they be denied to participate in something they find interesting, just because the 24/7 club don't like it when they can't get thousands of tasks every day?

No, everyone should be able to participate as much as they wish to. I just wish the servers and database could accommodate all the interest. Perhaps setting the number of tasks based on average turnaround time would cover both machine speed and time online. For example, someone who runs 1 hr/day, CPU only, should need fewer tasks to reach an average turnaround of, say, 10 days than someone with 8 x 2080 Tis. If I run out of tasks, at least my 24/7 club dues will go down for the month. :)
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> And just look at the runtime differential between the two valid instances.

You are comparing the stock Windows CPU app to the Linux Special Sauce GPU app running on a Turing card.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
I've been rooting around in the scheduler code trying to find the places where turnaround time and APR are generated. Those are known for every host. So if you know those parameters for every host, you should be able to generate a priority list of which hosts should get the majority of the work and clear the database the fastest.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
Speedy · Joined: 26 Jun 04 · Posts: 1646 · Credit: 12,921,799 · RAC: 89
Richard, on a bright note: thanks for helping remove 5 results from the system.
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
One way to discourage oversized caches would be to include the turnaround time in the credit calculation. Return the result immediately for max credit, and the longer you sit on it, the less you get. Having a two week cache would be a lot less cool if it hurt your RAC ;)
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
With a simple look at the SSP you see: Results returned and awaiting validation 0 35,474 14,150,778.

I never said my host is a super fast one; I use an old CPU and a relatively slow GPU by today's standards. But following your example, my host has a close to 10K WU buffer, and all of it is crunched in less than 1 1/2 days. The fastest hosts do the same in less than 1/2 a day. That is why we use such large WU cache buffers. Your host has a buffer of about 15-20 WUs and crunches that buffer in about the same 1 1/2 days. So your buffer and mine are both in the range I suggest: 1-2 days max.

When I say fast/slow host I mean a host with a low/high APR, not actually the CPU or GPU speed. So why does a host that crunches, let's say, 3 WUs/day, returns only invalids, or has a low APR need a 10-day or 150 WU buffer? Now imagine a host that crunches less than 1 WU/day, has an APR of 10 or more days (there are 1000's of them), and has an up-to-150 WU cache. Surely a larger impact on the DB than your host or mine. That is what I am trying to explain.
Speedy · Joined: 26 Jun 04 · Posts: 1646 · Credit: 12,921,799 · RAC: 89
> One way to discourage oversized cache would be to include the turnaround time in credit calculation. Return the result immediately for max credit and longer you sit on it, the less you get.

I see where you are coming from. I believe the only way you can return a result "immediately" is if it is a noise bomb (runs for 10 seconds) and is started as soon as it is downloaded. I cannot see any other way to return a result "immediately".
Stephen "Heretic" ![]() ![]() ![]() ![]() Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 ![]() ![]() |
> at some point, the project just needs to move on. at this point in time with computational power available, it's unreasonable to wait 6+ weeks for someone to return a WU. if they haven't returned it by two weeks, then it should be abandoned and let someone who's actually willing to do the work process it.

. . Let's consider a very old computer in contemporary terms, something like a Core 2 Duo or Core 2 Quad (I have and am using one of each). Even without a SETI-usable GPU, such machines can process from 1 to 4 WUs at a time on their CPUs. So taking the worst case (the C2D) doing one WU at a time, it would take between 2 and 3 hours to process a WU, allowing it to get through about 8 WUs per day. Let's assume the owner is on a dial-up connection (is there actually anyone who is?) and only calls in once a week. They have the current task limit of 150 WUs (10 days + 10 days; now that might actually meet the definition of greedy), and each week they call in and return their yield of, say, 55 WUs. A 3-week deadline would still allow them to 'participate' without any other restrictions compared to ALL other users. So why 8 or 12 weeks?

In reality, of course, to actually participate they only need to set their work fetch to cover their return period of 7 days, but let's allow some margin and say the full primary fetch of 10 days without the additional: about 80 WUs. Then only a 2-week deadline would really be required. Are there any hosts out there actually as slow as that, much less slower than that? I can find no logic or reason in the claim that such long deadlines are required to allow people to participate. Even in this hypothetical dial-up scenario, if they called in every other day they could 'participate' even with a 1-week deadline.

. . Just how low does the bar have to be set?

Stephen
? ? ?
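The back-of-the-envelope numbers in the post above can be sketched out in a few lines. All figures below are the post's hypotheticals (a ~3 h/WU Core 2 Duo, a once-a-week dial-up connection, the 150-WU task limit), not measured values:

```python
# Back-of-envelope check of the deadline argument: can a slow host that only
# connects once a week still drain its cache before the deadline expires?
# Numbers are the post's hypotheticals, not measurements.

def wus_returned_per_connect(hours_per_wu: float, connect_interval_days: float) -> float:
    """WUs a single-task host finishes between scheduler connects, crunching 24/7."""
    return connect_interval_days * 24 / hours_per_wu

def deadline_ok(cache_wus: int, wus_per_day: float, deadline_days: float) -> bool:
    """True if the oldest WU in a FIFO-processed cache is done before its deadline."""
    return cache_wus / wus_per_day <= deadline_days

# Core 2 Duo at ~3 h/WU, dialing in once a week:
print(wus_returned_per_connect(3.0, 7))   # 56.0 WUs/week, matching the post's ~55

print(deadline_ok(150, 8, 21))  # True  -> full 150-WU cache fits a 3-week deadline
print(deadline_ok(80, 8, 14))   # True  -> an 80-WU cache fits a 2-week deadline
print(deadline_ok(150, 8, 7))   # False -> a 1-week deadline needs a smaller cache
```

This supports the post's conclusion: at ~8 WUs/day even the hypothetical worst-case host meets a 3-week deadline with a full cache, and a 2-week deadline with a modest one.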
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> > And just look at the runtime differential between the two valid instances.
>
> You are comparing stock windows CPU app to Linux Special Sauce GPU app running in a Turing card.

That's what I was drawing attention to! The project finds itself where differentials like that exist (and the CPU in question is an AMD A10-9700, no dinosaur). It probably no longer has enough tools to manage every contingency.
Stephen "Heretic" ![]() ![]() ![]() ![]() Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 ![]() ![]() |
> One way to discourage oversized cache would be to include the turnaround time in credit calculation. Return the result immediately for max credit and longer you sit on it, the less you get.

. . 'Immediately' can only ever be a relative term: even if your cache were empty and you received just one WU on an RTX 2080 Ti, which completes it in 30 secs, your return time would still be nearly one minute. But in context, let's assume that a few minutes to a few (2-3) hours would satisfy the idea of 'immediately'. I'll restate that my personal target is 12 to 24 hours, and I still see no need for more than that.

Stephen
< shrug >
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> > One way to discourage oversized cache would be to include the turnaround time in credit calculation. Return the result immediately for max credit and longer you sit on it, the less you get.
>
> I see where you are coming from. I believe the only way you can return a result "immediately" is if it is a noise bomb (runs for 10 seconds) and is started as soon as it is downloaded. I cannot see any other way to return a result "immediately"

When the time scale is the 7-week deadline setiathome is using, then anything within the first couple of hours is pretty much 'immediately'. The shortest time in which you can return anything without manual micromanagement is the 5-minute cooldown between scheduler requests. Most non-ancient GPUs can process at least one setiathome task in that time even when it isn't a noise bomb.

The average turnaround of all setiathome users is about 1.5 days. Make results returned in 1.5 days give the current credit, make results returned exactly at the deadline give zero credit, and interpolate/extrapolate linearly using those two fixed points to get the multiplier for other times. So if you return faster than 1.5 days, you get a few % more credit than you get now.

Or an alternative - make it a race: return the task before your wingman for a bit of extra credit ;)
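The linear scaling proposed above can be sketched as follows. The 1.5-day average turnaround is the figure from the post; the 49-day deadline is an assumption (the "7 week deadline" mentioned earlier), and the function is an illustration of the idea, not project code:

```python
# Sketch of the proposed credit multiplier: full credit at the ~1.5-day
# average turnaround, zero credit at the deadline, linear in between,
# extrapolated slightly above 1.0 for faster-than-average returns.

AVG_TURNAROUND_DAYS = 1.5   # stated average turnaround of all users
DEADLINE_DAYS = 49.0        # assumed ~7-week setiathome deadline

def credit_multiplier(turnaround_days: float) -> float:
    m = (DEADLINE_DAYS - turnaround_days) / (DEADLINE_DAYS - AVG_TURNAROUND_DAYS)
    return max(m, 0.0)  # clamp so late returns never yield negative credit

print(credit_multiplier(1.5))    # 1.0 -> current credit at the average turnaround
print(credit_multiplier(49.0))   # 0.0 -> nothing at the deadline
print(credit_multiplier(0.0))    # ~1.03 -> "a few % more" for instant returns
```

With these fixed points, an instant return earns only about 3% extra, consistent with the post's "a few % more credit", while sitting on a task for weeks costs a large fraction of it.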
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> One way to discourage oversized cache would be to include the turnaround time in credit calculation. Return the result immediately for max credit and longer you sit on it, the less you get.

GPUGrid rewards fast-turnaround hosts with 50% more credit if work is returned within 24 hours, and 25% more credit if it is returned within 48 hours. The same could be implemented here.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
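As a sketch, the GPUGrid-style tiers described above could look like this (an illustration of the scheme as the post describes it, not GPUGrid's actual implementation):

```python
# Tiered turnaround bonus as described in the post:
# +50% credit if returned within 24 h, +25% within 48 h, base credit otherwise.

def tiered_credit(base: float, turnaround_hours: float) -> float:
    if turnaround_hours <= 24:
        return base * 1.5
    if turnaround_hours <= 48:
        return base * 1.25
    return base

print(tiered_credit(100, 12))   # 150.0 -> 50% bonus for a same-day return
print(tiered_credit(100, 36))   # 125.0 -> 25% bonus within two days
print(tiered_credit(100, 72))   # 100   -> base credit after that
```

Compared with the linear proposal, a tiered scheme is simpler to explain to users, but it creates sharp cliffs at the tier boundaries rather than a smooth incentive.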
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> I see where you are coming from. I believe the only way you can return a result "immediately" is if it is a noise bomb (runs for 10 seconds) and is started as soon as it is downloaded. I cannot see any other way to return a result "immediately"

Actually you can, if you set <report_results_immediately>1</report_results_immediately> in the cc_config.xml file. From the client configuration wiki: <report_results_immediately>0|1</report_results_immediately>. But early overflows that run for only 15 seconds would still only get reported at each 305-second scheduler connect interval.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
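For reference, this option goes in the `<options>` section of `cc_config.xml` in the BOINC data directory. A minimal sketch of such a file (the rest of a real configuration is omitted):

```xml
<cc_config>
  <options>
    <!-- Report each finished task at the next scheduler contact
         instead of batching reports; increases scheduler traffic. -->
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>
```

After editing the file, the client must re-read it ("Read config files" in the BOINC Manager, or a client restart) for the option to take effect.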
Joined: 6 Nov 99 · Posts: 717 · Credit: 8,032,827 · RAC: 62
It's not really a problem of the fastest ones against the slowest ones, but:

1_ The slowest ones (slow CPU and core count, slow GPU) are numerous, so their cross-validated pendings are not a problem (like mine, but it runs 24h a day, 7 days a week, 52 weeks a year, and a whole life I hope...).

2_ The intermediate ones (multi-core CPU and a fast GPU) are not the problem either; there are enough of them to cross-validate pendings with another intermediate.

3_ The problem is that there are only a few of the fastest ones (multi-core CPU, multiple fast GPUs), which in fact can't cross-validate pendings with another fastest computer... they have to wait for the 2_ and 1_ hosts.

Some possible solutions:

- Increase the number of fastest 3_ hosts... (not really possible: money cost, electricity cost, maintenance, hard configuration, etc.), plus the server-side problems of feeding all these 3_ hosts.
- Split the fastest 3_ hosts into 2_ hosts... more intermediates and higher intermediates to increase cross-validated pendings between them; easier to configure, to supply, etc.
- Eliminate all the slowest ones... against the values of a scientific project like SETI.
- Or all the proposals you've made before me :D

(sorry for the language mistakes) :p
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
Looks like the current panic is ending. My hosts are only a few tasks short of having full caches now. |
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> With a simple look at the SSP you see: Results returned and awaiting validation 0 35,474 14,150,778

Actually, about 9 million of those 14 million results are in there not because someone is still crunching them but because the corresponding workunit is stuck in the assimilation queue.
Joined: 5 Mar 12 · Posts: 815 · Credit: 2,361,516 · RAC: 22
> The reason for the longer deadlines is that the project has always wanted to keep those with old slow computers still able to contribute to the project.

I think this is a valid point. I didn't think of the people who have slow machines and are only on for a few hours every day. I would love to see the stats on machines in that category and see IF the return-time allowance could be shortened. I want to be as inclusive as possible, but I'll be honest: if the number of machines in this category is small, then it might be better to sacrifice a few participants to maybe gain even more. How many people join the project but quit because it isn't stable and they can't reliably get WUs every week? How much shorter would the Tuesday outage be if the db were a better size?

I want SETI to be run by as many people as possible. This project is not only about finding the alien, but also about PR and making people feel part of something. But what if we are losing more people by not changing the system to run better?
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
> Most people do NOT run their computers 24/7, and therefore a slow computer that runs maybe only a few hours/week can take a very long time

And for extremely slow, rarely-on systems, 1 month is plenty of time for them to return a WU. It's actually plenty of time for them to return many WUs. While having deadlines as short as one week wouldn't affect such systems, it would affect those that are having problems, be it with hardware, internet, or power (fires, floods, storms etc). A 1-month deadline reduces the time it takes to clear a WU from the database, but still allows people time to recover from problems and not lose any of the work they have processed.

Grant
Darwin NT
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.