Message boards : Number crunching : 100 WU limit for GPUs is too low.
BoMbY (Joined: 3 Apr 99, Posts: 8, Credit: 759,919, RAC: 0)
As I just learned, there is a maximum of 100 WUs per compute device (or device class). At an average of about 4-5 minutes per WU, that's a maximum of about 8 hours of work I can store, and with the splitters down all the time recently, that really doesn't cut it. I'm already out of work two hours into the current outage.
kittyman (Joined: 9 Jul 00, Posts: 51468, Credit: 1,018,363,574, RAC: 1,004)
Quote: "As I just learned there is a maximum of 100 WUs per compute device [...] this really doesn't cut it."
You are preaching to the choir. Old news, my friend; this has been discussed ad infinitum here. For many with faster GPUs, 100 WUs/GPU runs out much sooner than that.
"Freedom is just Chaos, with better lighting." Alan Dean Foster
HAL9000 (Joined: 11 Sep 99, Posts: 6534, Credit: 196,805,888, RAC: 57)
Quote: "As I just learned there is a maximum of 100 WUs per compute device [...] this really doesn't cut it."
A cache of 100 tasks is only about an 8-hour cache for my dual E5-2670, or my R9 390X as well. When we have shorties it is even less. There are a few options for handling running out of work during an outage: 1) Have a backup project in BOINC. 2) Let your system(s) have a break when they run out of work. 3) Don't run the project.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
marsinph (Joined: 7 Apr 01, Posts: 172, Credit: 23,823,824, RAC: 0)
For a few weeks/months now there have been server outages; we all know it. The result is that a lot of crunchers are out of work after a few hours or a day. We also all know there is a limit of 100 WUs per CPU and 100 per GPU, regardless of the "10 days of work plus 10 additional days" cache settings; nothing changes that, it is still 100 CPU and 100 GPU. I understand the limitation, because some users download WUs and then do nothing more with them, leaving many results to time out. But more and more of us have powerful computers and GPUs, and many of us run through those WUs in less than 24 hours, so in case of an outage we are always out of work. Why not increase the limit based on RAC, or (better, I think) on the average turnaround time? To be clear: a basic (new) user gets a maximum of 100 WUs; with an average turnaround of 1 day over the last 10 days, 1000 WUs; with an average turnaround of 5 days, 500 WUs.
BoMbY (Joined: 3 Apr 99, Posts: 8, Credit: 759,919, RAC: 0)
Why not make a simple rule and set the limit to at least enough WUs to last 24 hours? That would be a very simple calculation. Okay, maybe not on the first day the device is running, but perhaps once it has been running for at least 99% of the time over the last X days? All the data should be readily available.
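[Editor's note: a back-of-envelope sketch of this "cache 24 hours of work" rule. The function name, uptime threshold, and example numbers are illustrative assumptions, not project code.]

```python
# Hypothetical sketch of a "cache enough for 24 hours" per-device limit.
# Names and constants are illustrative assumptions, not SETI@home code.

def cache_limit_for_window(avg_task_minutes, uptime_fraction, window_hours=24.0):
    """Tasks needed to keep a device busy for window_hours,
    given its average task runtime and its on-time fraction."""
    busy_minutes = window_hours * 60.0 * uptime_fraction
    return max(1, round(busy_minutes / avg_task_minutes))

# A GPU averaging 4.5 min/task, on 99% of the time, would need ~317 tasks
# to ride out a 24-hour outage, versus the current cap of 100.
print(cache_limit_for_window(4.5, 0.99))
```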
marsinph (Joined: 7 Apr 01, Posts: 172, Credit: 23,823,824, RAC: 0)
Quote: "There are a few options for handling running out of work during an outage [...] 3) Don't run the project."
HAL, so if I understand correctly, you are asking me to no longer run SETI. OK, I will follow your proposal!!! Consider my suggestion about increasing the maximum allowed WUs for very active users (like you).
Mr. Kevvy (Joined: 15 May 99, Posts: 3776, Credit: 1,114,826,392, RAC: 3,319)
The work unit limit was deliberately set this way years ago by the project scientists, to avoid stressing the fragile Informix database (upon which the entire project utterly depends) with too many work units in progress at once, among other similar stressors. It's been discussed to death, and it isn't going to change in the foreseeable future. Also, as far as we know, they don't read this forum. So it's best to make one's peace with the way it is. By the way, I got about 2-3 hours of cached work units max when they were Arecibo ones; I get about 4-5 hours from the BLCs, which is a slight (and accidental) improvement. We're all in the same boat.
Al (Joined: 3 Apr 99, Posts: 1682, Credit: 477,343,364, RAC: 482)
Is there a 'better' database to use than Informix? What does Google use, for example? SQL Server from MS? And is it practical to switch? Though I have heard of Informix for quite some time, I just Googled it and found that it is an IBM product. I guess I am old, but I thought IBM had a reputation for doing big iron and solid databases? Or has that ossified over the years? *edit* Just took a closer look at the quick blurb on Google, and this caught my eye: Stable release: 12.10.xC7 / June 15, 2016. I can appreciate having something just work if it ain't broke, but over 18 months between stable releases seems like a pretty long time in the compressed IT world we live in now. But then again, what do I know? ;-)
HAL9000 (Joined: 11 Sep 99, Posts: 6534, Credit: 196,805,888, RAC: 57)
Quote: "Is there a 'better' database to use than Informix? [...] And is it practical to switch?"
It is the BOINC master database, which uses MySQL, rather than the science databases, which use Informix. The issue is more in how SETI@home uses the database. The information for all of the tasks assigned to hosts, the "results out in the field", is stored in a single table. So whenever tasks are assigned or reported, or a host does an update, that table is accessed. For the hardware running the db, the limit of that table turns out to be around 11,000,000 rows; beyond that it can no longer complete a query fast enough, which causes the db server to stop and the project to be offline for several hours, if not days, while recovery is run. So the solutions are: 1) Get heftier master and replica db servers. 2) Recode how the results sent to hosts are stored. 3) Limit the number of tasks sent to hosts. #1 is a stopgap until that hardware hits its limit. #2 is probably the best solution, but requires dedicated time that isn't available to the project. #3 is the easiest to implement and can be adjusted if the db server gets near the limit again.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
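[Editor's note: a rough model of why option #3 works. Only the ~11,000,000-row ceiling comes from the post above; the host counts and device mix below are invented for illustration.]

```python
# Rough model of the "results out in the field" table size.
# Only the ~11M-row ceiling comes from the post; host figures are made up.

ROW_CEILING = 11_000_000  # approximate limit HAL9000 cites for the hardware

def rows_in_flight(active_hosts, avg_devices_per_host, limit_per_device):
    """Worst case: every device on every host holds a full task cache."""
    return int(active_hosts * avg_devices_per_host * limit_per_device)

current = rows_in_flight(50_000, 1.5, 100)   # 7,500,000: under the ceiling
proposed = rows_in_flight(50_000, 1.5, 500)  # 37,500,000: far past it
print(current <= ROW_CEILING, proposed <= ROW_CEILING)
```

Under these (made-up) numbers, a flat 5x limit increase overshoots the ceiling, which is why any raise would have to be selective or paired with the schema rework in option #2.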
Gary Charpentier (Joined: 25 Dec 00, Posts: 30608, Credit: 53,134,872, RAC: 32)
Quote: "Is there a 'better' database to use than Informix? [...] And is it practical to switch?"
#3 is the opposite of the thread title. One other thing that could be done is to shorten the amount of time until a report is due. That should shorten the queue of workunits out in the bushes and also let more workunits validate faster. However, as SETI allows phones to crunch, I'm not sure how much shorter it can be before many of those low-powered phones would start missing deadlines. Perhaps some sort of dynamic deadline based on the reported FLOPS?
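[Editor's note: one way such a FLOPS-scaled deadline might look, as a sketch. The task size, slack factor, and clamp values are all invented numbers, not anything the scheduler actually does.]

```python
# Hypothetical dynamic deadline scaled to a host's reported speed.
# task_gflop, slack, and the clamp values are invented for illustration.

def dynamic_deadline_days(reported_gflops, task_gflop=30_000.0,
                          slack=20.0, min_days=7.0, max_days=53.0):
    """Deadline proportional to estimated crunch time, clamped so fast
    hosts still get a sane floor and slow hosts a hard cap."""
    est_days = task_gflop / reported_gflops / 86_400.0
    return min(max_days, max(min_days, est_days * slack))

print(dynamic_deadline_days(1000.0))  # fast GPU: hits the 7.0-day floor
print(dynamic_deadline_days(0.1))     # very slow host: capped at 53.0 days
```

The clamps address Gary's concern directly: a floor keeps fast hosts from getting absurdly tight deadlines, while the cap keeps slow-but-steady devices from tying up wingman results for months.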
marsinph (Joined: 7 Apr 01, Posts: 172, Credit: 23,823,824, RAC: 0)
Quote: "Perhaps some sort of dynamic deadline based on the reported FLOPS?"
Hello Gary, that's also a nice idea about FLOPS, but don't forget that some of us run the computer only a few hours a day, but on a very powerful machine, while others (like me) run less powerful machines 24/7. So it's not easy!
Keith Myers (Joined: 29 Apr 01, Posts: 13161, Credit: 1,160,866,277, RAC: 1,873)
Quote: "However, as SETI allows phones to crunch, I'm not sure how much shorter it can be before many of those low-powered phones would start missing deadlines. Perhaps some sort of dynamic deadline based on the reported FLOPS?"
It has already been discussed in another thread. Android phones aren't that slow at returning tasks; just the opposite, in fact, because they are always on. The worst offenders are normal PCs with low usage, that is, ones only turned on for a few minutes a day/week/month.
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
marsinph (Joined: 7 Apr 01, Posts: 172, Credit: 23,823,824, RAC: 0)
Hello, as I wrote, why not a variable number of WUs based on the computer's average turnaround time? That would increase the cache without overloading the system. Basic case: the computer runs 24/7. For a beginner (RAC and/or credit of 0): 50 WUs (there are already so many WUs out in the field waiting to be crunched). Then take the current 100 WUs, divide by the average turnaround in days, multiply by 10 days of work (or whatever the settings say), and divide by 2 to leave WUs for everyone (otherwise the splitters would never produce enough work for all). That gives 500 WUs in this case. The same powerful computer working 12 hours a day will have a turnaround of 2 days, so it will receive 250 WUs. With this easy calculation it doesn't depend on the power of the computer: a PC of power 10 with a 1-day average will receive the same as one twice as powerful with a 2-day average (perhaps it works 6 hours/day), and a PC half as powerful but with a 0.5-day average gets the same too (perhaps it works 24/7). With such a calculation, everyone has the same cache time, scaled by how much they crunch: the more WUs you return, the more cache you get. Of course the calculation doesn't need ten or twenty decimals; two decimals is enough. For those with super-powerful machines, no problem; and if one of them crashes, there will be more results timing out. OK, BUT!!! I really think that would matter less than the thousands of computers holding 100 CPU and 100 GPU tasks that have never run any of them!! For beginners, the turnaround will improve very quickly as they crunch. The advantage of this approach: it reduces the number of WUs out in the field or waiting for validation. (I have WUs waiting on a wingman since October!!! I am sure a lot of you do too.) It would also reduce the servers' "waiting for validation" backlog of about 3,000,000 WUs!!!
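[Editor's note: read literally, the proposal works out as below. Parameter names are mine; the 50-task beginner floor, the 10-day window, and the divide-by-2 sharing factor come from the post.]

```python
# Literal reading of the proposed turnaround-scaled WU limit.
# Parameter names are mine; the constants come from the post above.

def wu_limit(avg_turnaround_days, base=100, cache_days=10.0, share_factor=2.0):
    if avg_turnaround_days is None or avg_turnaround_days <= 0:
        return 50  # beginner with no turnaround history yet
    return round(base / avg_turnaround_days * cache_days / share_factor)

print(wu_limit(1.0))   # 24/7 host, 1-day turnaround: 500
print(wu_limit(2.0))   # same host running 12 h/day:  250
print(wu_limit(None))  # brand-new host:              50
```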
Al (Joined: 3 Apr 99, Posts: 1682, Credit: 477,343,364, RAC: 482)
Actually, TBH, the 100 WU limit for CPUs is the bigger problem on some of my machines. This Tuesday I noticed the project went offline at 7:30ish, and by 10 my higher-core crunchers were empty of CPU tasks, while my GPUs seemed to last until early afternoon. Of course, having dual CPUs doesn't help matters, but if some of the suggestions above could be implemented, like raising the limit based on some combination of the number of tasks returned valid and the number processed per day, that would help. Of course, if our database can't handle it at this time, which has been discussed, that will have to wait until it is upgraded. Maybe one of these days we will do like Einstein is doing next week: bringing 'er down and (re)building it up. All that is needed is personnel and $, sadly... Say, maybe they can loan us some of their expertise and willing hands once they've completed their upgrade! ;-)
Keith Myers (Joined: 29 Apr 01, Posts: 13161, Credit: 1,160,866,277, RAC: 1,873)
I agree; I think the 100-task-per-CPU limit needs updating more urgently than the GPU limit. The GPU caches will always be exhausted first no matter how many tasks are allowed, but if you still had CPU tasks to work on, at least your host wouldn't go cold. With the increasing number of multi-core and multi-CPU hosts, the problem is just getting worse. Anybody with dual Xeons, Threadrippers, or Ryzen 7s is finding they are out of work only 4-6 hours after the project outage starts, and then has a cold machine for the next 4-6 hours. Maybe we should try to persuade the Einstein administrators to publish a post-mortem on their server upgrades.
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
Bill G (Joined: 1 Jun 01, Posts: 1282, Credit: 187,688,550, RAC: 182)
My TR runs out of CPU work in 3 hours, while my slow GPUs (x3) usually make it through the maintenance period. It will be interesting this coming week to see what happens. When I load work from Einstein onto the CPU, that work takes 11 hours per WU.
SETI@home classic workunits 4,019 SETI@home classic CPU time 34,348 hours
Keith Myers (Joined: 29 Apr 01, Posts: 13161, Credit: 1,160,866,277, RAC: 1,873)
Yes, those N-body and Continuous Gravitational Wave CPU apps seem to take forever. I've stayed away from them so far.
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
tullio (Joined: 9 Apr 04, Posts: 8797, Credit: 2,930,782, RAC: 1)
I have finished two climateprediction.net tasks on the Windows 10 PC. Each took more than 5 days, running 24/7, and so far I have no credit; credits there are granted only once a week or so. So don't blame CreditNew. Tullio
Kissagogo27 (Joined: 6 Nov 99, Posts: 715, Credit: 8,032,827, RAC: 62)
Quote: "I have finished two climateprediction.net tasks on the Windows 10 PC. Each took more than 5 days [...]"
Created 13 Jan 2018, 22:17:25 UTC; Sent 14 Jan 2018, 13:21:32 UTC; Report deadline 27 Dec 2018, 18:41:32 UTC; Received 19 Jan 2018, 18:03:15 UTC. Curious deadline ;)
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.