Panic Mode On (109) Server Problems?
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
Well, I found it necessary to make Post 58550. Read through to Post 58572, and note his titles. Well, even Project Scientists and Developers are human. Witness our own Eric K. and the recent spate of typos.

I do remember them (MW) having issues initially with the n-body mt application, but they evidently sorted it out, and I haven't followed any of the threads since, as I stated I don't do MW cpu work. I haven't seen many posts about n-body issues other than host configuration questions. The mt documentation must be mostly stable and understood by now, as the mt cpu app deployed at GPUGrid just this month by a student had a relatively easy startup. It was nice to find it obeyed the app_config core usage setting, so it didn't hog all cores on my Ryzen 1800X. I am using 4 cores to process the cpu tasks, leaving the other cores for Seti cpu tasks.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
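The core-limiting behaviour described above is normally done with a BOINC `app_config.xml` placed in the project's directory. A minimal sketch follows; the app name used here is a placeholder (check the project's actual app name in the client logs or `client_state.xml`):

```xml
<!-- app_config.xml, placed in the BOINC data dir under projects/<project-url>/
     Limits a multi-threaded (mt) app version to 4 CPU threads.
     "milkyway_nbody" is a placeholder app name; substitute the real one. -->
<app_config>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>4</avg_ncpus>
        <cmdline>--nthreads 4</cmdline>
    </app_version>
</app_config>
```

After editing the file, "Options → Read config files" in the BOINC Manager (or restarting the client) applies it; new tasks then claim only 4 CPUs, leaving the rest free for other projects.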
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
MW is fine, but IIRC to do something productive there you need to have an AMD GPU; NV hardware has trouble with the DP used by MW. Has that changed? GPUGrid, with its long WU crunch times, is not really a project to use as a backup. IMHO
Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0
MW is the most set-and-forget project I have run. I never have to micromanage it at all. I only keep a backup available on one of my crunch-only machines, just to make sure it maintains a little heat in the bedroom on chilly nights when SaH runs out of work. My first choice is Asteroids, but they're often out of work too, so I added MilkyWay as a backup to the backup.

The last time it ran on Windows was about 3 years ago, and it worked fine. But that machine is now Linux, and when MilkyWay kicked in one night a couple of months ago, it turned out to be a colossal waste of time. I don't remember how many tasks it ran, but when I checked the results the next day, I found that all but one of them had been marked Invalid. I think they all ran to completion without throwing any errors, but it was all just wasted electricity (except for the little bit of extra heat). I never did try to figure out what might have happened; I just turned off MilkyWay and added Einstein for the next time both SaH and Asteroids ran out.
Joined: 11 Sep 99 · Posts: 6534 · Credit: 196,805,888 · RAC: 57
> MW is fine, but IIRC to do something productive there you need to have an AMD GPU, NV stuff has troubles to work with DP used by MW.

Milkyway uses Double Precision calculations.

Most GeForce GPUs are limited to DP performance of 1/32 of SP; Radeon GPUs are allowed DP performance of 1/16 of SP. So if both GPUs were 6000 GFLOPS in Single Precision, the GeForce would be 188 GFLOPS DP and the Radeon 375 GFLOPS DP.

If you move to the workstation GPUs, they can have DP performance of up to 1/2 of SP, which is likely why they have 4-digit price tags.

SETI@home classic workunits: 93,865 · CPU time: 863,447 hours
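The arithmetic above can be checked in a few lines. The 1/32 and 1/16 ratios are the consumer-card figures quoted in the post; actual ratios vary by GPU generation, so treat these as illustrative:

```python
# Estimate double-precision (DP) throughput from single-precision (SP)
# throughput and the card's DP:SP ratio.

def dp_gflops(sp_gflops: float, dp_ratio: float) -> float:
    """DP throughput given SP throughput and the DP:SP ratio."""
    return sp_gflops * dp_ratio

sp = 6000.0  # both example cards: 6000 GFLOPS single precision

geforce = dp_gflops(sp, 1 / 32)  # most GeForce cards: DP is 1/32 of SP
radeon = dp_gflops(sp, 1 / 16)   # Radeon cards: DP is 1/16 of SP

print(f"GeForce: {geforce:.0f} GFLOPS DP")  # 188
print(f"Radeon:  {radeon:.0f} GFLOPS DP")   # 375
```

The same function with a 1/2 ratio gives 3000 GFLOPS DP for a hypothetical workstation card of the same SP throughput, which is why those parts command the 4-digit prices mentioned above.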
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> MW is fine, but IIRC to do something productive there you need to have an AMD GPU, NV stuff has troubles to work with DP used by MW.

No, MW has no issue with Nvidia as long as the card can do double precision; probably any card newer than Kepler, avoiding the lowest-end model of any family. Nvidia doesn't have the same degree of double-precision performance as ATI/AMD, but their cards still work fine.

I do a Gamma Ray Binary Pulsar task in 190 seconds and get awarded 227 credits for it; the credit is static for all task types. The longest-running GPUGrid gpu task I've run so far was 8 hours and was awarded 387,150 credits; the shortest task was 3 hours. The longest cpu task was 1 hour and the shortest 20 minutes.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
In progress back up to around 5 million; Received-last-hour back over 120k. WU-awaiting-deletion is climbing, and splitter output has dropped down to its lower level again. Will clearing awaiting-deletion fire up the splitters again? We'll just have to wait and see!

Grant
Darwin NT
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> In progress back up around 5 million, Received-last-hour back over 120k.

NEWS at 10!
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
Around 4 min 55 sec on my GTX 1070s for the current BLC_02s, now that we have some BLC_02s that aren't VLARs. About 50 sec quicker to crunch, but they do cause some noticeable system/display lag.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
And there we go. Awaiting-deletion backlog cleared, splitters crank out the WUs.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
I'd say that is pretty convincing evidence that the two are directly linked. If you overlay the graphs, they are coincident.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
> I'd say that is pretty convincing evidence that the two are directly linked. If you overlay the graphs, they are coincident.

Correlation isn't causation, but yeah: when returned-per-hour hits its present highs and work-in-progress gets right up there, the deleters & splitters certainly aren't able to both run at 100% at the same time. The splitters crank out the work, the deleter backlog grows; it gets to a certain point, and the splitters slow down and stay there till the delete backlog clears. And it's continued to occur after the weekly outage. It's choking on its own I/O.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
I'd say the administrators need to shorten the cron-job interval on the deleters' purge task so that we could maintain a higher average RTS buffer quantity. Or, if the purge is threshold-based, lower the threshold.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
> I'd say the administrators need to shorten the cron-job interval on the deleters' purge task so that we could maintain a higher average RTS buffer quantity. Or, if the purge is threshold-based, lower the threshold.

I think it's just a question of I/O congestion. The deleters run all the time; however, with the current rate of work return, and the rate of WU splitting required to keep that rate of return going, there's so much I/O contention that the deleters can't keep up. Eventually the contention reaches the point where the output of the splitters falls away, but the deleters still can't keep up with the load, so the backlog continues to grow. Eventually the deleters are able to catch up & clear the backlog, and their reduced level of I/O allows the splitters to crank back up again, till the delete backlog & load reach that trigger point & the splitters slow down again. Rinse and repeat.

The combination of returned-per-hour, in-progress, awaiting-deletion & required splitter output is producing more I/O than the servers can actually sustain. So you end up with these moving trigger points where one function slows down and the other speeds up, then it slows down & the first one speeds up again. And back & forth they go.

That's my speculation, based on minimal facts.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
OK, I'm sure I read in some other post in recent days that the deleters and purgers don't run continuously. Now I have to find that post.

[Edit] Found it. By Rob Smith, Message 1913582.
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> I think it's just a question of I/O congestion.

> OK, I'm sure I read in some other post in recent days that the deleters and purgers don't run continuously. Now I have to find that post.

That's probably a difference between Main and Beta. Beta certainly doesn't purge the database continuously; Eric likes to keep older tasks visible for comparison and retrospective bug-hunting. Main, on the other hand, needs to clear the decks within 24 hours or we're swamped.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
> OK, I'm sure I read in some other post in recent days that the deleters and purgers don't run continuously. Now I have to find that post.

Got me curious too.

Generally (when things have been working well), the number of WUs awaiting Validation, Assimilation and Deletion is around 0, occasionally 1-3 (emphasis on when everything is working OK). So even if they don't run all the time, they run when there is something to do, which is pretty much all the time (especially with 145k results being returned per hour).

Looking at AP, where the return rate is less than 1 per minute at the moment, the WUs awaiting Validation, Assimilation & Deletion are around 1, with periods of 0 & a few periods of 2 or 3. It could be that they run all the time, and those values of 1-3 are just what happens to be there at the moment the data is read, before the WU is processed. Or it could be as you say: they don't run all the time, only when there is work to be done. Either way, it means the MB WU Validator/Deleter/Assimilators are running (effectively) all the time with 40/s there to be processed, as the values there are usually around (or very close to) 0.
Joined: 26 May 99 · Posts: 9958 · Credit: 103,452,613 · RAC: 328
Panic Mode On (110) Server Problems? Now open for business |
©2025 University of California
SETI@home and AstroPulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.