Panic Mode On (109) Server Problems?
Author | Message |
---|---|
Stargate (SA) Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0 |
Yup, same here, getting dribs and drabs. I don't get many as I only have one computer :/ |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I'm thinking of doing the same and refilling the CPU cache to move to the GPUs later. But I am also very concerned about upsetting the APR and getting into the run time exceeded pitfall. I have never had one of those errors, but I don't relish spending the time and energy on a task only to have it thrown out because some artificial time limit was exceeded. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Yeah, I agree. The same thing happens when a task gets that blasted "finish file present too long" error. All that wasted processing because BOINC has that inexplicable time limit built in. Anyway, it looks like non-VLARs are flowing again, so I think I'll go grab some for my other two Linux boxes. (If nothing else, my earlier experiment cleared 600 Arecibo VLARs out of the RTS buffer. You're welcome! :^P) |
betreger Joined: 29 Jun 99 Posts: 11414 Credit: 29,581,041 RAC: 66 |
"But I am also very concerned about upsetting the APR and getting into the run time exceeded pitfall." APR is very broken, averaging the CPU run times with the GPU is insane. They should be calculated separately. This is not a problem with Seti; this is a flaw in Boinc. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13847 Credit: 208,696,464 RAC: 304 |
"APR is very broken, averaging the CPU run times with the GPU is insane. They should be calculated separately." Mine are, but I don't reschedule so they don't affect each other that way. Grant Darwin NT |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"APR is very broken, averaging the CPU run times with the GPU is insane. They should be calculated separately." But the case we are talking about is not rescheduling per se, but bunkering for the outage. If you move tasks for whatever reason, you open yourself up to exceeding the task time limit if a task runs on a device other than the one it was originally sent to. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
"APR is very broken, averaging the CPU run times with the GPU is insane. They should be calculated separately." As far as I know they are, but the calculation is based on the application that they're originally assigned to, not the app that they may actually end up running on. That's why rescheduling tends to get them out of whack. A task that was assigned to a CPU app with an estimated run time of, say, 2 hours, may run on a GPU in 5 minutes. That causes the scheduler to increase the APR and decrease the estimated run time for the CPU app and future tasks that are assigned to it. So, tasks that are assigned to the CPU, and then actually do run on the CPU, end up taking much longer than the estimates. When the disparity gets too great, "run time exceeded" errors result. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, it appears the Arecibo VLAR storm is tapering off and I am seeing BLC tasks available again. Only the Linux machine is still having trouble getting its GPU cache refilled so I can bunker tasks to the CPU for tomorrow. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
betreger Joined: 29 Jun 99 Posts: 11414 Credit: 29,581,041 RAC: 66 |
Grant, "Mine are, but I don't reschedule so they don't affect each other that way." How do you do that? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13847 Credit: 208,696,464 RAC: 304 |
"Grant, 'Mine are, but I don't reschedule so they don't affect each other that way.' How do you do that?" I don't reschedule. As Jeff pointed out, the calculations are based on the hardware the tasks were initially allocated to by the Scheduler. When you move them to a different application to process them, the estimated times aren't re-calculated & things end up getting messy. Grant Darwin NT |
betreger Joined: 29 Jun 99 Posts: 11414 Credit: 29,581,041 RAC: 66 |
Grant, I don't crunch enough Seti for that to be a problem here, but over at Einstein it can be a major PITA for me. The GPU tasks running 2 @ a time take 30 min and the APR can be over 2hrs. The CPU tasks are averaged down to 1/3 or less of their actual time. That gets me into a situation where I get deadline problems on the CPU causing Boinc to limit GPU usage in order to try to allocate as much CPU as possible in order to meet the deadlines for the CPU tasks. I run a very small cache, 1.4 days. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13847 Credit: 208,696,464 RAC: 304 |
"The GPU tasks running 2 @ a time take 30 min and the APR can be over 2hrs." ? APR (Average Processing Rate) is measured in GFLOPS, and yes, running more than 1 WU at a time results in it indicating you're doing less work than you actually are. Actual WU processing times, both CPU & GPU, at least for me here on Seti, are generally pretty close to the estimated time. Sounds like Einstein have an issue with how they determine estimated processing times. "The GPU tasks running 2 @ a time take 30 min and the APR can be over 2hrs. The CPU tasks are averaged down to 1/3 or less of their actual time. That gets me into a situation where I get deadline problems on the CPU causing Boinc to limit GPU usage in order to try to allocate as much CPU as possible in order to meet the deadlines for the CPU tasks." Might be worth checking out the Einstein forums for some optimisation tips. Here at Seti, for the SoG application, 1 CPU core is required for each GPU WU being processed. For the older Nvidia applications it's not necessary, although they are much slower. Looking at your systems, if their GPU application also requires 1 CPU core to support each WU being crunched, then I can see you running into insufficient CPU resources on 2 of your systems. Here on Seti there generally isn't much point in using the Intel integrated GPUs to crunch, as the heat they produce, and cache contention with the CPU, actually result in less work being done than just running the CPU cores alone. I don't know how well the Einstein application performs on iGPUs. Try the systems without them processing and see if things get worse, improve, or stay the same. If they improve or stay the same, then with those iGPUs disabled you won't run into low available CPU resources. Grant Darwin NT |
betreger Joined: 29 Jun 99 Posts: 11414 Credit: 29,581,041 RAC: 66 |
I don't use iGPUs; I found them to be evil. "Sounds like Einstein have an issue with how they determine estimated processing times." I don't know either, but their times are very consistent; the variance is quite small. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Absolutely no problems whatsoever. Getting new task at every request." . . No, no promises, but if they want volunteer processing power to process all the data coming in, then they need to be able to disseminate that data to the hosts out there trying to process it for them. If there is no work coming out for a machine to process, why have that machine powered up and online? If you want people to crunch for you, you have to get the work to them to crunch. Since they have repeatedly stated there is more data than the current horde of volunteers can process for them and they need more, how is that going to work if they cannot keep the work up to the ones they already have? It isn't a matter of them owing us the work, simply practicality: we cannot do the work if they cannot get it to us. Stephen <shrug> |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Been there, and it sneaks up on you quicker than you might think. After just two weeks of doing that the APRs for the CPU looked OK at over 7 mins per task, and that is almost the length of time the Arecibo VHAR tasks were taking on the GPU, but when moved to the GPU queue the estimates dropped to mere seconds (which I failed to notice) and anything that took about 7 mins or longer timed out. Killed that option really quickly. Stephen :( |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Yeah, I agree. The same thing happens when a task gets that blasted 'finish file present too long' error. All that wasted processing because BOINC has that inexplicable time limit built in." . . Thanks for falling on your sword :) Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"APR is very broken, averaging the CPU run times with the GPU is insane. They should be calculated separately." . . Yes, the problem is that CreditScrew, or whatever, does not read the result file info fully, so it does not take note of which device actually did the processing, only to which device it was despatched. And therein lies the cross stream corruption of the APRs when re-scheduling from CPU to GPU. It happens the other way as well, but that does not create timeouts. Stephen :( |
rob smith Joined: 7 Mar 03 Posts: 22505 Credit: 416,307,556 RAC: 380 |
Both the APR and CreditScrew calculation processes are server side. CreditScrew uses the APR to guess at the credit to be awarded in a rather convoluted manner. APR is a measure of expected run time against achieved run time, in FLOPS, and uses the assigned processor in its calculation. The value of APR is passed periodically back to your cruncher, where it is used to calculate the expected run time in seconds for each task. Timeouts occur when you exceed that time by a factor of ten. Now, since everything is based on the expected run time, rescheduling, or introducing a new processor to the cruncher, will have an impact on the APR and thus on credit awarded per task. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Well here it is 00:30 AEDT (13:30 UTC) and still no outage. I guess it will be a late start and therefore a very late finish today. . . These rolling outage start times of late make it a little unnerving ... :( Stephen ?? |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
It all depends if someone left a full pot of coffee from the night before, or if they need to make a fresh pot :D I'm as full as I can get, so bring it on :) |
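
For readers following the APR discussion in this thread, here is a toy model of the feedback loop that Jeff Buck and rob smith describe, plus Grant's point about running two tasks at once. It is only a sketch under stated assumptions: the plain running mean, the task size, the 20 minute solo run time and every function name are hypothetical, and this is not BOINC's actual server code. The factor of ten comes from rob smith's post; the 2 hour, 5 minute and 30 minute figures come from Jeff Buck's and betreger's posts.

```python
# Toy model of the APR feedback loop discussed in this thread.
# The plain running mean and all numbers are illustrative assumptions,
# not BOINC's real implementation.

TIMEOUT_FACTOR = 10  # per rob smith's post: limit is ten times the estimate


def processing_rate(task_flops, elapsed_s):
    """Rate booked for one finished task, in FLOPS."""
    return task_flops / elapsed_s


def estimated_runtime(task_flops, apr):
    """Expected run time in seconds for a task, given the app's APR."""
    return task_flops / apr


def times_out(actual_s, task_flops, apr):
    """True if a task runs past TIMEOUT_FACTOR times its estimate."""
    return actual_s > TIMEOUT_FACTOR * estimated_runtime(task_flops, apr)


TASK_FLOPS = 7.2e13         # hypothetical task size
CPU_SECONDS = 2 * 60 * 60   # the task really takes 2 hours on the CPU
GPU_SECONDS = 5 * 60        # the same task takes 5 minutes on the GPU

# Honest history: 10 CPU-assigned tasks that actually ran on the CPU.
honest = [processing_rate(TASK_FLOPS, CPU_SECONDS)] * 10
apr_honest = sum(honest) / len(honest)

# Rescheduled history: 10 CPU-assigned tasks that were moved to the GPU.
# Their speed is still booked against the CPU application.
skewed = [processing_rate(TASK_FLOPS, GPU_SECONDS)] * 10
apr_skewed = sum(skewed) / len(skewed)

for label, apr in (("honest", apr_honest), ("skewed", apr_skewed)):
    estimate = estimated_runtime(TASK_FLOPS, apr)
    aborted = times_out(CPU_SECONDS, TASK_FLOPS, apr)
    print(f"{label}: estimate {estimate:.0f} s, CPU task aborted: {aborted}")

# Grant's point: APR is per task, so running two tasks at once lowers it
# even when total throughput rises. Assume a task takes 20 minutes alone
# (hypothetical) and 30 minutes when two run side by side.
apr_single = processing_rate(TASK_FLOPS, 20 * 60)
apr_paired = processing_rate(TASK_FLOPS, 30 * 60)
throughput_paired = 2 * apr_paired
print(f"per-task APR drops: {apr_paired < apr_single}, "
      f"throughput rises: {throughput_paired > apr_single}")
```

With these made-up numbers the CPU estimate falls from 7,200 s to 300 s once the rescheduled results are averaged in, so the limit becomes 3,000 s and a genuine 2 hour CPU run gets thrown out. That matches the "run time exceeded" pattern reported above, but treat the figures as illustration only.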
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.