The Server Issues / Outages Thread - Panic Mode On! (118)
Author | Message |
---|---|
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
The replica is still 11 hours behind so it's likely that you have no ghosts at all. ;-). Ah, right. Didn't even notice that part. Just wanted to make sure something wasn't broken on my end. I see 700k for RTS, so I figured I'd get at least one or two here and there. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
Wiggo Joined: 24 Jan 00 Posts: 36857 Credit: 261,360,520 RAC: 489 |
The return rate is also down to less than 1/10th of normal due to Anonymous platforms being ignored. ;-) The replica is still 11 hours behind so it's likely that you have no ghosts at all. ;-). Ah, right. Cheers. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Yes, it does only show a single card now, so it either didn't reinitialise after a reboot or Iona removed it, but with the replica still being a good 11hrs behind only Iona can tell us sooner what her solution was and if it's working (and I didn't waste my time going through a large random selection of her errored and validated tasks' Stderr outputs for nothing). ;-) . . Fingers still crossed .... Stephen :) |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
The return rate is also down to less than 1/10th of normal due to Anonymous platforms being ignored. ;-) The replica is still 11 hours behind so it's likely that you have no ghosts at all. ;-). Ah, right. Okay, so something IS broken. I haven't been here in a while and tried skimming through the most recent posts but it didn't look applicable. I did see a comment about "maybe they'll fix it for the new year" though, so maybe that was related. Hopefully things get fixed. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up) |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I did it. It has no effect. It looks like you're right. Whether it's in the app_config or the cmdline text file, it looks like -nobs isn't being implemented. GPU utilization is down. Oh well. The price to pay until anonymous platform is fixed. This method is certainly better than the stock apps, and a lot better than nothing. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
betreger Joined: 29 Jun 99 Posts: 11416 Credit: 29,581,041 RAC: 66 |
IMHO, all that can be done from this end has been done. Juan has made a personal sacrifice at the pub, hair has been set on fire, the moon has been howled at and even breath has been held until faces have turned purple. https://boinc.berkeley.edu/dev/forum_thread.php?id=8105&postid=94471 My 2 Seti hosts are now crunching Einstein, joining my sole Einstein host, so good science is being done here. Much to my great pleasure life remains quite OK. |
Bernie Vine Joined: 26 May 99 Posts: 9958 Credit: 103,452,613 RAC: 328 |
I just shut down my one current Linux host, and "removed" the app_info.xml from the two Windows machines. The Windows machines immediately started downloading and processing tasks. As I am running the latest Nvidia drivers anyway I may as well leave these two running stock. Obviously getting a massive mix of all the task "types", but as yet unable to see any results till the replica catches up. |
Wiggo Joined: 24 Jan 00 Posts: 36857 Credit: 261,360,520 RAC: 489 |
I've got about 6hrs worth of CPU tasks left to do on my old 2500K system before I shut that down in the morning and take it downstairs to strip down and blow out the dust and soot (I can't imagine what I'll find after the last 3 months since its last blowout, but I'm imagining the worst) and then I'll switch over to it before doing the same to my almost as old 3570K main system. Cheers. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
Which is why I suggested we get "Resend lost tasks" turned off, and then see if Anonymous hosts can get work again. The response time to a request is very slow. It used to be so fast that I couldn't read to keep up with the log, now it pauses for so long, that I wonder if it is still doing something. 20-30 seconds sounds about right. I think the slow response time is purely because this glitch has also turned 'resend lost tasks' back on, when we have a huge number of tasks in the database. There appear to be 2 issues at play, one is a bug in the code that stops Anonymous hosts from getting work when there are long delays with the Scheduler response, and as you determined under certain conditions even with quick responses on another project. The other being the fact that the Scheduler is taking an extremely long time to respond, hence my suggestion to disable Resend lost tasks again & see if that helps with the Scheduler response times (and sorting out the Validation, Assimilation, Deletion, Purge issues may also help). Grant Darwin NT |
zoom3+1=4 Joined: 30 Nov 03 Posts: 66359 Credit: 55,293,173 RAC: 49 |
10hrs here, it's really hexing. Gpus are on empty. Savoir-Faire is everywhere! The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST |
halfempty Joined: 2 Jun 99 Posts: 97 Credit: 35,236,901 RAC: 114 |
Running stock apps and the systems are crunching again. I don't remember Cuda50 being so painful, but at least they're downloading. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Some initial thoughts on Retvari's workround:
1) Sounds good. I'm going to test it myself later. So far, this is just theoretical.
2) It is likely only to work if you have a single, monolithic, executable. Apps which rely on external libraries - FFTW, CudaFFT - (and many SETI apps do) won't have the right links made in the slot (working) directory. They may work if you can put the libraries in the directory search path.
3) There are only two ways of specifying which GPU to use - command line and init_data.xml. It depends on the API version declared in the app_version. Command line is ancient history and should have been phased out years ago. If the app is checking init_data.xml, multiple GPUs should work as normal.
4) It should be possible to pass command line parameters like -nobs via app_config.xml.
5) Disregard Retvari's references to BOINC Manager - it's the client which has to be stopped and restarted. This may require action at the service control level, if BOINC has been installed that way.
I'll let you know how I get on, for both Linux and Windows. |
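As a rough illustration of point 4, here is a minimal Python sketch that writes out an app_config.xml carrying a command line for one app version. The element names (`app_config`, `app_version`, `app_name`, `plan_class`, `avg_ncpus`, `ngpus`, `cmdline`) follow the BOINC client configuration format; the app name `setiathome_v8`, the plan class `cuda60` and the resource figures are assumptions chosen for the example, and the helper `build_app_config` is hypothetical. Whether the science app actually honours `-nobs` is a separate question.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, cmdline, ngpus=1.0, avg_ncpus=1.0):
    """Return the text of a minimal app_config.xml with one <app_version> block."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    ET.SubElement(version, "ngpus").text = str(ngpus)
    # Extra arguments the client appends to the app's command line.
    ET.SubElement(version, "cmdline").text = cmdline
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed values: adjust the app name and plan class to match your own host.
xml_text = build_app_config("setiathome_v8", "cuda60", "-nobs")
print(xml_text)
```

The resulting file would go in the project's directory under the BOINC data directory, and the client has to re-read its config files (or be restarted) before the change takes effect.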
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
OK, I've applied Retvari's workround on my Linux Mint host running special sauce and the spoofed client. There's good news and bad news.
Good news (1): It's running.
Good news (2): It's picking up the command line. -nobs isn't specifically acknowledged, but I've got an -unroll in there too, and that's reported.
Bad news: It's not received the instruction to run on device 0 / device 1.
The device number is being passed correctly in init_data.xml, but the special sauce app isn't looking in the right place. I see Petri's app_info.xml doesn't contain an API version specifier, and it worked before, so I assume it's only listening for a command line. That's ancient, and should be corrected. Petri needs to compile against a newer BOINC API library and make the appropriate coding adjustments. But at least my machine is running 2-up on device 0 - so at about half normal speed - and some warmth is creeping back into my workroom. |
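To check what device number the client is handing to a task, one option is to read it straight out of the init_data.xml in the task's slot directory. The sketch below is only an illustration: it assumes the value sits in a `<gpu_device_num>` element, the sample text is an invented and heavily trimmed stand-in for a real file, and the helper `gpu_device_from_init_data` is hypothetical. A regular expression is used rather than an XML parser so that any oddities elsewhere in the file don't matter.

```python
import re

# Invented, heavily trimmed stand-in for a slot directory's init_data.xml.
SAMPLE_INIT_DATA = """<app_init_data>
<major_version>7</major_version>
<gpu_type>NVIDIA</gpu_type>
<gpu_device_num>1</gpu_device_num>
</app_init_data>
"""


def gpu_device_from_init_data(text):
    """Return the GPU device number found in the text, or None if absent."""
    match = re.search(r"<gpu_device_num>\s*(-?\d+)\s*</gpu_device_num>", text)
    return int(match.group(1)) if match else None


device = gpu_device_from_init_data(SAMPLE_INIT_DATA)
print("device passed by client:", device)
```

On a real host you would read the file from the slot directory of the running task (for example `slots/0/init_data.xml` under the BOINC data directory) and compare the result with the device the app reports in its stderr output.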
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
Attn: Richard. Einstein is showing "Server version 611" on my Linux host. Kevin |
Tom M Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
My Windows box running the standard Seti@Home apps continues to process. I believe that is "normal". After getting advice on another thread I have re-started Einstein@Home to keep my GPUs busy and am running World Community Grid to keep the rest of the CPU threads busy. Now that I think I understand that simply renaming the app_info.xml file in the BOINC project directory will allow me to process some Seti@Home tasks, I may experiment with that on my Linux box(es). I am assuming it is unlikely that anything will get "fixed" this week at the server level. Tom A proud member of the OFA (Old Farts Association). |
Retvari Zoltan Joined: 28 Apr 00 Posts: 35 Credit: 128,746,856 RAC: 230 |
Bad news: It's not received the instruction to run on device 0 / device 1. Please try to force your system to ask for cuda tasks only. I think you can achieve that by uninstalling opencl. Judging by the stderr output, if the special app runs instead of the original CUDA60 app, it will run on the designated GPU. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Attn: Richard. Yup. Einstein stopped using the central BOINC server code about nine years ago. They've gone their own way, and made their own updates, without changing the version number setting. They've also never adopted per-app-version runtime estimates, which means that clients try to normalise everything using a single DCF - that can never work. Estimates are all over the place (and jump up and down) if you run more than one application. |
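A toy model shows why one shared correction factor makes estimates jump around. Everything below is an assumption made for illustration, not the client's actual code: two hypothetical apps "A" and "B" have the same raw server estimate but very different real runtimes, and a single factor rises immediately when a task overruns and drifts down slowly otherwise.

```python
def update_dcf(dcf, raw_estimate, actual):
    """Toy update rule: jump up at once on an overrun, otherwise decay slowly."""
    ratio = actual / raw_estimate
    if ratio > dcf:
        return ratio
    return 0.9 * dcf + 0.1 * ratio


# Hypothetical apps: (raw server estimate in seconds, real runtime in seconds).
APPS = {"A": (100.0, 100.0), "B": (100.0, 300.0)}

dcf = 1.0
history = []
for name in ["A", "A", "A", "B", "A", "A", "A", "B"]:
    raw, actual = APPS[name]
    dcf = update_dcf(dcf, raw, actual)
    # The client would now show raw * dcf as the estimate for *every* app.
    history.append((name, round(dcf, 3), round(APPS["A"][0] * dcf, 1)))

for name, factor, estimate_for_a in history:
    print(f"finished {name}: DCF={factor}, next estimate for app A={estimate_for_a}s")
```

In this model a single task from app B triples the estimate shown for app A, which then creeps back down until the next B task arrives. A per-app-version factor would avoid that coupling.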
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
No. It took me bloody ages to get that driver installed (about the first thing I ever tried to do in Linux), and I'm not changing it for a temporary glitch. Running at half-throttle is fine, and kinder on the servers. I have one other host running stock so I can keep an eye on things, and all the others are waiting for the recovery. Which will be fun in itself... Bad news: It's not received the instruction to run on device 0 / device 1. Please try to force your system to ask for cuda tasks only. I think you can achieve that by uninstalling opencl. Judging by the stderr output, if the special app runs instead of the original CUDA60 app, it will run on the designated GPU. Edit - that was a bit harsh. I'm feeling better now I've had a bite of lunch. Don't let me stop anybody else testing this aspect of Retvari's suggestion, if they're prepared to sacrifice their OpenCL driver. However, it depends whether the Linux Cuda app was compiled against the modern API or not. If it's even modestly modern, it'll tell BOINC to use the modern calls, and we're back at square 1. I'll check it out if the server ever chooses to send me Cuda work, but so far it's alternating between sah and SoG. |
Tom M Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
Even in "stock", the approximate ratio of how fast a GPU task is processed to how fast a CPU task is processed is still the famous "GPUs will run 3X or more times faster than CPUs". It's looking like I am getting 1.5 to 2.5 hours on the CPU tasks (previously around 1 to 1.5 hours), and upwards of 8 minutes on the GPU tasks (previously 1.5 to 3.5 minutes, mostly 1.5 minutes). So I am crunching Seti@Home with "all my might" (and one hand tied behind my back). :) Tom A proud member of the OFA (Old Farts Association). |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Running stock apps and the systems are crunching again. I don't remember Cuda50 being so painful, but at least they're downloading. . . We quickly get spoiled by the faster apps that have superseded cuda50. Stephen :) |