Panic Mode On (113) Server Problems?
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874 |
No Rescheduler will move a task that is in the process of being crunched. So this warning is not valid. The warning would also apply to any computer reboot (for whatever reason), which will demand careful management of the shutdown/restart process. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
No Rescheduler will move a task that is in the process of being crunched. So this warning is not valid. I was not talking about that. What I was warning about is what happens when the rescheduler ends the active tasks (the ones that are crunching at that moment). Sometimes, though not always, when the rescheduler ends the process while the WU is crunching (normally near the end of its run) and the crunching process is restarted after the other tasks have been rescheduled, the task enters some kind of limbo (I don't know a better word for it): it stops crunching on the GPU and is sent to the CPU instead. A message is written to the task's stderr saying so. I don't remember the exact message, but it says something like "this PoT will be processed on the CPU". In this case the task starts to be crunched on the CPU and will end in an error because of the large difference in processing time (under 2 minutes on the GPU versus about an hour on the CPU). It's hard for me to explain, sorry, but I found an example: https://setiathome.berkeley.edu/result.php?resultid=7036454876 <core_client_version>7.4.44</core_client_version> <![CDATA[ <message> aborted by user </message> <stderr_txt> setiathome_CUDA: Found 4 CUDA device(s): Device 1: GeForce GTX 1080 Ti, 11178 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 28 pciBusID = 2, pciSlotID = 0 Device 2: GeForce GTX 1080 Ti, 11178 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 28 pciBusID = 3, pciSlotID = 0 Device 3: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 15 pciBusID = 1, pciSlotID = 0 Device 4: GeForce GTX 1070, 8118 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 15 pciBusID = 4, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 2 setiathome_CUDA: CUDA Device 2 specified, checking... Device 2: GeForce GTX 1080 Ti is okay SETI@home using CUDA accelerated device GeForce GTX 1080 Ti Using pfb = 32 from command line args Unroll autotune 28. Overriding Pulse find periods per launch. 
Parameter -pfp set to 28 setiathome v8 enhanced x41p_V0.97b2, Cuda 9.20 special Compiled with NVCC, using static libraries. Modifications done by petri33 and released to the public by TBar. Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements. Work Unit Info: ............... WU true angle range is : 0.009971 Sigma 66 Sigma > GaussTOffsetStop: 66 > -2 Thread call stack limit is: 1k Pulse: peak=1.618832, time=45.86, period=2.334, d_freq=8507364409.96, score=1.005, chirp=12.084, fft_len=1024 Pulse: peak=2.241779, time=45.86, period=4.099, d_freq=8507366942.69, score=1.001, chirp=12.236, fft_len=1024 Triplet: peak=12.25685, time=47.08, period=24.96, d_freq=8507362703.79, chirp=-24.472, fft_len=64 Pulse: peak=4.555198, time=45.84, period=10.72, d_freq=8507360686.14, score=1.001, chirp=-24.778, fft_len=512 Triplet: peak=11.89313, time=42.77, period=29.39, d_freq=8507368499.05, chirp=-27.837, fft_len=1024 Pulse: peak=1.695994, time=45.82, period=2.573, d_freq=8507368463.68, score=1.008, chirp=-32.119, fft_len=256 Pulse: peak=5.521799, time=45.9, period=11.81, d_freq=8507368132.32, score=1.019, chirp=32.541, fft_len=2k Triplet: peak=11.37938, time=25.03, period=6.129, d_freq=8507361389.06, chirp=-34.26, fft_len=32 Pulse: peak=4.671387, time=45.84, period=11.18, d_freq=8507360782.25, score=1.025, chirp=-34.872, fft_len=512 Pulse: peak=1.670215, time=45.84, period=2.807, d_freq=8507366596.34, score=1.01, chirp=-35.79, fft_len=512 Triplet: peak=11.14148, time=32.66, period=27.32, d_freq=8507369540.28, chirp=38.543, fft_len=256 Pulse: peak=9.656481, time=46.17, period=26.49, d_freq=8507370526.14, score=1.051, chirp=-53.924, fft_len=8k Pulse: peak=3.450325, time=45.84, period=6.297, d_freq=8507368261.54, score=1.039, chirp=56.133, fft_len=512 Pulse: peak=4.771037, time=45.82, period=9.306, d_freq=8507367791.5, score=1.008, chirp=-62.404, fft_len=128 Pulse: peak=7.937146, time=45.86, period=18.4, d_freq=8507361460.06, score=1.023, chirp=62.786, 
fft_len=1024 Pulse: peak=2.265188, time=45.86, period=3.909, d_freq=8507367626.89, score=1.013, chirp=-63.016, fft_len=1024 Pulse: peak=4.566975, time=45.84, period=11.22, d_freq=8507371236.36, score=1.002, chirp=68.368, fft_len=512 Pulse: peak=3.375987, time=45.84, period=7.147, d_freq=8507371715.3, score=1.013, chirp=75.404, fft_len=512 Pulse: peak=4.062914, time=45.9, period=8.814, d_freq=8507370700.05, score=1.11, chirp=-83.167, fft_len=2k Pulse: peak=6.64492, time=45.86, period=15.99, d_freq=8507365116.11, score=1.028, chirp=85.728, fft_len=1024 setiathome_CUDA: Found 4 CUDA device(s): Device 1: GeForce GTX 1080 Ti, 11178 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 28 pciBusID = 2, pciSlotID = 0 Device 2: GeForce GTX 1080 Ti, 11178 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 28 pciBusID = 3, pciSlotID = 0 Device 3: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 15 pciBusID = 1, pciSlotID = 0 Device 4: GeForce GTX 1070, 8118 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 15 pciBusID = 4, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 2 setiathome_CUDA: CUDA Device 2 specified, checking... Device 2: GeForce GTX 1080 Ti is okay SETI@home using CUDA accelerated device GeForce GTX 1080 Ti Using pfb = 32 from command line args Unroll autotune 28. Overriding Pulse find periods per launch. Parameter -pfp set to 28 Restarted at 61.11 percent, with setiathome enhanced x41p_V0.97b2, Cuda 9.20 special Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements. Sigma 66 Sigma > GaussTOffsetStop: 66 > -2 Thread call stack limit is: 1k Find triplets Cuda kernel encountered too many triplets, or bins above threshold, reprocessing this PoT on CPU... err = 1 </stderr_txt> ]]> As you can see, the WU was interrupted by the rescheduler, and when it resumed its processing was redirected to the CPU. 
In this case, since I know how to quickly identify the error, I manually check for any "limbo" WU after I run the rescheduler and abort the WU if that is happening. How to identify it? The elapsed-time counter keeps increasing while the progress percentage stays unchanged for a long time. If you need more examples, go to my errored WUs and look at the manually aborted ones. There are very few, since I have already learned how to avoid the trouble: just don't run the rescheduler when any WU is more than 2/3 crunched. Wait until it finishes before starting the rescheduler. It takes some practice in a 4xGPU environment like ours. |
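The two checks described in the post above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of any rescheduler: the marker string is copied from the stderr output quoted above, while the function names and the idea of passing in a list of progress fractions are assumptions made for the example.

```python
# Substring copied from the stderr output quoted in the post above.
CPU_FALLBACK_MARKER = "reprocessing this PoT on CPU"


def fell_back_to_cpu(stderr_text: str) -> bool:
    """Return True if the task's stderr shows the GPU-to-CPU fallback message."""
    return CPU_FALLBACK_MARKER in stderr_text


def safe_to_reschedule(fractions_done, limit=2 / 3):
    """Apply the poster's rule of thumb: do not run the rescheduler while
    any running WU is more than 2/3 crunched.

    fractions_done is a hypothetical list of progress values in [0, 1],
    one per running task; how you obtain them is up to you.
    """
    return all(fraction <= limit for fraction in fractions_done)
```

For example, `safe_to_reschedule([0.10, 0.55])` allows a reschedule, while `safe_to_reschedule([0.10, 0.70])` says to wait for the second task to finish.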
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
What rescheduler are you using that can stop BOINC? I am not aware of any. All of the reschedulers that I have used, and I have used them all at one time or another, state that you have to stop BOINC first before rescheduling. So it is up to you to stop processing first before running a rescheduler. The normal caveats of the special app apply, of course. Don't suspend them or stop them midway. Try to always stop them before they have written any checkpoints into the slots. And always make sure any finished tasks have fully reported and uploaded before stopping BOINC. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
It's just one of those caveats with rescheduling: you have to pay attention to what's going on. If you have your checkpoints set to longer than any GPU task ever runs, it shouldn't be an issue. Or, if you have it set to less than that, you have to watch that you don't shut down BOINC when a GPU task has had a chance to drop a checkpoint. If you don't watch what is going on, sooner or later you will get burnt. I also suspend my pending GPU tasks and let it complete what is being worked on before shutting it down, which is pretty easy if you also use BoincTasks. It is extra work, but worth it in the end. To be honest, the checkpoint function should really be completely disabled now in the 'sauce.' Even for my 750Ti's it would be a great loss to lose what has been processed. And I do see that it would be almost impossible to have a reliable checkpoint with synchronous processing where there are upwards of dozens of 'in progress' processes, each with its own set of info ... and what if the task restarts on a GPU with a different number of CUs ... the possibilities are endless. My Ryzen frequently has checkpoint problems with all the crashes it goes through, so I have to pay attention to it on each reboot. If a GPU task starts and doesn't do anything in the first 3 minutes ... a Suspend/ClearSlot/Resume is needed or it will sit there until it times out in 20 or so minutes. EDIT: I am not going to fix all my typos, just read around them :D |
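The "elapsed time keeps growing but progress does not move" symptom mentioned in this thread can be expressed as a small check. The sketch below is only an illustration: the `Sample` type, the function name and the way samples are collected are all hypothetical, and the 180-second default simply mirrors the "first 3 minutes" rule of thumb from the post above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of a running task (hypothetical structure)."""
    elapsed_s: float       # elapsed run time in seconds
    fraction_done: float   # progress in [0, 1]


def is_stalled(samples, min_stall_s=180.0):
    """Return True if progress has not changed for at least min_stall_s
    seconds of elapsed time, looking back from the most recent sample."""
    if len(samples) < 2:
        return False
    last = samples[-1]
    stall_start = last.elapsed_s
    # Walk backwards while the progress value is identical to the latest one.
    for sample in reversed(samples):
        if sample.fraction_done != last.fraction_done:
            break
        stall_start = sample.elapsed_s
    return last.elapsed_s - stall_start >= min_stall_s
```

A task sampled at 0, 60 and 200 seconds with progress stuck at 0.0 is flagged; a task whose progress moves between samples is not.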
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874 |
It would be perfectly possible for a rescheduler to stop and restart BOINC - I would be extremely surprised if nobody has nicked the code from knabench by now. It includes the code for retrieving the installation paths from the registry. (I'm talking Windows, of course - Linux users will have to roll their own, as always). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, I was mistaken. I see that it is possible with Jeff's rescheduler. It is in the readme as possible but untested in the early versions. I have always played safe and stopped BOINC on my own terms before rescheduling. Haven't run into any issues that way either. I have used its restart BOINC dialog after it has finished rescheduling at times with no troubles too. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
I have always played safe and stopped BOINC on my own terms before rescheduling. Just out of curiosity, what happens if you manually stop the WU in the middle of the crunching process? Could that lead to the same issue? Or is it different? |
Sleepy Send message Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
Install a 7zip application, it is similar to WinZip. As I said, of course I tried all this. Both in Linux and Windows (with... 7-Zip, which should suit your advice and that I have been using regularly for years and which usually opens any kind of archive). No joy. So I would be grateful if anyone could advise me about not just "a" 7Zip application, but "the" 7zip application which works with these files, since I have already tested several in both worlds to no avail before asking. Thank you in advance. Sleepy |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
<Scratching head> So you have the downloads but they won't open? That is strange, maybe they were saved HTML pages by mistake? It is just a thought ... I just looked at a couple of RAW 7zip files in notepad, and they seem to start with the characters "7z" followed by the compressed 'garbage' text. It might be worth looking at the RAW file to see if that is what you have. Again, just a thought ... |
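Looking at the first bytes of the download can also be done programmatically. A genuine 7z archive begins with the six-byte signature `37 7A BC AF 27 1C` (the "7z" seen in notepad plus four binary bytes), while a saved web page typically begins with `<!DOCTYPE html` or `<html`. The following is a minimal sketch; the function names and the three labels are made up for illustration.

```python
# Six-byte signature at the start of every 7z archive.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def classify_header(head: bytes) -> str:
    """Classify the leading bytes of a file as '7z', 'html' or 'unknown'."""
    if head.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    text_start = head.lstrip().lower()
    if text_start.startswith(b"<!doctype html") or text_start.startswith(b"<html"):
        return "html"
    return "unknown"


def classify_download(path: str) -> str:
    """Read the first bytes of the file at path and classify them."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(512))
```

Calling `classify_download` on a file that was really saved as a web page returns `"html"`, which would explain why no archive tool can open it.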
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14680 Credit: 200,643,578 RAC: 874 |
The normal tool would be 7-zip, but like all software, it has undergone revisions over the years, and sometimes those revisions add newer, faster and/or more efficient modes of compression. If you haven't done so recently, I would download a fresh copy from https://www.7-zip.org/download.html, and see if that can handle the files. |
Sleepy Send message Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
<Scratching head> So you have the downloads but they won't open? That is strange, maybe they were saved HTML pages by mistake? It is just a thought ... Dear Brent, you won the prize. To shield myself from the messages about the failed Drive attempts to preview the files, after the first downloads I began to make direct saves from the links. And for some reason, I actually downloaded the HTML code instead of the files. And that seemed to happen only for the 7zip files, hence my doubts. So, the simple remedy would be downloading the files again, but as everybody knows, they are presently unavailable. In any case, I am not the one who downloaded the files 100,000 times yesterday! ;-) Sleepy |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Dear Brent, OMG, I won, finally a toaster all to myself :D |
Sleepy Send message Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
OMG, I won, finally a toaster all to myself :D It is flying to you as I type! ;-) https://www.youtube.com/watch?v=0Cm7tv5cM8g Sleepy |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13864 Credit: 208,696,464 RAC: 304 |
OK, we're back again. For now. Grant Darwin NT |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I have always played safe and stopped BOINC on my own terms before rescheduling. If your checkpoint interval is set longer than the crunching time of the task, then the task just starts over from zero when you restart BOINC. No harm, no foul. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, we're back again. Yes, the daily glitch was causing the site to hang for minutes and then time out. Looks like that is over now. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
It would be perfectly possible for a rescheduler to stop and restart BOINC - I would be extremely surprised if nobody has nicked the code from knabench by now. It includes the code for retrieving the installation paths from the registry. (I'm talking Windows, of course - Linux users will have to roll their own, as always). . . Stubbles' script does that. It stops and restarts BOINC. But for some reason the restart does not work with the later version of BOINC and I need to do that manually. Stephen <shrug> |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The SSP doesn't appear to be over it. I'm not seeing a thing. https://setiathome.berkeley.edu/show_server_status.php Is it generating tasks...or not. +++++++++++++++++++++++++ That seemed to work. Now it's back. ----------------------------- Now gone again.... |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Yes, I was mistaken. I see that it is possible with Jeff's rescheduler. It is in the readme as possible but untested in the early versions. I have always played safe and stopped BOINC on my own terms before rescheduling. Haven't run into any issues that way either. I have used its restart BOINC dialog after it has finished rescheduling at times with no troubles too. . . With Linux and the checkpoint issue it is best to do it manually. Stephen :) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
I have always played safe and stopped BOINC on my own terms before rescheduling. . . If it has made a checkpoint it will resume and then it is a lottery whether or not it will fail. If it has not made a checkpoint it will restart from scratch and will not fail. Stephen :) |
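The rule of thumb stated in the last few posts can be written as a tiny decision function. This is a deliberately simplified model, not how BOINC or the special app actually decides when to checkpoint: it assumes a checkpoint exists once the task has run for at least the configured interval, and the function name and return strings are invented for the example.

```python
def expected_restart(elapsed_s: float, checkpoint_interval_s: float) -> str:
    """Predict what happens to a GPU task if BOINC is stopped now.

    Simplifying assumption: a checkpoint has been written if and only if
    the task has run for at least checkpoint_interval_s seconds.
    """
    checkpoint_written = elapsed_s >= checkpoint_interval_s
    if checkpoint_written:
        # Per the thread: the task resumes from the checkpoint and may fail.
        return "resumes from checkpoint (may fail)"
    # Per the thread: no checkpoint means a clean restart from zero.
    return "restarts from zero (safe)"
```

With an interval longer than any GPU task ever runs, for example `expected_restart(90, 3600)`, the function always reports a clean restart, which matches the advice given above.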