Panic Mode On (106) Server Problems?

Author	Message
kittyman Volunteer tester Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1874839 - Posted: 24 Jun 2017, 11:34:31 UTC Last modified: 24 Jun 2017, 11:40:12 UTC
Ruh roh, astrokitty............. Got a download that won't start. Project communication failed. Uploads still OK. Hope it's an isolated incident. Meow.
EDIT........ And of course, although it went through lots of retries and failed, as soon as I post about it...voila. I will say that NV GPU work has either been in short supply, or the scheduler is not readily handing it out. I am just 27 tasks short of full cache across 5 rigs, so not a big problem here. I have noticed over the last few days that my cache runs down for a while and then the rigs seem to get big hauls to fill it back up again.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1874865 - Posted: 24 Jun 2017, 13:36:55 UTC - in response to Message 1874839.
"EDIT........ And of course, although it went through lots of retries and failed, as soon as I post about it...voila. I will say that NV GPU work has either been in short supply, or the scheduler is not readily handing it out. I am just 27 tasks short of full cache across 5 rigs, so not a big problem here. I have noticed over the last few days that my cache runs down for a while and then the rigs seem to get big hauls to fill it back up again."
Here I am going through long periods with no downloads regardless of how many results I upload, then suddenly I will get some, but only 10 to 30 depending on which machine it is.
Stephen :(

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1874879 - Posted: 24 Jun 2017, 15:57:00 UTC - in response to Message 1874865.
Similar experiences across all machines. I can be as much as 100 tasks down from full, then I will slowly refill back to full. I ONLY EVER get downloads in maximum batches of 20 though. I always wonder how some people report that they get a single download of as many as 50. What are they doing differently?
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1874881 - Posted: 24 Jun 2017, 16:05:21 UTC - in response to Message 1874879.
I've seen my 1080s fill up with (I think) 83 and 107 for downloads, then a few batches of 10-20 to complete.

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1874882 - Posted: 24 Jun 2017, 16:11:13 UTC - in response to Message 1874881.
Do you have any non-default settings in cc_config, such as max_file_xfers or max_file_xfers_per_project? Have you always received download batches of that size at the start of a refill?

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1874883 - Posted: 24 Jun 2017, 16:20:51 UTC - in response to Message 1874882.
I wouldn't say it's uncommon to see 60-120 tasks come in after, say, maintenance (if they are available). I don't think I have anything special set; let me look. Pertaining to transfers:
<fetch_minimal_work>0</fetch_minimal_work>
<fetch_on_update>1</fetch_on_update>
<max_file_xfers>8</max_file_xfers>
<max_file_xfers_per_project>4</max_file_xfers_per_project>
<max_tasks_reported>0</max_tasks_reported>
<no_info_fetch>0</no_info_fetch>
<report_results_immediately>0</report_results_immediately>
But then too, I'm also asking for more tasks with my requests most times.

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1874884 - Posted: 24 Jun 2017, 16:35:18 UTC - in response to Message 1874883.
Hi Brent, thanks for the snippet of your cc_config. I see a couple of things that differ from mine. The major one is fetch_on_update; I have mine at the default of 0. The only other thing is max_file_xfers_per_project at 4, where I have mine at the stock 2. I'm going to change mine and observe any differences.

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1874885 - Posted: 24 Jun 2017, 16:50:46 UTC - in response to Message 1874884. Last modified: 24 Jun 2017, 16:56:06 UTC
I think I see a reason, and it's probably due to all your rescheduling. Your GPU flops is only 247 compared to my 1,141, or 17%. So tasks/hour of requested time will only be 17% of mine.
EDIT: My 980+750Ti is 520 Gflops.
EDIT2: that's for your 2x1070 Ryzen, didn't look at the others.

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1874893 - Posted: 24 Jun 2017, 18:22:57 UTC - in response to Message 1874885.
Even before I began rescheduling I never got much past 280 Gflops on my GPUs for MB work. I haven't been able to reschedule in over a week because of the recent task mix. I haven't seen an Arecibo shorty on any of the CPUs in over a week. I only reschedule if I get some Arecibo shorties on the CPUs, since they process faster on the GPUs. It used to be that I would get a predominant slug of Arecibo shorties on the Ryzen CPU and the rest a mix of VLARs and BLC tasks. I found that the Ryzen especially likes BLC tasks on the CPU, and I would move them off the GPUs, which do worse on BLC than on Arecibo. But again, no sign of any shorties, so I can't use the rescheduler.

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1874894 - Posted: 24 Jun 2017, 18:26:33 UTC
I also see that you are running Linux and Petri's app, so that alone will skew the Gflops in your favor compared to me, running only Windows and the SoG app.

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1874895 - Posted: 24 Jun 2017, 19:24:01 UTC - in response to Message 1874894.
Something is causing that number to be artificially low. Some quick looking for reference:
Your Ryzen 2x1070 (gets a ++ for CPU output) - 263 Gflops, 65k RAC
My AMD on Linux, single 750Ti (no CPU tasks, only AP on CPU) - 281 Gflops, 15k RAC
Wiggo's i5 w/2x1060 - 393 Gflops, 46k RAC
Your Gflops are in the mud compared to others, and that is what the server uses to determine how many tasks to send you. So no wonder it only sends 20% of what you need to be full, and you just get trickles. None of your computers' Gflops come close to Wiggo's for GPU output on MB, but your RAC is way higher. The only thing I can think of is the rescheduling, or the amount of time not spent on Seti.
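Brent's argument above can be pictured with a rough sketch. It assumes, as he describes, that the server turns the seconds of work a host asks for into a task count using the host's Gflops figure. The function name, the one-hour request and the 30,000 Gflop task size are made up for illustration and are not taken from the BOINC source.

```python
def tasks_for_request(requested_seconds, host_gflops, task_gflop):
    """Estimate how many tasks fill a work request (simplified, hypothetical model)."""
    # Runtime the server would expect for one task on this host, in seconds.
    est_runtime = task_gflop / host_gflops
    # Whole tasks that fit into the requested amount of time.
    return int(requested_seconds // est_runtime)


# Hypothetical request of one hour of work, hypothetical task of 30,000 Gflop.
# The Gflops figures are the ones quoted in the post above.
for host, gflops in [("Ryzen 2x1070", 263), ("AMD 750Ti", 281), ("i5 2x1060", 393)]:
    print(host, tasks_for_request(3600, gflops, 30_000))
```

Under this model a host with a lower Gflops figure is sent fewer tasks for the same request, which is the effect Brent is pointing at. Whether the real scheduler behaves this way is exactly what is questioned later in the thread.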

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1874898 - Posted: 24 Jun 2017, 20:34:48 UTC - in response to Message 1874895.
"Something is causing that number to be artificially low. Some quick looking for reference: Your Ryzen 2x1070 (gets a ++ for CPU output) - 263 Gflops, 65k RAC. My AMD on Linux, single 750Ti (no CPU tasks, only AP on CPU) - 281 Gflops, 15k RAC. Wiggo's i5 w/2x1060 - 393 Gflops, 46k RAC. Your Gflops are in the mud compared to others, and that is what the server uses to determine how many tasks to send you. So no wonder it only sends 20% of what you need to be full, and you just get trickles. None of your computers' Gflops come close to Wiggo's for GPU output on MB, but your RAC is way higher. The only thing I can think of is the rescheduling, or the amount of time not spent on Seti."
I know that APR takes a long time to change or stabilize. The only thing I can think of is your comment about time NOT SPENT on SETI. I looked at Wiggo's APR and see that it is higher on his 1060s. I also see that he has worked only for SETI for a very long time, since his average credit for his other joined projects is 0. I do work for other projects: Einstein has the next highest resource share after SETI, followed by a minimal resource share on MilkyWay. As stated earlier, it's been a while since I did any rescheduling, so it is likely my APR being in the mud is because of resource share.

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1874903 - Posted: 24 Jun 2017, 22:00:56 UTC - in response to Message 1874879.
"Similar experiences across all machines. I can be as much as 100 tasks down from full, then I will slowly refill back to full. I ONLY EVER get downloads in maximum batches of 20 though. I always wonder how some people report that they get a single download of as many as 50. What are they doing differently?"
I don't think they are. 4 machines here and all behave differently. Two Linux, 2 Windows, all side by side, but one is getting consistent work (but only in 2s and 3s) and the others are not. Both Linux machines are configured pretty much the same, except the BOINC version on the one getting the work is the repository version. But it is the same BOINC release as the second machine.
If there is a formula for getting work consistently and in single downloads I wish I knew what it is.
Stephen :(

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1874907 - Posted: 24 Jun 2017, 22:18:11 UTC - in response to Message 1874895.
"Your Gflops are in the mud compared to others, and that is what the server uses to determine how many tasks to send you."
I thought it was just based on the difference between cache setting & work on hand? My faster machine is the one that has the most difficulty getting work, and it has a much, much, much higher APR than my slow machine.
The present work issues are affecting both of my machines, and unlike the usual problem, flipping application preferences isn't having any effect. There's still a lot of Arecibo VLAR work coming through, and other than 2 batches this morning of GBT work, GBT work has been noticeable in its absence during this particular bout of difficulty getting work.
Grant Darwin NT

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1874908 - Posted: 24 Jun 2017, 22:21:55 UTC - in response to Message 1874898. Last modified: 24 Jun 2017, 22:39:42 UTC
"I know that APR takes a long time to change or stabilize. The only thing I can think of is your comment about time NOT SPENT on SETI. I looked at Wiggo's APR and see that it is higher on his 1060s. I also see that he has worked only for SETI for a very long time, since his average credit for his other joined projects is 0. I do work for other projects: Einstein has the next highest resource share after SETI, followed by a minimal resource share on MilkyWay. As stated earlier, it's been a while since I did any rescheduling, so it is likely my APR being in the mud is because of resource share."
APR actually settles very quickly, usually in just a matter of hours for faster GPUs, and several days for slower or low core count CPUs. If you run more than 1 WU at a time on your GPU, your APR value will be way down compared to those that only run 1 WU at a time. APR is based on the time to process a WU; running 2 at a time, it takes longer to process each WU, hence the lower APR.
EDIT- changed a higher to lower.
Grant Darwin NT
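Grant's explanation of why running two WUs at once lowers APR can be sketched with a little arithmetic. This is a simplified model that assumes APR is roughly the work in a task divided by the time that task took; the function name and all numbers (30,000 Gflop per task, 100 s alone, 170 s each when two run together) are hypothetical.

```python
def apr_gflops(task_gflop, elapsed_seconds):
    """Per-task processing rate in Gflops (simplified, hypothetical model of APR)."""
    return task_gflop / elapsed_seconds


TASK_GFLOP = 30_000  # hypothetical work content of one WU

# One WU at a time: each finishes in 100 s (hypothetical).
apr_single = apr_gflops(TASK_GFLOP, 100)
# Two WUs at a time: each now takes 170 s (hypothetical).
apr_double = apr_gflops(TASK_GFLOP, 170)

# Throughput of the card in WUs per second for each setup.
throughput_single = 1 / 100
throughput_double = 2 / 170

print(f"APR single: {apr_single:.0f} Gflops, APR double: {apr_double:.0f} Gflops")
print(f"WUs/hour single: {throughput_single * 3600:.1f}, double: {throughput_double * 3600:.1f}")
```

With these made-up timings the card finishes more WUs per hour when running two at once, yet the per-task rate drops because each individual WU takes longer. That is the sense in which APR can be "way down" on a host that is not actually slower.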

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1874909 - Posted: 24 Jun 2017, 22:25:08 UTC - in response to Message 1874907.
"Your Gflops are in the mud compared to others, and that is what the server uses to determine how many tasks to send you."
"I thought it was just based on the difference between cache setting & work on hand? My faster machine is the one that has the most difficulty getting work, and it has a much, much, much higher APR than my slow machine. The present work issues are affecting both of my machines, and unlike the usual problem, flipping application preferences isn't having any effect. There's still a lot of Arecibo VLAR work coming through, and other than 2 batches this morning of GBT work, GBT work has been noticeable in its absence during this particular bout of difficulty getting work."
See, that is what I thought also. When you set work_fetch_debug, the output shows the shortfall in seconds between what your cache setting is and what you have on hand, for both CPU and GPU. If I understand the mechanism correctly, that means that at every request for work, the scheduler should send you the necessary seconds of work to get back to your cache setting, depending on work availability of course.
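The shortfall Keith describes can be sketched as follows. This is only an illustration of the idea as stated in the post (cache setting minus work on hand, in seconds); the function name and inputs are hypothetical, and the real client's work-fetch accounting has more detail than this.

```python
SECONDS_PER_DAY = 86_400


def shortfall_seconds(cache_days, queued_estimates, instances=1):
    """Seconds of work missing from the cache for one resource (simplified sketch).

    cache_days       -- hypothetical cache setting, in days of work
    queued_estimates -- estimated remaining runtime of each queued task, in seconds
    instances        -- number of CPU cores or GPUs sharing that queue
    """
    target = cache_days * SECONDS_PER_DAY * instances
    on_hand = sum(queued_estimates)
    # A full or over-full cache has no shortfall, so never go below zero.
    return max(0.0, target - on_hand)


# Hypothetical host: 1-day cache, one GPU, ten queued tasks of about an hour each.
print(shortfall_seconds(1, [3600] * 10))
```

In this model the client would ask the scheduler for that many seconds of work, and how many tasks that turns into depends on how long the server thinks each task will run on the host.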

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1874911 - Posted: 24 Jun 2017, 22:29:27 UTC - in response to Message 1874908.
"APR actually settles very quickly, usually just a matter of hours for faster GPUs. Several days for slower/ low core count CPUs. If you run more than 1 WU at a time on your GPU, your APR value will be way down compared to those that only run 1WU at a time. APR is based on the time to process a WU, running 2 at a time it takes longer to process the WUs, hence the higher APR."
Well, that is another factor against me, because I run two tasks per GPU card. I believe that Petri's app requires a single task per card. I don't know how many concurrent tasks per card, if any, Wiggo runs on his 1060s. I am trying to compare apples to apples here, for example two machines both on Windows 7, and not include the outliers of Linux machines and Petri's app.

Wiggo Joined: 24 Jan 00 Posts: 34748 Credit: 261,360,520 RAC: 489	Message 1874912 - Posted: 24 Jun 2017, 22:45:43 UTC - in response to Message 1874911.
"APR actually settles very quickly, usually just a matter of hours for faster GPUs. Several days for slower/ low core count CPUs. If you run more than 1 WU at a time on your GPU, your APR value will be way down compared to those that only run 1WU at a time. APR is based on the time to process a WU, running 2 at a time it takes longer to process the WUs, hence the higher APR."
"Well, that is another factor against me, because I run two tasks per GPU card. I believe that Petri's app requires a single task per card. I don't know how many concurrent tasks per card, if any, Wiggo runs on his 1060s. I am trying to compare apples to apples here, for example two machines both on Windows 7, and not include the outliers of Linux machines and Petri's app."
I only run single tasks on my cards, Keith.
Cheers.

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1874916 - Posted: 24 Jun 2017, 23:58:53 UTC - in response to Message 1874907.
"The present work issues are affecting both of my machines, and unlike the usual problem, flipping application preferences isn't having any effect. There's still a lot of Arecibo VLAR work coming through, and other than 2 batches this morning of GBT work, GBT work has been noticeable in its absence during this particular bout of difficulty getting work."
There are times when I wonder if the Scheduler checks for posts in this thread, and uses them to allocate work. After only getting dribs & drabs of Arecibo work after many requests for work over many hours, I then get 2 large batches of GBT work, after posting here about the lack of GBT work.
Grant Darwin NT

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1874918 - Posted: 25 Jun 2017, 0:04:51 UTC - in response to Message 1874916.
"The present work issues are affecting both of my machines, and unlike the usual problem, flipping application preferences isn't having any effect. There's still a lot of Arecibo VLAR work coming through, and other than 2 batches this morning of GBT work, GBT work has been noticeable in its absence during this particular bout of difficulty getting work."
"There are times when I wonder if the Scheduler checks for posts in this thread, and uses them to allocate work. After only getting dribs & drabs of Arecibo work after many requests for work over many hours, I then get 2 large batches of GBT work, after posting here about the lack of GBT work."
Hi Grant,
In that case I really, really need lots and lots of normal AR Arecibo tasks for my GPUs, which are running low .....
Stephen :)
