Panic Mode On (93) Server Problems?

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1615919 - Posted: 18 Dec 2014, 19:31:47 UTC - in response to Message 1615917.  


So, why is my out-of-work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again; now it's up to a 40-minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed responses. This is to avoid DOSing the servers. It's the same no matter which BOINC version or OS is used.

So if your machine runs out of work, you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my best machine being punished? Why is the normal 5 minutes not enough? The 5-minute delay works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers".

There is a 24-hour limit to the backoff, which is a lot better than DA's originally intended limit of 2 weeks.

So tell me why the CPUs don't have this delay. That's right, the CPUs don't have a delay, just the GPUs. So your attempted explanation fails. Can someone tell me why there is a 40-minute delay on the GPUs but not the CPUs?

Both CPUs and GPUs have the same backoff rules, as can be seen by turning on <work_fetch_debug>. But you would need to study a stable configuration over time, to see when backoffs are applied and when they are cleared.

My CPUs are not showing any work fetch deferral interval. The GPUs are. I'll bet if I increase the cache setting I will receive CPU tasks with mixed VLARs & non-VLARs; been there, done that. But the server is refusing to send those same non-VLARs to my GPUs.
Why?

To be honest, I don't know. But then again, I'm not the systems analyst responsible for designing a system that distributes viable workunits across a mixed fleet of ~150,000 active computers. I simply observe that the current processing rate (returned results) is very much in line with the long-term average - so the project as a whole is working as required.

Well, we're both in the same boat then. There isn't any reason for the CPUs to not have a work fetch deferral interval while the GPUs have a 40-minute interval. I suspect it's related to why the server will attempt to fill the CPU cache first, though, even while the GPUs are out of work. I believe there's a thread about that around here...
ID: 1615919
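For reference, the <work_fetch_debug> flag mentioned above is one of BOINC's standard log flags. It is switched on via cc_config.xml in the BOINC data directory, and picked up with boinccmd --read_cc_config or a client restart:

    <cc_config>
      <log_flags>
        <work_fetch_debug>1</work_fetch_debug>
      </log_flags>
    </cc_config>

With the flag on, the event log shows the per-resource work-fetch decisions, including any backoff currently applied to the CPU or to each GPU.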
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1615921 - Posted: 18 Dec 2014, 19:39:17 UTC - in response to Message 1615919.  


Well, we're both in the same boat then. There isn't any reason for the CPUs to not have a work fetch deferral interval while the GPUs have a 40-minute interval. I suspect it's related to why the server will attempt to fill the CPU cache first, though, even while the GPUs are out of work. I believe there's a thread about that around here...

Yes, that's correct. The specific reason why you have a work fetch deferral interval of 40 minutes on the GPUs is that the last four successive requests for GPU work received a 'no work allocated' reply. I have a 76,800-second (roughly 21-hour) deferral on one of my CPU projects, for much the same reason (it's out of work). You'll probably find that you received work in response to your most recent CPU request, if you look back that far.
ID: 1615921
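As a rough illustration of the mechanism Richard describes: the deferral roughly doubles on each consecutive 'no work allocated' reply, from a short base interval up to the 24-hour cap. The sketch below is a model of that behaviour in Python, not code from the BOINC client; the function names and the exact randomization range are invented for illustration, and the 5-minute base and 24-hour cap are the figures quoted in this thread.

    import random

    BASE_S = 5 * 60        # 5-minute base deferral (seconds)
    CAP_S = 24 * 60 * 60   # the 24-hour ceiling mentioned above

    def deferral_ceiling(failures: int) -> int:
        # Upper bound after `failures` consecutive 'no work allocated'
        # replies: the base doubles per failure, capped at 24 hours.
        return min(BASE_S * 2 ** max(failures - 1, 0), CAP_S)

    def next_deferral(failures: int) -> float:
        # Randomized so a fleet of hosts doesn't retry in lockstep.
        ceiling = deferral_ceiling(failures)
        return random.uniform(ceiling / 2, ceiling)

    # Four successive empty replies give a 40-minute ceiling,
    # matching the GPU deferral reported above.
    for n in range(1, 5):
        print(f"{n} failure(s): up to {deferral_ceiling(n) // 60} min")

Run as-is, this prints ceilings of 5, 10, 20 and 40 minutes for one to four consecutive empty replies.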
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1615925 - Posted: 18 Dec 2014, 19:47:18 UTC - in response to Message 1615921.  
Last modified: 18 Dec 2014, 19:51:28 UTC

Yes, here it is: AstroPulse Work Fetch Thread

Would someone PLEASE change the server back to attempting to fill the GPU cache first, the way it was before late September...

It's now up to an 80-minute interval for the GPUs, while the CPUs don't have a deferral.
ID: 1615925
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1615937 - Posted: 18 Dec 2014, 20:16:19 UTC - in response to Message 1615899.  


BOINC has an incremental delay built in for failed responses. This is to avoid DOSing the servers. It's the same no matter which BOINC version or OS is used.

Pre-BOINC v7 doesn't have the same high delay. It's just silly.

Using Linux, I can have cron run a script every six minutes that tails the last line of stdoutdae.txt and, if there has been no change in the last line since the last run, issues the update command, which forces a contact that results in new work or at least a statement that there is none available. Perhaps Windows users can have the Task Scheduler run a batch file every so often to accomplish the same thing.
ID: 1615937
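For anyone wanting to try that, here is a rough Python equivalent of the cron job OTS describes. The log path, state-file location and project URL are illustrative assumptions; boinccmd --project <URL> update is the standard way to force a scheduler contact.

    #!/usr/bin/env python3
    # If the last line of BOINC's log hasn't changed since the previous
    # run, force a scheduler contact. Run from cron every ~6 minutes.
    import pathlib
    import subprocess

    LOG = pathlib.Path("/var/lib/boinc-client/stdoutdae.txt")  # assumed path
    STATE = pathlib.Path("/tmp/last_boinc_log_line")           # scratch file
    PROJECT = "http://setiathome.berkeley.edu/"

    lines = LOG.read_text(errors="replace").rstrip().splitlines()
    last_line = lines[-1] if lines else ""
    previous = STATE.read_text() if STATE.exists() else None

    if last_line == previous:
        # No new log activity since the last run: ask the project for work.
        subprocess.run(["boinccmd", "--project", PROJECT, "update"],
                       check=False)

    STATE.write_text(last_line)

Note that forcing updates this way deliberately sidesteps the client's backoff, which is exactly what the backoff exists to prevent, so use it sparingly.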
S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1615949 - Posted: 18 Dec 2014, 20:43:20 UTC

Well, that's it... back to 100% MB. If the servers are so kind as to throw me a couple of GPU tasks in the process...
ID: 1615949
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1615968 - Posted: 18 Dec 2014, 21:56:12 UTC

Hmmm, just got the SETI 'why bother?' donation-request email in. The first paragraph being:
SETI@home has been running for over a decade, harnessing the power of millions of computers around the world in the search for extraterrestrial intelligence. We've observed for thousands of hours on some of the world's largest telescopes. Volunteers like you have donated immense amounts of computing time. And many of you have also donated money, or donated your time to help other users on our online forums. Yet we've found nothing. Zip. Not a peep from ET in all those terabytes of data. So why bother?

Uh hold on... we've found nothing? So does that mean that Nitpicker works and has been going through those terabytes of compiled data? Why weren't we informed?
ID: 1615968
ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1615979 - Posted: 18 Dec 2014, 22:19:49 UTC
Last modified: 18 Dec 2014, 22:21:08 UTC

Oh, No! Now I'm out of GPU work. What's going on?
Panic Mode is definitely ON.
ID: 1615979
betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1616034 - Posted: 19 Dec 2014, 1:20:38 UTC - in response to Message 1615979.  

As well it should be.
ID: 1616034
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1616059 - Posted: 19 Dec 2014, 4:07:43 UTC - in response to Message 1615951.  

The "Results out in the field" for AP's are falling at every update of the SSP, despite that the AP splitters shows as running.


They seem to be going up now, but ever so sloooowly.



As of 19 Dec 2014, 1:00:05 UTC it was 32,493.

As of 19 Dec 2014, 4:00:05 UTC it is 33,310.
ID: 1616059
EdwardPF
Volunteer tester

Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1616065 - Posted: 19 Dec 2014, 4:18:56 UTC
Last modified: 19 Dec 2014, 4:24:21 UTC

I'll just throw this out to see how bad my math is ...

A typical AP WU is about 8195 KB, or about 8 MB.

The difference between pre-AP Cricket stats and post-AP going-to-us (in) Cricket stats is about

75Mb/sec and 250Mb/sec ... now let me see

250 - 75 --> 175 Mb/sec for outgoing (in) AP WU's this is about 140 MB/sec ...

140 MB per sec at 8Mb per WU is about 17.5 AP WU going out to us per sec.

Is this somewhat near what is going on with the AP creation now??

if not ... is there a better way to get AP WU creation rates??

Ed F

edit:
As of 19 Dec 2014, 1:00:05 UTC it was 32,493.
As of 19 Dec 2014, 4:00:05 UTC it is 33,310.


or 272 WU/hr more going out than coming in, or about .075 per sec
ID: 1616065
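The arithmetic in that edit checks out:

    # "Results out in the field" rose from 32,493 to 33,310 over 3 hours.
    delta = 33310 - 32493        # 817 more results in the field
    per_hour = delta / 3         # ~272 WU/hr net outflow
    per_sec = per_hour / 3600    # ~0.076 WU/sec (Ed truncated to .075)
    print(delta, round(per_hour), round(per_sec, 3))  # 817 272 0.076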
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1616072 - Posted: 19 Dec 2014, 4:48:21 UTC - in response to Message 1616065.  

If I were a cynical person, I would postulate that they installed a new protocol where you have to download X amount of Multibeams in order to get 1 Astropulse...

But, that's just me....

But it really does seem like that, doesn't it??
ID: 1616072
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1616075 - Posted: 19 Dec 2014, 5:22:11 UTC - in response to Message 1616065.  

A typical AP WU is about 8195 KB, or about 8 MB.

250 - 75 --> 175 Mb/sec for outgoing (in) AP WU's this is about 140 MB/sec ...

140 MB per sec at 8Mb per WU is about 17.5 AP WU going out to us per sec.

Is this somewhat near what is going on with the AP creation now??
...

The Cricket graphs are in megabits per second, while an AP WU is about 8 megabytes. So the 140 Mbps is about 17.5 MB/sec, and dividing by 8 MB gives around 2.2 AP WUs per second. That may be a reasonable approximation; it's about half the creation rate we saw when AP v6 splitting was going well.
                                                                  Joe
ID: 1616075
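Josef's unit correction, redone step by step with the thread's round numbers:

    # Cricket graphs are in megabits/s; an AP workunit is ~8 megabytes.
    extra_mbit_s = 140           # extra traffic figure from the posts above
    mbyte_s = extra_mbit_s / 8   # 8 bits per byte  -> 17.5 MB/sec
    ap_wu_s = mbyte_s / 8        # ~8 MB per AP WU  -> ~2.2 WU/sec
    print(mbyte_s, round(ap_wu_s, 2))  # prints: 17.5 2.19

The factor-of-eight confusion between megabits and megabytes is what turned Ed's estimate of ~17.5 WU/sec into Josef's ~2.2 WU/sec.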
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1616085 - Posted: 19 Dec 2014, 6:18:32 UTC

You're pretty much not going to see any high production out of the AP splitters while 3 or 4 of them are working on 1 file, as they've been doing for the last few months.

Currently file 26my14ab has 4 splitters tied up.

Cheers.
ID: 1616085
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1616113 - Posted: 19 Dec 2014, 7:11:35 UTC - in response to Message 1616072.  

If I were a cynical person, I would postulate that they installed a new protocol where you have to download X amount of Multibeams in order to get 1 Astropulse...

But, that's just me....

But it really does seem like that, doesn't it??

The amount of AP work split has always been a fraction of the MB work. As Wiggo mentioned, multiple splitters working on the one file reduces the output further.
Grant
Darwin NT
ID: 1616113
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1616148 - Posted: 19 Dec 2014, 8:23:42 UTC

Well, if you do not have enough work, Pirates is dealing out work again as well. The only caveats: max 1 task per CPU core no matter your cache setting, and a one-hour scheduler wait between contacts. Try to force a new contact? The hour resets.

Remember, Pirates is for fun, not for credits.
ID: 1616148
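A toy model of the scheduler rule Jord describes, where any contact made before the hour is up restarts the hour; the class and method names are hypothetical, not Pirates code:

    WAIT_S = 3600  # one-hour scheduler wait between contacts

    class SchedulerGate:
        # Toy model: a contact before the hour elapses resets the clock.
        def __init__(self) -> None:
            self.last_contact = float("-inf")

        def request_work(self, now: float) -> bool:
            too_soon = now - self.last_contact < WAIT_S
            self.last_contact = now  # every contact restarts the hour
            return not too_soon      # work only after a full quiet hour

    gate = SchedulerGate()
    print(gate.request_work(0.0))     # True: first contact succeeds
    print(gate.request_work(1800.0))  # False: too soon, and the hour resets
    print(gate.request_work(3600.0))  # False: still under an hour since reset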
Darth Beaver Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1616172 - Posted: 19 Dec 2014, 9:32:33 UTC
Last modified: 19 Dec 2014, 9:36:35 UTC

The cricket graph for the month:

Is Seti experiencing a DDoS attack of some sort? The incoming part of the graph seems to be too high???

24 hrs and still not even 1 GPU task. 2 GPUs sitting there doing F-all because of no work at all. Gggggggrrrrrrrrrr
ID: 1616172
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1616175 - Posted: 19 Dec 2014, 9:41:29 UTC - in response to Message 1616172.  

Is Seti experiencing a DDoS attack of some sort? The incoming part of the graph seems to be too high???

The high blue line areas are work being sent to the Colo for us to crunch, whereas the higher green areas are APs finally being put out to us, Glenn. ;-)

[edit] We're also having a VLAR storm, which means that GPU work will suffer.

Cheers.
ID: 1616175
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1616176 - Posted: 19 Dec 2014, 9:45:35 UTC - in response to Message 1616175.  

[edit] We're also having a VLAR storm, which means that GPU work will suffer.

Also a high proportion of VHARs overnight, which means that the feeder cache gets sucked dry more quickly and more often (at least the non-VLAR tasks in the cache).
ID: 1616176
Darth Beaver Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1616185 - Posted: 19 Dec 2014, 10:01:18 UTC - in response to Message 1616175.  

The high blue line areas are work being sent to the Colo for us to crunch, whereas the higher green areas are APs finally being put out to us, Glenn. ;-)


Now I'm confused. If the green line is bits IN and the blue is bits OUT, how can the green be work going out to us? I would have thought the blue line is sending stuff out to us and the green is what is coming in to them?

Or are the legend symbols at the bottom of the graph misleading??

This is the other cricket graph, the daily one, which shows bugger-all coming in and bugger-all going out, so I'm confused.
ID: 1616185
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1616190 - Posted: 19 Dec 2014, 10:10:35 UTC - in response to Message 1616185.  

Now I'm confused. If the green line is bits IN and the blue is bits OUT, how can the green be work going out to us?

The cricket graphs are a network management tool designed for the benefit of the campus network management team.

They show the data from the point of view of the routing hardware between us and the SETI servers. The particular port we most commonly monitor shows our uploads leaving the router on their onward journey to the servers (hence outbound from the router - blue), and our downloads arriving from the servers (hence inbound - green) ready to be forwarded on to us.
ID: 1616190