Panic Mode On (93) Server Problems?

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1615919 - Posted: 18 Dec 2014, 19:31:47 UTC - in response to Message 1615917.  


So, why is my out-of-work machine having to wait 20 minutes between useless requests when my other 2 machines aren't having to wait?
I just checked it again; now it's up to a 40-minute interval, while the GPUs are quiet.

BOINC has an incremental delay built in for failed responses. This is to avoid DOSing the servers. It's the same no matter which BOINC version or OS is used.

So if your machine runs out of work, you're SOL? My other 2 machines don't have this interval and are receiving work. Why is my best machine being punished? Why is the normal 5 minutes not enough? The 5-minute delay works, then it just sits there. One would think 5 minutes is enough to prevent "DOSing the servers".

There is a 24-hour limit to the backoff, which is a lot better than DA's originally intended limit of 2 weeks.

So tell me why the CPUs don't have this delay. That's right, the CPUs don't have a delay, just the GPUs. So your attempted explanation fails. Can someone tell me why there is a 40-minute delay on the GPUs but not the CPUs?

Both CPUs and GPUs have the same backoff rules, as can be seen by turning on <work_fetch_debug>. But you would need to study a stable configuration over time, to see when backoffs are applied and when they are cleared.

My CPUs are not showing any work fetch deferral interval. The GPUs are. I'll bet if I increase the cache setting I will receive CPU tasks with mixed VLARs & non-VLARs; been there, done that. But the server is refusing to send those same non-VLARs to my GPUs.
Why?

To be honest, I don't know. But then again, I'm not the systems analyst responsible for designing a system that distributes viable workunits across a mixed fleet of ~150,000 active computers. I simply observe that the current processing rate (returned results) is very much in line with the long-term average - so the project as a whole is working as required.

Well, we're both in the same boat then. There isn't any reason for the CPUs to not have a work fetch deferral interval while the GPUs have a 40-minute interval. I suspect it's related to why the server will attempt to fill the CPU cache first, though, even while the GPUs are out of work. I believe there's a thread about that around here...
ID: 1615919
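For reference, the <work_fetch_debug> flag mentioned above is one of BOINC's standard log flags. It is switched on via cc_config.xml in the BOINC data directory, and picked up with boinccmd --read_cc_config or a client restart:

    <cc_config>
      <log_flags>
        <work_fetch_debug>1</work_fetch_debug>
      </log_flags>
    </cc_config>

With the flag on, the event log shows the per-resource work-fetch decisions, including any backoff currently applied to the CPU or to each GPU.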
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1615921 - Posted: 18 Dec 2014, 19:39:17 UTC - in response to Message 1615919.  


Well, we're both in the same boat then. There isn't any reason for the CPUs to not have a work fetch deferral interval while the GPUs have a 40-minute interval. I suspect it's related to why the server will attempt to fill the CPU cache first, though, even while the GPUs are out of work. I believe there's a thread about that around here...

Yes, that's correct. The specific reason why you have a work fetch deferral interval of 40 minutes on the GPUs is that the last four successive requests for GPU work received a 'no work allocated' reply. I have a 76,800-second (roughly 21-hour) deferral on one of my CPU projects, for much the same reason (it's out of work). You'll probably find that you received work in response to your most recent CPU request, if you look back that far.
ID: 1615921
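As a rough illustration of the mechanism Richard describes: the deferral roughly doubles on each consecutive 'no work allocated' reply, from a short base interval up to the 24-hour cap. The sketch below is a model of that behaviour in Python, not code from the BOINC client; the function names and the exact randomization range are invented for illustration, and the 5-minute base and 24-hour cap are the figures quoted in this thread.

    import random

    BASE_S = 5 * 60        # 5-minute base deferral (seconds)
    CAP_S = 24 * 60 * 60   # the 24-hour ceiling mentioned above

    def deferral_ceiling(failures: int) -> int:
        # Upper bound after `failures` consecutive 'no work allocated'
        # replies: the base doubles per failure, capped at 24 hours.
        return min(BASE_S * 2 ** max(failures - 1, 0), CAP_S)

    def next_deferral(failures: int) -> float:
        # Randomized so a fleet of hosts doesn't retry in lockstep.
        ceiling = deferral_ceiling(failures)
        return random.uniform(ceiling / 2, ceiling)

    # Four successive empty replies give a 40-minute ceiling,
    # matching the GPU deferral reported above.
    for n in range(1, 5):
        print(f"{n} failure(s): up to {deferral_ceiling(n) // 60} min")

Run as-is, this prints ceilings of 5, 10, 20 and 40 minutes for one to four consecutive empty replies.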
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1615925 - Posted: 18 Dec 2014, 19:47:18 UTC - in response to Message 1615921.  
Last modified: 18 Dec 2014, 19:51:28 UTC

Yes, here it is: AstroPulse Work Fetch Thread

Would someone PLEASE change the server back to attempting to fill the GPU cache first, the way it was before late September...

It's now up to an 80-minute interval for the GPUs, while the CPUs don't have a deferral.
ID: 1615925
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1615937 - Posted: 18 Dec 2014, 20:16:19 UTC - in response to Message 1615899.  


BOINC has an incremental delay built in for failed responses. This is to avoid DOSing the servers. It's the same no matter which BOINC version or OS is used.

Pre-BOINC v7 doesn't have the same high delay. It's just silly.

Using Linux, I can have cron run a script every six minutes that tails the last line of stdoutdae.txt and, if there has been no change in the last line since the last run, issues the update command, which forces a contact that results in new work or at least a statement that there is none available. Perhaps Windows users can have the Task Scheduler run a batch file every so often to accomplish the same thing.
ID: 1615937
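For anyone wanting to try that, here is a rough Python equivalent of the cron job OTS describes. The log path, state-file location and project URL are illustrative assumptions; boinccmd --project <URL> update is the standard way to force a scheduler contact.

    #!/usr/bin/env python3
    # If the last line of BOINC's log hasn't changed since the previous
    # run, force a scheduler contact. Run from cron every ~6 minutes.
    import pathlib
    import subprocess

    LOG = pathlib.Path("/var/lib/boinc-client/stdoutdae.txt")  # assumed path
    STATE = pathlib.Path("/tmp/last_boinc_log_line")           # scratch file
    PROJECT = "http://setiathome.berkeley.edu/"

    lines = LOG.read_text(errors="replace").rstrip().splitlines()
    last_line = lines[-1] if lines else ""
    previous = STATE.read_text() if STATE.exists() else None

    if last_line == previous:
        # No new log activity since the last run: ask the project for work.
        subprocess.run(["boinccmd", "--project", PROJECT, "update"],
                       check=False)

    STATE.write_text(last_line)

Note that forcing updates this way deliberately sidesteps the client's backoff, which is exactly what the backoff exists to prevent, so use it sparingly.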
S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1615949 - Posted: 18 Dec 2014, 20:43:20 UTC

Well, that's it... back to 100% MB. If the servers are so kind as to throw me a couple of GPU tasks in the process...
ID: 1615949
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1615968 - Posted: 18 Dec 2014, 21:56:12 UTC

Hmmm, just got the SETI 'why bother?' donation-request email in. The first paragraph being:
SETI@home has been running for over a decade, harnessing the power of millions of computers around the world in the search for extraterrestrial intelligence. We've observed for thousands of hours on some of the world's largest telescopes. Volunteers like you have donated immense amounts of computing time. And many of you have also donated money, or donated your time to help other users on our online forums. Yet we've found nothing. Zip. Not a peep from ET in all those terabytes of data. So why bother?

Uh hold on... we've found nothing? So does that mean that Nitpicker works and has been going through those terabytes of compiled data? Why weren't we informed?
ID: 1615968
ReiAyanami
Joined: 6 Dec 05
Posts: 116
Credit: 222,900,202
RAC: 174
Japan
Message 1615979 - Posted: 18 Dec 2014, 22:19:49 UTC
Last modified: 18 Dec 2014, 22:21:08 UTC

Oh, No! Now I'm out of GPU work. What's going on?
Panic Mode is definitely ON.
ID: 1615979
betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1616034 - Posted: 19 Dec 2014, 1:20:38 UTC - in response to Message 1615979.  

As well it should be.
ID: 1616034
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1616059 - Posted: 19 Dec 2014, 4:07:43 UTC - in response to Message 1615951.  

The "Results out in the field" for AP's are falling at every update of the SSP, despite that the AP splitters shows as running.


They seem to be going up now, but ever so sloooowly.



As of 19 Dec 2014, 1:00:05 UTC it was 32,493.

As of 19 Dec 2014, 4:00:05 UTC it is 33,310.
ID: 1616059
EdwardPF
Volunteer tester

Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1616065 - Posted: 19 Dec 2014, 4:18:56 UTC
Last modified: 19 Dec 2014, 4:24:21 UTC

I'll just throw this out to see how bad my math is ...

A typical AP WU is about 8195 KB, or about 8 MB.

The difference between pre-AP Cricket stats and post-AP going-to-us (in) Cricket stats is about

75Mb/sec and 250Mb/sec ... now let me see

250 - 75 --> 175 Mb/sec for outgoing (in) AP WU's this is about 140 MB/sec ...

140 MB per sec at 8Mb per WU is about 17.5 AP WU going out to us per sec.

Is this somewhat near what is going on with the AP creation now??

if not ... is there a better way to get AP WU creation rates??

Ed F

edit:
As of 19 Dec 2014, 1:00:05 UTC it was 32,493.
As of 19 Dec 2014, 4:00:05 UTC it is 33,310.


or 272 WU/hr more going out than coming in, or about .075 per sec
ID: 1616065
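The arithmetic in that edit checks out:

    # "Results out in the field" rose from 32,493 to 33,310 over 3 hours.
    delta = 33310 - 32493        # 817 more results in the field
    per_hour = delta / 3         # ~272 WU/hr net outflow
    per_sec = per_hour / 3600    # ~0.076 WU/sec (Ed truncated to .075)
    print(delta, round(per_hour), round(per_sec, 3))  # 817 272 0.076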
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1616072 - Posted: 19 Dec 2014, 4:48:21 UTC - in response to Message 1616065.  

If I were a cynical person, I would postulate that they installed a new protocol where you have to download X amount of Multibeams in order to get 1 Astropulse...

But, that's just me....

But it really does seem like that, doesn't it??
ID: 1616072
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1616075 - Posted: 19 Dec 2014, 5:22:11 UTC - in response to Message 1616065.  

A typical AP WU is about 8195 KB, or about 8 MB.

250 - 75 --> 175 Mb/sec for outgoing (in) AP WU's this is about 140 MB/sec ...

140 MB per sec at 8Mb per WU is about 17.5 AP WU going out to us per sec.

Is this somewhat near what is going on with the AP creation now??
...

The Cricket graphs are in megabits per second, while an AP WU is about 8 megabytes. So the 140 Mbps is about 17.5 MB/sec, and dividing by 8 MB gives around 2.2 AP WUs per second. That may be a reasonable approximation; it's about half the creation rate we saw when AP v6 splitting was going well.
                                                                  Joe
ID: 1616075
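Josef's unit correction, redone step by step with the thread's round numbers:

    # Cricket graphs are in megabits/s; an AP workunit is ~8 megabytes.
    extra_mbit_s = 140           # extra traffic figure from the posts above
    mbyte_s = extra_mbit_s / 8   # 8 bits per byte  -> 17.5 MB/sec
    ap_wu_s = mbyte_s / 8        # ~8 MB per AP WU  -> ~2.2 WU/sec
    print(mbyte_s, round(ap_wu_s, 2))  # prints: 17.5 2.19

The factor-of-eight confusion between megabits and megabytes is what turned Ed's estimate of ~17.5 WU/sec into Josef's ~2.2 WU/sec.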
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1616085 - Posted: 19 Dec 2014, 6:18:32 UTC

You're pretty much not going to see any high production out of the AP splitters while 3 or 4 of them are working on 1 file, as they've been doing for the last few months.

Currently file 26my14ab has 4 splitters tied up.

Cheers.
ID: 1616085
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1616113 - Posted: 19 Dec 2014, 7:11:35 UTC - in response to Message 1616072.  

If I were a cynical person, I would postulate that they installed a new protocol where you have to download X amount of Multibeams in order to get 1 Astropulse...

But, that's just me....

But it really does seem like that, doesn't it??

The amount of AP work split has always been a fraction of the MB work. As Wiggo mentioned, multiple splitters working on the one file reduces the output further.
Grant
Darwin NT
ID: 1616113
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1616148 - Posted: 19 Dec 2014, 8:23:42 UTC

Well, if you do not have enough work, Pirates is dealing out work again as well. The only caveats: max 1 task per CPU core no matter your cache setting, and a one-hour scheduler wait between contacts. Try to force a new contact? The hour resets.

Remember, Pirates is for fun, not for credits.
ID: 1616148
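A toy model of the scheduler rule Jord describes, where any contact made before the hour is up restarts the hour; the class and method names are hypothetical, not Pirates code:

    WAIT_S = 3600  # one-hour scheduler wait between contacts

    class SchedulerGate:
        # Toy model: a contact before the hour elapses resets the clock.
        def __init__(self) -> None:
            self.last_contact = float("-inf")

        def request_work(self, now: float) -> bool:
            too_soon = now - self.last_contact < WAIT_S
            self.last_contact = now  # every contact restarts the hour
            return not too_soon      # work only after a full quiet hour

    gate = SchedulerGate()
    print(gate.request_work(0.0))     # True: first contact succeeds
    print(gate.request_work(1800.0))  # False: too soon, and the hour resets
    print(gate.request_work(3600.0))  # False: still under an hour since reset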
Darth Beaver Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1616172 - Posted: 19 Dec 2014, 9:32:33 UTC
Last modified: 19 Dec 2014, 9:36:35 UTC

The cricket graph for the month:

Is Seti experiencing a DDoS attack of some sort? The incoming part of the graph seems to be too high???

24 hrs and still not even 1 GPU task. 2 GPUs sitting there doing F-all because of no work at all. Gggggggrrrrrrrrrr
ID: 1616172
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1616175 - Posted: 19 Dec 2014, 9:41:29 UTC - in response to Message 1616172.  

Is Seti experiencing a DDoS attack of some sort? The incoming part of the graph seems to be too high???

The high blue line areas are work being sent to the Colo for us to crunch, whereas the higher green areas are APs finally being put out to us, Glenn. ;-)

[edit] We're also having a VLAR storm, which means that GPU work will suffer.

Cheers.
ID: 1616175
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1616176 - Posted: 19 Dec 2014, 9:45:35 UTC - in response to Message 1616175.  

[edit] We're also having a VLAR storm, which means that GPU work will suffer.

Also a high proportion of VHARs overnight, which means that the feeder cache gets sucked dry more quickly and more often (at least the non-VLAR tasks in the cache).
ID: 1616176
Darth Beaver Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1616185 - Posted: 19 Dec 2014, 10:01:18 UTC - in response to Message 1616175.  

The high blue line areas are work being sent to the Colo for us to crunch, whereas the higher green areas are APs finally being put out to us, Glenn. ;-)


Now I'm confused. If the green line is bits IN and the blue is bits OUT, how can the green be work going out to us? I would have thought the blue line is sending stuff out to us and the green is what is coming in to them?

Or are the legend symbols at the bottom of the graph misleading??

This is the other cricket graph, the daily one, which shows bugger-all coming in and bugger-all going out, so I'm confused.
ID: 1616185
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1616190 - Posted: 19 Dec 2014, 10:10:35 UTC - in response to Message 1616185.  

Now I'm confused. If the green line is bits IN and the blue is bits OUT, how can the green be work going out to us?

The cricket graphs are a network management tool designed for the benefit of the campus network management team.

They show the data from the point of view of the routing hardware between us and the SETI servers. The particular port we most commonly monitor shows our uploads leaving the router on their onward journey to the servers (hence outbound from the router - blue), and our downloads arriving from the servers (hence inbound - green) ready to be forwarded on to us.
ID: 1616190