Message boards : Number crunching : Panic Mode On (71) Server problems?
musicplayer Joined: 17 May 10 Posts: 2430 Credit: 926,046 RAC: 0
Hmm. If I go for SETI@home v7 as well (perhaps SETI@home Enhanced becomes a double portion then), how about the Graphics Preference (like Minimalist) and Rainbow for the Color Preference? Will they work with SETI@home v7? Also, I adjusted Default Computer Location to Home when doing this (it really does not matter).
Belthazor Joined: 6 Apr 00 Posts: 219 Credit: 10,373,795 RAC: 13
It's funny - it seems like AP 6.01 tasks aren't being counted on the server status page: the number of APs "out in the field" is still decreasing...
LadyL Joined: 14 Sep 11 Posts: 1679 Credit: 5,230,097 RAC: 0
Hmm. If I go for SETI@home v7 as well (perhaps SETI@home Enhanced becomes a double portion then), how about the Graphics Preference (like Minimalist) and Rainbow for the Color Preference? Will they work with SETI@home v7? You can go for MB v7 as much as you like, but there are no v7 apps yet, so checking that box doesn't do anything yet (apart from when you check only the v7 box and uncheck the 'accept other work' box, in which case you don't get any tasks). I'm not the Pope. I don't speak Ex Cathedra!
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13
It's funny - it seems like AP 6.01 tasks aren't being counted on the server status page: the number of APs "out in the field" is still decreasing... Yes, it appears that the code that counts all the AP values has not been updated to include v6. It might be an easy fix, or it might be a nightmare. Most likely an easy fix, though. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
cliff Joined: 16 Dec 07 Posts: 625 Credit: 3,590,440 RAC: 0
OK, so what's up with comms between users and Berkeley now? Can't contact server, hung downloads, timeouts, etc. Do we once again have router or network problems in Berkeley? Or is it the feed itself that's gone toes up? Thought those problems had been sorted out; had good comms for a while, now back to substandard. Cheers, Cliff. Been there, Done that, Still no damm T shirt!
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874
OK, so what's up with comms between users and Berkeley now? It's because the communications link out of Berkeley is completely saturated by the AP_v6 rollout. Please bookmark Cricket. I expect it to remain like this for at least a week, maybe longer.
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13
Was just looking through my task list and saw an AP_v505 that was inconclusive, so I looked at the WUid for it: http://setiathome.berkeley.edu/workunit.php?wuid=894563761. It was _7, and it has now gone out to _8. It's one of those ones that keeps getting sent to stock Linux apps, or to the "hit and run" people who install, download a bunch of work, and then bail. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Khangollo Joined: 1 Aug 00 Posts: 245 Credit: 36,410,524 RAC: 0
From the server status page (for Astropulse): Results out in the field: 31,473 Results returned and awaiting validation: 62,379 Workunits waiting for validation: 0 So now that v505 isn't splitting anymore, does this mean there are over 30,000 AP workunits stuck in the "waiting for validation" state for at least one of their tasks? Are the project admins aware of this problem with stuck APs?
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13
So now that v505 isn't splitting anymore, does this mean there are over 30,000 AP workunits stuck in the "waiting for validation" state for at least one of their tasks? Are the project admins aware of this problem with stuck APs? They're not so much stuck as they are waiting for wingmen to report or time out. Time-outs are probably 2/3 to 3/4 of the actual situation. With only a 25-day deadline, they will get sent out to a third wingman, and so on. The number of v505s will decrease, but at a slow rate - more like a decay curve: quickly while there are many of them in flight, then more slowly as the total gets smaller. Since there is a 25-day deadline and the last ones were split about 15 days ago, even if some make it all the way to _9, that leaves somewhere around 200 days from now for the very last ones to be expired/retired/completed. 200 days from now is the last day of September, so by Halloween the 505s should be completely gone. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
red-ray Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0
So now that v505 isn't splitting anymore, does this mean there are over 30,000 AP workunits stuck in the "waiting for validation" state for at least one of their tasks? Are the project admins aware of this problem with stuck APs? Given there are only 31,386, why not just send them all out again to a couple more systems with RACs over 10,000 and they will all be done in 2 weeks at most?
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13
Given there are only 31,386, why not just send them all out again to a couple more systems with RACs over 10,000 and they will all be done in 2 weeks at most? Agreed. The same thing was said when we went from "Astropulse" to "Astropulse_v5", and then again when v505 came along, but it apparently can't - or more accurately, won't - be done. They will work themselves out over time. It's not like it causes anyone any anguish to have some of those still floating around. There are apps for crunching it, so if you happen to be assigned one, you can crunch it and return it, thus helping get rid of them. Once the Lunatics app gets released for v6, you can put r409 (or the GPU equivalent) back into your app_info along with the new app for v6, and change your preferences to allow the 505s to be sent. That's what I plan on doing. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874
So now that v505 isn't splitting anymore, does this mean there are over 30,000 AP workunits stuck in the "waiting for validation" state for at least one of their tasks? Are the project admins aware of this problem with stuck APs? Conversely, why bother to send them out? They will be processed in due course anyway. To do otherwise would require micro-management by the project staff, and I'm sure they've got plenty of higher-priority tasks on their hands already.
kittyman Joined: 9 Jul 00 Posts: 51469 Credit: 1,018,363,574 RAC: 1,004
Given there are only 31,386, why not just send them all out again to a couple more systems with RACs over 10,000 and they will all be done in 2 weeks at most? As the kitties prefer MB anyway, no changes here until things settle and the new Lunatics installer is released... I'll take any 505 reissues that happen my way, and add v6 when the opti version has been tested for a bit and the validation/credit issues have been sorted. "Freedom is just Chaos, with better lighting." Alan Dean Foster
Richard Haselgrove Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874
Given there are only 31,386, why not just send them all out again to a couple more systems with RACs over 10,000 and they will all be done in 2 weeks at most? Won't be needed. We're writing both the apps and the installer in such a way that they will keep existing v505 tasks running and fetch resends if they're available, while at the same time handling the new v6 tasks properly - and doing both types a bit faster than at present. That's the plan, at least. It's complicated, which is why we need the time for testing.
cliff Joined: 16 Dec 07 Posts: 625 Credit: 3,590,440 RAC: 0
Hi Richard, thanks for the info - bookmarked the insect :-) Cheers, Cliff. Been there, Done that, Still no damm T shirt!
red-ray Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0
Given there are only 31,386, why not just send them all out again to a couple more systems with RACs over 10,000 and they will all be done in 2 weeks at most? What I'm suggesting is a regime such that, when any WU needs to be resent after a timeout, it is always sent to a computer with a RAC above some minimum, to minimise the risk of another timeout. This should reduce the overall number of results awaiting validation, and the whinges about slow wingmen.
kittyman Joined: 9 Jul 00 Posts: 51469 Credit: 1,018,363,574 RAC: 1,004
Given there are only 31,386, why not just send them all out again to a couple more systems with RACs over 10,000 and they will all be done in 2 weeks at most? RAC is not always indicative of turnaround time... "Freedom is just Chaos, with better lighting." Alan Dean Foster
red-ray Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0
Given there are only 31,386, why not just send them all out again to a couple more systems with RACs over 10,000 and they will all be done in 2 weeks at most? Given the current 400/50 limits, per-computer RAC tracks turnaround fairly closely. If it's not close enough, then also include a maximum per-computer average turnaround time.
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0
Quite true, turnaround time is what counts, and the BOINC database keeps that statistic for each app version on a host. The available BOINC feature uses that, plus consecutive valid results, plus having a quota at least equal to the basic setting of 100, to judge. The documentation is slightly out of date but gives a reasonable overall view; reading the code in sched_send.cpp and a few other source files gives accurate current info on how it works. Having that feature turned on would of course be an additional load on the Scheduler processes, but with improved servers that may become feasible. What settings would be appropriate would take some thought; the general advice that about 25% of hosts ought to be considered reliable seems sensible. If set too tight, reissued tasks might occupy positions in the Feeder queue for too long. Joe
red-ray Joined: 24 Jun 99 Posts: 308 Credit: 9,029,848 RAC: 0
Quite true, turnaround time is what counts, and the BOINC database keeps that statistic for each app version on a host. The available BOINC feature uses that, plus consecutive valid results, plus having a quota at least equal to the basic setting of 100, to judge. The documentation is slightly out of date but gives a reasonable overall view; reading the code in sched_send.cpp and a few other source files gives accurate current info on how it works. Could we kill two birds with one stone and have two feeder queues in different instances of the scheduler? Ideally one with a big queue for the reliable hosts, which would include the resends, and a second with a smaller queue for the others.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.