Panic Mode On (24) Server problems
Message boards : Number crunching : Panic Mode On (24) Server problems
Crazy EDF GPU bug, you say? Is that when BOINC processes a few % of a GPU task, then moves to another, processes a few % of that one, then moves to another... Well, I've just stopped running GPU tasks on that PC. The GF8500 isn't very fast, GPU tasks bogged the system down too much, and I don't have the time to babysit it. ____________ SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group today!
ID: 933600
ID: 933601
I had it before, but I reimage the system often: I could run XP, Vista, Windows 7, Server 2003, Server 2008 or 2000 all in one day. The opt app didn't really like that. lol ____________ SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group today!
ID: 933603
I don't think you should use 6.6.36, I think you should use 6.6.38 or later.

Sutaru, the problem is congestion: when the bandwidth is maxed out, everything is competing for the same bandwidth, and uploads fail. If the BOINC client slows down, that means less congestion and more successful uploads, and more successful uploads mean even less congestion. There is a saying: you can't put 8 pounds of stuff in a 5-pound bag. When things back up, we're trying to put 800 pounds of stuff in a 5-pound bag, and we're upset because the bag keeps breaking. But this won't work at all if people can turn it off. If they can turn it off, they will, and if they do, we're no better off than we are right now. This is not a new idea. This is how SMTP e-mail works: it backs way off when things get too busy. If part of your job were running a busy mail server, you'd see exactly what the upload server (and, to a lesser extent, the download and scheduling servers) sees every day. I understand that you look at your upload queue, see all the pending retries and worry that they won't get through, and I realize that it's very hard to sit back, leave it alone and do nothing. But as long as you keep pushing harder, you'll never find out what happens when you back off. Sorry, I know, it's hard. It took me years to learn that when things are incredibly overloaded, trying to fix the overload usually makes things worse. -- Ned ____________
ID: 933604
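Ned's point that aggressive retries make an overload worse can be illustrated with a toy simulation. This is a hypothetical sketch, not BOINC's code: the tick-based clock, the all-or-nothing rule (if more uploads start in one tick than the server can handle, every one of them times out), the random wait inside a doubling window and the 64-tick cap are all assumptions chosen to make the effect easy to see.

```python
import random


def simulate(clients, capacity, ticks, backoff, seed=1):
    """Toy model of an overloaded upload server (hypothetical, not BOINC code).

    Assumption: if more than `capacity` uploads start in the same tick, the
    bandwidth is split so thinly that every one of them times out.
    Returns the number of clients whose upload eventually succeeded.
    """
    rng = random.Random(seed)
    next_try = [0] * clients   # tick at which each client tries again
    window = [1] * clients     # current backoff window per client
    done = [False] * clients

    for tick in range(ticks):
        trying = [c for c in range(clients) if not done[c] and next_try[c] == tick]
        if not trying:
            continue
        if len(trying) <= capacity:
            for c in trying:
                done[c] = True          # server kept up: upload succeeds
            continue
        for c in trying:                # overload: every attempt fails
            if backoff:
                # Wait a random number of ticks inside a doubling window.
                window[c] = min(window[c] * 2, 64)
                next_try[c] = tick + 1 + rng.randrange(window[c])
            else:
                next_try[c] = tick + 1  # hammer the server again next tick
    return sum(done)
```

With 50 clients and room for 10 uploads per tick, `simulate(50, 10, 200, backoff=False)` returns 0: every client retries every tick, so the server never drops below its limit. With `backoff=True` the waits spread the retries out, some ticks fall under the limit, and uploads start to succeed. The random part matters in this model: if identical clients all doubled their wait exactly, they would keep retrying in lockstep and still collide.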
ID: 933608
Hi, apart from the fact that the maintenance outage was very short, I haven't had, and am not having, any trouble uploading or downloading.
ID: 933609
Yes... it's a pity that my GPU cruncher has such high performance and BOINC isn't designed for this.

Sutaru, I am saying the opposite. I am saying that this is a way to make more time available for monster crunchers like yourself. I am saying that the revised upload algorithm will help you much more than it will help anyone else. ... and you're saying "I don't care if the servers are completely buried, I want to make sure that we pile on as much as possible." You say "the solution to upload problems is to push really, really hard" and I'm saying "imagine a world where retries are unusual and congestion is rarely a factor." When this happens in a crowd at some big concert or event, people get trampled, and the news reports say "how sad" when an orderly line would have been fine. -- Ned ____________
ID: 933610
ID: 933614
Hi, it is possible to push over and over again to get a (bunch of) WUs through to Berkeley, but, as Ned Ludd said, this is UNwanted.
ID: 933616
The UL function in DEV-V6.6.38 doesn't work well.

The only problem with uploading in v6.6.38 is that you can't see when the project backoff is in effect. Well, if you enable <file_xfer_debug> you will see a message like "project backoff, xxx hours/minutes/seconds", but nothing else, and you'll need to work out yourself when the next connection attempt is; these messages are also very easily overlooked in the Messages tab. The missing-information bug is fixed in v6.10.0: the Transfers tab shows when a project is being backed off, and for how long. Since it now works as intended, it's unlikely it will be removed again. After all, the only reason the code was removed again in v5.2.2 was the lack of information for users. edit - The code works like this: if you're repeatedly failing to upload (or download), all uploads (or downloads) to the project get an exponential backoff of between 1 minute and 4 hours. If an upload (or download) later succeeds, all uploads (or downloads) go through as normal, provided there aren't any more connection problems... ____________ "I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 933617
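The behaviour described in the post above (one backoff shared by all of a project's uploads, growing exponentially from 1 minute to 4 hours and cleared by the first success) can be sketched as follows. This is a minimal, hypothetical illustration, not the BOINC client's source: the class and method names are made up, the delay simply doubles from 1 minute, backoff starts on the first failure, and there is no randomisation. The real client may differ on all of these points.

```python
class ProjectBackoff:
    """Hypothetical project-wide transfer backoff (illustration only).

    A single timer is shared by every upload to the project, so a repeatedly
    failing server holds back the whole queue instead of each file retrying
    on its own schedule.
    """

    MIN_DELAY = 60           # 1 minute, in seconds (figure from the thread)
    MAX_DELAY = 4 * 60 * 60  # 4 hours, in seconds (figure from the thread)

    def __init__(self):
        self.failures = 0        # consecutive failed transfers
        self.next_allowed = 0    # earliest time (seconds) any transfer may start

    def can_transfer(self, now):
        return now >= self.next_allowed

    def startable(self, pending_files, now):
        # All queued files wait together while the project is backed off.
        return list(pending_files) if self.can_transfer(now) else []

    def on_failure(self, now):
        # Delay doubles with each consecutive failure (1, 2, 4, ... minutes),
        # capped at 4 hours. Returns the delay that was applied.
        self.failures += 1
        delay = min(self.MIN_DELAY * 2 ** (self.failures - 1), self.MAX_DELAY)
        self.next_allowed = now + delay
        return delay

    def on_success(self):
        # One successful transfer clears the backoff for the whole project.
        self.failures = 0
        self.next_allowed = 0


if __name__ == "__main__":
    backoff = ProjectBackoff()
    queue = ["result_1", "result_2", "result_3"]   # hypothetical file names
    print(backoff.on_failure(now=0))               # 60
    print(backoff.startable(queue, now=30))        # [] (whole queue waits)
    print(backoff.startable(queue, now=60))        # all three may start
```

Under these assumptions, consecutive failures give delays of 60, 120, 240, 480, 960, 1920, 3840 and 7680 seconds, after which every delay is capped at 14,400 seconds (4 hours). Keeping one timer per project rather than one per file is the point made in the thread: a client with a long upload queue probes the server occasionally instead of sending every file at it.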
You can read the ticket here. The basic idea is: if uploads are working, keep going; if uploads are failing, back off all uploads, not just the single failed upload. You can read the changeset, and you can read the discussion in the developers' forum from the above link. Without this change, the BOINC clients attached to a project will launch a distributed denial-of-service (DDoS) attack on the servers. By stopping the DDoS attack, throughput will improve, and Sutaru, you very much want better throughput. Note that the <file_xfer_debug> flag turns on logging, and if it is not working, you are probably in the best position to test it. Getting this to work right and reducing the load on the servers will help throughput. ____________
ID: 933627
Totally off topic, but I just wanted to point out that the post-maintenance network congestion lasted about 3 hours and has come back down to the mid-60 Mbit/s range.
ID: 933644
My take was that nobody is doing anything special this week that would take the system down for longer. I don't know for sure, because nobody is posting on the "Technical News" board.
ID: 933674
edit - The code works like this: if you're repeatedly failing to upload (or download), all uploads (or downloads) to the project get an exponential backoff of between 1 minute and 4 hours. If an upload (or download) later succeeds, all uploads (or downloads) go through as normal, provided there aren't any more connection problems...

Since this is something I can test, I've been testing it, and it appears to work fine. 6.10.4 is definitely better in that you can see the backoff. As described, the logic means BOINC will try at least 6 times per day. If uploads are working, there won't be a project backoff, and uploads proceed normally. ____________
ID: 933688
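Ned's figure of at least six attempts per day follows from the cap alone. Assuming, as described in the thread, that the delay between attempts never exceeds 4 hours, any 24-hour period contains at least:

$$
\text{attempts per day} \;\ge\; \left\lfloor \frac{24\ \text{h}}{4\ \text{h}} \right\rfloor = 6
$$

While the delay is still growing toward the cap, the gaps are shorter than 4 hours, so the first day of a backoff sees more attempts than this.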
If a project backoff for uploads is in progress in BOINC 6.6.38, it is visible in the Projects tab.
ID: 933693
If a project backoff for uploads is in progress in BOINC 6.6.38, it is visible in the Projects tab.

Hmm, I was under the impression that this only showed when the client can next ask for CPU, CUDA or ATI work, not any deferrals for uploads or downloads... I can't test it at the moment, since I can't run BOINC on this computer... ____________ "I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 933700
So I made myself a DOS patch file, which I launch with the Task Scheduler. And no, I only had SETI on there too; when I had the manager you suggested installed, I stopped getting any new work. Only then did I add the other projects. So now I'm back on the latest 6.6.36, with the help of this patch file. It's running quite well now, as you can see ;-) Oh yes, my 600,000 pending credits are still gone, even after yesterday's BOINC maintenance day. ____________
ID: 933701
ID: 933707
ID: 933710
My pending credit: 295,397.36

8,267 for me. A new record. *shrugs and wanders off* ____________ Grant Darwin NT.
ID: 933711
Copyright © 2013 University of California