Struggling Downloads!!

Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1112432 - Posted: 2 Jun 2011, 18:54:09 UTC

Has anyone else experienced struggling d/ls since the outage yesterday? I have had to set NNT (no new tasks) and suspend network activity on my A-SYS cruncher so that my B-SYS can possibly d/l enough tasks to hold it for a while.

The faster machine will d/l a number of WUs with no problem while the other one is constantly going into retry, mostly with a project backoff of anywhere from 1.5 to several hours. While the slower machine is in retry/backoff status, the faster one will come along and gulp down any number of WUs without any problem. Sometimes, while the slower one is trying to download, the faster one jumps in to d/l, throwing the slower one back into retry/backoff status again.

Never noticed this before the last big server problem, or am I just being impatient?
ID: 1112432
dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 1112444 - Posted: 2 Jun 2011, 19:32:55 UTC

I had a problem on one of my machines, was getting an error message when it tried to download any of the 4 tasks it had waiting:
[---] Can't create HTTP response output file projects/setiathome.berkeley.edu/17se10ag.9252.21744.16.10.52

I was getting the same error for all 4 downloads, each with its own file name in the error message. It looks as though something happened to the 4 files in question; maybe a permission got unset or set wrong? Anyway, I quit BOINC, deleted the 4 files in question, and restarted BOINC; they all restarted from 0% and zipped right through on the first try. One of them had been up to something like 32 retries before.
Maybe something similar on your end? Do you get any error message when it retries?

-Dave
ID: 1112444
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1112455 - Posted: 2 Jun 2011, 19:45:46 UTC - in response to Message 1112444.  

My situation is a little different in that whatever is happening seems to be on the other end. I have

06/02/2011 14:28:32 | SETI@home | Resent lost task 02ap11aa.17681.7429.12.10.198_2
06/02/2011 14:28:36 | | Project communication failed: attempting access to reference site
06/02/2011 14:28:37 | | Internet access OK - project servers may be temporarily down.
06/02/2011 14:35:47 | SETI@home | Temporarily failed download of 04mr11af.30829.16018.15.10.51: HTTP error
06/02/2011 14:35:47 | SETI@home | Backing off 40 min 44 sec on download of 04mr11af.30829.16018.15.10.51
06/02/2011 14:35:47 | SETI@home | Started download of 03mr11ad.25619.11110.14.10.46
06/02/2011 14:36:01 | | Project communication failed: attempting access to reference site
06/02/2011 14:36:02 | | Internet access OK - project servers may be temporarily down.
06/02/2011 14:36:51 | SETI@home | Computation for task 04mr11af.12694.21744.10.10.199_0 finished
06/02/2011 14:36:51 | SETI@home | Starting task 04mr11af.12694.21744.10.10.195_0 using setiathome_enhanced version 608
06/02/2011 14:36:53 | SETI@home | Temporarily failed download of 03mr11ad.25619.11110.14.10.46: HTTP error
06/02/2011 14:36:53 | SETI@home | Backing off 33 min 45 sec on download of 03mr11ad.25619.11110.14.10.46
06/02/2011 14:36:53 | SETI@home | Started upload of 04mr11af.12694.21744.10.10.199_0_0
06/02/2011 14:36:57 | | Project communication failed: attempting access to reference site
06/02/2011 14:36:58 | | Internet access OK - project servers may be temporarily down.


Even massively abusing the retry button, I get the same messages.
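
For anyone who wants to compare backoff times across machines, here is a rough sketch that pulls the "Backing off" entries out of a saved copy of the client messages. It assumes the line layout shown in the log above; the optional "hr" field and the names `BACKOFF_RE` and `parse_backoffs` are illustrative assumptions, not part of BOINC.

```python
import re
from datetime import timedelta

# Pattern for lines such as:
#   06/02/2011 14:35:47 | SETI@home | Backing off 40 min 44 sec on download of <file>
# The optional "hr" field is an assumption; the log above only shows min and sec.
BACKOFF_RE = re.compile(
    r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} \| [^|]* \| "
    r"Backing off (?:(?P<hr>\d+) hr )?(?:(?P<min>\d+) min )?(?:(?P<sec>\d+) sec )?"
    r"on download of (?P<file>\S+)$"
)


def parse_backoffs(log_text):
    """Return a list of (file_name, backoff) pairs found in the log text."""
    results = []
    for line in log_text.splitlines():
        match = BACKOFF_RE.match(line.strip())
        if match:
            delay = timedelta(
                hours=int(match["hr"] or 0),
                minutes=int(match["min"] or 0),
                seconds=int(match["sec"] or 0),
            )
            results.append((match["file"], delay))
    return results


# Three lines copied from the log above.
SAMPLE = """\
06/02/2011 14:35:47 | SETI@home | Temporarily failed download of 04mr11af.30829.16018.15.10.51: HTTP error
06/02/2011 14:35:47 | SETI@home | Backing off 40 min 44 sec on download of 04mr11af.30829.16018.15.10.51
06/02/2011 14:36:53 | SETI@home | Backing off 33 min 45 sec on download of 03mr11ad.25619.11110.14.10.46
"""

if __name__ == "__main__":
    for name, delay in parse_backoffs(SAMPLE):
        print(f"{name}: {delay}")
```

Lines that are not backoff notices (failed downloads, reference-site checks) are simply skipped.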
ID: 1112455
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1112457 - Posted: 2 Jun 2011, 19:46:29 UTC - in response to Message 1112432.  
Last modified: 2 Jun 2011, 20:05:50 UTC



Has anyone else experienced struggling d/ls since the outage yesterday? I have had to set NNT (no new tasks) and suspend network activity on my A-SYS cruncher so that my B-SYS can possibly d/l enough tasks to hold it for a while.




Yes, I'm having the same sort of problem. Let me apologize to you and everyone else for this.

It's all my fault.

I've got a new build that I'm trying to make sure is working, and with BOINC Manager 6.12.26 loaded on it, it doesn't seem to be working. ...or it does if you happen to be sitting with it watching it and you hit the "Retry" button a lot.

I'm getting "project backoff" times that get longer and longer and longer, but as soon as you hit "Retry" it will download a couple of WUs and MAYBE upload one, maybe two, then go back to "retry in <a few> minutes." If you sit there and watch it, the few minutes gets longer, and longer, and longer until it tells me "project backoff" of several, several hours. Hit "Retry Now" and it will immediately download a couple and upload maybe half as much.

What gives me the most heartburn about it is that for some reason "Do Network Communication" doesn't seem to do anything, or if it does, it isn't being registered by the servers.

The "Details" section of the application tells me that this machine is doing SETI work 100% of the time with a network connection 100% of the time, but for some reason it is stuck on "Number of times client has contacted server" at 7, total. It is set to always run the CPU and GPU. I can't find anything wrong with the setup.

It's running 24/7. It's crunching when it has something to do. It has a constant internet connection. It's trying to phone home. The firewall isn't in its way. It reports that the internet connection is fine, that the project may be down.

Two other machines on the same network are uploading and downloading just fine, but they are running 6.10.58 and 6.10.60.

I can't help but think that the issue is something built into the new client, or that it has something to do with the scheduler, or both. It has long finished WUs that can't upload, so it looks to the scheduler as though it has twice as much "in progress" as "pending" and very little "valid."

There isn't much point in my speculating about what may or may not be going on, since I don't know what's been done. But I can report that my other computer running 6.12.26 (the eldest cruncher) also seems to have an obstruction.

I can't come to any conclusions, though, because it IS "contacting the server" with regularity.

What versions of the BOINC manager are you running?


EDIT: By the way, I'd be perfectly happy to be patient and let the servers catch up, but with a new build I'm not sure waiting will solve the problem.

EDIT EDIT: I've just checked the IP address and it reports that it is the same as the last 9 times; but the "details" only shows 7 total contacts. Frustrating? You betcha.
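
The "longer and longer" retry times described in this post are what a generic randomized exponential backoff looks like from the user's side. The sketch below is only an illustration of that general technique and is not taken from the BOINC source; the base delay, the cap, the jitter range, and the name `backoff_delay` are all assumptions.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait after `failures` consecutive failed attempts.

    The ceiling doubles with each failure up to `cap`; the actual wait is
    drawn at random from the upper half of that ceiling so that many
    clients do not all retry at the same moment. All values are assumed.
    """
    if failures <= 0:
        return 0.0
    ceiling = min(cap, base * 2 ** (failures - 1))
    return rng.uniform(ceiling / 2, ceiling)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs print the same values
    for n in range(1, 10):
        print(f"after {n} failed attempts: wait about {backoff_delay(n, rng=rng) / 60:.1f} min")
```

In this model the wait roughly doubles with every consecutive failure until it reaches the cap, so a client whose transfers keep failing would drift from minutes to hours, much like the behavior reported above.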
ID: 1112457
dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 1112458 - Posted: 2 Jun 2011, 19:49:43 UTC

Cliff,
You could try opening a cmd window and do
ipconfig /flushdns
and then re-try comms, maybe that would help?

-Dave
ID: 1112458
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1112465 - Posted: 2 Jun 2011, 20:08:30 UTC - in response to Message 1112458.  

My Windows boxes always seem to lag behind with uploads/downloads after the downtime. My one Linux box never has any problems getting work or sending results. Perhaps the Linux systems have an easier time talking to each other. For whatever reason, my Linux box is always the first machine to finish its u/l and d/l work.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1112465
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1112472 - Posted: 2 Jun 2011, 20:23:24 UTC - in response to Message 1112458.  

Cliff,
You could try opening a cmd window and do
ipconfig /flushdns
and then re-try comms, maybe that would help?

-Dave


Should have mentioned it in the first post, but been there, done it and have the t-shirt. Did it immediately prior to a machine recycle, but thanks for the suggestion.
ID: 1112472
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1112476 - Posted: 2 Jun 2011, 20:25:49 UTC - in response to Message 1112457.  

What versions of the BOINC manager are you running?

Both machines are running 6.12.28(x64).
ID: 1112476
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1112487 - Posted: 2 Jun 2011, 20:46:50 UTC - in response to Message 1112476.  

What versions of the BOINC manager are you running?

Both machines are running 6.12.28(x64).



This comes under the heading "Things that make you go, "Hmmmmm.""

ID: 1112487
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1112503 - Posted: 2 Jun 2011, 21:49:42 UTC - in response to Message 1112487.  

What versions of the BOINC manager are you running?

Both machines are running 6.12.28(x64).



This comes under the heading "Things that make you go, "Hmmmmm.""

or things that should be reported to the alpha team...


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1112503
Paul D Harris
Volunteer tester
Joined: 1 Dec 99
Posts: 1122
Credit: 33,600,005
RAC: 0
United States
Message 1112504 - Posted: 2 Jun 2011, 21:52:43 UTC

I had the same problem.
This is the second time I have had to switch to an older BOINC; this time I switched to
6/2/2011 5:48:35 PM Version change (6.12.26 -> 6.10.60)

ID: 1112504
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1112524 - Posted: 2 Jun 2011, 23:54:39 UTC - in response to Message 1112503.  

What versions of the BOINC manager are you running?

Both machines are running 6.12.28(x64).



This comes under the heading "Things that make you go, "Hmmmmm.""

or things that should be reported to the alpha team...




MAYBE, BUT --- After watching this computer have trouble connecting last night and not connect all day I decided to pay it another visit (at my office). I'm on it now.

I told you it was a new build. Windows downloaded and installed an update overnight, restarted the computer, and promptly screwed-up the wireless network adapter drivers. (a restart didn't solve it, uninstalling and reinstalling the hardware device didn't solve it, reinstalling the driver did)

Then I restarted BOINC and what little work I had in queue immediately uploaded.

And it downloaded one CUDA task, completed that, and uploaded it.

It talked to the scheduler about getting more work, but didn't. Then, about two minutes later it downloaded a bunch of work.

I don't know why it wouldn't play nice with the servers yesterday, but today's fiasco was a Windows update issue. All seems to be well right now.

The other computer running 6.12.x is also playing well with others today.

Sometimes things are coincidences and sometimes you can add 2+2 and get 4 when the real question was 2+.2 and you misread it.
ID: 1112524
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112529 - Posted: 3 Jun 2011, 0:11:46 UTC - in response to Message 1112524.  

Thanks for posting that. It's an important lesson - to learn, and to remember for the future.

We are dealing with complicated beasties - not only the beasties on our desks, but the beasties that lurk in our communications networks, and the beasties that lurk in Berkeley's server closet.

When something goes wrong, it's all too easy to grab at the newest/latest/closest/most annoying component, and point the finger of blame. But if we want to get things fixed, we have to go further than that - we have to be sure we've picked on the right culprit, and provide evidence to back up our case. Sometimes - like your story tonight - our carefully-constructed theory collapses like a house of cards. No problem - we learn from the setbacks. But next time, it may be a real live BOINC bug :-)

PS - I have my machines set to download Windows updates, but only to install them when I give the say-so. So far, I've avoided Internet Explorer 9.....
ID: 1112529
amanda_b
Joined: 6 Jul 07
Posts: 6
Credit: 1,686,734
RAC: 0
United States
Message 1112535 - Posted: 3 Jun 2011, 0:36:42 UTC - in response to Message 1112504.  
Last modified: 3 Jun 2011, 0:38:46 UTC

I had the same problem.
This is the second time I have had to switch to an older BOINC; this time I switched to
6/2/2011 5:48:35 PM Version change (6.12.26 -> 6.10.60)


That's exactly what I had to do:
downgrade back to 6.10.60.
It downloads better AND downloads more WUs, uploads better, and (the part that I don't quite understand) I crunch faster with 6.10.60.

It also seemed that many of the configuration options in 6.12.26 did absolutely nothing.
ID: 1112535
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1112545 - Posted: 3 Jun 2011, 1:48:41 UTC - in response to Message 1112529.  

Thanks for posting that. It's an important lesson - to learn, and to remember for the future.


When something goes wrong, it's all too easy to grab at the newest/latest/closest/most annoying component, and point the finger of blame. But if we want to get things fixed, we have to go further than that - we have to be sure we've picked on the right culprit, and provide evidence to back up our case. Sometimes - like your story tonight - our carefully-constructed theory collapses like a house of cards. No problem - we learn from the setbacks. But next time, it may be a real live BOINC bug :-)



You won't get arguments from me; which is why I left my easy-chair and drove to the office just to be SURE.

Since I run several computers, and since the only thing in common between two of them was BOINC 6.12.x, and since only those two were constipated, and since I could force both of them to upload and download with "Retry," <so they were exhibiting the exact same behavior> and since the others were all running normally...

It ain't like I exactly jumped to a conclusion.

Let's not be too quick to make a donkey of ourselves by assuming even more things.

If we just *have* to blame a single user problem, then we assume: "There was nothing wrong at SETI's end or with the client."

Okay ---- then these questions remain:

1) Why was it only machines running the newest client that seemed to be getting the message "internet connection okay, project may be down" and experiencing ever-lengthening project "backoff times" when the connections were up? Not all of the crunchers in question were mine.

2) What constipated 6.12.x in the first place?

3) Why did downgrading to 6.10.x "fix" other people's problems?

4) What changed and made one of my 6.12.x clients start behaving correctly on its own once the other was effectively "unplugged?" (this is especially interesting since another user with two 6.12.x computers had the experience that the only way he could get one client to work correctly was to turn the other off; could be coincidence, but maybe not)

The fact that I lost my network connection in the middle of the night only explains why that one computer didn't connect today.

While I understand the desire (mine, too) to dismiss the problem by citing a tangential occurrence, in this case I believe it would be a mistake to do that.

Remember in the BOOK Jurassic Park that all the dinosaurs were accounted for and all was declared "okay" when, in fact, there were many, many, many more that had not been counted? The reason for that was that the system was looking for a specific number of each beast. Once it found that number, the system stopped looking.

In my opinion, it's not time to stop looking for extra dinosaurs in the bushes just because I happened to find one.

Bret
ID: 1112545
Lionel
Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1112577 - Posted: 3 Jun 2011, 6:48:14 UTC - in response to Message 1112545.  


All mine are experiencing slow downloads. It's actually quite funny to see that it takes 10-15 minutes to get a WU and ~7 min to process one ... (I can see the water gauge slowly going down) ... the upshot for me is that I seem to have about 2,000 WUs downloading to each box without any end in sight ... almost like the recent old days ...
ID: 1112577
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1112675 - Posted: 3 Jun 2011, 14:45:15 UTC


I'm just wondering why it doesn't get any better.
There are 6.7 million units out in the field.



With each crime and every kindness we birth our future.
ID: 1112675
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112678 - Posted: 3 Jun 2011, 14:52:57 UTC - in response to Message 1112675.  

I'm just wondering why it doesn't get any better.
There are 6.7 million units out in the field.

In a word - shorties.
ID: 1112678
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1112690 - Posted: 3 Jun 2011, 15:30:27 UTC - in response to Message 1112678.  

I'm just wondering why it doesn't get any better.
There are 6.7 million units out in the field.

In a word - shorties.


I don't think so.
Not for 7 days in a row.
There were already 6.1 million out before the storage server broke.



With each crime and every kindness we birth our future.
ID: 1112690
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1112702 - Posted: 3 Jun 2011, 15:45:40 UTC - in response to Message 1112690.  

I'm just wondering why it doesn't get any better.
There are 6.7 million units out in the field.

In a word - shorties.

I don't think so.
Not for 7 days in a row.
There were already 6.1 million out before the storage server broke.

Yes, so.

Two-thirds of the downloads I've currently got backed-up (33 of 52) are shorties. When the proportion is as high as that, we (collectively) crunch 'em faster than we can download 'em.
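
A back-of-the-envelope check of that argument, using the 33-of-52 count above and the "10-15 minutes to get a WU and ~7 min to process one" figures reported earlier in the thread. It is only a sketch: the function names are illustrative, and it assumes one download and one task at a time unless told otherwise, which real hosts may not match.

```python
def shortie_fraction(shorties, total):
    """Share of the pending downloads that are shorties."""
    return shorties / total


def queue_change_per_hour(download_minutes, crunch_minutes,
                          parallel_downloads=1, parallel_tasks=1):
    """Net change in cached tasks per hour (negative means the cache drains).

    parallel_downloads and parallel_tasks are hypothetical knobs for hosts
    that transfer or crunch several tasks at once.
    """
    downloaded = parallel_downloads * 60 / download_minutes
    crunched = parallel_tasks * 60 / crunch_minutes
    return downloaded - crunched


if __name__ == "__main__":
    print(f"shortie share: {shortie_fraction(33, 52):.1%}")
    for minutes in (10, 15):
        change = queue_change_per_hour(minutes, 7)
        print(f"{minutes} min per download: {change:+.1f} tasks per hour")
```

With these figures the cache shrinks at either end of the 10-15 minute range, since even 6 downloads an hour cannot keep up with roughly 8.6 tasks crunched per hour.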
ID: 1112702