Is this a Boinc bug??????

Message boards : Number crunching : Is this a Boinc bug??????



kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1201285 - Posted: 1 Mar 2012, 8:37:30 UTC
Last modified: 1 Mar 2012, 8:38:36 UTC

I have to ask, and I hope to get some true input.

An hour or so ago my internet connection dropped dead.
And....I have seen this happen before.

When the internet connection is totally lost, and Boinc tries to contact the servers....bad things happen.

The Boinc messages report....cannot resolve host name. And it ties everything up in absolute knots. Crunching comes almost to a dead standstill, and the computer goes round and round, can't get a response from Boinc for minutes at a time. Just ties everything up tight.

Can't even reboot at that point, because Boinc will not respond to shut down for reboot!!!

What causes this??????
"Time is simply the mechanism that keeps everything from happening all at once."

Wiggo
Joined: 24 Jan 00
Posts: 38302
Credit: 261,360,520
RAC: 489
Australia
Message 1201296 - Posted: 1 Mar 2012, 9:16:03 UTC - in response to Message 1201285.  

I had this happen last week when one of our ISPs brought down our country's DNS system, and what a pain in the ar** it was to get one of mine back up after an M$ update restart. :(

Cheers.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1201298 - Posted: 1 Mar 2012, 9:20:12 UTC - in response to Message 1201285.  

In simple terms, it's in an area of networking called 'DNS' - Domain Name Service. Whenever we do anything over the internet - be it browsing the web, sending and receiving email, or crunching with BOINC, we like to refer to places by name. While I'm typing this, my browser says I'm talking to setiathome.berkeley.edu

But my computer isn't. My computer does this by number, and my computer thinks it's talking to 128.32.18.150

It's DNS which makes this happen. And because you really, really don't want to keep a list of every web server in the world on your computer (unless you're Google), your ISP provides a DNS service for you, as part of the package.

Whenever BOINC needs to call home, to report or request new work, it knows it has to contact setiboinc.ssl.berkeley.edu: so the first thing it has to do is to call directory enquiries (DNS) and get the number - 208.68.240.20

Yes, it would have been possible to write BOINC to use numbers from first principles, but it wouldn't have helped - if the line's down, you can't do anything with the number anyway. It's very rare for a line to be working, but DNS not to be available, so it's normal practice to use the readable form of web addresses, and DNS, all the time.

So when a line goes down, the very first that BOINC (or anything else) knows about it is when the DNS lookup fails. That's what gives you the message "can't resolve hostname" - that's right and proper, it gives you the information that DNS isn't available.
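To make the "directory enquiries" step concrete, here is a minimal Python sketch of what any client has to do before it can connect: ask the OS resolver (and, through it, the ISP's DNS) for the addresses behind a name, and turn a lookup failure into a "can't resolve hostname" style message. It uses only the standard `socket.getaddrinfo` call; it is an illustration, not BOINC's code (BOINC goes through libcurl), and the `resolve` helper is hypothetical. The addresses it returns today need not match the ones quoted above.

```python
import socket
from typing import List


def resolve(hostname: str, port: int = 80) -> List[str]:
    """Ask DNS (via the OS resolver) for the IP addresses behind a hostname.

    Returns the distinct address strings, or raises RuntimeError with a
    "can't resolve hostname" style message when the lookup fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # With the line down, this is the first point where a client notices.
        raise RuntimeError(f"can't resolve hostname {hostname!r}: {exc}") from exc
    addresses: List[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

For example, `resolve("setiboinc.ssl.berkeley.edu")` returns the server's current address(es) when DNS is reachable, and raises the error instead when it is not.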

But as to why BOINC slows to a crawl when this happens - yes, that's worthy of investigation, and it ought to be possible to handle a perfectly normal situation better. But I'd question whether the actual crunching slows down - you wouldn't happen to have opened Windows Task Manager while this was going on, I suppose, and checked the CPU usage of the science apps? I'd expect them still to be running, even if BOINC is slow to update the progress display.

To get the BOINC devs to even think about looking at the problem, you'd need to supply some information. The version of BOINC you noticed this on, of course. How you connect to the internet - some sort of router, I'd guess. How your computers are configured to communicate with the router.

One good starting point would be to open a command prompt, and type

ipconfig /all

That would tell us where the first port of call for a DNS lookup is - do you see a 'Default Gateway' address, and a 'DNS Server' address? One or several DNS servers? Are the Gateway and DNS Server addresses the same? That sort of thing.
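If you want to compare those two addresses without reading the whole listing, a rough Python sketch like the one below can pull the 'Default Gateway' and 'DNS Servers' entries out of saved `ipconfig /all` text. It assumes the usual English-language layout (label, dot padding, colon, value, with extra DNS servers on unlabelled continuation lines); the function name and the sample text in the usage line are illustrative only.

```python
import re
from typing import Dict, List, Optional

# A labelled ipconfig line: leading spaces, label, " . . ." padding, colon, value.
LABELLED = re.compile(r"^\s*(.+?)(?:\s\.)+\s*:\s?(.*)$")
WANTED = ("Default Gateway", "DNS Servers")


def gateway_and_dns(ipconfig_text: str) -> Dict[str, List[str]]:
    """Collect Default Gateway and DNS Servers values from ipconfig /all output."""
    found: Dict[str, List[str]] = {name: [] for name in WANTED}
    current: Optional[str] = None  # field that unlabelled continuation lines belong to
    for line in ipconfig_text.splitlines():
        match = LABELLED.match(line)
        if match:
            label, value = match.group(1).strip(), match.group(2).strip()
            current = label if label in found else None
        elif line.startswith(" ") and line.strip():
            value = line.strip()  # e.g. a second DNS server on its own line
        else:
            current = None  # blank line or adapter heading ends the field
            continue
        if current and value:
            found[current].append(value)
    return found
```

Usage: `gateway_and_dns(open("ipconfig.txt").read())`. If the only DNS server listed is the gateway's own address, the router is most likely forwarding DNS to the ISP, so a router or ISP DNS fault would look exactly like the line being down.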
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1201303 - Posted: 1 Mar 2012, 9:29:40 UTC - in response to Message 1201298.  

When this happens, I can't open much of anything at all......
On my crunch only rigs, I have the GPU apps set to high priority.....so when they lock up.......the whole rig locks up.

I do know that GPU processing drops to a standstill....because I have Kill A Watt meters on the rigs, and on my top rig, the power usage drops from around 860 watts to about 300 or so......in other words....kaput. Saw the same drop on a Win7 rig, and a couple of Win2Ks.

Boinc just haz a freakin' fit when it cannot even TRY to contact the servers.
May be OS/comms related and not exactly Boinc's fault, I just do not have a clue.

Mystified. But fit to be tied at the same time.

Just because all connection with the outside world is lost, it should not cause crunching to come to almost a dead halt. This goes in cycles, the rigs will lock up for several minutes, and then all is good until they try to phone home again...... Then, the lockup ensues once again.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1201305 - Posted: 1 Mar 2012, 9:38:57 UTC

You could try setting 'Network Activity' to suspended for the time being, so that it doesn't even try? (Provided that you set it back when your browser starts working again, of course)

I keep an eye on crunch-only machines via BoincView, running on an old P4 that is so outdated I've given up crunching on it because it wouldn't contribute anything worthwhile. That means I can turn off network activity remotely, without having to get a program open on the actual machine that's having problems.
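For machines that are still answering, the same "suspend network activity" switch can be flipped from another computer with `boinccmd`, the command-line tool that ships with BOINC. The Python sketch below only builds and runs that command; it assumes `boinccmd` is on the PATH, that the target client accepts remote GUI RPC, and that you know its GUI RPC password (from that client's `gui_rpc_auth.cfg`). The helper names are hypothetical. It shares the weakness kittyman describes: if the client is not answering RPCs, the request will not get through either.

```python
import subprocess
from typing import List, Optional

VALID_MODES = ("always", "auto", "never")


def network_mode_command(mode: str, host: Optional[str] = None,
                         password: Optional[str] = None,
                         duration_s: int = 0) -> List[str]:
    """Build a boinccmd argument list that sets a client's network mode.

    'never' suspends network activity, 'auto' follows preferences and
    'always' forces it on. duration_s > 0 makes the change temporary;
    otherwise no duration is passed and the change stays until undone.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown network mode: {mode!r}")
    argv = ["boinccmd"]
    if host:
        argv += ["--host", host]        # e.g. a crunch-only rig on the LAN
    if password:
        argv += ["--passwd", password]  # that client's GUI RPC password
    argv += ["--set_network_mode", mode]
    if duration_s > 0:
        argv.append(str(duration_s))
    return argv


def set_network_mode(mode: str, **kwargs) -> None:
    """Run the command; raises CalledProcessError or TimeoutExpired if the client doesn't answer."""
    subprocess.run(network_mode_command(mode, **kwargs), check=True, timeout=30)
```

For example, `set_network_mode("never", host="rig1", password="...")` while the line is down, then `set_network_mode("auto", host="rig1", password="...")` once the browser works again.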
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1201307 - Posted: 1 Mar 2012, 9:44:33 UTC - in response to Message 1201305.  

Yes, I could try to suspend network activity, if I could interrupt the lockup long enough to do so.......but first I would have to get Boinc to respond, you see.
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1201311 - Posted: 1 Mar 2012, 9:49:28 UTC - in response to Message 1201298.  
Last modified: 1 Mar 2012, 9:52:32 UTC

For Windows Vista and later if BOINC was updated to use IsInternetConnected() (http://msdn.microsoft.com/en-us/library/windows/desktop/aa366143(v=vs.85).aspx) there would not be this issue.
Wiggo
Joined: 24 Jan 00
Posts: 38302
Credit: 261,360,520
RAC: 489
Australia
Message 1201312 - Posted: 1 Mar 2012, 9:50:00 UTC - in response to Message 1201305.  

I had to leave mine turned off for 45 mins until my ISP could patch around the problem, as even trying to suspend network activity on that PC was impossible, as was trying to do anything else on it, in fact (the ISP to blame here is called "Dodo", which I suppose says it all).

Cheers.
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1201314 - Posted: 1 Mar 2012, 9:56:09 UTC

I dunno, but anything that keeps the kitties from crunching merrily along when work is in the cache deeply bothers me. Comms be damned sometimes. It happens.

Should not throw such a wrench into the works.

Wiggo
Joined: 24 Jan 00
Posts: 38302
Credit: 261,360,520
RAC: 489
Australia
Message 1201315 - Posted: 1 Mar 2012, 9:56:37 UTC - in response to Message 1201311.  

That wouldn't work then either (in fact nothing would work on that PC until the DNS problem was resolved).

By rights BOINC shouldn't have to rely on the internet connection just to start up but trying to do anything on a PC while this is happening is completely useless I've found.

Cheers.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1201316 - Posted: 1 Mar 2012, 10:04:29 UTC

I had a series of Internet connection problems just before and over Christmas - some of the time the internet (ADSL) was completely dead, and for one night ADSL would sync but the exchange wouldn't accept my login. I don't recall it making any difference to my cached BOINC crunching. I wonder whether this could be another case of BOINC failing to scale up with large cache sizes?
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1201318 - Posted: 1 Mar 2012, 10:07:02 UTC - in response to Message 1201316.  

I don't think so......I have dealt with faaaaaaaar larger caches than I am currently able to get.
It's something with comms, not with cache.
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1201319 - Posted: 1 Mar 2012, 10:10:19 UTC
Last modified: 1 Mar 2012, 10:10:35 UTC

Gotta toss meself into bed with the kitties now.......working 11-hour shifts.
Will check back in...
G'night.
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1201320 - Posted: 1 Mar 2012, 10:31:35 UTC - in response to Message 1201315.  
Last modified: 1 Mar 2012, 10:34:54 UTC

A lot depends on how the BOINC code is written. I have just tried calling IsInternetConnected() after setting non-existent DNS servers, and my testing indicates it will work. If you wish to see for yourself, run SIV (http://rh-software.com/) and you will see the green blob above the CPUs change as DNS availability changes.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1201322 - Posted: 1 Mar 2012, 10:52:59 UTC - in response to Message 1201320.  

I suppose the question there would be: how has Microsoft implemented IsInternetConnected() under the hood, and what is the latency of the call in the two states?

If there's some Windows service continually polling for connectivity in its own time (which might be expensive in internet bandwidth), then IIC (if we can call it that for short) could quickly return a cached flag. But if IIC has to check afresh ab initio each time it's called, then latency in the call might still be a problem. After all, if we ask "Is DNS listening?" (and what other test would IIC use - what defines 'connection', in technical terms?), nothing is ever going to answer "No". You just have to wait long enough to be able to interpret the silence. It's just like walking into a house and calling "Is anybody there?" - sometimes you have to allow time for a yawn and a stretch before you can be sure that no-one's going to answer.
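The "cached flag" arrangement described above can be sketched as a background thread that runs a possibly slow connectivity probe on its own schedule, while callers only ever read the last result. This is a minimal Python illustration, assuming the probe is any function returning True or False; it says nothing about how Windows actually implements IsInternetConnected(), and the class name and 2-second default interval are arbitrary choices.

```python
import threading
import time
from typing import Callable


class CachedConnectivity:
    """Poll a (possibly slow) connectivity probe in the background and
    let callers read the last answer instantly instead of waiting on it."""

    def __init__(self, probe: Callable[[], bool], interval_s: float = 2.0):
        self._probe = probe
        self._interval = interval_s
        self._connected = False  # pessimistic until the first probe finishes
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "CachedConnectivity":
        self._thread.start()
        return self

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()  # waits for an in-flight probe to return

    def is_connected(self) -> bool:
        # Cheap: never touches the network, just reads the cached flag.
        with self._lock:
            return self._connected

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                ok = bool(self._probe())  # may block for a long time on a dead line
            except Exception:
                ok = False
            with self._lock:
                self._connected = ok
            self._stop.wait(self._interval)
```

The price is staleness: the cached answer can be up to one interval (plus however long the probe itself blocks) out of date, which is the trade-off against paying the full "wait for the silence" latency on every call.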

The technical answer about implementation is that BOINC uses LibCurl. I'm not sure which version they currently use - we successfully got BOINC to upgrade a few years ago, because we were falling foul of an earlier bug in libcurl.

Because internal communications between the BOINC client and the BOINC manager also take place over TCP/IP, the libcurl library will be implicated here. I suspect that the latency of a failed/failing DNS call in libcurl is blocking client/manager communications too - perhaps it depends on the libcurl threading model. Does any of that make sense?
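Whether a stuck lookup freezes everything depends on where it runs. As a sketch of the alternative, written as a Python stand-in rather than BOINC's actual C++/libcurl code, the lookup below is pushed onto a worker thread and the caller stops waiting after a deadline, so a loop that also has to answer the Manager would keep running. (libcurl itself can be built with a threaded or c-ares asynchronous resolver for much the same reason.) `resolve_with_deadline` and its parameters are hypothetical.

```python
import concurrent.futures
import socket
from typing import Callable, Optional

# Shared pool for lookups: a hung lookup ties up a worker thread, not the caller.
_RESOLVER_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def resolve_with_deadline(hostname: str, timeout_s: float = 5.0,
                          resolver: Callable[..., list] = socket.getaddrinfo) -> Optional[list]:
    """Return the resolver's result, or None if DNS fails or exceeds timeout_s."""
    future = _RESOLVER_POOL.submit(resolver, hostname, 443)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The lookup keeps running in the pool; we just stop waiting for it.
        return None
    except socket.gaierror:
        return None  # a definite "can't resolve hostname"
```

The abandoned lookup still occupies a worker until the OS resolver gives up, so the pool size caps how many stuck lookups can pile up before new ones start queueing.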
Wiggo
Joined: 24 Jan 00
Posts: 38302
Credit: 261,360,520
RAC: 489
Australia
Message 1201323 - Posted: 1 Mar 2012, 10:54:36 UTC - in response to Message 1201320.  

It's far too late to try that again, but at the time it was impossible to try anything, as the system (my 2500K) just locked up too fast. End of story here, as it's bedtime. ;)

Cheers.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1201325 - Posted: 1 Mar 2012, 11:18:21 UTC

That previous LibCurl update was to version 7.19.7 in December 2009 - see changeset 19768 in the BOINC Trac.

Since then, the Mac client has been updated to version 7.21.7, but I can't see any newer Windows updates.

LibCurl itself is now at version 7.24.0 - but there's a huge list of changes and fixes, and I don't really feel like going through the whole list to try and work out if there's anything that would help us here. Do we have any networking specialists reading this board?
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1201327 - Posted: 1 Mar 2012, 11:30:54 UTC - in response to Message 1201285.  

Why BOINC goes bananas is beyond me. I've had a couple of DNS failures after my ISP changed something and my router failed to release and re-acquire DNS.

Boinc tried and failed to get the servers & the backup, but all it then did was go into increasing timeouts.
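The "increasing timeouts" are a retry backoff. A generic sketch of exponential backoff with a cap and random jitter looks like this; the base delay, cap and jitter range are made-up illustrative values, not BOINC's actual settings, and `retry_delay` is a hypothetical name.

```python
import random
from typing import Optional


def retry_delay(failures: int, base_s: float = 60.0, cap_s: float = 4 * 3600.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait after `failures` consecutive failed contacts.

    The ceiling doubles with each failure up to cap_s; a random factor in
    [0.5, 1.0] spreads retries so many clients don't all retry in lockstep.
    """
    rng = rng or random.Random()
    exponent = min(max(0, failures - 1), 30)  # bounded so the float never overflows
    ceiling = min(cap_s, base_s * (2 ** exponent))
    return ceiling * rng.uniform(0.5, 1.0)
```

Backoff makes each failed contact rarer, but if a single attempt blocks for minutes, as kittyman describes, every retry still brings the stall back; backoff alone doesn't fix a blocking lookup.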

I was able to shut down BOINC, log into my router and do a manual release etc.
That cured the DNS lookup problem, and BOINC was OK after I restarted it.

Dunno why yours does something very different... weird.

As you know, I have the current Lunatics apps installed, but I've had the broadband problem both before and after Lunatics.

Of course, I'm using Win7 SP1, so maybe it's an OS thing? Other OSes might give different results, since the apps would be written for those OSes.

Cheers
Cliff,
Been there, Done that, Still no damm T shirt!
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1201328 - Posted: 1 Mar 2012, 11:36:28 UTC - in response to Message 1201327.  

All communications (in this context) are done by the BOINC client.

I suspect the version of BOINC in use may make the most difference: the version of Windows may have some effect, but the Lunatics science applications will certainly have no effect at all (except by increasing the number of tasks you complete and have to return). They don't play any part in the communication processes.
red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1201329 - Posted: 1 Mar 2012, 11:43:35 UTC - in response to Message 1201322.  

I do not currently have access to the Windows source, so I cannot check how it works.

SIV works by polling in the "Remote Thread" at 2-second intervals (by default). Looking at the Menu->Edit->STC Info panel, I can see any delay must be less than 1 clock tick (16 ms) when it calls IsInternetConnected(). Further, the CPU usage is effectively zero (16 ms after 20 minutes). If there were a delay, the delay in ms would be displayed to the right of the number below "upd_rapi", as it is for some of the other threads.
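The kind of measurement described here, timing each polled call and checking it against one clock tick of roughly 16 ms, can be sketched as follows. This is a Python illustration with a stand-in probe function, not SIV's implementation; `timed_poll`, `slow_calls` and `TICK_S` are hypothetical names.

```python
import time
from typing import Callable, List, Tuple

TICK_S = 0.016  # roughly one default Windows clock tick (about 15.6 ms)


def timed_poll(probe: Callable[[], bool], polls: int,
               interval_s: float = 2.0) -> List[Tuple[bool, float]]:
    """Call probe() `polls` times, interval_s apart, recording (result, seconds taken)."""
    samples: List[Tuple[bool, float]] = []
    for i in range(polls):
        start = time.perf_counter()
        result = probe()
        samples.append((result, time.perf_counter() - start))
        if i < polls - 1:
            time.sleep(interval_s)
    return samples


def slow_calls(samples: List[Tuple[bool, float]], threshold_s: float = TICK_S) -> List[float]:
    """Durations that exceeded threshold_s, i.e. calls long enough to be noticed."""
    return [duration for _, duration in samples if duration > threshold_s]
```

If a real connectivity check were passed in as `probe`, an empty `slow_calls(...)` list across both connected and disconnected states would match the observation that the call costs less than a tick either way.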


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.