New work download issues

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 1344767 - Posted: 9 Mar 2013, 23:49:21 UTC - in response to Message 1344762.  

Sorry, the program can be designed to allow the WU to download fully before moving onto the next one.

No it can't. Downloads are controlled by your computer's operating system and its network and TCP/IP drivers. The programme can be configured as to when to give up on a download and try again later, but the downloads themselves are done by your computer.
Hence the TCP optimisation thread, which explains how to configure the computer specifically to handle a highly congested network link, which is exactly what we have with Seti.
Grant
Darwin NT
ID: 1344767 · Report as offensive
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1344770 - Posted: 10 Mar 2013, 0:24:40 UTC - in response to Message 1344762.  

See... here is where I agree and disagree. If there was a problem with the download speed or lack of data I could understand. To me this is a program issue. There is not a network traffic issue as far as speed goes. Give me one good reason why a WU downloading quickly should abort at 99.78% just to switch to a WU that is only at 8.7%? Let that WU finish so work can be started. This is not a one-time thing; watch it (I'm sure you have) and they will stop in the high 90s and jump to lower ones. Sorry, the program can be designed to allow the WU to download fully before moving onto the next one. I have been running this long enough, and never had this issue until recently. Sure, it may be more popular now, so adjust the program to deal with the added people.

This is like watching the 1st place marathon runner stop 3 feet in front of the finish line to let the last place person catch up!

You are mixing two different things...
BOINC (not the project) is the app that handles the file transfers, and it doesn't stop a transfer unless no data has been received for some time.
When this happens, as that file was the one failing, it tries the next one, and it keeps doing that on every failure. You think it would be better if, instead of trying the next file, it retried the same one. Well, if they did that and the file had any kind of permanent error, then all your file transfers would stall because of just one problematic download. Anyway, as this is a rather subjective matter I'm not going to try to convince you that the current way is better; the only thing I can say is that the owners of the BOINC project designed it to work that way and they don't want to change it. (IIRC this design, with longer back-off, was requested by some of the people who put up money to support BOINC...)
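
To illustrate the point about one bad file not stalling the rest, here is a rough Python sketch of a "fail, move on to the next file, retry later" queue. It is not BOINC's actual code: `run_transfers`, `backoff_seconds`, the 60-second base delay and the one-hour cap are all made-up names and values for illustration only.

```python
from collections import deque


def backoff_seconds(failures, base=60, cap=3600):
    """Hypothetical exponential back-off: doubles per failure, up to a cap."""
    return min(base * 2 ** (failures - 1), cap)


def run_transfers(names, attempt, max_attempts=50):
    """Try each pending file in turn; a failed file goes to the back of the queue.

    `attempt` is a caller-supplied function returning True when the download
    of the named file succeeds. Returns (finished, still_pending, failures).
    """
    pending = deque(names)
    failures = {name: 0 for name in names}
    finished = []
    for _ in range(max_attempts):
        if not pending:
            break
        name = pending.popleft()
        if attempt(name):
            finished.append(name)
        else:
            failures[name] += 1
            pending.append(name)  # move on to the next file, retry this one later
    return finished, list(pending), failures


# Simulated run: file "b" has a permanent error, the others download fine.
finished, pending, failures = run_transfers(["a", "b", "c"], lambda n: n != "b")
print(finished, pending, backoff_seconds(failures["b"]))
```

In this simulation "a" and "c" complete even though "b" never does. With a retry-the-same-file policy, "c" would wait behind "b" forever.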

Now, why do the transfers fail? Simply because the SETI pipes are heavily congested. And as the project is dependent on Berkeley directives, they can't do anything to increase the pipes' capacity. (There are also other issues, like the capacity of their servers, which might not be able to support the higher load.) So, to deal with this issue we needed to find a workaround on our side. The first one was the use of proxy servers, but as these proxies are not meant for heavy use, the owners usually block the traffic from SETI as soon as they find that we are using them. A couple of days ago cdemers found that there is an optional setting in the TCP protocol that helps to pass through the congestion; that's the trick that explains why Linux computers were not having issues and why the proxies worked.
If you apply those settings, the congestion won't be a big issue anymore, the file transfers won't keep failing, and you will no longer care in which order BOINC chooses the next file to retry... simply because there will not be retries (or at least not so many as to become an issue).
ID: 1344770 · Report as offensive
bluestar

Joined: 5 Sep 12
Posts: 6995
Credit: 2,084,789
RAC: 3
Message 1344771 - Posted: 10 Mar 2013, 0:27:52 UTC
Last modified: 10 Mar 2013, 0:30:42 UTC

Just commencing CUDA tasks on computer 6929589 right now after finally getting that mentioned cufft32_30_14.dll file.

The first 16 tasks received error 1 because once again the environment variable CUDA_GRID_SIZE_COMPAT was not set.

This is a driver software problem that better should be fixed.
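
A quick way to see whether the variable is visible to a process is to look it up in that process's environment. The Python sketch below does only that; the helper name `cuda_grid_compat_status` is made up, and the thread does not say what value the workaround expects, so the code reports whatever it finds rather than checking for a particular value.

```python
import os


def cuda_grid_compat_status(env=None):
    """Report whether CUDA_GRID_SIZE_COMPAT is present in the given environment.

    `env` defaults to the current process environment (os.environ); passing a
    plain dict makes the function easy to try out without changing anything.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_GRID_SIZE_COMPAT")
    if value is None:
        return "not set"
    return f"set to {value!r}"


print("CUDA_GRID_SIZE_COMPAT is", cuda_grid_compat_status())
```

Note that this only shows the environment of the process running the script. A science application started by the BOINC client inherits the client's environment, which may differ if the variable was set after the client was started.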
ID: 1344771 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 1344774 - Posted: 10 Mar 2013, 0:43:50 UTC - in response to Message 1344771.  

This is a driver software problem that better should be fixed.

Then you need to talk to Nvidia about it.

Grant
Darwin NT
ID: 1344774 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1344775 - Posted: 10 Mar 2013, 0:44:16 UTC - in response to Message 1344771.  

Just commencing CUDA tasks on computer 6929589 right now after finally getting that mentioned cufft32_30_14.dll file.

The first 16 tasks received error 1 because once again the environment variable CUDA_GRID_SIZE_COMPAT was not set.

This is a driver software problem that better should be fixed.

Agreed. NVIDIA wrote the application, and NVIDIA wrote the drivers. The solution is entirely in NVIDIA's hands, no one else's.

NVIDIA provided the CUDA_GRID_SIZE_COMPAT workround, and I've done my best to promote it. But I can't write an NVIDIA driver by myself.
ID: 1344775 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1344812 - Posted: 10 Mar 2013, 2:55:10 UTC - in response to Message 1344775.  
Last modified: 10 Mar 2013, 2:56:26 UTC

Just commencing CUDA tasks on computer 6929589 right now after finally getting that mentioned cufft32_30_14.dll file.

The first 16 tasks received error 1 because once again the environment variable CUDA_GRID_SIZE_COMPAT was not set.

This is a driver software problem that better should be fixed.

Agreed. NVIDIA wrote the application, and NVIDIA wrote the drivers. The solution is entirely in NVIDIA's hands, no one else's.

NVIDIA provided the CUDA_GRID_SIZE_COMPAT workround, and I've done my best to promote it. But I can't write an NVIDIA driver by myself.


Haha!, and it's a complex situation indeed. One where if you could go back in time to 6.09/6.10 development and force the Engineers to use 'Best Practices' that hadn't been invented yet, then the problem would never have arisen.

Sadly a driver fix alone won't repair outdated technology, in this case dated but popular application coding techniques for a constantly evolving language and API. With the pace of GPU development, things change, and Cuda 3.0 was very much a preliminary Fermi support item. 3.1 had specific CUFFT library multi-GPU bugs making it unusable/unsupportable here, which for our purposes made 3.2 the practical choice.

Along those lines, the sentiment from my own direction would be that applications need maintenance, at least while the GPU technology field is changing so rapidly. Paraphrasing/summarising my own discussions with their dev support, they go along the lines of 'Looks like you found a quality control issue, Try this workaround. Your application seems to work, when are you updating to that ?' lol
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1344812 · Report as offensive
bluestar

Joined: 5 Sep 12
Posts: 6995
Credit: 2,084,789
RAC: 3
Message 1344832 - Posted: 10 Mar 2013, 5:30:13 UTC - in response to Message 1344812.  
Last modified: 10 Mar 2013, 5:36:28 UTC

Hi, Jason!

Right now I am fooling around with some eight 2 TB SATA discs.

Apparently the operating system is a little reluctant to determine whether these discs are SATA, ATA, or even SCSI. As you may know, Asus motherboards also provide RAID support if necessary.

When it comes to 32-bit vs. 64-bit operating systems (software and possibly firmware) as well as 64-bit processor architecture, 64-bit is in my opinion most likely the best option when it comes to the quality of results, including running times (meaning processing times).

You may possibly know that I am also running PrimeGrid tasks. I have two accounts there, with more than 11 million credits on my primary account and a little more than 1 million credits on my secondary account.

I happen to be neither a software developer nor a systems administrator, but I run the so-called Genefer tasks at PrimeGrid. These tasks can be run on either the CPU or the GPU. The disadvantages of the GPU are a possibly noisy graphics card and a very sluggish screen. In return, these tasks may run some 20 times faster than on the CPU.

The Genefer tasks at PrimeGrid may be calculating either b^524288+1, b^1048576+1, or b^4194304+1, where b is a number of some 3-6 digits. b^2097152+1 apparently is reserved.
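
For a feel of the sizes involved: the exponents quoted above are all powers of two, so each candidate has the form b^(2^n)+1. The small Python sketch below recovers n and estimates the number of decimal digits. The helper names and the sample base 123456 are made up for illustration; the digit count is a floating-point estimate, not an exact computation.

```python
import math


def power_of_two_exponent(n):
    """Return k such that n == 2**k; raise ValueError if n is not a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def approx_decimal_digits(b, n):
    """Estimate the number of decimal digits of b**n + 1 via logarithms."""
    return math.floor(n * math.log10(b)) + 1


sample_base = 123456  # hypothetical 6-digit base, not taken from any real task
for n in (524288, 1048576, 2097152, 4194304):
    k = power_of_two_exponent(n)
    digits = approx_decimal_digits(sample_base, n)
    print(f"b^{n}+1 = b^(2^{k})+1, roughly {digits} digits for b = {sample_base}")
```

Numbers with millions of digits are why a single task takes so long, and why the speed-up from a GPU matters.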

One of the options for running these tasks is a 64-bit CUDA application on a 64-bit operating system and a 64-bit processing platform. For Windows, the current options are either Windows XP 64-bit (like Windows XP 64 Enterprise server, which I do not have), Windows 7 Ultimate 64-bit, or Windows 8 64-bit. Again, multi-user or multi-processor systems or architectures (mini-computers or mini-frames) may benefit from this, leaving aside the other complexities inherent in or built into such systems.

I have posted a question about a 64-bit GeneferCUDA application a couple of times at the PrimeGrid message boards. I was also able to get the file for this by searching the web. Right now that disc lies on my shelf.

Therefore, the so-called "optimized applications" should be readily and generally available as long as the current belief is that the "standard applications" are not fully up to the task. For Seti@home, only the Gaussian search is thought to return something close to a signal which may be coming from another civilization in space.

In order to obtain shorter processing times, the optimized applications should at least be available and, if possible, be made available through BOINC Manager as the correct application for a given task.
ID: 1344832 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1344840 - Posted: 10 Mar 2013, 7:05:22 UTC - in response to Message 1344832.  
Last modified: 10 Mar 2013, 7:13:59 UTC

Hi bluestar. Thanks very much for the input, and I agree with pretty much all of it.

You can find the latest 'Public beta' Cuda application Windows builds on the download page of my site at http://jgopt.org. Developed for the new stock Cuda (to be distributed under Boinc), these are considered a public beta 'advanced user' install at the moment, simply because the project hasn't rolled forward to V7 multibeam yet, which has a number of search and precision enhancements addressing some of the specific concerns you mention.

There are a couple of complications with 64-bit GPU applications, the dominant one being a 5-10% performance penalty related to the use of 64-bit addressing on the GPU itself, noting in particular that on nVidia cards only Fermi or newer have 64-bit capability at all, and support on earlier ones is emulated.

In many cases CPU applications do indeed benefit on Intel/AMD CPU architecture by avoiding the 'Windows on Windows' layer, with its 32-bit to 64-bit overheads. In the case of pure 64-bit Cuda builds, though, this pretty much only makes sense if you really need huge amounts of memory. That isn't to say there won't ever be some other advantage for a 64-bit Cuda build (these Titan Kepler 2s will need thorough examination for starters), but it does mean that for the time being it's redundant to bother, at least for this particular application (many other projects use substantial CPU resources, for example, which will be a different picture).

With respect to the disadvantage of a 'possibly noisy GPU', I would currently regard that as a result of the technologies still maturing, i.e. in many cases fault-tolerant computing methods exist but are not implemented in the current (quite dated) stock applications. Most of the work that has gone into third-party applications so far has been aimed at this 'tightening', and some of that work has fed straight into the v7 CPU apps. It'll be interesting to see how things change with the v7 introduction, now that we have a pretty clear picture of where a lot of the technical issues lie.

Jason
ID: 1344840 · Report as offensive
BigWaveSurfer

Joined: 29 Nov 01
Posts: 186
Credit: 36,311,381
RAC: 141
United States
Message 1345075 - Posted: 10 Mar 2013, 20:04:40 UTC

Well my complaining seemed to work! :) 'Magically' for the 1st time in a while my transfers page is EMPTY! When a WU does show up it downloads to 100% and goes empty again. Ahhh....the way life should be!
ID: 1345075 · Report as offensive


 