Someone please explain this

Message boards : Number crunching : Someone please explain this

Previous · 1 · 2 · 3 · Next

Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845516 - Posted: 27 Dec 2008, 2:28:24 UTC - in response to Message 845496.  
Last modified: 27 Dec 2008, 2:32:55 UTC

If and when they get it arranged so both the CUDA card AND the CPU could crunch Seti at the same time, you might bring down a lotta rigs....

Especially some that do not get regular attention, such as cleaning the dust out....now all of a sudden you've got a full CPU load PLUS the vid card churning away full bore. Extra heat in the case that may not be handled properly and may start pushing things over the edge. CPU and GPU alike....

We're already doing CUDA and opti AP, and I run Einstein in there as well. Adding CPU tasks hasn't caused any additional heat on my system. I'd think that anyone running an opti CPU task would be more likely to have problems with temps because of poor PC maintenance than from the GPU.
ID: 845516
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 845582 - Posted: 27 Dec 2008, 6:56:32 UTC
Last modified: 27 Dec 2008, 6:58:30 UTC

I'm going to agree that CUDA should be opt-in, but AP should not.

My reason for taking this stance is that if you are part of SETI@Home, you are willing to donate your CPU to crunch needed data, and in my opinion it shouldn't matter whether it's an app called SETI or AstroPulse, and it shouldn't matter how long the workunit is.

But CUDA is different. Many video cards are built by third parties, and many do not have sufficient cooling or are sold pre-overclocked. Machines are already overheating with faster CPUs, and now GPUs too. Some video cards are even designed as "dual slotters" (a wide card whose cooler takes up the slot next to the video card slot) just to get proper cooling! It can be dangerous, and overheating the graphics card is much more likely than overheating the CPU, especially since CPUs have been built for the last 5+ years with overheat protection built in, a feature GPUs do not have as of yet. Because of this, potentially destroying a participant's component can't be good PR for the project and can/will result in many complaints. I think a warning should also be included indicating the potential problems that may arise from using one's video card for crunching, so that people are better informed.


I'm sure many will disagree with me, but I don't think the name of the app should matter at all; what part of the computer a participant is willing to donate, however, does matter.
ID: 845582
Profile Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 845617 - Posted: 27 Dec 2008, 10:09:06 UTC - in response to Message 845582.  
Last modified: 27 Dec 2008, 10:10:40 UTC

My reason for taking this stance is that if you are part of SETI@Home, you are willing to donate your CPU to crunch needed data, and in my opinion it shouldn't matter whether it's an app called SETI or AstroPulse, and it shouldn't matter how long the workunit is.


That I basically agree with, but my opposition to AP being defaulted to yes is because of the short deadline.

I have two hosts that complete a 45cr MB in ~45-50 hrs; an AP on these would probably take 1100-1200 hrs (45-50 days). There is no possible way they could finish an AP within the time allowed, so all that crunch time would probably have been wasted.

If the deadline was 3 months instead of 1 month my opinion would change.
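As a rough sketch of the arithmetic behind that (assuming a ~30-day AP deadline, i.e. roughly 720 wall-clock hours, and the runtimes quoted above; illustrative Python, not project code):

```python
# Rough sketch of the deadline arithmetic above (illustrative numbers only):
# an AP task is estimated at ~1100-1200 h on these hosts, and the AP deadline
# is assumed to be about one month (~30 days).

HOURS_PER_DAY = 24

ap_hours = (1100 + 1200) / 2             # estimated AP runtime, hours
ap_deadline_hours = 30 * HOURS_PER_DAY   # assumed ~1-month deadline window

# Even crunching 24/7 at 100%, the host cannot fit the AP task in the window.
required_duty_cycle = ap_hours / ap_deadline_hours
print(f"AP runtime: {ap_hours:.0f} h, deadline window: {ap_deadline_hours} h")
print(f"Duty cycle needed to finish on time: {required_duty_cycle:.0%}")
print("Feasible" if required_duty_cycle <= 1.0 else "Not feasible")
```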
ID: 845617
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 845629 - Posted: 27 Dec 2008, 11:23:20 UTC - in response to Message 845471.  

A set and forget user might have a passively cooled CUDA card (the only one I have is) and not notice until it's too late that the CUDA app has over-heated it...

-Dave

If the system is cooled correctly, a user shouldn't have to notice the card is overheating. All GPUs come with a VBIOS that checks the GPU's temperature. The drivers, in combination with the VBIOS, know the best temperature for this sort of GPU to run at, and know its minimum and maximum. If the maximum is reached, the GPU will increase the fan speed (on active cooling) or reduce its clocks/voltage (on passive cooling), fully automatically.

Not everyone has adequate cooling and venting in the rest of the computer. That's something you really need when using passively cooled cards.

Now, comparing Folding@Home with Seti's use of the GPU: FAH does calculations in bursts. It'll use the GPU for a couple of minutes, then back off and use the CPU. SAH will use the GPU continuously at full load. The work does come in bursts, but the time between the bursts is very short, not enough to cool the GPU down.
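A minimal sketch of the throttling behaviour described above; all temperatures, fan speeds and clock values here are invented for illustration, not taken from any real VBIOS or driver:

```python
# Illustrative sketch of the driver/VBIOS throttling logic described above.
# All temperatures, fan speeds and clocks are invented example values.

def throttle_step(temp_c, max_temp_c, fan_pct, clock_mhz, active_cooling):
    """Return adjusted (fan_pct, clock_mhz) for one control step."""
    if temp_c < max_temp_c:
        return fan_pct, clock_mhz          # within limits: leave settings alone
    if active_cooling:
        fan_pct = min(100, fan_pct + 10)   # actively cooled card: spin the fan up
    else:
        clock_mhz = int(clock_mhz * 0.9)   # passively cooled card: drop the clocks
    return fan_pct, clock_mhz

# Example: a passively cooled card stuck at full load slowly clocks itself down.
fan, clock = 0, 600
for temp in (70, 85, 95, 97):              # pretend readings from the sensor
    fan, clock = throttle_step(temp, max_temp_c=90, fan_pct=fan,
                               clock_mhz=clock, active_cooling=False)
    print(f"{temp}C -> fan {fan}%, clock {clock} MHz")
```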
ID: 845629
Profile ohiomike
Joined: 14 Mar 04
Posts: 357
Credit: 650,069
RAC: 0
United States
Message 845653 - Posted: 27 Dec 2008, 13:01:18 UTC - in response to Message 845629.  

A set and forget user might have a passively cooled CUDA card (the only one I have is) and not notice until it's too late that the CUDA app has over-heated it...

-Dave

If the system is cooled correctly, a user shouldn't have to notice the card is overheating. All GPUs come with a VBIOS that checks the GPU's temperature. The drivers, in combination with the VBIOS, know the best temperature for this sort of GPU to run at, and know its minimum and maximum. If the maximum is reached, the GPU will increase the fan speed (on active cooling) or reduce its clocks/voltage (on passive cooling), fully automatically.

Not everyone has adequate cooling and venting in the rest of the computer. That's something you really need when using passively cooled cards.

Now, comparing Folding@Home with Seti's use of the GPU: FAH does calculations in bursts. It'll use the GPU for a couple of minutes, then back off and use the CPU. SAH will use the GPU continuously at full load. The work does come in bursts, but the time between the bursts is very short, not enough to cool the GPU down.


The thing that is easy to overlook here is where the heat goes. I have 2 8800 GTS cards running Folding@Home, and it keeps one card @ 81C and the other (with the partially blocked air intake) @ 89C. The problem is that not all the heat is blown out the back; everything inside the case heated up until I cooked 2 sticks of RAM. (This unit is now running again with new RAM and the case open and on its side, so the heat goes up into the air instead of up into the RAM.)

Moral of the story: keep an eye on the case temp along with the GPU temps.

That being said, I agree that the GPU option should default to "OFF". The system needs to be checked and monitored after turning it on.




Boinc Button Abuser In Training >My Shrubbers<
ID: 845653
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 845658 - Posted: 27 Dec 2008, 13:26:23 UTC - in response to Message 845653.  

The thing that is easy to overlook here is where the heat goes.

I agree with that. Any obstructions, such as other cards, RAM, cables, the CPU, heck even the case siding itself, should be checked. The (very very very long!) Nvidia card my girlfriend has in her PC blew directly on her hard drives. That's a good way to get the strangest of BSODs. ;-)

They're trying to make everything smaller these days, except for these cards, it seems. So you have a medium-sized case with a mini-ATX board, and you need to put in a card that's about as long as your case is wide... Unless you saw part of the case out so the card can blow its exhaust directly out the front, this is never going to work. (Nothing said about how to put the molex connectors in ;-))

Then there are cards that blow the hot air out the back, through a hole in the backplate. Give it enough time and it'll melt the VGA connector in place. ;-)

So don't wonder why I have no sides on my PC. It's the bare-bone inner casing holding the mobo and PSU, but that's all.
ID: 845658
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 845683 - Posted: 27 Dec 2008, 15:52:38 UTC - in response to Message 845617.  

My reason for taking this stance is that if you are part of SETI@Home, you are willing to donate your CPU to crunch needed data, and in my opinion it shouldn't matter whether it's an app called SETI or AstroPulse, and it shouldn't matter how long the workunit is.


That I basically agree with, but my opposition to AP being defaulted to yes is because of the short deadline.

I have two hosts that complete a 45cr MB in ~45-50 hrs; an AP on these would probably take 1100-1200 hrs (45-50 days). There is no possible way they could finish an AP within the time allowed, so all that crunch time would probably have been wasted.

If the deadline was 3 months instead of 1 month my opinion would change.


Theoretically, computers that do not meet the minimum specs for AP should not receive AP workunits. I believe the minimum spec is a 1.6GHz processor, so your 500MHz and 533MHz machines should not be getting AP at all. As to why they are receiving AP in the first place, I don't know.
ID: 845683
Profile dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 845714 - Posted: 27 Dec 2008, 18:10:32 UTC - in response to Message 845629.  


If the system is cooled correctly, a user shouldn't have to notice the card is overheating. All GPUs come with a VBIOS that checks the GPU's temperature. The drivers, in combination with the VBIOS, know the best temperature for this sort of GPU to run at, and know its minimum and maximum. If the maximum is reached, the GPU will increase the fan speed (on active cooling) or reduce its clocks/voltage (on passive cooling), fully automatically.

Not everyone has adequate cooling and venting in the rest of the computer. That's something you really need when using passively cooled cards.



Yes, but I'm not talking about the active Seti participant who frequents the boards here and knows all about the CUDA app; I'm talking about the set-and-forget type who may NOT have a properly cooled system and doesn't know that the CUDA app has suddenly started overheating the whole thing. Maybe the system IS adequately cooled for what that person normally does, but that doesn't include running a GPU-based science application. That's why it should be opt-in, IMO.

-Dave
ID: 845714
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 845728 - Posted: 27 Dec 2008, 18:43:50 UTC - in response to Message 845714.  
Last modified: 27 Dec 2008, 19:00:59 UTC

I'm talking about the set and forget type

Technically, the set it and forget it type won't have a problem as they never updated to a BOINC version that can do CUDA.

They'll still be on a version 4.45 that gets complaints for other reasons. ;-)

To get the warning that there's a newer recommended BOINC out, you'd need at least 5.10.45 and would have to read your messages every once in a while.

To update to 6.4.5, which does do CUDA, you'd have to see the warning about it on the download page, or see trouble arising when you installed it. It'll also only work correctly if you have the right drivers for the GPU installed.

So tell me, when you're a set it and forget it type, do you do all the above without noticing anything strange about your system? But in all honesty... even if they managed to do all that and set it to forget it, they'd be airheads to just install a new BOINC version and not check the running of it every once in a while for probable problems. ;-)

Although you probably have a number of these people around, I'm thinking their number is negligible.
ID: 845728
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 845736 - Posted: 27 Dec 2008, 19:03:55 UTC - in response to Message 845728.  
Last modified: 27 Dec 2008, 19:07:56 UTC

I'm talking about the set and forget type

Technically, the set it and forget it type won't have a problem as they never updated to a BOINC version that can do CUDA.

They'll still be on a version 4.45 that gets complaints for other reasons. ;-)

To get the warning that there's a newer recommended BOINC out, you'd need at least 5.10.45 and would have to read your messages every once in a while.

To update to 6.4.5, which does do CUDA, you'd have to see the warning about it on the download page, or see trouble arising when you installed it. It'll also only work correctly if you have the right drivers for the GPU installed.

So tell me, when you're a set it and forget it type, do you do all the above without noticing anything strange about your system? Aren't you an air-head then? :-)


Actually, I think all the 5.10.x's will tell you about available upgrades. Most of mine are 5.10.13, and are always reminding me that there are newer versions available and I should hurry over and get them (which I choose to ignore). ;-)

Whether it is wise to treat any computer as a 'set and forget' appliance like a toaster, can opener, or refrigerator (if those even are) isn't really the point. :-)

As Ozz said, enabling AP to run by default is at least tolerable, since the work being done by it is within the well known scope of the project.

Enabling CUDA by default is 'squatting' on hardware they haven't asked permission to use first.

Please note that they didn't bother updating the Rules and Policies before rolling it out to reflect the fact that the software is going to use the GPU if available. This makes the default-on setting completely unsatisfactory.

You want to talk about a nightmare scenario? Consider the great PR the project will get if (heaven forbid) someone does experience a catastrophic system failure from this ill-advised policy.

Alinator
ID: 845736
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 845743 - Posted: 27 Dec 2008, 19:21:49 UTC - in response to Message 845736.  
Last modified: 27 Dec 2008, 19:22:22 UTC

Let's hold the poll then:

David Anderson wrote:
This will be fixed by the latest version of the app, which falls back to CPU execution if GPU memory allocation fails.

This version has been in beta for a while, but for some reason wasn't put in sah.
I asked Jeff to do it; he says he'll do it Monday.

This does raise the questions:

- Should "use GPU?" be a global (rather than per-project) preference?
- Should it default to No?


I have already answered that: to my knowledge, the only projects using applications for both GPU and CPU are Seti and Seti Beta. GPUGrid only uses a GPU app. Einstein hasn't released its application yet. I know some others have one in development, but how far along are they? So No on the first question, for now.

Yes on the second. :-)
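The fallback described in the quoted note boils down to a pattern like the one below. This is a minimal sketch; try_gpu_alloc(), crunch_on_gpu() and crunch_on_cpu() are hypothetical placeholders, not the actual SETI@home code:

```python
# Minimal sketch of "fall back to CPU execution if GPU memory allocation fails".
# try_gpu_alloc(), crunch_on_gpu() and crunch_on_cpu() are hypothetical
# placeholders standing in for the real application's internals.

class GpuAllocationError(Exception):
    pass

def try_gpu_alloc(nbytes):
    # Pretend allocation: fail for anything larger than 256 MB of device memory.
    if nbytes > 256 * 1024 * 1024:
        raise GpuAllocationError("not enough free GPU memory")
    return bytearray(0)  # stand-in for a device buffer handle

def crunch_on_gpu(workunit, buffer):
    return f"{workunit} crunched on GPU"

def crunch_on_cpu(workunit):
    return f"{workunit} crunched on CPU"

def crunch(workunit, nbytes_needed):
    try:
        buf = try_gpu_alloc(nbytes_needed)
    except GpuAllocationError:
        return crunch_on_cpu(workunit)      # graceful fallback instead of erroring out
    return crunch_on_gpu(workunit, buf)

print(crunch("wu_small", 64 * 1024 * 1024))    # fits on the GPU
print(crunch("wu_large", 1024 * 1024 * 1024))  # falls back to the CPU
```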
ID: 845743
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 845767 - Posted: 27 Dec 2008, 21:33:33 UTC - in response to Message 845743.  

Let's hold the poll then:

David Anderson wrote:
This will be fixed by the latest version of the app, which falls back to CPU execution if GPU memory allocation fails.

This version has been in beta for a while, but for some reason wasn't put in sah.
I asked Jeff to do it; he says he'll do it Monday.

This does raise the questions:

- Should "use GPU?" be a global (rather than per-project) preference?
- Should it default to No?


I have already answered that: to my knowledge, the only projects using applications for both GPU and CPU are Seti and Seti Beta. GPUGrid only uses a GPU app. Einstein hasn't released its application yet. I know some others have one in development, but how far along are they? So No on the first question, for now.

Yes on the second. :-)


Agreed.

#1 No, but I don't think it should ever be more than a project only preference.

#2 Yes.

Alinator

ID: 845767
Profile Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 845815 - Posted: 28 Dec 2008, 0:39:00 UTC - in response to Message 845683.  
Last modified: 28 Dec 2008, 0:46:39 UTC

Theoretically, computers that do not meet the minimum specs for AP should not receive AP workunits. I believe the minimum spec is a 1.6GHz processor, so your 500MHz and 533MHz machines should not be getting AP at all. As to why they are receiving AP in the first place, I don't know.


They don't; I chose a poor way of explaining.

But my machines DO run 24/7 @ 100% CPU. A min spec machine that effectively runs less than 16 hrs/day @ 100% or 24/7 @ less than 65% would still be pushing to meet the deadline.

The crux of my point is that machines that are issued APs are expected to use about 20x the crunching power to meet the deadline as for MB (i.e. a 2000% increase in output).

The ratio of crunching hours to allocated time should not have changed, or the change should have been opt-in only.

[Edit to correct figures]
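A back-of-the-envelope version of that "about 20x" figure, using the AP numbers from the earlier post and an assumed ~3-week MB deadline (an illustrative round number, not an official project figure):

```python
# Sketch of the "about 20x the crunching power" estimate above. The AP figures
# come from the earlier post; the ~3-week MB deadline is an assumed round number
# used only for illustration.

mb_runtime_h  = 47.5        # ~45-50 h per MB task on these hosts
mb_deadline_h = 21 * 24     # assumed ~3-week MB deadline window
ap_runtime_h  = 1150.0      # ~1100-1200 h per AP task on the same hosts
ap_deadline_h = 30 * 24     # ~1-month AP deadline window

mb_fraction = mb_runtime_h / mb_deadline_h   # share of the window MB needs
ap_fraction = ap_runtime_h / ap_deadline_h   # share of the window AP needs

print(f"MB needs {mb_fraction:.0%} of its deadline window")
print(f"AP needs {ap_fraction:.0%} of its deadline window")
print(f"AP demands roughly {ap_fraction / mb_fraction:.0f}x the sustained throughput")
```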
ID: 845815
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 845818 - Posted: 28 Dec 2008, 0:47:25 UTC - in response to Message 845815.  
Last modified: 28 Dec 2008, 0:52:40 UTC

Theoretically, computers that do not meet the minimum specs for AP should not receive AP workunits. I believe the minimum spec is a 1.6GHz processor, so your 500MHz and 533MHz machines should not be getting AP at all. As to why they are receiving AP in the first place, I don't know.


They don't; I chose a poor way of explaining.

But my machines DO run 24/7 @ 100% CPU. A min spec machine that effectively runs less than 16 hrs/day @ 100% or 24/7 @ less than 65% would still be pushing to meet the deadline.

The crux of my point is that machines that are issued AP's are expected to use about 3x the crunching power to meet deadline as for MB (ie a 300% increase in output).

The ratio of Crunching hrs to Allocated time should not have changed.


Having larger workunits is a good thing though. Again I refer to the original SETI tasks that took a week or more for people to crunch just a single WU, and the servers weren't as overloaded with work requests as they are now. Having larger workunits allows for hosts to stay busy longer without returning for work as often. If anything, a longer deadline would be necessary, but then people complain about long deadlines and they have to wait forever to get their credit, especially if several people don't return them on time - it could be a couple months before they see their credit!

What messes everything up is that most computers can do SETI MB rather efficiently, and the Task Duration Correction Factor is geared toward MB. Then you have AP come along, which takes much longer, and the TDCF (or DCF for short) is suddenly out of whack. So a computer that the BOINC scheduler thinks can complete an AP task within the time allowed downloads a workunit, only to find that it probably can't finish it; but the work was downloaded, and BOINC will try to finish it anyway.

If it were possible, there should have been a separate DCF for MB and AP so that BOINC's scheduler doesn't think the computer is more efficient than it really is.

... but I would gather that this is an exception and not the rule, so in my opinion, I still don't think allowing AP by default is wrong. Even in the worst-case scenario, where a computer doesn't finish on time, the task will simply be sent to another machine, so no science is lost. BOINC has this fail-safe built in, so it isn't as big an issue as it would seem.
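A toy illustration of the single-DCF problem described above; the estimator and the update rule are simplified stand-ins, not BOINC's actual scheduler code:

```python
# Toy illustration of a single Duration Correction Factor (DCF) shared by two
# apps with very different real speeds. The estimator and update rule are
# simplified stand-ins, not BOINC's exact scheduler logic.

def estimate_hours(base_estimate_h, dcf):
    return base_estimate_h * dcf

def update_dcf(dcf, actual_h, base_estimate_h):
    # Nudge the single DCF toward the last observed ratio of actual/estimated.
    observed = actual_h / base_estimate_h
    return 0.5 * dcf + 0.5 * observed

# Pretend this host really takes 2 h for an MB task and 40 h for an AP task,
# while the project's raw estimates are 2 h and 20 h respectively.
tasks = [("MB", 2.0, 2.0), ("AP", 20.0, 40.0), ("MB", 2.0, 2.0), ("MB", 2.0, 2.0)]

dcf = 1.0
for name, base_est, actual in tasks:
    est = estimate_hours(base_est, dcf)
    print(f"{name}: estimated {est:5.1f} h, actually took {actual:5.1f} h (DCF={dcf:.2f})")
    dcf = update_dcf(dcf, actual, base_est)
# After the AP task the shared DCF jumps, so the following MB tasks are
# over-estimated; with a per-app DCF each estimate would stay accurate.
```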
ID: 845818
Profile Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 845829 - Posted: 28 Dec 2008, 1:29:49 UTC - in response to Message 845818.  

Even in the worst-case scenario, where a computer doesn't finish on time, the task will simply be sent to another machine, so no science is lost. BOINC has this fail-safe built in, so it isn't as big an issue as it would seem.


You are correct that it is not a big issue for the project (As you point out - no science is lost).

However, I believe it IS a big issue for the cruncher, who has just spent a whole month donating his crunch time, only to have it tipped down the drain because his machine is capable of doing this amount of work even if he doesn't wish to donate that much.

BTW, did you notice the correction to the figures in my last post? According to my calcs, AP requires 20-30x the work/time depending on the host.
ID: 845829
MarkJ (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 845832 - Posted: 28 Dec 2008, 1:58:15 UTC - in response to Message 845818.  

What messes everything up is that most computers can do SETI MB rather efficiently, and the Task Duration Correction Factor is geared toward MB. Then you have AP come along, which takes much longer, and the TDCF (or DCF for short) is suddenly out of whack. So a computer that the BOINC scheduler thinks can complete an AP task within the time allowed downloads a workunit, only to find that it probably can't finish it; but the work was downloaded, and BOINC will try to finish it anyway.

If it were possible, there should have been a separate DCF for MB and AP so that BOINC's scheduler doesn't think the computer is more efficient than it really is.


I understand Einstein are also looking at a 2nd app (PALFA search, something along the lines of AP), so there's even more reason why BOINC should maintain a separate DCF for each app within a project. It would also help in the case of CUDA and non-CUDA apps on the same box, e.g. if I had Seti MB 6.03 (cpu), Seti MB 6.05 (gpu) and Seti AP 5.00 (cpu). I'll go have a look and see if anyone has raised a Trac ticket regarding this.
BOINC blog
ID: 845832
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 845838 - Posted: 28 Dec 2008, 2:46:36 UTC - in response to Message 845728.  

Ageless wrote:
...
To update to 6.4.5, which does do CUDA, you'd have to see the warning about it on the download page...

There is no warning on the download page. There is this:
Note: if your computer is equipped with an NVIDIA Graphics Processing Unit (GPU), you may be able to use it to compute faster.

A skeptical user may be able to see a warning in that; most won't.
                                                                 Joe
ID: 845838
MarkJ (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 845841 - Posted: 28 Dec 2008, 2:57:10 UTC - in response to Message 845832.  

I'll go have a look and see if anyone has raised a Trac ticket regarding this.


Doesn't look like anyone has asked for it, so I have raised Trac ticket 812.
BOINC blog
ID: 845841
OzzFan (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 845859 - Posted: 28 Dec 2008, 4:14:59 UTC - in response to Message 845829.  

Even in the worst-case scenario, where a computer doesn't finish on time, the task will simply be sent to another machine, so no science is lost. BOINC has this fail-safe built in, so it isn't as big an issue as it would seem.


You are correct that it is not a big issue for the project (As you point out - no science is lost).

However, I believe it IS a big issue for the cruncher, who has just spent a whole month donating his crunch time, only to have it tipped down the drain because his machine is capable of doing this amount of work even if he doesn't wish to donate that much.


Well, it's not necessarily down the drain. As long as the host returns the WU before the other person does (with a new one-month deadline - not that it will necessarily take them a month to turn it in), the cruncher still gets credit for it.

But I get what you're saying. However, wasn't that the same case for SETI Classic? When I first joined up with Classic, I did not realize how long it was going to take for one of my machines, an AMD K6 166, to finish a single workunit, so I started leaving it on all the time. Though there was no deadline for the workunit, for all I knew the workunit could have already been crunched and I was providing busy work for the project, because they could have run dry during certain periods, all to increase my stats. This would be just as wasteful of electricity.

It's great that so many people are trying to be more "green" these days, but I can't quite fathom the (IMO) excessive concern over infrequent and unfortunate events. Surely there have been far greater wastes of resources throughout human history, so shouldn't we avoid getting too upset over something like this?

But that's just my viewpoint on the subject.

BTW, did you notice the correction to the figures in my last post? According to my calcs, AP requires 20-30x the work/time depending on the host.


Yes, I did.
ID: 845859
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19714
Credit: 40,757,560
RAC: 67
United Kingdom
Message 845870 - Posted: 28 Dec 2008, 5:11:45 UTC - in response to Message 845841.  

I'll go have a look and see if anyone has raised a Trac ticket regarding this.


Doesn't look like anyone has asked for it, so I have raised Trac ticket 812.

I see there is a response to your request:
12/27/08 19:17:03 changed by Nicolas

I agree. But note David Anderson wrote in a different ticket:

DCF is a kludge to compensate for bad FLOP estimates by projects. I don't want to make the kludge even more complicated.


Which makes me wonder if Dr.A is actually on this planet.
When testing AP on the Beta site before its release here, I noticed that Pappa's X2 6000+ and my Q6600 had similar benchmarks. From that, Dr. A would assume that performance on AP tasks was similar, when in fact my computer was over twice as fast as Pappa's. Therefore, with Dr. A's totally meaningless benchmarks in the BOINC client, the DCF is not a kludge but an absolute necessity.

ID: 845870