Multi core greater than 80 core

Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 1085938 - Posted: 11 Mar 2011, 2:41:11 UTC - in response to Message 1085936.  
Last modified: 11 Mar 2011, 2:41:50 UTC

Bry B wrote:
Same deal with that version:
3/10/2011 6:34:46 PM | | GPUs have become unusable; disabling tasks

Are you perhaps logged in using some sort of remote desktop? The client and apps must have access to the same session the graphics is running in.

Peter
ID: 1085938
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1085941 - Posted: 11 Mar 2011, 2:44:58 UTC - in response to Message 1085936.  
Last modified: 11 Mar 2011, 2:48:06 UTC


"use all coprocessors" will work only after the line "No usable GPUs found" disappears!

Did you try GPU-Z?

Post the contents of cc_config.xml.

- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1085941
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1085964 - Posted: 11 Mar 2011, 3:47:04 UTC - in response to Message 1085938.  

Pepo wrote:
Bry B wrote:
Same deal with that version:
3/10/2011 6:34:46 PM | | GPUs have become unusable; disabling tasks

Are you eventually logged in using some sort of remote desktop? The client and apps must have access to the same session the graphics is running in.

The only thing that causes that is RDP in Windows. If you are logging in via a remote client, use one of the VNC variants. I prefer TightVNC, which runs as fast as RDP in Windows. I know some use GoToMyPC and things like that.

Traveling through space at ~67,000mph!
ID: 1085964
MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1086038 - Posted: 11 Mar 2011, 10:41:15 UTC
Last modified: 11 Mar 2011, 10:45:01 UTC

I raised the issue of 160 cores on the BOINC Alpha mailing list. It appears that, because of the way Windows handles machines with more than 64 CPUs, BOINC doesn't quite work.

To quote David's email on the list:
It turns out Windows' support for >64 cores is funky, as usual. A host's cores are divided into "processor groups" of at most 64 each (presumably because of 64-bit bitmaps for affinity etc.). Calls like GetSystemInfo() return info only about the calling process's processor group.

Supporting > 64 cores on windows will involve:

1) enumerating the processor groups and adding up the # of cores in each group (Rom, please look into this)

2) The client will need to keep track of which processor group each task is running in, and explicitly set the group for new tasks.

In addition, if parallel applications want to use > 64 cores, they will have to do something similar when creating threads.


There was a follow-up email suggesting a workaround for the GetSystemInfo() limitation; at least, that's what I interpreted it to be.

No word as to what version of the BOINC client might support this, but at least they know what needs to be done.
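
As a rough illustration of step 1 in David's list, here is a minimal Python sketch that adds up the logical processors in every processor group. It assumes Windows 7 / Server 2008 R2 or later, where the Win32 calls GetActiveProcessorGroupCount() and GetActiveProcessorCount() are available. The helper name count_logical_processors is made up for this example, and this is not taken from the BOINC client source. On any other system it simply falls back to os.cpu_count().

```python
import ctypes
import os
import sys


def count_logical_processors():
    """Return (total, per_group) counts of logical processors.

    Hypothetical helper for illustration only. On Windows it walks the
    processor groups and sums the active processors in each one; elsewhere
    it reports a single pseudo-group based on os.cpu_count().
    """
    if sys.platform == "win32":
        kernel32 = ctypes.windll.kernel32
        # WORD GetActiveProcessorGroupCount(void)
        kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
        # DWORD GetActiveProcessorCount(WORD GroupNumber)
        kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
        kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32

        group_count = kernel32.GetActiveProcessorGroupCount()
        per_group = [kernel32.GetActiveProcessorCount(g) for g in range(group_count)]
        return sum(per_group), per_group

    # Non-Windows fallback: no processor groups, so report one group.
    total = os.cpu_count() or 1
    return total, [total]


if __name__ == "__main__":
    total, per_group = count_logical_processors()
    print("logical processors:", total, "per group:", per_group)
```

On a host with more than 64 logical processors, the per-group list should contain more than one entry, which is the case the group-unaware GetSystemInfo() call misses.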
BOINC blog
ID: 1086038
Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 1086043 - Posted: 11 Mar 2011, 11:34:59 UTC - in response to Message 1085964.  

-BeNt- wrote:
Pepo wrote:
Bry B wrote:
3/10/2011 6:34:46 PM | | GPUs have become unusable; disabling tasks

Are you eventually logged in using some sort of remote desktop? The client and apps must have access to the same session the graphics is running in.

The only thing that causes that is RDP in windows. If you are logging in via a remote client use one of the VNC's. I prefer TightVNC runs as fast as RDP in windows. I know some use GoToMyPC and things like that.

RDP is not the only known reason for the GPU to become unavailable to BOINC. The same happens when multiple users log in on a desktop, and the one running BOINC has her/his session locked.

I know that mainframes and servers are mostly accessed and used by means other than cables plugged directly into the graphics card and the mouse/keyboard connectors. A plain, physically connected KVM switch with a remote connection over the network etc. would possibly not be a problem, but any direct logical connection could behave like some sort of RDP and render the GPUs unusable for BOINC.

MarkJ wrote:
I raised the issue of 160 cores on the BOINC Alpha mailing list. It appears that due to the way Windows handles things when it has more than 64 cpus, BOINC doesn't quite work.

To quote David's email on the list:
It turns out Windows' support for >64 cores is funky, as usual. A host's cores are divided into "processor groups" of at most 64 each (presumably because of 64-bit bitmaps for affinity etc.). Calls like GetSystemInfo() return info only about the calling process's processor group.

Supporting > 64 cores on windows will involve:

2) The client will need to keep track of which processor group
each task is running in, and explicitly set the group for new tasks.

In addition, if parallel applications want to use > 64 cores, they will have to do something similar when creating threads.

There was a follow up email suggesting a work around for the GetSystemInfo(), well thats what I interpreted it to be.

No word as to what version of the BOINC client might support this, but at least they know what needs to be done.

The workaround addressed just obtaining the total number of processors. The question remains whether the client process will be able to launch child task processes on other processor groups, or whether it will be restricted to its initial processor group unless it does some additional work with (sort of) group affinities.
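
To make the bookkeeping side of that question concrete, here is a toy Python model of step 2 from David's list: remember which group each task runs in and pick a group for each new task. It is pure bookkeeping and makes no Windows calls; a real client would still have to apply the chosen group when it creates the process or thread. The class name GroupTracker and the 64/64/32 split are assumptions for illustration only, since Windows decides the actual group layout.

```python
class GroupTracker:
    """Toy model: track which processor group each task runs in."""

    def __init__(self, group_sizes):
        # Logical processors available in each group (hypothetical layout).
        self.group_sizes = list(group_sizes)
        # One set of running task names per group.
        self.running = [set() for _ in self.group_sizes]

    def place(self, task):
        """Assign a new task to the group with the most idle processors."""
        free = [size - len(tasks)
                for size, tasks in zip(self.group_sizes, self.running)]
        group = max(range(len(free)), key=free.__getitem__)
        if free[group] <= 0:
            raise RuntimeError("no idle processor in any group")
        self.running[group].add(task)
        return group

    def finish(self, task):
        """Forget a task once it has stopped running."""
        for tasks in self.running:
            tasks.discard(task)


if __name__ == "__main__":
    tracker = GroupTracker([64, 64, 32])  # illustrative split of 160 processors
    for i in range(160):
        tracker.place(f"task{i}")
    print([len(tasks) for tasks in tracker.running])
```

Picking the group with the most idle processors is just one simple policy; anything that keeps every group busy would serve the same purpose.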

Peter
ID: 1086043
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1086074 - Posted: 11 Mar 2011, 14:59:45 UTC - in response to Message 1086038.  

I raised the issue of 160 cores on the BOINC Alpha mailing list. It appears that due to the way Windows handles things when it has more than 64 cpus, BOINC doesn't quite work.

To quote David's email on the list:
It turns out Windows' support for >64 cores is funky, as usual. A host's cores are divided into "processor groups" of at most 64 each (presumably because of 64-bit bitmaps for affinity etc.). Calls like GetSystemInfo() return info only about the calling process's processor group.

Supporting > 64 cores on windows will involve:

1) enumerating the processor groups and adding up the
# of cores in each group
(Rom, please look into this)

2) The client will need to keep track of which processor group
each task is running in, and explicitly set the group for new tasks.

In addition, if parallel applications want to use > 64 cores, they will have to do something similar when creating threads.


There was a follow up email suggesting a work around for the GetSystemInfo(), well thats what I interpreted it to be.

No word as to what version of the BOINC client might support this, but at least they know what needs to be done.


Have they looked into GetNativeSystemInfo() to see if it has the same limitation they are currently seeing with GetSystemInfo()?
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1086074
Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 1086085 - Posted: 11 Mar 2011, 15:54:55 UTC - in response to Message 1086074.  

Have they looked into GetNativeSystemInfo() to see if it has the same limitation they are currently seeing with GetSystemInfo()?

Should not be necessary for a native 64-bit client:
MSDN wrote:
GetNativeSystemInfo Function
Retrieves information about the current system to an application running under WOW64. If the function is called from a 64-bit application, it is equivalent to the GetSystemInfo function.

And Bryan has installed "client version 6.12.18 for windows_x86_64".

But I'm expecting 6.12.19 shortly. Bryan will tell us...

Peter
ID: 1086085
Bry B
Joined: 3 Apr 99
Posts: 53
Credit: 832,165
RAC: 0
United States
Message 1086145 - Posted: 11 Mar 2011, 19:19:23 UTC - in response to Message 1086085.  

Yes, I am RDPing into the system. I will try to log on locally. Another problem is that the app keeps crashing. I have the debugger up but don't have symbols for your app. If anyone wants to look at it and knows how to use KD, I can set it up. This could be related to my memory risers, which I will be replacing today.
ID: 1086145
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1086196 - Posted: 11 Mar 2011, 22:53:12 UTC - in response to Message 1086145.  

yes I am RDPing into the system. I will try to log on locally. Another problem is that the app keeps crashing. I have the debugger up but dont have symbols for your app. If anyone wants to look at it and knows how to use KD I can set it up. This could be related to my memory risers which I will be replacing today.


Yeah, I figured that was probably what was going on; that's the most common cause noted here for the GPUs becoming unusable.
Traveling through space at ~67,000mph!
ID: 1086196
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1086244 - Posted: 12 Mar 2011, 1:01:07 UTC - in response to Message 1086196.  
Last modified: 12 Mar 2011, 1:05:03 UTC

Are all the CPUs/cores recognised by BOINC Manager? Is the version you use the newest release?

Do you have one of the optimized apps installed, or did you use the V0.37 Lunatics Unified Installer?

Or does just getting it all to work come first?
ID: 1086244
gcpeters
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103394 - Posted: 5 May 2011, 0:18:25 UTC

Can we dumb this down a little and just give a one-liner on why those of us with 40 physical cores (and 80 threads) may only be seeing 20 in our account's "Computer Information" pages?

Example of 40 core machine:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5848928

Thanks :)
ID: 1103394
Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 1103410 - Posted: 5 May 2011, 2:05:11 UTC - in response to Message 1103394.  

Can we dumb this down a little and just give a one-liner on why those of use with 40 physical cores (and 80 threads) may only be seeing 20 in our account's "Computer Information" pages

Example of 40 core machine:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5848928

Thanks :)

Could you please give 6.12.26 a try?
(But... sorry, I've just noticed that changeset 23215, which was expected to solve this issue, has not yet been ported to the 6.12 line.) :-(

Peter
ID: 1103410
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1103522 - Posted: 5 May 2011, 12:43:17 UTC - in response to Message 1103394.  
Last modified: 5 May 2011, 12:47:08 UTC

Can we dumb this down a little and just give a one-liner on why those of use with 40 physical cores (and 80 threads) may only be seeing 20 in our account's "Computer Information" pages

Example of 40 core machine:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5848928


It is a "quirk"/"feature"/"limit" of how the Windows kernel has been written to group together processing threads to help scheduling.

From earlier in this thread, Dr A is working on, or has already put in, a fix in BOINC to work around this behaviour on Windows.

If you were to try for example Linux[*], you would not see the problem (unless that is you had more than 1024[**] processors!). The Linux schedulers do not have such limits to worry about.


Happy fast parallel crunchin',
Martin


[*] And before the Windows fan-boys jump in to supposedly defend their 'very best OS that ever can exist': Yes... If that machine is being used in a corporate Windows-only (locked in) environment, then Linux is 'only good' as a test case. Unless that is, there is going to be a migration over to Linux in any case...

[**] If so, you'd already be using Linux tweaked for whatever number of processors for whatever special system you were running.
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1103522
hbomber
Volunteer tester
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1103556 - Posted: 5 May 2011, 14:04:01 UTC - in response to Message 1103522.  
Last modified: 5 May 2011, 14:12:58 UTC

It's a matter of choice whether Windows keeps running threads within one processor group or not. There are ways to properly enumerate the NUMA nodes and the logical processor count per node. BOINC, it seems, is not a group-aware application. It's not a Windows "problem".
ID: 1103556
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1103562 - Posted: 5 May 2011, 14:14:44 UTC - in response to Message 1103394.  
Last modified: 5 May 2011, 14:17:20 UTC

Can we dumb this down a little and just give a one-liner on why those of use with 40 physical cores (and 80 threads) may only be seeing 20 in our account's "Computer Information" pages

Example of 40 core machine:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5848928

Thanks :)

The work around is in this post

Claggy
ID: 1103562
Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 1103595 - Posted: 5 May 2011, 15:14:43 UTC - in response to Message 1103562.  

The work around is in this post

Yes, this works reliably.

But the cleanest way would still be to finally apply both changesets (23214 and 23215) onto the 6.12 line ;-(

Peter
ID: 1103595
gcpeters
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103616 - Posted: 5 May 2011, 16:25:43 UTC - in response to Message 1103562.  
Last modified: 5 May 2011, 16:30:04 UTC

And where exactly does this magical cc_config.xml file live? I cannot find a single copy of it anywhere on my system. Or is it a file that needs to be created? And if so, where does this file need to be placed?

Edit: For what it's worth...I'm not a developer. I'm a validation engineer in a lab full of Dell R900 and R910 servers.
ID: 1103616
Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 1103622 - Posted: 5 May 2011, 16:39:32 UTC - in response to Message 1103616.  
Last modified: 5 May 2011, 16:39:54 UTC

And where exactly does this magical cc_config.xml file live? I cannot find a single copy of it anywhere on my system. Or is it a file that needs to be created? And if so, where does this file need to be placed?

If you open your BOINC Messages log, at the very beginning (among the first few lines) you will see something like

27.04.2011 17:37:34 |  | Data directory: C:\ProgramData\BOINC

This is the folder where your cc_config.xml file lives its magical life.

And yes, it has to be created and maintained manually (at least until the 6.13 line perhaps starts to understand the get_cc_config + set_cc_config GUI RPCs).
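
For anyone who would rather script it than type the file by hand, here is a minimal Python sketch that writes a cc_config.xml into a given folder. It assumes the workaround in question is the <ncpus> option inside <options>, as described on the BOINC client configuration page; this thread does not spell the setting out, so treat that as my assumption. The function name write_cc_config and the value 80 are made up for the example, and the demo writes to a temporary folder rather than the real data directory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_cc_config(data_dir, ncpus):
    """Write a cc_config.xml with an <ncpus> override into data_dir.

    Hypothetical helper: the element layout (cc_config/options/ncpus) is
    assumed from the BOINC client configuration documentation.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)

    path = os.path.join(data_dir, "cc_config.xml")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=False)
    return path


if __name__ == "__main__":
    # Demo only: write into a throwaway folder, not C:\ProgramData\BOINC.
    with tempfile.TemporaryDirectory() as demo_dir:
        config_path = write_cc_config(demo_dir, 80)  # 80 is an example value
        with open(config_path, encoding="utf-8") as handle:
            print(handle.read())
```

To use it for real, pass the Data directory shown in your own Messages log instead of the temporary folder, then have the client re-read its config file or restart it.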

Peter
ID: 1103622
gcpeters
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103632 - Posted: 5 May 2011, 17:08:59 UTC - in response to Message 1103622.  

I created the xml file and performed an update from the BOINC manager. My account list of computers still shows this system only having 20 cores. Thoughts?
ID: 1103632
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1103646 - Posted: 5 May 2011, 18:24:19 UTC - in response to Message 1103632.  
Last modified: 5 May 2011, 18:42:36 UTC

I created the xml file and performed an update from the BOINC manager. My account list of computers still shows this system only having 20 cores. Thoughts?

Do you get the following when you re-read the config file (or restart BOINC)?

05/05/2011 19:15:00 Number of usable CPUs has changed from 2 to 4. Running benchmarks.
05/05/2011 19:15:01 Running CPU benchmarks
05/05/2011 19:15:01 Suspending computation - running CPU benchmarks
05/05/2011 19:15:01 SETI@home Beta Test [cpu_sched] Preempting 07mr11ah.5998.9474.206158430213.14.45_1 (left in memory)
05/05/2011 19:15:01 SETI@home [cpu_sched] Preempting ap_03dc10ab_B4_P0_00237_20110420_24663.wu_1 (left in memory)
05/05/2011 19:15:01 SETI@home Beta Test [cpu_sched] Preempting 07mr11ah.5998.9474.206158430213.14.219_0 (left in memory)
05/05/2011 19:15:01 SETI@home [cpu_sched] Preempting 18no10ac.10985.18472.6.10.41_0 (left in memory)
05/05/2011 19:15:33 Benchmark results:
05/05/2011 19:15:33 Number of CPUs: 4
05/05/2011 19:15:33 4021 floating point MIPS (Whetstone) per CPU
05/05/2011 19:15:33 11939 integer MIPS (Dhrystone) per CPU
05/05/2011 19:15:33 [dcf] scaling all duration correction factors by 0.973263
05/05/2011 19:15:33 SETI@home Finished download of 11mr07ac.26889.154854.6.10.207
05/05/2011 19:15:33 Resuming computation

And does BOINC start 20 tasks?

(Info on configuring BOINC is on the Client configuration page of the BOINC Wiki.)

Claggy
ID: 1103646
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.