Multi core greater than 80 core

Pepo
Volunteer tester
Avatar

Send message
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 1103670 - Posted: 5 May 2011, 19:54:01 UTC - in response to Message 1103632.  

I created the xml file and performed an update from the BOINC manager. My account list of computers still shows this system only having 20 cores. Thoughts?

To help you understand:

  • BOINC will not recognize all cores unless e.g. the changesets [trac]changeset:23214[/trac]+[trac]changeset:23215[/trac] (or something similar) are applied to the client code.
  • The cc_config.xml <ncpus> workaround just forces your client to behave "as if it were aware of having that many cores available" and run the desired number of tasks.


Peter

ID: 1103670 · Report as offensive
Profile gcpeters
Avatar

Send message
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103683 - Posted: 5 May 2011, 20:16:46 UTC - in response to Message 1103670.  
Last modified: 5 May 2011, 20:23:30 UTC

The last message I get when I read the config file is:

"Missing start tag in cc_config.xml"

Directions so far have only been to create this file, put it in the appropriate folder and add the following to it:

<options>
<ncpus>n</ncpus>
</options>

(Where n=num cores)

What else am I missing?

Edit: When BOINC starts I am now getting >16 tasks starting. I would have to count them all to see exactly how many. Again, all of these tweaks are outside the normal use of BOINC, so I have little to no familiarity with the code or its use. I am not a BOINC power user despite my many years of use :)
ID: 1103683 · Report as offensive
Profile gcpeters
Avatar

Send message
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103686 - Posted: 5 May 2011, 20:25:44 UTC
Last modified: 5 May 2011, 20:27:46 UTC

How does Todd Hebert have 64 processors showing on this system?

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5480331

Edit: Hmm, just checked my account again and it now shows 60 processors for my 40-core machine...man, this is some goofy stuff...
ID: 1103686 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1103691 - Posted: 5 May 2011, 20:37:55 UTC - in response to Message 1103683.  

The last message I get when I read the config file is:

"Missing start tag in cc_config.xml"

Directions so far have only been to create this file, put in in the appropriate folder and add the following to it:

<options>
<ncpus>n</ncpus>
</options>

(Where n=num cores)

What else am I missing?

Edit: When BOINC starts I am now getting >16 tasks starting. I would have to count them all to see exactly how many. Again, if all of these tweaks are outside the normal use of BOINC, I have little to no familiarity with the code or use thereof. I am not a BOINC power user despite my many years of use :)

Sorry, the full cc_config.xml should be:

<cc_config>
<options>
<ncpus>n</ncpus>
</options>
</cc_config>

Claggy
ID: 1103691 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1103776 - Posted: 6 May 2011, 2:24:05 UTC - in response to Message 1103686.  

Directions so far have only been to create this file, put in in the appropriate folder and add the following to it:

<options>
<ncpus>n</ncpus>
</options>

If you had looked at the link already given to you by Claggy "(Info on configuring Boinc is on the Client configuration page of the Boinc Wiki)":
http://boinc.berkeley.edu/wiki/Client_configuration

... you would know that the correct format of the cc_config.xml file is:
<cc_config>
   <log_flags>
       [ ... ]
   </log_flags>
   <options>
       [ ... ]
   </options>
</cc_config>


How does Todd Hebert have 64 processors on this system showing?
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5480331

Edit: Hmm, just checked my account again and it now shows for my 40 core machine 60 processors...man, this is some goofy stuff...

Read this:
http://setiathome.berkeley.edu/forum_thread.php?id=63416&nowrap=true#1085425

Since the current BOINC version uses the GetSystemInfo() function, BOINC can see at most 64 logical processors (that is what Windows reports through GetSystemInfo()).
(With more than 64 CPUs you have to tell/force BOINC to use all of them via <ncpus>n</ncpus>.)
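
For illustration only (a sketch, not the actual BOINC patch), the difference can be shown in a few lines of C++: GetSystemInfo() counts only the processors in the calling process's group, while GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) - available from Windows 7 / Server 2008 R2 on - counts every group:

// Minimal sketch, not BOINC code: compare the per-group count from
// GetSystemInfo() with the total across all processor groups.
// Requires Windows 7 / Server 2008 R2 or later for GetActiveProcessorCount().
#include <windows.h>
#include <stdio.h>

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);  // counts only the CPUs in this process's group (max 64)

    DWORD all_groups = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);

    printf("GetSystemInfo():           %lu logical processors\n",
           (unsigned long)si.dwNumberOfProcessors);
    printf("GetActiveProcessorCount(): %lu logical processors (all groups)\n",
           (unsigned long)all_groups);
    return 0;
}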

You may have a "40-core machine", but it obviously has Hyper-Threading enabled,
so Windows recognizes 80 logical processors (use <ncpus>80</ncpus>); you can see this in Windows Task Manager.

This "goofy stuff" is created by Microsoft - they decided to divide CPUs in groups of up to 64 CPUs
- if you have exactly 64 logical CPUs (or less) they will be in one group #0 and will all be seen by BOINC (using GetSystemInfo()) (the Todd Hebert's case - he has exactly 64 logical CPUs)
- if you have e.g. 80 logical CPUs they will be in two groups #0 #1 and current BOINC will see just one of the groups (using GetSystemInfo()) - the group it is in.

How Windows decides where to "cut" the 80 logical CPUs into two groups I don't know, but in your case it seems to split them 60+20.

So Windows puts BOINC into one of the CPU groups, and BOINC sees sometimes 60 and sometimes 20 CPUs.

Using <ncpus>80</ncpus> you tell BOINC: no matter what you see, act as if there are 80 CPUs.

(You can use <ncpus>80</ncpus> even on an old single-core CPU and BOINC will start 80 SETI tasks if there is enough RAM, but there is no advantage in doing so.)


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1103776 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1103844 - Posted: 6 May 2011, 12:25:18 UTC - in response to Message 1103556.  

[OS defence distraction]


It's a matter of choice whether Windows groups running threads into one processor group or not. There are ways to properly enumerate NUMA nodes and the logical processor count per node. It seems BOINC is not a group-aware application. It's not a Windows "problem".


However you wish to describe it, whether a 'feature' or a 'bug', that feature appears to be unique to Windows and appears to be needed by the Windows processor scheduler. I suspect it is 'needed' to stop the process scheduler itself gobbling up too much CPU time deciding which process to schedule on which processor next.

In contrast, Linux for example has a choice of schedulers that schedule efficiently without any slowdown even with very large numbers of processors or processes.


It is your choice whether you consider that worse/better or merely 'different'. However, as a developer I appreciate not having to worry about scheduler process/processor groups (in Linux they don't exist!) when developing on large Linux systems. Multi-threaded applications just spawn whatever threads are needed and all CPU resources are automagically available. Easy. In contrast, there is a 'development bump' while Boinc must be specially programmed to cater for the 'special' Windows requirements before it can fully use systems with more than 64 processor threads.

For whatever operating system, the OS itself should be developed to make application development easy. There's a very good quote about how a good OS should never be noticed... (It should silently, easily, just helpfully simply 'work'.)


[/OS defence distraction]


Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1103844 · Report as offensive
hbomber
Volunteer tester

Send message
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1103851 - Posted: 6 May 2011, 12:41:37 UTC
Last modified: 6 May 2011, 12:46:08 UTC

Can you prove that the Windows scheduler would be "suffering" compared to Linux if processor groups were not implemented?
Or vice versa, can you prove the Linux scheduler wouldn't benefit if there were processor groups in Linux, thus making the current flat model best?

From a programmer's point of view, I like to have options. In fact, your denial of this contradicts the whole "Linux" idea.
ID: 1103851 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1103860 - Posted: 6 May 2011, 13:10:38 UTC - in response to Message 1103686.  

How does Todd Hebert have 64 processors on this system showing?

http://setiathome.berkeley.edu/show_host_detail.php?hostid=5480331

Edit: Hmm, just checked my account again and it now shows for my 40 core machine 60 processors...man, this is some goofy stuff...

When you get near limits, wackiness normally ensues. :) Is the 40-core machine 80 logical processors? I'm guessing quad 10-core CPUs with HT. It seems quad 8-core with HT isn't too hard, but maybe these 10-core beasts are a different animal altogether.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1103860 · Report as offensive
Profile gcpeters
Avatar

Send message
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103906 - Posted: 6 May 2011, 16:11:42 UTC - in response to Message 1103860.  

Yeah. The Dell R910 can run with the new Westmere 10-core procs. Of course, you can also adjust how many cores you want to present to the OS in the BIOS if power management is a concern. With BOINC this is not the issue. I want this thing to run full bore...40 cores, 80 logical with SMT enabled.

P.S. I did read Claggy's previous postings, but didn't make the connection to the xml edits as those were related to a different query from another forum member. Thanks again. All is good now :)
ID: 1103906 · Report as offensive
Profile gcpeters
Avatar

Send message
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103907 - Posted: 6 May 2011, 16:16:33 UTC

Ok, next thing I am noticing now is...

I now have 80 tasks running. However, I am only getting ~36% CPU usage overall according to Task Mangler. And when I look at the tasks in BOINC Mangler, none of the Astropulse ones are running...they are just stopped at various percentages of completion. What gives?
ID: 1103907 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1103908 - Posted: 6 May 2011, 16:33:55 UTC - in response to Message 1103907.  

Ok, next thing I am noticing now is...

I now have 80 tasks running. However, I am only getting ~ 36% CPU usage overall according to Task Mangler. And when I look at the tasks in BOINC Mangler, all the astropulse ones are not running...they are just stopped at various percentages of completion. What gives?

What is the Status for these "stopped" tasks? If they are "waiting to run" then it might be OK. If they are "running" then something weird might be going on.

With 36% that would only be about 28/80 CPUs in use. With 80 tasks running you might be hitting a bottleneck on the machine, such as the disk or memory falling too far behind to keep up with all the CPU work. I'm not sure that would be the issue, but you could try reducing the number of running tasks to see if that helps. If you ran only 40 tasks and achieved 50% CPU load, I think that would be a clear indication something is slowing things down. You could use the built-in tools to watch the memory and disk load as well, but I don't always have faith in those for troubleshooting.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1103908 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1103916 - Posted: 6 May 2011, 17:08:53 UTC - in response to Message 1103851.  
Last modified: 6 May 2011, 17:09:38 UTC

Can you prove that Windows scheduler is "suffering" compared to Linux, if there were no processor groups implemented?
Or vice versa, can u prove Linux scheduler won't benefit if there were processor groups in Linux, thus making this current flat model best?

As from programmers point of view, I like to have options. In fact, your denial of this is contradiction to whole "Linux" idea.

I take it that you are not a programmer or developer?

Very briefly:

Program for the simple case and, on Linux, it simply works. On Windows you are limited by whatever arbitrary processor grouping the OS imposes on you: you must know about the groups and explicitly do extra work, adding code to 'help' Windows make use of all available processors.

Hence this thread (and a couple of others) exists, and extra development effort is needed to modify the Boinc code, uniquely for Windows, to handle the Windows processor grouping. That's quite a lot of debugging/reading/effort just to get Boinc to run on all available processors on Windows.


On Windows, the time the processor scheduler takes to schedule a process increases as you add more processes and processors. Add too many and your system slows to a crawl while the scheduler works to decide what to do next. Also, the "64 processors" maximum for a group may well be a hard-coded limit somewhere in the Windows code where single-bit flags are used to enumerate the processors in a 64-bit word. Dividing the scheduling problem into multiple groups of 64 is a quick and cheap trade-off for Microsoft to keep things running. However, it comes at the expense of ALL Windows developers, who must now specially program around that group limit for multi-threaded or multi-process applications.

Linux has schedulers that take a fixed time to schedule regardless of the number of processes and processors being scheduled. There are no arbitrary limits on how many CPUs you can use, and no worries about having to break your code down so that it never uses more than a few CPUs at a time, as with the Windows processor groups.


Note that the Windows processor groups are not 'optional'. You must program around them if you are to use many CPUs in parallel.
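
To make that 'extra work' concrete, here is a hedged C++ sketch (an illustration of the Windows 7+ group-aware API, not Boinc's actual code; the assign_thread_to_group() helper is hypothetical) of placing worker threads into specific processor groups with GetActiveProcessorGroupCount(), GetActiveProcessorCount() and SetThreadGroupAffinity():

// Hedged sketch, not Boinc code: to use more than 64 logical CPUs on Windows,
// each worker thread must be placed into a processor group explicitly.
#include <windows.h>

void assign_thread_to_group(HANDLE thread, unsigned worker_index)
{
    WORD group_count = GetActiveProcessorGroupCount();
    WORD group = (WORD)(worker_index % group_count);   // spread workers round-robin

    GROUP_AFFINITY affinity;
    ZeroMemory(&affinity, sizeof(affinity));
    affinity.Group = group;

    // Allow every processor in the chosen group. Each group holds at most
    // 64 CPUs, so a 64-bit mask suffices - this is the limit discussed above.
    DWORD cpus_in_group = GetActiveProcessorCount(group);
    affinity.Mask = (cpus_in_group >= sizeof(KAFFINITY) * 8)
                        ? ~(KAFFINITY)0
                        : (((KAFFINITY)1 << cpus_in_group) - 1);

    SetThreadGroupAffinity(thread, &affinity, NULL);    // move the thread into the group
}

Without such calls, all of a non-group-aware process's threads stay in the single group Windows assigned to it at start-up - which is exactly the behaviour described earlier in this thread.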

Hence all the time and effort in this thread and elsewhere!

Happy fast crunchin',
Martin


For anyone interested, an historical selection of Linux schedulers is:


O(n) scheduler

... If the number of processes is big, the scheduler may use notable amount of the processor time itself. Picking the next task to run requires iteration through all currently planned tasks...

Old and long since superseded. I suspect that Windows uses a variation of this or even a variation of just running through a circular queue. Anyone here know?


O(1) scheduler

... schedule processes within a constant amount of time, regardless of how many processes are running on the operating system. ...


Completely Fair Scheduler

... implementation of "fair scheduling" named Rotating Staircase Deadline, inspired Ingo Molnár to develop his CFS, as a replacement for the earlier O(1) scheduler...


And if you do have a 'simple' case such as for low power mobile devices or even home/office use desktops:

Brain F*** Scheduler

... BFS has been reported to improve responsiveness on light-NUMA (non-uniform memory access) Linux mobile devices and desktop computers with fewer than 16 cores. ...


You have a choice to use any of those as you wish, or even to modify them for your own special cases. But then again, there are many years of experience and development in those schedulers, so they work well on everything from mobile phones all the way up to supercomputers. Impressive stuff!

Hope of interest.
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1103916 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1103919 - Posted: 6 May 2011, 17:25:12 UTC - in response to Message 1103907.  

Ok, next thing I am noticing now is...

I now have 80 tasks running. However, I am only getting ~ 36% CPU usage overall according to Task Mangler. ... What gives?


How much system RAM does that machine have and how much is utilised? Is it suffering a lot of memory swapping to disk (paging/thrashing)?


Happy fast crunchin',
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1103919 · Report as offensive
hbomber
Volunteer tester

Send message
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1103929 - Posted: 6 May 2011, 18:07:23 UTC
Last modified: 6 May 2011, 18:20:09 UTC

I take it that you are not a programmer or developer?

I am.


Program for the simple case and on Linux, it simply works. On Windows you are limited by whatever arbitrary processor grouping is imposed upon you by the OS. On Windows, you must know about and explicitly go to extra work to include programming code to 'help' Windows make use of all available processors.


I'm afraid you have failed to read the thread. Screenshots were given above with all the CPUs loaded, without any need to modify the (not processor-group-aware) BOINC code. It runs the processes, and Windows puts them on all the CPUs.


Hence this thread exists and a couple of others and there is extra development effort to modify the Boinc code uniquely for Windows for the Windows processor grouping. That's quite a lot of debug/reading/effort just to get Boinc to run with all available processors on Windows.

The problem is just that BOINC does not get the processor count properly, nothing more. The code has been given above - a single function call that would resolve it.


On Windows, the time the processor scheduler takes to schedule a process increases as you add more processes and processors. Add too many and your system slows to a crawl while the scheduler works to decide what to do next.

That might be true for any scheduler. I'll read these links later to see what improvements those schedulers offer and at what price.


Also, the "64 processors" max for a group may well be a hard-coded limit somewhere in the Windows code where single bit flags are used to enumerate a processor in a 64-bit word. Dividing the scheduler problem down into multiple groups of 64 is a good and quick cheaply made trade-off for Microsoft to keep things running.

It makes perfect sense to me to keep (or to have the option to keep) my processes "affinitized" to physically close processors using the same memory bus, on the same node. It makes any necessary synchronization faster. In the case of BOINC, where a number of mutually independent processes are running, this is not an issue.


However, that is at the great expense of ALL Windows developers who must now specially program around that group limit for multi-threaded or multi-process applications.

It doesn't seem a great expense. In fact, the only effort needed is when you want to set process/thread affinities to a certain group (which, as I explained, is logical and beneficial). Those efforts might be complex, but you can perfectly well run without them - you don't need to invent logic to work around nodes.


Linux has schedulers that take a fixed time to schedule regardless of the number of processes and processors to schedule.

Which seems not so clever to me, bearing in mind I/O threads, suspended threads, threads in a wait state, prioritization of tasks, GUI threads (foreground, background, media-intensive), IRQL levels, etc.


There are no arbitrary limits for how many CPUs you can use or any worries about having to break your code down into never using more than a few CPUs within such as the Windows processor groups.

As explained above, BOINC runs 160 processes without any extra logic and they spread across all the processors.


Note that the Windows processor groups are not 'optional'. You must program around them if you are to use many CPUs in parallel.

They are optional - you need to bother with them only if you need to set process proximity and affinity. And grouping the threads may range from a complex task (which may pay for the effort) to something trivial, like converting a 32-bit image to a paletted one.
ID: 1103929 · Report as offensive
Profile gcpeters
Avatar

Send message
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103938 - Posted: 6 May 2011, 18:34:58 UTC - in response to Message 1103919.  
Last modified: 6 May 2011, 18:43:49 UTC

64GB RAM, 57GB available according to Task Mangler. 6.5GB paging.

It seems like when Multibeam is running, Astropulse is not...and perhaps vice-versa. Thoughts?

I just recently changed my settings to use Astropulse 5.05. I'm going to turn that off again and see if my CPU usage scales again...

Edit: Actually, it just looks like Astropulse is running significantly slower in comparison...it doesn't update in the Boinc Mangler nearly as quickly (with regard to the percentage of task completed in the progress column) as the Multibeam tasks...
ID: 1103938 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1103945 - Posted: 6 May 2011, 19:11:58 UTC - in response to Message 1103938.  

Edit: Actually, i just looks like Astropulse is running significantly slower in comparison...it doesn't update in the Boinc Mangler nearly as quickly (with regard to percentage of task completed in the progress column) as the Multibeam tasks...

You could change to using the Optimised apps instead of the Stock apps; the CPU apps are approximately double the speed of the Stock apps. You just need to get the Lunatics 0.37 installer, run it, choose the SSSE3 MB app and the SSE AP app, and restart Boinc.

The downside is that the apps won't auto-update from the project, and you'd have to keep half an eye out for new apps either here or at Lunatics.

Claggy
ID: 1103945 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1103946 - Posted: 6 May 2011, 19:20:17 UTC - in response to Message 1103938.  

64GB RAM, 57GB available according to Task Mangler. 6.5GB paging.

It seems like when Multibeam is running, Astropulse is not...and perhaps vice-versa. Thoughts?

I just recently changed my settings to use Astropulse 5.05. I'm going to turn that off again and see if my cpu usage scales again...

Edit: Actually, i just looks like Astropulse is running significantly slower in comparison...it doesn't update in the Boinc Mangler nearly as quickly (with regard to percentage of task completed in the progress column) as the Multibeam tasks...

I wonder if the manager is spending so much time trying to get the status of 80 running tasks that it is bogging everything down.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1103946 · Report as offensive
Profile gcpeters
Avatar

Send message
Joined: 20 May 99
Posts: 67
Credit: 109,352,237
RAC: 1
United States
Message 1103962 - Posted: 6 May 2011, 20:51:23 UTC - in response to Message 1103945.  

I've been wondering how people have been racking up the millions on dual core and four core machines when I'm just chugging along with my multiple 16+ core systems...
ID: 1103962 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1103969 - Posted: 6 May 2011, 21:15:24 UTC - in response to Message 1103929.  
Last modified: 6 May 2011, 21:18:13 UTC

I'm afraid u have missed to read the thread. Above are given shots with all the CPUs loaded without any need to modify "not processor grouping"-aware code of BOINC. It runs the processes, Windows put them on all CPUs.

Quite so...

My emphasis was that that is only possible after Boinc has been blindly told to create more processes than Windows reported processors available. Yes, Windows does the 'sensible' thing and allocates them across all the processors, but only after the user has specially intervened.

As a conscientious programmer, I would never blindly spew forth an arbitrarily large number of processes. That is guaranteed to cripple most systems and leave the user frustrated. Hence, check first how many processors are available. For Windows, you additionally need to know about the special case of processor groups and specially program for that.
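
As a hedged illustration of that 'check first, then spawn' approach on the Linux side (a sketch only, not Boinc's code; the crunch() placeholder stands in for whatever work a task does), the check is a single sysconf() call and it reports every online logical CPU, with no grouping to account for:

// Illustrative sketch, not Boinc code: size the worker pool from the
// reported number of online logical CPUs (all of them, even if >64).
#include <unistd.h>
#include <thread>
#include <vector>

void crunch() { /* one task's worth of computation */ }

int main()
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);   // online logical CPUs
    if (ncpus < 1) ncpus = 1;

    std::vector<std::thread> workers;
    for (long i = 0; i < ncpus; ++i)
        workers.emplace_back(crunch);             // one worker per logical CPU
    for (auto &w : workers)
        w.join();
    return 0;
}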


Hence this thread exists and a couple of others and there is extra development effort to modify the Boinc code uniquely for Windows for the Windows processor grouping. That's quite a lot of debug/reading/effort just to get Boinc to run with all available processors on Windows.

The problem is that BOINC does not get processor count properly, and just this. The code has been given above, single function call, which would resolve it.

The Boinc example shows that even the "single function call" was still unexpected enough to need quite a few days to discover and fix.


On Windows, the time the processor scheduler takes to schedule a process increases as you add more processes and processors. Add too many and your system slows to a crawl while the scheduler works to decide what to do next.

It might be true for any scheduler. I'll read these links later to see what improvements those schedulers offer and on what price.

Comments and comparisons from an alternate view are most welcome.


Also, the "64 processors" max for a group may well be a hard-coded limit somewhere in the Windows code...

It makes perfect sense to me to keep(to have an option to keep) my processes "affinitized" over physically close processor and using same memory bus, on same node. It will make any necessary synchronization faster. In case of BOINC where number of mutually independent processes are running, this is not an issue.

You get NUMA[*] support for 'free' with the Linux schedulers. Linux has been NUMA-aware and NUMA-optimised for some time now so that application programmers do not need to lose time worrying over and programming for the low-level detail. You can usually gain performance if you program with a knowledge of, and a sympathy for, the underlying architecture. However, it helps greatly if you don't have to start worrying about NUMA-isms or processor affinities!


It doesn't seem great expense. In fact, the only effort needed is when u want to set process/thread affinities to certain group(which I explained why its logical and beneficial). Those efforts might be complex, but u can perfectly run without them - u dont need to invent logic to circumvent nodes.

It would be nice if the NUMA aspects and affinities were automatically taken care of for the programmer by the OS in the first place, beyond the 64-processor-thread limit that has been hit here.


Linux has schedulers that take a fixed time to schedule regardless of the number of processes and processors to schedule.

Which seems not so clever to me, having in mind I/O thread, suspended threads, thread is wait state, prioritization of tasks, GUI thread(foreground, background, media intevsive) IRQL levels etc.

Too good to be true? The Linux schedulers are fully featured for optimising the use of all the processing resources. Look again?


Happy fast parallel computin',
Martin


* NUMA: Non-Uniform Memory Access

Also see: Operating system scheduler implementations (Any further detail anywhere for the present Windows scheduler?)
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1103969 · Report as offensive
hbomber
Volunteer tester

Send message
Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1103994 - Posted: 6 May 2011, 22:44:02 UTC - in response to Message 1103969.  
Last modified: 6 May 2011, 22:52:43 UTC


My emphasis was that is only possible after Boinc has had to be blindly told to create more processes than Windows reported processors were available. Yes, Windows does the 'sensible' thing to allocate across all the processors but only after the user has specially intervened.

Do you expect Windows to scale an inadequate number of processes (because that is what BOINC is doing - spawning fewer processes than there are available CPUs) across all the CPUs?
If you achieve that - parallel execution of single-threaded logic on more than one execution unit - you will get a Nobel prize, for sure.
On the other hand, if you expect Windows to use all the processors for a smaller number of processes which cannot be parallelized - it's not needed, and it kills performance because of cache thrashing, running on not closely related nodes, etc.
The user intervention was actually telling BOINC the proper thing to do, not Windows.


As a conscientious programmer, I would never blindly spew forth an arbitrary large number of processes. That is guaranteed to cripple most systems and leave the user frustrated. Hence, check first for how many processors are available. For Windows, you need to additionally know the special case of processor groups and specially program for that.

I'm not sure I understand you on this point, but knowing the right number of processors is a matter of a single function call. If this can make your life miserable, better quit programming.
I can only imagine what I would read from you if you tried to work around the write pipe on Alpha processors within a StartIO or interrupt handler routine in a driver.


The example for Boinc shows that even for the "single function call", that was still unexpected enough to have needed quite a few days to discover and fix.

Again, if you consider this a problem, quit programming, really. Such systems are rare enough not to have to worry about them on a wide basis. If your application is targeted at such systems, you should be well aware of what to expect, or you can learn it in 10 minutes, like I did from MSDN. Or you will learn it the hard way, like BOINC did.


Comments and comparisons from an alternate view are most welcome.

Yeah, I was the first to ask for comparisons. You were the first to state that Linux is better and Windows sucks - prove it. In fact, you brought Linux into this thread out of nowhere. I'm not thankful for it.



You get NUMA[*] support for 'free' with the Linux schedulers. Linux has been NUMA-aware and NUMA-optimised for some time now so that application programmers do not need to lose time worrying over and programming for the low-level detail. You can usually gain performance if you program with a knowledge of, and a sympathy for, the underlying architecture. However, it helps greatly if you don't have to start worrying about NUMA-isms or processor affinities!

You don't need to worry about it on Windows either, if you don't care where your processes run. With Windows, if you do care, you can take measures; on Linux you can only pray the OS does its best for you. No, thanks.


It would be nice if the NUMA aspects and affinities were already automatically taken care of for the programmer by the OS in the first place beyond the 64 processor threads limit as has been hit here.

There is no limit, it seems I have to repeat that again. This "limit" is there to simplify (and probably to speed up, since it becomes an operation on a single variable, probably atomic, with no need to lock a memory location containing all 1024 bits) your affinity-setting calls, if you don't need to worry about running across several nodes.
It gives you choices.


Too good to be true?

Yeah, too good to be true. Implemented in Windows it is.


The Linux schedulers are fully featured for optimising the use of all the processing resources. Look again?

This can be said of any OS, probably by a representative of its advertising department. That makes the statement equal to null.
ID: 1103994 · Report as offensive