Revisiting the Onboard Intel GPU Issue

AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 1996628 - Posted: 3 Jun 2019, 22:16:50 UTC - in response to Message 1996618.  

It would be a nice ability if it worked consistently, but this is my second post about issues where it isn't functioning reliably. It seems there should be an easier way for humans to make changes to these clients, or at least a way to identify the cards by a friendly name. My system has one Intel and two different AMD GPUs, and the two AMD cards differ by 20 compute units. Until recently, when some command line options helped, it was very difficult to balance the load between the Vega and the RX. Then the exclude_gpu tag failed, and chaos ensued...

Some command line options don't work, e.g. -total_GPU_instances_num 12 -instances_per_device 6, at least on Macs, and those don't allow me to make device-specific changes anyway. The app_config doesn't allow any modifications based on which card is in use, and if the OS is assigning device numbers, how do we account for this when we have no reliable way to identify the hardware? Plug & play was supposed to give us these capabilities. I could complain, but that's not the direction I'd like to go. We are identifying an issue, relatively minor, annoying, and sometimes devastating, where the client is just not performing the way it was designed to.
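As a rough illustration of the limitation described above, here is a minimal sketch of what app_config.xml can express. Python is used only to generate the XML. The app name `setiathome_v8` and the usage values are assumptions for illustration; check client_state.xml for the app names on your own host. Note that the settings apply to the app as a whole and nothing in the file selects a particular card.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage, cpu_usage):
    """Return app_config.xml text for a single app.

    gpu_usage is the fraction of a GPU each task claims, so 0.5 means
    two tasks per GPU. The value applies to every usable GPU alike;
    there is no per-device element here.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "setiathome_v8" is an assumed app name, used for illustration only.
    print(build_app_config("setiathome_v8", 0.5, 1.0))
```

The generated file would go in the project's directory under the BOINC data folder, and the client has to re-read its config files before it takes effect.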

The client at least makes an attempt to identify everything in the client state, if you're like us and not afraid to peek and poke around inside some arbitrary file and make modifications. But then it seems some files override certain other files, and trying to find reliable information on the internet is like finding a nice news source which doesn't tell us what to think. Even so, the client state has an obvious problem: if cc_config says to ignore a card completely, why does it still show up there? Should the final list of devices be in a separate file? On and on it goes. I wish I had more time to devote to programming, because I'd really love to try to rethink the client and simplify it. Perhaps it could even test the machine for things like this GPU using shared memory, and disallow it if it will not benefit the system in general or, worse yet, would cause it to crash, slow down, or keep the CPU from doing a more efficient job. Granted, these guys are likely doing much of this as part of a classroom activity at Berkeley and really don't have time to get into software design, as opposed to what you'd find in places I've worked before, and they are doing a good job overall. There are just these annoying quirks which we know should not happen. If this seems overly harsh, please don't take it that way. It's just an annoyance. If it offends, please pardon my outward frustration.
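For anyone following along, this is a minimal sketch of the kind of cc_config.xml entry being discussed, one that asks the client to ignore an Intel GPU entirely. It assumes the iGPU is Intel device 0, which may not match what your Event Log reports. Python is used only to build the XML.

```python
import xml.etree.ElementTree as ET


def build_ignore_intel(device_num=0):
    """Return cc_config.xml text telling the client to ignore one Intel GPU.

    device_num is an assumption here; use the number the client reports
    for the Intel GPU in its Event Log.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ignore_intel_dev").text = str(device_num)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_ignore_intel(0))
```

A client restart may be needed before a change like this is picked up. The sketch does not address why an ignored card can still appear in the client state.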
ID: 1996628
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1996634 - Posted: 3 Jun 2019, 22:31:38 UTC - in response to Message 1996628.  

exclude_gpu should work with the device number as identified in the Event Log. I should think your Vega and RX cards would have been identified uniquely enough to know which was which, but I am unfamiliar with AMD/ATI hardware. It's been mentioned a few times in the developers' threads that the ability to set excludes per app_version and at the project level would be most welcome, which isn't possible currently.
I ran into this last year with my issues with GPU excludes running alongside max_concurrent statements, which threw BOINC completely out of whack. That led to some major changes in the client code that haven't made it into the mainline client yet.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1996634
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 1996668 - Posted: 3 Jun 2019, 23:53:20 UTC - in response to Message 1996634.  
Last modified: 3 Jun 2019, 23:54:37 UTC

Yes, I was looking at splitting my GPUs on a per-app basis, using the exclude to assign Astropulse to one card and setiathome to the other, which would have been great if there were any Astropulse units to work on. But with such radically different cards working on the same application, each with different capabilities, tweaking them becomes extremely difficult. The jobs I could push to the Vega would swamp the RX. Lesson learned for me: match the cards, hence my ordering two Frontier Editions to maximize my throughput equally. At least the memory sizes matched on the two cards.
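The per-app split described above could be sketched roughly as follows, using two `<exclude_gpu>` entries in cc_config.xml. Everything specific here is an assumption for illustration: the project URL, the app names `astropulse_v7` and `setiathome_v8`, the `ATI` type string, and the device numbering (Vega as 0, RX as 1). Use whatever the Event Log reports on your machine. Python is used only to build the XML.

```python
import xml.etree.ElementTree as ET

# Assumed project URL; use the one shown for the project in your client.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def exclude_gpu(options, device_num, app, gpu_type="ATI"):
    """Append one <exclude_gpu> entry under the given <options> element."""
    entry = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(entry, "url").text = PROJECT_URL
    ET.SubElement(entry, "device_num").text = str(device_num)
    ET.SubElement(entry, "type").text = gpu_type
    ET.SubElement(entry, "app").text = app
    return entry


def build_split_config():
    """Return cc_config.xml text that keeps each app off one of two cards."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Assumed numbering: device 0 is the Vega, device 1 is the RX.
    exclude_gpu(options, 1, "astropulse_v7")   # Astropulse runs only on device 0
    exclude_gpu(options, 0, "setiathome_v8")   # setiathome runs only on device 1
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_split_config())
```

Since the excludes are keyed on device number, this only helps if the numbering stays stable across reboots and hardware changes, which is the concern raised earlier in the thread.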
ID: 1996668
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1996679 - Posted: 4 Jun 2019, 0:45:58 UTC - in response to Message 1996668.  

Richard Haselgrove explained it to me and lamented the limitations in this message.
https://setiathome.berkeley.edu/forum_thread.php?id=83645&postid=1971079
That needs to be added to the "wish list" which is here.
https://setiathome.berkeley.edu/forum_forum.php?id=17
ID: 1996679
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 1996689 - Posted: 4 Jun 2019, 2:35:56 UTC - in response to Message 1996679.  

Thanks! That cleared up a couple of small questions I had, but it made me realize I need one more item added to the list: a way to flag the system as preparing for a hardware configuration change, which would stop the client from processing any new work units and have it reprocess files once the configuration changes are complete. I'm not sure whether I'm going to get the annoying "GPU not found" when I replace the cards and force the system to download files to fit the new GPUs. Hopefully not, as they're still AMD GPUs.
ID: 1996689
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 1996934 - Posted: 5 Jun 2019, 18:25:07 UTC - in response to Message 1996689.  
Last modified: 5 Jun 2019, 18:26:40 UTC

OK, so I promised charts. Here are the graphical representations. The HP Windows machine hasn't settled yet, and I'll keep watching it, but it seems it was the most hampered by the iGPU. The older Mac mini 2014 appears to be leveling out. The new Mac wasn't affected much at all, and my only reason for including it is that little uptick at the end, which will be the subject of my new post, perhaps a week from now, regarding AMD GPUs and the command line options. The new Frontier Edition cards will be here in the next few days, as well as a new Sonnet Breakaway 650, so upgrade time.

https://imgur.com/5LSzGgT

https://imgur.com/s8E69qH

https://imgur.com/FnvUZKV
ID: 1996934
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 1996936 - Posted: 5 Jun 2019, 18:30:55 UTC - in response to Message 1996934.  

Numerical Representations (i7 is the HP, i5 the Mac mini):

https://imgur.com/AXn1tR3

https://imgur.com/89GBVBw
ID: 1996936


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.