OpenCL work-items and streaming processors
What is the relation between a work-item and a streaming processor (CUDA core)? I read somewhere that the number of work-items should exceed the number of cores, otherwise there is no performance improvement. Why is that? I thought 1 core represents 1 work-item. Can someone help me understand this?
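GPUs, and most other hardware, tend to do arithmetic much faster than they can access most of the available memory. Having many more work-items than you have processors lets the scheduler stagger the memory use: while some work-items are waiting on memory, the work-items that have already read their data are using the ALU hardware to do processing. So a core is not tied to a single work-item; it switches between many of them to stay busy.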
Here is a page on optimization in OpenCL. Scroll down to "2.4. Removing 'costly' global GPU memory access", which goes over this concept.
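
To make the latency-hiding argument above concrete, here is a rough toy model in Python (not OpenCL, and not how any real GPU scheduler is implemented). It assumes a single ALU, work-items that alternate between one memory load and a short burst of arithmetic, and unlimited memory bandwidth so loads from different work-items overlap freely. The function name and the numbers (400 cycles of memory latency, 20 cycles of arithmetic) are made up for illustration, not measured on any device.

```python
import heapq


def alu_utilization(num_work_items, mem_latency=400, alu_cycles=20, rounds=50):
    """Fraction of time one ALU is busy in a toy latency-hiding model.

    Each work-item repeats `rounds` times: wait `mem_latency` cycles for a
    load, then occupy the ALU for `alu_cycles` cycles. All parameters are
    illustrative assumptions, not real hardware figures.
    """
    # Heap entries: (cycle at which the pending load completes, id, rounds left)
    ready = [(mem_latency, wid, rounds) for wid in range(num_work_items)]
    heapq.heapify(ready)

    clock = 0  # cycle at which the ALU next becomes free
    busy = 0   # total cycles the ALU spent computing

    while ready:
        ready_at, wid, left = heapq.heappop(ready)
        # The ALU idles if no work-item has its data yet.
        start = max(clock, ready_at)
        clock = start + alu_cycles
        busy += alu_cycles
        if left > 1:
            # The work-item issues its next load as soon as it finishes computing.
            heapq.heappush(ready, (clock + mem_latency, wid, left - 1))

    return busy / clock


if __name__ == "__main__":
    for n in (1, 4, 21, 64):
        print(n, "work-items per ALU ->", round(alu_utilization(n), 3))
```

With one work-item per ALU, the ALU in this model sits idle during every load, so it is busy only about 20 out of every 420 cycles. Once there are roughly 1 + mem_latency / alu_cycles work-items per ALU, there is almost always some work-item with data ready, and the ALU stays close to fully busy. Real GPUs schedule groups of work-items (warps or wavefronts) rather than individual ones, but the reason for oversubscribing the cores is the same.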