纳金网

Title: Heterogeneous Particle-based Simulation

Author: 彬彬    Time: 2012-1-4 17:11
1 Introduction

Particle-based simulations have been used to simulate granular materials, fluids, and rigid bodies. To achieve realistic behavior, a large number of particles has to be simulated. Particle-based simulations are well suited to GPUs because the computation for each particle is almost the same (i.e., the granularity of the computation is uniform over the particles), which is preferable for GPUs with a wide SIMD architecture. However, particle-based simulation on the GPU has been mostly restricted to particles of identical size [Harada et al. 2007]. This is because the work granularity becomes non-uniform when particles have different radii, which leads to inefficient use of the GPU. Heterogeneous CPU/GPU architectures, such as AMD Fusion® APUs, can run this kind of simulation efficiently by using the CPU and the GPU at the same time. On a PC with a CPU and a discrete GPU, whenever a computation is dispatched to the GPU, the data has to be sent over the PCI Express® bus, which introduces latency. In contrast, heterogeneous architectures have a CPU and a GPU on the same die with tightly coupled shared memory, so the same memory space can be accessed from the GPU and the CPU without any copying, which facilitates tight collaboration between the two processors. In this paper, we describe a particle-based simulation with particles of various sizes running on a heterogeneous architecture, in which work is dispatched to the GPU or the CPU depending on its granularity and processed on both simultaneously.

2 Method

The simulation we developed maximizes the use of all available resources of a heterogeneous architecture by performing computation concurrently on both the CPU and the GPU. It uses one CPU thread for dispatching work to the GPU (the GPU control thread) and multiple CPU threads for computation (CPU computation threads), whereas an application using only the GPU uses a single CPU thread. The target architecture was an AMD A-Series APU with four CPU cores and a GPU. Our implementation for this architecture used the GPU, a GPU control thread, and three CPU computation threads. For simplicity, we first describe a simulation using the GPU and two CPU threads (a GPU control thread and a CPU computation thread). Then we describe how to scale the simulation to the GPU, a GPU control thread, and multiple CPU computation threads.

e-mail: takahiro.harada@amd.com

[Figure 2 diagram; labels: GPU, CPU, Build Acc. Structure, SS Collision, S Integration, L Integration, LL Collision, Synchronization, Position, Velocity, Grid, Force, LS Collision]

Figure 2: A step of the simulation using two CPU threads.

2.1 Simulation using the GPU, a GPU control thread and a CPU computation thread

A simulation with particles of various sizes, as shown in Fig. 1, is a coupling of two simulations: a simulation with identical-sized particles (small particles, colored blue) and a simulation with varying-sized particles (large particles, colored red and green). If the interaction between large and small particles is not considered, the simulation of small particles has a uniform work granularity, so it is well suited to the GPU. On the other hand, the CPU is a better choice for the large particles because the granularity of their simulation is not uniform. Therefore, the small-particle and large-particle simulations are performed on the GPU and the CPU, respectively, and the two run concurrently.
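The split described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`partition_by_radius`, `simulate_small`, `simulate_large`, `step`) are hypothetical, the "GPU" work is a plain Python stand-in, and two pool workers play the roles of the GPU control thread and the CPU computation thread.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_by_radius(radii, small_radius):
    """Split particle indices into identical-sized (small) and varying-sized (large) sets."""
    small = [i for i, r in enumerate(radii) if r == small_radius]
    large = [i for i, r in enumerate(radii) if r != small_radius]
    return small, large

def simulate_small(indices):
    # Hypothetical stand-in for the uniform-granularity work sent to the GPU.
    return ("gpu", len(indices))

def simulate_large(indices):
    # Hypothetical stand-in for the non-uniform work done on a CPU computation thread.
    return ("cpu", len(indices))

def step(radii, small_radius):
    small, large = partition_by_radius(radii, small_radius)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both jobs are submitted before either result is awaited,
        # so the two simulations can run concurrently.
        gpu_job = pool.submit(simulate_small, small)
        cpu_job = pool.submit(simulate_large, large)
        return gpu_job.result(), cpu_job.result()

radii = [0.1, 0.1, 0.5, 0.1, 0.8, 0.1]
result = step(radii, small_radius=0.1)
```

In a real system the first job would enqueue GPU kernels rather than run Python code; the point here is only that the partition is decided by particle size and that neither side waits for the other until the results are needed.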

A simulation step consists of three stages: building an acceleration structure, collision, and integration. Brute-force collision computation is prohibitively expensive when there are a large number of particles, so an acceleration structure is built to make collision detection more efficient. In the collision stage, colliding particles are searched for and repulsion forces are calculated. The integration stage updates particle velocities and positions.
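The three stages can be sketched for identical-sized particles as follows. This is a minimal 2D illustration, not the paper's code: it assumes a uniform grid as the acceleration structure with a cell size of at least one particle diameter, a linear-spring repulsion force, and explicit Euler integration. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def build_grid(positions, cell_size):
    """Stage 1: hash each particle index into a uniform grid cell."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(int(x // cell_size), int(y // cell_size))].append(i)
    return grid

def collide(positions, radius, grid, cell_size, stiffness):
    """Stage 2: search the 3x3 neighboring cells and accumulate repulsion forces."""
    forces = [[0.0, 0.0] for _ in positions]
    for i, (x, y) in enumerate(positions):
        cx, cy = int(x // cell_size), int(y // cell_size)
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in grid.get((cx + dx, cy + dy), ()):
                if j == i:
                    continue
                rx, ry = x - positions[j][0], y - positions[j][1]
                dist = (rx * rx + ry * ry) ** 0.5
                overlap = 2.0 * radius - dist
                if overlap > 0.0 and dist > 0.0:
                    # Assumed force model: linear spring pushing i away from j.
                    forces[i][0] += stiffness * overlap * rx / dist
                    forces[i][1] += stiffness * overlap * ry / dist
    return forces

def integrate(positions, velocities, forces, mass, dt):
    """Stage 3: explicit Euler update of velocities, then positions."""
    for p, v, f in zip(positions, velocities, forces):
        v[0] += f[0] / mass * dt
        v[1] += f[1] / mass * dt
        p[0] += v[0] * dt
        p[1] += v[1] * dt

positions = [[0.0, 0.0], [0.15, 0.0], [1.0, 1.0]]
velocities = [[0.0, 0.0] for _ in positions]
radius, cell_size = 0.1, 0.2

grid = build_grid(positions, cell_size)
forces = collide(positions, radius, grid, cell_size, stiffness=100.0)
integrate(positions, velocities, forces, mass=1.0, dt=0.01)
```

Every particle does the same amount of work here (one cell lookup plus a fixed 3x3 neighborhood), which is the uniform granularity that makes this part a good fit for a wide SIMD processor.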

For a coupled simulation as shown in Fig. 1, we also have to handle collisions between large and small particles (LS collision). LS collision is performed by searching for colliding small particles for each large particle and accumulating the forces on both the small and the large particles. To improve the efficiency of the search, we can reuse the data structure built for small-small (SS) collision. For each large particle, a bounding box in the coordinate system of the uniform grid is calculated, and small particles are searched for in the grid cells that overlap the bounding box. The work granularity for each large particle depends on the size of the particle, because the number of overlapping cells depends on the size of its bounding box. Therefore, it is more efficient to perform LS collision on the CPU computation thread.
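A rough 2D sketch of the LS search makes the size dependence concrete. It is an illustration under assumptions, not the authors' code: the bounding box is padded by the small-particle radius so that touching particles are not missed, the force is a linear spring, and all names (`cell_range`, `ls_collision`, `cells_visited`) are hypothetical.

```python
from collections import defaultdict

def build_grid(small_pos, cell_size):
    """Uniform grid over the small particles (the structure reused from SS collision)."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(small_pos):
        grid[(int(x // cell_size), int(y // cell_size))].append(i)
    return grid

def cell_range(center, extent, cell_size):
    """Bounding box of a large particle expressed in grid coordinates."""
    lo = [int((c - extent) // cell_size) for c in center]
    hi = [int((c + extent) // cell_size) for c in center]
    return lo, hi

def ls_collision(large_pos, large_rad, small_pos, small_radius, grid, cell_size, stiffness):
    f_large = [[0.0, 0.0] for _ in large_pos]
    f_small = [[0.0, 0.0] for _ in small_pos]
    cells_visited = []  # work done per large particle, for illustration only
    for li, (c, big_r) in enumerate(zip(large_pos, large_rad)):
        # Assumption: pad the box by the small radius to catch touching particles.
        lo, hi = cell_range(c, big_r + small_radius, cell_size)
        count = 0
        for gx in range(lo[0], hi[0] + 1):
            for gy in range(lo[1], hi[1] + 1):
                count += 1
                for si in grid.get((gx, gy), ()):
                    rx = small_pos[si][0] - c[0]
                    ry = small_pos[si][1] - c[1]
                    dist = (rx * rx + ry * ry) ** 0.5
                    overlap = big_r + small_radius - dist
                    if overlap > 0.0 and dist > 0.0:
                        fx = stiffness * overlap * rx / dist
                        fy = stiffness * overlap * ry / dist
                        # Equal and opposite forces on the small and large particle.
                        f_small[si][0] += fx
                        f_small[si][1] += fy
                        f_large[li][0] -= fx
                        f_large[li][1] -= fy
        cells_visited.append(count)
    return f_large, f_small, cells_visited

small_pos = [[1.05, 0.0], [5.0, 5.0]]
large_pos = [[0.0, 0.0], [3.0, 3.0]]
large_rad = [1.0, 0.3]
grid = build_grid(small_pos, cell_size=0.25)
f_large, f_small, cells_visited = ls_collision(
    large_pos, large_rad, small_pos, 0.1, grid, 0.25, stiffness=100.0)
```

The per-particle cell counts in `cells_visited` differ widely between the two large particles, which is the non-uniform granularity that the text gives as the reason for running LS collision on the CPU computation thread.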
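Author: 彬彬    Time: 2012-1-13 10:51

Author: tc    Time: 2012-2-22 23:27
The OP's collection is really thorough!

Author: 奇    Time: 2012-3-20 23:18
First reply, no explanation needed.

Author: 菜刀吻电线    Time: 2012-4-19 23:26
Water... the source of life... flooding...

Author: tc    Time: 2012-5-27 23:20
Whatever the moderator says, listen; whatever a friend posts, bump.

Author: C.R.CAN    Time: 2012-6-2 23:27
Interesting! Learned something!

Author: 晃晃    Time: 2012-8-6 00:28
Answering 天帅's call: bump.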

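Author: 铁锹    Time: 2012-8-7 08:50
SketchUp learning and tips

Chrysler applies 3D technology to transmission production

Even with 3D maps, a license is still required: Apple's intentions are clear

Janus Award debuts at the 2012 Dalian Design Festival

"3D movie technology" advances research on cell phagocytosis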

Author: 菜刀吻电线    Time: 2012-8-30 00:51
Taking another look, and bumping the OP again.

Author: tc    Time: 2012-9-5 23:39
Passing by, passing by, almost there; everyone, please carry on...

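Author: 铁锹    Time: 2012-9-6 09:53
3D map of the human body: anyone can learn anatomy

Panasonic exhibits the industry's largest glasses-free 3D TV

3D printing is hot: many companies at home and abroad are testing the waters

US research institutions invest in 3D printing technology, aiming to revive manufacturing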



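Author: 晃晃    Time: 2012-9-19 23:30
Featured and highlighted threads especially deserve plenty of bumps.

Author: 晃晃    Time: 2012-12-6 23:21
Just looking, not saying anything.

Author: 菜刀吻电线    Time: 2013-2-5 23:26
"Passing by again..." Let me coin a new one: passing by specially.

Author: 奇    Time: 2013-2-12 23:27
Running over to bump a friend's thread.

Author: 奇    Time: 2013-3-16 23:19
Nice, thanks OP.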





Welcome to 纳金网 (http://go.narkii.com/club/) Powered by Discuz! X2.5