纳金网
Title:
Heterogeneous Particle-based Simulation
Author:
彬彬
Time:
2012-1-4 17:11
1 Introduction
Particle-based simulations have been used to simulate granular materials, fluids, and rigid bodies. To achieve realistic behavior, a large number of particles must be simulated. Particle-based simulations are well suited to GPUs because the computation for each particle is almost the same (i.e., the granularity of the computation is uniform across particles), which is what the wide SIMD architecture of a GPU favors. However, particle-based simulation on the GPU has mostly been restricted to particles of identical size [Harada et al. 2007]. This is because the work granularity becomes non-uniform when particles have different radii, which leads to inefficient use of the GPU. Heterogeneous CPU/GPU architectures, such as AMD Fusion® APUs, can run such simulations efficiently by using the CPU and the GPU at the same time. On a PC with a CPU and a discrete GPU, the data has to be sent over the PCI Express® bus whenever a computation is dispatched to the GPU, which introduces latency. In contrast, heterogeneous architectures place the CPU and the GPU on the same die with tightly coupled shared memory, so both processors can access the same memory space without any copying, which facilitates tight collaboration between them. In this paper, we describe a particle-based simulation with particles of various sizes that runs on a heterogeneous architecture by dispatching work to the GPU and the CPU according to its granularity and processing it on both simultaneously.
2 Method
The simulation we developed maximizes the use of all the available
resources of a heterogeneous architecture by performing computation
concurrently on both the CPU and the GPU components.
The simulation uses a CPU thread for dispatching work to the GPU
(GPU control thread) and multiple CPU threads for computation
(CPU computation threads), whereas an application using only the
GPU uses one CPU thread. The target architecture was an AMD A-Series APU with four CPU cores and a GPU. Our implementation for this architecture used the GPU, a GPU control thread, and three CPU computation threads. For simplicity, we first describe a simulation using the GPU and two CPU threads (a GPU control thread and a CPU computation thread). Then we describe how to scale the simulation to the GPU, a GPU control thread, and multiple CPU computation threads.

e-mail: takahiro.harada@amd.com

[Figure 2 diagram; recovered labels: GPU, CPU, Build Acc. Structure, SS Collision, S Integration, L Integration, LL Collision, LS Collision, Synchronization, Position, Velocity, Grid, Force]
Figure 2: A step of the simulation using two CPU threads.
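The thread arrangement described above (one GPU control thread plus several CPU computation threads, meeting at the end of a step) can be sketched with Python's `threading` module. This is a structural illustration only, under stated assumptions: `run_step`, `gpu_work`, and `process_item` are hypothetical names, the GPU dispatch is simulated by an ordinary function call, and CPython's global interpreter lock means these CPU threads do not truly execute Python code in parallel. The default of three computation threads mirrors the configuration described for the four-core APU.

```python
import queue
import threading

def run_step(gpu_work, cpu_items, process_item, num_cpu_threads=3):
    """Run one step with a GPU control thread and several CPU computation threads.

    gpu_work: zero-argument callable standing in for the GPU dispatch (hypothetical).
    cpu_items: work items for the CPU threads, e.g. large particles.
    process_item: function applied to each CPU work item.
    """
    results = {"gpu": None, "cpu": []}
    lock = threading.Lock()
    work = queue.Queue()
    for item in cpu_items:
        work.put(item)

    def gpu_control():
        # A real control thread would enqueue GPU kernels and wait for them here.
        results["gpu"] = gpu_work()

    def cpu_worker():
        # Each computation thread pulls items until the shared queue is empty.
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return
            value = process_item(item)
            with lock:
                results["cpu"].append(value)

    threads = [threading.Thread(target=gpu_control)]
    threads += [threading.Thread(target=cpu_worker) for _ in range(num_cpu_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # all threads meet here: the synchronization point of the step
    return results
```

Because the CPU threads pull from a shared queue rather than receiving fixed slices, a thread that draws a cheap item simply takes another one, which is one simple way to cope with non-uniform work per item.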
2.1 Simulation using the GPU, a GPU control thread, and a CPU computation thread
A simulation with particles of various sizes, as shown in Fig. 1, is a coupling of two simulations: a simulation with identical-sized particles (small particles, colored blue) and a simulation with varying-sized particles (large particles, colored red and green). If the interaction between large and small particles is not considered, the simulation of small particles has a uniform work granularity, so it is well suited to the GPU. On the other hand, the CPU is a better choice for the large particles because the work granularity of their simulation is not uniform. Therefore, the small and large particle simulations are performed on the GPU and the CPU, respectively. Note that the two simulations run concurrently.
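The split between the GPU and CPU paths can be sketched as a simple partition by radius. The paper does not state how the split is made, so this sketch assumes that every small particle shares one known radius (`small_radius`) and that any other radius counts as "large"; the `Particle` type (2D, for brevity) and `partition_by_size` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Particle:
    x: float
    y: float
    radius: float

def partition_by_size(particles, small_radius):
    """Split particles into the identical-size set and everything else.

    Small particles (all sharing small_radius) go to the GPU path; the
    remaining, varying-size particles go to the CPU path.
    """
    small, large = [], []
    for p in particles:
        (small if math.isclose(p.radius, small_radius) else large).append(p)
    return small, large
```

`math.isclose` is used instead of `==` so that radii produced by arithmetic still match the shared small radius.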
A simulation step consists of three stages: building an acceleration structure, collision, and integration. Brute-force collision computation is prohibitively expensive when there are a large number of particles, because every pair of particles would have to be tested, so an acceleration structure is built to make the collision stage efficient. In the collision stage, colliding particles are searched for and repulsive forces are calculated. The integration stage updates particle velocities and positions.
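The three stages can be sketched for the identical-size (small) particles as follows. The text says the acceleration structure is a uniform grid but does not give its force model or integrator, so this sketch assumes 2D particles, a linear spring repulsion proportional to penetration depth (stiffness `k`), and semi-implicit Euler integration; `build_grid`, `collide_same_size`, and `integrate` are hypothetical names. With a cell size of one diameter, every particle scans the same 3x3 block of cells and accumulates only the force on itself, which is the uniform per-particle work that suits the GPU.

```python
import math
from collections import defaultdict

def build_grid(positions, cell_size):
    """Uniform grid: map integer cell coordinates to lists of particle indices."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)
    return grid

def collide_same_size(positions, radius, k):
    """Repulsive spring forces between equal-size particles using the grid."""
    cell = 2.0 * radius  # one diameter: every contact lies in the 3x3 neighborhood
    grid = build_grid(positions, cell)
    forces = [[0.0, 0.0] for _ in positions]
    for i, (xi, yi) in enumerate(positions):
        cx, cy = math.floor(xi / cell), math.floor(yi / cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j == i:
                        continue
                    rx, ry = xi - positions[j][0], yi - positions[j][1]
                    d = math.hypot(rx, ry)
                    overlap = 2.0 * radius - d
                    if overlap > 0.0 and d > 0.0:
                        # Push particle i away from j in proportion to penetration.
                        forces[i][0] += k * overlap * rx / d
                        forces[i][1] += k * overlap * ry / d
    return forces

def integrate(positions, velocities, forces, mass, dt):
    """Semi-implicit Euler: update velocity first, then position (in place)."""
    for p, v, f in zip(positions, velocities, forces):
        v[0] += dt * f[0] / mass
        v[1] += dt * f[1] / mass
        p[0] += dt * v[0]
        p[1] += dt * v[1]
```

Each particle writes only its own force entry, so no two particles write to the same location; on a GPU this avoids atomic updates at the cost of computing each pair twice.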
For a coupled simulation as shown in Fig. 1, we must also decide how to handle collisions between large and small particles (LS collision). LS collision is performed by searching for the small particles that collide with each large particle and accumulating the forces on both the small and the large particles. To make this search efficient, we can reuse the data structure built for small-small (SS) collision: for each large particle, a bounding box is computed in the coordinate system of the uniform grid, and small particles are searched for in the grid cells that overlap the bounding box. The work granularity for each large particle depends on the particle's size, because the number of overlapped cells depends on the size of its bounding box. Therefore, it is more efficient to perform LS collision on the CPU computation thread.
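The LS search can be sketched as below, reusing a grid of small-particle centers. Assumptions: 2D, large particles given as `(x, y, r)` tuples, the same linear spring repulsion as a stand-in force model, and a bounding box expanded by the small-particle radius because the grid stores small-particle centers; all function names are hypothetical. The returned `cells_visited` count makes the point of the paragraph concrete: the number of cells scanned, and hence the work per large particle, grows with its radius.

```python
import math
from collections import defaultdict

def build_grid(positions, cell_size):
    """Uniform grid of small-particle centers: cell -> list of indices."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)
    return grid

def collide_large_small(large, small_pos, small_radius, grid, cell, k):
    """LS collision: for each large particle, scan the grid cells under its bounding box.

    Returns (forces on large particles, forces on small particles,
    number of cells visited per large particle).
    """
    f_large = [[0.0, 0.0] for _ in large]
    f_small = [[0.0, 0.0] for _ in small_pos]
    cells_visited = []
    for a, (x, y, r) in enumerate(large):
        reach = r + small_radius  # a contacting small center lies within this distance
        x0, x1 = math.floor((x - reach) / cell), math.floor((x + reach) / cell)
        y0, y1 = math.floor((y - reach) / cell), math.floor((y + reach) / cell)
        cells_visited.append((x1 - x0 + 1) * (y1 - y0 + 1))
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                for j in grid.get((cx, cy), ()):
                    rx, ry = small_pos[j][0] - x, small_pos[j][1] - y
                    d = math.hypot(rx, ry)
                    overlap = reach - d
                    if overlap > 0.0 and d > 0.0:
                        fx, fy = k * overlap * rx / d, k * overlap * ry / d
                        f_small[j][0] += fx  # small particle pushed outward
                        f_small[j][1] += fy
                        f_large[a][0] -= fx  # equal and opposite force on the large one
                        f_large[a][1] -= fy
    return f_large, f_small, cells_visited
```

For example, with a cell size of 1.0 and a small radius of 0.5, a large particle of radius 1.0 centered at the origin scans 16 cells, while one of radius 3.0 centered at (10, 10) scans 64.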
Author:
彬彬
Time:
2012-1-13 10:51
Author:
tc
Time:
2012-2-22 23:27
The OP's collection is really comprehensive!
Author:
奇
Time:
2012-3-20 23:18
Grabbing the "sofa" (first reply), no explanation needed.
Author:
菜刀吻电线
Time:
2012-4-19 23:26
Water... the source of life... pouring it on...
Author:
tc
Time:
2012-5-27 23:20
Always listen to what the moderators say; always bump your friends' posts.
Author:
C.R.CAN
Time:
2012-6-2 23:27
Interesting! Learned something!
Author:
晃晃
Time:
2012-8-6 00:28
Answering 天帅's call: bump!
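Author:
铁锹
Time:
2012-8-7 08:50
SketchUp learning and tips
Chrysler applies 3D technology to transmission production
Despite having 3D maps, a license is still required: Apple's intent is clear
Janus Awards debut at the 2012 Dalian Design Festival
"3D movie technology" advances phagocytosis research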
Author:
菜刀吻电线
Time:
2012-8-30 00:51
Taking another look; bumping the OP again.
Author:
tc
Time:
2012-9-5 23:39
Passing by, passing by, almost there; everyone please keep going...ing
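Author:
铁锹
Time:
2012-9-6 09:53
3D human body map: anyone can learn anatomy
Panasonic shows the industry's largest glasses-free 3D TV
3D printing is hot: many companies at home and abroad are testing the waters
US research institutions invest in 3D printing technology to revive manufacturing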
Author:
晃晃
Time:
2012-9-19 23:30
Posts that have been featured and highlighted especially deserve lots of bumps.
Author:
晃晃
Time:
2012-12-6 23:21
I'm just looking; I won't say anything.
Author:
菜刀吻电线
Time:
2013-2-5 23:26
"Passing by again..." Let me coin one too: "specially passing by".
Author:
奇
Time:
2013-2-12 23:27
Running over to bump a friend's post.
Author:
奇
Time:
2013-3-16 23:19
Not bad, thanks to the OP.
Welcome to 纳金网 (http://go.narkii.com/club/)