Table Of Content

ThisspaceisreservedfortheProcediaheader,donotuseit 6 1 0 A Novel Implementation of QuickHull Algorithm 2 on the GPU y a M JiayinZhang1, GangMei1∗, NengxiongXu1, and KunyangZhao1 0 SchoolofEngineeringandTechnology,ChinaUniversityofGeosciences,100083,Beijing,China 2 {zhangjy,gang.mei,xunengxiong,kunyang}@cugb.edu.cn ] G C Abstract . s WepresentanovelGPU-acceleratedimplementationoftheQuickHullalgorihtmforcalculatingconvex c hulls of planarpointsets. We also describea practicalsolution to demonstratehow to efficiently im- [ plementatypicalDivide-and-ConqueralgorithmontheGPU.Wehighlyutilizetheparallelprimitives 3 providedbythelibraryThrustsuchastheparallelsegmentedscanforbetterefficiencyandsimplicity. v Toevaluatetheperformanceofourimplementation,wecarryoutfourgroupsofexperimentaltestsus- 6 ing two groupsof pointsets in two modeson the GPU K20c. Experimentalresults indicatethat: our 0 implementationcanachievethespeedupsofupto10.98xoverthestate-of-artCPU-basedconvexhull 7 4 implementationQhull[16]. Inaddition,ourimplementationcanfindtheconvexhullof20Mpointsin 0 about0.2seconds. . 1 Keywords: GPU;Parallelization;ConvexHull;QuickHull;Divide-and-Conquer 0 5 1 : 1 Introduction v i X Thecalculatingoftheconvexhullofasetofplanarpointsistofindtheconvexpolygonthatenclosesall r thepoints. Severalclassicalgorithmshavebeendevelopedsince1970,includingtheGrahamscan[8], a Giftwrapping[9],Incrementalmethod[11],Divide-and-Conquer[15],Monotonechain[1],andQuick- Hull [2]. Several recent efforts have also been conductedto develop efficient convexhull algorithms [24,12]. The finding of convex hulls of large sets of points is in general computationally expensive. An effectivestrategyfordealingwiththisproblemistocalculateconvexhullsinparallel. Inrecentyears, toimprovethecomputationalefficiencyofcalculatingconvexhulls,severalworthfulcontributionshave beenmadeto re-designandimplementsequentialconvexhullalgorithmsinparallelby exploitingthe power of massively computing on the GPU [18, 10, 19, 20, 21, 22, 23, 6, 7]. Most of these GPU- acceleratedimplementationsaredevelopedbaseduponthefamousQuickHullalgorithm[2]. Forexample,Srikanth,Kothapalli,GovindarajuluandNarayanan[18]firstparallelizedtheQuick- Hullalgorithmtoacceleratethecalculatingof2Dconvexhulls;andthenSrungarapu,Reddy,Kothapalli ∗Correspondingauthor:[email protected];[email protected] 1 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao and Narayanan[19] improvedthe abovework and achieved better efficiency. Tzeng and Owens [22] presentedaframeworkforacceleratingthecomputingofconvexhullintheDivide-and-Conquerfashion bytakingadvantageofQuickHull.Similarly,byalsoutilizingtheQuickHullapproach,Stein,Gevaand El-Sana[20]presentedanovelGPU-acceleratedimplementationof3Dconvexhullalgorithm. Inthis paper,we presenta novelGPU-acceleratedimplementationofthe famousQuickHullalgo- rithm for computing the convex hulls of planar point sets. We also describe a practical solution to demonstratehowtoimplementatypicalDivide-and-ConqueralgorithmontheGPU.Ourimplementa- tion is quite similar to the one introducedby Tzeng and Owens [22]; butthere are several significant differencesbetweenourimplementationandthatofTzengandOwens. Ourimplementationcanachieve aspeedupofupto10.98xoverastandardsequentialCPUimplementation,andcanfindtheconvexhull asetof20Mpointsinabout0.2seconds. The rest of the paper is organized as follows. Section 2 introduces several background concepts behind our implementation. Section 3 describes the basic ideas behind our implementation. Section 4 further presentssome implementationdetails. Then Section 5 givesseveralgroupsof experimental results,whileSection6discussestheresult. Finally,Section7concludesthiswork. 2 Segment and Segmented Scan Segmentsarecontiguouspartitionsofthedatawhicharemaintainedbysegmentflags[4,17]. Thereare typicallytwoformsforrepresentingsegments: thefirstoneistouseasetofheadflags; andtheother istouseasetofkeys;seeFigure1. Aheadflagmarksthebeginningofasegment(calledthesegment head).Akeyindicatestheindexofthesegmentthateachelement/valueinagivenarraybelongingto. Segmentedscangeneralizesthescanprimitivebyallowingscansonarbitrarysegments(“partitions”) oftheinputvector[4]. Toimplementsegmentedscaninparallel,Sengupta,Harris,ZhangandOwens [17]introducedthesegmentedscanprimitivetotheGPU.AndcurrentlyboththelibrariesCUDPP[5] andThrust[3]bothprovideefficientGPU-acceleratedsegmentedscanprimitives. ThrustisaC++templatelibraryforCUDAbasedontheStandardTemplateLibrary(STL).Thrust allows users to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C [14]. Thrust provides a rich collection of data structuresand data parallel primitivessuch as scan, sort, and reduce, which can be composedtogethertoimplementcomplexalgorithmswithconcise,readablesourcecode. Figure1: Segmentsandtworepresentationformsofsegments Segment 0 Segment 1 Segment 2 Segment 3 Value 10 56 22 35 12 89 17 87 11 45 30 19 29 40 41 37 27 20 13 21 Head 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 Key 0 0 0 0 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 Index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 First point 0 4 7 14 2 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao 3 Basic Ideas behind Our Implementation Themostimportantideabehindourimplementationistodirectlyoperatethedataintheinputarraysthat areoriginallyallocatedtostorethecoordinatesofinputpoints,ratherthanintheadditionallyallocated arraysorsplittingtheinputdataintoseparatearrays. TheQuickHullalgorithmisaDivide-and-Conquermethod,whichtendstodividetheinputdataset intosubsetsandthenhandlesthese subsetsrecursively. OntheGPU, aneffectivestrategyistodivide the inputdata set into subsets, butdo notstore them in separatedarrayswith differentsizes. Instead, all the data of the subsets are still stored in the input data array, but the data of each subset is stored intoaSegment(i.e.,aconsecutivepiece/partitionofdata)[4]. Operationscarriedoutforeachsubsetis exactlytheoperationsforeachsegment[22]. Weadoptthisstrategytodevelopourimplementation. Figure2:Procedureofthe2DCUDAQuickHullontheGPU(withoutpreprocessing) Procedure: 2D Quickhull on the GPU Input: a set of input points pt_in Output: convex hull ch_out First Split 1: Use parallel reduction to find the leftmost point Pminx and the rightmost point Pmaxx 2: Determine the positions of the rest points against the line PminxPmaxx Assign a flag value to indicate the position: flag = 1 when below; flag = 0 when above 3: Use parallel partition to split the points into two segments according to the flag values: the lower subset Slower and the upper subset Supper 4: Use parallel sorting to sort the Slower in x-ascending and Supper in x-descending Recursive Step Repeat for each segment (Pfirst, Plast) represented by two points Pfirst and Plast do 5: Find the farthest point Pfar from the line Pfirst Plast 6: Divide segment (Pfirst, Plast) into two new segments (Pfirst, Pfar) and (Pfar, Plast) 7: Update all segments (including head flags, keys, first points) 8: Detect interior points by determining the positions 9: Assign each point a state flag depending on its position to indicate the state: 1: current non-interior points; 0: determined interior points 10: Use parallel stable_partition to gather all interior points according to the state flags, then remove all interior points 11: Update only the first points of all segments Until there are no interior points can be found 12: Output the remaining points in pt_in as the extreme points of ch_out TheprocedureofourimplementationispresentedinFigure2. Inourimplementation,aftersplitting theinputsetofpointsintothelowerandtheuppersubsets,wesortthetwosubsetsseparatelyaccording to the x-coordinates. After this sorting, either the lower or the upper subset of sorted points can be consideredasaMonotoneChain[1];inaddition,boththeabovechainscanbefurtherconsideredastwo halvesofageneralpolygon.Ifwedetectandremovethoseverticesofthegeneralpolygonthathavethe interioranglesgreaterthan180degrees,thenwecanobtainthedesiredconvexhull. Theaboveideaof firstsortingandthenremovingnon-extremepoints/concaveverticeswasintroducedin[1]and[13]. Therefore,the basicidea behindourimplementationisto “Find-and-Remove”. Inthestep offirst split, we divide the input points and then sort them to virtually form a general polygon. In the subsequent recursive step, we recursively first find those non-extreme points, and then remove them to guaranteethatalltheremainingpointsarecompletelyextremepointsoftheexpectedconvexhull. 3 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao 4 Implementation Details 4.1 DataStorage andData Layout Weallocateseveralarraysonthedevicesidetostorethecoordinatesofplanarpoints,informationabout segments,andotherrequiredvaluessuchasdistances;seeTable1. Table1:Allocatedarraysforstoringdataonthedevice Array Usage float x[n] xcoordinates float y[n] ycoordinates float dist[n] Distances int head[n] Indicatorofthefirstpointofeachsegment (1:Headpoint;0:Notaheadpoint) int keys[n] Indexofthesegmentthateachpointbelongsto int first pts[n] Indexofthefirstpointofeachsegment int flag[n] Indicatewhetherapointisanextremepointoraninteriorpoint (1:Potentialextremepoint;0:Determinedinteriorpoint) 4.2 The Preprocessing Procedure BeforeperformingtheQuickHullalgorithmontheGPU,wefirstcarryoutapreprocessingprocedureto filtertheinputpoints. Theobjectiveofthispreprocessingprocedureistoreducethenumberofpoints bydiscardingthosepointsthatarenotneededforconsiderationinthesubsequentstageofcalculating thedesiredconvexhull. Weusetheparallelreductiontofindtheextremepointswithminormaxxorycoordinate. Inmore details, we adopt the thrust::minmax element(x.begin(), x.end()) to find the leftmost and the rightmost points, and similarly use the thrust::minmax element(y.begin(), y.end())toobtainthetopmostandthebottommostpoints. Thesefourextremepointsarethenused toformaconvexquadrilateral. We also design a simple CUDA kernel to check each pointto determine whetherit locates inside thequadrilateral.Inthekernel,eachthreadisresponsiblefordeterminingthepositionofonlyonepoint Pi, i.e., whether or not a point falls into the formed convexquadrilateral. If does, the corresponding indicatorvalueflag[i]willbesetto0,otherwise,thevalueflag[i]isstillkeptas1. 4.3 The FirstSplit The firstsplitof theQuichHullalgorithmis todividethe setofinputpointsinto two subsets, i.e., the lowerandtheuppersubsets,usingthelinesegmentLformedbytheleftmostandtherightmostpoints. ThosepointsthatlocatebelowtheLaregroupedintothelowersubset,whiletheonesdistributedabove theLarecontainedintheuppersubset. We developanotherquite simplekernelto performthe abovesplitprocedure. Inthis kernel, each threadtakestheresponsibilitytodeterminethepositionofonlyonepointwithrespecttothelinesegment L. Inthisstep,wetemporarilyusethevaluesintflag[n]toindicatethepositions: ifthepointPi locatesbelowtheL,inotherwords,ifPi belongstothelowersubset,thenthecorrespondingindicator valueflag[i]willbesetto1,otherwise0. 4 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao Afterdeterminingthepositionsofallpoints,itisneededtogatherthepointsbelongingtothesame subsetsuchastheloweronetogetheraccordingtotheindicatorvaluesintflag[n].Werealizedthis procedureby simply using the functionthrust::partition(). Those pointswith the indicator value1,i.e.,thelowerpoints,willbeplacedintothefirstconsecutivehalfoftheinputarray(asegment of points), while the upper points will be grouped into another consecutive half (another segment of points). Insubsequentsteps,operationswillbeperformedinthesegmentsofpoints. 4.4 The RecursiveProcedure 4.4.1 FindingtheFarthestPoint The first step in the recursiveprocedureis to find the farthestpointsin each segment, which includes tworemarkableissues. Thefirstistocalculatethedistanceforallpointsinparallel;andtheotheristo findthosepointswiththefarthestdistancesforallsegmentsinparallel. Calculatingthe Distance. The calculationof the distancefroma pointto a line is quite straight- forward. However,inourimplementation,itisneededto calculatethedistancesfromdifferentpoints to differentlines simultaneously in parallel. This calculation is not so easy to implementin practice. This is because that: (1) for each segment, it is needed to compute the distance from each of those pointsbelongingtothissegmenttothelineformedbythefirstpointsandthelastpoint;(2)foranytwo segments,theirfirstpointandlastpointsaredifferent. Inourimplementation,foreachpointPi,weusethevaluefirst pt[i]torecordtheindexofthe firstpointofthatsegmentitbelongsto. Sincesegmentsarestoredconsecutively,thefirstpointofthe (j+1)th segmentisexactlythelastpointofthejth segmentexceptforthelastsegment. Notethatthe lastpointofthelastsegmentisthepointP0. Therefore,itiseasytoobtaintheindexofthelastpointfor eachsegment,andthedistancefromthepointPitothelineformedbythefirstpointandthelastpoint. FindtheFarthestPoint. Aftercalculatingthedistancesforthosepointsindifferentsegments,itis neededtofindthefarthestpointineachsegment. Thefindingofthefarthestpointsforallsegmentsin parallelisnotso easy. Thisis becausethereare morethan two segmentsthatare neededto findtheir farthestpoints. Theirfarthestpointsareprivate,segment-specific. Inthiscase, itisunabletoperform a global parallel reduction, but is able to employ a segmented parallel reduction to find the greatest distanceforeachsegmentofpoints. ThesegmentedparallelreductionisdesignedinThrusttomakeaparallelreductionformorethan one segment of points. It can be used to find the min or max values in several segments in parallel. We employ the parallel primitive thrust::reduce by key() in our implementation to find the farthestpoints/maximumdistancesforallsegmentsinparallel. 4.4.2 TheFirstRoundofUpdatingSegments Afterfindingthefarthestpoints,eachsegmentisthentypicallydividedintotwosmallersubsegments usingthe farthestpoint. Thismeansthattheoldsegmentsarereplacedwith newsegments. Tocreate newsegments,thefollowinginformationofsegmentsisneededtobeupdated: (1)Headflags Theheadflagsofthefarthestpointsareneededtobemodifiedfrom0to1,whichmeanseachofthe farthestpointsbecomesthefirstpointofanewsegmentandadeterminedextremepointsofthedesired convexhull. Theheadflagsofotherpointsarekeptunchanged. (2)Keys Theupdatingofkeyscanbeveryeasilyrealizedbyperformingaglobalinclusivescanforthehead flags. Forthatineachsegmentonlytheheadflagofthefirstpointis1,thesumoftheheadflagscanbe consideredastheindexofthesegment(i.e.,thekeys).Noticeably,thesumoftheheadflagsisone-based 5 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao rather than zero-based(i.e., starting from 1 rather 0). To make the indicesbecome much easier to be used,wefurthermodifythekeysfromone-basedtozero-basedbyperformingaparallelsubtracting. (3)Indicesoffirstpoints Afterupdatingtheheadflagsandkeysofeachsegment,thecorrespondingindicesofthefirstpoints arenolongervalid,andthusneededtobeupdated. Beforeupdatingtheindicesofthefirstpoints,we firstassignaglobalindexforeachoftheremainingpoints,thencheckeachpointwhetheritisthefirst pointaccordingtotheheadflags. IftheheadflagofapointPi is1,thenthispointmustbeafirstpoint ofasegmentanditsindexisexactlyi. 4.4.3 DiscardingInteriorPoints Thediscardingofinteriorpointsistofirstcheckwhetherornotapointlocatesinsidethetriangleformed by the firstpointof the segment(denotedasA), the last pointofthe segment(denotedas B), andthe farthestpointinthissegment(denotedasC). Let△ACBdenotethetriangle,thedeterminingofthepoints’positionswithrespecttothetriangle △ACBistocheckwhetherthosepointsinthissegmentlocateontherightsideofthedirectedlineAC andthedirectedlineCB. Ifdoes,thenitisnotaninteriorpoint,anditscorrespondingindicatorvalue flag[i] is set to 1; otherwise, it is an interior point and the value flag[i] must modified to 0. Similarly, for each pointin the segment(C,B), it is onlyneededto checkwhether it falls onthe right sidethedirectedlineCB. Afterdeterminingallinteriorpointsin thisrecursion,we employa parallelpartitioningprocedure to gather all interior points together according to the indicator values int flag[n]; see a simple illustrationin Figure3. Noticeably,to maintainthe relativeorderof inputpoints, we usethe function thrust::stable partition()ratherthanthe functionthrust::partition(). Afterthe partitioning,wemakeanoperationresize()toremovealltheinteriorpointsfoundinthisrecursion. Figure3:Partitioningaccordingtoflags Flag 1 1 0 1 1 1 0 1 0 1 1 0 1 1 1 1 0 0 1 1 Value 10 56 22 35 12 89 17 87 11 45 30 19 29 40 41 37 27 20 13 21 Value 10 56 35 12 89 87 45 30 29 40 41 37 13 21 22 17 11 19 27 20 Flag 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 4.4.4 TheSecondRoundofUpdatingSegments Thisroundofupdatingthesegmentsisnearlythesametothefirstroundofupdating. Thereasonwhy thisroundofupdatingisneededtobeperformedisthat:afterremovingsomeinteriorpoints,thepoints belongingtosomesegmentsareremoved. Inthiscase,theglobalindicesofalltheremainingpointare notconsecutiveandthusneedtoberearranged;seethefirstroundofupdatingformoredetailsaboutthe aboveupdating.Noticeably,theheadflagsandthekeysfortheremainingpointsarestillcorrect,anddo notneedtobeupdated.Onlytheindexofthefirstpointofeachsegmentneedstobeupdated. 6 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao 5 Results WeperformourexperimentaltestsusingtheNVIDIATeslaK20cgraphicscardwith4GBmemoryand CUDAv5.5.TheCPUexperimentsareperformedonWindows7SP1withanIntelE5-2650(2.60GHz) and 96GB of RAM memory. Two groupsof test data is employedin two modes to evaluate the performance of our implementation. The first group of test data includes 8 sets of points that randomly distributedintheunitsquare. Theothergroupoftestdataisderivedfrompubliclyavailable3Dmesh modelsbyprojectingallverticesofeachmeshmodelintotheXYplane.Thesemeshmodelsaredirectly obtainedfromtheStanford3DScanningRepositoryandtheGITLargeGeometryModelsArchive. In addition,wecarryouttheexperimentaltestsintwomodes:(a)Mode1:inthismode,thepreprocessing procedureisemployed,(b)Mode2: inthismode,thepreprocessingprocedureisnotused. We test the efficiency of our implementationusing the points that are randomlydistributed in the unit square and the points derived from 3D mesh models, and then compare the efficiency with that of the Qhulllibrary; see Tables2 and3. Theexperimentalresultsshowthat: ourimplementationcan achievethespeedupsofupto10.98xand8.30xintheMode1andMode2,respectively. Inaddition,it costsabout0.2secondstocomputetheconvexhullof20Mpointsinthebestcase. Table2:Comparisonofrunningtime(/ms)forpointsdistributedinasquareonK20c Ourimplementation Speedup Size Qhull Mode1 Mode2 Mode1 Mode2 100K 16 37.2 61.7 0.43 0.26 200K 32 36.4 65.7 0.88 0.49 500K 62 41.7 69.2 1.49 0.90 1M 109 44.8 73.9 2.43 1.47 2M 234 55.9 86.2 4.19 2.71 5M 561 77.9 118.5 7.20 4.73 10M 1029 124.7 161.7 8.25 6.36 20M 2262 206.0 272.5 10.98 8.30 Table3: Comparisonofrunningtime(/ms)forpointsderivedfrom3DmodelsonK20c Ourimplementation Speedup 3DModel Size Qhull Mode1 Mode2 Mode1 Mode2 Armadillo 172K 16 47.5 61.7 0.34 0.26 Angel 237K 47 50.5 62.6 0.93 0.75 SkeletonHand 327K 32 49.5 61.8 0.65 0.52 Dragon 437K 62 55.9 67.4 1.11 0.92 HappyBuddha 543K 63 56.8 76.7 1.11 0.82 TurbineBlade 882K 125 64.3 72.3 1.94 1.73 VellumManuscript 2M 219 63.2 91.5 3.47 2.39 AsianDragon 3M 359 78.4 129.8 4.58 2.77 ThaiStatue 5M 515 84.4 142.6 6.10 3.61 Lucy 14M 1404 141.3 223.3 9.94 6.29 7 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao 6 Discussion 6.1 Comparison ThebasicideasbehindourimplementationissimilartothosebehindtheimplementationofTzengand Owens[22]. Thefirstofthesameideasisthat:weperformtheDivide-and-Conqueroperationsdirectly intheinputarrays(i.e.,theinputsetsofpoints),ratherthanonadditionallyallocatedarrays.Thesecond isthat: theDivide-and-ConquerproceduresofQuickHullalgorithmarerealizedbycreating,updating, orremovingSegments. Thethirdsimilarfeatureisthat: bothoftheimplementationsaredevelopedby stronglyexploitingthedata-parallelprimitives,parallel(global)scanandparallelsegmentedscan. However,wehaveourownideas;andthereareseveralsignificantdifferences.Thefirstdifferenceis that:afterdividingtheinputsetofpointsintothelowerandtheuppersubsets,wefurthersorttheabove two subsets separately according to the x-coordinates, and maintain the relative order of those sorted points unchanged in the subsequent procedure of recursively removing interior points. In contrast, TzengandOwens[22]donotsortthesubsetsofpointsorkeeptherelativeorderofpoints. The second difference is the creating and removing of segments: Tzeng and Owens create new segments using the farthest point by permuting the points located in the old segment, while we also createnewsegmentsusingthefarthestpointbutwedonotpermutepoints. Whenremovingsegments, weatleastretainthefirstpoint(i.e.,theheadpoint)ofeachsegmentforthatthispointisdefinitelyan extremepointofthedesiredconvexhull,whileintheimplementationofTzengandOwensasegment isprobablycompletelyremoved.Thisdifferenceisduetothedifferentschemesofupdatingsegments. Anotherdifferenceisthat:weadoptthelibraryThrust[3]fortheuseofseveralefficientdata-parallel primitivessuchasparallelscan,segmentedscan,reduction,andsorting,whilein[22]TzengandOwens developtheirimplementationbystronglyexploitingthelibraryCUDPP[5]. Thereasonwhywechoose to use thelibraryThrustratherthanCUDPP isthat: ThrusthasbeenintegratedinCUDA toolkit, and canbemucheasiertobeusedinpractice. 6.2 PerformanceImpactofPreprocessing We haveobservedthat: ourimplementationexecutingintheMode1ismuchfasterthanthatinMode 2; see Figure 4. This behavior is probablydue to the facts that: (1) more than 50% input points can bediscardedinthispreprocessing;(2)theperformancebenefitfromthediscardingofinteriorpointsis morethantheperformancepenaltyleadbythediscarding. Besidestheabovementionedimprovementincomputationalefficiency,anotherbenefitofadopting thepreprocessingprocedureisthat: thememoryusageonthedevicesideismuchless. Thisisbecause that:afterthepreprocessingonlylessthan50%inputpointsremain.Therefore,inthesubsequentproce- dureofcomputingtheconvexhull,itisonlyneededtoallocatemuchlessmemoryonthedevicesidefor thearrayssuchfloatx[n],y[n],dist[n]andint keys[n],first pts[n],flag[n]. 6.3 PerformanceofEach sub-procedure Therearethreemainsub-proceduresinourimplementationwhenadoptingthepreprocessing,i.e.,the preprocessingprocedure(pre-step),thesplittingofpointsintotwosubsets(1ststep),andtherecursive procedureoffindingtheexpectedconvexhull(2ndstep). Tofindthepotentialperformancebottleneck, wehaveinvestigatedthecomputationalefficiencyofthesethreesub-procedures;seethecomputational efficiency on K20c presented in Figure 5. We have found that in most cases: (1) the most computationally expensive step is the 2nd step; (2) the most computationallyinexpensiveone is the pre-step. Therefore, the potential performance bottleneck of our implementation is probably the 2nd step, and needstobeoptimizedinfurtherwork. 8 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao 12 12 Mode 1 Mode 2 Mode 1 Mode 2 10 10 8 8 p u eedup 6 Speed 46 p S 4 2 2 0 0 100K 200K 500K 1M 2M 5M 10M 20M Size 3D Model (a) Pointsdistributedinunitsquare (b) Pointsderivedfrom3Dmeshmodels Figure4: ComparisonofspeedupsintwomodesonK20c 100 200 2nd step 1st step Pre-step 2nd step 1st step 80 160 ms) ms) e (/ 60 e (/ 120 m m (cid:127) (cid:127) ng 40 ng 80 ni ni n n u u R 20 R 40 0 0 3D Model 3D Model (a) TestforpointsderivedfrommeshmodelsinMode1 (b) TestforpointsderivedfrommeshmodelsinMode2 Figure5: Computationalefficiencyofsub-proceduresonK20c 7 Conclusion We havepresenteda novelimplementationof the two-dimensionalQuickHullalgorithmon the GPU. Inourimplementation,wehavetransformedtheDivide-and-Conquerproceduresintotheoperationsfor segments directly in the inputarrays. We have strongly utilized several efficient primitives including parallel sort, scan, segmented scan, and partitioning provided by the library Thrust. We have also evaluatedtheperformanceofourimplementationusing: (1)pointsrandomlydistributedinunitsquare and(2)pointsderivedfrom3Dmeshmodelswithorwithoutemployingthepreprocessingprocedure. We havefoundthatourimplementationcanachievea speedupofupto10.98xovertheQhulllibrary. We havealsoobservedthatitcostabout0.2stofindtheconvexhullof20Mpoints. We hopethatour workcanhelpdevelopefficientimplementationsofotherDivide-and-ConqueralgorithmsontheGPU. Acknowledgments The authors are grateful to the anonymousreferee for helpful comments that improvedthispaper. ThisresearchwassupportedbytheNaturalScienceFoundationofChina(Grant Nos. 40602037and40872183). 9 QuickHullAlgorithmontheGPU... J.Zhang,G.Mei,N.XuandK.Zhao References [1] AlexMAndrew. Another efficient algorithmfor convex hullsintwodimensions. Information Processing Letters,9(5):216–219,1979. [2] CBradfordBarber,DavidPDobkin,andHannuHuhdanpaa. Thequickhullalgorithmforconvexhulls. ACM TransactionsonMathematicalSoftware(TOMS),22(4):469–483,1996. [3] NathanBellandJaredHoberock. Thrust: Aproductivity-orientedlibraryforcuda. GPUComputingGems, 7,2011. [4] GuyEBlelloch. Vectormodelsfordata-parallelcomputing,volume356. MITpressCambridge,1990. [5] CUDPP. http://cudpp.github.io/,2014. [6] Mingcen Gao, Thanh-Tung Cao, Ashwin Nanjappa, Tiow-Seng Tan, and Zhiyong Huang. ghull: A gpu algorithmfor3dconvexhull. ACMTransactionsonMathematicalSoftware(TOMS),40(1):3,2013. [7] MingcenGao,Thanh-TungCao,Tiow-SengTan,andZhiyongHuang. Flip-flop:convexhullconstructionvia star-shapedpolyhedronin3d.InProceedingsoftheACMSIGGRAPHSymposiumonInteractive3DGraphics andGames,pages45–54.ACM,2013. [8] RonaldL.Graham. Anefficientalgorithfordeterminingtheconvexhullofafiniteplanarset. Information processingletters,1(4):132–133,1972. [9] Ray A Jarvis. On the identification of the convex hull of a finite set of points in the plane. Information ProcessingLetters,2(1):18–21,1973. [10] TomaszJurkiewiczandPiotrDanilewski. Efficientquicksort and2dconvexhullforcuda, andmsimdasa realisticmodelofmassivelyparallelcomputations,2011. [11] Michael Kallay. The complexity of incremental convex hull algorithms in r¡ sup¿ d¡/sup¿. Information ProcessingLetters,19(4):197,1984. [12] RunzongLiu,BinFang,YuanYanTang,JingWen,andJiyeQian.Afastconvexhullalgorithmwithmaximum inscribedcircleaffinetransformation. Neurocomputing,77(1):212–221,2012. [13] AvrahamAMelkman. On-lineconstructionoftheconvexhullofasimplepolyline. InformationProcessing Letters,25(1):11–12,1987. [14] NVIDIA. Thrustquickstartguide,2014. [15] Franco P. Preparataand SeJune Hong. Convex hullsof finitesets of points intwo and threedimensions. CommunicationsoftheACM,20(2):87–93,1977. [16] Qhull. www.qhull.org,2014. [17] ShubhabrataSengupta,MarkHarris,YaoZhang,andJohnDOwens. Scanprimitivesforgpucomputing. In GraphicsHardware,volume2007,pages97–106,2007. [18] DSrikanth, KishoreKothapalli, RGovindarajulu, andPNarayanan. Parallelizingtwodimensional convex hullonnvidiagpuandcellbe. InInternational conferenceonhighperformancecomputing(HiPC),pages 1–5,2009. [19] Srikanth Srungarapu, Durga Prasad Reddy, Kishore Kothapalli, and PJ Narayanan. Fast two dimensional convexhullonthegpu. InAdvancedInformationNetworkingandApplications(WAINA),pages7–12,2011. [20] AyalStein,EranGeva,andJihadEl-Sana. Cudahull: Fastparallel3dconvexhullonthegpu. Computers& Graphics,36(4):265–271,2012. [21] Min Tang, Jie-yi Zhao, Ruo-feng Tong, and Dinesh Manocha. Gpu accelerated convex hull computation. Computers&Graphics,36(5):498–506,2012. [22] Stanley Tzeng and John D Owens. Finding convex hulls using quickhull on the gpu. arXiv preprint arXiv:1201.2936,2012. [23] Jeffrey M Whiteand Kevin A Wortman. Divide-and-conquer 3d convex hulls on the gpu. arXivpreprint arXiv:1205.1171,2012. [24] Changyuan Xing, Zhongyang Xiong, Yufang Zhang, Xuegang Wu, Jingpei Dan, andTingping Zhang. An efficientconvexhullalgorithmusingaffinetransformationinplanarpointset. ArabianJournalforScience andEngineering,pages1–9,2014. 10

A Novel Implementation of QuickHull Algorithm on the GPU PDF

0.37 MB·English

by Jiayin Zhang

#journals #arxiv

Checking for file health...

Save to my drive

Quick download

Download

Download A Novel Implementation of QuickHull Algorithm on the GPU PDF Free - Full Version

by Jiayin Zhang| 0.37| English

Download A Novel Implementation of QuickHull Algorithm on the GPU by Jiayin Zhang in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About A Novel Implementation of QuickHull Algorithm on the GPU

No description available for this book.

Detailed Information

Author:	Jiayin Zhang
Language:	English
File Size:	0.37
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free A Novel Implementation of QuickHull Algorithm on the GPU Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download A Novel Implementation of QuickHull Algorithm on the GPU PDF?

Yes, on https://PDFdrive.to you can download A Novel Implementation of QuickHull Algorithm on the GPU by Jiayin Zhang completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read A Novel Implementation of QuickHull Algorithm on the GPU on my mobile device?

After downloading A Novel Implementation of QuickHull Algorithm on the GPU PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of A Novel Implementation of QuickHull Algorithm on the GPU?

Yes, this is the complete PDF version of A Novel Implementation of QuickHull Algorithm on the GPU by Jiayin Zhang. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download A Novel Implementation of QuickHull Algorithm on the GPU PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.