
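COLLABORATIVE OLAP WITH TAG CLOUDS
Web 2.0 OLAP Formalism and Experimental Evaluation

Kamel Aouiche, Daniel Lemire and Robert Godin
Université du Québec à Montréal, 100 Sherbrooke West, Montreal, Canada
[email protected], [email protected], [email protected]

arXiv:0710.2156v2 [cs.DB] 14 Jan 2008

Keywords: OLAP, Data Warehouse, Business Intelligence, Tag Cloud, Social Web

Abstract: Increasingly, business projects are ephemeral. New Business Intelligence tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds are a popular community-driven visualization technique. Hence, we investigate tag-cloud views with support for OLAP operations such as roll-ups, slices, dices, clustering, and drill-downs. As a case study, we implemented an application where users can upload data and immediately navigate through its ad hoc dimensions. To support social networking, views can be easily shared and embedded in other Web sites. Algorithmically, our tag-cloud views are approximate range top-k queries over spontaneous data cubes. We present experimental evidence that iceberg cuboids provide adequate online approximations. We benchmark several browser-oblivious tag-cloud layout optimizations.

1 INTRODUCTION

The Web 2.0, or Social Web, is about making available social software applications on the Web in an unrestricted manner. Enabling a wide range of distributed individuals to collaborate on data analysis tasks may lead to significant productivity gains (Heer et al., 2007; Wattenberg and Kriss, 2006). Several companies, like SocialText and IBM, are offering Web 2.0 solutions dedicated to enterprise needs. The data visualization Web sites Many Eyes (IBM, 2007) and Swivel (Swivel, Inc, 2007) have become part of the Web 2.0 landscape: over 1 million data sets were uploaded to Swivel in less than 3 months (Butler, 2007).

These Web 2.0 data visualization sites use traditional pie charts and histograms, but also tag clouds. Tag clouds are a form of histogram which can represent the amplitude of over a hundred items by varying the font size. The use of hyperlinks makes tag clouds naturally interactive. Tag clouds are used by many Web 2.0 sites such as Flickr, del.icio.us and Technorati. Increasingly, e-Commerce sites such as Amazon or O'Reilly Media are using tag clouds to help their users navigate through aggregated data.

Meanwhile, OLAP (On-Line Analytical Processing) (Codd, 1993) is a dominant paradigm in Business Intelligence (BI). OLAP allows domain experts to navigate through aggregated data in a multidimensional data model. Standard operations include drill-down, roll-up, dice, and slice. The data cube (Gray et al., 1996) model provides well-defined semantics and performance optimization strategies. However, OLAP requires much effort from database administrators even after the data has been cleaned, tuned and loaded: schemas must be designed in collaboration with users having fast changing needs and requirements (Body et al., 2002; Morzy and Wrembel, 2004). Vendors such as Spotfire, Business Objects and QlikTech have reacted by proposing a new class of tools allowing end users to customize their applications and to limit the need for centralized schema crafting (Havenstein, 2003).

OLAP itself has never been formally defined though rules have been proposed to recognize an OLAP application (Codd, 1993). In a similar manner, we propose rules to recognize Web 2.0 OLAP applications (see also Table 1):

1. Data and schemas are provided autonomously by users.
2. It is available as a Web application.
3. It supports complete online interaction over aggregated multidimensional data.
4. Users are encouraged to collaborate.

Tag clouds are well suited for Web 2.0 OLAP. They are flexible: a tag cloud can represent a dozen or a hundred different amplitudes. And they are accessible: the only requirement is a browser that can display different font sizes.

We describe a tag-cloud formalism, as an instance of Web 2.0 OLAP. Since we implemented a prototype, technical issues will be discussed regarding application design. In particular, we used iceberg cubes (Carey and Kossmann, 1997) to generate tag clouds online when the data and schema are provided extemporaneously. Because tag clouds are meant to convey a general impression, presenting approximate measures and clustering is sufficient: we propose specific metrics to measure the quality of tag-cloud approximations. We conclude the paper with experimental results on real and synthetic data sets.

Table 1: Conventional OLAP versus Web 2.0 OLAP

Conventional OLAP      Web 2.0 OLAP
recurring needs        ephemeral projects
predefined schemas     spontaneous schemas
centralized design     user initiative
histograms             tag clouds
plots and reports      iframes, wikis, blogs
access control         social networking

2 RELATED WORK

There are decentralized models (Taylor and Ives, 2006) and systems (Green et al., 2007) to support collaborative data sharing without a single schema.

According to Wu et al., it is difficult to navigate an OLAP schema without help; they have proposed a keyword-driven OLAP model (Wu et al., 2007). There are several OLAP visualization techniques including the Cube Presentation Model (CPM) (Maniatis et al., 2005), Multiple Correspondence Analysis (MCA) (Ben Messaoud et al., 2006) and other interactive systems (Techapichetvanich and Datta, 2005).

Tag clouds have been popularized by the Web site Flickr launched in 2004. Several optimization opportunities exist: similar tags can be clustered together (Kaser and Lemire, 2007), tags can be pruned automatically (Hassan-Montero and Herrero-Solana, 2006) or by user intervention (Millen et al., 2006), tags can be indexed (Millen et al., 2006), and so on. Tag clouds can be adapted to spatio-temporal data (Russell, 2006; Jaffe et al., 2006).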
3 OLAP FORMALISM

3.1 Conventional OLAP Formalism

Most OLAP engines rely on a data cube (Gray et al., 1996). A data cube C contains a non empty set of d dimensions $D = \{D_i\}_{1 \le i \le d}$ and a non empty set of measures M. Data cubes are usually derived from a fact table (see Table 2) where each dimension and measure is a column and all rows (or facts) have disjoint dimension tuples. Figure 1(a) gives a tridimensional representation of the data cube.

Table 2: Fact table example

Dimensions                                  Measures
location   time      salesman  product     cost   profit
Montreal   March     John      shoe        100$   10$
Montreal   December  Smith     shoe        150$   30$
Quebec     December  Smith     dress       175$   45$
Ontario    April     Kate      dress       90$    10$
Paris      March     John      shoe        100$   20$
Paris      March     Marc      table       120$   10$
Paris      June      Martin    shoe        120$   5$
Lyon       April     Claude    dress       90$    10$
New York   October   Joe       chair       100$   10$
New York   May       Joe       chair       90$    10$
Detroit    April     Jim       dress       90$    10$

Measures can be aggregated using several operators such as AVERAGE, MAX, MIN, SUM, and COUNT. All of these measures and dimensions are typically prespecified in a database schema. Database administrators preaggregate views to accelerate queries.

The data cube supports the following operations:

• A slice specifies that you are only interested in some attribute values of a given dimension. For example, one may want to focus on one specific product (see Figure 1(g)). Similarly, a dice selects ranges of attribute values (see Figure 1(e)).

• A roll-up aggregates the measures on coarser attribute values. For example, from the sales given for every store, a user may want to see the sales aggregated per country (see Figure 1(c)). A drill-down is the reverse operation: from the sales per country, one may want to explore the sales per store in one country.

The various specific multidimensional views in Figure 1 are called cuboids.
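To make the slice, dice and roll-up operations concrete, here is a minimal sketch over the facts of Table 2. It is an illustration only, not the authors' implementation: the function names (`slice_`, `dice`, `roll_up`) and the `COUNTRY` mapping are hypothetical, and the location-to-country hierarchy is read off Figure 1 rather than taken from any schema.

```python
from collections import defaultdict

COLUMNS = ("location", "time", "salesman", "product", "cost", "profit")

# Facts copied from Table 2 (cost and profit in dollars).
FACTS = [
    ("Montreal", "March", "John", "shoe", 100, 10),
    ("Montreal", "December", "Smith", "shoe", 150, 30),
    ("Quebec", "December", "Smith", "dress", 175, 45),
    ("Ontario", "April", "Kate", "dress", 90, 10),
    ("Paris", "March", "John", "shoe", 100, 20),
    ("Paris", "March", "Marc", "table", 120, 10),
    ("Paris", "June", "Martin", "shoe", 120, 5),
    ("Lyon", "April", "Claude", "dress", 90, 10),
    ("New York", "October", "Joe", "chair", 100, 10),
    ("New York", "May", "Joe", "chair", 90, 10),
    ("Detroit", "April", "Jim", "dress", 90, 10),
]

# Assumed hierarchy (location -> country), as suggested by Figure 1.
COUNTRY = {
    "Montreal": "Canada", "Quebec": "Canada", "Ontario": "Canada",
    "Paris": "France", "Lyon": "France",
    "New York": "US", "Detroit": "US",
}


def slice_(facts, dimension, value):
    """Keep only the facts whose `dimension` equals `value`."""
    i = COLUMNS.index(dimension)
    return [f for f in facts if f[i] == value]


def dice(facts, dimension, values):
    """Keep only the facts whose `dimension` falls in the set `values`."""
    i = COLUMNS.index(dimension)
    return [f for f in facts if f[i] in values]


def roll_up(facts, key, measure="profit"):
    """SUM a measure over a coarser grouping given by the function `key`."""
    m = COLUMNS.index(measure)
    totals = defaultdict(int)
    for f in facts:
        totals[key(f)] += f[m]
    return dict(totals)


# Roll-up from locations to countries, then a slice on one product.
profit_per_country = roll_up(FACTS, lambda f: COUNTRY[f[0]])
shoe_facts = slice_(FACTS, "product", "shoe")
first_semester = dice(FACTS, "time", {"March", "April", "May", "June"})
print(profit_per_country, len(shoe_facts), len(first_semester))
```

A drill-down would simply call `roll_up` again with a finer key, for example `lambda f: f[0]` restricted to the facts of one country.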
3.2 Tag-Cloud OLAP Formalism

A Web 2.0 OLAP application should be supported by a flexible formalism that can adapt to a wide range of data loaded by users. Processing time must be reasonable and batch processing should be avoided.

[Figure 1: Conventional OLAP operations vs. tag-cloud OLAP operations. Panels: (a) OLAP data cube; (b) Tag-cloud data cube; (c) OLAP roll-up (roll-up on product); (d) Tag-cloud roll-up; (e) OLAP dice (dice on the first year semester); (f) Tag-cloud dice; (g) OLAP slice (slice where product = 'shoe'); (h) Tag-cloud slice.]

Unlike in conventional data cubes, we do not expect that most dimensions have explicit hierarchies when they are loaded: instead, users can specify how the data is laid out (see Section 5). As a related issue, the dimensions are not orthogonal in general: there might be a "City" dimension as well as a "Climate Zone" dimension. It is up to the user to organize the cities per climate zone or per country.

Definition 1 (Tag) A tag is a term or phrase describing an object with corresponding non-negative weights determining its relative importance. Hence, a tag is made of a triplet (term, object, weight).

As an example, a picture may have been attributed the tags "dog" (12 times) and "cat" (20 times). In a Business Intelligence context, a tag may describe the current state of a business. For example, the tags "USA" (16,000$) and "Canada" (8,000$) describe the sales of a given product by a given salesman.

We can aggregate several attribute values, such as "Canada" and "March," into a single term, such as "Canada–March." A tag composed of k attribute values is called a k-tag. Figure 1(b) shows a tag cloud representation of Table 2 using 3-tags.

Each tag T is represented visually using a font size, font color, background color, area or motif, depending on its measure values.
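As an illustration of Definition 1 and of k-tags, the following sketch aggregates a few facts from Table 2 into (term, object, weight) triplets. The `Tag` type, the `k_tags` helper, the hyphen used as separator and the object label "sales" are all assumptions made for this example; the paper does not prescribe a data structure.

```python
from collections import defaultdict
from typing import NamedTuple


class Tag(NamedTuple):
    term: str      # one or several attribute values joined together
    obj: str       # what the tag describes (hypothetical label)
    weight: float  # non-negative aggregated measure


def k_tags(rows, dimensions, measure, obj="sales"):
    """Build k-tags (k = len(dimensions)) by summing `measure` per term."""
    totals = defaultdict(float)
    for row in rows:
        term = "-".join(row[d] for d in dimensions)
        totals[term] += row[measure]
    for term, weight in totals.items():
        if weight < 0:
            raise ValueError(f"negative weight for tag {term!r}")
    # Heaviest tags first, which is convenient when truncating a cloud.
    return sorted((Tag(t, obj, w) for t, w in totals.items()),
                  key=lambda tag: tag.weight, reverse=True)


# A few rows of Table 2, restricted to the columns used here.
ROWS = [
    {"location": "Montreal", "time": "March", "product": "shoe", "profit": 10},
    {"location": "Montreal", "time": "December", "product": "shoe", "profit": 30},
    {"location": "Paris", "time": "March", "product": "shoe", "profit": 20},
    {"location": "Paris", "time": "March", "product": "table", "profit": 10},
    {"location": "Paris", "time": "June", "product": "shoe", "profit": 5},
]

two_tags = k_tags(ROWS, ("location", "product"), "profit")
for tag in two_tags:
    print(tag.term, tag.weight)
```

Passing a single dimension yields 1-tags, and passing three dimensions yields 3-tags such as those of Figure 1(b).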
Choos- top-k query: given any user-specified range of cells, ing the “Country” dimension would mean that the we seek the top-k cells having the largest measures. user wants the tags rearranged by countries so that There is a little hope of answering such queries in “Montreal–April” and “Toronto–March” are nearby nearconstant-timewithrespecttothenumberoffacts (seeFigure3). Theclusteringdimensionsselectedby without an index or a buffer. Indeed, finding all theusertogetherwiththetag-clouddimensionsform and only the elements with frequency exceeding a a cuboid: in our example, we have the dimensions givenfrequencythreshold(CormodeandMuthukrish- “Country,” “City,” and “Time.” Since a tag contains nan, 2005) or merely finding the most frequent ele- a set of attribute values, it has a corresponding sub- ment(Alonetal., 1996)requiresΩ(m)bitswherem cuboiddefinedbyslicingthecuboid. isthenumberofdistinctitems. Several similarity measures can be applied be- Various efficient techniques have been proposed tween subcuboids: Jaccard, Euclidean distance, co- for the related range MAX problem (Chazelle, 1988; sine similarity, Tanimoto similarity, Pearson correla- Poon, 2003), but they do not necessarily generalize. tion,Hammingdistance,andsoon. Whichsimilarity Instead, for the range top-k problem, we can parti- measure is best depends on the application at hand, tionsparsedatacubesintocustomizeddatastructures so advanced users should be given a choice. Com- to speed up queries by an order of magnitude (Luo monly, similarity measures take up values in the in- et al., 2001; Loh et al., 2002a; Loh et al., 2002b). terval[−1,1]. Similaritymeasuresareexpectedtobe We can also answer range top-k queries using RD- reflexive(f(a,a)=1),symmetric(f(a,b)= f(b,a)) trees (Chung et al., 2007) or R-trees (Seokjin et al., andtransitive: ifaissimilartob, andbissimilarto 2005). Intagclouds,precisionisnotrequiredandac- c,thenaisalsosimilartoc. 
curacyislessimportant;onlythemostsignificanttags Recall that given two vectors v and w, the co- aretypicallyneeded. Further, ifalltagshavesimilar sine similarity measure is defined as cos(v,w) = weights, then any subset of tag may form an accept- (cid:113) abletagcloud. ∑iviwi/ ∑iv2i ∑iw2i = v/|v|·w/|w|. The Tani- A strategy to speed up top-k queries is to moto similarity is given by ∑iviwi/(∑iv2i +∑iw2i − transform them into comparatively easier iceberg ∑iviwi); it becomes the Jaccard similarity when the queries (Carey and Kossmann, 1997). For example, vectors have binary values. Both of these measures in computing the top-10 (k=10) best vendors, one are reflexive, symmetric and transitive. Specifically, couldstartbyfindingallvendorswitharatingabove the cosine similarity is transitive by this inequality: 4/5. If there are at least 10 such vendors, then sort- (cid:112) cos(v,z)≥cos(w,z)− 1−cos(v,w)2. To general- ingthissmallerlistisenough. Ifnot, onecanrestart izetheformulasfromvectorstocuboids,itsufficesto the query, seeking vendors with a rating above 3/5. replace the single summation by one summation per Givenahistogramorselectivityestimates,wecanre- dimension. Figure 4 shows an example of tag-cloud duce the number of expected iceberg queries (Don- reorderingtoclustersimilartags. Inthisexample,the jerkovic and Ramakrishnan, 1999). Unfortunately, “City–Product”tagswerecomparedaccordingtothe thisapproachisnotnecessarilyapplicabletomultidi- “Country” dimension. The result is that the tags are mensional data since even computing iceberg aggre- Giventag-clouddata,thetag-clouddrawingprob- lem is to optimally display the tags, generally using HTML,sothatsomedesirablepropertiesaremet,in- cluding the following: (1) the screen space usage is Figure5:Exampleofnoninformativetagcloud minimized;(2)whenapplicable,similartagsareclus- teredtogether. Typically,thewidthofthetagcloudis fixed,butitsheightcanvary. 
4 FAST COMPUTATION

Because only a moderate number of tags can be displayed, the computation of tag clouds is a form of top-k query: given any user-specified range of cells, we seek the top-k cells having the largest measures. There is little hope of answering such queries in near constant-time with respect to the number of facts without an index or a buffer. Indeed, finding all and only the elements with frequency exceeding a given frequency threshold (Cormode and Muthukrishnan, 2005) or merely finding the most frequent element (Alon et al., 1996) requires Ω(m) bits where m is the number of distinct items.

Various efficient techniques have been proposed for the related range MAX problem (Chazelle, 1988; Poon, 2003), but they do not necessarily generalize. Instead, for the range top-k problem, we can partition sparse data cubes into customized data structures to speed up queries by an order of magnitude (Luo et al., 2001; Loh et al., 2002a; Loh et al., 2002b). We can also answer range top-k queries using RD-trees (Chung et al., 2007) or R-trees (Seokjin et al., 2005). In tag clouds, precision is not required and accuracy is less important; only the most significant tags are typically needed. Further, if all tags have similar weights, then any subset of tags may form an acceptable tag cloud.

A strategy to speed up top-k queries is to transform them into comparatively easier iceberg queries (Carey and Kossmann, 1997). For example, in computing the top-10 (k=10) best vendors, one could start by finding all vendors with a rating above 4/5. If there are at least 10 such vendors, then sorting this smaller list is enough. If not, one can restart the query, seeking vendors with a rating above 3/5. Given a histogram or selectivity estimates, we can reduce the number of expected iceberg queries (Donjerkovic and Ramakrishnan, 1999). Unfortunately, this approach is not necessarily applicable to multidimensional data since even computing iceberg aggregates once for each query may be prohibitive. However, iceberg cuboids can still be put to good use. That is, one materializes the iceberg of a cuboid, small enough to fit in main memory, from which the tag clouds are computed. Intuitively, a cuboid representing the largest measures is likely to provide reasonable tag clouds.
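The vendor example, where a top-k query is answered by a sequence of progressively looser iceberg queries, can be sketched as follows. This is a toy in-memory version under stated assumptions: `top_k_via_iceberg`, the rating data and the threshold schedule are all hypothetical, and a real system would push the threshold filter into the database rather than scan a dictionary.

```python
def top_k_via_iceberg(weights, k, thresholds):
    """Answer a top-k query through iceberg queries.

    `weights` maps an item to its measure; `thresholds` is a decreasing
    sequence of cut-offs. Each pass keeps only items above the current
    cut-off (the iceberg) and stops as soon as at least k items qualify.
    """
    for threshold in thresholds:
        iceberg = {item: w for item, w in weights.items() if w > threshold}
        if len(iceberg) >= k:
            break
    else:
        # No threshold retained enough items: fall back to the full data.
        iceberg = dict(weights)
    # Only the (hopefully small) iceberg needs to be sorted.
    ranked = sorted(iceberg.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Hypothetical vendor ratings out of 5.
RATINGS = {"a": 4.8, "b": 4.5, "c": 3.9, "d": 3.2, "e": 2.0}

# First try ratings above 4/5, then restart with ratings above 3/5.
print(top_k_via_iceberg(RATINGS, k=3, thresholds=[4, 3]))
```

With k=2 the first iceberg already contains enough vendors, so the second and more expensive query is never issued.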
Users mostly notice tags with large font sizes (Rivadeneira et al., 2007). A good approximation captures the tags having significantly larger weights. To determine whether a tag cloud has such significant tags, we can compute the entropy.

Definition 2 (Entropy of a tag cloud) Let $T \in \mathcal{T}$ be a tag from a tag cloud $\mathcal{T}$, then $\mathrm{entropy}(\mathcal{T}) = -\sum_{T \in \mathcal{T}} p(T) \log(p(T))$ where $p(T) = \mathrm{weight}(T) / \sum_{x \in \mathcal{T}} \mathrm{weight}(x)$.

The entropy quantifies the disparity of weights between tags. The lower the entropy, the more interesting the corresponding tag cloud is. Indeed, tag clouds with uniform tag weights have maximal entropy and are visually not very informative (see Figure 5).

[Figure 5: Example of noninformative tag cloud]

We can measure the quality of a low-entropy tag cloud by measuring false positives and negatives: a false positive happens when a tag has been falsely added to a tag cloud whereas a false negative occurs when a tag is missing. These measures of error assume that we limit the number of tags to a moderately small number. We use the following quality indexes; index values are in [0,1] and a value of 0 is ideal; they are not applicable to high-entropy tag clouds.

Definition 3 Given approximate and exact tag clouds A and E, the false-positive and false-negative indexes are $\max_{t \in A, t \notin E} \mathrm{weight}(t) / \max_{t \in A} \mathrm{weight}(t)$ and $\max_{t \in E, t \notin A} \mathrm{weight}(t) / \max_{t \in E} \mathrm{weight}(t)$.
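Definitions 2 and 3 translate directly into a few lines of code. The sketch below makes three assumptions that the definitions leave open: the natural logarithm is used, a maximum over an empty set of tags counts as 0 (so identical clouds score 0), and tag clouds are represented as non-empty dictionaries from term to weight. The function names and sample clouds are hypothetical.

```python
import math


def entropy(weights):
    """Entropy of a tag cloud given its tag weights (Definition 2)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("a tag cloud needs a positive total weight")
    # Tags of weight 0 contribute nothing (limit of p*log(p) as p -> 0).
    return -sum((w / total) * math.log(w / total) for w in weights if w > 0)


def false_positive_index(approx, exact):
    """Heaviest tag wrongly present in `approx`, relative to its heaviest tag."""
    extra = [w for term, w in approx.items() if term not in exact]
    return max(extra, default=0.0) / max(approx.values())


def false_negative_index(approx, exact):
    """Heaviest tag missing from `approx`, relative to the heaviest exact tag."""
    missing = [w for term, w in exact.items() if term not in approx]
    return max(missing, default=0.0) / max(exact.values())


# Uniform weights give the maximal entropy log(n); skewed weights give less.
print(entropy([1, 1, 1, 1]), math.log(4))
print(entropy([97, 1, 1, 1]))

# Hypothetical exact cloud and an approximation missing "b" but adding "d".
exact = {"a": 100, "b": 50, "c": 10}
approx = {"a": 100, "c": 10, "d": 5}
print(false_positive_index(approx, exact))  # 5 / 100
print(false_negative_index(approx, exact))  # 50 / 100
```

Dividing the entropy by log(n), where n is the tag-cloud size, gives a relative entropy in [0,1] that does not depend on the base of the logarithm.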
5 TAG-CLOUD DRAWING

While we can ensure some level of device-independent displays on the Web, by using images or plugins, text display in HTML may vary substantially from one browser to another. There is no common set of fonts that browsers are required to support, and Web standards do not dictate line-breaking algorithms or other typographical issues. It is not practical to simulate the browser on a server. Meanwhile, if we wish to remain accessible and to abide by open standards, producing HTML and ECMAScript is the favorite option.

Given tag-cloud data, the tag-cloud drawing problem is to optimally display the tags, generally using HTML, so that some desirable properties are met, including the following: (1) the screen space usage is minimized; (2) when applicable, similar tags are clustered together. Typically, the width of the tag cloud is fixed, but its height can vary.

For practical reasons, we do not wish for the server to send all of the data to the browser, including a possibly large number of similarity measures between tags. Hence, some of the tag-cloud drawing computations must be server-bound. There are two possible architectures. The first scenario is a browser-aware approach (Kaser and Lemire, 2007): given the tag-cloud data provided by the server, the browser sends back to the server some display-specific data, such as the box dimensions of various tags using different font sizes. The server then sends back an optimized tag cloud. The second approach is browser-oblivious: the server optimizes the display of the tag cloud without any knowledge of the browser by passing simple display hints. The browser can then execute a final and inexpensive display optimization. While browser-oblivious optimization is necessarily limited, it has reduced latency and it is easily cacheable.

Browser-oblivious optimization can take many forms. For example, we could send classes of tags and instruct the browser to display them on separate lines (Hassan-Montero and Herrero-Solana, 2006). In our system, tags are sent to the browser as an ordered list, using the convention that successive tags are similar and should appear nearby. Given a similarity measure w between tags, we want to minimize $\sum_{p,q} w(p,q) d(p,q)$ where $d(p,q)$ is a distance function between the two tags in the list and the sum is over all tags. Ideally, $d(p,q)$ should be the physical distance between the tags as they appear in the browser; we model this distance with the index distance: if tag a appears at index i in the list and tag b appears at index j, their distance is the integer $|i - j|$. This optimization problem is an instance of the NP-complete MINIMUM LINEAR ARRANGEMENT (MLA) problem: an optimal linear arrangement of a graph $G = (V,E)$ is a map f from V onto $\{1, 2, \ldots, N\}$ minimizing $\sum_{u,v \in V} |f(u) - f(v)|$.

Proposition 1 The browser-oblivious tag-cloud optimization problem is NP-Complete.

There is an $O(\sqrt{\log n} \log\log n)$-approximation for the MLA problem (Feige and Lee, 2007) in some instances. However, for our generic purposes, the greedy NEAREST NEIGHBOR (NN) algorithm might suffice: insert any tag in an empty list, then repeatedly append a tag most similar to the latest tag in the list, until all tags have been inserted. It runs in $O(n^2)$ time where n is the number of tags. Another heuristic for the MLA problem is the PAIRWISE EXCHANGE MONTE CARLO (PWMC) method (Bhasker and Sahni, 1987): after applying NN, you repeatedly consider the exchange of two tags chosen at random, permuting them if it reduces the MLA cost. Another MONTE CARLO (MC) heuristic begins with the application of NN (Johnson et al., 2004): cut the list into two blocks at a random location, test if exchanging the two blocks reduces the MLA cost, if so proceed; repeat.

Additional display hints can be inserted in this list. For example, if two tags must absolutely be very close to each other, a GLUED token could be inserted. Also, if two tags can be permuted freely in the list, then a PERMUTABLE token could be inserted: the list could take the form of a PQ tree (Booth and Lueker, 1976).
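The MLA cost with index distance, the NN heuristic and the PWMC heuristic can be sketched in a few functions. This is an illustrative version only: the function names, the binary "same country" similarity, the city-to-country mapping and the seeded random generator are assumptions for the example, and the cost is summed over unordered pairs, which differs from a sum over ordered pairs only by a constant factor.

```python
import random


def mla_cost(order, sim):
    """Sum of sim(p, q) * |i - j| over all unordered pairs of the list."""
    n = len(order)
    return sum(sim(order[i], order[j]) * (j - i)
               for i in range(n) for j in range(i + 1, n))


def nearest_neighbor(tags, sim):
    """Greedy NN: start anywhere, keep appending the tag most similar to the last."""
    remaining = list(tags)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        best = max(remaining, key=lambda t: sim(order[-1], t))
        remaining.remove(best)
        order.append(best)
    return order


def pairwise_exchange(order, sim, exchanges, rng=None):
    """PWMC: try random swaps of two tags, keeping a swap only if the cost drops."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    order = list(order)
    if len(order) < 2:
        return order
    cost = mla_cost(order, sim)
    for _ in range(exchanges):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = mla_cost(order, sim)
        if new_cost < cost:
            cost = new_cost
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return order


# Tags in the "without similarity" order of Figure 4.
TAGS = ["Quebec-dress", "Detroit-dress", "Paris-table", "Ontario-dress",
        "Paris-shoe", "Montreal-shoe", "Lyon-dress", "New York-chair"]
COUNTRY = {"Quebec": "Canada", "Ontario": "Canada", "Montreal": "Canada",
           "Paris": "France", "Lyon": "France",
           "Detroit": "US", "New York": "US"}


def same_country(p, q):
    """Toy similarity: 1 if both tags' cities share a country, else 0."""
    city = lambda tag: tag.rsplit("-", 1)[0]
    return 1 if COUNTRY[city(p)] == COUNTRY[city(q)] else 0


nn_order = nearest_neighbor(TAGS, same_country)
print(mla_cost(TAGS, same_country), mla_cost(nn_order, same_country))
print(nn_order)
```

Recomputing the full cost after each swap keeps the sketch short; an implementation meant for many exchanges would update the cost incrementally from the two swapped positions.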
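6 EXPERIMENTS

Throughout these experiments, we used the Java version 1.6.0_02 from Sun Microsystems Inc. on an Apple MacPro machine with 2 Dual-Core Intel Xeon processors running at 2.66 GHz and 2 GiB of RAM.

6.1 Iceberg-Based Computation

To validate the generation of tag clouds from icebergs, we have run tests over the US Income 2000 data set (Hettich and Bay, 2000) (42 dimensions and about $2 \times 10^5$ facts) as well as a synthetic data set (18 dimensions and $2 \times 10^4$ facts) provided by Swivel (http://www.swivel.com/data_sets/show/1002247). Figure 6 shows that while some tag-cloud computations require several minutes, iceberg-based computations can be much faster.

[Figure 6: Computing tag clouds from original data vs. icebergs: iceberg limit value set at 150 and tag-cloud size is 9 (US Income 2000). X axis: number of dimensions (3 to 11); Y axis: time (seconds); series: original data, iceberg.]

From each data set, we generated a 4-dimensional data cube. We used the COUNT function to aggregate data. Tag clouds were computed from each data cube using the iceberg approximation with different values of limit: the number of facts retained. We also implemented exact computations using temporary tables. We specified different values for tag-cloud size, limiting the maximum number of tags. For each iceberg limit value and tag-cloud size, we computed the entropy of the tag cloud, the false-positive and false-negative indexes, and the processing time for both the iceberg approximation and the exact computation.

We plotted in Figure 7 the false-positive and false-negative indexes as a function of the relative entropy (entropy/log(tag-cloud size)) using various iceberg limit values (150, 600, 1200, 4800, and 19600) and various tag-cloud sizes (50, 100, 150, and 200), for a total of 20 tag clouds per dimension. The Y axis is in a logarithmic scale. Points having their indexes equal to zero are not displayed. As discussed in Section 4, false-positive and false-negative indexes should be low when the entropy is low. We verify that for low-entropy values ($< \frac{3}{4} \log(\text{tag-cloud size})$), the indexes are always close to zero which indicates a good approximation. Meanwhile, small iceberg cuboids can be processed much faster.

[Figure 7: False-negative and false-positive indexes (0 is best, 1 is worst), values under 0.0001 are not included. Panels: (a) Swivel; (b) US Income 2000. X axis: entropy/log(tag-cloud size); Y axis: false-positive and false-negative indexes; one series per dimension.]

6.2 Similarity Computation

Using our two data sets, we tested the NN, PWMC, and MC heuristics using both the cosine and the Tanimoto similarity measures. From data cubes made of all available dimensions, we used all possible 1-tag clouds, using successively all other dimensions as clustering dimension for a total of $2 \times (18 \times 17 + 42 \times 41) = 4056$ layout optimizations. The iceberg limit value was set at 150. The MC heuristic never fared better than NN, even when considering a very large number of random block permutations: we rejected this heuristic as ineffective. However, as Figure 8 shows, the PWMC heuristic can sometimes significantly outperform NN when a large number (1000) of tag exchanges are considered, but it only outperforms NN by more than 20% in less than 5% of all layout optimizations. Meanwhile, PWMC can be several orders of magnitude slower than NN: NN is 10 times faster than PWMC with 100 exchanges and 70 times faster than PWMC with 1000 exchanges. Computing the similarity function over an iceberg cuboid was moderately expensive (0.07 s) for a small iceberg cuboid (limit set to 150 cells): the exact computation of the similarity function can dwarf the cost of the heuristics (NN and PWMC) over a moderately large data set. Informal tests suggest that NN computed over a small iceberg cuboid provides significant visual layouts.

[Figure 8: MLA costs for two examples: the PWMC heuristic was applied using 10, 100 and 1000 random exchanges. Panels: (a) Displaying dimension "Given name" and clustering by "State" (Swivel); (b) Displaying dimension "HHDFMX" and clustering by "ARACE" (US Income 2000). X axis: COSINE, TANIMOTO; Y axis: MLA cost; series: No Clustering, NN, PWMC 10, PWMC 100, PWMC 1000.]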
7 CONCLUSION

According to our experimental results, precomputing a single iceberg cuboid per data cube allows us to generate adequate approximate tag clouds online. Combined with modern Web technologies such as AJAX and JSON, it provides a responsive application. However, we plan to make more precise the relationship between iceberg cubes, entropy, dimension sizes, and our quality indexes. Yet another approach to compute tag clouds quickly may be to use a bitmap index (O'Neil and Quass, 1997). While we built a Web 2.0 application with support for numerous collaboration features such as permalinks and tag-cloud embeddings with iframe elements, we still need to experiment with live users. Our approach to multidimensional tag clouds has been to rely on k-tags. However, this approach might not be appropriate when a dimension has a linear flow such as time or latitude. A more appropriate approach is to allow the use of a slider (Russell, 2006) tying several tag clouds, each one corresponding to a given attribute value.

ACKNOWLEDGMENTS

The second author is supported by NSERC grant 261437 and FQRNT grant 112381. The third author is supported by NSERC grant OGP0009184 and FQRNT grant PR-119731. The authors wish to thank Owen Kaser from UNB for his contributions.

REFERENCES

Alon, N., Matias, Y., and Szegedy, M. (1996). The space complexity of approximating the frequency moments. In STOC '96, pages 20–29.

Ben Messaoud, R., Boussaid, O., and Loudcher Rabaséda, S. (2006). Efficient multidimensional data representations based on multiple correspondence analysis. In KDD '06, pages 662–667.

Bhasker, J. and Sahni, S. (1987). Optimal linear arrangement of circuit components. J. VLSI Comp. Syst., 2(1):87–109.

Body, M., Miquel, M., Bédard, Y., and Tchounikine, A. (2002). A multidimensional and multiversion structure for OLAP applications. In DOLAP '02, pages 1–6.

Booth, K. S. and Lueker, G. S. (1976). Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-tree algorithms. Journal of Computer and System Sciences, 13:335–379.

Butler, D. (2007). Data sharing: the next generation. Nature, 446(7131):1–10.

Carey, M. J. and Kossmann, D. (1997). On saying "enough already!" in SQL. In SIGMOD '97, pages 219–230.

Chazelle, B. (1988). A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput., 17(3):427–462.

Chen, Q., Dayal, U., and Hsu, M. (2000). OLAP-based data mining for business intelligence applications in telecommunications and e-commerce. In DNIS '00, pages 1–19.

Chung, Y., Yang, W., and Kim, M. (2007). An efficient, robust method for processing of partial top-k/bottom-k queries using the RD-tree in OLAP. Decision Support Systems, 43(2):313–321.

Codd, E. (1993). Providing OLAP (on-line analytical processing) to user-analysis: an IT mandate. Technical report, E. F. Codd and Associates.

Cormode, G. and Muthukrishnan, S. (2005). What's hot and what's not: tracking most frequent items dynamically. ACM Trans. Database Syst., 30(1):249–278.

Donjerkovic, D. and Ramakrishnan, R. (1999). Probabilistic optimization of top n queries. In VLDB '99, pages 411–422.

Feige, U. and Lee, J. R. (2007). An improved approximation ratio for the minimum linear arrangement problem. Inf. Process. Lett., 101(1):26–29.

Gray, J., Bosworth, A., Layman, A., and Pirahesh, H. (1996). Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In ICDE '96, pages 152–159.

Green, T. J., Karvounarakis, G., Taylor, N. E., Biton, O., Ives, Z. G., and Tannen, V. (2007). ORCHESTRA: facilitating collaborative data sharing. In SIGMOD '07, pages 1131–1133, New York, NY, USA. ACM.

Hassan-Montero, Y. and Herrero-Solana, V. (2006). Improving tag-clouds as visual information retrieval interfaces. In InSciT '06.

Havenstein, H. (2003). BI vendors seek to tap end-user power: New class of tools built to reap user knowledge for customizing analytic applications. InfoWorld, 22:20–21.

Heer, J., Viégas, F. B., and Wattenberg, M. (2007). Voyagers and voyeurs: supporting asynchronous collaborative information visualization. In CHI '07, pages 1029–1038.

Hettich, S. and Bay, S. D. (2000). The UCI KDD archive. http://kdd.ics.uci.edu. [Online; accessed 21/12/2007].

IBM (2007). Many Eyes. http://services.alphaworks.ibm.com/manyeyes/. [Online; accessed 7-6-2007].

Jaffe, A., Naaman, M., Tassa, T., and Davis, M. (2006). Generating summaries and visualization for large collections of geo-referenced photographs. In MIR '06, pages 89–98.

Johnson, D., Krishnan, S., Chhugani, J., Kumar, S., and Venkatasubramanian, S. (2004). Compressing large boolean matrices using reordering techniques. In VLDB '04, pages 13–23.

Kaser, O. and Lemire, D. (2007). Tag-cloud drawing: Algorithms for cloud visualization. In WWW 2007 – Tagging and Metadata for Social Information Organization.

Loh, Z., Ling, T., Ang, C., and Lee, S. (2002a). Adaptive method for range top-k queries in OLAP data cubes. In DEXA '02, pages 648–657.

Loh, Z. X., Ling, T. W., Ang, C. H., and Lee, S. Y. (2002b). Analysis of pre-computed partition top method for range top-k queries in OLAP data cubes. In CIKM '02, pages 60–67.

Luo, Z., Ling, T., Ang, C., Lee, S., and Cui, B. (2001). Range top/bottom k queries in OLAP sparse data cubes. In DEXA '01, pages 678–687.

Maniatis, A., Vassiliadis, P., Skiadopoulos, S., Vassiliou, Y., Mavrogonatos, G., and Michalarias, I. (2005). A presentation model & non-traditional visualization for OLAP. International Journal of Data Warehousing and Mining, 1:1–36.

Millen, D. R., Feinberg, J., and Kerr, B. (2006). Dogear: Social bookmarking in the enterprise. In CHI '06, pages 111–120.

Morzy, T. and Wrembel, R. (2004). On querying versions of multiversion data warehouse. In DOLAP '04, pages 92–101.

O'Neil, P. and Quass, D. (1997). Improved query performance with variant indexes. In SIGMOD '97, pages 38–49.

Poon, C. (2003). Dynamic orthogonal range queries in OLAP. Theoretical Computer Science, 296(3):487–510.

Rivadeneira, A. W., Gruen, D. M., Muller, M. J., and Millen, D. R. (2007). Getting our head in the clouds: toward evaluation studies of tagclouds. In CHI '07, pages 995–998.

Russell, T. (2006). cloudalicious: folksonomy over time. In JCDL '06, pages 364–364.

Seokjin, H., Moon, B., and Sukho, L. (2005). Efficient execution of range top-k queries in aggregate r-trees. IEICE – Transactions on Information and Systems, E88-D(11):2544–2554.

Swivel, Inc (2007). Swivel. http://www.swivel.com. [Online; accessed 7-6-2007].

Taylor, N. E. and Ives, Z. G. (2006). Reconciling while tolerating disagreement in collaborative data sharing. In SIGMOD '06, pages 13–24, New York, NY, USA. ACM.

Techapichetvanich, K. and Datta, A. (2005). Interactive visualization for OLAP. In ICCSA '05, pages 206–214.

Wattenberg, M. and Kriss, J. (2006). Designing for social data analysis. IEEE Transactions on Visualization and Computer Graphics, 12(4):549–557.

Wu, P., Sismanis, Y., and Reinwald, B. (2007). Towards keyword-driven analytical processing. In SIGMOD '07, pages 617–628.
