Table Of ContentTPL Dataflow by Example
Dataflow and Reactive Programming in .Net
Matt Carkci
Thisbookisforsaleathttp://leanpub.com/tpldataflowbyexample
Thisversionwaspublishedon2014-05-29
ThisisaLeanpubbook.LeanpubempowersauthorsandpublisherswiththeLeanPublishing
process.LeanPublishingistheactofpublishinganin-progressebookusinglightweighttoolsand
manyiterationstogetreaderfeedback,pivotuntilyouhavetherightbookandbuildtractiononce
youdo.
©2014MattCarkci
Contents
OtherDataflowBooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
CodeExamples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
1 GettingStarted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 WhatisTPLDataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 WhatisDataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 WhereisDataflowUsed? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.4 TPLDataflowvs.Rx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 InstallingTPLDataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 TPLDataflowBasics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 ExecutionBlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1.1 ActionBlock<T> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1.2 TransformBlock<T1,T2> . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1.3 BlockConfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1.4 ExecutionBlockOptions . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 BufferingBlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.2.1 BufferBlock<T> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.2.2 BroadcastBlock<T> . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.2.3 WriteOnceBlock<T> . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.2.4 DataflowBlockOptions . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.3 GroupingBlocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.3.1 BatchBlock<T> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.3.2 JoinBlock<T1,T2> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.1.3.3 BatchedJoinBlock<T1,T2> . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.1.3.4 GroupingDataflowBlockOptions . . . . . . . . . . . . . . . . . . . . . 29
2.1.4 BlockCompletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.2 Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.1 DataflowLinkOptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3 UsingTPLDataflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.1 AGeneratorBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
CONTENTS
3.2 HowMessagesareTransmitted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3 RuntimeModification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 MaintainingStateInsideanExecutionBlock . . . . . . . . . . . . . . . . . . . . . . 46
3.5 ConvertingaStatefulBlocktobeStateless . . . . . . . . . . . . . . . . . . . . . . . 47
4 DataflowProgramDesign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1 BlockDesign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.1 DesignforReuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.2 CreateBlocksfromScratchonlywhenNecessary . . . . . . . . . . . . . . . . 48
4.1.3 DevelopYourOwnBlockInterface . . . . . . . . . . . . . . . . . . . . . . . . 48
4.1.4 CarefulwithRetainingStateinBlocks . . . . . . . . . . . . . . . . . . . . . . 49
4.2 FavorApplicationSpecificBlocksOverPredefinedBlocks . . . . . . . . . . . . . . . 49
4.3 AvoidExcessSynchronizationandBlocks . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 DealingwithLoopsandCycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.5 PreventLargeBuffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.6 DataShouldbeImmutable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.7 UseSingleProducerConstrainedifPossible . . . . . . . . . . . . . . . . . . . . . . . 51
Other Dataflow Books
In depth coverage of Dataflow and Reactive
Programmingincluding:
• Sample code for three styles of reac-
tive programming systems with chap-
tersthatexplaintheiroperations
• Summaries of Flow Based Program-
ming, the Actor Model and Communi-
catingSequentialProcesses(CSP)
• Explanation of all features found in re-
activesystems
Visit http://DataflowBook.com for more in-
formation
DataflowandReactiveProgrammingSystems
Code Examples
Allofthecodeinthisbookcanbe
downloadedfromftp://DataflowBook.com.
Visithttp://DataflowBook.comformoreinformation
andblogpostsaboutdataflowandreactiveprogramming.
[email protected]
1 Getting Started
1.1 What is TPL Dataflow
The TPL Dataflow Library (also called TDF) is built on top of the existing Task Parallel Library
(TPL). It allows you to easily create parallel applications without worrying about low-level details
byimplementingatimetestedprogrammingmodelcalled“Dataflow.”
1.2 What is Dataflow
Dataflowisaboutthemovementofdata.Itstandsincontrastto“control-flow”whichisusedinmost
mainstreamprogramminglanguageslikeC#andVisualBasic.Thecommoncontrol-flowstatements
like “if” and “for” loops do not exist in dataflow. Instead, it is the data that determines how the
programexecutes.
Dataflow consists of “blocks” and “links”. A block is a container for code and a link transfers data
from one block to another. Blocks also have inputs to receive data and outputs to send data. Think
ofablockasafunctionwhoseargumentsaretheinputsandwhosereturnvalueistheoutputofthe
block.
Whenablockhasdataonitsinput,itwillconsumethedataandpassittotheinternalfunction.The
result of the function is then placed on the block’s output. If the output of the block is connected
to the input of another block, the data is sent along the link to the input of the next block and the
processstartsalloveragain.
Thekeypointtounderstandisthatit’sthedataarrivingatablock’sinputthatcausesittoexecute.
Theprogrammerdoesn’tneedtospecificallypolltheinputtoseeifdataisavailable.Theblockreacts
to the existence of data. That is why the term “Reactive Programming” has been used for decades
todescribedataflow.
1.3 Where is Dataflow Used?
Dataflowexcelsinthreeareas:
• Processingstreamsofdata
• Parallelization
• Reactingtochangingdata
GettingStarted 2
Inanydomainwherestreamsofdataareessential,dataflowcanrepresenttheprogrammuchmore
directly than our common, imperative programming techniques. Audio and video processing are
twocommonexamples.
Evenifyouarenotdealingwithdatastreams,dataflowcanstillmakeasynchronousprocesseseasier
tobuildbyhandlingthethreadingissues.Microprocessorspeedsarenotincreasingliketheydidfrom
1980s to 2000s. But we are getting more cores per chip. The problem has always been the difficulty
of handling asynchronous computations combined with mutable state. Dataflow encapsulates the
stateinsideoftheblockswhilethechannelsbetweenblocksaretheonlydependencies.
Dataflow is also starting to take over batch processing of “big data.” Instead of running a nightly
batch,dataflowprogramsallowforreal-timeanalysisofthedata.
Recently a new term has become popular, Reactive Programming. While dataflow has been
described as reactive for long time, currently programmers use the term to mean a better way
to handle event-driven programming without callbacks. Microsoft’s Reactive Extensions lead this
resurgenceofdataflow.
1.4 TPL Dataflow vs. Rx
TheReactiveExtensions(Rx)areformanagingastreamofeventsbyusingstandardLINQoperations
whileTPLDataflowisadeptathandlingstreamsofanytypeofdata.
1.5 Installing TPL Dataflow
TPL Dataflow is distributed separately from the .Net Framework. Currently there are official
versions for .Net versions 4.5 and 3.5. For .Net 4.0 there is an unofficial port from the Mono project
butIhavenottestedit.
The recommended way to download the library is through NuGet although it is also possible to
downloaditdirectlyfromtheMicrosoftwebsite.
2 TPL Dataflow Basics
2.1 Blocks
In dataflow, blocks (or nodes) are entities that may send and receive data and are the basic unit of
composition. The TPL Dataflow Library comes with a handful of predefined blocks, while they’re
verybasic,theyshouldcover99%ofyourneeds.Usingthesepredefinedblocks,youcanbuildyour
ownapplicationspecificblocks.
Think of the predefined blocks as being equivalent to keywords in C#. You build C# applications
by using the keywords. The predefined blocks similarly define the basic operations of dataflow
programsthatyouusetobuildyourdataflowapplication.
Of the predefined blocks offered by the TPL Dataflow library, we can categorize them into three
groups, blocks that process data (execution blocks), blocks that buffer or store data (buffer blocks)
and blocks that group data together into collections (grouping blocks). In the follow sections we’ll
examineeachcategoryofblockanddiscoverhowtheyworkwithsimplecodeexamples.
2.1.1 Execution Blocks
Executionblocksprocessdataverysimilartohowmethodsacceptdataandpossiblyreturnsavalue.
At creation you pass either aFunc or an Action that defines what the execution block will do with
thedata.
Allexecutionblockscontainaninternalbufferthatdefaultstoanunboundedcapacity.
2.1.1.1 ActionBlock<T>
AnActionBlock<T>hasasingleinputandnooutput.Itisusedwhenyouneedtodosomethingwith
the input data but won’t need to pass it along to other blocks. It is the equivalent to theAction<T>
class.Indataflow,thistypeofblockisoftencalleda“sink”becausethedatasinksintoitlikeablack
hole,nevertoemergeagain.
Atcreation,anActionBlock<T>acceptsanAction<T>thatiscalledwhendataarrivesattheinput.
AninternalbufferispresentontheinputofanActionBlock<T>.Thebufferdefaultstoanunbounded
capacitybutthiscanbechangedbyusingDataflowBlockOptionsmentionedlater.
ActionBlock<T>Example1
TPLDataflowBasics 4
Basicusage,Blockthreading,Post()
1 using System;
2 using System.Threading.Tasks.Dataflow;
3
4 namespace TPLDataflowByExample
5 {
6 static class ActionBlockExample1
7 {
8 static public void Run() {
9
10 var actionBlock = new ActionBlock<int>(n => Console.WriteLine(n));
11
12 for (int i = 0; i < 10; i++) {
13 actionBlock.Post(i);
14 }
15
16 Console.WriteLine("Done");
17 }
18 }
19 }
This example shows the basic usage of an ActionBlock<T> and how to send data to all types of
blocksthatacceptinputs.
ThePost()functionsendsdatasynchronouslytoblocksandreturnstrueifthedatawassuccessfully
accepted. If the block refuses the data, the function returns false and it will not attempt to resend
it.
Inthisexamplethenumbers0through9arepushedtoactionBlock.Theblocktakeseachvalueand
calls the Action<T> that was given at creation. Since, in this example, our action simply prints the
receiveddata,theoutputtotheconsolelookslike:
Done
0
1
2
3
4
5
6
..