Master of Science Thesis in Electrical Engineering & Biomedical Engineering
Department of Biomedical Engineering, Linköping University, 2018
Generative Adversarial Networks for Image-to-Image Translation on Street View and MR Images
Simon Karlsson & Per Welander
LIU-IMT-TFK-A—18/554—SE
Supervisor: Martin Danelljan
            ISY, Linköping University
            Per Cronvall
            Veoneer
            Gustav Jagbrant
            Veoneer
Examiner:   Anders Eklund
            IMT, Linköping University
Department of Biomedical Engineering
Linköping University
SE-581 83 Linköping, Sweden
Copyright © 2018 Simon Karlsson & Per Welander
Abstract

Generative Adversarial Networks (GANs) are a deep learning method developed for synthesizing data. One application is image-to-image translation, which can prove valuable when training deep neural networks for image classification tasks. Two areas where deep learning methods are used are automotive vision systems and medical imaging. Automotive vision systems are expected to handle a broad range of scenarios, which demands training data with a high diversity. The scenarios in the medical field are fewer, but there the problem is instead that collecting training data is difficult, time-consuming and expensive.

This thesis evaluates different GAN models by comparing synthetic MR images produced by the models against ground truth images. A perceptual study is also performed by an expert in the field. The study shows that the implemented GAN models can synthesize visually realistic MR images. It also shows that models producing more visually realistic synthetic images do not necessarily achieve better results in quantitative error measurements when compared to ground truth data. Along with the investigations on medical images, the thesis explores the possibilities of generating synthetic street view images of different resolutions and light and weather conditions. Different GAN models have been compared, implemented with our own adjustments, and evaluated. The results show that it is possible to create visually realistic images for different translations and image resolutions.
Acknowledgments

We would like to express our thanks to the Classification team at Veoneer for a great environment to work in, for sharing their experience and for inviting us to be a part of their Monday fika. Special thanks go to Michael Sörsäter, Johan Byttner and Juozas Vaicenavicius for interesting discussions and input regarding the work. Our greatest thanks go to our supervisors Per Cronvall and Gustav Jagbrant for their interest and continuous support throughout the work. They accurately answered our questions and took time to explore this exciting field with us. Making this thesis a combination of the medical and automotive fields would not have been possible without Erika Andersson, and we would like to give her special thanks for her collaboration and for always meeting us with a smile at the office.

We also want to thank our academic supervisor Martin Danelljan for the valuable discussions we had throughout the work and for all the feedback on the thesis. Finally, we would like to thank our examiner Anders Eklund for participating as an expert in the perceptual study and for making this project possible.

Linköping, June 2018
Simon Karlsson & Per Welander
Contents

Notation

1 Introduction
  1.1 Background
  1.2 Purpose
  1.3 Motivation
  1.4 Delimitations
  1.5 Contributions
  1.6 Thesis Outline

2 Theory and related work
  2.1 Convolutional neural networks
  2.2 Loss functions
  2.3 Optimizers
  2.4 Residual Blocks
  2.5 Variational autoencoders
  2.6 Generative adversarial networks
  2.7 GAN based Image-to-Image Translation using unpaired training data
  2.8 Data augmentation using GANs
  2.9 GANs in medical images
  2.10 High resolution street view images by GANs

3 Method
  3.1 Comparison of unsupervised GANs
  3.2 Image-to-Image translation using CycleGAN
    3.2.1 Training
    3.2.2 Implementation
  3.3 Image-to-Image translation using UNIT
    3.3.1 Model Framework
    3.3.2 Variational autoencoder
    3.3.3 Weight-sharing layers
    3.3.4 GAN components
    3.3.5 Cycle-consistency
    3.3.6 Learning
    3.3.7 Implementation

4 Evaluation Procedure
  4.1 Data
    4.1.1 MRI data
    4.1.2 Street view data
  4.2 Synthetic MR data evaluation
    4.2.1 Quantitative evaluation
    4.2.2 Qualitative evaluation
  4.3 Synthetic street view data evaluation
    4.3.1 Training with identity mapping
    4.3.2 Higher resolution images
    4.3.3 Larger dataset
    4.3.4 Other translations

5 Results
  5.1 Synthetic MR data results
    5.1.1 Quantitative results
    5.1.2 Qualitative results
  5.2 Synthetic street view data results
    5.2.1 Results after identity training
    5.2.2 Higher resolution images
    5.2.3 Results on larger dataset
    5.2.4 Other translations

6 Discussion
  6.1 Results on MR images
    6.1.1 Quantitative results
    6.1.2 Qualitative results
  6.2 Results on street view images
    6.2.1 Identity training
    6.2.2 Higher resolution images
    6.2.3 Effects of a larger dataset
    6.2.4 Other translations
  6.3 Method

7 Conclusion
  7.1 Answers to the research questions
  7.2 Implications
  7.3 Future work

Bibliography
Notation

Abbreviations

Abbreviation   Definition
GAN            Generative Adversarial Network
MRI            Magnetic resonance imaging
VAE            Variational Autoencoder
ReLU           Rectified linear unit
MAE            Mean absolute error
MSE            Mean squared error
PSNR           Peak signal-to-noise ratio
MI             Mutual information

Model parameters

Notation   Definition
A, B       Different image domains.
a          Real image from domain A.
b          Real image from domain B.
ˆ          Indicates a synthetic image. The letter specifies the image domain, e.g. â is a synthetic image in domain A.
Gen or G   Generator network.
Dis or D   Discriminator network.
Enc or E   Encoder network.
E_Z        Encoder network with shared weights between domains.
DE_Z       Decoder network with shared weights between domains.
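As a toy sketch of how this notation fits together (placeholder functions standing in for trained networks, not the models implemented in the thesis): a generator G maps an image between domains, a hat denotes the resulting synthetic image, and translating back yields the cycle reconstruction whose mean absolute error (MAE) is one of the quantitative measures listed above.

```python
import numpy as np

def G_AB(a):
    # Toy generator translating domain A -> B (a real model would be a CNN).
    return a + 1.0

def G_BA(b):
    # Toy generator translating domain B -> A.
    return b - 1.0

a = np.zeros((4, 4))                    # a: real image from domain A
b_hat = G_AB(a)                         # b̂: synthetic image in domain B
a_rec = G_BA(b_hat)                     # cycle reconstruction, should approximate a
cycle_error = np.abs(a - a_rec).mean()  # MAE between real and reconstructed image
print(cycle_error)                      # prints 0.0 for these toy mappings
```

With trained generators the reconstruction is only approximate, which is why cycle-consistency enters the training loss rather than holding exactly.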