Linköping Studies in Science and Technology
Thesis No. 1361
Completing the Picture — Fragments and
Back Again
by
Martin Karresand
Submitted to Linköping Institute of Technology at Linköping University in partial
fulfilment of the requirements for the degree of Licentiate of Engineering
Department of Computer and Information Science
Linköpings universitet
SE-581 83 Linköping, Sweden
Linköping 2008
May 2008
ISBN 978-91-7393-915-7
ISSN 0280–7971
LiU–Tek–Lic–2008:19
ABSTRACT
Better methods and tools are needed in the fight against child pornography. This thesis presents a
method for file type categorisation of unknown data fragments, a method for reassembly of JPEG
fragments, and the requirements put on an artificial JPEG header for viewing reassembled images.
To enable empirical evaluation of the methods, a number of tools based on the methods have been
implemented.
The file type categorisation method identifies JPEG fragments with a detection rate of 100% and a
false positive rate of 0.1%. The method uses three algorithms: Byte Frequency Distribution (BFD),
Rate of Change (RoC), and 2-grams. The algorithms are designed for different situations, depending
on the requirements at hand.
The reconnection method correctly reconnects 97% of a Restart (RST) marker enabled JPEG image
fragmented into 4 KiB pieces. When dealing with fragments from several images at once, the
method is able to correctly connect 70% of the fragments at the first iteration.
Two parameters in a JPEG header are crucial to the quality of the image: the size of the image and
the sampling factor (actually factors) of the image. The size can be found using brute force, and the
sampling factors only take on three different values. Hence it is possible to use an artificial JPEG
header to view all or parts of an image. The only requirement is that the fragments contain RST
markers.
The results of the evaluations of the methods show that it is possible to find, reassemble, and view
JPEG image fragments with high certainty.
This work has been supported by the Swedish Defence Research Agency and the Swedish Armed Forces.
Acknowledgements
This licentiate thesis would not have been written without the invaluable support
of my supervisor Professor Nahid Shahmehri. I would like to thank her for
keeping me and my research on track and having faith in me when the going has
been tough. She is a good role model and always gives me support, encouragement,
and inspiration to bring my research forward.
Many thanks go to Helena A, Jocke, Jonas, uncle Lars, Limpan, Micke F,
Micke W, Mirko, and Mårten. Without hesitation you let me into your homes
through the lenses of your cameras. If a picture is worth a thousand words, I owe
you more than nine million! I also owe a lot of words to Brittany Shahmehri.
Her prompt and thorough proof-reading has indeed increased the readability of
my thesis.
I would also like to thank my colleagues at the Swedish Defence Research
Agency (FOI), my friends at the National Laboratory of Forensic Science (SKL)
and the National Criminal Investigation Department (RKP), and my fellow PhD
students at the Laboratory for Intelligent Information Systems (IISLAB) and the
Division for Database and Information Techniques (ADIT). You inspired me to
embark on this journey. Thank you all, you know who you are!
And last but not least, I would like to thank my beloved wife Helena and our
lovely newborn daughter. You bring happiness and joy to my life.
Finally, I acknowledge the financial support by FOI and the Swedish Armed
Forces.
Martin Karresand
Linköping, 14th April 2008
Contents
1 Introduction
1.1 Motivation
1.2 Problem Formulation
1.3 Contributions
1.4 Scope
1.5 Outline of Method
1.6 Outline of Thesis
2 Identifying Fragment Types
2.1 Common Algorithmic Features
2.1.1 Centroid
2.1.2 Length of data atoms
2.1.3 Measuring Distance
2.2 Byte Frequency Distribution
2.3 Rate of Change
2.4 2-Grams
2.5 Evaluation
2.5.1 Microsoft Windows PE files
2.5.2 Encrypted files
2.5.3 JPEG files
2.5.4 MP3 files
2.5.5 Zip files
2.5.6 Algorithms
2.6 Results
2.6.1 Microsoft Windows PE files
2.6.2 Encrypted files
2.6.3 JPEG files
2.6.4 MP3 files
2.6.5 Zip files
2.6.6 Algorithms
3 Putting Fragments Together
3.1 Background
3.2 Requirements
3.3 Parameters Used
3.3.1 Background
3.3.2 Correct decoding
3.3.3 Non-zero frequency values
3.3.4 Luminance DC value chains
3.4 Evaluation
3.4.1 Single image reconnection
3.4.2 Multiple image reconnection
3.5 Result
3.5.1 Single image reconnection
3.5.2 Multiple image reconnection
4 Viewing Damaged JPEG Images
4.1 Start of Frame
4.2 Define Quantization Table
4.3 Define Huffman Table
4.4 Define Restart Interval
4.5 Start of Scan
4.6 Combined Errors
4.7 Using an Artificial JPEG Header
4.8 Viewing Fragments
5 Discussion
5.1 File Type Categorisation
5.2 Fragment Reconnection
5.3 Viewing Fragments
5.4 Conclusion
6 Related Work
7 Future Work
7.1 The File Type Categorisation Method
7.2 The Image Fragment Reconnection Method
7.3 Artificial JPEG Header
Bibliography
A Acronyms
B Hard Disk Allocation Strategies
C Confusion Matrices
List of Figures
2.1 Byte frequency distribution of .exe
2.2 Byte frequency distribution of GPG
2.3 Byte frequency distribution of JPEG with RST
2.4 Byte frequency distribution of JPEG without RST
2.5 Byte frequency distribution of MP3
2.6 Byte frequency distribution of Zip
2.7 Rate of Change frequency distribution for .exe
2.8 Rate of Change frequency distribution for GPG
2.9 Rate of Change frequency distribution for JPEG with RST
2.10 Rate of Change frequency distribution for MP3
2.11 Rate of Change frequency distribution for Zip
2.12 2-gram frequency distribution for .exe
2.13 Byte frequency distribution of GPG with CAST5
2.14 ROC curves for Windows PE files
2.15 ROC curves for an AES encrypted file
2.16 ROC curves for JPEG files without RST
2.17 ROC curves for JPEG without RST; 2-gram algorithm
2.18 ROC curves for JPEG files with RST
2.19 ROC curves for MP3 files
2.20 ROC curves for MP3 files; 0.5% false positives
2.21 ROC curves for Zip files
2.22 Contour plot for a 2-gram Zip file centroid
3.1 The frequency domain of a data unit
3.2 The zig-zag ordering of a data unit traversal
3.3 The scan part binary format coding
4.1 The original undamaged image
4.2 The Start Of Frame (SOF) marker segment
4.3 Quantization tables with swapped sample rate
4.4 Luminance table with high sample rate
4.5 Luminance table with low sample rate
4.6 Swapped chrominance component identifiers
4.7 Swapped luminance and chrominance component identifiers
4.8 Moderately wrong image width
4.9 The Define Quantization Table (DQT) marker segment
4.10 Luminance DC component set to 0xFF
4.11 Chrominance DC component set to 0xFF
4.12 The Define Huffman Table (DHT) marker segment
4.13 Image with foreign Huffman tables definition
4.14 The Define Restart Interval (DRI) marker segment
4.15 Short restart interval setting
4.16 The Start Of Scan (SOS) marker segment
4.17 Luminance DC Huffman table set to chrominance ditto
4.18 Complete exchange of Huffman table pointers
4.19 A correct sequence of fragments
4.20 An incorrect sequence of fragments
5.1 Possible fragment parts