Table of Contents

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

Deep Neural Networks for Inverse De-Identification of Medical Case Narratives in Reports of Suspected Adverse Drug Reactions

EVA-LISA MELDAU

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Computer Science
Date: February 24, 2018
Supervisor at KTH: Joel Brynielsson
Supervisor at the Uppsala Monitoring Centre: Niklas Norén
Examiner: Olle Bälter
Swedish title: Djupa neuronnät för omvänd avidentifiering av medicinska fallbeskrivningar i biverkningsrapporter

Abstract

Medical research requires detailed and accurate information on individual patients. This is especially so in the context of pharmacovigilance, which among other things seeks to identify previously unknown adverse drug reactions. Here, the clinical stories are often the starting point for assessing whether there is a causal relationship between the drug and the suspected adverse reaction. Reliable automatic de-identification of medical case narratives could allow this patient data to be shared without compromising the patient’s privacy. Current research on de-identification has focused on labelling the tokens in a narrative with the class of sensitive information they belong to. In this Master’s thesis project, we explore an inverse approach to the task of de-identification. This means that de-identification of medical case narratives is instead understood as identifying tokens which do not need to be removed from the text in order to ensure patient confidentiality. Our results show that this approach can lead to a more reliable method in terms of higher recall. We achieve a recall of sensitive information of 99.1% while the precision is kept above 51% for the 2014-i2b2 benchmark data set. The model was also fine-tuned on case narratives from reports of suspected adverse drug reactions, where a recall of sensitive information of more than 99% was achieved. Although the precision was only at a level of 55%, which is lower than in comparable systems, an expert could still identify information which would be useful for causality assessment in pharmacovigilance in most of the case narratives which were de-identified with our method. In more than 50% of the case narratives, no information useful for causality assessment was missing at all.

Sammanfattning (Summary)

Access to detailed clinical data is a prerequisite for conducting medical research and, by extension, for helping patients. Secure de-identification of medical case narratives can make it possible to share such information without jeopardising the protection of patients’ personal data. Previous research in the area has sought to tackle the problem by labelling words in a text with the type of sensitive information they convey. In this thesis we explore the possibility of tackling the problem in the inverse way, by identifying the words that do not need to be removed in order to ensure the protection of sensitive patient information. Our results show that this can de-identify a larger share of the sensitive information: 99.1% of all sensitive information is de-identified with our method, while 51% of all excluded words actually convey sensitive information, as evaluated on the 2014-i2b2 benchmark data set. The algorithm was also adapted to case narratives from adverse drug reaction reports, and in this case 99.1% of all sensitive information was de-identified while 55% of all excluded words convey sensitive information. Although this latter share is lower than for comparable systems, an expert could find information useful for causality assessment in most of the de-identified reports; in more than half of the de-identified case narratives, no information of value for causality assessment was missing.

Contents

1 Introduction
    1.1 Purpose and Problem Statement
2 Background
    2.1 Pharmacovigilance
        2.1.1 Causality Assessment
        2.1.2 World Health Organization (WHO) International Drug Monitoring Programme
        2.1.3 VigiBase
    2.2 Protected Health Information
        2.2.1 Health Insurance Portability and Accountability Act
        2.2.2 European Union
        2.2.3 Comparison Between Countries
    2.3 Related Work: De-Identification Systems
        2.3.1 Systems Using Hand-Engineered Features
        2.3.2 Feature Learning Neural Network Systems
        2.3.3 Inverse Approach Systems
3 Theory
    3.1 Artificial Neural Networks
    3.2 Deep Learning
        3.2.1 Feature Learning
        3.2.2 Pre-Training and Fine-Tuning
    3.3 Deep Feed Forward Neural Networks
        3.3.1 Training
    3.4 Recurrent Neural Networks
        3.4.1 Structure
        3.4.2 Training
        3.4.3 Bidirectional Recurrent Neural Network
        3.4.4 Long Short-Term Memory
    3.5 Word Vectors
    3.6 Linear-Chain Conditional Random Field
    3.7 Evaluation Measures
4 Methodology
    4.1 Data Sets
        4.1.1 2014-i2b2 Data Set
        4.1.2 VigiBase Data Set
    4.2 Dictionaries
        4.2.1 WHODrug
        4.2.2 Medical Dictionary for Regulatory Activities
    4.3 De-Identification Methods
        4.3.1 Overview
        4.3.2 Annotating and Pre-Processing
        4.3.3 Rule-Based Approach Using Dictionary Look-ups
        4.3.4 Deep Learning Approach Using Long Short-Term Memory
        4.3.5 Combination Strategy
        4.3.6 Model Selection
    4.4 Evaluation
        4.4.1 Recall and Precision for Protected Health Information
        4.4.2 Retainment of Valuable Information
5 Results
    5.1 2014-i2b2 Data Set
        5.1.1 Evaluation of the Hybrid De-Identifier
        5.1.2 Evaluation of the Deep De-Identifier
        5.1.3 Comparisons
        5.1.4 Results Per Category
        5.1.5 Example Outputs
    5.2 VigiBase Data Set
        5.2.1 General Results
        5.2.2 Results Per Category
        5.2.3 Examples of Leaked Protected Health Information
        5.2.4 Valuable Information for Causality Assessment
6 Discussion
7 Conclusion
Bibliography
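The inverse approach and the two figures quoted in the abstract (recall and precision of sensitive information) can be illustrated with a minimal sketch. This is not the thesis's model: a fixed allow-list (`safe_vocab`) stands in for whatever classifier decides that a token is safe to keep, and the function names, the mask string, and the example sentence with its labels are all hypothetical.

```python
from typing import List, Set, Tuple

MASK = "[REDACTED]"  # hypothetical placeholder for removed tokens


def inverse_deidentify(tokens: List[str], safe_vocab: Set[str]) -> List[str]:
    """Inverse approach: keep only tokens judged safe, mask everything else.

    Assumption: a plain allow-list replaces the learned classifier.
    """
    return [t if t.lower() in safe_vocab else MASK for t in tokens]


def phi_recall_precision(is_phi: List[bool], output: List[str]) -> Tuple[float, float]:
    """Token-level recall and precision of sensitive-information removal."""
    removed = [tok == MASK for tok in output]
    tp = sum(p and r for p, r in zip(is_phi, removed))        # sensitive, removed
    fn = sum(p and not r for p, r in zip(is_phi, removed))    # sensitive, leaked
    fp = sum((not p) and r for p, r in zip(is_phi, removed))  # harmless, removed
    recall = tp / (tp + fn) if tp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    return recall, precision


# Made-up example: "Smith" and "Uppsala" are labelled as sensitive.
tokens = "Mr Smith developed a rash after taking ibuprofen in Uppsala".split()
is_phi = [False, True, False, False, False, False, False, False, False, True]
safe_vocab = {"developed", "a", "rash", "after", "taking", "ibuprofen", "in"}

if __name__ == "__main__":
    masked = inverse_deidentify(tokens, safe_vocab)
    print(" ".join(masked))
    print(phi_recall_precision(is_phi, masked))
```

Because anything not recognised as safe is masked, both sensitive tokens are removed, but the harmless token "Mr" is removed as well. That is the trade-off the abstract reports: very high recall at the cost of lower precision.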

Description:

Thus, sharing this information in form of electronic medical records or digital reports ... with doses and dates which the patient was taking, indication for treat- ... centres and to the pharmaceutical companies which hold the market- ... Deep neural networks, artificial neural networks with multiple ...

Automatic De-Identification of Medical Case Narratives in Reports of Suspected Adverse Drug ... PDF

114 Pages·2017·0.75 MB·English
