I am parsing XML files using XML::LibXML. For the following XML entry, I get an error:
Malformed UTF-8 character (fatal) at C:/Perl64/site/lib/XML/LibXML/Error.pm line 217
which is the line:
$context=~s/[^\t]/ /g;
The XML entry is as follows:
<MedlineCitation Owner="NLM" Status="MEDLINE"> <PMID Version="1">15177811</PMID> <DateCreated> <Year>2004</Year> <Month>06</Month> <Day>04</Day> </DateCreated> <DateCompleted> <Year>2004</Year> <Month>08</Month> <Day>11</Day> </DateCompleted> <DateRevised> <Year>2011</Year> <Month>04</Month> <Day>07</Day> </DateRevised> <Article PubModel="Print"> <Journal> <ISSN IssnType="Print">0278-2626</ISSN> <JournalIssue CitedMedium="Print"> <Volume>55</Volume> <Issue>2</Issue> <PubDate> <Year>2004</Year> <Month>Jul</Month> </PubDate> </JournalIssue> <Title>Brain and cognition</Title> <ISOAbbreviation>Brain Cogn</ISOAbbreviation> </Journal> <ArticleTitle>Efficiency of orientation channels in the striate cortex for distributed categorization process.</ArticleTitle> <Pagination> <MedlinePgn>352-4</MedlinePgn> </Pagination> <Affiliation>Cognitive Science Department, Université de Liège, Belgium. mmermillod@ulg.ac.be</Affiliation> <AuthorList CompleteYN="Y"> <Author ValidYN="Y"> <LastName>Mermillod</LastName> <ForeName>Martial</ForeName> <Initials>M</Initials> </Author> <Author ValidYN="Y"> <LastName>Chauvin</LastName> <ForeName>Alan</ForeName> <Initials>A</Initials> </Author> <Author ValidYN="Y"> <LastName>Guyader</LastName> <ForeName>Nathalie</ForeName> <Initials>N</Initials> </Author> </AuthorList> <Language>eng</Language> <PublicationTypeList> <PublicationType>Journal Article</PublicationType> </PublicationTypeList> </Article> <MedlineJournalInfo> <Country>United States</Country> <MedlineTA>Brain Cogn</MedlineTA> <NlmUniqueID>8218014</NlmUniqueID> <ISSNLinking>0278-2626</ISSNLinking> </MedlineJournalInfo> <CitationSubset>IM</CitationSubset> <CommentsCorrectionsList> <CommentsCorrections RefType="ErratumIn"> <RefSource>Brain Cogn. 2005 Jul;58(2):245</RefSource> </CommentsCorrections> <CommentsCorrections RefType="RepublishedIn"> <RefSource>Brain Cogn. 
2005 Jul;58(2):246-8</RefSource> <PMID Version="1">16044513</PMID> </CommentsCorrections> </CommentsCorrectionsList> <MeshHeadingList> <MeshHeading> <DescriptorName MajorTopicYN="Y">Neural Networks (Computer)</DescriptorName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN="N">Neurons</DescriptorName> <QualifierName MajorTopicYN="N">physiology</QualifierName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN="N">Orientation</DescriptorName> <QualifierName MajorTopicYN="Y">physiology</QualifierName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN="N">Pattern Recognition, Visual</DescriptorName> <QualifierName MajorTopicYN="Y">physiology</QualifierName> </MeshHeading> <MeshHeading> <DescriptorName MajorTopicYN="N">Visual Cortex</DescriptorName> <QualifierName MajorTopicYN="Y">physiology</QualifierName> </MeshHeading> </MeshHeadingList> </MedlineCitation>
From this entry I only need PMID, DateRevised, PubDate, ArticleTitle, CommentsCorrectionsList and MeshHeadingList. If I delete the Affiliation element, which contains non-ASCII characters (the é and è in "Université de Liège"), the error no longer occurs. How can I fix this error?
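Since the error goes away when the accented characters are removed, one possible cause (an assumption, not confirmed here) is that the file's bytes are not valid UTF-8, for example because it was saved as Latin-1. A minimal diagnostic sketch using only the core Encode module is below. The helper name first_invalid_utf8_offset is hypothetical, and the two sample strings are made up to mirror the Affiliation text.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns undef when $bytes is valid UTF-8, otherwise
# the byte offset of the first sequence that cannot be decoded.
sub first_invalid_utf8_offset {
    my ($bytes) = @_;
    my $rest = $bytes;    # work on a copy: FB_QUIET rewrites its argument
    # With FB_QUIET, decode() stops at the first malformed sequence and
    # leaves the unprocessed remainder in $rest.
    decode('UTF-8', $rest, Encode::FB_QUIET);
    return length($rest) ? length($bytes) - length($rest) : undef;
}

# Illustrative inputs: the same text as Latin-1 bytes and as UTF-8 bytes.
my $latin1 = "Universit\xE9 de Li\xE8ge";
my $utf8   = "Universit\xC3\xA9 de Li\xC3\xA8ge";

for my $sample ($latin1, $utf8) {
    my $offset = first_invalid_utf8_offset($sample);
    print defined $offset
        ? "invalid UTF-8 at byte offset $offset\n"
        : "valid UTF-8\n";
}
```

To apply this to the real file, read it in raw mode (open with '<:raw') and pass the contents to the helper. If it reports an offset, the bytes and the encoding given in the XML declaration probably disagree.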
perl parsing utf-8 xml-libxml
smandape Oct 05 2018-11-11T00: 00Z