Aperture Pipelet - Processing Config - Extended Metadata Extraction

eccenca Documentation

Metadata extraction is based on XPath expressions evaluated against the RDF produced by Aperture. The RDF contains information common to all document types as well as information specific to individual document types. This section shows how to extract the common information.

Namespaces and Prefixes used

| Prefix used | Namespace |
| --- | --- |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| nie | http://www.semanticdesktop.org/ontologies/2007/01/19/nie# |
| nco | http://www.semanticdesktop.org/ontologies/2007/03/22/nco# |
| nfo | http://www.semanticdesktop.org/ontologies/2007/03/22/nfo# |

Metadata properties XPaths used

| Property | Description | XPath |
| --- | --- | --- |
| Generator | document generator | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:generator)[1] |
| Title | document title | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:title)[1] |
| Subject | document subject | /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:subject |
| Description | document description | /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:description |
| Creator | full name of the user who created the document | (/rdf:RDF/rdf:Description[@rdf:about = /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nco:creator/@rdf:resource]/nco:fullname)[1] |
| Contributor | full name of a user who contributed to the document | (/rdf:RDF/rdf:Description[@rdf:about = /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nco:contributor/@rdf:resource]/nco:fullname)[1] |
| DocumentCreated | document creation datetime | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:contentCreated)[1] |
| DocumentModified | datetime of the last modification of the document content | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:contentLastModified)[1] |
| PageCount | number of pages in the document | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nfo:pageCount)[1] |
| Keywords | document keywords | /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:keyword |
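
The lookups above can be sketched in Python as follows. This is only an illustration, not the pipelet's own implementation: the standard-library ElementTree module supports a subset of XPath that lacks starts-with(), so the 'file:' predicate and the Creator/Contributor join are emulated in plain Python. The sample RDF is hypothetical, shaped only after the XPaths listed here rather than taken from real Aperture output, and the helper names (file_descriptions, first_value, all_values, person_fullname) are made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map, as listed in the namespaces section.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "nie": "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#",
    "nco": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#",
    "nfo": "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical sample document (an assumption, not real Aperture output).
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:nie="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
         xmlns:nco="http://www.semanticdesktop.org/ontologies/2007/03/22/nco#"
         xmlns:nfo="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#">
  <rdf:Description rdf:about="file:/tmp/report.doc">
    <nie:title>Quarterly Report</nie:title>
    <nie:keyword>finance</nie:keyword>
    <nie:keyword>q3</nie:keyword>
    <nco:creator rdf:resource="urn:uuid:creator-1"/>
    <nfo:pageCount>12</nfo:pageCount>
  </rdf:Description>
  <rdf:Description rdf:about="urn:uuid:creator-1">
    <nco:fullname>Jane Doe</nco:fullname>
  </rdf:Description>
</rdf:RDF>
"""


def file_descriptions(root):
    """Emulate /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]."""
    return [d for d in root.findall("rdf:Description", NS)
            if d.get(RDF_ABOUT, "").startswith("file:")]


def first_value(root, child):
    """Emulate (.../child)[1]: text of the first match in document order."""
    for desc in file_descriptions(root):
        node = desc.find(child, NS)
        if node is not None:
            return node.text
    return None


def all_values(root, child):
    """Emulate the XPaths without [1], e.g. nie:keyword: every match."""
    return [n.text for d in file_descriptions(root)
            for n in d.findall(child, NS)]


def person_fullname(root, link):
    """Emulate the Creator/Contributor XPath: follow the rdf:resource
    reference to the Description with the same rdf:about, then return
    the first nco:fullname found there."""
    targets = {n.get(RDF_RESOURCE)
               for d in file_descriptions(root)
               for n in d.findall(link, NS)}
    targets.discard(None)  # links without rdf:resource match nothing
    for desc in root.findall("rdf:Description", NS):
        if desc.get(RDF_ABOUT) in targets:
            node = desc.find("nco:fullname", NS)
            if node is not None:
                return node.text
    return None


root = ET.fromstring(SAMPLE_RDF)
print("Title:", first_value(root, "nie:title"))
print("Keywords:", all_values(root, "nie:keyword"))
print("Creator:", person_fullname(root, "nco:creator"))
print("PageCount:", first_value(root, "nfo:pageCount"))
```

A processor with full XPath 1.0 support could evaluate the expressions from the table directly; the helpers here only mirror their semantics for the sample document.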

Complete conversion pipeline to extract metadata.

The pipeline may be downloaded here: ApertureMetadataPipeline.bpel

Compatibility table: metadata properties extracted by Aperture, by document type

Microsoft document types

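| Document type | Extension | Generator | Title | Subject | Description | Creator | Contributor | DocumentCreated | DocumentModified | PageCount | Keywords |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Microsoft Excel 2000 | xls | + | + | + | + | + | + | + | + |   | + |
| Microsoft Excel 2007 | xlam | + | + | + | + | + | + | + | + |   | + |
| Microsoft Excel 2007 | xlsb | + | + | + | + | + | + | + | + |   | + |
| Microsoft Excel 2007 | xlsm | + | + | + | + | + | + | + | + |   | + |
| Microsoft Excel 2007 | xlsx | + | + | + | + | + | + | + | + |   | + |
| Microsoft Excel 2007 | xltm | + | + | + | + | + | + | + | + |   | + |
| Microsoft Excel 2007 | xltx | + | + | + | + | + | + | + | + |   | + |
| Microsoft Powerpoint 2000 | ppt | + | + | + | + | + | + | + | + |   | + |
| Microsoft Powerpoint 2007 | potm | + | + | + | + | + | + | + | + |   | + |
| Microsoft Powerpoint 2007 | potx | + | + | + | + | + | + | + | + |   | + |
| Microsoft Powerpoint 2007 | ppsm | + | + | + | + | + | + | + | + |   | + |
| Microsoft Powerpoint 2007 | ppsx | + | + | + | + | + | + | + | + |   | + |
| Microsoft Powerpoint 2007 | pptm | + | + | + | + | + | + | + | + |   | + |
| Microsoft Powerpoint 2007 | pptx | + | + | + | + | + | + | + | + |   | + |
| Microsoft Publisher 2003 | pub |   | + | + | + | + |   |   |   |   | + |
| Microsoft Visio | vsd |   | + | + | + | + |   |   |   |   | + |
| Microsoft Word 97-2003 | doc | + | + | + | + | + | + | + | + |   | + |
| Microsoft Word 2000 | rtf |   |   |   |   |   |   |   |   |   |   |
| Microsoft Word 2007 | docm | + | + | + | + | + | + | + | + | + | + |
| Microsoft Word 2007 | docx | + | + | + | + | + | + | + | + | + | + |
| Microsoft Word 2007 | dotm | + | + | + | + | + | + | + | + | + | + |
| Microsoft Word 2007 | dotx | + | + | + | + | + | + | + | + | + | + |
| Microsoft Works Spreadsheet 4.0-2000 | wks |   |   |   |   |   |   |   |   |   |   |
| Microsoft Works Spreadsheet 7.0 | xlr |   |   |   |   |   |   |   |   |   |   |
| Microsoft Works Word Processor 3.0 | wps |   |   |   |   |   |   |   |   |   |   |
| Microsoft Works Word Processor 4.0 | wps |   |   |   |   |   |   |   |   |   |   |
| Microsoft Works Word Processor 2000 | wps |   |   |   |   |   |   |   |   |   |   |
| Microsoft Works Word Processor 7.0 | wps |   |   |   |   |   |   |   |   |   |   |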

OpenOffice document types

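| Document type | Extension | Generator | Title | Subject | Description | Creator | Contributor | DocumentCreated | DocumentModified | PageCount | Keywords |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OpenOffice 2.4 Calc | ods | + |   |   |   | + |   | + |   |   |   |
| OpenOffice 2.4 Impress | odp | + | + |   |   | + |   | + |   |   |   |
| OpenOffice 2.4 Writer | odt | + |   |   |   | + |   | + |   | + |   |
| OpenOffice 3.0 Calc | ods | + | + | + | + | + |   | + |   |   | + |
| OpenOffice 3.0 Impress | odp | + | + | + | + | + |   | + |   |   | + |
| OpenOffice 3.0 Writer | odt | + | + | + | + | + |   | + |   | + | + |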

Corel document types

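| Document type | Extension | Generator | Title | Subject | Description | Creator | Contributor | DocumentCreated | DocumentModified | PageCount | Keywords |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Corel Presentations 3.0 | shw |   |   |   |   |   |   |   |   |   |   |
| Corel Presentations x3 | shw |   | + | + | + | + | + |   |   |   | + |
| Corel Quattro Pro 7 | wb3 | + | + | + | + | + | + | + | + |   | + |
| Corel Quattro Pro x3 | qpw | + | + | + | + | + | + | + | + |   | + |
| Corel Wordperfect 5.0 | wp |   |   |   |   |   |   |   |   |   |   |
| Corel Wordperfect 5.1 | wp |   |   |   |   |   |   |   |   |   |   |
| Corel Wordperfect x3 | wp |   |   |   |   |   |   |   |   |   |   |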

PDF generators

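| PDF generator | Extension | Generator | Title | Subject | Description | Creator | Contributor | DocumentCreated | DocumentModified | PageCount | Keywords |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | pdf | + | + | + |   | + |   | + | + | + | + |
| Distiller 6 | pdf | + | + |   |   | + |   | + | + | + |   |
| OpenOffice Writer 1.1.5 | pdf | + | + | + |   | + |   | + |   | + | + |
| OpenOffice Writer 2.0 | pdf | + | + | + |   | + |   | + |   | + | + |
| PdfCreator 0.8.0 for Word 2000 | pdf | + | + | + |   | + |   | + | + | + | + |
| PdfMaker 7.0 for Word 2000 | pdf | + | + | + |   | + |   | + | + | + | + |
| PdfWriter 7.0 for Word 2000 | pdf | + | + |   |   | + |   | + | + | + |   |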
