Aperture Pipelet - Processing Config - Extended Metadata Extraction
Metadata is extracted by applying XPath expressions to the RDF produced by Aperture. The RDF contains information common to all document types as well as information specific to individual document types. This section shows how to extract the common information.
Namespaces and Prefixes used
Metadata properties XPaths used
| Property | Description | XPath |
|---|---|---|
| Generator | document generator | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:generator)[1] |
| Title | document title | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:title)[1] |
| Subject | document subject | /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:subject |
| Description | document description | /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:description |
| Creator | full name of the user who created the document | (/rdf:RDF/rdf:Description[@rdf:about = /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nco:creator/@rdf:resource]/nco:fullname)[1] |
| Contributor | full name of a user who contributed to the document | (/rdf:RDF/rdf:Description[@rdf:about = /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nco:contributor/@rdf:resource]/nco:fullname)[1] |
| DocumentCreated | document creation datetime | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:contentCreated)[1] |
| DocumentModified | datetime of the last modification of the document content | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:contentLastModified)[1] |
| PageCount | number of pages in the document | (/rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nfo:pageCount)[1] |
| Keywords | document keywords | /rdf:RDF/rdf:Description[starts-with(@rdf:about,'file:')]/nie:keyword |
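To make the XPaths above more concrete, here is a minimal Python sketch of the same lookups outside the pipelet. It is not the pipelet's implementation. Several things in it are assumptions: the namespace URIs bound to the rdf, nie, nco and nfo prefixes (the usual NEPOMUK ontology URIs), the hand-written sample RDF, and all helper function names. Python's standard xml.etree.ElementTree supports only a subset of XPath and has no starts-with(), so the predicate is applied in Python code instead.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: prefix bindings; the URIs below are the usual NEPOMUK ones.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "nie": "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#",
    "nco": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#",
    "nfo": "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# ASSUMPTION: hand-written sample, not real Aperture output.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:nie="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"
    xmlns:nco="http://www.semanticdesktop.org/ontologies/2007/03/22/nco#"
    xmlns:nfo="http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#">
  <rdf:Description rdf:about="file:/tmp/report.doc">
    <nie:title>Quarterly report</nie:title>
    <nie:keyword>finance</nie:keyword>
    <nie:keyword>q3</nie:keyword>
    <nco:creator rdf:resource="urn:example:contact-1"/>
    <nfo:pageCount>12</nfo:pageCount>
  </rdf:Description>
  <rdf:Description rdf:about="urn:example:contact-1">
    <nco:fullname>Jane Doe</nco:fullname>
  </rdf:Description>
</rdf:RDF>"""


def file_descriptions(root):
    """Return rdf:Description elements whose rdf:about starts with 'file:'."""
    return [d for d in root.findall("rdf:Description", NS)
            if d.get(RDF_ABOUT, "").startswith("file:")]


def all_values(root, prop):
    """Multi-valued lookup, as for Subject, Description and Keywords."""
    return [e.text for d in file_descriptions(root)
            for e in d.findall(prop, NS)]


def first_value(root, prop):
    """Single-valued lookup, the equivalent of the trailing [1]."""
    values = all_values(root, prop)
    return values[0] if values else None


def first_contact_name(root, link_prop):
    """Creator/Contributor: follow rdf:resource to the contact's nco:fullname."""
    targets = {e.get(RDF_RESOURCE)
               for d in file_descriptions(root)
               for e in d.findall(link_prop, NS)
               if e.get(RDF_RESOURCE) is not None}
    for d in root.findall("rdf:Description", NS):
        if d.get(RDF_ABOUT) in targets:
            name = d.find("nco:fullname", NS)
            if name is not None:
                return name.text
    return None


root = ET.fromstring(SAMPLE_RDF)
print(first_value(root, "nie:title"))
print(all_values(root, "nie:keyword"))
print(first_contact_name(root, "nco:creator"))
```

The Creator and Contributor lookups need two steps because the file's description only references the contact by rdf:resource; the full name is stored on a separate rdf:Description element. A property missing from the RDF yields None (or an empty list for the multi-valued helpers), which corresponds to an empty cell in the compatibility tables.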
Complete conversion pipeline to extract metadata.
The pipeline may be downloaded here: ApertureMetadataPipeline.bpel
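Compatibility table: metadata properties detected by Aperture per document type
A + marks a property that Aperture detected for that document type; an empty cell means the property was not detected.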
Microsoft document types
| Document type | ext | Generator | Title | Subject | Description | Creator | Contributor | DocumentCreated | DocumentModified | PageCount | Keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Microsoft Excel 2000 | xls | + | + | + | + | + | + | + | + | | + |
| Microsoft Excel 2007 | xlam | + | + | + | + | + | + | + | + | | + |
| Microsoft Excel 2007 | xlsb | + | + | + | + | + | + | + | + | | + |
| Microsoft Excel 2007 | xlsm | + | + | + | + | + | + | + | + | | + |
| Microsoft Excel 2007 | xlsx | + | + | + | + | + | + | + | + | | + |
| Microsoft Excel 2007 | xltm | + | + | + | + | + | + | + | + | | + |
| Microsoft Excel 2007 | xltx | + | + | + | + | + | + | + | + | | + |
| Microsoft Powerpoint 2000 | ppt | + | + | + | + | + | + | + | + | | + |
| Microsoft Powerpoint 2007 | potm | + | + | + | + | + | + | + | + | | + |
| Microsoft Powerpoint 2007 | potx | + | + | + | + | + | + | + | + | | + |
| Microsoft Powerpoint 2007 | ppsm | + | + | + | + | + | + | + | + | | + |
| Microsoft Powerpoint 2007 | ppsx | + | + | + | + | + | + | + | + | | + |
| Microsoft Powerpoint 2007 | pptm | + | + | + | + | + | + | + | + | | + |
| Microsoft Powerpoint 2007 | pptx | + | + | + | + | + | + | + | + | | + |
| Microsoft Publisher 2003 | pub | | + | + | + | + | | | | | + |
| Microsoft Visio | vsd | | + | + | + | + | | | | | + |
| Microsoft Word 97-2003 | doc | + | + | + | + | + | + | + | + | | + |
| Microsoft Word 2000 | rtf | | | | | | | | | | |
| Microsoft Word 2007 | docm | + | + | + | + | + | + | + | + | + | + |
| Microsoft Word 2007 | docx | + | + | + | + | + | + | + | + | + | + |
| Microsoft Word 2007 | dotm | + | + | + | + | + | + | + | + | + | + |
| Microsoft Word 2007 | dotx | + | + | + | + | + | + | + | + | + | + |
| Microsoft Works Spreadsheet 4.0-2000 | wks | | | | | | | | | | |
| Microsoft Works Spreadsheet 7.0 | xlr | | | | | | | | | | |
| Microsoft Works Word Processor 3.0 | wps | | | | | | | | | | |
| Microsoft Works Word Processor 4.0 | wps | | | | | | | | | | |
| Microsoft Works Word Processor 2000 | wps | | | | | | | | | | |
| Microsoft Works Word Processor 7.0 | wps | | | | | | | | | | |
Open Office document types
| Document type | ext | Generator | Title | Subject | Description | Creator | Contributor | DocumentCreated | DocumentModified | PageCount | Keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenOffice 2.4 Calc | ods | + | | | | + | | + | | | |
| OpenOffice 2.4 Impress | odp | + | + | | | + | | + | | | |
| OpenOffice 2.4 Writer | odt | + | | | | + | | + | | + | |
| OpenOffice 3.0 Calc | ods | + | + | + | + | + | | + | | | + |
| OpenOffice 3.0 Impress | odp | + | + | + | + | + | | + | | | + |
| OpenOffice 3.0 Writer | odt | + | + | + | + | + | | + | | + | + |
Corel document types
| Document type | ext | Generator | Title | Subject | Description | Creator | Contributor | DocumentCreated | DocumentModified | PageCount | Keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Corel Presentations 3.0 | shw | | | | | | | | | | |
| Corel Presentations x3 | shw | | + | + | + | + | + | | | | + |
| Corel Quattro Pro 7 | wb3 | + | + | + | + | + | + | + | + | | + |
| Corel Quattro Pro x3 | qpw | + | + | + | + | + | + | + | + | | + |
| Corel Wordperfect 5.0 | wp | | | | | | | | | | |
| Corel Wordperfect 5.1 | wp | | | | | | | | | | |
| Corel Wordperfect x3 | wp | | | | | | | | | | |
PDF generators
| PDF generator | ext | Generator | Title | Subject | Description | Creator | Contributor | DocumentCreated | DocumentModified | PageCount | Keywords |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | pdf | + | + | + | | + | | + | + | + | + |
| Distiller 6 | pdf | + | + | | | + | | + | + | + | |
| OpenOffice Writer 1.1.5 | pdf | + | + | + | | + | | + | | + | + |
| OpenOffice Writer 2.0 | pdf | + | + | + | | + | | + | | + | + |
| PdfCreator 0.8.0 for Word 2000 | pdf | + | + | + | | + | | + | + | + | + |
| PdfMaker 7.0 for Word 2000 | pdf | + | + | + | | + | | + | + | + | + |
| PdfWriter 7.0 for Word 2000 | pdf | + | + | | | + | | + | + | + | |