- AperturePipelet - detects the document type and converts the document into RDF or text
- ApertureMimeTypeIdentifier - a separate mime-type identification service that may be used for other purposes
Overview
| Name | Aperture Pipelet |
|---|---|
| Vendor | brox IT-Solutions GmbH |
| Authors | |
| Homepage | http://www.brox.de |
| Issue Management | http://support.eccenca.com |
| Continuous Integration | n/a |
| Categories | Pipelet |
| Most Recent Version (see older versions) | Version 0.5.0 |
| Availability (see older versions) | eccenca / SMILA... |
| State | Stable |
| Support | |
| License | Aperture License; Pipelet: Freeware eccenca Component License |
| Price | Free |
| Release Docs | |
| Java API Docs | n/a |
| Download Source | |
| Download JAR | |
Description/Features
Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites) and the file formats (e.g. documents, images) occurring in these systems. The pipelet uses it to:
- detect the mime-type of a document
- extract the plain-text content of the document
- extract RDF in XML format
An example of using AperturePipelet is covered in How to use Aperture pipelet.
How to configure it
Configuration properties
- AttachmentContent - the name of the attachment in which the document is stored (required)
- AttachmentMimeType - the name of the attribute in which the suggested document mime-type is stored, or will be stored after conversion (optional)
- FileExtensionAttribute - the name of the attribute in which the file extension is stored; it helps to detect the mime-type (optional)
- AttachmentText - the name of the attachment in which the extracted text will be stored (optional, but one of AttachmentText/AttachmentRdf is required)
- AttachmentRdf - the name of the attachment in which the extracted RDF will be stored (optional, but one of AttachmentText/AttachmentRdf is required)
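The required/optional rules in this list can be checked before a pipeline is deployed. The following is a minimal sketch in Java, assuming the configuration is available as a plain `Map<String, String>`. The class `ApertureConfigCheck` and its `validate` method are hypothetical helpers for illustration; they are not part of the pipelet or of SMILA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the pipelet): checks the property rules listed above.
final class ApertureConfigCheck {

    // Returns a list of problems; an empty list means the rules are satisfied.
    static List<String> validate(Map<String, String> config) {
        List<String> problems = new ArrayList<>();

        // AttachmentContent is the only unconditionally required property.
        if (isBlank(config.get("AttachmentContent"))) {
            problems.add("AttachmentContent is required");
        }

        // At least one output attachment (text or RDF) must be configured.
        if (isBlank(config.get("AttachmentText")) && isBlank(config.get("AttachmentRdf"))) {
            problems.add("one of AttachmentText/AttachmentRdf is required");
        }

        // AttachmentMimeType and FileExtensionAttribute are optional, so they are not checked.
        return problems;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

For a configuration that sets `AttachmentContent` to `Content` and `AttachmentText` to `Text`, `validate` returns an empty list; leaving out both output attachments yields one reported problem.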
The simplest pipelet invocation:
<extensionActivity name="invokeApertureConversion">
  <proc:invokePipelet>
    <proc:pipelet class="com.eccenca.processing.pipelets.aperture.AperturePipelet" />
    <proc:variables input="request" output="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="AttachmentContent">
        <proc:Value>Content</proc:Value>
      </proc:Property>
      <proc:Property name="AttachmentText">
        <proc:Value>Text</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
This invocation makes the pipelet read the attachment named "Content", detect its mime-type, convert the document into text, and store the text in the attachment named "Text".
Pipelet invocation when file extension is known:
<extensionActivity name="invokeApertureConversion">
  <proc:invokePipelet>
    <proc:pipelet class="com.eccenca.processing.pipelets.aperture.AperturePipelet" />
    <proc:variables input="request" output="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="AttachmentContent">
        <proc:Value>Content</proc:Value>
      </proc:Property>
      <proc:Property name="AttachmentText">
        <proc:Value>Text</proc:Value>
      </proc:Property>
      <proc:Property name="FileExtensionAttribute">
        <proc:Value>Extension</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
This invocation makes the pipelet try to use the known file extension, taken from the attribute named "Extension", to detect the mime-type more reliably. The conversion is then applied as in the previous sample.
Pipelet invocation when mime-type is known:
<extensionActivity name="invokeApertureConversion">
  <proc:invokePipelet>
    <proc:pipelet class="com.eccenca.processing.pipelets.aperture.AperturePipelet" />
    <proc:variables input="request" output="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="AttachmentContent">
        <proc:Value>Content</proc:Value>
      </proc:Property>
      <proc:Property name="AttachmentText">
        <proc:Value>Text</proc:Value>
      </proc:Property>
      <proc:Property name="AttachmentMimeType">
        <proc:Value>MimeType</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
This invocation makes the pipelet try to use the known mime-type, taken from the attribute named "MimeType". The conversion is then applied as in the first sample.
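The three invocations so far differ only in which hints the pipelet receives for mime-type detection. This page does not state how the hints are prioritised, so the Java sketch below assumes one plausible order: a supplied mime-type wins, then the file extension, then content-based detection. `MimeTypeHints`, `resolve`, and the small extension table are hypothetical and only illustrate the idea; they are not the pipelet's implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of an ASSUMED hint order; not the pipelet's actual logic.
final class MimeTypeHints {

    // Tiny illustrative table; a real identifier would know many more types.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("html", "text/html");
        BY_EXTENSION.put("txt", "text/plain");
    }

    // Assumed order: known mime-type, then extension lookup, then the content-based result.
    static String resolve(String knownMimeType, String extension, String detectedFromContent) {
        if (knownMimeType != null && !knownMimeType.isEmpty()) {
            return knownMimeType;
        }
        if (extension != null) {
            String byExtension = BY_EXTENSION.get(normalize(extension));
            if (byExtension != null) {
                return byExtension;
            }
        }
        return detectedFromContent;
    }

    // Accepts "pdf", ".pdf" or ".PDF" and returns the lower-case extension without the dot.
    private static String normalize(String extension) {
        String trimmed = extension.trim();
        if (trimmed.startsWith(".")) {
            trimmed = trimmed.substring(1);
        }
        return trimmed.toLowerCase(Locale.ROOT);
    }
}
```

Under this assumed order, an unknown extension simply falls through to the content-based result, which matches the behaviour of the first sample where no hints are configured.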
Pipelet invocation that extracts RDF and stores it in an attachment:
<extensionActivity name="invokeApertureConversion">
  <proc:invokePipelet>
    <proc:pipelet class="com.eccenca.processing.pipelets.aperture.AperturePipelet" />
    <proc:variables input="request" output="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="AttachmentContent">
        <proc:Value>Content</proc:Value>
      </proc:Property>
      <proc:Property name="AttachmentRdf">
        <proc:Value>RDF</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
This invocation makes the pipelet detect the mime-type as in the first sample, convert the document into RDF, and store it in the record attachment named "RDF".
The most complete pipelet invocation configuration:
<extensionActivity name="invokeApertureConversion">
  <proc:invokePipelet>
    <proc:pipelet class="com.eccenca.processing.pipelets.aperture.AperturePipelet" />
    <proc:variables input="request" output="request" />
    <proc:PipeletConfiguration>
      <proc:Property name="AttachmentContent">
        <proc:Value>Content</proc:Value>
      </proc:Property>
      <proc:Property name="AttachmentText">
        <proc:Value>Text</proc:Value>
      </proc:Property>
      <proc:Property name="AttachmentRdf">
        <proc:Value>RDF</proc:Value>
      </proc:Property>
      <proc:Property name="AttachmentMimeType">
        <proc:Value>MimeType</proc:Value>
      </proc:Property>
      <proc:Property name="FileExtensionAttribute">
        <proc:Value>Extension</proc:Value>
      </proc:Property>
    </proc:PipeletConfiguration>
  </proc:invokePipelet>
</extensionActivity>
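All invocations on this page share the same skeleton and differ only in their `proc:Property` elements. As a convenience, the configuration fragment could be generated from a map of property names to values. The Java sketch below is one way to do that; `PipeletConfigXml` and `render` are hypothetical names, and the output mirrors only the element structure shown in the samples on this page.

```java
import java.util.Map;

// Hypothetical generator for the <proc:PipeletConfiguration> fragment used in the samples.
final class PipeletConfigXml {

    // Renders one <proc:Property> element per map entry, in the map's iteration order.
    static String render(Map<String, String> properties) {
        StringBuilder xml = new StringBuilder("<proc:PipeletConfiguration>\n");
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            xml.append("  <proc:Property name=\"").append(escape(entry.getKey())).append("\">\n")
               .append("    <proc:Value>").append(escape(entry.getValue())).append("</proc:Value>\n")
               .append("  </proc:Property>\n");
        }
        return xml.append("</proc:PipeletConfiguration>\n").toString();
    }

    // Escapes the characters that are unsafe in XML text and attribute values.
    private static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }
}
```

Passing a `LinkedHashMap` keeps the properties in insertion order, so the generated fragment lists them in the same order as the hand-written samples.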
