Introduction: What is Aperture?
Aperture is an Open Source Project that enables to
In eccenca, Aperture refers to a BPEL engine pipelet that wraps the Aperture libraries. Because the pipelet's output is in OpenRDF, an XML format, it is not suitable for direct indexing; the XPathExtractorPipelet is needed to get the specific metadata items.
Scope
we show
Within this guide, the configurations provided are for the existing FileIndex and its Crawling and Processing Configuration named file, which is shipped with eccenca CE by default.
Steps
Step 1 - Installing Aperture
Follow the steps in How to install a new component to install the Aperture pipelet.
Step 2 - Update Crawler Configuration
In this step the crawler is configured to also include files other than the default text and HTML files for processing. Make sure that the folder is correct and contains such files.
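Before updating the crawler configuration, it can help to confirm that the crawl folder really contains files of the additional types. The following is only a minimal sketch in Python and is not part of eccenca or SMILA; the function name count_by_extension and the extension list are assumptions chosen for illustration.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder, extensions):
    """Count files below `folder` whose suffix is in `extensions`.

    Hypothetical helper for checking a crawl folder; the suffixes are
    compared case-insensitively (".PDF" counts as ".pdf").
    """
    wanted = {ext.lower() for ext in extensions}
    counts = Counter()
    root = Path(folder)
    if not root.is_dir():
        # A missing or wrong folder simply yields an empty result.
        return counts
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in wanted:
            counts[path.suffix.lower()] += 1
    return counts
```

Calling count_by_extension with the crawl folder and a list such as [".pdf", ".doc"] returns the number of matching files per extension; an empty result suggests that the folder is wrong or contains no such files.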
Step 3 - Update Processing Configuration I
In this step we add the Aperture pipelet to the BPEL workflow.
Step 4 - Update Processing Configuration II
As mentioned above, Aperture extracts the metadata into RDF. The specific metadata items need to be extracted from the RDF with the XPathExtractorPipelet that is shipped with SMILA.
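To illustrate the idea behind this step, the sketch below pulls a title out of an RDF/XML document with an XPath expression, using Python's standard library rather than the XPathExtractorPipelet itself. The sample RDF, the use of the NIE vocabulary for the title property, and the function name extract_values are assumptions made for illustration; the RDF actually produced by the Aperture pipelet may use different elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical RDF/XML, standing in for the metadata Aperture extracts.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:nie="http://www.semanticdesktop.org/ontologies/2007/01/19/nie#">
  <rdf:Description rdf:about="file:/data/report.pdf">
    <nie:title>Quarterly Report</nie:title>
    <nie:plainTextContent>Full text of the document.</nie:plainTextContent>
  </rdf:Description>
</rdf:RDF>
"""

# Prefix-to-URI mapping used to resolve the prefixes in the XPath expression.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "nie": "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#",
}


def extract_values(rdf_xml, xpath):
    """Return the text of every element matched by `xpath` in `rdf_xml`."""
    root = ET.fromstring(rdf_xml)
    return [el.text.strip() for el in root.findall(xpath, NAMESPACES) if el.text]


titles = extract_values(SAMPLE_RDF, ".//rdf:Description/nie:title")
print(titles)
```

The point of the sketch is that the RDF as a whole is not a useful index value, whereas a single XPath expression selects exactly the item (here the title) that should end up in the index.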
Your complete Processing Configuration should now look like this:
Aperture Pipelet - Processing Config - Final XML
Conclusion
You have seen how the Aperture pipelet is incorporated into a crawl process and how titles are extracted from documents. This HowTo does not explain how to create a new index field that receives the title and makes it available in search. Please refer to 5 Minutes to success - RegExpTransformer pipelet for an example of this.