What is FilePath2URLPipelet?
FilePath2URLPipelet is a conversion pipelet that transforms a file system path into the corresponding URL.
How to use FilePath2URLPipelet in Eccenca.CE
This page describes how to:
- install the required pipelet from the update site,
- add a new field to the FileIndex collection to store the file URL,
- invoke the URL calculation from the conversion pipeline,
- crawl, search and check the results.
Workflow
Install Basic Pipelet feature from Eccenca.CE update site


After you press "Apply changes", the pipelet is installed.
Modify collection by adding new field for storing URL
Remove physical index
- Navigate to the "Collections" tab.
- Remove the index of the FileIndex collection.
Add new text field to IndexStructure
- Navigate to the FileIndex collection.
- Navigate to the IndexStructure tab of the collection and add the following field to the configuration XML. The new field gets number 6:
<IndexField FieldNo="6" IndexValue="true" Name="URL" StoreText="true" Tokenize="true" Type="Text"/>
The complete IndexStructure then looks like this:
<IndexStructure xmlns="http://www.anyfinder.de/IndexStructure" Name="FileIndex">
<Analyzer ClassName="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
<IndexField FieldNo="6" IndexValue="true" Name="URL" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="5" IndexValue="true" Name="Content" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="4" IndexValue="true" Name="Date" StoreText="true" Tokenize="false" Type="Date"/>
<IndexField FieldNo="3" IndexValue="true" Name="Size" StoreText="true" Tokenize="false" Type="Number"/>
<IndexField FieldNo="2" IndexValue="true" Name="Extension" StoreText="true" Tokenize="false" Type="Text"/>
<IndexField FieldNo="1" IndexValue="true" Name="Filename" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="0" IndexValue="true" Name="Path" StoreText="true" Tokenize="true" Type="Text"/>
</IndexStructure>
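Before saving, the edited IndexStructure can be sanity-checked outside the UI. The minimal sketch below uses only Python's standard xml.etree.ElementTree module: it collects every IndexField by its FieldNo and rejects duplicate numbers. The helper name index_fields is hypothetical and is not part of Eccenca.CE.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of Eccenca.CE.
# Namespace of the IndexStructure document shown above.
IX_NS = {"ix": "http://www.anyfinder.de/IndexStructure"}

INDEX_STRUCTURE = """
<IndexStructure xmlns="http://www.anyfinder.de/IndexStructure" Name="FileIndex">
  <Analyzer ClassName="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
  <IndexField FieldNo="6" IndexValue="true" Name="URL" StoreText="true" Tokenize="true" Type="Text"/>
  <IndexField FieldNo="5" IndexValue="true" Name="Content" StoreText="true" Tokenize="true" Type="Text"/>
  <IndexField FieldNo="4" IndexValue="true" Name="Date" StoreText="true" Tokenize="false" Type="Date"/>
  <IndexField FieldNo="3" IndexValue="true" Name="Size" StoreText="true" Tokenize="false" Type="Number"/>
  <IndexField FieldNo="2" IndexValue="true" Name="Extension" StoreText="true" Tokenize="false" Type="Text"/>
  <IndexField FieldNo="1" IndexValue="true" Name="Filename" StoreText="true" Tokenize="true" Type="Text"/>
  <IndexField FieldNo="0" IndexValue="true" Name="Path" StoreText="true" Tokenize="true" Type="Text"/>
</IndexStructure>
"""


def index_fields(xml_text):
    """Map FieldNo -> (Name, Type) for each IndexField; fail on duplicate numbers."""
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for field in root.findall("ix:IndexField", IX_NS):
        number = int(field.get("FieldNo"))
        if number in fields:
            raise ValueError(f"duplicate FieldNo {number}")
        fields[number] = (field.get("Name"), field.get("Type"))
    return fields


fields = index_fields(INDEX_STRUCTURE)
print(fields[6])  # ('URL', 'Text')
```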
Change search result and default search configuration
- Navigate to the Configuration tab of the collection and add field 6 to the DefaultConfig and Result sections:
<Configuration xsi:schemaLocation="http://www.anyfinder.de/DataDictionary/Configuration
../xml/DataDictionaryConfiguration.xsd"
xmlns="http://www.anyfinder.de/DataDictionary/Configuration" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<DefaultConfig>
<Field FieldNo="6">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
<Field FieldNo="5">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
<Field FieldNo="4">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTDate">
<Parameter xmlns="http://www.anyfinder.de/Search/DateField" />
</FieldConfig>
</Field>
<Field FieldNo="3">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTNumber">
<Parameter xmlns="http://www.anyfinder.de/Search/NumberField" />
</FieldConfig>
</Field>
<Field FieldNo="2">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
<Field FieldNo="1">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
<Field FieldNo="0">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
</DefaultConfig>
<Result Name="">
<ResultField FieldNo="6" Name="URL" />
<ResultField FieldNo="4" Name="Date" />
<ResultField FieldNo="3" Name="Size" />
<ResultField FieldNo="2" Name="Extension" />
<ResultField FieldNo="1" Name="Filename" />
<ResultField FieldNo="0" Name="Path" />
</Result>
<HighlightingResult Name="">
<HighlightingResultField FieldNo="5" Name="Content" xsi:type="HLTextField">
<HighlightingTransformer Name="urn:Sentence">
<ParameterSet xmlns="http://www.brox.de/ParameterSet">
<Parameter Name="MaxLength" xsi:type="Integer">
<Value>500</Value>
</Parameter>
<Parameter Name="MaxHLElements" xsi:type="Integer">
<Value>999</Value>
</Parameter>
<Parameter Name="MaxSucceedingCharacters" xsi:type="Integer">
<Value>50</Value>
</Parameter>
<Parameter Name="SucceedingCharacters" xsi:type="String">
<Value>...</Value>
</Parameter>
<Parameter Name="SortAlgorithm" xsi:type="String">
<Value>Occurrence</Value>
</Parameter>
<Parameter Name="TextHandling" xsi:type="String">
<Value>ReturnSnipplet</Value>
</Parameter>
</ParameterSet>
</HighlightingTransformer>
<HighlightingParameter xmlns="http://www.anyfinder.de/DataDictionary/Configuration/TextHighlighting" />
</HighlightingResultField>
</HighlightingResult>
</Configuration>
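A similar check can confirm that field 6 appears both in DefaultConfig (the default search configuration) and in Result (the fields returned with each hit). The sketch below is hypothetical helper code, not an Eccenca.CE API; it parses a trimmed copy of the Configuration above, keeping only fields 6 and 0, with xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of Eccenca.CE.
CFG_NS = {"c": "http://www.anyfinder.de/DataDictionary/Configuration"}

# Trimmed copy of the Configuration above: only fields 6 and 0 are kept.
CONFIGURATION = """
<Configuration xmlns="http://www.anyfinder.de/DataDictionary/Configuration"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <DefaultConfig>
    <Field FieldNo="6">
      <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
        <Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
      </FieldConfig>
    </Field>
    <Field FieldNo="0">
      <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
        <Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
      </FieldConfig>
    </Field>
  </DefaultConfig>
  <Result Name="">
    <ResultField FieldNo="6" Name="URL" />
    <ResultField FieldNo="0" Name="Path" />
  </Result>
</Configuration>
"""


def configured_field_numbers(xml_text):
    """Return (field numbers in DefaultConfig, field numbers in Result)."""
    root = ET.fromstring(xml_text.strip())
    default = {int(f.get("FieldNo")) for f in root.findall("c:DefaultConfig/c:Field", CFG_NS)}
    result = {int(f.get("FieldNo")) for f in root.findall("c:Result/c:ResultField", CFG_NS)}
    return default, result


default, result = configured_field_numbers(CONFIGURATION)
print(6 in default and 6 in result)  # True
```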
Update mappings from record into index
- Navigate to the Mapping tab of the collection and add a mapping from the record attribute "URL" to field 6:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Mapping xmlns="http://www.eccenca.com/eccenca/lucene" indexName="FileIndex">
<Attributes>
<Attribute fieldNo="0" name="Path" />
<Attribute fieldNo="1" name="Filename" />
<Attribute fieldNo="2" name="Extension" />
<Attribute fieldNo="3" name="Size" />
<Attribute fieldNo="4" name="LastModifiedDate" />
<Attribute fieldNo="6" name="URL" />
</Attributes>
<Attachments>
<Attachment fieldNo="5" name="Text" />
</Attachments>
</Mapping>
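The mapping links record attributes to index fields by number, not by name: the record attribute LastModifiedDate, for example, fills the index field named Date (number 4). The hypothetical sketch below, again using xml.etree.ElementTree and not any Eccenca.CE API, reads the mapping and checks that every referenced fieldNo exists among the field numbers 0 to 6 of the IndexStructure and that no index field is left unmapped.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper, not part of Eccenca.CE.
MAP_NS = {"m": "http://www.eccenca.com/eccenca/lucene"}

MAPPING = """
<Mapping xmlns="http://www.eccenca.com/eccenca/lucene" indexName="FileIndex">
  <Attributes>
    <Attribute fieldNo="0" name="Path" />
    <Attribute fieldNo="1" name="Filename" />
    <Attribute fieldNo="2" name="Extension" />
    <Attribute fieldNo="3" name="Size" />
    <Attribute fieldNo="4" name="LastModifiedDate" />
    <Attribute fieldNo="6" name="URL" />
  </Attributes>
  <Attachments>
    <Attachment fieldNo="5" name="Text" />
  </Attachments>
</Mapping>
"""

# Field numbers defined in the FileIndex IndexStructure (0-5 plus the new URL field 6).
INDEX_FIELD_NUMBERS = set(range(7))


def mapped_fields(xml_text):
    """Return {record attribute or attachment name: index field number}."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for path in ("m:Attributes/m:Attribute", "m:Attachments/m:Attachment"):
        for element in root.findall(path, MAP_NS):
            mapping[element.get("name")] = int(element.get("fieldNo"))
    return mapping


mapping = mapped_fields(MAPPING)
unknown = set(mapping.values()) - INDEX_FIELD_NUMBERS   # mapped to a non-existent field
unmapped = INDEX_FIELD_NUMBERS - set(mapping.values())  # index fields nobody fills
print(mapping["URL"], unknown, unmapped)  # 6 set() set()
```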
Save collection and create physical index
- Press Save. The message "Collection 'FileIndex' was successfully updated." is displayed.
- Press the create index icon to create the physical index.
Update index order conversion pipeline
- Navigate to the Index Orders of the collection and click the "file" index order to edit it. Only its processing pipeline has to change to add the URL calculation.
- Navigate to the "Processing pipeline" tab and add the FilePath2URL pipelet invocation (the invokeFilePath2URL extension activity, placed just before the final reply):
<process name="Convert_FileIndex_file" targetNamespace="http://www.eclipse.org/smila/processor"
xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable" xmlns:id="http://www.eclipse.org/smila/id"
xmlns:proc="http://www.eclipse.org/smila/processor" xmlns:rec="http://www.eclipse.org/smila/record"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<import importType="http://schemas.xmlsoap.org/wsdl/" location="../processor.wsdl"
namespace="http://www.eclipse.org/smila/processor" />
<partnerLinks>
<partnerLink myRole="service" name="Pipeline" partnerLinkType="proc:ProcessorPartnerLinkType" />
</partnerLinks>
<extensions>
<extension mustUnderstand="no" namespace="http://www.eclipse.org/smila/processor" />
</extensions>
<variables>
<variable messageType="proc:ProcessorMessage" name="request" />
</variables>
<sequence>
<receive createInstance="yes" name="start" operation="process" partnerLink="Pipeline"
portType="proc:ProcessorPortType" variable="request" />
<extensionActivity name="invokeSimpleMimeTypeIdentification">
<proc:invokeService>
<proc:service name="MimeTypeIdentifyService" />
<proc:variables input="request" output="request" />
</proc:invokeService>
</extensionActivity>
<if name="conditionIsText">
<condition>($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/plain")</condition>
<extensionActivity name="invokeCopyText">
<proc:invokePipelet>
<proc:pipelet class="com.brox.anyfinder.processing.utils.CopyAttachmentPipelet" />
<proc:variables input="request" output="request" />
<proc:PipeletConfiguration>
<proc:Property name="source">
<proc:Value>Content</proc:Value>
</proc:Property>
<proc:Property name="target">
<proc:Value>Text</proc:Value>
</proc:Property>
</proc:PipeletConfiguration>
</proc:invokePipelet>
</extensionActivity>
</if>
<if name="conditionIsHtml">
<condition>($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/html")
or ($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/xml")</condition>
<extensionActivity name="invokeHtml2Txt">
<proc:invokePipelet>
<proc:pipelet class="org.eclipse.smila.processing.pipelets.HtmlToTextPipelet" />
<proc:variables input="request" output="request" />
<proc:PipeletConfiguration>
<proc:Property name="inputType">
<proc:Value>ATTACHMENT</proc:Value>
</proc:Property>
<proc:Property name="outputType">
<proc:Value>ATTACHMENT</proc:Value>
</proc:Property>
<proc:Property name="inputName">
<proc:Value>Content</proc:Value>
</proc:Property>
<proc:Property name="outputName">
<proc:Value>Text</proc:Value>
</proc:Property>
<proc:Property name="meta:title">
<proc:Value>Title</proc:Value>
</proc:Property>
</proc:PipeletConfiguration>
</proc:invokePipelet>
</extensionActivity>
</if>
<extensionActivity name="invokeFilePath2URL">
<proc:invokePipelet>
<proc:pipelet class="com.eccenca.pipelets.basic.FilePath2URLPipelet" />
<proc:variables input="request" output="request" />
<proc:PipeletConfiguration>
<proc:Property name="Source">
<proc:Value>Path</proc:Value>
</proc:Property>
<proc:Property name="Target">
<proc:Value>URL</proc:Value>
</proc:Property>
<proc:Property name="SourceType">
<proc:Value>ATTRIBUTE</proc:Value>
</proc:Property>
<proc:Property name="TargetType">
<proc:Value>ATTRIBUTE</proc:Value>
</proc:Property>
<proc:Property name="Protocol">
<proc:Value>http</proc:Value>
</proc:Property>
<proc:Property name="Hostname">
<proc:Value>myhost</proc:Value>
</proc:Property>
<proc:Property name="PathPrefix">
<proc:Value>c:\data</proc:Value>
</proc:Property>
<proc:Property name="UrlPrefix">
<proc:Value>my_app</proc:Value>
</proc:Property>
</proc:PipeletConfiguration>
</proc:invokePipelet>
</extensionActivity>
<reply name="end" operation="process" partnerLink="Pipeline" portType="proc:ProcessorPortType" variable="request" />
<exit />
</sequence>
</process>
Press the Save button. The message "Index Order 'file' was updated successfully" is displayed.
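The Protocol, Hostname, PathPrefix and UrlPrefix properties suggest how the URL is assembled, but this page does not spell out the exact rule. The Python sketch below is only an assumption of that behaviour, not the code of com.eccenca.pipelets.basic.FilePath2URLPipelet: it strips PathPrefix from the path, turns backslashes into slashes, percent-encodes the remainder and prepends protocol, host name and UrlPrefix. The function name and the example file c:\data\reports\Q1 summary.txt are hypothetical.

```python
from urllib.parse import quote


def file_path_to_url(path, protocol, hostname, path_prefix, url_prefix):
    """Assumed FilePath2URL behaviour; a sketch, not the pipelet's real code."""
    prefix = path_prefix.rstrip("\\/")
    # Assumption: prefix matching is case-insensitive, as on Windows file systems.
    if not path.lower().startswith(prefix.lower()):
        raise ValueError(f"{path!r} is not below PathPrefix {path_prefix!r}")
    rest = path[len(prefix):]
    # Reject partial matches such as c:\database when the prefix is c:\data.
    if rest and rest[0] not in "\\/":
        raise ValueError(f"{path!r} is not below PathPrefix {path_prefix!r}")
    relative = rest.replace("\\", "/").strip("/")
    parts = [p for p in (url_prefix.strip("/"), relative) if p]
    # quote() keeps "/" and percent-encodes spaces and other unsafe characters.
    return f"{protocol}://{hostname}/" + quote("/".join(parts))


# Values taken from the PipeletConfiguration above; the file path is made up.
url = file_path_to_url(r"c:\data\reports\Q1 summary.txt", "http", "myhost", r"c:\data", "my_app")
print(url)  # http://myhost/my_app/reports/Q1%20summary.txt
```

Under these assumptions, a file outside PathPrefix is rejected rather than silently turned into a broken URL; the real pipelet may handle that case differently.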
Create the index, run the index order, then perform a search and check the generated URLs.
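When checking the results, each URL value should point at the configured host and application prefix. The hypothetical helper below uses urllib.parse.urlsplit to test a single URL string against the Protocol, Hostname and UrlPrefix values from the pipeline configuration; how the URL values are extracted from a search response is not covered on this page and is left open here.

```python
from urllib.parse import urlsplit


def matches_configuration(url, protocol="http", hostname="myhost", url_prefix="my_app"):
    """Hypothetical check: scheme and host match and the path lies under /url_prefix/."""
    parts = urlsplit(url)
    return (
        parts.scheme == protocol
        # urlsplit() lower-cases .hostname, so compare against a lower-cased value.
        and parts.hostname == hostname.lower()
        and parts.path.startswith("/" + url_prefix.strip("/") + "/")
    )


print(matches_configuration("http://myhost/my_app/reports/Q1%20summary.txt"))  # True
print(matches_configuration("file:///c:/data/reports/Q1%20summary.txt"))       # False
```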
