What is RegExpTransformer?
It is a conversion pipelet that transforms a string using regular expressions.
How to use RegExpTransformer in Eccenca.CE.
In this "5 Minutes to success" page we will:
- Install the required pipelet from the update site
- Add a new field to the FileIndex collection to store the file dir extracted from its path
- Invoke the dir calculation from the conversion pipeline
- Crawl, search and check the results
Workflow
Install Basic Pipelet feature from Eccenca.CE update site


After you press "Apply changes", the pipelet is installed.
Modify collection by adding new field for storing file dir
Remove physical index
- Navigate to the "Collections" tab
- Remove index for FileIndex collection.
Add new text field to IndexStructure
- Navigate to FileIndex collection
- Navigate to the IndexStructure tab of the collection and add the following field to the configuration XML. It becomes field number 6.
<IndexField FieldNo="6" IndexValue="true" Name="Dir" StoreText="true" Tokenize="true" Type="Text"/>
<IndexStructure xmlns="http://www.anyfinder.de/IndexStructure" Name="FileIndex">
<Analyzer ClassName="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
<IndexField FieldNo="6" IndexValue="true" Name="Dir" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="5" IndexValue="true" Name="Content" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="4" IndexValue="true" Name="Date" StoreText="true" Tokenize="false" Type="Date"/>
<IndexField FieldNo="3" IndexValue="true" Name="Size" StoreText="true" Tokenize="false" Type="Number"/>
<IndexField FieldNo="2" IndexValue="true" Name="Extension" StoreText="true" Tokenize="false" Type="Text"/>
<IndexField FieldNo="1" IndexValue="true" Name="Filename" StoreText="true" Tokenize="true" Type="Text"/>
<IndexField FieldNo="0" IndexValue="true" Name="Path" StoreText="true" Tokenize="true" Type="Text"/>
</IndexStructure>
Change search result and default search configuration
- Navigate to the Configuration tab of the collection and add field 6 to both DefaultConfig and Result.
<Configuration xsi:schemaLocation="http://www.anyfinder.de/DataDictionary/Configuration
../xml/DataDictionaryConfiguration.xsd"
xmlns="http://www.anyfinder.de/DataDictionary/Configuration" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<DefaultConfig>
<Field FieldNo="6">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
<Field FieldNo="5">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
<Field FieldNo="4">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTDate">
<Parameter xmlns="http://www.anyfinder.de/Search/DateField" />
</FieldConfig>
</Field>
<Field FieldNo="3">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTNumber">
<Parameter xmlns="http://www.anyfinder.de/Search/NumberField" />
</FieldConfig>
</Field>
<Field FieldNo="2">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
<Field FieldNo="1">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
<Field FieldNo="0">
<FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">
<Parameter Operator="OR" Tolerance="exact" xmlns="http://www.anyfinder.de/Search/TextField" />
</FieldConfig>
</Field>
</DefaultConfig>
<Result Name="">
<ResultField FieldNo="6" Name="Dir" />
<ResultField FieldNo="4" Name="Date" />
<ResultField FieldNo="3" Name="Size" />
<ResultField FieldNo="2" Name="Extension" />
<ResultField FieldNo="1" Name="Filename" />
<ResultField FieldNo="0" Name="Path" />
</Result>
<HighlightingResult Name="">
<HighlightingResultField FieldNo="5" Name="Content" xsi:type="HLTextField">
<HighlightingTransformer Name="urn:Sentence">
<ParameterSet xmlns="http://www.brox.de/ParameterSet">
<Parameter Name="MaxLength" xsi:type="Integer">
<Value>500</Value>
</Parameter>
<Parameter Name="MaxHLElements" xsi:type="Integer">
<Value>999</Value>
</Parameter>
<Parameter Name="MaxSucceedingCharacters" xsi:type="Integer">
<Value>50</Value>
</Parameter>
<Parameter Name="SucceedingCharacters" xsi:type="String">
<Value>...</Value>
</Parameter>
<Parameter Name="SortAlgorithm" xsi:type="String">
<Value>Occurrence</Value>
</Parameter>
<Parameter Name="TextHandling" xsi:type="String">
<Value>ReturnSnipplet</Value>
</Parameter>
</ParameterSet>
</HighlightingTransformer>
<HighlightingParameter xmlns="http://www.anyfinder.de/DataDictionary/Configuration/TextHighlighting" />
</HighlightingResultField>
</HighlightingResult>
</Configuration>
Update mappings from record into index
- Navigate to Mapping tab of Collection and add mapping from attribute "Dir" to field 6.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Mapping xmlns="http://www.eccenca.com/eccenca/lucene" indexName="FileIndex">
<Attributes>
<Attribute fieldNo="0" name="Path" />
<Attribute fieldNo="1" name="Filename" />
<Attribute fieldNo="2" name="Extension" />
<Attribute fieldNo="3" name="Size" />
<Attribute fieldNo="4" name="LastModifiedDate" />
<Attribute fieldNo="6" name="Dir" />
</Attributes>
<Attachments>
<Attachment fieldNo="5" name="Text" />
</Attachments>
</Mapping>
Save collection and create physical index
- Press Save. The message "Collection 'FileIndex' was successfully updated." is shown.
- Press the create index icon to create the index.
Update index order conversion pipeline
- Navigate to Index Orders for the collection and open the "file" index order for editing. Only the processing pipeline has to change, to add the Dir calculation.
- Navigate to the "Processing pipeline" tab and add the RegExpTransformer pipelet invocation.
<process name="Convert_FileIndex_file" targetNamespace="http://www.eclipse.org/smila/processor"
xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable" xmlns:id="http://www.eclipse.org/smila/id"
xmlns:proc="http://www.eclipse.org/smila/processor" xmlns:rec="http://www.eclipse.org/smila/record"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<import importType="http://schemas.xmlsoap.org/wsdl/" location="../processor.wsdl"
namespace="http://www.eclipse.org/smila/processor" />
<partnerLinks>
<partnerLink myRole="service" name="Pipeline" partnerLinkType="proc:ProcessorPartnerLinkType" />
</partnerLinks>
<extensions>
<extension mustUnderstand="no" namespace="http://www.eclipse.org/smila/processor" />
</extensions>
<variables>
<variable messageType="proc:ProcessorMessage" name="request" />
</variables>
<sequence>
<receive createInstance="yes" name="start" operation="process" partnerLink="Pipeline"
portType="proc:ProcessorPortType" variable="request" />
<extensionActivity name="invokeSimpleMimeTypeIdentification">
<proc:invokeService>
<proc:service name="MimeTypeIdentifyService" />
<proc:variables input="request" output="request" />
</proc:invokeService>
</extensionActivity>
<if name="conditionIsText">
<condition>($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/plain")
or ($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/plain")</condition>
<extensionActivity name="invokeCopyText">
<proc:invokePipelet>
<proc:pipelet class="com.brox.anyfinder.processing.utils.CopyAttachmentPipelet" />
<proc:variables input="request" output="request" />
<proc:PipeletConfiguration>
<proc:Property name="source">
<proc:Value>Content</proc:Value>
</proc:Property>
<proc:Property name="target">
<proc:Value>Text</proc:Value>
</proc:Property>
</proc:PipeletConfiguration>
</proc:invokePipelet>
</extensionActivity>
</if>
<if name="conditionIsHtml">
<condition>($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/html")
or ($request.records/rec:Record[1]/rec:A[@n="MimeType"]/rec:L/rec:V = "text/xml")</condition>
<extensionActivity name="invokeHtml2Txt">
<proc:invokePipelet>
<proc:pipelet class="org.eclipse.smila.processing.pipelets.HtmlToTextPipelet" />
<proc:variables input="request" output="request" />
<proc:PipeletConfiguration>
<proc:Property name="inputType">
<proc:Value>ATTACHMENT</proc:Value>
</proc:Property>
<proc:Property name="outputType">
<proc:Value>ATTACHMENT</proc:Value>
</proc:Property>
<proc:Property name="inputName">
<proc:Value>Content</proc:Value>
</proc:Property>
<proc:Property name="outputName">
<proc:Value>Text</proc:Value>
</proc:Property>
<proc:Property name="meta:title">
<proc:Value>Title</proc:Value>
</proc:Property>
</proc:PipeletConfiguration>
</proc:invokePipelet>
</extensionActivity>
</if>
<extensionActivity name="invokeRegExpTransformer">
<proc:invokePipelet>
<proc:pipelet class="com.eccenca.pipelets.basic.RegExpTransformer" />
<proc:variables input="request" />
<proc:PipeletConfiguration>
<proc:Property name="Source">
<proc:Value>Path</proc:Value>
</proc:Property>
<proc:Property name="Target">
<proc:Value>Dir</proc:Value>
</proc:Property>
<proc:Property name="SourceType">
<proc:Value>ATTRIBUTE</proc:Value>
</proc:Property>
<proc:Property name="TargetType">
<proc:Value>ATTRIBUTE</proc:Value>
</proc:Property>
<proc:Property name="Value">
<proc:Value>^(.+)[\\/]+[^\\/]+$</proc:Value>
</proc:Property>
<proc:Property name="ValueIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
<proc:Property name="Translation">
<proc:Value>$1</proc:Value>
</proc:Property>
<proc:Property name="TranslationIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
<proc:Property name="IgnoreCase" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
</proc:PipeletConfiguration>
</proc:invokePipelet>
</extensionActivity>
<reply name="end" operation="process" partnerLink="Pipeline" portType="proc:ProcessorPortType" variable="request" />
<exit />
</sequence>
</process>
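To see what the Value and Translation properties above do to a path, the following Java sketch applies the pattern ^(.+)[\\/]+[^\\/]+$ with the replacement $1 using java.util.regex. It assumes that the pipelet follows Java regular expression semantics and performs a replace-all style substitution. The class and method names are hypothetical and are not part of the pipelet.

```java
import java.util.regex.Pattern;

// Sketch only: imitates the configured Value/Translation pair with plain
// java.util.regex. This is NOT the pipelet's actual implementation.
class DirExtractionSketch {

    // Same pattern as the "Value" property. The backslashes are doubled once
    // more here because the pattern is written as a Java string literal.
    // CASE_INSENSITIVE mirrors the "IgnoreCase" property (assumed mapping).
    static final Pattern PATH =
            Pattern.compile("^(.+)[\\\\/]+[^\\\\/]+$", Pattern.CASE_INSENSITIVE);

    // Returns everything before the last run of path separators.
    static String extractDir(String path) {
        // "$1" corresponds to the "Translation" property: keep group 1 only.
        return PATH.matcher(path).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(extractDir("/home/user/docs/readme.txt"));
        System.out.println(extractDir("C:\\data\\docs\\file.txt"));
    }
}
```

Group 1 is greedy, so it runs up to the last separator, and both "/" and "\" are accepted as separators. A path without any separator does not match the pattern; with this sketch such input comes back unchanged, and how the pipelet itself treats it is not covered here.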
Press the save button. The message "Index Order 'file' was updated successfully" is shown.
Create the index, run the index order, then run a search and check the file dirs in the results.

More configuration examples
Appending a configured string
This configuration appends a "World" string to the source string, making "Hello World" out of source "Hello ":
<proc:Property name="Value">
<proc:Value>(^(?:.|\n)*$)</proc:Value>
</proc:Property>
<proc:Property name="ValueIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
<proc:Property name="Translation">
<proc:Value>$1World</proc:Value>
</proc:Property>
<proc:Property name="TranslationIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
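The pattern (^(?:.|\n)*$) is written with (?:.|\n) rather than a plain dot because, in Java regular expressions, the dot does not match line breaks by default. The sketch below shows the effect on single-line and multi-line input. It assumes the pipelet behaves like a Java replace-all; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch only: reproduces the append configuration with java.util.regex.
class AppendSketch {

    // Same pattern as the "Value" property (\n is escaped for the Java literal).
    static final Pattern WHOLE = Pattern.compile("(^(?:.|\\n)*$)");

    // Appends "World" to the whole source string, line breaks included.
    static String appendWorld(String source) {
        // "$1World" corresponds to the "Translation" property.
        return WHOLE.matcher(source).replaceAll("$1World");
    }

    public static void main(String[] args) {
        System.out.println(appendWorld("Hello "));
        System.out.println(appendWorld("Hello\nbig "));
    }
}
```

With a plain (^.*$) the multi-line input would not match at all, so nothing would be appended to it.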
Prepending a prefix
This configuration prepends a "Hello" string to the source string, making "Hello World" out of source " World":
<proc:Property name="Value">
<proc:Value>(^(?:.|\n)*$)</proc:Value>
</proc:Property>
<proc:Property name="ValueIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
<proc:Property name="Translation">
<proc:Value>Hello$1</proc:Value>
</proc:Property>
<proc:Property name="TranslationIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
Replacing source string with a configured one
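This configuration replaces the whole source string with the string "MyReplace". TranslationIsPattern is set to false here because the replacement contains no group references: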
<proc:Property name="Value">
<proc:Value>^(?:.|\n)*$</proc:Value>
</proc:Property>
<proc:Property name="ValueIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
<proc:Property name="Translation">
<proc:Value>MyReplace</proc:Value>
</proc:Property>
<proc:Property name="TranslationIsPattern" type="java.lang.Boolean">
<proc:Value>false</proc:Value>
</proc:Property>
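One way to picture TranslationIsPattern=false is a replacement that is inserted literally, so that characters such as "$" are not read as group references. The Java sketch below does this with Matcher.quoteReplacement. That the pipelet works this way is an assumption; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: whole-string replacement with a literal (non-pattern) value.
class LiteralReplaceSketch {

    // Same pattern as the "Value" property: matches the entire source string.
    static final Pattern WHOLE = Pattern.compile("^(?:.|\\n)*$");

    static String replaceWhole(String source, String literal) {
        // quoteReplacement escapes '$' and '\' so the text is used as-is.
        return WHOLE.matcher(source).replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(replaceWhole("any old text", "MyReplace"));
        System.out.println(replaceWhole("any old text", "costs $1"));
    }
}
```

Without the quoting step, a replacement such as "costs $1" would fail in Java because the pattern defines no group 1.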
Stripping unwanted characters
This configuration removes every character from the source string except "A-Z", "a-z", "0-9" and whitespace:
<proc:Property name="Value">
<proc:Value>([^A-Za-z0-9\s])</proc:Value>
</proc:Property>
<proc:Property name="ValueIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
<proc:Property name="Translation">
<proc:Value></proc:Value>
</proc:Property>
<proc:Property name="TranslationIsPattern" type="java.lang.Boolean">
<proc:Value>true</proc:Value>
</proc:Property>
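The effect of this character class can be checked with a small Java sketch. It assumes Java regular expression semantics and a replace-all substitution with an empty replacement; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch only: strips everything except ASCII letters, digits and whitespace.
class StripSketch {

    // Same pattern as the "Value" property (\s is escaped for the Java literal).
    static final Pattern UNWANTED = Pattern.compile("([^A-Za-z0-9\\s])");

    static String strip(String source) {
        // The empty string corresponds to the empty "Translation" property.
        return UNWANTED.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, World! #42"));
    }
}
```

Because the class lists only ASCII ranges, accented and other non-ASCII letters are removed as well, which may or may not be wanted for file names in other languages.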