public interface CustomConverterPlugin extends ConverterBase
A minimal custom converter needs to implement the convert method, which
applies the conversion to each Document object and publishes possibly
multiple result Document objects via the given publisher of type
Publisher<? super Document>. Developers can also supply canProcess,
canReadMetadataFrom, and postProcess in order to control which crawl-data
is processed by this converter, which crawl-data is used as metadata for
conversion, and to set the content-type of the resulting crawl-data.
For example, canProcess is typically implemented as follows.
@Override
public boolean canProcess(CrawlData data) {
    if (getGeneralSettings() != null) {
        return CustomConverterPlugin.super.canProcess(data);
    } else {
        return "text/csv".equals(data.getContentType());
    }
}
If a converter configuration is available, this uses the default
implementation of the method, which checks the typeIn and conditional
fields of the general settings section of the configuration. Otherwise,
it checks the content-type of the input crawl-data manually.
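The publishing pattern described for convert (one input document, possibly several output documents handed to a publisher) can be pictured with the following self-contained sketch. It does not use the real framework types: the nested Document class, the Publisher interface, and its publish method are hypothetical stand-ins, and the actual AXMLBase.Publisher and Document APIs may look different.

```java
import java.util.ArrayList;
import java.util.List;

public class MinimalConverterSketch {

    // Hypothetical stand-in for the framework's Document type.
    static final class Document {
        final String url;
        final String content;

        Document(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    // Hypothetical stand-in for AXMLBase.Publisher; the real method name may differ.
    interface Publisher<T> {
        void publish(T item);
    }

    // Publishes one output document per non-empty line of the input document.
    static void convert(Document input, Publisher<? super Document> publisher) {
        int row = 0;
        for (String line : input.content.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            publisher.publish(new Document(input.url + "#row" + row, line.trim()));
            row++;
        }
    }

    public static void main(String[] args) {
        List<Document> sink = new ArrayList<>();
        // The list's add method serves as the publisher in this sketch.
        convert(new Document("file:///data.csv", "a,1\nb,2\n"), sink::add);
        for (Document d : sink) {
            System.out.println(d.url + " -> " + d.content);
        }
    }
}
```

Accepting Publisher<? super Document> rather than Publisher<Document> lets the caller pass a publisher declared for a supertype of Document, which is why the signature in this interface uses the wildcard.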
Nested classes/interfaces inherited from interface ConverterBase: ConverterBase.ConverterStatus
Modifier and Type | Method and Description |
---|---|
default void | configure(java.util.Map<java.lang.String,java.lang.String> parameters) Interface method to configure this converter plugin by given parameters. |
default ConverterBase.ConverterStatus | convert(Document document, AXMLBase.Publisher<? super Document> documentPublisher, java.lang.Iterable<Document> metaDocumentList) Interface method for implementing document-level conversion. |
boolean | forcesDeletion() The implementation of this method defines a flag as to whether or not the deletion is issued for any old document with url that matches by prefix to url of crawl-data that is processed by this converter, before the results of this converter are added to the dataset. |
default ConverterConfiguration | getConfiguration() |
static void | main(java.lang.String[] args) Command line test runner's main method. |
boolean | removesMetadata() The implementation of this method defines a flag as to whether or not, after the conversion, this converter removes metadata, i.e., crawl-data for which canReadMetadataFrom is true, from the output. |
Methods inherited from interface ConverterBase: canProcess, canReadMetadataFrom, getGeneralSettings, logger, postProcess
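The effect of forcesDeletion can be modeled with a small self-contained sketch. It is only an illustration of the documented behavior, not the framework's implementation: the dataset is represented as a plain Map from url to content, the method name replaceByPrefix is hypothetical, and the sketch assumes that "matches by prefix" means the old document's url starts with the url of the processed crawl-data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixDeletionSketch {

    // Models the dataset update: when forcesDeletion is true, old entries whose
    // url starts with the crawl-data url are removed before the results are added.
    static void replaceByPrefix(Map<String, String> dataset,
                                String crawlDataUrl,
                                Map<String, String> results,
                                boolean forcesDeletion) {
        if (forcesDeletion) {
            // Assumption: an old document matches if its url begins with crawlDataUrl.
            dataset.keySet().removeIf(url -> url.startsWith(crawlDataUrl));
        }
        dataset.putAll(results);
    }

    public static void main(String[] args) {
        Map<String, String> dataset = new LinkedHashMap<>();
        dataset.put("http://example.com/a.csv#row0", "stale row");
        dataset.put("http://example.com/a.csv#row1", "stale row");
        dataset.put("http://example.com/b.csv#row0", "unrelated row");

        Map<String, String> results = new LinkedHashMap<>();
        results.put("http://example.com/a.csv#row0", "fresh row");

        replaceByPrefix(dataset, "http://example.com/a.csv", results, true);

        // The stale #row1 entry is gone; the unrelated b.csv entry is untouched.
        dataset.forEach((url, content) -> System.out.println(url + " -> " + content));
    }
}
```

Without the deletion step, the stale #row1 entry would remain in the dataset even though the new conversion no longer produces it, which is the situation this flag is meant to prevent.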
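boolean removesMetadata()
Description copied from interface: ConverterBase
The implementation of this method defines a flag as to whether or not, after
the conversion, this converter removes metadata, i.e., crawl-data for which
canReadMetadataFrom is true, from the output.
Specified by: removesMetadata in interface ConverterBase
Returns: true, if metadata needs to be removed.

boolean forcesDeletion()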
Description copied from interface: ConverterBase
The implementation of this method defines a flag as to whether or not the
deletion is issued for any old document with url that matches by prefix to
url of crawl-data that is processed by this converter, before the results
of this converter are added to the dataset.
Specified by: forcesDeletion in interface ConverterBase
Returns: true, if the deletion of old documents with matching url should be issued.

default void configure(java.util.Map<java.lang.String,java.lang.String> parameters)
Interface method to configure this converter plugin by given parameters.
Parameters: parameters - key-value map

default ConverterConfiguration getConfiguration()
Specified by: getConfiguration in interface ConverterBase
Returns: ConverterConfiguration of this converter, which is set by the converter pipeline framework.

default ConverterBase.ConverterStatus convert(Document document, AXMLBase.Publisher<? super Document> documentPublisher, java.lang.Iterable<Document> metaDocumentList) throws AmaIngestionException
Interface method for implementing document-level conversion. This method is
called on each AXML document, if the enclosing crawl-data has an appropriate
content-type as checked by canProcess().
Parameters:
document - input AXML document, enclosed in crawl-data for which canProcess is true.
documentPublisher - publisher for output AXML documents
metaDocumentList - metadata documents used for conversion, extracted from crawl-data for which canReadMetadataFrom is true.
Throws: AmaIngestionException

static void main(java.lang.String[] args) throws java.lang.Exception
Command line test runner's main method. Use this method to test and debug
your converter plugin as a stand-alone application, e.g., from your Java IDE.
java com.ibm.es.ama.zing.common.model.converter.CustomConverterPlugin <class name> <input axml> [<key>=<value>]*
The <key>=<value> arguments are passed to the configure method.
Note that when running your plugin on the test runner, no converter
configuration is supplied, so the values of getConfiguration and
getGeneralSettings are both null.
Parameters: args - command line args.
Throws: java.lang.Exception
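The usage line above accepts trailing <key>=<value> arguments, and configure takes a Map<String,String>. How the test runner actually parses its arguments is not documented here; the following is only a sketch of one plausible way to turn such arguments into a parameter map. The class name, the parseParameters helper, and the choice to split on the first '=' are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TestRunnerArgsSketch {

    // Collects <key>=<value> arguments, starting at firstIndex, into an ordered map.
    // Assumption: the key ends at the first '=', so values may themselves contain '='.
    static Map<String, String> parseParameters(String[] args, int firstIndex) {
        Map<String, String> parameters = new LinkedHashMap<>();
        for (int i = firstIndex; i < args.length; i++) {
            int eq = args[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected <key>=<value>: " + args[i]);
            }
            parameters.put(args[i].substring(0, eq), args[i].substring(eq + 1));
        }
        return parameters;
    }

    public static void main(String[] args) {
        // Hypothetical command line: <class name> <input axml> followed by key=value pairs.
        String[] sample = {"com.example.MyConverter", "input.axml", "delimiter=;", "skipHeader=true"};
        Map<String, String> parameters = parseParameters(sample, 2);
        parameters.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Because the test runner supplies no converter configuration, a plugin exercised this way has to rely on such parameters (or on defaults) rather than on getConfiguration or getGeneralSettings.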