public interface ConverterBase
| Modifier and Type | Class and Description |
|---|---|
| `static class` | `ConverterBase.ConverterStatus`: Represents the status of a conversion. |
| Modifier and Type | Method and Description |
|---|---|
| `default boolean` | `canProcess(CrawlData data)`: Tests whether or not the given crawl-data can be processed by this converter, typically by examining the content-type and url of the crawl-data. |
| `default boolean` | `canReadMetadataFrom(CrawlData data)`: Tests whether or not the given crawl-data can be used as metadata during the conversion. |
| `default boolean` | `forcesDeletion()`: Defines a flag indicating whether, before the results of this converter are added to the dataset, a deletion is issued for any old document whose url matches, by prefix, the url of crawl-data processed by this converter. |
| `ConverterConfiguration` | `getConfiguration()` |
| `default ConverterConfiguration.GeneralSettings` | `getGeneralSettings()` |
| `default java.util.logging.Logger` | `logger()` |
| `default void` | `postProcess(CrawlData crawlData, CrawlData.Builder builder)`: Modifies crawl-data after the conversion. |
| `default boolean` | `removesMetadata()`: Defines a flag indicating whether, after the conversion, this converter removes metadata (i.e., crawl-data for which `canReadMetadataFrom` is true) from the output. |
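The boolean hooks in the table above are meant to be overridden by a concrete converter. The following is a minimal sketch only: `SimpleCrawlData`, `SimpleConverterBase`, and `ZipArchiveConverter` are hypothetical stand-ins (not part of this API) so that the example compiles on its own, and the stand-in interface simply returns `false` from every default instead of consulting `getGeneralSettings` like the real defaults do.

```java
/** Hypothetical, simplified stand-in for CrawlData: just a url and a content-type. */
final class SimpleCrawlData {
    private final String url;
    private final String contentType;

    SimpleCrawlData(String url, String contentType) {
        this.url = url;
        this.contentType = contentType;
    }

    String getUrl() { return url; }

    String getContentType() { return contentType; }
}

/**
 * Hypothetical, trimmed-down mirror of ConverterBase's boolean hooks.
 * Assumption: every default returns false here; the real defaults consult getGeneralSettings().
 */
interface SimpleConverterBase {
    default boolean canProcess(SimpleCrawlData data) { return false; }

    default boolean canReadMetadataFrom(SimpleCrawlData data) { return false; }

    default boolean removesMetadata() { return false; }

    default boolean forcesDeletion() { return false; }
}

/** Hypothetical converter that handles ZIP archives and reads "*.meta.json" files as metadata. */
class ZipArchiveConverter implements SimpleConverterBase {
    @Override
    public boolean canProcess(SimpleCrawlData data) {
        // Judge from the content-type and the url of the crawl-data.
        return "application/zip".equals(data.getContentType())
                || data.getUrl().endsWith(".zip");
    }

    @Override
    public boolean canReadMetadataFrom(SimpleCrawlData data) {
        // Only sidecar files named like "*.meta.json" are treated as metadata.
        return data.getUrl().endsWith(".meta.json");
    }

    @Override
    public boolean removesMetadata() {
        // Drop the metadata crawl-data from the output once the conversion is done.
        return true;
    }

    @Override
    public boolean forcesDeletion() {
        // Ask for old documents matching this crawl-data's url to be deleted first.
        return true;
    }
}
```

Overriding only the hooks that differ from the defaults keeps each converter small; everything else falls back to the interface's `default` methods.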
`default java.util.logging.Logger logger()`

`ConverterConfiguration getConfiguration()`

Returns: the `ConverterConfiguration` of this converter, which is set by the converter pipeline framework.

`default ConverterConfiguration.GeneralSettings getGeneralSettings()`

Returns: the `ConverterConfiguration.GeneralSettings` section of the converter configuration, which includes information about `typeIn`, `conditional`, `typeOut`, and `typeMeta`.

`default boolean canProcess(CrawlData data)`
The implementation of this method tests whether or not the given crawl-data can be processed by this converter, typically by examining the content-type and url of the crawl-data. If `getGeneralSettings` returns a non-null value, the default implementation decides from the `typeIn` and `conditional` parameters of the given converter configuration. Otherwise it returns `false`.

Parameters: `data` - crawl-data

Returns: `true`, if the crawl-data should be processed for conversion.

`default boolean canReadMetadataFrom(CrawlData data)`
The implementation of this method tests whether or not the given crawl-data can be used as metadata during the conversion. If `getGeneralSettings` returns a non-null value, the default implementation examines whether the url of the crawl-data matches the `typeMeta` parameter of the given converter configuration. Otherwise it returns `false`.

Parameters: `data` - crawl-data

Returns: `true`, if the crawl-data can be used as metadata.

`default void postProcess(CrawlData crawlData, CrawlData.Builder builder)`
The implementation of this method modifies crawl-data after the conversion. If `getGeneralSettings` returns a non-null value, the default implementation sets the value of `typeOut` from the given converter configuration.

Parameters: `crawlData` - input crawl-data before conversion; `builder` - builder for the new crawl-data after conversion

`default boolean removesMetadata()`
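The implementation of this method defines a flag indicating whether, after the conversion, this converter removes metadata, i.e., crawl-data for which `canReadMetadataFrom` is true, from the output.

Returns: `true`, if metadata needs to be removed.

`default boolean forcesDeletion()`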
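The implementation of this method defines a flag indicating whether a deletion is issued for any old document whose url matches, by prefix, the url of crawl-data processed by this converter, before the results of this converter are added to the dataset.

Returns: `true`, if the deletion of old documents with a matching url should be issued.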
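The prefix-based deletion that `forcesDeletion` controls can be illustrated with a hedged sketch over a hypothetical in-memory dataset. `InMemoryDataset` and `addResults` are illustrative names, not part of this API, and the sketch assumes that "matches by prefix" means an old document's url starts with the url of the processed crawl-data; the real pipeline may define the match differently.

```java
/** Hypothetical in-memory dataset keyed by url, used only to illustrate forcesDeletion(). */
final class InMemoryDataset {
    private final java.util.Map<String, String> docs = new java.util.TreeMap<>();

    void put(String url, String doc) {
        docs.put(url, doc);
    }

    java.util.Set<String> urls() {
        return docs.keySet();
    }

    /**
     * Adds a converter's results. If the converter forces deletion, every old document
     * whose url starts with the processed crawl-data's url is removed first
     * (the prefix direction is an assumption of this sketch).
     */
    void addResults(boolean forcesDeletion, String processedUrl,
                    java.util.Map<String, String> results) {
        if (forcesDeletion) {
            // Deletion happens before the new results are added to the dataset.
            docs.keySet().removeIf(url -> url.startsWith(processedUrl));
        }
        docs.putAll(results);
    }
}
```

For example, re-converting an archive at `http://example.com/a.zip` with deletion forced drops stale entries such as `http://example.com/a.zip/old1.txt` before the fresh results are added, while an unrelated document such as `http://example.com/b.html` is kept.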