public interface ConverterBase
Modifier and Type | Interface and Description |
---|---|
static class | ConverterBase.ConverterStatus: Represents the status of a conversion. |

Modifier and Type | Method and Description |
---|---|
default boolean | canProcess(CrawlData data): Tests whether the given crawl-data can be processed by this converter, typically by examining the content-type and url of the crawl-data. |
default boolean | canReadMetadataFrom(CrawlData data): Tests whether the given crawl-data can be used as metadata during the conversion. |
default boolean | forcesDeletion(): Flags whether, before the results of this converter are added to the dataset, a deletion is issued for every old document whose url matches by prefix the url of the crawl-data processed by this converter. |
ConverterConfiguration | getConfiguration() |
default ConverterConfiguration.GeneralSettings | getGeneralSettings() |
default java.util.logging.Logger | logger() |
default void | postProcess(CrawlData crawlData, CrawlData.Builder builder): Modifies the crawl-data after the conversion. |
default boolean | removesMetadata(): Flags whether, after the conversion, this converter removes metadata (crawl-data for which canReadMetadataFrom is true) from the output. |
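As an illustration of how an implementation might override these methods, the following is a minimal sketch rather than the library's own code. CrawlDataStub, ConverterStub, and PdfConverter are hypothetical stand-ins introduced here; the real CrawlData and ConverterBase types are not reproduced, and the ".meta.json" naming rule is an assumption chosen only for the example.

```java
import java.util.Locale;

/** Sketch only: every type here is a hypothetical stand-in, not the library's API. */
final class ConverterOverrideSketch {

    /** Stand-in for CrawlData, reduced to the two properties the text mentions. */
    static final class CrawlDataStub {
        final String url;
        final String contentType;

        CrawlDataStub(String url, String contentType) {
            this.url = url;
            this.contentType = contentType;
        }
    }

    /** Stand-in that borrows three method names from ConverterBase. */
    interface ConverterStub {
        boolean canProcess(CrawlDataStub data);
        boolean canReadMetadataFrom(CrawlDataStub data);
        boolean removesMetadata();
    }

    /** Example converter: accepts PDF documents and reads "*.meta.json" files as metadata. */
    static final class PdfConverter implements ConverterStub {
        @Override
        public boolean canProcess(CrawlDataStub data) {
            if (data == null || data.contentType == null || data.url == null) {
                return false;
            }
            // Examine the content-type, and exclude urls that are treated as metadata.
            String type = data.contentType.toLowerCase(Locale.ROOT);
            return type.startsWith("application/pdf") && !canReadMetadataFrom(data);
        }

        @Override
        public boolean canReadMetadataFrom(CrawlDataStub data) {
            // Assumed naming rule for metadata files, for illustration only.
            return data != null && data.url != null && data.url.endsWith(".meta.json");
        }

        @Override
        public boolean removesMetadata() {
            // Metadata entries are dropped from the output in this example.
            return true;
        }
    }

    public static void main(String[] args) {
        ConverterStub converter = new PdfConverter();
        CrawlDataStub pdf = new CrawlDataStub("https://example.com/a.pdf", "application/pdf");
        CrawlDataStub meta = new CrawlDataStub("https://example.com/a.pdf.meta.json", "application/json");
        System.out.println(converter.canProcess(pdf));
        System.out.println(converter.canReadMetadataFrom(meta));
    }
}
```

In this sketch the metadata file is rejected by canProcess and accepted by canReadMetadataFrom, so a single crawl-data item plays only one of the two roles.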
default java.util.logging.Logger logger()

ConverterConfiguration getConfiguration()
Returns: the ConverterConfiguration of this converter, which is set by the converter pipeline framework.

default ConverterConfiguration.GeneralSettings getGeneralSettings()
Returns: the ConverterConfiguration.GeneralSettings section of the converter configuration, which includes information about typeIn, conditional, typeMeta, and typeOut.

default boolean canProcess(CrawlData data)
Tests whether the given crawl-data can be processed by this converter, typically by examining the content-type and url of the crawl-data.

If getGeneralSettings returns a non-null value, the default implementation decides this from the typeIn and conditional parameters of the given converter configuration. Otherwise it returns false.

Parameters: data - crawl-data
Returns: true, if the crawl-data should be processed for conversion.

default boolean canReadMetadataFrom(CrawlData data)
Tests whether the given crawl-data can be used as metadata during the conversion. If getGeneralSettings returns a non-null value, the default implementation examines whether the url of the crawl-data matches the typeMeta parameter of the given converter configuration. Otherwise it returns false.

Parameters: data - crawl-data
Returns: true, if the crawl-data can be used as metadata.

default void postProcess(CrawlData crawlData, CrawlData.Builder builder)
Modifies the crawl-data after the conversion. If getGeneralSettings returns a non-null value, the default implementation sets the value of typeOut of the given converter configuration.

Parameters:
crawlData - input crawl-data before conversion
builder - builder for new crawl-data after conversion

default boolean removesMetadata()
Flags whether, after the conversion, this converter removes metadata (crawl-data for which canReadMetadataFrom is true) from the output.

Returns: true, if metadata needs to be removed.

default boolean forcesDeletion()
Flags whether, before the results of this converter are added to the dataset, a deletion is issued for every old document whose url matches by prefix the url of the crawl-data processed by this converter.

Returns: true, if the deletion of old documents with matching url should be issued.
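
The effect of forcesDeletion can be modelled as follows. This is a minimal sketch under stated assumptions, not the framework's implementation: the dataset is reduced to a hypothetical map from url to document text, applyResults is an invented helper, and "matches by prefix" is read as "the old document's url starts with the url of the processed crawl-data".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: models the dataset as a url-to-text map; not the framework's API. */
final class ForcedDeletionSketch {

    /**
     * Returns a copy of the dataset after one converter run.
     * When forcesDeletion is true, old documents whose url starts with
     * processedUrl are removed before the new results are added.
     */
    static Map<String, String> applyResults(Map<String, String> dataset,
                                            String processedUrl,
                                            Map<String, String> results,
                                            boolean forcesDeletion) {
        Map<String, String> updated = new LinkedHashMap<>(dataset);
        if (forcesDeletion) {
            // Prefix match (assumed reading): drop stale entries under the processed url.
            updated.keySet().removeIf(oldUrl -> oldUrl.startsWith(processedUrl));
        }
        updated.putAll(results);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> dataset = new LinkedHashMap<>();
        dataset.put("https://example.com/archive.zip/old.txt", "old entry");
        dataset.put("https://example.com/other.txt", "unrelated");

        Map<String, String> results = new LinkedHashMap<>();
        results.put("https://example.com/archive.zip/new.txt", "new entry");

        // With forced deletion, old.txt is gone; other.txt and new.txt remain.
        System.out.println(applyResults(dataset, "https://example.com/archive.zip", results, true).keySet());
    }
}
```

Without forced deletion the stale old.txt entry would remain alongside new.txt, which is the situation the flag is meant to prevent for converters whose output set can shrink between runs.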