Content Rewriter Module

The Content Rewriter module defines a simple API for bridging content rewriting libraries such as HtmlCleaner. By default, the module includes an HtmlCleaner-based content rewriter component as well as a simple text-line-based content rewriter component.


If your project uses Apache Maven, add the following dependency to use this module.


Class Diagram and Descriptions

Here is a class diagram showing the major interfaces and classes.

Class Diagram
Interface or Class               Description
ContentRewriter                  The main interface of the Content Rewriter module. It defines a simple #rewrite(source, sink, context) method, which rewrites content from the source to the sink using the given context.
Source                           Abstraction of the input source, which can give either or from the underlying content data source.
Sink                             Abstraction of the output target, which can give either or to the underlying content data sink.
ContentRewritingContext          Abstraction of the current content rewriting context, which allows attribute objects to be set, retrieved, and iterated by name. It is designed for cases where the caller needs to set environment-specific attributes that the content rewriter component references during the process.
AbstractTextLineContentRewriter  An abstract, simple content rewriting component implementation that reads each line from the source and tries to rewrite it to the sink. Derived classes may simply override the #rewriteLine(String line, ContentRewritingContext context) method, reading the line and the context attributes and returning the transformed line. For example, a derived class may use regular expressions to transform the line.
HtmlCleanerContentRewriter       An HtmlCleaner-based content rewriting component implementation that exposes properties for the underlying HtmlCleaner instance and simplifies usage for most use cases.
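
The text-line approach described for AbstractTextLineContentRewriter can be illustrated with a small, self-contained sketch. Note that the types below (SketchContext, SketchTextLineRewriter, LinkPrefixRewriter) and the "baseUrl" attribute name are hypothetical stand-ins written for this example only. They are not the module's actual classes, packages, or signatures. They only mimic the described contract: read each line, pass it together with the context to rewriteLine, and write the result to the sink.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RewriterSketch {

    // Hypothetical stand-in for the context abstraction: attributes by name.
    static class SketchContext {
        private final Map<String, Object> attributes = new HashMap<>();

        void setAttribute(String name, Object value) {
            attributes.put(name, value);
        }

        Object getAttribute(String name) {
            return attributes.get(name);
        }
    }

    // Hypothetical stand-in for a line-based rewriter base class.
    abstract static class SketchTextLineRewriter {

        // Reads the source line by line and writes each rewritten line to the sink.
        void rewrite(Reader source, Writer sink, SketchContext context) throws IOException {
            BufferedReader in = new BufferedReader(source);
            String line;
            while ((line = in.readLine()) != null) {
                sink.write(rewriteLine(line, context));
                sink.write('\n');
            }
            sink.flush();
        }

        // Derived classes transform one line and return the result.
        protected abstract String rewriteLine(String line, SketchContext context);
    }

    // Example derived class: prefixes root-relative links with a base URL
    // taken from the context (the "baseUrl" attribute name is an assumption).
    static class LinkPrefixRewriter extends SketchTextLineRewriter {

        // Matches href="/ but skips protocol-relative links such as href="//cdn...
        private static final Pattern ROOT_HREF = Pattern.compile("href=\"/(?!/)");

        @Override
        protected String rewriteLine(String line, SketchContext context) {
            String base = (String) context.getAttribute("baseUrl");
            String replacement = Matcher.quoteReplacement("href=\"" + base + "/");
            return ROOT_HREF.matcher(line).replaceAll(replacement);
        }
    }

    // Convenience helper for the example: rewrites a whole string in memory.
    static String rewriteAll(String input, String baseUrl) {
        SketchContext context = new SketchContext();
        context.setAttribute("baseUrl", baseUrl);
        StringWriter sink = new StringWriter();
        try {
            new LinkPrefixRewriter().rewrite(new StringReader(input), sink, context);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        // prints: <a href="https://example.org/news">News</a>
        System.out.print(rewriteAll("<a href=\"/news\">News</a>", "https://example.org"));
    }
}
```

Passing the base URL through the context rather than a constructor argument mirrors the stated purpose of the context abstraction: the caller supplies environment-specific values at rewrite time, so one rewriter instance can be reused across environments.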


You can find example code in the unit tests: