Overview

The Reverse Proxy Service is a servlet component that acts as a reverse proxy. It is normally used by the Reverse Proxy IFrame portlet, which replaces the original SRC URLs with proxied URLs that point to this reverse proxy service.

The Reverse Proxy Service is basically stateless, like other ordinary proxy servers. Client state such as cookies is simply passed through the Reverse Proxy Service between the clients and the target servers. This differs from other HTTP Client based Web Content Portlets, which normally manage client state on the server side.

With the Reverse Proxy Service component, more sophisticated content can be served by configuring content rewriting, and Cross-Domain Scripting can be enabled. The Web Content Portlet Application includes the Reverse Proxy Service servlet and related components.

Note: The ReverseProxyService must be able to access the application-scoped session attributes shared by an IFrameGenericPortlet portlet or one of its descendants, in conformance with the Portlet Specification. If you are using Tomcat, be sure to set the following in server.xml to meet this Portlet API session management requirement. Modify the Connector element (by default on port 8080) by adding the following attribute:

emptySessionPath="true"

The ReverseProxyService component is initialized by the ReverseProxyServlet configured in the web.xml as follows:

<!-- Default Reverse Proxy Servlet -->
<servlet>
  <servlet-name>ReverseProxyServlet</servlet-name>
  <servlet-class>org.apache.portals.applications.webcontent.proxy.impl.DefaultHttpReverseProxyServlet</servlet-class>
  <init-param>
    <param-name>reverseproxy.configuration</param-name>
    <param-value>/WEB-INF/conf/reverseproxy*.properties</param-value>
  </init-param>
  <init-param>
    <param-name>reverseproxy.configuration.refresh.delay</param-name>
    <param-value>60000</param-value>
  </init-param>
  <load-on-startup>11</load-on-startup>
</servlet>

<!-- Map /rproxy path to the Default Reverse Proxy Servlet -->
<servlet-mapping>
  <servlet-name>ReverseProxyServlet</servlet-name>
  <url-pattern>/rproxy/*</url-pattern>
</servlet-mapping>
          

The above servlet can have the following init parameters:

Name: reverseproxy.configuration
Example Value:
  /WEB-INF/conf/reverseproxy.properties
  /WEB-INF/conf/reverseproxy*.properties
  file:///etc/portals/reverseproxy.properties
  file:///etc/portals/reverseproxy*.properties
  classpath:/META-INF/conf/reverseproxy.properties
  classpath:/META-INF/conf/reverseproxy*.properties
Description: The context relative path of the configuration properties file. The configuration path can also be an absolute file path prefixed by 'file:', or a classpath resource prefixed by 'classpath:'. The path can contain a glob expression, in which case all configuration files matching the glob expression are loaded.

Name: reverseproxy.configuration.refresh.delay
Example Value: 60000
Description: The automatic refresh interval in milliseconds. If this value is set to a positive number, the servlet periodically checks whether the configuration file has been modified and reloads the configuration if there are any changes. By default, this value is zero, which disables automatic refreshing.
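
The refresh behaviour described for reverseproxy.configuration.refresh.delay can be pictured with a small sketch. This is not the servlet's actual code: the class RefreshCheckSketch and its method names are hypothetical, and timestamps are passed in explicitly so that the logic is easy to follow.

```java
/** Hypothetical sketch of a "reload only if modified" check; not the servlet's real code. */
final class RefreshCheckSketch {
    private final long refreshDelayMillis;
    private long lastCheckedAt;
    private long lastSeenModified;

    RefreshCheckSketch(long refreshDelayMillis, long startedAt, long initialLastModified) {
        this.refreshDelayMillis = refreshDelayMillis;
        this.lastCheckedAt = startedAt;
        this.lastSeenModified = initialLastModified;
    }

    /** Returns true when the configuration file should be reloaded. */
    boolean shouldReload(long now, long fileLastModified) {
        if (refreshDelayMillis <= 0) {
            return false; // zero (the default) disables automatic refreshing
        }
        if (now - lastCheckedAt < refreshDelayMillis) {
            return false; // the refresh interval has not elapsed yet
        }
        lastCheckedAt = now;
        if (fileLastModified > lastSeenModified) {
            lastSeenModified = fileLastModified;
            return true; // the file changed since the last check
        }
        return false;
    }

    public static void main(String[] args) {
        RefreshCheckSketch check = new RefreshCheckSketch(60000L, 0L, 100L);
        System.out.println(check.shouldReload(30000L, 200L));  // interval not elapsed
        System.out.println(check.shouldReload(60000L, 200L));  // elapsed and modified
        System.out.println(check.shouldReload(120000L, 200L)); // elapsed but unchanged
    }
}
```

In a real deployment the two timestamps would typically come from System.currentTimeMillis() and File.lastModified().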

In the above servlet mapping configuration, the entry path of this reverse proxy servlet is mapped to '/rproxy/*', so the local path info that remains after '/rproxy' is used to select the remote URL. These mappings and other, more advanced HTTP parameters are configured in '/WEB-INF/conf/reverseproxy.properties' by default. A simple configuration example follows:

# A very simple configuration of reverseproxy.properties
#
# Proxy Pass Reverse Mapping configurations for each category
# ... Put the path item names here. The path items are evaluated in the order listed.
proxy.reverse.pass = apache, portals

proxy.reverse.pass.apache.local = /apache/
proxy.reverse.pass.apache.remote = http://www.apache.org/

proxy.reverse.pass.portals.local = /portals/
proxy.reverse.pass.portals.remote = http://portals.apache.org/
          

In the above example, just two path mappings are defined: One for http://www.apache.org/ and the other for http://portals.apache.org/. For http://www.apache.org/, the local path, /apache/, is mapped. And for http://portals.apache.org/, the local path, /portals/, is mapped.

That is, the servlet path mappings are as follows:

  • /webcontent/rproxy/apache/* --> http://www.apache.org/*
  • /webcontent/rproxy/portals/* --> http://portals.apache.org/*
Note: '/webcontent' is just a context path and '/rproxy' is just a servlet mapping for the reverse proxy servlet.

So, if you visit a reverse proxy URL such as 'http://localhost:8080/webcontent/rproxy/portals/index.html' in your browser, you can browse the Apache Portals homepage via the reverse proxy component!
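
The prefix mapping described above can be sketched in plain Java. This is only an illustration of the idea, not the service's actual implementation: the class SimplePathMapperSketch and its methods are hypothetical names introduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: maps a servlet path info to a remote URL by local path prefix. */
final class SimplePathMapperSketch {
    // Insertion order is kept because path items are evaluated in order.
    private final Map<String, String> mappings = new LinkedHashMap<>();

    void add(String localBasePath, String remoteBaseUrl) {
        mappings.put(localBasePath, remoteBaseUrl);
    }

    /** Returns the remote URL for the path info, or null when no mapping matches. */
    String toRemoteUrl(String pathInfo) {
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            if (pathInfo.startsWith(e.getKey())) {
                // Keep whatever follows the local base path and append it to the remote base URL.
                return e.getValue() + pathInfo.substring(e.getKey().length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SimplePathMapperSketch mapper = new SimplePathMapperSketch();
        mapper.add("/apache/", "http://www.apache.org/");
        mapper.add("/portals/", "http://portals.apache.org/");
        // The path info is what remains after the context path and '/rproxy'.
        System.out.println(mapper.toRemoteUrl("/portals/index.html"));
    }
}
```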

Reverse Proxy Service Configurations

The Reverse Proxy Service uses Commons Configuration to read configuration properties files, so you can leverage the power of Commons Configuration when setting properties. Here are two useful tips:

  • Tip #1: A previously defined variable can be expanded by wrapping its name in '${' and '}', as in the following example:
    my.cookie.policy = netscape
    proxy.http.client.param.cookiePolicy = ${my.cookie.policy}
                    
  • Tip #2: String array typed configuration variables can be defined on one comma separated line or on multiple lines. The following two examples are equivalent to each other:
    proxy.reverse.pass.site1.response.cookie.path.rewrite.include = JSESSIONID, PHPSESSIONID
                    
    proxy.reverse.pass.site1.response.cookie.path.rewrite.include = JSESSIONID
    proxy.reverse.pass.site1.response.cookie.path.rewrite.include = PHPSESSIONID
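
To make the two tips concrete, here is a minimal plain-Java sketch of what variable expansion and list-valued properties amount to. Commons Configuration performs both for you; the class ConfigTipsSketch and its methods are hypothetical and only mimic the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the two tips; it does not use Commons Configuration itself. */
final class ConfigTipsSketch {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Tip #1: replaces each ${name} with the value defined for 'name'; unknown names are left as-is. */
    static String expand(String value, Map<String, String> defined) {
        Matcher m = VARIABLE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = defined.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Tip #2: collects items from one comma separated line or from several repeated lines. */
    static List<String> toList(String... lines) {
        List<String> items = new ArrayList<>();
        for (String line : lines) {
            for (String item : line.split(",")) {
                String trimmed = item.trim();
                if (!trimmed.isEmpty()) {
                    items.add(trimmed);
                }
            }
        }
        return items;
    }

    public static void main(String[] args) {
        Map<String, String> defined = Collections.singletonMap("my.cookie.policy", "netscape");
        System.out.println(expand("${my.cookie.policy}", defined));
        // Both forms yield the same list of cookie names.
        System.out.println(toList("JSESSIONID, PHPSESSIONID").equals(toList("JSESSIONID", "PHPSESSIONID")));
    }
}
```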
                    

Property Default Value Example Value Description
proxy.http.client.param.allowCircularRedirects false false Flag whether the internal http client object should allow circular redirects.
proxy.http.client.param.cookiePolicy best-match netscape The cookie policy that the internal http client object should use. Please see the documentation of httpclient 4.x on cookie policies.
proxy.http.client.default.proxy proxyserver1, proxyserver2 The system default comma delimited HTTP proxy names. Each proxy name should be used in the following configuration properties to set detailed http connection options for each proxy.
proxy.http.client.default.proxy.<proxyname>.hostname proxyserver1 <proxyname> should be replaced by the real proxy name. With this example, you may use 'proxyserver1' or 'proxyserver2' for <proxyname>.
The host name of the target of this proxy.
proxy.http.client.default.proxy.<proxyname>.port 10080 <proxyname> should be replaced by the real proxy name. With this example, you may use 'proxyserver1' or 'proxyserver2' for <proxyname>.
The port number of the target of this proxy. If you don't set this property, it means ANY port is allowed for this proxy.
proxy.http.client.default.proxy.<proxyname>.scheme http <proxyname> should be replaced by the real proxy name. With this example, you may use 'proxyserver1' or 'proxyserver2' for <proxyname>.
The scheme of the target of this proxy. If you don't set this property, it means ANY scheme is allowed for this proxy.
proxy.http.connManager.param.maxTotalConnections 2 200 The maximum total number of http connections. If no http connection is available, the request blocks until it gets a connection.
proxy.http.connManager.param.timeout 0 10000 The maximum waiting time, in milliseconds, to create an http connection. If this is set to zero, it waits without a timeout, so the behavior depends on the system configuration.
proxy.http.route.param.defaultMaxPerRoute 20 The default maximum http connection count per route.
proxy.http.route.<routename>.target.hostname portals.apache.org <routename> should be replaced by the real route name. With this example, you may use 'apache' or 'portals'.
The host name of the target of this route.
proxy.http.route.<routename>.target.port 80 <routename> should be replaced by the real route name. With this example, you may use 'apache' or 'portals'.
The port number of the target of this route. If you don't set this property, it means ANY port is allowed for this route.
proxy.http.route.<routename>.target.scheme http <routename> should be replaced by the real route name. With this example, you may use 'apache' or 'portals'.
The scheme of the target of this route. If you don't set this property, it means ANY scheme is allowed for this route.
proxy.http.route.<routename>.maxConnections 40 <routename> should be replaced by the real route name. With this example, you may use 'apache' or 'portals'.
The maximum http connection count of the target of this route.
proxy.http.route.<routename>.local <routename> should be replaced by the real route name. With this example, you may use 'apache' or 'portals'.
The local address to connect from.
proxy.http.route.<routename>.secure false <routename> should be replaced by the real route name. With this example, you may use 'apache' or 'portals'.
Whether the route is (supposed to be) secure.
proxy.http.route.<routename>.tunnelled plain tunneled <routename> should be replaced by the real route name. With this example, you may use 'apache' or 'portals'.
Whether the route is tunnelled through the proxy.
proxy.http.route.<routename>.layered plain layered <routename> should be replaced by the real route name. With this example, you may use 'apache' or 'portals'.
Whether the route is layered.
proxy.http.route.<routename>.proxy proxyserver1, proxyserver2 The comma delimited HTTP proxy names. Each proxy name should be used in the following configuration properties to set detailed http connection options for each proxy.
proxy.http.route.<routename>.proxy.<proxyname>.hostname proxyserver1 <routename> and <proxyname> should be replaced by the real route name and proxy name. With this example, you may use 'apache' or 'portals' for <routename> and you may use 'proxyserver1' or 'proxyserver2' for <proxyname>.
The host name of the target of this proxy.
proxy.http.route.<routename>.proxy.<proxyname>.port 10080 <routename> and <proxyname> should be replaced by the real route name and proxy name. With this example, you may use 'apache' or 'portals' for <routename> and you may use 'proxyserver1' or 'proxyserver2' for <proxyname>.
The port number of the target of this proxy. If you don't set this property, it means ANY port is allowed for this proxy.
proxy.http.route.<routename>.proxy.<proxyname>.scheme http <routename> and <proxyname> should be replaced by the real route name and proxy name. With this example, you may use 'apache' or 'portals' for <routename> and you may use 'proxyserver1' or 'proxyserver2' for <proxyname>.
The scheme of the target of this proxy. If you don't set this property, it means ANY scheme is allowed for this proxy.
proxy.reverse.pass.dynamicProxyPathMapperCacheCount 1000 2000 The cache count of proxy path mappers which are dynamically created by glob style mappings.
proxy.reverse.pass.maxMatchingPathPartCount 2 3 The max matching path part count.
proxy.reverse.pass.reverseProxyRequestContextProviderClassName The request context information provider class name, which should implement org.apache.portals.applications.webcontent.proxy.ReverseProxyRequestContextProvider. The reverse proxy service uses this provider to check whether the user of the request is in a specific role when the reverse proxy path resource is secured, that is, when allowed roles are configured for it.
If not configured, then the default implementation checks if the user is in the specified role via the provided HttpServletRequest object.
proxy.reverse.pass.<pathname>.local /portals/
or
/*.apache/
<pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The base local path info of the reverse proxy mapping. For example, if the relative url is '/webcontent/rproxy/portals/index.html', then because the path info is '/portals/index.html', this path mapping is selected.
This property can contain a glob expression with '*'. Each '*' is translated into a regular expression group, which can be referenced by a variable such as $1 in the remote URL value.
proxy.reverse.pass.<pathname>.remote http://portals.apache.org/
or
http://$1.apache.org/
<pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The base remote url of the reverse proxy mapping. For example, if the relative url is '/webcontent/rproxy/portals/index.html', then because the path info is '/portals/index.html', this path mapping is selected and the translated remote url can be 'http://portals.apache.org/index.html'.
proxy.reverse.pass.<pathname>.roles.allow dev
or
account, engineering
<pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
When this property is set, the reverse proxy path resource is secured. The request user should be in the specified roles to access this resource.
By default, it is checked via javax.servlet.http.HttpServletRequest#isUserInRole(role). However, you could provide a customized request context provider implementation with proxy.reverse.pass.reverseProxyRequestContextProviderClassName property.
proxy.reverse.pass.<pathname>.rewriter.basic org.apache.portals.applications.webcontent.rewriter.WebContentRewriter <pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The basic content rewriter class name.
proxy.reverse.pass.<pathname>.rewriter.basic.property.<propertyName> propertyValue <pathname> and <propertyName> should be replaced by the real path name and property name.
Sets the property to the given value on the basic rewriter bean instance.
proxy.reverse.pass.<pathname>.rewriter.rulebased org.apache.portals.applications.webcontent.rewriter.WebContentRewriter <pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The rule-based content rewriter class name.
proxy.reverse.pass.<pathname>.rewriter.rulebased.property.<propertyName> propertyValue <pathname> and <propertyName> should be replaced by the real path name and property name.
Sets the property to the given value on the rule-based rewriter bean instance.
proxy.reverse.pass.<pathname>.rewriter.parserAdaptor html, xml <pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The comma delimited parser adaptor names. Each parser adaptor name should be used in the following configuration properties to set detailed parser adaptor properties.
proxy.reverse.pass.<pathname>.rewriter.parserAdaptor.<parserAdaptorName> org.apache.portals.applications.webcontent.rewriter.html.neko.NekoParserAdaptor <pathname> and <parserAdaptorName> should be replaced by the real path name and parser adaptor name. With this example, you may use 'apache' or 'portals' for <pathname> and you may use 'html' or 'xml' for <parserAdaptorName>.
The parser adaptor class name.
proxy.reverse.pass.<pathname>.rewriter.parserAdaptor.<parserAdaptorName>.mimeType text/html <pathname> and <parserAdaptorName> should be replaced by the real path name and parser adaptor name. With this example, you may use 'apache' or 'portals' for <pathname> and you may use 'html' or 'xml' for <parserAdaptorName>.
The mime type which this parser adaptor handles.
proxy.reverse.pass.<pathname>.rewriter.parserAdaptor.<parserAdaptorName>.<propertyName> propertyValue <pathname>, <parserAdaptorName> and <propertyName> should be replaced by the real path name, parser adaptor name and property name.
Sets the property to the given value on the parser adaptor bean instance.
proxy.reverse.pass.<pathname>.rewriter.ruleMappings /WEB-INF/conf/rewriter-rules-mapping.xml <pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The rewriting rule mappings configuration and rewriter rule definition.
The configuration path can be an absolute file path prefixed by 'file:', or it can be a classpath resource prefixed by 'classpath:'.
proxy.reverse.pass.<pathname>.rewriter.rules /WEB-INF/conf/default-rewriter-rules.xml <pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The rewriter rules definition.
The configuration path can be an absolute file path prefixed by 'file:', or it can be a classpath resource prefixed by 'classpath:'.
proxy.reverse.pass.<pathname>.request.header.<headerName> proxy.reverse.pass.somewhere.request.header.Accept-Language = en <pathname> and <headerName> should be replaced by the real path name and header name. With this example, you may use 'apache' or 'portals' for <pathname>.
The default request header values which are sent to the target remote url.
proxy.reverse.pass.<pathname>.request.cookie.<cookieName> proxy.reverse.pass.somewhere.request.cookie.Custom1 = Value1 <pathname> and <cookieName> should be replaced by the real path name and cookie name. With this example, you may use 'apache' or 'portals' for <pathname>.
The default request cookies which are sent to the target remote url.
proxy.reverse.pass.<pathname>.response.cookie.path.rewrite.include JSESSIONID, PHPSESSIONID <pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The cookies which should have rewritten paths. By default, every cookie path is rewritten if there's no inclusion/exclusion configuration.
proxy.reverse.pass.<pathname>.response.cookie.path.rewrite.exclude CUSTOM1, CUSTOM2 <pathname> should be replaced by the real path name. With this example, you may use 'apache' or 'portals' for <pathname>.
The cookies which should have the original path as provided by the target web site. If a cookie name is configured as an exclusion one, then the path of the cookie is not rewritten. By default, every cookie path is rewritten if there's no inclusion/exclusion configuration.
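
The interplay of the two response.cookie.path.rewrite properties can be sketched as a small decision function. The class CookiePathRewriteSketch is hypothetical, and the precedence it applies when both lists are configured (exclusion wins, then an inclusion list restricts rewriting to the listed cookies) is an assumption made for illustration, not documented behaviour.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the include/exclude decision; the precedence shown is an assumption. */
final class CookiePathRewriteSketch {
    private final Set<String> include;
    private final Set<String> exclude;

    CookiePathRewriteSketch(Set<String> include, Set<String> exclude) {
        this.include = include;
        this.exclude = exclude;
    }

    /** Returns true when the path of the named cookie should be rewritten. */
    boolean shouldRewritePath(String cookieName) {
        if (exclude.contains(cookieName)) {
            return false; // excluded cookies keep the path sent by the target web site
        }
        // With no inclusion list every remaining cookie path is rewritten (the default).
        return include.isEmpty() || include.contains(cookieName);
    }

    public static void main(String[] args) {
        Set<String> include = new HashSet<>(Arrays.asList("JSESSIONID", "PHPSESSIONID"));
        CookiePathRewriteSketch policy =
                new CookiePathRewriteSketch(include, Collections.<String>emptySet());
        System.out.println(policy.shouldRewritePath("JSESSIONID"));
        System.out.println(policy.shouldRewritePath("CUSTOM1"));
    }
}
```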

Example Reverse Proxy Mapping Configurations

Simple Reverse Proxy Mapping

In this example, we configure a simple reverse proxy mapping from the local path, /apache/, to the remote target path, http://apache.org/. With this example configuration, the mappings are like the following:

  • /webcontent/rproxy/apache/index.html --> http://apache.org/index.html
  • /webcontent/rproxy/apache/foundation/ --> http://apache.org/foundation/
  • ...
# Registers a mapping named 'apache' by the following line
proxy.reverse.pass = apache
# Sets the local path
proxy.reverse.pass.apache.local = /apache/
# Sets the remote target path
proxy.reverse.pass.apache.remote = http://apache.org/
# Sets the default web content rewriter to rewrite contents such as links
proxy.reverse.pass.apache.rewriter.basic = org.apache.portals.applications.webcontent.rewriter.WebContentRewriter
# Registers the mime type named 'html' for a parser adaptor 
proxy.reverse.pass.apache.rewriter.parserAdaptor = html
# Sets the parser adaptor for the html mime type name.
proxy.reverse.pass.apache.rewriter.parserAdaptor.html = org.apache.portals.applications.webcontent.proxy.impl.DefaultReverseProxyLinkRewritingParserAaptor
# Sets the mime type string for the registered 'html' mime type name
proxy.reverse.pass.apache.rewriter.parserAdaptor.html.mimeType = text/html
# Sets the flag if the content rewriter should check all reverse proxy mappings during link rewriting
proxy.reverse.pass.apache.rewriter.parserAdaptor.html.property.lookUpAllMappings = true
          

Secured Reverse Proxy Mapping

In this example, we configure a secured reverse proxy mapping from the local path to the secured remote target path. The only difference from the simple mapping is that the remote target path starts with 'https:' instead of 'http:'. With this example configuration, the mappings are like the following:

  • /webcontent/rproxy/secure/blogs/ --> https://blogs.apache.org/
  • ...
proxy.reverse.pass = secure_blogs
proxy.reverse.pass.secure_blogs.local = /secure/blogs/
proxy.reverse.pass.secure_blogs.remote = https://blogs.apache.org/
proxy.reverse.pass.secure_blogs.rewriter.basic = org.apache.portals.applications.webcontent.rewriter.WebContentRewriter
proxy.reverse.pass.secure_blogs.rewriter.parserAdaptor = html
proxy.reverse.pass.secure_blogs.rewriter.parserAdaptor.html = org.apache.portals.applications.webcontent.proxy.impl.DefaultReverseProxyLinkRewritingParserAaptor
proxy.reverse.pass.secure_blogs.rewriter.parserAdaptor.html.mimeType = text/html
proxy.reverse.pass.secure_blogs.rewriter.parserAdaptor.html.property.lookUpAllMappings = true
          

Glob-based Reverse Proxy Mapping

In this example, we configure a glob-based reverse proxy mapping from the local paths to the remote target paths. With this example configuration, a single mapping entry covers multiple mappings like the following:

  • /webcontent/rproxy/www_apache/* --> http://www.apache.org/*
  • /webcontent/rproxy/projects_apache/* --> http://projects.apache.org/*
  • /webcontent/rproxy/people_apache/* --> http://people.apache.org/*
  • /webcontent/rproxy/blogs_apache/* --> http://blogs.apache.org/*
  • ...
# Registers a mapping named 'all_apache' by the following line
proxy.reverse.pass = all_apache
# Sets the local path with glob expression
proxy.reverse.pass.all_apache.local = /*_apache/
# Sets the remote target path with the regular expression references. So, the first glob matched variable will be used as a replacement for $1.
proxy.reverse.pass.all_apache.remote = http://$1.apache.org/
proxy.reverse.pass.all_apache.rewriter.basic = org.apache.portals.applications.webcontent.rewriter.WebContentRewriter
proxy.reverse.pass.all_apache.rewriter.parserAdaptor = html
proxy.reverse.pass.all_apache.rewriter.parserAdaptor.html = org.apache.portals.applications.webcontent.proxy.impl.DefaultReverseProxyLinkRewritingParserAaptor
proxy.reverse.pass.all_apache.rewriter.parserAdaptor.html.mimeType = text/html
proxy.reverse.pass.all_apache.rewriter.parserAdaptor.html.property.lookUpAllMappings = true
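
How a glob in the local path can feed $1 in the remote URL is sketched below in plain Java. This is an illustration under assumptions, not the service's own mapper: the class GlobPathMapperSketch is hypothetical, and it assumes that each '*' matches a single path segment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: turns a local glob into a regex and fills $1, $2, ... in the remote URL. */
final class GlobPathMapperSketch {
    private final Pattern localPattern;
    private final String remoteTemplate;

    GlobPathMapperSketch(String localGlob, String remoteTemplate) {
        StringBuilder regex = new StringBuilder();
        String[] literals = localGlob.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("([^/]+)"); // assumption: each '*' captures one path segment
            }
            regex.append(Pattern.quote(literals[i]));
        }
        this.localPattern = Pattern.compile(regex.toString());
        this.remoteTemplate = remoteTemplate;
    }

    /** Returns the remote URL for the path info, or null when the glob does not match. */
    String toRemoteUrl(String pathInfo) {
        Matcher m = localPattern.matcher(pathInfo);
        if (!m.lookingAt()) {
            return null;
        }
        String remoteBase = remoteTemplate;
        // Replace higher group numbers first so that "$1" never clobbers "$10".
        for (int group = m.groupCount(); group >= 1; group--) {
            remoteBase = remoteBase.replace("$" + group, m.group(group));
        }
        return remoteBase + pathInfo.substring(m.end());
    }

    public static void main(String[] args) {
        GlobPathMapperSketch mapper = new GlobPathMapperSketch("/*_apache/", "http://$1.apache.org/");
        System.out.println(mapper.toRemoteUrl("/www_apache/index.html"));
        System.out.println(mapper.toRemoteUrl("/blogs_apache/"));
    }
}
```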
          

Advanced HTTP Components Configurations

Because the Reverse Proxy Service uses Apache HTTP Components and it exposes some important configurable properties, you can leverage the power of the configurability of Apache HTTP Components.

Configurations on HTTP Connections

In an enterprise environment, it could be very important to limit the maximum network resources for system availability or performance. You can configure the following property to limit the maximum total http connection count. With the following example, the HTTP connections will not increase over 200.

proxy.http.connManager.param.maxTotalConnections = 200
        

Also, you can configure the timeout for an HTTP connection. With the following example, the timeout is set to 10000 milliseconds.

proxy.http.connManager.param.timeout = 10000
        

Configurations per HTTP Route

The default maximum number of HTTP connections per route can be configured with the following property.

proxy.http.route.param.defaultMaxPerRoute = 20
        

With the above configuration, each route is limited to 20 HTTP connections even when the total number of connections is below the maximum total connection count of 200.

You can also configure other HTTP connection properties for each HTTP route. Here's an example for a specific HTTP route.

proxy.http.route = apache
proxy.http.route.apache.target.hostname = www.apache.org
proxy.http.route.apache.target.port = 80
proxy.http.route.apache.maxConnections = 5
proxy.http.route.apache.proxy = proxyserver1
proxy.http.route.apache.proxy.proxyserver1.hostname = proxyserver1
proxy.http.route.apache.proxy.proxyserver1.port = 8000
        

With the above configuration, we set the maximum HTTP connection count to 5 for this specific route. This forces the HTTP route to have no more than 5 connections regardless of the other default configurations.

The above configuration also includes a proxy server setting. This is very useful when the web server must retrieve the remote content through an intranet proxy server.
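
The way the total limit, the default per-route limit and a route-specific limit interact can be sketched with simple counters. The class ConnectionLimitSketch is hypothetical and is not how Apache HTTP Components is implemented: a real connection manager blocks the caller until a connection is free, whereas this sketch just reports whether a connection would be granted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of total and per-route connection limits using plain counters. */
final class ConnectionLimitSketch {
    private final int maxTotal;
    private final int defaultMaxPerRoute;
    private final Map<String, Integer> maxPerRoute = new HashMap<>();
    private final Map<String, Integer> inUse = new HashMap<>();
    private int totalInUse;

    ConnectionLimitSketch(int maxTotal, int defaultMaxPerRoute) {
        this.maxTotal = maxTotal;
        this.defaultMaxPerRoute = defaultMaxPerRoute;
    }

    /** Overrides the default limit for one route, like a route-specific maxConnections value. */
    void setMaxForRoute(String host, int max) {
        maxPerRoute.put(host, max);
    }

    /** Returns true if a connection to the host fits under both the route and the total limit. */
    boolean tryAcquire(String host) {
        int limit = maxPerRoute.getOrDefault(host, defaultMaxPerRoute);
        int used = inUse.getOrDefault(host, 0);
        if (totalInUse >= maxTotal || used >= limit) {
            return false;
        }
        inUse.put(host, used + 1);
        totalInUse++;
        return true;
    }

    public static void main(String[] args) {
        ConnectionLimitSketch limits = new ConnectionLimitSketch(200, 20);
        limits.setMaxForRoute("www.apache.org", 5);
        int granted = 0;
        for (int i = 0; i < 10; i++) {
            if (limits.tryAcquire("www.apache.org")) {
                granted++;
            }
        }
        System.out.println("www.apache.org connections granted: " + granted);
    }
}
```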