Configuring Elasticsearch for Liferay Portal

Liferay Portal is an open source project, so you won’t be surprised to learn that the default search engine that ships with Liferay Portal is also an open source project. Elasticsearch is a highly scalable, full-text search and analytics engine that ships with Liferay Portal.

By default, Liferay Portal runs Elasticsearch as an embedded search engine, but it’s only supported in production locally or remotely, as a separate server or cluster. This guide walks you through that process.

If you’d rather use Solr, it’s also supported. See here for information on installing and configuring Solr.

To get up and running quickly with Elasticsearch as a remote server, refer to the Installing Elasticsearch article. In that article you’ll find the basic instructions for the installation and configuration of Elasticsearch in a single server environment. This article includes more details and information on clustering and tuning Elasticsearch. In this article you’ll learn to configure your existing Elasticsearch installation for use in Liferay Portal production environments.

If you’ve come here looking for information on search engines in general, or the low level search infrastructure of Liferay Portal, refer instead to the developer tutorial Introduction to Liferay Search.

These terms will be useful to understand as you read this guide:

  • Elasticsearch Home refers to the root folder of your unzipped Elasticsearch installation (for example, elasticsearch-2.2.0).

  • Liferay Home refers to the root folder of your Liferay Portal installation. It contains the osgi, deploy, data, and license folders, among others.

Embedded vs. Remote Operation Mode

When you install Liferay Portal, there’s an embedded Elasticsearch already installed. In embedded mode, Elasticsearch search runs with Liferay Portal as a library in the same JVM. This is done by default to make it easy to test-drive Liferay Portal with minimal configuration. Running Elasticsearch and Liferay Portal in the same process has drawbacks:

  • Your Elasticsearch configuration uses the same JVM options as Liferay Portal.
  • Liferay Portal and Elasticsearch compete for resources.

You wouldn’t run an embedded database like HSQL in production, and you shouldn’t run Elasticsearch in embedded mode in production either. Instead, you want your Liferay Portal installation to run alongside Elasticsearch. This is called remote operation mode, as a standalone server or cluster of server nodes.

Configuring Elasticsearch

For detailed Elasticsearch configuration information, refer to the Elasticsearch documentation.

The name of your Elasticsearch cluster is important. When you’re running Elasticsearch in remote mode, the cluster name is used by Liferay Portal to recognize the Elasticsearch cluster. To learn about setting the Elasticsearch cluster name on the Liferay Portal side, refer below to the section called Configuring the Liferay Elasticsearch adapter.

Elasticsearch’s configuration files are written in YAML and kept in the [Elasticsearch Home]/config folder:

  • elasticsearch.yml is for configuring Elasticsearch modules
  • logging.yml is for configuring Elasticsearch logging

To set the name of the Elasticsearch cluster, open [Elasticsearch Home]/config/elasticsearch.yml and specify

cluster.name: LiferayElasticsearchCluster

Since LiferayElasticsearchCluster is the default name given to the cluster in Liferay Portal, this would work just fine. Of course, you can name your cluster whatever you’d like (we humbly submit the recommendation clustery_mcclusterface).1 You can configure your node name using the same syntax (setting the node.name property).

If you’d rather work from the command line than in the configuration file, navigate to Elasticsearch Home and enter

./bin/elasticsearch --cluster.name clustery_mcclusterface --node.name nody_mcnodeface

Feel free to change the node name or the cluster name. Once you configure Elasticsearch to your liking, start it up.

Starting Elasticsearch

Start Elasticsearch by navigating to Elasticsearch Home and typing

./bin/elasticsearch

if you run Linux, or

\bin\elasticsearch.bat

if you run Windows.

To run as a daemon in the background, add the -d switch to either command:

./bin/elasticsearch -d

When you have Elasticsearch itself installed and running, and Liferay Portal installed and running (do that if you haven’t already) you need to introduce Liferay Portal and Elasticsearch to each other. Fortunately, Liferay provides an adapter that helps it find and integrate your Elasticsearch cluster.

Configuring the Liferay Elasticsearch Adapter

Liferay Portal has an Elasticsearch adapter that ships with Liferay Portal. It’s a module from the Liferay Foundation Suite that’s deployed to the OSGi runtime, titled Liferay Portal Search Elasticsearch. This adapter provides integration between Elasticsearch and Liferay Portal. Before you configure the adapter, make sure Elasticsearch is running.

There are two ways to configure the adapter:

  1. Use the System Settings application in the Control Panel.

  2. Manually create an OSGi configuration file.

It’s convenient to configure the Elasticsearch adapter from System Settings, but this is often only possible during development and testing. If you’re not familiar with System Settings, you can read about it here. Even if you need a configuration file so you can use the same configuration on another Liferay Portal system, you can still use System Settings. Just make the configuration edits you need, then export the .config file with your configuration.

Here are the steps to configure the Elasticsearch adapter from the System Settings application:

  1. Start Liferay Portal.
  2. Navigate to Control PanelConfigurationSystem SettingsFoundation.
  3. Find the Elasticsearch entry (scroll down and browse to it or use the search box) and click the Actions icon (icon-actions.png), then Edit.

    elasticsearch-system-settings.png

    Figure 1: Use the System Settings application in Liferay Portal’s Control Panel to configure the Elasticsearch adapter.

  4. Change Operation Mode to Remote, and then click Save.

    elasticsearch-configuration.png

    Figure 2: Set Operation Mode to Remote from System Settings.

  5. After you switch operation modes (EMBEDDEDREMOTE), you must trigger a re-index. Navigate to Control PanelServer Administration, find the Index Actions section, and click Execute next to Reindex all search indexes.

When preparing a system for production deployment, you want to set up a repeatable deployment process. Therefore, it’s best to use the OSGi configuration file, where your configuration is maintained in a controlled source.

Follow these steps to configure the Elasticsearch adapter using an OSGi configuration file:

  1. Create the following file:

    [Liferay_Home]/osgi/configs/com.liferay.portal.search.elasticsearch.configuration.ElasticsearchConfiguration.config
    
  2. Add this to the configuration file you just created:

    operationMode="REMOTE"
    # If running Elasticsearch from a different computer:
    #transportAddresses="ip.of.elasticsearch.node:9300"
    # Highly recommended for all non-prodcution usage (e.g., practice, tests, diagnostics):
    #logExceptionsOnly="false"
    
  3. Start Liferay Portal or re-index if Liferay Portal is already running.

As you can see from the System Settings entry for Elasticsearch, there are a lot more configuration options available that help you tune your system for optimal performance. For a detailed accounting of these, refer to the reference article on Elasticsearch Settings.

What follows here are some known good configurations for clustering Elasticsearch. These, however, can’t replace the manual process of tuning, testing under load, and tuning again, so we encourage you to examine the settings as well as the Elasticsearch documentation and go through that process once you have a working configuration.

Configuring a Remote Elasticsearch Host

In production systems Elasticsearch and Liferay Portal are installed on different servers. To make Liferay Portal aware of the Elasticsearch cluster, set

transportAddresses=[IP address of Elasticsearch Node]:9300

in the Elasticsearch adapter’s OSGi configuration file. List as many or as few Elasticsearch nodes in this property as you’d like. This tells Liferay Portal the IP address or host name where search requests are to be sent. If using System Settings, set the value in the Transport Addresses property.

On the Elasticsearch side, set the network.host property in your elaticsearch.yml file. This property simultaneously sets both the bind host (the host Elasticsearch listens on for requests) and the publish host (the host name or IP address Elasticsearch uses to communicate with other nodes). See here for more information.

Clustering Elasticsearch in Remote Operation Mode

Clustering Elasticsearch is easy. Each time you run the Elasticsearch start script, a new node is added to the cluster. If you want four nodes, for example, just run ./bin/elasticsearch four times. If you only run the start script once, you have a cluster with just one node.

Elasticsearch’s default configuration works for a cluster of up to ten nodes, since the default number of shards is 5, while the default number of replica shards is 1:

index.number_of_shards: 5
index.number_of_replicas: 1

For more information on configuring an Elasticsearch cluster, see the documentation on Elasticsearch Index Settings.

Advanced Configuration of the Liferay Elasticsearch Adapter

The default configurations for Liferay’s Elasticsearch adapter module are set in a Java class:

com.liferay.portal.search.elasticsearch.configuration.ElasticsearchConfiguration

While the Elasticsearch adapter has a lot of configuration options out of the box, you might find an Elasticsearch configuration you need that isn’t provided by default. In this case, you can add the configuration options you need. If you can configure something for Elasticsearch, you can configure it using the Elasticsearch adapter in Liferay Portal.

Adding Settings and Mappings to the Liferay Elasticsearch Adapter

Liferay Portal has divided the available configuration options into two groups: the ones you’ll use most often by default, and a catch-all for everything else. If you need to configure the local Elasticsearch client when running in remote mode, but the necessary setting isn’t available by default, you can still configure it with the Liferay Elasticsearch adapter. Just specify the settings you need by using one or more of the additionalConfigurations, additionalIndexConfigurations, or additionalTypeMappings settings.

elasticsearch-additional-configs.png

Figure 3: You can add Elasticsearch configurations to the ones currently available in System Settings.

additionalConfigurations is used to define extra settings (defined in YAML) for the embedded Elasticsearch or the local Elasticsearch client when running in remote mode. In production, only one additional configuration can be added here:

client.transport.ping_timeout

The rest of the settings for the client are available as default configuration options in the Liferay Elasticsearch adapter. See the Elasticsearch Settings reference article for more information. See the Elasticsearch documentation for a description of all the client settings and for an example.

additionalIndexConfigurations is used to define extra settings (in JSON or YAML format) that are applied to the Liferay Portal index when it’s created. For example, you can create custom analyzers and filters using this setting. For a complete list of available settings, see the Elasticsearch reference.

Here’s an example that shows how to configure analysis that can be applied to a dynamic template (see below).

{  
    "analysis": {
        "analyzer": {
            "kuromoji_liferay_custom": {
                "filter": [
                    "cjk_width",
                    "kuromoji_baseform",
                    "pos_filter"
                ],
                "tokenizer": "kuromoji_tokenizer"
            }
        },
        "filter": {
            "pos_filter": {
                "type": "kuromoji_part_of_speech"
            }
        }
    }
}

additionalTypeMappings is used to define extra field mappings for the LiferayDocumentType type definition, which are applied when the index is created. Add these field mappings in using JSON syntax. For more information see here and here. Use additionalTypeMappings for new field mappings, but do not try to override existing properties mappings. If any of the properties mappings set here overlap with existing mappings, index creation will fail. Use overrideTypeMappings to replace the default properties mappings.

Here’s an example of a dynamic template that uses the analysis configuration above to analyze all string fields that end with _ja.

{ 
    "LiferayDocumentType": {
        "dynamic_templates": [
            {
                "template_ja": {
                    "mapping": {
                        "analyzer": "kuromoji_liferay_custom",
                        "index": "analyzed",
                        "store": "true",
                        "term_vector": "with_positions_offsets",
                        "type": "string"
                    },
                    "match": "\\w+_ja\\b|\\w+_ja_[A-Z]{2}\\b",
                    "match_mapping_type": "string",
                    "match_pattern": "regex"
                }
            }
        ]
    }
}

The above code adds a new template_ja dynamic template. This overrides the existing dynamic template with the same name. As with dynamic templates, you can add sub-field mappings to Liferay Portal’s type mapping. These are referred to as properties in Elasticsearch.

{ 
    "LiferayDocumentType": {  
        "properties": {   
            "fooName": {
                "index": "not_analyzed",
                "store": "yes",
                "type": "string"
            }
        }   
    }
}

The above example shows how a fooName field might be added to Liferay Portal’s type mapping. Because fooName is not an existing property in the mapping, it will work just fine. If you try to override an existing property mapping, index creation will fail. Instead use the overrideTypeMappings setting to override properties in the mapping.

Use overrideTypeMappings to override Liferay Portal’s default type mappings. This is an advanced feature that should be used only if strictly necessary. If you set this value, the default mappings used to define the Liferay Document Type in Liferay Portal source code (for example, liferay-type-mappings.json) are ignored entirely, so include the whole mappings definition in this property, not just the segment you’re modifying. To make a modification, find the entire list of the current mappings being used to create the index by navigating to the URL

http://[HOST]:[ES_PORT]/liferay-[COMPANY_ID]/_mapping/LiferayDocumentType?pretty

Copy the contents in as the value of this property (either into System Settings or your OSGi configuration file). Leave the opening curly brace {, but delete lines 2-4 entirely:

"liferay-[COMPANY_ID]": {
    "mappings" : {
        "LiferayDocumentType" : {

Then, from the end of the mappings, delete the concluding three curly braces.

        }
    }
}

Now modify whatever mappings you’d like. The changes take effect once you save the changes and trigger a reindex from Server Administration. If you need to add new custom mappings without overriding any defaults, use additionalTypeMappings instead.

Multi-line YAML Configurations

If you configure the settings from the last section using an OSGi configuration file, you might find yourself needing to write YML snippets that span multiple lines. The syntax for that is straightforward and just requires appending each line with \n\, like this:

additionalConfigurations=\
                    cluster.routing.allocation.disk.threshold_enabled: false\n\
                    cluster.service.slow_task_logging_threshold: 600s\n\
                    index.indexing.slowlog.threshold.index.warn: 600s\n\
                    index.search.slowlog.threshold.fetch.warn: 600s\n\
                    index.search.slowlog.threshold.query.warn: 600s\n\
                    monitor.jvm.gc.old.warn: 600s\n\
                    monitor.jvm.gc.young.warn: 600s

Troubleshooting Elasticsearch

Sometimes things don’t go as planned. If you’ve set up Liferay Portal with Elasticsearch in remote mode, but Liferay Portal can’t connect to Elasticsearch, check these things:

  • Cluster name: The value of the cluster.name property in Elasticsearch must match the clusterName property you configured for Liferay’s Elasticsearch adapter.

  • Transport address: The value of the transportAddress property in the Elasticsearch adapter must match the port where Elasticsearch is running. If Liferay Portal is running in embedded mode, and you start a standalone Elasticsearch node or cluster, it detects that port 9300 is taken and switches to port 9301. If you then set Liferay’s Elasticsearch adapter to remote mode, it continues to look for Elasticsearch at the default port (9300).

Now that you have Elasticsearch configured for use with Liferay Portal, if you’re a Liferay Portal customer, you can read here to learn about configuring Shield to secure your Elasticsearch data.

Related Topics

Introduction to Liferay Search

Customizing Liferay Search

1 This is, of course, a nod to all those fans of Boaty Mcboatface.

0 (0 Votes)
Preparing to Install Elasticsearch Previous