High Availability in Elasticsearch – Cross Cluster Replication and Alternatives

How to Achieve High Availability in ES – Cross Cluster Replication, Load Balancer Approach and Snapshots (Backup and Restore)

Last Updated : August 2020

High availability in Elasticsearch, as well as data recovery, are two of the most crucial needs for running mission critical ES clusters. Key data is stored in various clusters across ES infrastructure and it’s imperative to be able to retrieve data from a certain cluster should it go down for any reason, while preserving service continuity. There are three common methods for ensuring that data remains at high availability, and each method has its own advantages and disadvantages. 

1. Snapshots – Backup and Restore

Firstly, the snapshot method ensures that the data on a chosen cluster is replicated and saved in a different location. In this case, the data is backed up to an external file system, like S3, GCS or any other backend repository that has an official plugin. To ensure high availability with Snapshots, users can designate a periodical backup of a cluster, and the backed up data can then get restored in a secondary cluster. 

The primary disadvantage of the Snapshot method is that the data is not backed up and restored in real-time. The built-in delay can cause users to lose valuable data collected in between Snapshots. If, for example, a user designated a Snapshot and Restore process to occur every 5 minutes, the data being backed up is always 5 minutes behind. If a cluster fails 4 minutes after the last Snapshot was taken, 4 minutes of data will be completely lost. 

The advantage of this method is that users gain a cloud backup of the data. Should both the primary and secondary clusters crash, users can still access their data in the external file system (minus whatever data was collected in the time since the last Snapshot) and restore it in a new cluster. 
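To make the flow concrete, here is a minimal sketch of Snapshot and Restore using the standard snapshot APIs. The repository name, S3 bucket and cluster hosts are illustrative placeholders, and the S3 repository type assumes the repository-s3 plugin is installed on the cluster.

# Register an S3 snapshot repository on the primary cluster
curl -XPUT "http://primary:9200/_snapshot/my_backup_repo" \
-H 'Content-Type: application/json' \
-d '{"type": "s3", "settings": {"bucket": "my-es-snapshots"}}'

# Take a snapshot (this is the step you would schedule periodically)
curl -XPUT "http://primary:9200/_snapshot/my_backup_repo/snapshot-1?wait_for_completion=true"

# Register the same repository on the secondary cluster, then restore the snapshot there
curl -XPOST "http://secondary:9200/_snapshot/my_backup_repo/snapshot-1/_restore"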

Snapshot and Restore Pros and Cons

Pros:
  • Ensures HA and DR - to a certain extent
  • Is part of the OSS - free and open source
  • Cloud backup for data

Cons:
  • Data is not backed up in real time - the delay could cause loss of data
  • Data is not available for search in real time in the secondary cluster

2. Cross Cluster Replication (CCR)

The Elasticsearch Cross Cluster Replication feature built into ES can be employed to ensure data recovery (DR) and maintain high availability (HA). In CCR, the indices in clusters are replicated in order to preserve the data in them. The replicated cluster is called the remote or leader cluster, while the cluster with the backup data is known as the local or follower cluster.

There are various challenges involved in using Cross Cluster Replication for DR and HA. These include:

A. Direct connection between the two clusters 

Pulling data adds additional load

The replication process is performed on a pull basis, with the local cluster pulling the information from the remote cluster. If we call Cluster A the remote/primary cluster and Cluster B the local/secondary cluster, Cluster B is the one pulling information from Cluster A. This means that both clusters have to perform under increased load. The synchronization of the two adds an additional load to both and impacts disk and network functionality. In order to pull the information, Cluster B has to constantly check with Cluster A if there is anything to pull. 

Complicated and manual operation needed to change leaders/followers

In Cross Cluster Replication, you’ll need to configure followers for every index. If Cluster A (the primary) crashes, the same data is replicated in its entirety on Cluster B (the secondary) and remains accessible while Cluster A is fixed. This is how the CCR feature ensures high availability in the case of a cluster failure. While Cluster A is being fixed, the replication process will stop and Cluster B will wait for A to become active again. 

Another challenge with Cross Cluster Replication is that it does not allow users to easily designate leaders and followers and later change them according to their needs. To change which index is the leader and which is the follower requires a manual and very complicated operation. To switch a follower index to a leader index, you need to pause replication for the index, close it, unfollow the leader index, and reopen it as a normal ES index.
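A rough sketch of that sequence using the CCR APIs, assuming a follower index called follower-index on a local cluster at localhost:9200 (both names are placeholders):

# 1. Pause replication for the follower index
curl -XPOST "http://localhost:9200/follower-index/_ccr/pause_follow"

# 2. Close the index
curl -XPOST "http://localhost:9200/follower-index/_close"

# 3. Unfollow, which converts the follower into a regular index
curl -XPOST "http://localhost:9200/follower-index/_ccr/unfollow"

# 4. Reopen it as a normal ES index that can accept writes
curl -XPOST "http://localhost:9200/follower-index/_open"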

B. Dependency between clusters 

Index settings and number of shards

Index settings must be the same for leaders and followers. The follower index has to have the same number of shards as its leader index; you cannot set the follower index shard count to a higher number than the leader's. Therefore, it's also not possible to replicate multiple leader indices into a single follower index.

History

The remote (leader) clusters are dependent on the local (follower) clusters and have to keep old operation history for them, which is one of the downsides of "soft deletes" in Lucene. Though useful for updates and deletes, this adds additional complexity that ties the primary clusters to the secondary clusters. After a period of time, the markers will expire and the leader shards will be able to merge away the history.

Setting up new leaders

When setting up a new leader index, remote recovery must occur and complete before the follower index can be used. This is a network and disk intensive process, which A) delays the follower becoming available until recovery is over, and B) impacts performance on the primary cluster. This is another example of the complex dependency between the clusters.

Cluster health

In order to replicate to the follower, the primary cluster must be healthy. If the primary cluster's service is down, there will be no writes to the primary and no replication. This is not an active-active solution.

C. Compatibility

Older ES versions and running CCR between different versions

If the remote cluster and local cluster don't belong to the same or compatible versions of ES, the replication process cannot be performed. Also, it is important to note that CCR only works starting with ES v6.7.

Version upgrades

This also further complicates ES version upgrades, as both indexing and CCR need to be halted before beginning the process.

CCR Pros and Cons

Pros:
  • Ensures high availability
  • Ensures data recovery
  • Occurs in real time - no delay
  • Included in the X-Pack

Cons:
  • Connection between clusters causes additional load
  • High cluster dependency
  • Does not support active-active mode well
  • Failover is one-way and difficult to revert
  • Configuration of followers is required for every index
  • Limited number of shards and index settings in follower clusters
  • Compatibility issues between ES versions
  • Need to purchase the entire X-Pack in order to get CCR - CCR is not open source/free
  • Available only starting with ES v6.7

3. Multi-Cluster Load Balancer 

In the Multi-Cluster Load Balancer approach, data is directed through an ES Balancer to various clusters in the most efficient way to provide constant and real time high availability. The Balancer enables both the separation and replication of key data across multiple clusters in real time. It also ensures that the clusters themselves are not overloaded with tasks other than their main purpose. The direction of data to each cluster requires the cluster to index the information, but nothing more. 

In both the CCR and the Snapshot methods, errors in the leader affect both clusters, and failover is more complex in CCR. The Multi-Cluster Load Balancer method maintains separate management for each cluster handling data, so each error can be taken care of on a case-by-case basis.

When using the Multi-Cluster Load Balancer, the destination clusters don't need to communicate between themselves, which spares them the added burden of communication and synchronization. This also allows better separation of Elasticsearch topology and reduces cluster communication. For the same reason, this architecture also solves the issue of Elasticsearch version compatibility and supports high availability across all clusters regardless of version. Opster's Balancer is backward compatible down to version 1.x. The Balancer only requires configuration once per tenant.

Though the Multi-Cluster Load Balancer requires the addition of another component into the ES system that needs to be monitored and maintained, this can be seen as an advantage. Elasticsearch’s core functionality has always ensured amazing search capabilities and near-real time indexing of fresh data. By moving peripheral tasks to other components, you allow ES to perform its core functionality better. The outsourcing of data recovery and high availability, which are not part of the core tasks, can enable ES’s core to operate at optimal performance and simultaneously provide users with the solution and support they need for high availability.

Multi-Cluster Load Balancer - Pros and Cons

Pros:
  • Ensures High Availability
  • Ensures Data Recovery
  • Maximum utilization of every cluster
  • Supports active-active mode
  • No communication between clusters - reduces load
  • No cluster dependency
  • Backwards compatibility back to ES v1.x
  • Separate management for every error

Cons:
  • Necessitates adding an outside component to ES architecture
  • Indexing to the secondary cluster might cause higher CPU than syncing already indexed files (but it will reduce primary load on disk and network in exchange)

To learn more about Opster’s Multi-Cluster Load Balancer, click here.

Source: https://opster.com/blogs/elasticsearch-cross-cluster-replication-overview/

Open Distro Cross Cluster Replication Plugin

The Cross-Cluster Replication Plugin enables users to replicate data across two Elasticsearch clusters, which enables a number of use cases such as:

  • Disaster Recovery (DR) or High Availability (HA): For production systems with high availability requirements, cross-cluster replication provides the safety-net of being able to failover to an alternate cluster in case of failure or outages on the primary cluster.
  • Reduced Query Latency: For critical business needs, responding to the customer query in the shortest time is critical. Replicating data to a cluster that is closest to the user can drastically reduce the query latency. Applications can redirect the customer query to the nearest data center where data has been replicated.
  • Scaling out query heavy workloads: Splitting a query heavy workload across multiple replica clusters improves horizontal scalability.
  • Aggregated reports - Enterprise customers can roll up reports continually from smaller clusters belonging to different lines of business into a central cluster for consolidated reports, dashboards or visualizations.

Following are the tenets that guided our design:

  • Secure: Cross-cluster replication should offer strong security controls for all flows and APIs.
  • Correctness: There must be no difference between the intended contents of the follower index and the leader index.
  • Performance: Replication should not impact indexing rate of the leader cluster.
  • Lag: The replication lag between the leader and the follower cluster should be under a few seconds.
  • Resource usage: Replication should use minimal resources.

The replication machinery is implemented as an Elasticsearch plugin that exposes APIs to control replication, spawns background persistent tasks to asynchronously replicate indices and utilizes snapshot repository abstraction to facilitate bootstrap. Replication relies on cross cluster connection setup from the follower cluster to the leader cluster for connectivity. Once replication is initiated on an index, a background persistent task per primary shard on the follower cluster continuously polls corresponding shards from the leader index and applies the changes on to the follower shard. The replication plugin offers seamless integration with the Open Distro for Elasticsearch Security plugin for secure data transfer and access control.

Build

The project in this package uses the Gradle build system. Gradle comes with excellent documentation that should be your first stop when trying to figure out how to operate or modify the build.

Building from the command line

Set JAVA_HOME to JDK-14 or above

  1. builds and tests project.
  2. cleans previous builds, creates new build and tests project.
  3. launches a 3 node cluster of both leader and follower with replication plugin installed.
  4. launches a single node cluster and runs all integ tests.
  5. runs a single integ class.
  6. runs a single integ test method (remember to quote the test method name if it contains spaces).
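The numbered steps above map to Gradle invocations roughly like the ones below. These are drawn from common Elasticsearch-plugin Gradle conventions, so treat the exact task names (in particular the integration-test task and its --tests filter) as assumptions; only the clean run command is confirmed by the Getting Started section that follows.

# 1. Build and test the project
./gradlew build

# 2. Clean previous builds, create a new build and test the project
./gradlew clean build

# 3. Launch local test clusters with the replication plugin installed
./gradlew clean run -PnumNodes=3

# 4. Run the integration tests (assumed task name)
./gradlew integTest

# 5. Run a single integ class (class name is a placeholder)
./gradlew integTest --tests "com.example.ReplicationIT"

# 6. Run a single integ test method - quote the name if it contains spaces
./gradlew integTest --tests "com.example.ReplicationIT.test method name"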

Intellij Setup

Launch Intellij IDEA, choose Import Project, and select the file in the root of this package.

Getting Started

The following steps will help you install the replication plugin on a test cluster.

Step 1: Start test clusters with replication plugin locally

./gradlew clean run -PnumNodes=3

# Set variables for readability (in different terminal window/tab where you will run rest of the steps)
export LEADER=localhost:9200
export FOLLOWER=localhost:9201
export LEADER_TRANSPORT=localhost:9300

Step 2: Setup cross-cluster connectivity

Setup remote cluster connection from follower cluster to the leader cluster

curl -XPUT "http://${FOLLOWER}/_cluster/settings?pretty" \
-H 'Content-Type: application/json' \
-d"{\"persistent\": {\"cluster\": {\"remote\": {\"leader-cluster\": {\"seeds\": [ \"${LEADER_TRANSPORT}\" ] } } } }}"

Step 3: Populate leader cluster with sample data

curl -XPOST "http://${LEADER}/leader-01/_doc/1" -H 'Content-Type: application/json' -d '{"value" : "data1"}'

Step 4: Start replication

curl -XPUT "http://${FOLLOWER}/_opendistro/_replication/follower-01/_start?pretty" \
-H 'Content-type: application/json' \
-d'{"remote_cluster":"leader-cluster", "remote_index": "leader-01"}'

Step 5: Make changes to data on leader index

# 1. Modify doc with id 1
curl -XPOST "http://${LEADER}/leader-01/_doc/1" \
-H 'Content-Type: application/json' \
-d '{"value" : "data1-modified"}'

# 2. Add doc with id 2
curl -XPOST "http://${LEADER}/leader-01/_doc/2" \
-H 'Content-Type: application/json' \
-d '{"value" : "data2"}'

Step 6: Validate replicated content on the follower

# 1. Validate replicated index exists
curl -XGET "http://${FOLLOWER}/_cat/indices"
# The above should list "follower-01" as one of the indices

# 2. Check content of follower-01
curl -XGET "http://${FOLLOWER}/follower-01/_search"
# The above should list 2 documents with id 1 and 2 and matching content of
# leader-01 index on $LEADER cluster

At this point, any changes to leader-01 continue to be replicated to follower-01.

Step 7: Stop replication

Stopping replication opens up the replicated index on the follower cluster for writes. This can be leveraged to failover to the follower cluster when the need arises.

curl -XPOST "http://${FOLLOWER}/_opendistro/_replication/follower-01/_stop?pretty" \
-H 'Content-type: application/json' \
-d'{}'

# You can confirm data isn't replicated any more by making modifications to
# leader-01 index on $LEADER cluster

For more detailed instructions and examples, please refer to the HANDBOOK.

For more details on design and architecture, please refer to the RFC under docs.

CONTRIBUTING GUIDELINES

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Source: https://github.com/opendistro-for-elasticsearch/cross-cluster-replication

Are you trying to derive deeper insights by replicating your data to an Elasticsearch Cluster? Well, you have landed in the right place. Now, it has become easier to replicate data to an Elasticsearch Cluster.

This article will give you a comprehensive guide to Elasticsearch and its various applications. You will also explore various methods to set up your Elasticsearch Replication with their pros and cons. In the end, you will be in the position to choose the best method based on your requirements. Read along to know more about these methods.


Prerequisites

You will have a much easier time understanding the ways for setting up the Elasticsearch Cluster Replication if you have gone through the following aspects:

  • An active Elasticsearch account.
  • Working knowledge of Elasticsearch.
  • A clear understanding of the data that needs to be transferred.

Introduction to Elasticsearch

Elasticsearch is a distributed, open-source search & analytics engine built on Apache Lucene and developed in Java. At its core, Elasticsearch is a server that can process JSON requests & return JSON data. Elasticsearch allows you to store, search, & analyze huge volumes of data quickly & in real-time & returns answers in milliseconds. It uses a structure based on documents instead of tables & schemas and comes with extensive REST APIs for storing & searching data. Its backend components include Cluster, Node, Shards & Replicas. One of the most famous tech stacks is the ELK Stack, where ELK stands for Elasticsearch, Logstash & Kibana, often extended with Beats. Below are the primary use cases of Elasticsearch:- 

  • Application Search 
  • Website Search 
  • Enterprise Search 
  • Logging & Log Analytics
  • Security Analytics
  • Business Analytics 
  • Infrastructure metrics & container monitoring

To know more about Elasticsearch, visit this link.

Applications of Elasticsearch Cluster Replication

Replication of data is one of the most important & necessary features demanded in today’s world. In Elasticsearch, there are multiple ways to replicate clusters to ensure the uninterrupted availability of data. Key data is stored in various clusters across ES infrastructure and it’s imperative to be able to retrieve data from a certain cluster should it go down for any reason while preserving service continuity. 

Following are some useful Applications of Elasticsearch Cluster Replication:-

  • Disaster Recovery
  • Data Locality
  • Centralized Reporting
  • Server Performance

1) Disaster Recovery

Disaster recovery is a need for any application, whether it runs in a cloud or non-cloud environment. It is important for applications to be able to recover from disasters like outages, natural disruptions, cyber-attacks, etc. ES has always provided solutions for this in previous versions, and the most reliable one in the latest versions is the technique known as Cross-Cluster Replication or CCR.

2) Data Locality

Data Locality means the availability of data close to where it is needed. For example, a product catalog or any reference data that usually gets accessed around the globe needs to be replicated to multiple places so that it can be accessed easily & quickly. The data also needs to be available at a central place so that it can be accessed from anywhere in the world.

3) Centralized Reporting

Elasticsearch provides a flexible replication system that allows data to be replicated from multiple clusters & aggregated into one central node, making centralized reporting available to anyone who needs it. Many organizations need centralized reporting, including supermarkets, banks & post offices, where data is present in scattered form at the branch level & then needs to be aggregated for centralized reporting & decision making.

4) Server Performance

Data Replication usually boosts Server Performance as well. Since the whole dataset is not kept on a single server but spread across different servers, the load on each server or node is balanced, which results in a performance boost for the overall system.

Method 1: Using Snapshots to Set Up Elasticsearch Cluster Replication

The Snapshot method would require you to save a backup of the chosen cluster in a different location. This method also provides a cloud backup of the data. However, it doesn't allow you to back up and restore data in real time. Also, the built-in delay can cause users to lose valuable data.

Method 2: Using Cross-Cluster Replication (CCR) to Set Up Elasticsearch Cluster Replication

This method would require you to have a license that includes Cross-Cluster Replication (CCR). This method requires a lot of configurations and continuous monitoring. This is a time-consuming exercise and would need you to invest in Engineering Bandwidth.

Method 3: Using Multi-Cluster Load Balancer to Set Up Elasticsearch Cluster Replication

This method is achieved by using an Elasticsearch “Coordinating only node” which is also known as “client node”. These nodes help in processing incoming HTTP requests and redirecting operations to other nodes. This method would also require you to invest in Engineering Bandwidth.

Method 4: Using Hevo to Set Up Elasticsearch Cluster Replication

Hevo Data is an automated Data Pipeline platform that can move your data to Elasticsearch Cluster very quickly without writing a single line of code. It is simple, hassle-free, and reliable.

Moreover, Hevo offers a fully-managed solution to set up data integration from 100+ data sources (including 30+ free data sources) and will let you directly load data to Elasticsearch Cluster, Data Warehouses, Databases or the destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data.

Explore more about Hevo Data by signing up for the 14-day trial today!

Methods to Set Up Elasticsearch Cluster Replication

There are many ways of replicating data to Elasticsearch Cluster. Here, you are going to look into 4 popular methods. In the end, you will have a good understanding of each of these methods. Below are the 4 methods that you can use to set up Elasticsearch Cluster Replication:

Method 1: Using Snapshots to Set Up Elasticsearch Cluster Replication

Firstly, the Snapshot method ensures that the data on a chosen cluster is replicated and saved in a different location. In this case, the data is backed up to an external file system, like S3, GCS, or any other backend repository that has an official plugin.

The primary disadvantage of the Snapshot method is that the data is not backed up and restored in real-time. The built-in delay can cause users to lose valuable data collected between Snapshots. 

The advantage of this method is that users gain a cloud backup of the data. Should both the primary and secondary clusters crash, users can still access their data in the external file system (minus whatever data was collected in the time since the last Snapshot) and restore it in a new cluster.

Method 2: Using Cross-Cluster Replication (CCR) to Set Up Elasticsearch Cluster Replication

This method of Cluster Replication was not available before Elasticsearch version 6.7.0, and replication was done using other methods & third-party technologies, which were very cumbersome ways to replicate data. With Cross-Cluster Replication implemented natively, it has now become a very useful solution with many advantages, like comprehensive error handling & APIs that make replication easier to use. Elasticsearch also provides a User Interface in Kibana which helps in managing & monitoring CCR.


In CCR, the indices in clusters are replicated to preserve the data in them. The replicated cluster is called the remote cluster, while the cluster with the backup data is known as the local cluster.

CCR is designed around an active-passive index model. An index in one Elasticsearch cluster can be configured to replicate changes from an index in another Elasticsearch cluster. The index that is replicating changes is termed a “follower index” and the index being replicated from is termed the “leader index”.

Setting Up Cross-Cluster Replication

To set up Cross-Cluster Replication, the following are the prerequisites:- 

  • A license on both clusters that includes cross-cluster replication. You can start your free trial using this link.
  • You need some privileges, including read_ccr, monitor & read privileges for the leader index on the remote cluster. The privileges can be accessed & changed using the following link.
  • Similarly, you need some privileges like manage_ccr, monitor, read, write & manage_follow_index to configure remote clusters and follower indices on the local cluster. You can configure local cluster privileges using the following link.     
  • An index on the remote cluster containing the data needed for replication.

Connecting to a Remote Cluster

To replicate an index on a remote cluster (Cluster A) to a local cluster (Cluster B), you configure Cluster A as a remote on Cluster B. 


To configure a remote cluster from Stack Management in Kibana:

  1. Select Remote Clusters from the side navigation.
  2. Specify the IP address or hostname of the remote cluster (Cluster A), followed by the transport port of the remote cluster (defaults to 9300). For example, 192.168.1.1:9300.

Further steps included to complete the replication process are:- 

  • Enabling soft deletes on leader indices.
  • Creating a follower index to replicate a specific index.
  • Creating an auto-follow pattern to replicate time-series indices.

Further details on the above-mentioned processes can be found at the following link; a rough sketch of the follower-index and auto-follow steps is shown below.
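For reference, creating a follower index and an auto-follow pattern through the API looks roughly like this; run the calls against the local/follower cluster, and note that the cluster alias, index names and pattern name are placeholders.

# Create a follower index that replicates a specific leader index
curl -XPUT "http://localhost:9200/follower-index/_ccr/follow?wait_for_active_shards=1" \
-H 'Content-Type: application/json' \
-d '{"remote_cluster": "clusterA", "leader_index": "leader-index"}'

# Create an auto-follow pattern so new time-series indices are replicated automatically
curl -XPUT "http://localhost:9200/_ccr/auto_follow/my-metrics-pattern" \
-H 'Content-Type: application/json' \
-d '{"remote_cluster": "clusterA", "leader_index_patterns": ["metrics-*"], "follow_index_pattern": "{{leader_index}}-copy"}'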

Method 3: Using Multi-Cluster Load Balancer to Set Up Elasticsearch Cluster Replication

This can be achieved by using an Elasticsearch Coordinating only node, also known as the client node on the same machine as Kibana. This is the easiest & most effective way to set up a load balancer for ES nodes to use with Kibana. 

Elasticsearch Coordinating only nodes are essentially load-balancers that act in a very smart way. They process incoming HTTP requests, redirect operations to the other nodes in the cluster as needed, and gather and return the results. For more information on this process, you can have a look at the following link. 

The following steps are required to balance the load on multiple Elasticsearch nodes:- 

  1. Install Elasticsearch on the same machine as Kibana.
  2. The node must be configured as a Coordinating only node, by making the appropriate changes in the elasticsearch.yml file.
  3. Configure the client node to join the Elasticsearch Cluster. In elasticsearch.yml, set the cluster.name to the name of your cluster.
  4. Make the remaining required changes in the same file (typical settings are sketched below).
  5. Lastly, make sure that Kibana is configured to point to the local client node.
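A minimal sketch of what those configuration changes typically look like; the cluster name, hosts and ports are placeholders, and the legacy node.master/node.data/node.ingest flags are an assumption (newer Elasticsearch versions express this with node.roles: [] instead).

# elasticsearch.yml on the coordinating-only (client) node
cluster.name: my-cluster       # must match the existing cluster (step 3)
node.master: false             # no master, data or ingest roles = coordinating only
node.data: false
node.ingest: false
network.host: localhost
http.port: 9200

# kibana.yml - point Kibana at the local client node (step 5)
elasticsearch.hosts: ["http://localhost:9200"]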

Method 4: Using Hevo to Set Up Elasticsearch Cluster Replication


Hevo Data, a No-code Data Pipeline, helps you directly transfer data from 100+ other data sources to Elasticsearch Cluster, Data Warehouses, BI tools, or a destination of your choice in a completely hassle-free & automated manner. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Hevo Data takes care of all your Data Preprocessing needs and lets you focus on key business activities and draw a much powerful insight on how to generate more leads, retain customers, and take your business to new heights of profitability. It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. 

Hevo will take care of all the groundwork of moving data to an Elasticsearch Cluster in a Secure, Consistent, and Reliable fashion. Sign up for a 14-day free trial to try Hevo today.

Here are more reasons to try Hevo:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Conclusion

Cluster Replication is an important feature introduced in Elasticsearch version 6.7.0 that resolved a very critical issue. The methods mentioned above for Elasticsearch Cluster Replication need some technical expertise, depending upon your recovery management and load management plans. But Hevo Data can guarantee you smooth management and processing.

Hevo is a No-code data pipeline. It has pre-built integrations with 100+ sources. You can connect your SaaS platforms, databases, etc. to any data warehouse of your choice, without writing any code or worrying about maintenance. If you are interested, you can try Hevo by signing up for the 14-day free trial.

Share your thoughts on Elasticsearch Cluster Replication in the comments!

Source: https://hevodata.com/learn/achieve-elasticsearch-cluster-replication/

Posted On: Oct 5, 2021

Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) now supports cross-cluster replication, enabling you to automate copying and synchronizing of indices from one domain to another at low latency in same or different AWS accounts or Regions. With cross-cluster replication, you can achieve high availability for your mission critical applications with sequential data consistency. 

Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. To ensure redundancy and availability, customers configure replicas and deploy their domains across multiple availability zones, protecting them against instance failures and availability zone outages. However, the domain itself can still be a single point of failure. To protect against such failures, customers previously had to create a second domain, fork their input data streams to the two clusters, and place a load balancer in front of the two domains to balance incoming search requests. This set-up adds complexity and cost, as it requires additional technologies like Apache Kafka or AWS Lambda to monitor and correct data inconsistencies between the domains.

With cross-cluster replication for Amazon OpenSearch Service, you can replicate indices at low latency from one domain to another in same or different AWS Regions without needing additional technologies. Cross-cluster replication provides sequential consistency while continuously copying data from the leader index to the follower index. Sequential consistency ensures the leader and the follower index return the same result set after operations are applied on the indices in the same order. Cross-cluster replication is designed to minimize delivery lag between the leader and the follower index. You can continuously monitor the replication status via APIs. Additionally, if you have indices that follow an index pattern, you can create automatic follow rules and they will be automatically replicated.

Cross-cluster replication is available on the service today for domains running Elasticsearch 7.10. Cross cluster replication is also available as an open-source feature in OpenSearch 1.1, which is planned to be available on the service soon. 

Cross-cluster replication is available for Amazon OpenSearch Service across 25 regions globally. Please refer to the AWS Region Table for more information about Amazon OpenSearch Service availability. To learn more about cross-cluster replication, please see the documentation. To learn more about Amazon OpenSearch Service, please visit the product page.

Source: https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-opensearch-service-amazon-elasticsearch-service-cross-cluster-replication/

Cross-cluster replication

With cross-cluster replication, you can replicate indices across clusters to:

  • Continue handling search requests in the event of a datacenter outage
  • Prevent search volume from impacting indexing throughput
  • Reduce search latency by processing search requests in geo-proximity to the user

Cross-cluster replication uses an active-passive model. You index to a leader index, and the data is replicated to one or more read-only follower indices. Before you can add a follower index to a cluster, you must configure the remote cluster that contains the leader index.

When the leader index receives writes, the follower indices pull changes from the leader index on the remote cluster. You can manually create follower indices, or configure auto-follow patterns to automatically create follower indices for new time series indices.

You configure cross-cluster replication clusters in a uni-directional or bi-directional setup:

  • In a uni-directional configuration, one cluster contains only leader indices, and the other cluster contains only follower indices.
  • In a bi-directional configuration, each cluster contains both leader and follower indices.

In a uni-directional configuration, the cluster containing follower indices must be running the same or newer version of Elasticsearch as the remote cluster. If newer, the versions must also be compatible as outlined in the following matrix.

Multi-cluster architectures

Use cross-cluster replication to construct several multi-cluster architectures within the Elastic Stack:

  • Disaster recovery in case a primary cluster fails, with a secondary cluster serving as a hot backup
  • Data locality to maintain multiple copies of the dataset close to the application servers (and users), and reduce costly latency
  • Centralized reporting for minimizing network traffic and latency in querying multiple geo-distributed Elasticsearch clusters, or for preventing search load from interfering with indexing by offloading search to a secondary cluster

Watch the cross-cluster replication webinar to learn more about the following use cases. Then, set up cross-cluster replication on your local machine and work through the demo from the webinar.

Disaster recovery and high availability

Disaster recovery provides your mission-critical applications with the tolerance to withstand datacenter or region outages. This use case is the most common deployment of cross-cluster replication. You can configure clusters in different architectures to support disaster recovery and high availability:

Single disaster recovery datacenter

In this configuration, data is replicated from the production datacenter to the disaster recovery datacenter. Because the follower indices replicate the leader index, your application can use the disaster recovery datacenter if the production datacenter is unavailable.

[Figure: Production datacenter that replicates data to a disaster recovery datacenter]

Multiple disaster recovery datacenters

You can replicate data from one datacenter to multiple datacenters. This configuration provides both disaster recovery and high availability, ensuring that data is replicated in two datacenters if the primary datacenter is down or unavailable.

In the following diagram, data from Datacenter A is replicated to Datacenter B and Datacenter C, which both have a read-only copy of the leader index from Datacenter A.

[Figure: Production datacenter that replicates data to two other datacenters]

Chained replication

You can replicate data across multiple datacenters to form a replication chain. In the following diagram, Datacenter A contains the leader index. Datacenter B replicates data from Datacenter A, and Datacenter C replicates from the follower indices in Datacenter B. The connection between these datacenters forms a chained replication pattern.

[Figure: Three datacenters connected to form a replication chain]

Bi-directional replication

In a bi-directional replication setup, all clusters have access to view all data, and all clusters have an index to write to without manually implementing failover. Applications can write to the local index within each datacenter, and read across multiple indices for a global view of all information.

This configuration requires no manual intervention when a cluster or datacenter is unavailable. In the following diagram, if Datacenter A is unavailable, you can continue using Datacenter B without manual failover. When Datacenter A comes online, replication resumes between the clusters.

[Figure: Bi-directional configuration where each cluster contains both a leader index and follower indices]

This configuration is particularly useful for index-only workloads, where no updates to document values occur. In this configuration, documents indexed by Elasticsearch are immutable. Clients are located in each datacenter alongside the Elasticsearch cluster, and do not communicate with clusters in different datacenters.

Data locality

Bringing data closer to your users or application server can reduce latency and response time. This methodology also applies when replicating data in Elasticsearch. For example, you can replicate a product catalog or reference dataset to 20 or more datacenters around the world to minimize the distance between the data and the application server.

In the following diagram, data is replicated from one datacenter to three additional datacenters, each in their own region. The central datacenter contains the leader index, and the additional datacenters contain follower indices that replicate data in that particular region. This configuration puts data closer to the application accessing it.

[Figure: A centralized datacenter replicated across three other datacenters]

Centralized reporting

Using a centralized reporting cluster is useful when querying across a large network is inefficient. In this configuration, you replicate data from many smaller clusters to the centralized reporting cluster.

For example, a large global bank might have 100 Elasticsearch clusters around the world that are distributed across different regions for each bank branch. Using cross-cluster replication, the bank can replicate events from all 100 banks to a central cluster to analyze and aggregate events locally for reporting. Rather than maintaining a mirrored cluster, the bank can use cross-cluster replication to replicate specific indices.

In the following diagram, data from three datacenters in different regions is replicated to a centralized reporting cluster. This configuration enables you to copy data from regional hubs to a central cluster, where you can run all reports locally.

[Figure: Three clusters in different regions sending data to a centralized reporting cluster for analysis]

Replication mechanics

Although you set up cross-cluster replication at the index level, Elasticsearch achieves replication at the shard level. When a follower index is created, each shard in that index pulls changes from its corresponding shard in the leader index, which means that a follower index has the same number of shards as its leader index. All operations on the leader are replicated by the follower, such as operations to create, update, or delete a document. These requests can be served from any copy of the leader shard (primary or replica).

When a follower shard sends a read request, the leader shard responds with any new operations, limited by the read parameters that you establish when configuring the follower index. If no new operations are available, the leader shard waits up to the configured timeout for new operations. If the timeout elapses, the leader shard responds to the follower shard that there are no new operations. The follower shard updates shard statistics and immediately sends another read request to the leader shard. This communication model ensures that network connections between the remote cluster and the local cluster are continually in use, avoiding forceful termination by an external source such as a firewall.

If a read request fails, the cause of the failure is inspected. If the cause of the failure is deemed to be recoverable (such as a network failure), the follower shard enters into a retry loop. Otherwise, the follower shard pauses until you resume it.

Processing updates

You can’t manually modify a follower index’s mappings or aliases. To make changes, you must update the leader index. Because they are read-only, follower indices reject writes in all configurations.

For example, suppose you index a document in Datacenter A, which replicates to Datacenter B. If a client connects to Datacenter B and attempts to update that document, the request fails. To update the document, the client must connect to Datacenter A and update it in the leader index.

When a follower shard receives operations from the leader shard, it places those operations in a write buffer. The follower shard submits bulk write requests using operations from the write buffer. If the write buffer exceeds its configured limits, no additional read requests are sent. This configuration provides a back-pressure against read requests, allowing the follower shard to resume sending read requests when the write buffer is no longer full.

To manage how operations are replicated from the leader index, you can configure settings when creating the follower index.
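For illustration, a follower index could be created with a few of those read settings tuned; the index names, host and values below are placeholders rather than recommendations.

curl -XPUT "http://localhost:9200/follower_index/_ccr/follow?wait_for_active_shards=1" \
-H 'Content-Type: application/json' \
-d '{
  "remote_cluster": "remote_cluster",
  "leader_index": "leader_index",
  "max_read_request_operation_count": 5120,
  "max_outstanding_read_requests": 12,
  "read_poll_timeout": "1m"
}'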

The follower index automatically retrieves some updates applied to the leader index, while other updates are retrieved as needed:

Update type      Automatic      As needed
Alias            Yes            No
Mapping          No             Yes
Settings         No             Yes

For example, changing the number of replicas on the leader index is not replicated by the follower index, so that setting might not be retrieved.

If you apply a non-dynamic settings change to the leader index that is needed by the follower index, the follower index closes itself, applies the settings update, and then re-opens itself. The follower index is unavailable for reads and cannot replicate writes during this cycle.

Initializing followers using remote recovery

When you create a follower index, you cannot use it until it is fully initialized. The remote recovery process builds a new copy of a shard on a follower node by copying data from the primary shard in the leader cluster.

Elasticsearch uses this remote recovery process to bootstrap a follower index using the data from the leader index. This process provides the follower with a copy of the current state of the leader index, even if a complete history of changes is not available on the leader due to Lucene segment merging.

Remote recovery is a network intensive process that transfers all of the Lucene segment files from the leader cluster to the follower cluster. The follower requests that a recovery session be initiated on the primary shard in the leader cluster. The follower then requests file chunks concurrently from the leader. By default, the process concurrently requests five 1MB file chunks. This default behavior is designed to support leader and follower clusters with high network latency between them.

You can modify dynamic remote recovery settings to rate-limit the transmitted data and manage the resources consumed by remote recoveries.

Use the recovery API on the cluster containing the follower index to obtain information about an in-progress remote recovery. Because Elasticsearch implements remote recoveries using the snapshot and restore infrastructure, running remote recoveries are labelled accordingly in the recovery API.

Replicating a leader requires soft deletes

Cross-cluster replication works by replaying the history of individual write operations that were performed on the shards of the leader index. Elasticsearch needs to retain the history of these operations on the leader shards so that they can be pulled by the follower shard tasks. The underlying mechanism used to retain these operations is soft deletes.

A soft delete occurs whenever an existing document is deleted or updated. By retaining these soft deletes up to configurable limits, the history of operations can be retained on the leader shards and made available to the follower shard tasks as it replays the history of operations.

A dedicated index setting defines the maximum time to retain a shard history retention lease before it is considered expired. This setting determines how long the cluster containing your follower index can be offline, which is 12 hours by default. If a shard copy recovers after its retention lease expires, but the missing operations are still available on the leader index, then Elasticsearch will establish a new lease and copy the missing operations. However, Elasticsearch does not guarantee to retain unleased operations, so it is also possible that some of the missing operations have been discarded by the leader and are now completely unavailable. If this happens, the follower cannot recover automatically, so you must recreate it.

Soft deletes must be enabled for indices that you want to use as leader indices. Soft deletes are enabled by default on new indices created on or after Elasticsearch 7.0.0.

Cross-cluster replication cannot be used on existing indices created using Elasticsearch 7.0.0 or earlier, where soft deletes are disabled. You must reindex your data into a new index with soft deletes enabled.
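As an illustrative sketch, both behaviours are controlled through ordinary index settings when an index is created; the index name, host and the 24h value here are placeholders (the retention lease period defaults to 12h).

# Create a leader index with soft deletes enabled and a longer retention lease period
curl -XPUT "http://localhost:9200/leader_index" \
-H 'Content-Type: application/json' \
-d '{
  "settings": {
    "index.soft_deletes.enabled": true,
    "index.soft_deletes.retention_lease.period": "24h"
  }
}'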

Use cross-cluster replication

The following sections provide more information about how to configure and use cross-cluster replication.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-ccr.html

Follow the Leader: An Introduction to Cross-Cluster Replication in Elasticsearch

The Needs of the Many

The ability to natively replicate data to an Elasticsearch cluster from another Elasticsearch cluster is our most heavily requested feature, and one that our users have long been asking for. After years of engineering effort laying the necessary foundations, building new fundamental technology into Lucene, and iterating and refining our initial design, we are excited to announce that cross-cluster replication (CCR) is now available and production-ready in Elasticsearch 6.7.0. In this post, the first in a series, we will provide some brief introduction to what we have implemented, and some technical background on CCR. In future posts we will deep-dive into specific CCR use cases.

Cross-cluster replication in Elasticsearch enables a variety of mission-critical use cases within Elasticsearch and the Elastic Stack:

  • Disaster Recovery (DR) / High Availability (HA): Tolerance to withstand a datacenter or region outage is a requirement for many mission-critical applications. This requirement was previously solved in Elasticsearch with additional technologies, which added additional complexity and management overhead. Satisfying cross-datacenter DR / HA requirements can now be solved natively in Elasticsearch, utilizing CCR, with no additional technologies.
  • Data Locality: Replicate data in Elasticsearch to get closer to the user or application server, reducing latencies that cost you money. For example, a product catalog or reference dataset may be replicated to twenty or more data centers around the world, to minimize the distance between the data and the application server. Another use case may be a stock trading firm with offices in London and New York. All trades in the London office are written locally and replicated to the New York office, and all trades in the New York office are written locally and replicated to London. Both offices have a global view for all trades.
  • Centralized Reporting: Replicate data from a large number of smaller clusters back to a centralized reporting cluster. This is useful when it may not be efficient to query across a large network. For example, a large global bank may have 100 Elasticsearch clusters around the world, each within a different bank branch. We can use CCR to replicate events from all 100 banks around the world back to a central cluster where we can analyze and aggregate events locally.

Prior to Elasticsearch 6.7.0, these use cases could be partially addressed with third-party technologies, which is cumbersome, carries a lot of administration overhead, and has significant drawbacks. With cross-cluster replication natively integrated into Elasticsearch, we free our users of the burden and drawbacks of managing complicated solutions, can offer advantages above what existing solutions can provide (e.g., comprehensive error handling), and provide APIs within Elasticsearch and UIs in Kibana for managing and monitoring CCR.

Stay tuned for our follow-up posts to dive into each of these use-cases in greater detail.

Getting Started with Cross-Cluster Replication

Hop on over to our downloads page to obtain the latest releases of Elasticsearch and Kibana, and dive in with our getting started guide.

CCR is a platinum level feature, and is available through a 30-day trial license that can be activated through the start trial API or directly from Kibana.
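For example, the trial license can be activated with a single call (the host is a placeholder; on 6.x clusters the endpoint lives under the _xpack prefix):

# Start a 30-day trial license
curl -XPOST "http://localhost:9200/_license/start_trial?acknowledge=true"

# Equivalent endpoint on 6.x clusters
curl -XPOST "http://localhost:9200/_xpack/license/start_trial?acknowledge=true"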

A Technical Introduction to Cross-Cluster Replication

CCR is designed around an active-passive index model. An index in one Elasticsearch cluster can be configured to replicate changes from an index in another Elasticsearch cluster. The index that is replicating changes is termed a “follower index” and the index being replicated from is termed the “leader index”. The follower index is passive in that it can serve read requests and searches but can not accept direct writes; only the leader index is active for direct writes. As CCR is managed at the index level, a cluster can contain both leader indices and follower indices. In this way, you can solve some active-active use cases by replicating some indices one way (e.g., from a US cluster to a European cluster), and other indices the other way (from a European cluster to a US cluster).

Replication is done at the shard level; each shard in the follower index will pull changes from its corresponding shard in the leader index, which means that a follower index has the same number of shards as its leader index. All operations are replicated by the follower, so that operations to create, update, or delete a document are replicated. Replication is done in near real-time; as soon as the global checkpoint on a shard advances an operation is eligible to be replicated by a following shard. Operations are pulled and indexed efficiently in bulk by the following shard, and multiple requests to pull changes can be in flight concurrently. These read requests can be served by the primary and its replicas, and aside from reading from the shard, do not put any additional load on the leader. This design allows CCR to scale with your production load, so that you can continue to enjoy the high-throughput indexing rates you have come to appreciate (and expect) in Elasticsearch.

CCR supports both newly-created indices and existing indices. When a follower is initially configured, it will bootstrap itself from the leader index by copying the underlying files from the leader index in a process similar to how a replica recovers from the primary. After this recovery process completes, CCR will replicate any additional operations from the leader. Mappings and settings changes are automatically replicated as-needed from the leader index.

From time to time, CCR might encounter error scenarios (e.g., a network failure). CCR is able to automatically classify these errors into recoverable errors, and fatal errors. When a recoverable error occurs, CCR enters a retry loop so that as soon as the situation that led to the failure is resolved, CCR will resume replication.

Status of replication can be monitored through a dedicated API. Via this API, you can monitor how closely the follower is tracking the leader, see detailed statistics about the performance of CCR, and track any errors that require your attention.
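Roughly, checking on replication looks like this (the follower index name and host are placeholders):

# Cluster-wide CCR statistics, including auto-follow coordination stats
curl -XGET "http://localhost:9200/_ccr/stats"

# Shard-level replication statistics for a specific follower index
curl -XGET "http://localhost:9200/follower-index/_ccr/stats"

# Replication parameters and status for the follower index
curl -XGET "http://localhost:9200/follower-index/_ccr/info"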

We have integrated CCR with the monitoring and management apps within Kibana. The monitoring UI gives you insight into CCR progress and error reporting.

[Screenshot: Elasticsearch CCR Monitoring UI in Kibana]

The management UI enables you to configure remote clusters, configure follower indices, and manage auto-follower patterns for automatic index replication.

[Screenshot: Elasticsearch CCR Management UI in Kibana]

Simon Says, Follow Today’s Indices

Many of our users have workloads that create new indices on a periodic basis. For example, daily indices from log files that Filebeat is pushing or indices rolled over automatically by index lifecycle management. Rather than having to manually create follower indices to replicate these indices from a source cluster, we have built auto-follow functionality directly into CCR. This functionality allows you to configure patterns of indices to automatically be replicated from a source cluster. CCR will monitor source clusters for indices that match these patterns, and configure following indices to replicate those matching leader indices.

We have also integrated CCR and ILM so that time-based indices can be replicated by CCR, and managed in both source and target clusters by ILM. For example, ILM understands when a leader index is being replicated by CCR and so carefully manages destructive operations like shrinking and deleting indices until CCR has finished replicating.

Those Who Don’t Know History

For CCR to be able to replicate changes, we require a history of operations on the shards of the leader index, and pointers on each shard to know what operations are safe to replicate. This history of operations is governed by sequence IDs and the pointer is known as the global checkpoint. There is a complication though. When a document is updated or deleted in Lucene, Lucene marks a bit to record that the document is deleted. The document is retained on disk until a future merge operation merges the deleted documents away. If CCR replicates this operation before the delete is merged away, then all is well. Yet, merges occur on their own lifecycle, and this means that a deleted document could be merged away before CCR has had the opportunity to replicate the operation. Without some ability to control when deleted documents are merged away, CCR could miss operations and be unable to fully replicate the history of operations to the follower index. Early in the design of CCR we planned to use the Elasticsearch translog as a source for the history of these operations; this would have side-stepped the problem. We quickly realized that the translog was not designed for the access patterns that CCR needed to perform effectively. We considered putting additional data structures on top of and alongside the translog to achieve the performance that we needed, but there are limitations to this approach. For one, it would add more complexity to one of the most critical components of our system; this is simply not consistent with our engineering philosophy. Further, it would tie our hands for future changes that we intend to build on top of the history of operations where we would be forced to either limit the types of searches that can be done on the history of operations, or reimplement all of Lucene on top of the translog. With this insight, we realized that we needed to build natively into Lucene functionality that would give us control over when a deleted document is merged away, effectively pushing the history of operations into Lucene. We call this technology “soft deletes”. This investment into Lucene will pay off for years to come as not only is CCR built on it, but we are reworking our replication model on top of soft deletes, and the forthcoming changes API will be based on them too. Soft deletes are required to be enabled on leader indices.

What remains then is for a follower to be able to influence when soft deleted documents are merged away on the leader. To this end, we introduced shard history retention leases. With a shard history retention lease, a follower can mark in the history of operations on the leader where in history that follower currently is. The leader shards know that operations below that marker are safe to be merged away, but any operations above that marker must be retained until the follower has had the opportunity to replicate them. These markers ensure that if a follower goes offline temporarily, the leader will retain operations that have not yet been replicated. Since retaining this history requires additional storage on the leader, these markers are only valid for a limited period after which the marker will expire and the leader shards will be free to merge away history. You can tune the length of this period based on how much additional storage you are willing to retain in case a follower goes offline, and how long you're willing to accept a follower being offline before it would otherwise have to be re-bootstrapped from the leader.

Summary

We are delighted for you to try out CCR and share with us your feedback on the functionality. We hope you enjoy this functionality as much as we enjoyed building it. Stay tuned for future posts in this series where we will roll up our sleeves and explain in more detail some of the functionality in CCR and the use-cases that CCR is aimed at. And if you have any CCR questions, reach out on the Discuss forum.



Source: https://www.elastic.co/blog/follow-the-leader-an-introduction-to-cross-cluster-replication-in-elasticsearch


Today we are excited to announce the experimental release of Cross-Cluster Replication in Open Distro for Elasticsearch, which enables customers to replicate indices from one Elasticsearch cluster to another. The key drivers for the new native replication feature are:

  • High Availability: Cross-cluster replication ensures uninterrupted service availability with the ability to failover to an alternate cluster in case of failure or outages on the primary cluster.
  • Reduced Latency: Replicating data to a cluster that is closer to the application users minimizes the query latency.
  • Horizontal scalability: Splitting a query-heavy workload across multiple replica clusters improves application availability.
  • Aggregated reports: Enterprise customers can roll up reports continually from smaller clusters belonging to different lines of business into a central cluster for consolidated reports, dashboards or visualizations.

This experimental release is not intended for production use. Users can try out the plugin in a sandbox environment and provide feedback on issues and enhancements.

Introducing Cross-Cluster Replication

Cross-Cluster Replication follows an active-passive replication model where the follower cluster (where the data is replicated) pulls data from the leader (source) cluster. The tenets that guided our feature design are:

  • Secure: Cross-cluster replication should offer strong security controls for all flows and APIs.
  • Accuracy: There must be no difference between the intended contents of the follower index and the leader index.
  • Performance: Replication should not impact the indexing rate of the leader cluster.
  • Eventual Consistency: The replication lag between the leader and the follower cluster should be under a few seconds.
  • Resource usage: Replication should use minimal resources.

The replication feature is implemented as an Elasticsearch plugin that exposes APIs to control replication, spawns background persistent tasks to asynchronously replicate indices, and uses the snapshot repository abstraction to facilitate the initial bootstrap. Replication relies on a cross-cluster connection set up from the follower cluster to the leader cluster. The replication plugin integrates seamlessly with the Open Distro for Elasticsearch Security plugin: users can encrypt cross-cluster traffic via the node-to-node encryption feature and control access to replication activities via the security plugin.

Upon establishing secure connectivity between the clusters, users can start replicating indices from the leader cluster onto the follower cluster. The feature also allows for replication of indices using wildcard pattern matching and provides controls to stop replication. Once replication is started on an index, it initiates a background persistent task on the primary shard in the follower cluster that continuously polls corresponding shards from the leader index for updates. The Request for Comments (RFC) document provides more details on the feature, its underlying security and permission models, and the supported APIs.

Getting Started

The cross-cluster replication plugin currently supports Elasticsearch version 7.10.2. The following steps will help you install the replication plugin on a test cluster and try it out.

Step 1: Spin up two test clusters with Open Distro for Elasticsearch 1.13 and install the replication plugin

Clone the cross-cluster-replication repository and spin up the clusters from the packaged example via docker-compose.
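
For reference, a rough sketch of this step driven from Python is shown below. The repository URL points at the Open Distro cross-cluster-replication project on GitHub, while the path to the packaged docker-compose example is an assumption; check the repository README for the actual layout.

```python
# Rough sketch of Step 1, assuming git, Docker, and docker-compose are installed.
# The example directory path is an assumption; see the repository README.
import subprocess

subprocess.run(
    ["git", "clone",
     "https://github.com/opendistro-for-elasticsearch/cross-cluster-replication.git"],
    check=True,
)
subprocess.run(
    ["docker-compose", "up", "-d"],
    cwd="cross-cluster-replication/examples/sample",   # assumed location of the packaged example
    check=True,
)
```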

If you are setting this up on your own Open Distro for Elasticsearch 1.13 cluster, please follow the instructions in the handbook.

Step 2: Set up cross-cluster connectivity

Set up a remote cluster connection from the follower cluster to the leader cluster. If the Open Distro for Elasticsearch security plugin is enabled, it ensures that the cross-cluster traffic is encrypted.
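
A minimal sketch of registering the leader as a remote cluster on the follower is shown below, using the standard cluster settings API. The connection alias (leader-cluster), the REST and transport addresses, the demo admin credentials, and the disabled certificate verification are all assumptions based on a local docker-compose test setup.

```python
# Minimal sketch: register the leader cluster as a remote cluster on the follower.
# Alias, addresses, and credentials are assumptions for a local test setup.
import requests

FOLLOWER = "https://localhost:9201"   # assumed follower REST endpoint
AUTH = ("admin", "admin")             # default demo credentials (do not use in production)

body = {
    "persistent": {
        "cluster": {
            "remote": {
                "leader-cluster": {
                    "seeds": ["leader-node:9300"]   # assumed leader transport address
                }
            }
        }
    }
}

resp = requests.put(f"{FOLLOWER}/_cluster/settings", json=body,
                    auth=AUTH, verify=False)   # self-signed demo certificates
resp.raise_for_status()
print(resp.json())
```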

Step 3: Populate leader cluster with sample data
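
For example, a single sample document can be indexed into the leader as sketched below; the index name leader-01, the endpoint, and the credentials are illustrative.

```python
# Minimal sketch: index one sample document into an index on the leader cluster.
# Index name, endpoint, and credentials are assumptions for illustration.
import requests

LEADER = "https://localhost:9200"
AUTH = ("admin", "admin")

doc = {"title": "hello from the leader", "value": 1}
resp = requests.put(f"{LEADER}/leader-01/_doc/1", json=doc, auth=AUTH, verify=False)
resp.raise_for_status()
print(resp.json())
```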

Step 4: Configure permissions

The required permissions are documented in the security section of the handbook. The script from the sample configures the minimal permissions required for a user named “testuser”.

Step 5: Start replication

Now you can begin replication as follows.
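
The sketch below is a rough illustration only: the exact endpoint and request body are defined by the replication plugin and documented in the handbook. It assumes an _opendistro/_replication/<follower-index>/_start API with remote_cluster and remote_index fields, together with the alias, index names, and test user from the previous steps.

```python
# Rough sketch: start replicating "leader-01" into a follower index "follower-01".
# The API path and body field names are assumptions based on the plugin's
# documentation; check the handbook for the authoritative form.
import requests

FOLLOWER = "https://localhost:9201"
AUTH = ("testuser", "testuser-password")   # user from Step 4; password is illustrative

body = {
    "remote_cluster": "leader-cluster",    # alias registered in Step 2
    "remote_index": "leader-01",           # index populated in Step 3
}

resp = requests.put(
    f"{FOLLOWER}/_opendistro/_replication/follower-01/_start",
    json=body, auth=AUTH, verify=False,
)
resp.raise_for_status()
print(resp.json())
```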

Step 6: Make changes to data on leader index

Step 7: Validate replicated content on the follower

At this point, any changes made to the leader index continue to be replicated to the follower index.
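
A minimal sketch covering Steps 6 and 7, under the same assumptions as the earlier snippets, writes a new document to the leader, waits briefly, and then searches the follower:

```python
# Minimal sketch for Steps 6 and 7: update the leader, then verify on the follower.
# Endpoints, index names, and credentials follow the earlier assumptions.
import time
import requests

LEADER = "https://localhost:9200"
FOLLOWER = "https://localhost:9201"
AUTH = ("admin", "admin")

# Step 6: write a new document to the leader index
requests.put(f"{LEADER}/leader-01/_doc/2",
             json={"title": "a newer document"}, auth=AUTH, verify=False)

# Step 7: give replication a moment, then read it back from the follower index
time.sleep(5)
resp = requests.get(f"{FOLLOWER}/follower-01/_search", auth=AUTH, verify=False)
print(resp.json()["hits"]["total"])
```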

Step 8: Stop replication

Stopping replication opens up the replicated index on the follower cluster for writes. This can be leveraged to fail over to the follower cluster when the need arises.
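
A rough sketch of stopping replication is shown below; the _stop endpoint path is an assumption based on the plugin's documented API, so consult the handbook for the authoritative call.

```python
# Rough sketch: stop replication on the follower, which makes the follower
# index writable again. The API path is an assumption; see the handbook.
import requests

FOLLOWER = "https://localhost:9201"
AUTH = ("testuser", "testuser-password")   # illustrative credentials

resp = requests.post(
    f"{FOLLOWER}/_opendistro/_replication/follower-01/_stop",
    json={}, auth=AUTH, verify=False,
)
resp.raise_for_status()
print(resp.json())
```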

Handbook for more details

The handbook in the cross-cluster replication repository provides additional examples and details.

Summary

In this post, we introduced cross-cluster replication with a discussion on its motivation, design tenets, user experience and instructions to try out the experimental release. This release includes the foundational features of cross-cluster replication. We are adding more exciting features like resiliency and performance improvements in the next phases. We invite the community to collaborate with us to build out this feature, and to make cross-cluster replication resilient, efficient, and performant.

Contributing to cross-cluster replication

You can start contributing to the replication project in many ways. Ask questions and share your knowledge with other community members on the Open Distro discussion forums or through our online community meetup. Share your feedback and comments on the overall goals, user-experience, architecture and features of the replication project. Moreover, submit pull requests using community guidelines to collaborate with us on new feature development and bug resolution.

About the Authors

Pallavi Priyadarshini is an Engineering Manager at Amazon Web Services, leading the design and development of high-performing and at-scale analytics technologies.

Gopala Krishna is a Senior Software Engineer working on Search Services at Amazon Web Services. His primary interests are distributed systems, networking and search. He is an active contributor to Open Distro for Elasticsearch.

Saikumar Karanam is a Software engineer working on Search Services at Amazon Web Services. His interests are distributed systems, networking and machine learning. He is an active contributor to Open Distro for Elasticsearch.

Source: https://opendistro.github.io/for-elasticsearch/blog/releases/2021/02/announcing-ccr/

