Saturday, 13 August 2022

Elasticsearch commands

 GET /_cluster/health


GET /_cat/nodes?v


GET /_nodes


GET /_cat/indices?v


GET /_cat/shards?v


DELETE /products


PUT /products

{

  "settings":{

    "number_of_shards": 2, //this index on 2 nodes 

    "number_of_replicas": 2 //each node have 2 raplica

  }

}
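
To verify how the new index was laid out, its settings and shard allocation can be checked — a quick sketch using standard endpoints (on a single-node cluster the replica shards will show up as UNASSIGNED):

GET /products/_settings

GET /_cat/shards/products?v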


POST /products/_doc

{

  "name":"CoffeeMaker",

  "price":60,

  "in_stock":10

}


result:


{

  "_index" : "products",

  "_id" : "C8ZKfYIBMXr0mo8fsB3j",

  "_version" : 1,

  "result" : "created",

  "_shards" : {

    "total" : 3, //1 node and 2 its replica

    "successful" : 1, //it store only in one node becuase i have cofigure only single node

    "failed" : 0

  },

  "_seq_no" : 0,

  "_primary_term" : 1

}


PUT /products/_doc/100

{

  "name":"Tea Maker",

  "price":50,

  "in_stock":20

}


GET /products/_doc/100


//Update


POST /products/_update/100

{

  "doc": {

     "in_stock":5

  }

}


POST /products/_update/100

{

  "doc":{

    "tags":["electronics","wood"]

  }

}



POST /products/_update/100

{

  "script": {

    "source": "ctx._source.in_stock--"

  }

}


POST /products/_update/100

{

  "script": {

    "source": "ctx._source.in_stock = 10"

  }

}


POST /products/_update/100

{

  "script": {

    "source": "ctx._source.in_stock -= params.quantity",

    "params": {

      "quantity":4

    }

  }

  

}
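
A related sketch: the update script can also set ctx.op to 'noop' to skip the write when nothing needs to change, for example to avoid driving in_stock below zero (the condition here is only an illustration):

POST /products/_update/100
{
  "script": {
    "source": "if (ctx._source.in_stock > 0) { ctx._source.in_stock-- } else { ctx.op = 'noop' }"
  }
}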


Replace (a PUT to an existing ID overwrites the whole document):

PUT /products/_doc/100

{

  "name":"Tea Maker",

  "price":50,

  "in_stock":20

}
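
If the target document may not exist yet, the _update endpoint also accepts "doc_as_upsert": true, which indexes the "doc" as a new document instead of returning a 404. A minimal sketch (document ID 102 and its field values are made up for the example):

POST /products/_update/102
{
  "doc": {
    "name": "Kettle",
    "price": 30,
    "in_stock": 15
  },
  "doc_as_upsert": true
}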


delete:

DELETE /products/_doc/101



shard_num = hash(_routing) % number_of_primary_shards


=> We can't add more primary shards after the index is created, because the formula above could then return a different shard number for the same document.

-------------

number of primary shards = 2

GET /products/_doc/100

shard_num = 1 (for example)

-------------------

number of primary shards = 5

GET /products/_doc/100

shard_num = 4 (maybe)


So the shard number is different in the two cases, even though it is the same document.
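
By default _routing is the document _id, but a custom routing value can be supplied per request; the same value must then be passed when reading the document back. A sketch (the routing value "user-1" is arbitrary):

PUT /products/_doc/100?routing=user-1
{
  "name":"Tea Maker",
  "price":50,
  "in_stock":20
}

GET /products/_doc/100?routing=user-1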


As a quick reminder, modifying the number of shards requires creating a new index and reindexing documents into it. That's made fairly easy with the Shrink and Split APIs.
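
For example, the Split API can copy the 2-shard products index into a new index with more primaries. The source index must first be made read-only, and the target shard count must be a multiple of the source count; the target name products_split is just a placeholder. A sketch:

PUT /products/_settings
{
  "index.blocks.write": true
}

POST /products/_split/products_split
{
  "settings": {
    "index.number_of_shards": 4
  }
}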



Shard allocation is the process of allocating shards to nodes. This can happen during initial recovery, replica allocation, rebalancing, or when nodes are added or removed.

One of the main roles of the master is to decide which shards to allocate to which nodes, and when to move shards between nodes in order to rebalance the cluster.
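
To see why a particular shard is or is not allocated to a node, the cluster allocation explain API can be queried — a sketch for a replica of shard 0 of the products index (calling it with no body explains the first unassigned shard it finds):

GET /_cluster/allocation/explain
{
  "index": "products",
  "shard": 0,
  "primary": false
}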



PUT /products3

{

  "settings":{

    "number_of_shards": 50,  

    "number_of_replicas": 3

  }

}

//every primary shard has 3 replicas, so a write is stored on 4 shard copies => 1 primary + 3 replicas



//Update using the last known sequence number and primary term.

The parameters below enable optimistic concurrency control (the update is rejected if the document has changed in the meantime):

POST /products/_update/10?if_primary_term=2&if_seq_no=46

{

  "doc": {

     "in_stock":8

  }

}


If the sequence number and primary term do not match:

{

  "error" : {

    "root_cause" : [

      {

        "type" : "version_conflict_engine_exception",

        "reason" : "[10]: version conflict, required seqNo [46], primary term [2]. current document has seqNo [47] and primary term [2]",

        "index_uuid" : "gocQDsc7Qy2gnypI99tUDA",

        "shard" : "0",

        "index" : "products"

      }

    ],

    "type" : "version_conflict_engine_exception",

    "reason" : "[10]: version conflict, required seqNo [46], primary term [2]. current document has seqNo [47] and primary term [2]",

    "index_uuid" : "gocQDsc7Qy2gnypI99tUDA",

    "shard" : "0",

    "index" : "products"

  },

  "status" : 409

}
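
To retry after such a conflict, the current _seq_no and _primary_term can be read back from the document and passed to the update again — a sketch, where 47 and 2 are taken from the error above and would normally come from the GET response:

GET /products/_doc/10

POST /products/_update/10?if_primary_term=2&if_seq_no=47
{
  "doc": {
     "in_stock":8
  }
}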




-----------------
update by query:
POST /products2/_update_by_query
{
  "script": {
    "source": "ctx._source.in_stock--" 
  },
  "query": {
    "match_all": {}
  }
}

//Skip failures caused by sequence number / primary term conflicts
POST /products2/_update_by_query
{
  "conflicts": "proceed", 
  "script": {
    "source": "ctx._source.in_stock--" 
  },
  "query": {
    "match_all": {}
  }
}
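
The query does not have to be match_all; for instance, a range filter could restrict the update to documents that still have stock — a sketch assuming the same in_stock field:

POST /products2/_update_by_query
{
  "conflicts": "proceed",
  "script": {
    "source": "ctx._source.in_stock--"
  },
  "query": {
    "range": {
      "in_stock": { "gt": 0 }
    }
  }
}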

--------------------------

Fetch shard details: which shard each document is stored on, and on which node.

Config: index products3 with 50 shards and 3 replicas.

GET /products3/_search
{
  "explain": true, 
  "query": {
    "match_all": {}
  }
}

result:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 50,
    "successful" : 50,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_shard" : "[products3][31]",
        "_node" : "-rHC-04PRfuFZ8tEfvVc7w",
        "_index" : "products3",
        "_id" : "11",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "in_stock" : 0
          }
        },
        "_explanation" : {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      },
      {
        "_shard" : "[products3][37]",
        "_node" : "-rHC-04PRfuFZ8tEfvVc7w",
        "_index" : "products3",
        "_id" : "12",
        "_score" : 1.0,
        "_source" : {
          "doc" : {
            "in_stock" : 0
          }
        },
        "_explanation" : {
          "value" : 1.0,
          "description" : "*:*",
          "details" : [ ]
        }
      }
    ]
  }
}


Shards, Nodes and Replicas

When you download elasticsearch and start it up, you create an elasticsearch node which tries to join an existing cluster if available or creates a new one. Let's say you created your own new cluster with a single node, the one that you just started up. We have no data, therefore we need to create an index.

When you create an index (an index is automatically created when you index the first document as well) you can define how many shards it will be composed of. If you don't specify a number it will have the default number of shards: 5 primaries (note that since Elasticsearch 7.0 the default is a single primary shard; this walkthrough assumes 5). What does it mean?

It means that elasticsearch will create 5 primary shards that will contain your data:

 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|

Every time you index a document, elasticsearch will decide which primary shard is supposed to hold that document and will index it there. Primary shards are not a copy of the data, they are the data! Having multiple shards does help take advantage of parallel processing on a single machine, but the whole point is that if we start another elasticsearch instance on the same cluster, the shards will be distributed in an even way over the cluster.

Node 1 will then hold for example only three shards:

 ____    ____    ____ 
| 1  |  | 2  |  | 3  |
|____|  |____|  |____|

Since the remaining two shards have been moved to the newly started node:

 ____    ____
| 4  |  | 5  |
|____|  |____|

Why does this happen? Because elasticsearch is a distributed search engine and this way you can make use of multiple nodes/machines to manage big amounts of data.

Every elasticsearch index is composed of at least one primary shard since that's where the data is stored. Every shard comes at a cost, though, therefore if you have a single node and no foreseeable growth, just stick with a single primary shard.

Another type of shard is a replica. The default is 1, meaning that every primary shard will be copied to another shard that will contain the same data. Replicas are used to increase search performance and for fail-over. A replica shard is never going to be allocated on the same node where the related primary is (it would pretty much be like putting a backup on the same disk as the original data).
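
As an aside, unlike number_of_shards, number_of_replicas is a dynamic setting and can be changed on an existing index — handy on a single-node cluster where replicas would otherwise stay unassigned. A quick sketch:

PUT /products/_settings
{
  "number_of_replicas": 0
}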

Back to our example, with 1 replica we'll have the whole index on each node, since 2 replica shards will be allocated on the first node and they will contain exactly the same data as the primary shards on the second node:

 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4R |  | 5R |
|____|  |____|  |____|  |____|  |____|

Same for the second node, which will contain a copy of the primary shards on the first node:

 ____    ____    ____    ____    ____
| 1R |  | 2R |  | 3R |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|

With a setup like this, if a node goes down, you still have the whole index. The replica shards will automatically become primaries and the cluster will work properly despite the node failure, as follows:

 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|

Since you have "number_of_replicas":1, the replicas cannot be assigned anymore as they are never allocated on the same node where their primary is. That's why you'll have 5 unassigned shards, the replicas, and the cluster status will be YELLOW instead of GREEN. No data loss, but it could be better as some shards cannot be assigned.

As soon as the node that had left is back up, it will join the cluster again and the replicas will be assigned again. The existing shards on the second node can be reused, but they need to be synchronized with the other shards, since write operations most likely happened while the node was down. At the end of this operation, the cluster status will become GREEN.





https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-replication.html

### A primary shard and its replicas are never stored on the same node

Example of shard and replica allocation:
If we have a cluster of 2 nodes, 101 and 102,
and we create an index like below:

PUT /index_name
{
  "settings":{
    "number_of_shards": 5,
    "number_of_replicas": 5
  }
}

then the allocation looks like this:

shard1 -> primary node => 101

shard1 -> replica1 -> 102

shard1 -> replica2 -> Unassigned

shard1 -> replica3 -> Unassigned

shard1 -> replica4 -> Unassigned

shard1 -> replica5 -> Unassigned



shard2 -> primary node => 102

shard2 -> replica1 -> 101

shard2 -> replica2 -> Unassigned

shard2 -> replica3 -> Unassigned

shard2 -> replica4 -> Unassigned

shard2 -> replica5 -> Unassigned


shard3 -> primary node => 101

shard3 -> replica1 -> 102

shard3 -> replica2 -> Unassigned

shard3 -> replica3 -> Unassigned

shard3 -> replica4 -> Unassigned

shard3 -> replica5 -> Unassigned



shard4 -> primary node => 101

shard4 -> replica1 -> 102

shard4 -> replica2 -> Unassigned

shard4 -> replica3 -> Unassigned

shard4 -> replica4 -> Unassigned

shard4 -> replica5 -> Unassigned



shard5 -> primary node => 102

shard5 -> replica1 -> 101

shard5 -> replica2 -> Unassigned

shard5 -> replica3 -> Unassigned

shard5 -> replica4 -> Unassigned

shard5 -> replica5 -> Unassigned



The assignment of shards to nodes is done by Elasticsearch itself (the cat commands sketched after this list can be used to observe it):

1. If the cluster has only a single node, the replicas will not be assigned to any node.

2. If we add a second node to the cluster, the replicas are assigned to the new node.

3. If we add a third node to the cluster, the load is rebalanced from the first and second nodes onto the third node.

4. If we remove the second node from the cluster, the primaries and replicas are rebalanced across the first and third nodes.
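
A sketch of standard cat endpoints that can be used to watch this allocation and rebalancing (shard counts and disk usage per node, and the node each shard copy lives on):

GET /_cat/allocation?v

GET /_cat/shards?v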



How Elasticsearch reads data:

