Friday, 26 January 2024

Elasticsearch basics

1. A node stores data as documents.

2. A cluster contains multiple nodes.

3. A document holds its data in JSON format.

4. Multiple documents are grouped together into an index.


  • Replication is about maintaining real-time copies (primary and replica shards) of data within a cluster to ensure high availability, fault tolerance, and improved read performance.

  • Snapshot is about creating backups of your data and settings at a specific point in time. Snapshots are useful for disaster recovery, migration, and long-term data retention.


Inverted index -> used for full-text search
keyword -> used for sorting and aggregation

"name" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}

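So the name field is stored in the inverted index for full-text search, and its name.keyword sub-field is also stored in doc values. The keyword sub-field skips values longer than 256 characters.

"type" : "text" -> stored in the inverted index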


The sub-field below is stored in doc values as well, which are used for sorting and aggregation.
"keyword" : {
  "type" : "keyword",
  "ignore_above" : 256
}
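
To make the difference between the two structures concrete, here is a rough Python sketch. It is not how Elasticsearch implements them internally: the sample documents, the whitespace "analyzer", and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: document ID -> value of the "name" field.
docs = {
    1: "Red Apple",
    2: "Green Apple",
    3: "red grape",
}

IGNORE_ABOVE = 256  # mirrors "ignore_above" in the mapping


def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# "text" field: inverted index mapping each term to the doc IDs that contain it.
inverted_index = defaultdict(set)
for doc_id, name in docs.items():
    for term in analyze(name):
        inverted_index[term].add(doc_id)

# "keyword" sub-field: per-document store of the exact, unanalyzed value.
# Values longer than IGNORE_ABOVE are skipped for this sub-field.
doc_values = {
    doc_id: name for doc_id, name in docs.items() if len(name) <= IGNORE_ABOVE
}


def search(term):
    """Full-text lookup: which documents contain this term?"""
    return sorted(inverted_index.get(term.lower(), set()))


def sort_by_keyword():
    """Sorting reads the per-document values, not the term dictionary."""
    return sorted(doc_values, key=lambda doc_id: doc_values[doc_id])


def terms_aggregation():
    """Count documents per exact keyword value."""
    counts = defaultdict(int)
    for value in doc_values.values():
        counts[value] += 1
    return dict(counts)
```

The search goes term -> documents, while sorting and aggregation go document -> value, which is why the mapping keeps both a text field and a keyword sub-field.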

Logstash

Logstash is a data pipeline. It consists of three stages: inputs, filters, and outputs.


input -> reads data from Kafka, a relational database, a file, or any other input source. It can also read from multiple input sources at once.

filter -> selects and processes the data we need.

output -> where the data is written after filtering, such as Elasticsearch.

For example, suppose we want to read access.log from a file using Logstash.

Logstash receives one line of the log -> processes the line using a grok pattern -> pushes the result into Elasticsearch.
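
The same three-stage flow can be imitated in plain Python to see what each stage does. This is only a sketch, not Logstash itself: a regular expression stands in for the grok pattern, the field names, the sample log line, and the index name "access-logs" are assumptions, and the output stage only builds a bulk-style request body without sending it anywhere.

```python
import json
import re

# Regex standing in for a grok pattern over the common access log format.
# The field names (client_ip, verb, ...) are chosen for this example.
ACCESS_LOG = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<http_version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def read_input(lines):
    """Input stage: yield one raw log line at a time."""
    for line in lines:
        yield line.rstrip("\n")


def apply_filter(line):
    """Filter stage: parse a line into fields, or return None if it does not match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


def write_output(events, index_name="access-logs"):
    """Output stage: build newline-delimited JSON shaped like a bulk request body."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


# Hypothetical sample line from access.log.
sample = ['127.0.0.1 - - [26/Jan/2024:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1024']
events = [e for e in map(apply_filter, read_input(sample)) if e is not None]
body = write_output(events)
```

Lines that do not match the pattern are dropped here, which is one simple way of deciding what gets processed.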




Having too many concurrent indexing connections may result in a high bulk queue, bad responsiveness, and timeouts. For that reason, the common setup in most cases is to place Logstash between the Beats instances and Elasticsearch to control indexing.

For larger-scale systems, the common setup is to have a buffering message queue (Apache Kafka, RabbitMQ, or Redis) between Beats and Logstash for resiliency, to avoid congestion on Logstash during event spikes.
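
A toy simulation shows why a buffer helps during spikes. This is a sketch under simple assumptions: an in-memory deque plays the role of the message queue, time moves in discrete ticks, and the consumer (standing in for Logstash) handles at most batch_size events per tick. The function name and the numbers are made up for illustration.

```python
from collections import deque


def simulate(arrivals, batch_size):
    """Feed events through a buffer and return (batch_sizes, max_backlog).

    arrivals: number of events produced in each tick.
    batch_size: maximum number of events the consumer handles per tick.
    """
    buffer = deque()
    batch_sizes = []
    max_backlog = 0
    pending = list(arrivals)

    while pending or buffer:
        # Producer side: events arrive at whatever rate the sources emit them.
        if pending:
            buffer.extend(range(pending.pop(0)))
        max_backlog = max(max_backlog, len(buffer))

        # Consumer side: drain at most batch_size events, however large the spike.
        batch = [buffer.popleft() for _ in range(min(batch_size, len(buffer)))]
        if batch:
            batch_sizes.append(len(batch))

    return batch_sizes, max_backlog


# A spike of 10 events in the second tick, with a consumer limited to 4 per tick.
sizes, backlog = simulate([2, 10, 1, 0], batch_size=4)
```

The spike shows up as backlog in the buffer, while the consumer never sees more than batch_size events at a time and no event is dropped.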











links for Data Structure

  1) Become Master in Linked List: https://lnkd.in/gXQux4zj
  2) All types of Tree Traversals...