
Introduction to Elastic Stack – ELK 8
Meet one of the best central log collection stacks.
The Elastic Stack (also called the ELK Stack) is a set of three free and open source projects – Elasticsearch, Logstash and Kibana – developed by Elastic NV and its strong community.
In fact, the Elastic Stack can be thought of as an ETL (Extract, Transform, Load) process. The extraction part refers to collecting data from various sources such as log files, databases or APIs. The component of ELK used to extract the data is Logstash, which can collect data from multiple sources simultaneously and supports a wide range of input plugins. Once the data is extracted, it is transformed using a Logstash pipeline filter in order to make it usable. Finally, the transformed data is loaded into Elasticsearch, where it can be indexed and searched using Kibana.
To summarize, the ELK Stack can be seen as a form of ETL because it performs data extraction, transformation, and loading in a similar way to traditional ETL tools. However, ELK is specifically designed for processing and analyzing log data, whereas traditional ETL tools are used for a wider range of data integration tasks.
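To make the mapping concrete, a skeletal Logstash pipeline lines up with the three ETL stages as follows (the plugin settings here are only illustrative placeholders, not part of the setup we build later):

# Extract: read events from a source (file, beats, database, API...)
input {
  file { path => "/var/log/example.log" }
}

# Transform: parse and enrich the raw events
filter {
  grok { match => { "message" => "%{GREEDYDATA:raw}" } }
}

# Load: index the transformed events into Elasticsearch
output {
  elasticsearch { hosts => ["https://localhost:9200"] }
}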
Using the ELK Stack your company will be able to process data from multiple inputs using Logstash and Beats, store it in the Elasticsearch search engine, and explore it for analytical purposes with ease using Kibana.
Note:
The version we will use is 8.5.3
If you have not installed the ELK Stack yet, please visit our installation article pages:
Directly from the Elastic site
In this section we will show you how to ship and process log files using Filebeat.
The flow of the process will be the following:
First of all, we will need to configure our Filebeat config file called filebeat.yml
We will set the input type to "log", the paths to our input files, and the output destination.
In this case we will take input from a folder we created and send the output to our Logstash host.
filebeat.inputs:
- type: log
  paths:
    - /home/alon/input/*.log

output:
  logstash:
    hosts: ["10.128.0.22:5044"]
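Before starting the shipper, it is worth validating this configuration. Filebeat has built-in test subcommands for exactly this; a quick sanity check, run from the Filebeat folder, could look like the following (the output test only succeeds once Logstash is already listening on 5044):

# Verify that filebeat.yml is syntactically valid
./filebeat test config -c filebeat.yml

# Verify that the Logstash output (10.128.0.22:5044) is reachable
./filebeat test output -c filebeat.yml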
Next, we will have to configure our Logstash pipeline. We will create a file called filebeat_pipeline.conf and set its three main sections: input, filter and output.
In our case, we would like to receive the data from the Filebeat shipper on port 5044, break the log file into pieces according to our filter, and store it in the Elasticsearch database.
In this example, for the filtering part, we will use Grok, which is a classic filter plugin:
input {
  beats {
    port => "5044"
  }
}

filter {
  grok {
    match => { "message" => "%{SYSLOGBASE} BEF25A72965: %{GREEDYDATA:syslog_message}" }
  }
}

output {
  elasticsearch {
    hosts => ["https://10.128.0.22"]
    cacert => "/home/alon/logstash-8.5.3/config/certs/http_ca.crt"
    index => "logstash_filebeat_%{+YYYYMMdd}"
    user => "logstash_user"
    password => "logstash_password"
  }
}
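Before wiring everything together, Logstash can validate the pipeline syntax without actually starting it; a minimal check, assuming the pipeline above is saved as filebeat_pipeline.conf:

# Parse the pipeline configuration and exit without starting the server
bin/logstash -f filebeat_pipeline.conf --config.test_and_exit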
The processed log file will be inserted into a daily index, created dynamically by adding the Logstash date variable %{+YYYYMMdd} to the index name "logstash_filebeat_%{+YYYYMMdd}".
We will now create a static log file called filebeat_sample.log, just for the example, with the following line:
Feb 8 22:39:14 somehostname cleanlog[12345]: BEF25A72965: message-id=<20130101142543.5828399CCAF@somehostname.example.com>
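One simple way to create that file from the shell, using the input folder we configured earlier (/home/alon/input):

# Create the input folder and write the sample log line into it
mkdir -p /home/alon/input
echo 'Feb 8 22:39:14 somehostname cleanlog[12345]: BEF25A72965: message-id=<20130101142543.5828399CCAF@somehostname.example.com>' > /home/alon/input/filebeat_sample.log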
The filter above will break the log into fields: %{SYSLOGBASE} extracts the timestamp, logsource, program and pid, while %{GREEDYDATA:syslog_message} captures the rest of the line into a syslog_message field.
After we are done creating the log file and configuring both our Filebeat config file and the pipeline, we can run Filebeat and Logstash.
We can run Logstash with the following command from our Logstash folder:
bin/logstash -f <path_to_our_pipeline>
After running it, our Logstash server will listen on port 5044 and wait for Filebeat to send data.
In a different screen or session, we will run Filebeat from our Filebeat folder:
./filebeat -e -c filebeat.yml -d "publish"
Once we start Filebeat, the log file we created (filebeat_sample.log) will be read and sent to Logstash through port 5044.
After running both, we can now see our index and the processed data in Elasticsearch.
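To confirm the documents actually landed, one option is to query Elasticsearch directly. A sketch, assuming Elasticsearch listens on the default port 9200, the built-in elastic user, and the same CA certificate used in the pipeline above (adjust host and credentials to your own setup):

# Count documents across the daily filebeat indices
curl --cacert /home/alon/logstash-8.5.3/config/certs/http_ca.crt \
  -u elastic \
  "https://10.128.0.22:9200/logstash_filebeat_*/_count?pretty"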
In this section we will show you how to ship and process log files using Auditbeat.
Quick Overview
Auditbeat is a shipper from the Elastic Beats family that audits the activities of users and processes on your system. A classic use case for Auditbeat is to collect events from the Linux Audit Framework for security purposes. Auditbeat breaks the known system audit data into structured fields on its own, without the need for Logstash filtering.
The flow of the process will be the following:
In this example, since we use Logstash as a pipeline, we will configure Auditbeat to send its output to our Logstash. To do that, we will edit the auditbeat.yml file stored in the Auditbeat folder and add the following code to the "Logstash Output" section:
output.logstash:
  hosts: ["10.128.0.22:5044"]
The relevant part of auditbeat.yml should then look roughly like the sketch below (note that the default Elasticsearch output must stay commented out, since only one output can be enabled at a time):
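# Sketch of a stock auditbeat.yml output section; exact comments and defaults may vary by version

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
#  hosts: ["localhost:9200"]

# ------------------------------ Logstash Output -------------------------------
output.logstash:
  hosts: ["10.128.0.22:5044"]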
The audit modules and their input paths are configured by default (on Linux), so we don't need to set anything if we just want to track Linux Audit Framework events.
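For reference, the module section of a stock auditbeat.yml looks roughly like this (the defaults can differ slightly between versions):

auditbeat.modules:

# Events from the Linux Audit Framework (kernel audit rules)
- module: auditd
  audit_rules: |
    ## Audit rules can be defined here; the shipped defaults are a reasonable start.

# Detects changes to critical binaries and configuration files
- module: file_integrity
  paths:
  - /bin
  - /usr/bin
  - /sbin
  - /usr/sbin
  - /etc

# Host, process, and package information
- module: system
  datasets:
    - package
  period: 2m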
Now that we are done configuring Auditbeat, we will create a new pipeline called auditbeat_pipeline.conf with the following code:
input {
  beats {
    port => "5044"
  }
}

output {
  elasticsearch {
    hosts => ["https://10.128.0.22"]
    cacert => "/home/alon/logstash-8.5.3/config/certs/http_ca.crt"
    index => "logstash_auditbeat_%{+YYYYMMdd}"
    user => "logstash_user"
    password => "logstash_password"
  }
}
As you can see, unlike Filebeat, we did not add a filter section, because Auditbeat breaks the common audit logs into fields on its own.
We can run Logstash with the following command from our Logstash folder:
bin/logstash -f <path_to_our_pipeline>
After running it, our Logstash server will listen on port 5044 and wait for Auditbeat to send data.
In a different screen or session, we will run Auditbeat from our Auditbeat folder:
./auditbeat
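Note that Auditbeat needs privileged access to the kernel audit framework, so on most systems it has to run as root. Adding -e also logs to the console, which is handy while testing:

sudo ./auditbeat -e -c auditbeat.yml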
Once we start Auditbeat, events from the Linux Audit Framework will be read and sent to Logstash through port 5044.
After running both, we can now see our index and the processed data in Elasticsearch.
Now that we have data stashed in Elasticsearch, sent from Logstash, we can build a quick visualization and start analyzing it.
First of all we will create something called a Data View. Data views help us retrieve data from Elasticsearch in a way Kibana knows how to read.
To create a new Data View we will go to our Kibana URL (usually under port 5601) and navigate to:
Stack Management → Data Views → Create data view
We will create a data view for both our Auditbeat and Filebeat indices separately:
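If you prefer to script this step, data views can also be created through the Kibana API. A sketch, assuming Kibana is reachable over HTTP on port 5601 with the built-in elastic user (adjust the scheme, host and credentials to your setup); the title pattern matches our daily Filebeat indices:

# Create a data view for the daily filebeat indices via the Kibana API
curl -X POST "http://10.128.0.22:5601/api/data_views/data_view" \
  -u elastic \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -d '{"data_view": {"title": "logstash_filebeat_*", "name": "Filebeat logs"}}'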
Now we can start analyzing the data using the Discover tab; on the left side we will be able to choose the data view we want to explore: