First Steps with Elasticsearch

You might wonder what Elasticsearch actually is. It is a search engine: open source and built on Apache Lucene.
It lets you search large data sets very quickly. You can run Elasticsearch on a single node or on a cluster of more than 100 servers, where queries run in parallel on several nodes and finish even faster. It is used by many well-known companies such as GitHub, Netflix and Microsoft.

But before we start, we need to install Elasticsearch.

How to Install Elasticsearch

Installing Elasticsearch is easy. First you need a current version of Java, which you can find at www.java.com. After you have installed Java, download the latest version of Elasticsearch from https://www.elastic.co/downloads/elasticsearch.

You can start Elasticsearch with the following command in your console:

[bash light=”true”]PATH_TO_ELASTICSEARCH/bin/elasticsearch[/bash]

You can test it with the following command:

[bash light=”true”]curl 'http://localhost:9200/?pretty'[/bash]

You should see something like this:

[jscript light=”true”]{
  "status" : 200,
  "name" : "Alexander Summers",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}[/jscript]

This looks great, but there is one thing you should change. Open config/elasticsearch.yml and set cluster.name to a name of your own. Otherwise your node might join another cluster on your network that uses the same default name.

Congratulations, you have installed Elasticsearch!

Let’s play with Elasticsearch

How to communicate with Elasticsearch?

If you develop in Java, you can use the Java API from Elastic. You can read more about it at:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html

Other languages can communicate via the RESTful API over HTTP.

In this post we use the curl command to access Elasticsearch:

[bash]curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>?<QUERY_STRING>' -d '<BODY>'[/bash]

BODY is a JSON-encoded request body.
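To make the template above more concrete, here is a minimal sketch of a small shell wrapper around curl. The function name es_request and the variable ES_URL are made up for this post and are not part of Elasticsearch. The sketch assumes a local node on the default port 9200.

```shell
# Assumption: a local node that listens on the default HTTP port.
ES_URL="http://localhost:9200"

# es_request is a hypothetical helper, not an Elasticsearch command.
# Usage: es_request VERB PATH [BODY]
es_request() {
  if [ -n "$3" ]; then
    # A JSON body was given, so pass it along with -d.
    curl -s -X"$1" "$ES_URL/$2" -d "$3"
  else
    curl -s -X"$1" "$ES_URL/$2"
  fi
}

# Example calls against a running node:
#   es_request GET '?pretty'
#   es_request POST 'students/student' '{"first_name": "Ivan"}'
```

The wrapper only fills in the VERB, PATH and BODY slots of the template. Everything else is plain curl.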

Create our first Document

Let us first look at the terminology of Elasticsearch: there are indices, types, documents and fields. It becomes clearer when we compare them to relational database terminology: an index corresponds roughly to a database, a type to a table, a document to a row and a field to a column.

Figure: Elasticsearch and database table overview

For our example we create a new student document.

To create a new document we use the following curl command:

[bash]curl -X POST 'http://localhost:9200/students/student' -d '{
  "first_name": "Ivan",
  "last_name": "Bravowitch"
}'[/bash]

You don’t need to create the index before you create a document. Elasticsearch notices that the index does not exist yet and creates it automatically.

It responds with something like this:

[jscript light=”true”]{
"_index":"students",
"_type":"student",
"_id":"AU9upGq7tksYXYePwhab",
"_version":1,
"created":true }[/jscript]

As you can see, Elasticsearch sets the index to students and the type to student. Since we used POST and did not provide an id, Elasticsearch generated one automatically. If you want to set the id yourself, append it to the URL. The following command sets the id to 100:

[bash]curl -X POST http://localhost:9200/students/student/100 -d ….[/bash]

Automatic Mapping

When you don’t provide any information about the data types of your fields, Elasticsearch chooses them automatically.
We could extend our simple student model with the following:

[jscript light=”true”]
{
  "first_name": "Ivan",
  "last_name": "Bravowitch",
  "birth_date": "1985-03-03",
  "semester": 3,
  "grade_point_average": 1.9
}[/jscript]

Elasticsearch interprets the values and sets the following types for the fields:

first_name, last_name: string
birth_date: date
semester: long
grade_point_average: double

Elasticsearch always chooses the widest matching numeric type, which is why semester becomes long and grade_point_average becomes double.
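If you want to check which types Elasticsearch actually picked, you can ask for the mapping of the index. The following is only a sketch: show_mapping and ES_URL are names made up for this post, while _mapping is the Elasticsearch endpoint. It assumes a local node on port 9200.

```shell
# Assumption: a local node that listens on the default HTTP port.
ES_URL="http://localhost:9200"

# show_mapping is a hypothetical helper.
# Usage: show_mapping INDEX
show_mapping() {
  # Ask Elasticsearch for the field types it stored for the index.
  curl -s -XGET "$ES_URL/$1/_mapping?pretty"
}

# Example call against a running node:
#   show_mapping students
```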

In some cases Elasticsearch may even misinterpret a field and assign the wrong type.
Assume you have something like this in your document:

[jscript light=”true”]{ "array":[123, "string"] }[/jscript]

With automatic mapping this array causes an exception: Elasticsearch sees 123 first and maps the field as a number, then fails when it reaches "string", which cannot be parsed as a number.

You can disable the automatic mapping of Elasticsearch in the config file. You just have to set index.mapper.dynamic to false.

Manual mapping

We have seen that in some cases it is useful to set the types yourself. With the type attribute you can specify the data type of each field in a mapping, so Elasticsearch uses the right data types:

[jscript light=”true”]
{
  "student": {
    "properties": {
      "first_name": {"type": "string"},
      "last_name": {"type": "string"},
      "birth_date": {"type": "date"},
      "semester": {"type": "integer"},
      "grade_point_average": {"type": "float"}
    }
  }
}[/jscript]
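One way to apply such a mapping is to send it under the mappings key when you create the index. The sketch below assumes that the students index does not exist yet (an existing field cannot simply be switched from long to integer) and that a local node listens on port 9200. The names STUDENT_MAPPING and create_students_index are made up for this post.

```shell
# Assumption: a local node that listens on the default HTTP port.
ES_URL="http://localhost:9200"

# The mapping from above, wrapped in the "mappings" key of a
# create-index request.
STUDENT_MAPPING='{
  "mappings": {
    "student": {
      "properties": {
        "first_name": {"type": "string"},
        "last_name": {"type": "string"},
        "birth_date": {"type": "date"},
        "semester": {"type": "integer"},
        "grade_point_average": {"type": "float"}
      }
    }
  }
}'

# create_students_index is a hypothetical helper.
create_students_index() {
  curl -s -XPUT "$ES_URL/students" -d "$STUDENT_MAPPING"
}

# Example call against a running node:
#   create_students_index
```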

Access our First Document

We have created our first document, but how can we access it? We use the GET verb together with the id:

[bash light=”true”]curl -X GET 'http://localhost:9200/students/student/AU9upGq7tksYXYePwhab?pretty'[/bash]

The found field in the response tells us whether the document exists:

[jscript light=”true”]
{
  "_index" : "students",
  "_type" : "student",
  "_id" : "AU9upGq7tksYXYePwhab",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "first_name" : "Ivan",
    "last_name" : "Bravowitch"
  }
}
[/jscript]

The "pretty" parameter in our curl command only tells Elasticsearch to return nicely formatted JSON.

Update our first Document

Now that we have a document, we want to update it. The PUT verb overwrites the document stored under the given id (12 in this example) with the new body:

[bash]curl -X PUT 'http://localhost:9200/students/student/12' -d '{
  "first_name": "Witali",
  "last_name": "Bravowitch"
}'[/bash]
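PUT replaces the whole document. If you only want to change a few fields, the _update endpoint accepts a partial document under the doc key. Here is a minimal sketch, assuming a local node on port 9200. The helper name update_student_fields is made up for this post.

```shell
# Assumption: a local node that listens on the default HTTP port.
ES_URL="http://localhost:9200"

# update_student_fields is a hypothetical helper.
# Usage: update_student_fields ID JSON_FIELDS
update_student_fields() {
  # Wrap the given fields in "doc" so only these fields are changed.
  curl -s -XPOST "$ES_URL/students/student/$1/_update" -d "{\"doc\": $2}"
}

# Example call against a running node:
#   update_student_fields 12 '{"first_name": "Witali"}'
```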

Delete our first Document

So far we know how to create, read and update a document. The last thing we need is to delete a document. For that, use the DELETE verb:

[bash]curl -XDELETE 'http://localhost:9200/students/student/12'[/bash]

That covers the basic operations for working with your data in Elasticsearch.

Shards in Elasticsearch

Primary Shards
In many cases you start with a rather small amount of data. As your application grows, your data grows as well, and at some point it makes sense to run Elasticsearch in a cluster.

You can specify the number of primary shards for an index. Elasticsearch distributes your data across these primary shards and can process it in parallel.
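The number of primary shards is set when an index is created and cannot be changed afterwards, so it is worth choosing it deliberately. The sketch below creates an index with explicit shard settings. The helper create_index and the index name in the example call are made up for this post, and a local node on port 9200 is assumed.

```shell
# Assumption: a local node that listens on the default HTTP port.
ES_URL="http://localhost:9200"

# create_index is a hypothetical helper.
# Usage: create_index NAME PRIMARY_SHARDS REPLICAS
create_index() {
  curl -s -XPUT "$ES_URL/$1" \
    -d "{\"settings\": {\"number_of_shards\": $2, \"number_of_replicas\": $3}}"
}

# Example call against a running node (the index name is made up):
#   create_index students_v2 3 1
```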

Replica Shards
Replica shards are copies of your primary shards. By default each primary shard has one replica shard, which contains the same data. If a primary shard goes offline, one of its replica shards is automatically promoted to primary.
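Unlike the number of primary shards, the number of replicas can be changed on an existing index. The following sketch shows one way to do that and to check the cluster state afterwards. The helper names set_replicas and cluster_health are made up for this post, and a local node on port 9200 is assumed.

```shell
# Assumption: a local node that listens on the default HTTP port.
ES_URL="http://localhost:9200"

# set_replicas is a hypothetical helper.
# Usage: set_replicas INDEX REPLICAS
set_replicas() {
  curl -s -XPUT "$ES_URL/$1/_settings" \
    -d "{\"index\": {\"number_of_replicas\": $2}}"
}

# cluster_health is a hypothetical helper that prints the cluster status.
cluster_health() {
  curl -s -XGET "$ES_URL/_cluster/health?pretty"
}

# Example calls against a running node:
#   set_replicas students 2
#   cluster_health
```

On a single node the status stays yellow, because a replica is never placed on the same node as its primary shard.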

Conclusion

As you have seen, Elasticsearch can be installed in a couple of minutes. We have covered the basic operations for working with it.

If you want to learn more about the powerful Query DSL or other interesting topics, visit the official Elasticsearch documentation at: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
