How I learned to love Elasticsearch

A love story, by @tizzo y @nstielau

Nick Stielau

@nstielau

Director of Engineering at Pantheon

Smooth operator

Howard Tyson

@tizzo

Zivtechnician

We're going to make you love Elasticsearch

How we'll do it

Slides, examples, demos

https://github.com/tizzo/drupal-elasticsearch-talk

Elasticsearch is..

a distributed, open source search and analytics engine, designed for horizontal scalability, reliability, and easy management.

Distributed

"Elasticsearch allows you to start small and scale horizontally as you grow. Simply add more nodes, and let the cluster automatically take advantage of the extra hardware."

Open source

The Apache License (/əˈpætʃi/) is a free software license written by the Apache Software Foundation (ASF). The Apache License requires preservation of the copyright notice and disclaimer.

Document-oriented

[
  {
    "commit": "b03932be472a7198980cd73e5c532dc34e0c9aea",
    "committer_name": "bob",
    "author_name": "bob",
    "date": "Tue May 12 15:40:44 2015 -0700",
    "message": "Issue-2457427-by-Crell-Bad-error-handling-of-invalid-Entity-definition",
    "@timestamp": "2015-05-12T15:40:44-07:00"
  }
]

Near Realtime*

*The difference between realtime and near-realtime is that near-realtime is not realtime.

Built on Apache Lucene™

Lucene is the search smarts

Clusterable

Clusterable: Data security

"make sure you mention that elasticsearch is a very good primary data store for critical mutable data." - @joemiller

"To be precise, Elasticsearch’s transaction log does not put your data safety first. It puts it anywhere from zero to five seconds later." -Aphyr

Dude, I've been searching for nodes since like, 2005

ES vs. MySQL Full-text Search

This works out-of-the-box, but it can

ES vs. Solr

"I guess comparing Elasticsearch to Solr is apples to apples, but really it's like comaping a new apple to a rotten apple." -Elasticsearch User

ES vs.

Bonus! sane capitalization

sane...

<?php captial_E_duh( "Elasticsearch" );

uh...

<?php capital_P_dangit( "WordPress" ); ?>

Ain't no Party like a cURL Party

Create an index (curl)

curl -XPUT 'http://localhost:9200/logstash-commits/' -d '
  index:
    number_of_shards : 1
    number_of_replicas : 2
'
curl 127.0.0.1

Inserting docs

Request:
curl -XPOST "http://localhost:9200/logstash-commits/logs" -d' {
  "commit": "bfdcc65a85a2ae3af5025f8d0a05779fbb59c621",
  "committer_name": "xjm",
  "author_name": "xjm",
  "date": "Tue May 12 15:14:13 2015 -0700",
  "message": "Issue-2459777 fixing stuff",
  "@timestamp": "2015-05-12T15:14:13-07:00"
}'

Response:
{
    "ok": true,
    "_index": "logstash-commits",
    "_type": "logs",
    "_id": 1,
    "_version": 1
}


Updating docs

Request:
curl -XPUT "http://localhost:9200/logstash-commits/logs/1" -d' {
  "commit": "bfdcc65a85a2ae3af5025f8d0a05779fbb59c621",
  "committer_name": "xjm",
  "author_name": "xjm",
  "date": "Tue May 12 15:14:13 2015 -0700",
  "message": "Issue-2459777 fixing stuff",
  "@timestamp": "2015-05-12T15:14:13-07:00"
}'

Response:
{
    "ok": true,
    "_index": "logstash-commits",
    "_type": "logs",
    "_id": 1,
    "_version": 2
}

Retreiving a doc

Request:
curl -XGET "http://localhost:9200/logstash-commits/logs/1"
Response:
{
  "_index": "logstash-commits",
  "_type": "logs",
  "_id": 1,
  "_version": 2,
  "exists": true,
  "_source": {
    "commit": "bfdcc65a85a2ae3af5025f8d0a05779fbb59c621",
    "committer_name": "xjm",
    "author_name": "xjm",
    "date": "Tue May 12 15:14:13 2015 -0700",
    "message": "Issue-2459777 fixing stuff",
    "@timestamp": "2015-05-12T15:14:13-07:00"
  }
}

Searching

Find commits by xjm matching 'Views'

Query: Views

Filter: xjm

curl -XPOST "http://localhost:9200/_search" -d'{
    "query": {
        "query_string": {
            "query": "message:Views"
        }
    },
    "filter": {
      "term": {"author_name": "xjm"}
    }
}'

BRING IT!

Clarks' Laws vis-á-vis ES

  1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
  2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
  3. Any sufficiently advanced technology is
    indistinguishable from magic.

Histogram

Example: Score boosting

Scenario:Weight search results by popularity.

Solution:Use `function_score` with a decay function

"DECAY_FUNCTION": {
    "FIELD_NAME": {
          "origin": "0",
          "scale": "100"
    }
}

Percolate

Store a queries and then use documents to find matching queries

Turn your search frown upside down

Percolate

Percolate: Store a Query

curl -XPUT 'localhost:9200/logstash-commits/.percolator/security' -d '{
  "query" : {
    "match" : {
      "message" : "(message:XSS OR message:security)"
    }
  }
}'
curl -XPUT 'localhost:9200/logstash-commits/.percolator/admin_ui' -d '{
  "query" : {
    "match" : {
      "message" : "(message:admin AND message:UI)"
    }
  }
}'

Percolate: Query for matches

Define query-matching for admin UI and security commits:

curl -XGET 'localhost:9200/logstash-commits/logs/_percolate' -d '{
  "doc" : {
    "message" : "Issue #27230: Fix XHTML slash in XSS filter"
  }
}'

Percolate: Get results

{
  "took" : 19,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "total" : 1,
  "matches" : [
    {
      "_index" : "logstash-commits",
      "_id" : "security"
    }
  ]
}

Integrating Drupal with Elasticsearch

Drupal Modules

Search API Elasticsearch

The Good

Really complete integration with all of Search API's features

The Bad

It's really not mature, too many bugs to really use on a project

Elasticsearch Connector

The Good

The Bad

Log aggregation

Logs HTTP

Log directly to Elasticsearch

GELF

Log to Logstash, let it log to elasticsearch

Custom Code

Custom Code

Devops <3 ES

ELK Stack

ELK

The killer app for ElasticSearch

Getting my data into ELK

Munging with Logstash

the richer the better!

This message
Issue #27230: Fix XHTML slash in XSS filter
passes through this logstash configuration
grok { match => ["message", "Issue #%{NUMBER:issue_id}: %{GREEDYDATA:message}" ] }
and gets stored in Elasticsearch kinda like this
{
  "issue_id": "27230",
  "issue_message": "Fix XHTML slash in XSS filter",
  "message": "Issue #27230: Fix XHTML slash in XSS filter"
}

What could you do with data?

ELK Demo

ES Errata

Securing ES

Hosted ElasticSearch

Criteria

Providers