This repository has been archived by the owner on Dec 15, 2021. It is now read-only.

Logstash (ELK) not streaming data in real time from DynamoDB, duplicating records on restart, and fetching in random order #24

Open
ameyaloni opened this issue Nov 15, 2016 · 3 comments

Comments

@ameyaloni

ameyaloni commented Nov 15, 2016

We're facing multiple issues while using the ELK stack. We suspect they're Logstash configuration issues. The issues are as follows:

  1. Logstash connected to DynamoDB Streams isn't showing real-time changes, even though our Logstash configuration explicitly sets perform_stream => true. Note: we do get the latest data if we restart Logstash (which runs in a Docker container). Could this be a cross-region issue? DynamoDB is in us-east-1, while Logstash and Elasticsearch are in us-west-1.

  2. Upon restarting Logstash, the entire DynamoDB table appears to be duplicated in Elasticsearch: DynamoDB reports an item count of 70K+, while Elasticsearch has more than twice as many searchable documents. Could this be caused by the perform_stream => true setting?

  3. Intermittently, the latest data does appear, but it is sandwiched between older records, as if the data were fetched in random order. Could this be due to multiple workers trying to log at the same time?

  4. We need the JSON message contents from DynamoDB as is. However, when we run Logstash, the output shows the data as "Stream Records". When we use log_format => "json_binary_as_text", we see the JSON message as we require. Is this sufficient?

The following is our Logstash configuration:

```
input {
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_image"
    perform_scan => true
    perform_stream => true
    publish_metrics => true
    table_name => "here-we-have-dynamodb-table-name"
    log_format => "json_binary_as_text"
  }
}
output {
  elasticsearch {
    hosts => "here-we-have-our-elasticsearch-endpoint-which-is-in-us-west-1"
  }
}
```
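One possible explanation for item 2 is offered here as an assumption, not a confirmed diagnosis. With perform_scan => true, the plugin scans the entire table every time Logstash starts. The output does not set document_id, so Elasticsearch assigns a fresh ID to every scanned item, and each restart adds another copy of the table. The sketch below shows the input for restarts after the initial load. Everything except perform_scan is copied from the configuration above.

```
input {
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_image"
    # Assumption: the first run (with perform_scan => true) already indexed
    # the existing table, so skip the full scan and only consume the stream.
    perform_scan => false
    perform_stream => true
    publish_metrics => true
    table_name => "here-we-have-dynamodb-table-name"
    log_format => "json_binary_as_text"
  }
}
```

This only helps if the stream consumer resumes roughly where it left off. DynamoDB Streams retains records for 24 hours, so after a longer outage a rescan would still be needed. That rescan is safer when combined with a deterministic document_id.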

NOTE: There are no errors in the logs (checked with docker logs --follow container-name). Any help with these issues is much appreciated.
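When the logs show no errors, one way to narrow down items 1 and 3 is to print each event to the console alongside Elasticsearch. Then update an item in the table and check whether a stream record appears without a restart. The stdout output and rubydebug codec are standard Logstash. Using them here is a debugging suggestion, not a confirmed fix. A sketch of such an output section:

```
output {
  elasticsearch {
    hosts => "here-we-have-our-elasticsearch-endpoint-which-is-in-us-west-1"
  }
  # Debugging aid: print every event, with all of its fields, as it arrives.
  # Remove it afterwards; it is very noisy while a full table scan runs.
  stdout {
    codec => rubydebug
  }
}
```

Since Logstash runs in Docker here, these events should appear in docker logs --follow container-name. Suppose a change shows up on the console promptly but not in Elasticsearch search results: the problem is then likely downstream of the input plugin. If it never shows up, the stream consumer is the place to look.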

@huytv593

@ameyaloni Have you tried setting document_id in the Elasticsearch output to avoid duplicates?
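Here is a sketch of that suggestion. It assumes the table's partition key attribute is called id (a hypothetical name). It also assumes the plugin exposes the item's key attributes under a keys field, as a later reply in this thread indicates. With a deterministic ID, indexing the same item again overwrites the existing document instead of creating a new one.

```
output {
  elasticsearch {
    hosts => "here-we-have-our-elasticsearch-endpoint-which-is-in-us-west-1"
    # "id" is a hypothetical partition key name; replace it with your own.
    # The same DynamoDB item always maps to the same Elasticsearch _id,
    # so a rescan updates existing documents rather than duplicating them.
    document_id => "%{[keys][id]}"
  }
}
```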

@tshrikant

I am unable to reference my table's primary key attribute or the event fields from the DynamoDB input plugin to assign IDs to records. Can anyone help me with the syntax? I've tried the options below:

document_id => "{[eventID]}"
document_id => [eventID]
document_id => eventID

@huytv593

huytv593 commented Nov 9, 2017

@tshrikant You can use this style:
document_id => "%{[keys][Your_Primary_PartitionKey]}"
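For a table with both a partition key and a sort key, the partition key alone does not identify an item, so both parts would need to go into the ID. The sketch below uses hypothetical attribute names userId (partition key) and createdAt (sort key). Like the reply above, it assumes key attributes are available under keys.

```
output {
  elasticsearch {
    hosts => "here-we-have-our-elasticsearch-endpoint-which-is-in-us-west-1"
    # Hypothetical key names: userId (partition key), createdAt (sort key).
    # Pick a separator that cannot occur in the key values themselves.
    document_id => "%{[keys][userId]}-%{[keys][createdAt]}"
  }
}
```

If a referenced field does not exist on an event, Logstash leaves the %{...} text unchanged. Every such event would then get the same literal ID and overwrite the others. It is worth checking a few events with stdout { codec => rubydebug } before relying on the ID.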
