Logstash+Elasticsearch: Best way to handle JSON arrays
Before I get to the solution, let's review the problem we're trying to solve. Say we have these two JSON documents pushed to ES:
{
  "test": {
    "steps": [{
      "response_time": "100"
    }, {
      "response_time": "101"
    }]
  }
}
{
  "test": {
    "steps": [{
      "response_time": "101"
    }, {
      "response_time": "100"
    }]
  }
}
And you write a Kibana query like:
test.steps.response_time:101
# Full ES query in the background
{
  "query": {
    "query_string": {
      "query": "test.steps.response_time:101"
    }
  }
}
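It'll match both documents, even though 101 is in a different position in each one. Why? Because Elasticsearch flattens arrays of objects internally: both documents are indexed with test.steps.response_time holding the values 100 and 101, and the position of each element is lost. More details: https://www.elastic.co/guide/en/elasticsearch/guide/current/complex-core-fields.html#object-arrays and https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html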
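To make the flattening concrete, here is a rough Ruby sketch of the idea. It only illustrates the behaviour described above and is not Elasticsearch's actual indexing code; the helper name flatten_fields is made up for this example. It collects the values found under each dotted field path and ignores array positions.

```ruby
require 'json'

# Hypothetical helper: collect leaf values under dotted field paths,
# ignoring array positions (an approximation of the flattening idea).
def flatten_fields(value, path = nil, out = Hash.new { |h, k| h[k] = [] })
  case value
  when Hash
    value.each { |k, v| flatten_fields(v, path ? "#{path}.#{k}" : k.to_s, out) }
  when Array
    # Array elements share the parent's path, so their index is dropped.
    value.each { |v| flatten_fields(v, path, out) }
  else
    out[path] << value
  end
  out
end

doc1 = JSON.parse('{"test": {"steps": [{"response_time": "100"}, {"response_time": "101"}]}}')
doc2 = JSON.parse('{"test": {"steps": [{"response_time": "101"}, {"response_time": "100"}]}}')

flat1 = flatten_fields(doc1)
flat2 = flatten_fields(doc2)

# Both documents end up with a single field, test.steps.response_time,
# holding the values 100 and 101; only the order of the values differs.
puts flat1.inspect
puts flat2.inspect
```

Once the index is gone, a query on test.steps.response_time has nothing left to tell the two documents apart.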
Not only that: if I wanted to find all documents with response_time=101 in the second element of the array (logically, test.steps[1].response_time:101), there is no way to write that query.
To fix this, we can simply create a filter in Logstash that converts these arrays to hashes recursively, i.e. all arrays are converted to hashes, even the nested ones, with each element keyed by its index. We want the filter to convert documents like this.
Before:
{
  "foo": "bar",
  "test": {
    "steps": [
      {
        "response_time": "100"
      },
      {
        "response_time": "101",
        "more_nested": [
          {
            "hello": "world"
          },
          {
            "hello2": "world2"
          }
        ]
      }
    ]
  }
}
After:
{
  "foo": "bar",
  "test": {
    "steps": {
      "0": {
        "response_time": "100"
      },
      "1": {
        "response_time": "101",
        "more_nested": {
          "0": {
            "hello": "world"
          },
          "1": {
            "hello2": "world2"
          }
        }
      }
    }
  }
}
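The filter that does this is shared below. It edits the hash returned by event.to_hash in place, so it relies on that hash being the event's live data; on Logstash 5.0 and later, where event.to_hash returns a copy, the converted values would need to be written back with event.set.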
ruby {
  init => "
    def arrays_to_hash(h)
      h.each do |k, v|
        # If v is nil, an array is being iterated and the value is k.
        # If v is not nil, a hash is being iterated and the value is v.
        value = v || k
        if value.is_a?(Array)
          # value is replaced with value_hash below.
          value_hash = {}
          value.each_with_index do |item, i|
            value_hash[i.to_s] = item
          end
          h[k] = value_hash
        end
        # Recurse so that arrays nested deeper down are converted too.
        if value.is_a?(Hash) || value.is_a?(Array)
          arrays_to_hash(value)
        end
      end
    end
  "
  code => "arrays_to_hash(event.to_hash)"
}
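The conversion can also be tried outside Logstash with plain Ruby. The sketch below is a non-mutating variant of the same idea: it returns a new structure instead of editing the hash in place, and it also covers arrays nested directly inside other arrays. The name arrays_to_hash_copy is made up for this example, and it is not the filter above.

```ruby
require 'json'

# Hypothetical, non-mutating variant: returns a new structure in which
# every array (at any depth) becomes a hash keyed by the element index.
def arrays_to_hash_copy(value)
  case value
  when Array
    value.each_with_index.map { |item, i| [i.to_s, arrays_to_hash_copy(item)] }.to_h
  when Hash
    value.map { |k, v| [k, arrays_to_hash_copy(v)] }.to_h
  else
    value
  end
end

# The "Before" document from above.
before = JSON.parse(
  '{"foo": "bar", "test": {"steps": [' \
  '{"response_time": "100"}, ' \
  '{"response_time": "101", "more_nested": [{"hello": "world"}, {"hello2": "world2"}]}' \
  ']}}'
)

after = arrays_to_hash_copy(before)
puts JSON.pretty_generate(after)

# Each array position is now an ordinary key, so a path can name it.
puts after.dig('test', 'steps', '1', 'response_time')
```

Returning a copy keeps the input untouched, which makes the function easy to check in isolation before wiring it into a filter.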
Now, searching for documents which contain response_time=101 in the second element of the array is simple:
test.steps.1.response_time:101
Happy ELKing!