Logstash is a high-performance pipeline for collecting, processing, and indexing logs, which we use together with ElasticSearch and Kibana to make our test logs searchable.
Logs from Jenkins test runs are sent to Logstash, where they are indexed and stored. Logstash facilitates reviewing logs from multiple sources in a single test run, searching for errors or particular events within a test run, and searching for log event trends across test runs.
There are four major layers in our Logstash setup:
1. A log pusher script that runs alongside Jenkins and pushes build log files to Logstash when builds complete.
2. A Logstash indexer that reads, filters, and processes the log events and sends them to ElasticSearch.
3. ElasticSearch, which stores and indexes the processed events.
4. Kibana, a web front end for querying the stored events.
Each layer scales horizontally. As the number of logs grows we can add more log pushers, more Logstash indexers, and more ElasticSearch nodes. Currently we have multiple Logstash worker nodes that each pair a log pusher with a Logstash indexer. We did this because each Logstash process can dedicate only a single thread to filtering log events, which quickly becomes a bottleneck. This looks something like:
           _ logstash-worker1 _
          /                    \
jenkins ---- logstash-worker2 ---- elasticsearch --- kibana
          \_                    _/
             logstash-worker3
The log pusher is a simple Python script that is given a list of log files to push to Logstash when Jenkins builds complete.
In rough terms, log pushing works like this: the pusher reads each completed build's log files line by line, wraps each line in a JSON event carrying build metadata, and ships those events to a Logstash indexer.
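A minimal sketch of that flow in Python (the host, port, and exact wire format here are assumptions for illustration; the real implementation is linked below):

# Illustrative sketch only -- not the actual log-pusher.py.
# Assumes a Logstash input listening for newline-delimited JSON
# on TCP port 9999 (host, port, and framing are assumptions).
import json
import socket

def push_log(path, build_name, build_number,
             host='localhost', port=9999):
    sock = socket.create_connection((host, port))
    try:
        with open(path) as log_file:
            for line in log_file:
                event = {
                    'fields': {
                        'build_name': build_name,
                        'build_number': build_number,
                        'event_message': line.rstrip('\n'),
                    },
                }
                # One JSON document per line so the indexer can
                # split the stream into individual events.
                sock.sendall((json.dumps(event) + '\n').encode('utf-8'))
    finally:
        sock.close()

push_log('console.log', 'gate-foo', '10')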
In the near future this script will be modified to act as a Gearman worker so that we can add an arbitrary number of them without needing to partition the log files that each worker handles by hand. Instead each worker will be able to fetch and push any log file and will do so as directed by Gearman.
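Building on the push_log sketch above, a Gearman-based worker could look roughly like this with the python-gearman library (the task name and job payload format here are pure assumptions, not the actual protocol):

# Hypothetical Gearman worker wrapping the push_log sketch above.
# The task name 'push-log' and the JSON payload format are
# assumptions, not the real script's protocol.
import json
import gearman

def on_push_log(worker, job):
    args = json.loads(job.data)
    push_log(args['path'], args['build_name'], args['build_number'])
    return 'done'

worker = gearman.GearmanWorker(['localhost:4730'])
worker.register_task('push-log', on_push_log)
worker.work()  # block and process jobs as Gearman dispatches them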
If you are interested in the technical details, the source of this script can be found at modules/openstack_project/files/logstash/log-pusher.py
Logstash does the heavy lifting of squashing all of our log lines into events with a common format. It reads the JSON log events from the log pusher connected to it, deletes events we don’t want, parses log lines to set the timestamp, message, and other fields for the event, then ships these processed events off to ElasticSearch where they are stored and made queryable.
At a high level, Logstash takes:
{
  "fields": {
    "build_name": "gate-foo",
    "build_number": "10",
    "event_message": "2013-05-31T17:31:39.113 DEBUG Something happened"
  }
}
And turns that into:
{
  "fields": {
    "build_name": "gate-foo",
    "build_number": "10",
    "loglevel": "DEBUG"
  },
  "@message": "Something happened",
  "@timestamp": "2013-05-31T17:31:39.113Z"
}
It flattens each log line into an event that looks very much like all of the other events, regardless of the source log line's format. This makes it very easy to query your logs for, say, lines from a specific failed build between two timestamps with specific message content. You don’t need to write complicated greps; instead you query against a schema.
The config file that tells Logstash how to do this flattening can be found at modules/openstack_project/templates/logstash/indexer.conf.erb
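That config file is the authoritative definition; purely as an illustration of the transformation it describes, the flattening could be expressed in Python roughly like this (the regular expression is an assumption that only covers lines shaped like the sample above):

# Rough Python illustration of the flattening; the real rules
# live in indexer.conf.erb. This regex is an assumption that only
# handles lines shaped like the sample above.
import re

LINE_RE = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+) '
    r'(?P<loglevel>[A-Z]+) (?P<message>.*)')

def flatten(event):
    fields = dict(event['fields'])
    match = LINE_RE.match(fields.pop('event_message'))
    fields['loglevel'] = match.group('loglevel')
    return {
        'fields': fields,
        '@message': match.group('message'),
        '@timestamp': match.group('timestamp') + 'Z',
    }

flatten({'fields': {
    'build_name': 'gate-foo',
    'build_number': '10',
    'event_message': '2013-05-31T17:31:39.113 DEBUG Something happened',
}})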
ElasticSearch is basically a REST API layer for Lucene. It provides the storage and search engine for Logstash. It scales horizontally and loves it when you give it more memory. Currently we run a single-node cluster on a large VM to give ElasticSearch both memory and disk space. Per index (Logstash creates one index per day) we have one replica (on the same node; this does not provide HA, but it speeds up searches) and five shards (each shard is essentially its own index, and having multiple shards increases indexing throughput).
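Because everything goes through the REST API, you can also query a daily index directly. A sketch of what that could look like (the host and index name here are assumptions; the query uses the same Lucene query-string syntax described under querying below):

# Hypothetical direct query against one daily index; the host and
# index name are assumptions. ElasticSearch's _search endpoint
# accepts a Lucene query string via the query_string query.
import json
import urllib.request

query = {'query': {'query_string': {
    'query': '@fields.build_name:"gate-foo" AND @fields.loglevel:"ERROR"'}}}
request = urllib.request.Request(
    'http://localhost:9200/logstash-2013.05.31/_search',
    data=json.dumps(query).encode('utf-8'),
    headers={'Content-Type': 'application/json'})
with urllib.request.urlopen(request) as response:
    hits = json.load(response)['hits']
print(hits['total'])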
As this setup grows and handles more logs, we may need to add more ElasticSearch nodes and run a proper cluster. We haven’t reached that point yet, but it will probably become necessary as disk and memory footprints increase.
Kibana is a Ruby app sitting behind Apache that provides a nice web UI for querying Logstash events stored in ElasticSearch. Our installation can be reached at http://logstash.openstack.org. See Querying Logstash for more info on using Kibana to perform queries.
Hop on over to http://logstash.openstack.org and by default you get the last 15 minutes of everything Logstash knows about, in chunks of 100. We run a lot of tests, but it is possible that no logs have come in over the last 15 minutes; if so, change the dropdown in the top left from Last 15m to Last 60m to get a bigger window on the logs. At this point you should see a list of log events. If you click on an event it will expand and show you all of the fields associated with that event and their values. (Note: Chromium and Kibana seem to have trouble with this at times and some fields end up without values; use Firefox if this happens.) You can search based on all of these fields, and if you click the magnifying glass next to a field in the expanded event view it will add that field and value to your search. This is a good way of refining searches without a lot of typing.
The above is good info for poking around in the Logstash logs, but say one of your changes has a failing test and you want to know why. We can jump-start the refining process with a simple query:
@fields.build_change:"$FAILING_CHANGE" AND @fields.build_patchset:"$FAILING_PATCHSET" AND @fields.build_name:"$FAILING_BUILD_NAME" AND @fields.build_number:"$FAILING_BUILD_NUMBER"
This will show you all logs available from the patchset and build pair that failed. Chances are that this is still a significant number of logs and you will want to do more filtering. You can add more filters to the query using AND and OR, and parentheses can be used to group sections of the query. A potential addition to the above query might be AND @fields.loglevel:"ERROR" to restrict the results to error messages.
General query tips:
- Quote field values, as in the query above; unquoted values containing special characters will confuse the query parser.
- Boolean operators AND, OR, and NOT must be uppercase, or they are treated as ordinary search terms.
- Use * and ? as wildcards within terms, and parentheses to group subqueries.