Splunk is a searching and reporting tool which ingests application logs and pulls out the key information (with careful prodding). It allows businesses to get a near real time view of how their application is performing. Things like system utilisation and transaction throughput are very easy to represent in a clean dashboard interface.
I've recently finished a proof of concept for work involving a system utilisation dashboard for a client. It shows the average transaction throughput overlaid on CPU and RAM load, as well as the average, maximum and minimum response times of the login server.
I spent quite a bit of time figuring out exactly how to talk to Splunk. If you're not fluent in Splunkenese, things can get a little tough (as I found out). But I have learned a few new tricks, which is what I'm going to list here.
Settings
Splunk allows you to package your work in apps. Various configuration details are stored in these apps, and each can be overridden. The app contains a default directory, and all default settings should be stored in here. There is also a local directory, which takes priority over the default directory. Why do they do this? Well, Splunk allows you to "publish" apps, and any changes that are pushed out are made in the default directory. Users of the app should store their custom modifications in the local directory to ensure that they are not overwritten by a new release.
The plot thickens when Splunk users are added into the mix. Each user also has a local directory which takes top priority. I'm sure you can guess why. The tricky part, however, is tracking down these settings and bringing them all together. Changes made via the web interface will result in settings being written to the user's local directory. Remember that, and you'll be fine.
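To make the layering concrete, here is a rough Python sketch of the override order described above (app default, then app local, then the user's local directory). This is only a mental model, not Splunk code: the function name and the setting names are made up for illustration, and real Splunk configuration has stanzas and more layers than this.

```python
# Simplified model of the layering described above. The setting names and
# values are hypothetical; real Splunk configuration is richer than a flat dict.
def effective_settings(app_default, app_local, user_local):
    """Merge the three layers; later layers override earlier ones."""
    merged = {}
    for layer in (app_default, app_local, user_local):
        merged.update(layer)
    return merged


app_default = {"panel_title": "System Utilisation", "refresh_seconds": "60"}
app_local = {"refresh_seconds": "30"}           # site-specific tweak
user_local = {"panel_title": "My Utilisation"}  # change made via the web UI

settings = effective_settings(app_default, app_local, user_local)
print(settings)
```

The user's value wins for panel_title, the app's local value wins for refresh_seconds, and anything untouched would fall through to the default.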
props.conf
This config file allows you to specify properties for your data. Its stanzas are keyed by source type (or by host or source) rather than by index, although I have found that it is typically a good idea to have one source type per index, especially since they may be formatted differently. props.conf allows you to specify the location and format of the log's timestamp (very important to get this right at ingestion time, you cannot modify this afterwards!). It also allows you to specify specific field extractions via regex. Here is an example:
[sourcetype_name]
# timestamp extraction
# time_prefix takes a regex string which specifies the start of the timestamp
# this will match "timestamp>"
TIME_PREFIX = (?i)timestamp>
# time_format takes a strptime-formatted string
# this will match 2011-08-30 10:10:10.111
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%f
# inline field extractions take the following form
# EXTRACT-fieldName = regex to id start of field <fieldName>regex to id end of field
# this will match an xml string in the form:
# <Destination>
# ...
# <Destination>
# ...
# <Hostname>hostname</Hostname>
# </Destination>
EXTRACT-destinationHostname = (?i):Destination>.*<Hostname>(?P<destinationHostname>[^<]+)
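Since you can't fix a bad timestamp after ingestion, it's worth sanity-checking the patterns before Splunk ever sees the data. Here is a minimal sketch using Python's re and datetime modules as a stand-in. Python's regex and strptime behaviour is close to, but not guaranteed identical to, Splunk's, and the sample event (including the ns: prefix and the web01 hostname) is invented for illustration.

```python
import re
from datetime import datetime

# Hypothetical single-line event; real logs will look different.
event = (
    "<Timestamp>2011-08-30 10:10:10.111</Timestamp>"
    "<ns:Destination><Hostname>web01</Hostname></ns:Destination>"
)

# The same patterns as in the props.conf example above.
TIME_PREFIX = r"(?i)timestamp>"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
EXTRACT = r"(?i):Destination>.*<Hostname>(?P<destinationHostname>[^<]+)"

# The timestamp starts right after the prefix and runs up to the next tag.
prefix = re.search(TIME_PREFIX, event)
raw_timestamp = event[prefix.end():].split("<", 1)[0]
timestamp = datetime.strptime(raw_timestamp, TIME_FORMAT)

# The named group plays the role of the extracted field.
hostname = re.search(EXTRACT, event).group("destinationHostname")

print(timestamp.isoformat(), hostname)
```

If the strptime call raises a ValueError here, the format string is worth a second look before it goes anywhere near an indexer.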
Navigation Bar
The navigation bar can also be customised to suit the application. There is an XML file which specifies the layout, and it is found in:
apps/appName/default/data/ui/nav/default.xml
Here is a simple example, which places a link to the search view and another custom view. The default attribute states that this is the first view a user will see once they have logged into the app:
<nav>
  <view name="flashtimeline" default="true" />
  <view name="custom_chart" />
</nav>
Advanced XML
Charts and tables are generated and slapped onto dashboards. These are simply XML files which contain the instructions on how to display the particular chart or table. To do anything even remotely interesting, you need to convert these XML files from simple to advanced XML. It took me a while to remember how.
When you've got the chart visible via the web interface, just add "?showsource=true" to the end of the URL. This will spit out what the advanced XML version of the simple chart looks like. Just copy this and stick it into the XML file. Voilà! You've now converted your chart into advanced XML. All views are placed in the views directory:
apps/appName/default/data/ui/views/
Charts are specified in terms of modules. Modules can be nested in other modules, and hence will inherit from their parent. For example:
<module name="TimeRangePicker" layoutPanel="mainSearchControls" autoRun="True">
  <module name="HiddenSavedSearch" group="Chart Name" autoRun="True">
    <param name="savedSearch">chartPopulationSearch</param>
    <module name="FlashChart" />
  </module>
</module>
The FlashChart module displays the results piped down from the HiddenSavedSearch module (which uses the savedSearch parameter to specify which search to run). The HiddenSavedSearch module will use the date range specified by the user via the TimeRangePicker module to limit the scope of the search:
[TimeRangePicker] --date-range--> [HiddenSavedSearch] --saved-search--> [FlashChart]
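If the nesting is hard to see in a large view file, you can print the chain of modules from the outermost parent down to each chart. This is a small sketch using Python's standard xml.etree module, not anything Splunk provides. The helper name module_chains is my own, and the XML is the example from above with its closing tags added.

```python
import xml.etree.ElementTree as ET

view_xml = """
<module name="TimeRangePicker" layoutPanel="mainSearchControls" autoRun="True">
  <module name="HiddenSavedSearch" group="Chart Name" autoRun="True">
    <param name="savedSearch">chartPopulationSearch</param>
    <module name="FlashChart" />
  </module>
</module>
"""


def module_chains(module, parents=()):
    """Yield the chain of module names from the root down to each leaf module."""
    chain = parents + (module.get("name"),)
    children = module.findall("module")  # direct child modules only
    if not children:
        yield chain
    for child in children:
        yield from module_chains(child, chain)


root = ET.fromstring(view_xml.strip())
chains = [" -> ".join(chain) for chain in module_chains(root)]
for line in chains:
    print(line)
```

Each printed line reads left to right in the same direction the data flows: whatever sits on the left feeds the modules to its right.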
If you can understand this concept, then you should be able to speak fluent Splunkenese. Now let's move on to some of the more interesting features...
TimeRangePicker Module
As mentioned before, this module allows users to specify the time range over which to run their search. You can have one per chart, or one per page. This just depends on how you nest your XML. Here is an example of one per page:
<module name="TimeRangePicker" layoutPanel="mainSearchControls" autoRun="True">
  <param name="searchWhenChanged">True</param>
  <module name="HiddenSavedSearch"> ... </module>
  <module name="HiddenSavedSearch"> ... </module>
</module>
The parameter searchWhenChanged allows you to do away with the search button and apply the new date range right after it has been selected.
JobProgressIndicator Module
Splunk likes to do things in the background, which is all well and good, but if you're waiting for a search to finish and you're getting no feedback about its progress, you'll be very tempted to hammer the refresh button. Enter the JobProgressIndicator module! This module reports back to the user (in the form of a progress bar) the current progress of its direct parent. You can place it on a TimeRangePicker, or on a HiddenSavedSearch, wherever. Here's an example:
<module name="TimeRangePicker" layoutPanel="mainSearchControls" autoRun="True">
  <param name="searchWhenChanged">True</param>
  <module name="JobProgressIndicator" />
  <module name="HiddenSavedSearch">
    <module name="JobProgressIndicator" />
    ...
  </module>
  <module name="HiddenSavedSearch">
    <module name="JobProgressIndicator" />
    ...
  </module>
</module>
The previous example will report on the overall progress of the view (for the one nested under the TimeRangePicker module), as well as the progress of each individual chart on the view (for those nested under the HiddenSavedSearch modules).
LinkSwitcher Module
You may find that some of the charts you generate show you the same information, just in a slightly different way, and you're not too fond of the extra exercise your scrolling finger has to do. The LinkSwitcher module allows you to group similar charts together into a single chart area and lets the user select which one they want to see at any time. Here's an example:
<module name="TimeRangePicker" layoutPanel="mainSearchControls" autoRun="True">
  <param name="searchWhenChanged">True</param>
  <module name="JobProgressIndicator" />
  <module name="LinkSwitcher" layoutPanel="panel_row1_col1" group="Similar Charts">
    <param name="mode">independent</param>
    <param name="label"> </param>
    <module name="HiddenSavedSearch" group="Chart 1.1">
      <module name="JobProgressIndicator" />
      ...
    </module>
    <module name="HiddenSavedSearch" group="Chart 1.2">
      <module name="JobProgressIndicator" />
      ...
    </module>
  </module>
  <module name="HiddenSavedSearch" group="Chart 2">
    <module name="JobProgressIndicator" />
    ...
  </module>
</module>
This places the top two charts into the same chart area. The LinkSwitcher will use the group attribute of its children as the name of the link used to switch to that chart.
HiddenChartFormatter Module
This is where the magic happens. Developers use this module to customise the layout of the chart. It is typically nested inside of a HiddenSavedSearch module:
<module name="HiddenSavedSearch" group="Chart 2">
  <module name="JobProgressIndicator" />
  <module name="HiddenChartFormatter">
    ...
  </module>
</module>
This module allows experienced Googlers to put their skills to the test in finding the right invocation of some archaic Splunkism to make it do what they want. Some of the fancy stuff I've used is detailed below.
Chart Colouring
For some reason Splunk will randomly allocate a colour to a particular data set. This could potentially be a good thing, except Splunk (for some reason beyond us mere mortals) will change that colour whenever it feels like it. If a user has come to rely on a particular colour to mean a certain thing, then they may find themselves confused. To fix this, you can assign a colour to a data set yourself:
<module name="HiddenChartFormatter">
  ...
  <param name="charting.legend.labels"> [dataset1, dataset2, dataset3] </param>
  <param name="charting.seriesColors"> [0xFF0000, 0x00FF00, 0x0000FF] </param>
  <param name="charting.legend.masterLegend"></param>
  ...
</module>
I'm really not sure why but you need to specify the names of the data sets. So if you're charting system utilisation and your chart shows the average RAM used (% AVG RAM Used) and the average CPU used (% AVG CPU Used) then your labels param would look like this:
<param name="charting.legend.labels"> [% AVG RAM Used, % AVG CPU Used] </param>
The hex values in the seriesColors param correspond, in order, to the data sets listed in the labels param: the first colour goes to the first label, the second colour to the second label, and so on.
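Because the pairing is purely positional, it's easy to get the two lists out of step. Here is a small Python sketch that keeps the labels and colours together and builds the two param values from them. The helper name and the idea of generating the values are my own. Only the bracketed list format comes from the examples above.

```python
def build_colour_params(colour_by_label):
    """Return the (labels, seriesColors) param values for a label-to-colour mapping."""
    labels = list(colour_by_label)
    colours = [f"0x{colour_by_label[label]:06X}" for label in labels]
    labels_param = "[" + ", ".join(labels) + "]"
    colours_param = "[" + ", ".join(colours) + "]"
    return labels_param, colours_param


# Dicts keep insertion order, so each label stays paired with its colour.
colour_by_label = {
    "% AVG RAM Used": 0xFF0000,  # red
    "% AVG CPU Used": 0x00FF00,  # green
}

labels_param, colours_param = build_colour_params(colour_by_label)
print(labels_param)
print(colours_param)
```

The two strings can then be pasted into the charting.legend.labels and charting.seriesColors params respectively.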
Chart Overlays
There are times when you want to have one chart overlaid onto another. Take the system utilisation app I recently developed: one chart shows the RAM and CPU used, and the overlaid chart shows the transaction throughput. This allows the user to see more in terms of the performance of their system. To do this, you have to make use of Splunk's hidden axis*Y2 parameters.
Step 1: State the type of data
<param name="charting.axisX">time</param>
<param name="charting.axisY">numeric</param>
<param name="charting.axisY2">numeric</param>
Step 2: Give everything a pretty heading
<param name="charting.axisTitleX.text">Time</param>
<param name="charting.axisTitleX.visibility">collapsed</param>
<param name="charting.axisTitleY.text">Percentage of Utilisation</param>
<param name="charting.axisTitleY2">#axisTitleY</param>
<param name="charting.axisTitleY2.text">Number of Transactions</param>
<param name="charting.axisTitleY2.placement">right</param>
Step 3: Set up the axis scales
<param name="charting.axisY.minimumNumber">0</param>
<param name="charting.axisY.maximumNumber">100</param>
<param name="charting.axisY2.minimumNumber">0</param>
Step 4: Set up the second y axis label
<param name="charting.axisLabelsY2">#axisLabelsY</param>
<param name="charting.axisLabelsY2.axis">@axisY2</param>
<param name="charting.axisLabelsY2.placement">right</param>
Step 5: Divide the data set (split the columns up so that they are grouped in terms of which axis they will map to)
<param name="charting.data0">results</param>
<param name="charting.data0.jobID">@data.jobID</param>
<param name="charting.data1">view</param>
<param name="charting.data1.table">@data0</param>
<param name="charting.data1.columns">[0,1:2]</param>
<param name="charting.data2">view</param>
<param name="charting.data2.table">@data0</param>
<param name="charting.data2.columns">[0,3]</param>
Step 6: Set up the type of charts to display
<param name="charting.chart.data">@data1</param>
<param name="charting.chart">area</param>
<param name="charting.chart2">line</param>
<param name="charting.chart2.axisY">@axisY2</param>
<param name="charting.chart2.data">@data2</param>
Step 7: Set up the layout (this is where you state which charts will be visible on the single graph)
<param name="charting.layout.charts">[@chart,@chart2]</param>
<param name="charting.layout.axisTitles">[@axisTitleX,@axisTitleY,@axisTitleY2]</param>
<param name="charting.layout.axisLabels">[@axisLabelsX,@axisLabelsY,@axisLabelsY2]</param>
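The column split in Step 5 is the part that tripped me up most, so here is a rough Python sketch of what I understand the columns specs to be doing. It assumes that a range like 1:2 is inclusive at both ends (which is what the example implies: time plus RAM and CPU for the first chart, time plus throughput for the second). The function names and the sample rows are made up, and this is not how Splunk implements it.

```python
def parse_columns(spec):
    """Expand a spec such as "[0,1:2]" into column indices (a:b treated as inclusive)."""
    indices = []
    for part in spec.strip("[]").split(","):
        if ":" in part:
            start, end = (int(n) for n in part.split(":"))
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(part))
    return indices


def select_columns(rows, spec):
    """Return a view of the rows containing only the columns named by the spec."""
    indices = parse_columns(spec)
    return [[row[i] for i in indices] for row in rows]


# Hypothetical search results: time, two utilisation columns, one throughput column.
rows = [
    ["_time", "% AVG RAM Used", "% AVG CPU Used", "Transactions"],
    ["10:00", 41.0, 17.5, 1200],
    ["10:05", 43.5, 22.0, 1350],
]

data1 = select_columns(rows, "[0,1:2]")  # feeds the area chart on the left axis
data2 = select_columns(rows, "[0,3]")    # feeds the line chart on the right axis
print(data1[0])
print(data2[0])
```

Both views keep column 0, because each chart still needs the time values for its x axis.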
Chart Margins
Dead simple (phew). Arguments are specified in parentheses (rather than square brackets). Why is this inconsistent? I don't know... It takes four values which are (in order of appearance): left, right, top, and bottom. Not to be confused with the CSS margin order (which is top, right, bottom, and left). Why is this inconsistent? Again, I don't know...
<param name="charting.layout.margin">(0,0,-5,-15)</param>
This was especially useful when I was trying to line up the charts with those that didn't have the second y axis.
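If you think in CSS order like I do, a tiny helper saves some head scratching. This Python sketch (the function name is my own) takes margins in CSS order and returns them in the left, right, top, bottom order described above.

```python
def css_to_splunk_margin(top, right, bottom, left):
    """Reorder CSS-style margins (top, right, bottom, left) into left, right, top, bottom."""
    return f"({left},{right},{top},{bottom})"


margin = css_to_splunk_margin(top=-5, right=0, bottom=-15, left=0)
print(margin)
```

Calling it with keyword arguments, as here, makes it hard to mix up the two orderings. The result is the same value used in the charting.layout.margin example above.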
Chart Size
Splunk does something especially screwy here. I haven't been able to nail it down exactly, but it's something to watch out for. If the user resizes a chart, it doesn't seem to matter what the XML says: the chart will automatically go back to the previously saved size. I think this comes down to a viewstates.conf file hidden in the user's directory. Just nuke this, and pray.
Chart size is specified inside the FlashChart module. Here's an example:
<module name="FlashChart">
  <param name="width">100%</param>
  <param name="height">250px</param>
  <param name="enableResize">False</param>
</module>
Width and height are fairly self-explanatory, but I do recommend disabling resize right from the beginning so that you avoid the resize/viewstates.conf headache I've had.