Wednesday, December 30, 2015

Apache Solr Enterprise Search Server -- Third edition

This year gave me a chance to be a technical reviewer of a book on search engines. The title is Apache Solr Enterprise Search Server, and it has now seen the light of day in its third edition. The first edition, back in 2010, helped me start thinking the NoSQL way, even though SQL was literally everywhere (well, and still is). It does take a bit of mind warping to think beyond relational database lingo and data modelling, which in my opinion is rather useful for your career as a software engineer.

Here goes my review on Amazon:

This book in its first edition, back in 2010, was the first one around that covered Apache Solr in as much detail as I needed to get into the topic quickly. This third edition includes revisions for Apache Solr 5, notably covering things like the Solr admin page, SolrCloud, scaling the search engine to large amounts of documents, text analysis, indexing, search and even map-reducing your Solr index! In particular, throwing a MapReduce job at a large-scale indexing task has been hard / unclear in the past, and now it is available to any user of Apache Solr out of the box. This makes books like this immensely important: they save you from wasting time hunting for useful bits of information scattered here and there. More importantly, the authors of the book are directly involved in the project, either as Apache Solr / Lucene committers or as active practitioners and developers of the technology. So I recommend this book to entry-level and mid-level search engineers who are looking to get their hands dirty with search problems and / or to explore previously untapped areas of the search engine world.

Sunday, October 11, 2015

[ANNOUNCE] Luke 5.3.0 released: naturally runs on Java 8

This release runs on Java 8 and no longer runs on Java 7.

This release includes a number of pull requests and GitHub issues. Worth mentioning:
#38 Upgrade to 5.3.0 itself
#28 Added LUKE_PATH env variable to
#35 Added copy, cut, paste etc. shortcuts, using the Mac command key
#34 Fixed lastAnalyzer retrieval (this feature remembers the last used analyzer on the Search tab)
#31 200 stargazers on GitHub (by the time of this release the number had crossed 260). The Luke community is growing.

Everybody is welcome to contribute. If you care about search / indexing or would like to dig deeper into Apache Lucene, go ahead and pick a ticket:
And, don't be afraid, we do not have any complaint departments:

All you need is your favourite beverage and a good debugger.

Wednesday, July 8, 2015

[ANNOUNCE] Luke 5.2.0 released

This is a major release supporting Lucene / Solr 5.2.0. Download the zip here:

It supports Elasticsearch 1.6.0 (Lucene 4.10.4).
Issues fixed:
#20 Added support for reconstructing field values of indexed but not stored fields that do not expose positions.
Pull requests:
#23 Elasticsearch support and Shade plugin for assembly
#26 added .gitignore to project
#27 Lucene 5x support
#28 Added LUKE_PATH env variable to
#30 Luke 5.2

I'd like to highlight the contributions of Tomoko Uchida, who has recently been very active in sending pull requests, including the upgrade to Lucene 5.x and the first version of the Apache Pivot based Luke UI.

Wednesday, April 15, 2015

Luke gets support for Elasticsearch indices

That is it, really: the long-awaited proper support for Elasticsearch indices.

Luke already supported Apache Solr indices. Why not Elasticsearch? The reason is that ES registers its own SPI implementation for the postings format. If you tried to open an Elasticsearch index with Luke before, you'd get something like:

A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene40, Lucene41]

The biggest issue with supporting a custom SPI is that you'd need to hack the Luke jar binary and add the ES SPI entry. I bet that is not what you would want to spend your time on.

Thanks to an excellent pull request by apakulov, Luke now uses the Maven Shade plugin, which does all the magic: it updates the in-binary META-INF/services file with the following entry:


Currently this is available on luke master: and a pre-release:
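For the curious, the effect of the Shade plugin's services transformer can be sketched in plain shell. The class names below are illustrative (not the actual Luke / ES entries), but the mechanics are the same: a jar's SPI registry is just a text file named after the interface, and shading concatenates the registries of all bundled jars.

```shell
# Sketch of what the Shade plugin's services transformer does:
# merge the META-INF/services registries of all bundled jars.
# Class names are illustrative, not the actual entries.
mkdir -p lucene-jar es-jar merged-jar

echo "org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat" \
  > lucene-jar/org.apache.lucene.codecs.PostingsFormat
echo "com.example.es.ES090PostingsFormat" \
  > es-jar/org.apache.lucene.codecs.PostingsFormat

# concatenate the per-jar registries into the shaded jar's registry
cat lucene-jar/org.apache.lucene.codecs.PostingsFormat \
    es-jar/org.apache.lucene.codecs.PostingsFormat \
  > merged-jar/org.apache.lucene.codecs.PostingsFormat

cat merged-jar/org.apache.lucene.codecs.PostingsFormat
```

With both names present in the merged registry, Lucene's SPI lookup can resolve the custom postings format and the error above goes away.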

Saturday, March 21, 2015

Flexible run-time logging configuration in Apache Solr 4.10.x

In a multi-shard setup it is useful to be able to change the log level at runtime, without going to each and every shard's admin page.

For example, we can set the logging to WARN level during massive posting sessions and back to INFO when serving user queries.

In Solr 4.10.2 these one-liners do the trick:

# set logging level to WARN,
# saves disk space and speeds up massive posting 
curl -s http://localhost:8983/solr/admin/info/logging \
                       --data-binary "set=root:WARN&wt=json" 
# set logging level to INFO,
# suitable for serving the user queries 
curl -s http://localhost:8983/solr/admin/info/logging \
                       --data-binary "set=root:INFO&wt=json"

Back from Solr you get a JSON document with the current status of each configured logger.
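For convenience, the two calls can be wrapped in a tiny shell function. This is just a sketch: the host and port are the stock Solr defaults and may differ in your setup.

```shell
# toggle Solr's root log level at runtime (sketch; adjust host/port)
SOLR_HOST=${SOLR_HOST:-http://localhost:8983}

set_log_level() {
  # $1 is a log level such as WARN or INFO
  curl -s "$SOLR_HOST/solr/admin/info/logging" \
       --data-binary "set=root:$1&wt=json"
}

# set_log_level WARN   # before a massive posting session
# set_log_level INFO   # back to serving user queries
```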

Monday, March 16, 2015

Luke keeps getting updates and now on Apache Pivot

Originally developed for fun and profit by Andrzej Bialecki, the Lucene toolbox Luke continues to evolve. Its releases are published at:

Most recently, Tomoko Uchida has contributed to the effort of porting Luke to Apache Pivot, an Apache License 2.0 friendly GUI framework. A new branch has been created to host this work:

Currently supported Lucene: 4.10.4.

It is far from complete, but already you can:

  • open your Lucene index and check its metadata

  • page through the documents and analyze fields

  • search the index

We would appreciate it if you could test the Pivot-based Luke and give your feedback.

Monday, November 17, 2014

Lightweight Java Profiler and Interactive svg Flame Graphs

A colleague of mine has just returned from AWS re:Invent and brought in all the excitement about new AWS technologies. So I went on to watch the released videos of the talks. One of the first technical ones I watched was Performance Tuning Amazon EC2 Instances by Brendan Gregg of Netflix. From Brendan's talk I learnt about the Lightweight Java Profiler (LJP) and visualizing stack traces with Flame Graphs.

I'm quite 'obsessed' with monitoring and performance tuning based on it.
Monitoring your applications is definitely the way to:

1. Get numbers on performance inside your company, spread them and let people tell stories about them.
2. Tune the system where you see the bottleneck and measure again.

In this post I would like to share a shell script that will produce a colourful and interactive flame graph out of a stack trace of your java application. This may be useful in a variety of ways, from an impressive graph for your slides to making informed tuning of your code / system.

Components to build / install

This was run on Ubuntu 12.04 LTS.
Check out the Lightweight Java Profiler project source code and build it:

svn checkout <ljp-svn-url> lightweight-java-profiler-read-only
cd lightweight-java-profiler-read-only/
make BITS=64 all

(omit the BITS parameter if you want to build for a 32-bit platform).

As a result of a successful compilation you will have an agent binary that will be used to configure your java process.

Next, clone the FlameGraph github repository:

git clone

You don't need to build anything: it is a collection of shell / Perl scripts that will do the magic.

Configuring the LJP agent on your java process

The next step is to configure the LJP agent to report stats from your java process. I have picked a Solr instance running under Jetty. Here is how I configured it in my Solr startup script:

# the agent library comes from the LJP build above; adjust the path
java \
      -agentpath:build-64/liblagent.so \
      -Dsolr.solr.home=cores -jar start.jar

Executing the script should start the Solr instance normally, and the agent will log stack traces to traces.txt.

Generating a Flame graph

In order to produce a flame graph out of the LJP stack trace you will need to perform the following:

1. Convert LJP stack trace into a collapsed form that FlameGraph understands.

2. Call the flamegraph.pl tool on the collapsed stack trace to produce the svg file.

I have written a shell script that will do this for you:

#!/bin/bash
# usage: ./make-flamegraph.sh traces.txt
LJP_TRACES_FILE=$1
FILENAME=$(basename "$LJP_TRACES_FILE")

# change this variable to point to your FlameGraph directory
FLAME_GRAPH_HOME=~/FlameGraph

COLLAPSED=$(dirname $LJP_TRACES_FILE)/${FILENAME%.*}.collapsed
SVG=$(dirname $LJP_TRACES_FILE)/${FILENAME%.*}.svg

# collapse the LJP stack trace
$FLAME_GRAPH_HOME/stackcollapse-ljp.awk $LJP_TRACES_FILE > $COLLAPSED

# create a flame graph
$FLAME_GRAPH_HOME/flamegraph.pl $COLLAPSED > $SVG
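A quick aside on the ${FILENAME%.*} parameter expansion the script relies on: it strips the file extension, so the generated svg lands next to the trace file. The paths below are just an example:

```shell
# how the output path is derived from the trace file path
LJP_TRACES_FILE=/var/log/solr/traces.txt   # example input
FILENAME=$(basename "$LJP_TRACES_FILE")    # -> traces.txt
BASE=${FILENAME%.*}                        # strip extension -> traces
SVG="$(dirname "$LJP_TRACES_FILE")/$BASE.svg"
echo "$SVG"                                # -> /var/log/solr/traces.svg
```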
And here is the flame graph of my Solr instance under the indexing load.

You should interpret this diagram bottom-up: the lowest level is the entry point class that starts the application. Then we see that, CPU-wise, two methods are taking most of the time: org.eclipse.jetty.start.Main.main and

This svg diagram is in fact interactive: load it in a browser and click on the rectangles for the methods you would like to explore further. I clicked on the
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd rectangle and drilled down into it:

It is this easy to set up a CPU performance check for your java program. Remember to monitor before tuning your code, and wear a helmet.