Root-causing the random failures in the integration tests with ElasticSearch

Recently we were building an integration test framework and some tests that manipulate data in our ElasticSearch cluster. Strangely, the tests would succeed or fail randomly, even though we had made no changes to the business logic code at the time.

What did we have in the test cases?

  • @BeforeClass: load the test data into the ElasticSearch cluster through the ElasticSearch TransportClient.
  • @Test: retrieve the test data and check some of its fields for equality.
  • @AfterClass: clean up the test data through ElasticSearch TransportClient.

It really was that simple.

What did the error message say when the tests failed? Well, it complained that it could not find the test data.

Strange. The @BeforeClass method should always load the data into the cluster before the test cases run, and there were no errors about failing to load the data. Feeling a bit stuck, I commented out the cleanup code in the @AfterClass method. Now the tests passed on every run, but once I added the cleanup code back, they started failing occasionally again, especially when I ran the tests right after a previous run had finished.

This got me thinking: “Could it be that the test data was cleaned up at the end of the previous run but not yet loaded into the cluster in the next run, even though the @BeforeClass method was executed?”

My suspicion was confirmed after some reading on how ElasticSearch loads data. Why did this happen? Because loading data into an ElasticSearch cluster takes time, and so does deleting it. The test cases ran right after the load request was issued in the @BeforeClass method, but not necessarily after the cluster had processed that request. In other words, from the test's point of view the load is asynchronous. We had made the false assumption that once the load request was sent, the data was immediately present in the cluster. That assumption may be OK in a unit test, but in an integration test it can cause problems.

Stupid solution: add a buffer before actually executing the tests, for example Thread.sleep(30000) in the @BeforeClass method. However, this does not guarantee that the data has been loaded, especially if the data set is large.

Better solution: send a request that checks, given the request id, whether the load request has actually been processed, and wait in the @BeforeClass method until it has.
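A minimal sketch of the waiting part, assuming you already have some way to ask the cluster whether the data is there. The condition is a hypothetical placeholder, and this code makes no ElasticSearch client calls. Instead of sleeping for a fixed 30 seconds, it polls the condition until it holds or a timeout expires:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of any ElasticSearch API.
class WaitUtil {
    // Polls `condition` every `pollInterval` until it returns true or
    // `timeout` has elapsed. Returns true if the condition was met in time.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }
}
```

In @BeforeClass the condition could be something like a hypothetical `countTestDocuments() == EXPECTED_COUNT`, and the setup should fail loudly when `waitUntil` returns false rather than letting the tests run against missing data. Since deleting also takes time, the same wait could be used in @AfterClass to confirm the data is gone before the next run starts.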

Whatever you do, make sure that the test data are actually in the cluster before moving on.

 

 


How to use JavaConfig Bean in Spring XML

Our current project is at its first stage: wiring all the components together and running a simple integration test. When I took on this task, I found that all the beans were defined in XML. Given the number of beans I had to create, it would be tedious to write them all in XML. Personally, I prefer JavaConfig to XML files because navigation is easier for me in JavaConfig. But I don’t want to convert all the XML configuration to JavaConfig at once. Can I define beans in JavaConfig and use them in the XML?

A bit of searching revealed a simple way. Assume that we have a provider class as follows:

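package com.example.xyz;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ResourceProvider {
    // The bean id defaults to the method name: "sqsWrapper"
    @Bean
    public SQSWrapper sqsWrapper() {
        return new SQSWrapper();
    }
}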

Assume that we have an application.xml file and we want to use the SQSWrapper Bean in a bean definition in the file:

<bean id="SQSConsumer" class="com.example.xyz.SQSConsumerImpl">
    <constructor-arg ref="THE_ID_OF_THE_SQSWRAPPER_BEAN"/>
</bean>
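The constructor-arg element assumes SQSConsumerImpl has a constructor that accepts an SQSWrapper. Neither class is shown in this post, so here is a hypothetical plain-Java sketch of what that bean definition expects (in the real project both classes would be public and live in com.example.xyz):

```java
// Hypothetical stand-in for the real SQS wrapper class.
class SQSWrapper {
}

// For <constructor-arg ref="..."/> to work, SQSConsumerImpl needs a
// constructor taking an SQSWrapper; Spring passes in the bean named by `ref`.
class SQSConsumerImpl {
    private final SQSWrapper sqsWrapper;

    public SQSConsumerImpl(SQSWrapper sqsWrapper) {
        this.sqsWrapper = sqsWrapper;
    }

    SQSWrapper getSqsWrapper() {
        return sqsWrapper;
    }

    public static void main(String[] args) {
        // Plain-Java equivalent of what the container does for this bean
        // definition: build the wrapper, then hand it to the constructor.
        SQSConsumerImpl consumer = new SQSConsumerImpl(new SQSWrapper());
        System.out.println(consumer.getSqsWrapper() != null);
    }
}
```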

To do that, we add two extra lines to the file and then refer to the SQSWrapper bean by the name of its @Bean method, sqsWrapper (Spring uses the method name as the bean id by default). The complete XML file looks like this:

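<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

    <context:annotation-config/>

    <!-- The following line brings in the beans defined in the ResourceProvider -->
    <bean class="com.example.xyz.ResourceProvider"/>

    <bean id="SQSConsumer" class="com.example.xyz.SQSConsumerImpl">
        <constructor-arg ref="sqsWrapper"/>
    </bean>
</beans>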

The <context:annotation-config/> line is crucial, as noted in this stackoverflow answer: “while annotation-config is switched on, the container will recognize the @Configuration annotation and process the @Bean methods declared in JavaConfig properly”.

Now that saved me from creating more xml files!

Notes on Java Daemon Thread

I’m going to work with daemon threads in my new job, but I had no idea what they were. This post summarizes some of the key points from a stackoverflow post.


 

First, let’s look at daemons in Unix. Simply put, they are processes running in the background that answer requests for services. You can read more about them on Wikipedia.

There are two types of Java threads:

  • Normal/user thread: Generally, all threads created by the programmer are user threads (unless you mark a thread as daemon, or the parent thread spawning the new thread is a daemon thread). The main thread is a non-daemon thread by default.
  • Daemon thread: it is similar to a Unix daemon (I don’t know if I can say that; please correct me if I’m wrong). Daemon threads are like service providers for other threads or objects running in the same process (in other words, they may serve the user threads). They are typically used for background supporting tasks.

Points to Note about Java Daemon Threads:

  • It is often said that a daemon thread has very low priority and only executes when no other threads of the same program are running. As far as I can tell, this is not the case: being a daemon does not change a thread’s priority, and a new thread simply inherits the priority of the thread that creates it.
  • When there are no more user threads (meaning that only daemon threads are running in a program), the JVM ends the program and exits. This is reasonable: if there is no one left to serve, why keep the servants? (This is my own thought.)
  • When the JVM halts, all remaining daemon threads are abandoned: their finally blocks are not executed and their stacks are not unwound (that is, they simply stop wherever they are, without returning from the methods they are executing).
  • Daemon threads usually have an infinite loop in their run() method that waits for service requests or performs the thread’s tasks.
  • We can make a thread a daemon with setDaemon(true), but only before the thread is started; calling it on a running thread throws IllegalThreadStateException.
  • We can check whether a thread is a user thread or a daemon thread with the isDaemon() method.
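To check some of these points myself, here is a small sketch (class and method names are my own, not from the stackoverflow post). It confirms that a thread created by a daemon thread is itself a daemon, that setDaemon() fails on a running thread, and shows the typical endless daemon loop that does not keep the JVM alive:

```java
import java.util.concurrent.CountDownLatch;

class DaemonDemo {

    // A thread created by a daemon thread starts out as a daemon itself.
    static boolean childOfDaemonIsDaemon() throws InterruptedException {
        final boolean[] childIsDaemon = new boolean[1];
        Thread parent = new Thread(() -> {
            Thread child = new Thread(() -> { });
            childIsDaemon[0] = child.isDaemon();
        });
        parent.setDaemon(true);   // allowed: parent has not been started yet
        parent.start();
        parent.join();            // join() makes the write above visible here
        return childIsDaemon[0];
    }

    // setDaemon() on a running thread throws IllegalThreadStateException.
    static boolean setDaemonAfterStartThrows() throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                release.await();  // keep the thread alive until we are done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            worker.setDaemon(true);
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        } finally {
            release.countDown();
            worker.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("main is daemon? " + Thread.currentThread().isDaemon());
        System.out.println("child of daemon is daemon? " + childOfDaemonIsDaemon());
        System.out.println("setDaemon after start throws? " + setDaemonAfterStartThrows());

        // Typical daemon: an endless service loop. Because it is a daemon,
        // the JVM exits as soon as main (the last user thread) returns.
        Thread heartbeat = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        heartbeat.setDaemon(true);
        heartbeat.start();
    }
}
```

Running main should print false, true and true, and then the program exits even though the heartbeat loop never ends.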

Examples of Java Daemon Threads:

  • Garbage collection. It runs in the background, reclaiming memory from objects that are no longer reachable.
  • A good Java code example from that post, reposted on gist

Things to check…

  • Non-daemon threads (default) can even live longer than the main thread.

Java Multithreading Notes From Lecture Two

This is part of the notes from an online course (Java Multithreading) I’m taking on Udemy. Nothing complicated.


In theory, on some systems a Java thread may not see changes that other threads make to a variable it uses: the thread may keep working with its own cached copy of the variable, so writes made from another thread appear to have no effect. We can call this caching the variable in the thread.

To prevent this, we can declare a variable that may be changed by other threads as volatile, which guarantees that every write to it becomes visible to subsequent reads from other threads.

An example on gist.
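The gist is not reproduced here. A minimal sketch of the usual pattern, with hypothetical names: a worker loops while a flag is set, and the main thread clears the flag to stop it.

```java
class Processor extends Thread {
    // volatile: the write in shutdown(), made by another thread, is guaranteed
    // to become visible to the loop in run(). Without it, the loop could in
    // theory keep reading a stale cached value and never stop.
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            try {
                Thread.sleep(50);   // simulate some work
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    void shutdown() {
        running = false;
    }

    public static void main(String[] args) throws InterruptedException {
        Processor processor = new Processor();
        processor.start();
        Thread.sleep(200);        // let it run for a while
        processor.shutdown();     // called from the main thread
        processor.join();
        System.out.println("processor stopped");
    }
}
```

In this toy version the Thread.sleep call makes the caching problem unlikely to show up in practice; the point is that volatile is what guarantees the loop eventually sees running = false.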

 

Java Multithreading Notes From Lecture One

This is part of the notes from an online course (Java Multithreading) I’m taking on Udemy. Nothing complicated.


 

There are three common ways to create threads (examples on gist):

  • Create a class that extends the Thread class
  • Create a class that implements the Runnable interface
  • Create a Thread using an anonymous class

Whichever approach we choose, we must override or implement the public void run() method.
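Since the gist isn’t inlined here, a minimal sketch of the three approaches (class names are my own; the anonymous version uses an anonymous Runnable, though an anonymous subclass of Thread works too). Each thread appends a label to a shared, synchronized list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// 1. A class that extends Thread and overrides run().
class LoggingThread extends Thread {
    private final List<String> log;

    LoggingThread(List<String> log) {
        this.log = log;
    }

    @Override
    public void run() {
        log.add("extends Thread");
    }
}

// 2. A class that implements Runnable; it is passed to a Thread's constructor.
class LoggingTask implements Runnable {
    private final List<String> log;

    LoggingTask(List<String> log) {
        this.log = log;
    }

    @Override
    public void run() {
        log.add("implements Runnable");
    }
}

class ThreeWays {
    static List<String> runAll() throws InterruptedException {
        // synchronizedList: three threads append to the same list concurrently
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        Thread t1 = new LoggingThread(log);
        Thread t2 = new Thread(new LoggingTask(log));
        // 3. An anonymous Runnable created inline.
        Thread t3 = new Thread(new Runnable() {
            @Override
            public void run() {
                log.add("anonymous");
            }
        });

        t1.start();
        t2.start();
        t3.start();
        t1.join();
        t2.join();
        t3.join();
        return log;   // the order of entries can vary from run to run
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll());
    }
}
```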

Multithreading:

All Java programs have a main thread, and from it we can create and start other threads.

To do that, we call the start() method of each thread we want to launch from the main thread. start() causes that thread’s run() method to execute in its own separate thread, not in the main thread (see App.java in the gist).

The start() method returns immediately, so the main thread continues executing the next line of code.

However, if we accidentally call the run() method of those threads directly, run() is executed in the main thread like any ordinary method call, not in its own thread! So be careful.
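A small sketch of the start() versus run() mistake (the class name is my own). The thread’s run() records which thread actually executed it:

```java
import java.util.concurrent.atomic.AtomicReference;

class StartVsRun {
    // Creates a thread named "worker" whose run() records the executing
    // thread, then launches it either correctly (start) or by mistake (run).
    static Thread executingThread(boolean useStart) throws InterruptedException {
        AtomicReference<Thread> executor = new AtomicReference<>();
        Thread worker = new Thread(() -> executor.set(Thread.currentThread()), "worker");
        if (useStart) {
            worker.start();   // returns immediately; run() executes in "worker"
            worker.join();    // wait so we can read the result
        } else {
            worker.run();     // plain method call: run() executes in the caller
        }
        return executor.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("start(): " + executingThread(true).getName());
        System.out.println("run():   " + executingThread(false).getName());
    }
}
```

Run from the command line, this should print worker for start() and main for run().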

 

Manage and Repeat Experiments

It has been painful: when running experiments, I forget to save the parameters and lose track of them by the time the results come out. OK, the truth is that I’m lazy. I change the parameters in the code and hope my brain will remember the differences. It can’t, and when I want to repeat an analysis, it comes back to bite me. So I searched online to see if there were good strategies for managing experiments, and luckily I found some excellent posts:

http://stackoverflow.com/questions/6437213/strategies-for-repeating-large-chunk-of-analysis/6550914#6550914

http://stackoverflow.com/questions/7979609/automatic-documentation-of-datasets

In these two posts, one answerer mentions that he saves the parameters of each experiment in JSON files, and when he needs to reproduce a result, he simply imports them. Quoting from the answer: “Everything in between is just code that runs with a given parametrization, but the code shouldn’t really change much, should it?”

Since I’ve been using R recently, I wrote a short script that helps a user create a list of parameters and export it to a JSON file. It is kind of raw, but I hope someone will find it useful. It doesn’t have to be R; you can write your own scripts in whatever language you prefer.

Code on github:

https://github.com/kiribatu/Kiribatu-R-Toolkit/blob/master/docs/parameter_configuration.md
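As an example of doing this in another language, here is a rough Java sketch of the same idea. It is not a port of the R script: to stay within the standard library it swaps JSON for java.util.Properties, and the parameter names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Keep an experiment's parameters outside the code so a run can be repeated.
class ExperimentParams {

    // Serialize the parameters; the comment line records the experiment name.
    static String save(Map<String, String> params, String experimentName) throws IOException {
        Properties props = new Properties();
        props.putAll(params);
        StringWriter out = new StringWriter();
        props.store(out, "parameters for " + experimentName);
        return out.toString();
    }

    // Read the parameters back, e.g. to reproduce an earlier run.
    static Map<String, String> load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        Map<String, String> params = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            params.put(key, props.getProperty(key));
        }
        return params;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> params = new TreeMap<>();
        params.put("learning_rate", "0.01");   // hypothetical parameter names
        params.put("n_trees", "500");
        String saved = save(params, "experiment-1");
        System.out.print(saved);
        System.out.println(load(saved).equals(params));
    }
}
```

To keep one file per experiment, the same store and load calls accept a Writer or Reader obtained from Files.newBufferedWriter and Files.newBufferedReader.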

Tips for Plot in R (1) — inconsistent type of coordinate parameters

The plot function in R seems really simple. But I ran into the following problem and it took me some time to figure it out.

# suppose you have two vectors v1 and v2
v1 <- c(1,2,3)
v2 <- c(3,4,5)
# we also create a data frame using v1 and v2
df <- data.frame(v1=v1, v2=v2)
# to plot v2 (y-axis) against v1 (x-axis)
plot(v1, v2)
# or we can do
plot(df$v1, df$v2)
# BUT we cannot use plot(v1, df["v2"])
# This throws an error complaining that 'v1' and 'df["v2"]'
# have different lengths

This error confused me a bit, since I was sure v1 and df[“v2”] both have length 3. Well, it turns out they don’t.

# if you check the type and length of v1 and df["v2"]
class(v1)         # "numeric"
length(v1)        # 3
class(df["v2"])   # "data.frame"
length(df["v2"])  # 1: a data frame's length is its number of columns

Oops, we have two different types of variables. We need to pull the column out of the “data.frame” as a numeric vector we can use.

# instead of using df["v2"], we could use either df$v2 or df[,"v2"].
plot(v1, df[,"v2"])