Intro to things break
In this series, we’ve now arrived here:
- Intro
- Unit Tests
It was easy to get a full local mode Hive shell up on the command line to do verification on our simple UDF. But do you really want to go through that every time? What if you have 100 UDFs that you need to verify after a small change to a shared library? Do you really want to go through them one-by-one and re-test all of that by hand? Bananas! Let’s try to automate this as much as possible so that it is repeatable in a disciplined way and can be executed as quickly as possible. This should be nearly effortless!
What is JUnit?
If you’ve worked with any sort of Java before, you probably already know what this is and can skip ahead. But, in case you’re new, let me reward your curiosity!
The JUnit testing framework has been around for a long time. It’s easy to use, ubiquitous, and nearly every Java development or build tool out there supports it in some way. There are also tons of libraries that can make you more productive at writing quick, clear tests. So it’s a good place to start in this exercise. There are alternatives out there like TestNG, but note that some of them (notably TestNG) require a 1.7+ JDK, which conflicts with Kotlin’s ability to run on a 1.6+ JDK. Choice done, onward!
What is the Klarna HiveRunner?
The Klarna HiveRunner provides an easy way to inject a bottled local mode Hive directly into JUnit tests. For instance, they provide:
- Mappings for data defined in code to files in HDFS
- Auto config of Hive local mode for fast test performance
- Ability to safely run many unit tests concurrently on Hive
- Auto-load scripts for schemas, constants, or other common definitions
Hive can be pretty slow when operating on small scale data, which your unit tests will virtually always be. So a lot of this is invaluable as you grow your code base.
It’s also important to note that the different HiveRunner releases each assume different versions of Hive. By looking at the appropriate POM files for each release you can see:
- HiveRunner 3.0.0 ==> Hive 1.2.1
- HiveRunner <=2.6.0 ==> Hive 0.14.0
Since we’re making a Hive v1.0.0 assumption (see original post), we will eventually end up declaring our own dependency on Hive. A version declared directly in our POM takes precedence over one pulled in transitively (Maven’s “nearest definition” rule), so our version is the one that lands on the class path and the one all tests will actually run against. That is what we want, but it also means that the HiveRunner version we’re using may behave unpredictably with the Hive version we’re forcing, because of the assumptions made when it was built and tested. For instance, when I tried running this project on HiveRunner 3.0.0, timeout errors and other misconfigurations occurred because Hive 1.0.0 doesn’t behave as that version of HiveRunner expects it to.
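If you ever need to confirm which jar actually won, one option is to ask the JVM where it loaded a class from. This is only a rough sketch using the standard library; the helper name `originOf` is made up for illustration and is not part of HiveRunner or this project.

```kotlin
// Hypothetical helper (not from the original project): report where the JVM
// loaded a class from, e.g. the path of the jar that "won" on the class path.
fun originOf(cls: Class<*>): String {
    // codeSource is null for classes that come from the JDK itself
    val source = cls.protectionDomain?.codeSource
    return source?.location?.toString() ?: "(JDK class or unknown origin)"
}

// Same lookup by class name, without running the class's static initializers.
fun originOf(className: String): String =
    originOf(Class.forName(className, false, Thread.currentThread().contextClassLoader))
```

Calling something like `println(originOf("org.apache.hadoop.hive.ql.exec.UDF"))` from inside a test should then print the location of the Hive jar in use, assuming that class is on the test class path.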
Foundation for the future
Given these new tools, let’s first create a test base class that we can use to build all our future tests in a way that’s as clear and concise as possible.
In this case, I’ll be using this:
import com.klarna.hiverunner.HiveShell
import com.klarna.hiverunner.StandaloneHiveRunner
import org.junit.Assert.assertEquals
import org.junit.Assert.assertNotNull
import org.junit.Before
import org.junit.runner.RunWith
import kotlin.reflect.KClass

@RunWith(StandaloneHiveRunner::class)
abstract class TestBase(val methodName: String, val classToTest: KClass<*>) {

    // Guards against registering the function twice on the same instance
    var setupComplete = false

    // Resolved on first use, after HiveRunner has injected the child's hiveShell field
    val childHiveShell by lazy {
        ReflectUtils.getFieldValue(this, "hiveShell") as HiveShell
    }

    // Run a statement that returns no data
    fun execute(str: String) {
        childHiveShell.execute(str)
    }

    // Run a query that returns zero or more records
    fun query(queryStr: String): List<String> {
        return childHiveShell.executeQuery(queryStr)
    }

    // Run a query that must return exactly one record
    fun queryOne(queryStr: String): String? {
        val results = query(queryStr)
        assertNotNull("Hive should not provide a null response!", results)
        assertEquals("Expected exactly 1 result!", 1, results.size)
        return results.first()
    }

    @Before
    fun prepare() {
        if (!setupComplete) {
            execute("CREATE TEMPORARY FUNCTION $methodName AS '${classToTest.qualifiedName}'")
            setupComplete = true
        }
    }
}
Unfortunately the HiveRunner requires that the HiveShell used by each test class get added as a member of the actual class and can’t be in the parent. However! We can use a bit of reflection to get around this so that we can still have as much common code in the parent as possible. Don’t imitate this pattern in any production code if you can help it, but we have a bit of design slack with tests. The lazy delegator here means that this won’t try to fetch the child’s definition until after we expect it to have been injected by the HiveRunner. That saves lots of annoying infrastructure. Fancy!
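To see the lazy-plus-reflection trick in isolation, here is a minimal sketch that needs nothing beyond the standard library. The names `readField`, `Parent`, `Child`, and `resource` are all invented for this illustration, and `readField` is only a guess at what a helper like `ReflectUtils.getFieldValue` might do; the real implementation may well differ.

```kotlin
// Hypothetical stand-in for a helper like ReflectUtils.getFieldValue:
// walk up the class hierarchy until a field with the given name is found.
fun readField(target: Any, name: String): Any? {
    var cls: Class<*>? = target.javaClass
    while (cls != null) {
        try {
            val field = cls.getDeclaredField(name)
            field.isAccessible = true   // the backing field of a Kotlin property is private
            return field.get(target)
        } catch (e: NoSuchFieldException) {
            cls = cls.superclass        // not declared here, try the parent class
        }
    }
    throw NoSuchFieldException(name)
}

abstract class Parent {
    // The lambda runs on first access only, so the child's field
    // can be filled in any time before that.
    val injected: String by lazy { readField(this, "resource") as String }
}

class Child : Parent() {
    // Plays the role of hiveShell: assigned from outside after construction
    var resource: String? = null
}
```

If something assigns `resource` on a `Child` instance before `injected` is first read, the parent sees the assigned value even though it never declared the field itself.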
Now, the only information the test class has to know about HiveRunner is the declaration of the HiveShell. The helper methods provide the rest:
- execute expects no data back
- query expects nothing, one, or many records back
- queryOne expects exactly one record
The base class will also automatically declare the function you intend to test when you pass the required non-null parameters to the base constructor.
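The “exactly one record” contract is easy to try out without Hive in the loop. The sketch below mirrors the checks in queryOne using plain standard library calls instead of JUnit assertions; the function name `exactlyOne` is invented for illustration.

```kotlin
// Hypothetical, Hive-free version of the queryOne contract:
// reject null and any result list that does not hold exactly one element.
fun <T> exactlyOne(results: List<T>?): T {
    requireNotNull(results) { "Hive should not provide a null response!" }
    require(results.size == 1) { "Expected exactly 1 result, got ${results.size}" }
    return results[0]
}
```

Failing loudly on zero or multiple rows keeps a test from quietly passing against the wrong record.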
What if we need Java?
But what about ReflectUtils? Where did that come from?! In this case, I spun my own helpers to illustrate the fact that our project can cross compile both Java and Kotlin code together. It also shows that each can also access each other without difficulty. You can find the full implementation (in Java) and tests (in both Java and Kotlin) at these URLs:
I won’t go over these in detail since it’s not at the core of what we’re trying to do, but please take a look if you’re curious.
The simplest UDF test
Now we should have everything we need to define our first actual test. That was a bit of work, but it’ll save us a ton of time in the future. It’s worth it! Here’s what our simplest UDF test can now look like:
import com.klarna.hiverunner.HiveShell
import com.klarna.hiverunner.annotations.HiveSQL
import org.junit.Assert.assertEquals
import org.junit.Test

class SandboxSimpleUDFTest : TestBase("sandbox", SandboxSimpleUDF::class) {

    @Suppress("unused")
    @field:HiveSQL(files = arrayOf())
    var hiveShell: HiveShell? = null

    @Test
    fun simpleCase() {
        assertEquals("test me", queryOne("SELECT sandbox('Test Me')"))
    }

    @Test
    fun emptyString() {
        assertEquals("", queryOne("SELECT sandbox('')"))
    }

    @Test
    fun blankString() {
        assertEquals(" ", queryOne("SELECT sandbox(' ')"))
    }

    @Test
    fun alreadyLowerCase() {
        assertEquals("test me", queryOne("SELECT sandbox('test me')"))
    }

    @Test
    fun allUpperCase() {
        assertEquals("test me", queryOne("SELECT sandbox('TEST ME')"))
    }

    @Test
    fun nullValue() {
        assertEquals("NULL", queryOne("SELECT sandbox(NULL)"))
    }

    @Test
    fun integersGetAutoMappedToStrings() {
        assertEquals("123", queryOne("SELECT sandbox(123)"))
    }
}
First off, note that our declaration of hiveShell requires a field scoped annotation to work. As already mentioned above, Klarna is very particular about how it searches for the annotation. Not only is it required that this live only on the child class, but it also (unsurprisingly!) has no concept of how Kotlin members work. So we have to make sure that the annotation actually gets placed on the backing field and not the overall construct.
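Here is a small sketch of the difference between annotating the property and annotating its backing field, as Java reflection sees it. The `Marker` annotation is invented for illustration and deliberately allows both targets, which is an assumption that may not match how HiveSQL itself is declared.

```kotlin
// Hypothetical annotation that may sit on either a Kotlin property or a JVM field
@Retention(AnnotationRetention.RUNTIME)
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.FIELD)
annotation class Marker

class Demo {
    // No use-site target: Kotlin attaches this to the property, not the field
    @Marker
    var onProperty: String? = null

    // Explicit use-site target: the annotation lands on the backing field
    @field:Marker
    var onField: String? = null
}

// What a Java library scanning fields with reflection would see
fun fieldHasMarker(cls: Class<*>, name: String): Boolean =
    cls.getDeclaredField(name).isAnnotationPresent(Marker::class.java)
```

With this setup, a field scan finds the marker on `onField` but not on `onProperty`, which is the kind of mismatch the `@field:` prefix avoids.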
Second, look how beautiful those tests are! Tell me you couldn’t pump out a billion of those in your sleep, and that if something goes wrong logically, it wouldn’t be immediately clear what went wrong. You can’t tell me that, can you?
As is now obvious, it would make for a cleaner test class if we didn’t have to add the hiveShell member every time, but it does at least provide a way for the test to customize the options for the HiveSQL annotation (see its source or generated javadoc for details).
Maven config updates
Now that we have all the tests cobbled together, we have to make sure all the different pieces can work together in the real world. Here are the additions we will have to make to the pom.xml on top of what we started with in the last post (full file here):
<project>
    ...
    <dependencies>
        ...
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.klarna</groupId>
            <artifactId>hiverunner</artifactId>
            <version>2.6.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-reflect</artifactId>
            <version>${kotlin.version}</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
    ...
</project>
The junit and hiverunner dependencies seem obvious, but why kotlin-reflect? This is because of the call we make to classToTest.qualifiedName in the TestBase class. This ends up invoking Kotlin’s reflection library which, unlike Java’s built-in reflection, ships as a separate, optional dependency. Since only the test code makes that call, you’ll only need it when running tests locally (hence the ‘test’ scope) and not when you deploy to a real Hive environment.
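If you would rather not pull in the extra library, one possible alternative (not what this project does) is to go through the Java class object, which only needs the standard library. A minimal sketch, with `jvmClassName` and `SampleUdf` as invented names:

```kotlin
import kotlin.reflect.KClass

// Hypothetical placeholder standing in for a real UDF class
class SampleUdf

// Resolve the JVM class name through java.lang.Class instead of Kotlin reflection
fun jvmClassName(cls: KClass<*>): String = cls.java.name
```

One difference to keep in mind: for nested classes the JVM name uses `$` as the separator (`Outer$Inner`) while qualifiedName uses a dot, so the two are only interchangeable for top-level classes.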
Command that line to test
We made it! Let’s actually run those tests now by executing this on the command line from the project base directory:
kotlin-hive-unittests$ mvn clean test
[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Kotlin on Apache Hive - Unit Tests 1.0.0
[INFO] ------------------------------------------------------------------------
...A bunch of stuff we don't care about right now...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.mergehead.kotlinhive.unittests.ReflectUtilsJavaTest
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.132 sec
Running com.mergehead.kotlinhive.unittests.ReflectUtilsKotlinTest
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 sec
Running com.mergehead.kotlinhive.unittests.SandboxSimpleUDFTest
OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.876 sec

Results :

Tests run: 39, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.947 s
[INFO] Finished at: 2016-02-25T14:31:33-08:00
[INFO] Final Memory: 66M/370M
[INFO] ------------------------------------------------------------------------
Celebration! Note that all of the “OK” lines are actually coming to stdout from Hive and not from JUnit or Maven. This isn’t a big deal now, but note that some of the Hive code base logs things in shady ways that can look all kinds of weird in the right circumstances. We shall get into that later!
Are we too optimistic?
Did that really just work? Or are we just passing all tests all the time because Klarna wants us to like them? Let’s temporarily add one more test just so we can see what failure looks like:
@Test
fun doesFailureFail() {
assertEquals("I paid for an argument", queryOne("SELECT sandbox('No you didn\\'t')"))
}
And the output of this on the command line is:
Davids-MacBook-Pro-2:kotlin-hive-unittests drom$ mvn clean test
[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Kotlin on Apache Hive - Unit Tests 1.0.0
[INFO] ------------------------------------------------------------------------
...A bunch of stuff we don't care about right now...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running com.mergehead.kotlinhive.unittests.ReflectUtilsJavaTest
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.128 sec
Running com.mergehead.kotlinhive.unittests.ReflectUtilsKotlinTest
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 sec
Running com.mergehead.kotlinhive.unittests.SandboxSimpleUDFTest
OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.939 sec <<< FAILURE!
doesFailureFail(com.mergehead.kotlinhive.unittests.SandboxSimpleUDFTest) Time elapsed: 0.513 sec <<< FAILURE!
org.junit.ComparisonFailure: expected:<[I paid for an argumen]t> but was:<[no you didn']t>
    at org.junit.Assert.assertEquals(Assert.java:115)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at com.mergehead.kotlinhive.unittests.SandboxSimpleUDFTest.doesFailureFail(SandboxSimpleUDFTest.kt:75)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at com.klarna.hiverunner.StandaloneHiveRunner.evaluateStatement(StandaloneHiveRunner.java:176)
    at com.klarna.hiverunner.StandaloneHiveRunner.access$000(StandaloneHiveRunner.java:64)
    at com.klarna.hiverunner.StandaloneHiveRunner$1$1.evaluate(StandaloneHiveRunner.java:91)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
    at com.klarna.hiverunner.ThrowOnTimeout$1.run(ThrowOnTimeout.java:42)
    at java.lang.Thread.run(Thread.java:745)

Results :

Failed tests:
  doesFailureFail(com.mergehead.kotlinhive.unittests.SandboxSimpleUDFTest): expected:<[I paid for an argumen]t> but was:<[no you didn']t>

Tests run: 40, Failures: 1, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24.672 s
[INFO] Finished at: 2016-02-26T08:46:40-08:00
[INFO] Final Memory: 66M/370M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project kotlinhive-unittests: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/drom/Projects/MergeHead/kotlin-hive-unittests/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Davids-MacBook-Pro-2:kotlin-hive-unittests drom$
So, failure fails! That’s comforting. You can see that the output is helpful, but a bit buried in all the other detail. This detail is great when tracking down obscure dependency issues or for integrating with other tools, but it’s a lot for human eyes to have to sift through…
Another IDEA for testing
It’s worth mentioning that Kotlin comes from JetBrains, JetBrains provides the IntelliJ IDEA IDE for Java development (free and OSS in its Community Edition), and they also provide a Kotlin plugin (also free and OSS) for IDEA to make working with Java / Kotlin easier. It’s almost suspiciously convenient…
Given this, if you open the full source for this post (see here) as an IDEA project after installing the Kotlin plugin, then you should be able to run any test file or even individual test method by just right clicking and selecting “Run YourTest”. You’ll then get presented with a nicer, more readable view of the results at the bottom of the IDE like so:
If a test fails, its icon will turn red or yellow depending on the nature of the failure. Clicking on a particular test will show you the stdout / stderr streams from just that test execution. Integrated full text search (including regexp support) is available for all output. It can get you into a nice workflow of change code, run tests with a hot key, and repeat if IDEs are your thing. If not, then the command line option plus your favorite text editor can be your end all.
In keeping with the spirit of the last section, here’s what failure in IDEA would look like:
One last thing worth mentioning is that the IDE makes it pretty easy to set break points, run specific tests with a debugger, and then step through your code and even Hive / Hadoop original source step-by-step so you can know exactly what went wrong (e.g., Hive is expecting to get an object inspector for a what?! How did that happen??).
Final words
So there’s a basis for automated testing of all our new Hive UDFs. If you’d like to see the full project source used here, check out this repo.
See you in the next post!

