Let me start out by saying that I am not a developer. I do have a technical background, but I hadn't coded in Java for at least 10 years before I got involved in the Apache Drill project. One has to wonder how, as a non-developer, I ended up as a committer for the Drill project. In this blog post, I'd like to share with you how I came to be involved with the Drill project.
But first, why Drill?
I first heard about Drill at an industry conference several years ago. I was speaking with Dr. Ellen Friedman about some data issues we were having and she casually mentioned have I tried Drill? I had not heard of it at that point, so I did some research and it seemed as if Drill could solve a lot of problems that my clients were having. But then, I tried using it and kept getting stuck.
If you aren't familiar with Apache Drill, Drill is an SQL engine which allows you to query any kind of self-describing data. After experimenting with Drill for a while, I was impressed enough to thing that the tool had major potential in security. One of the biggest problems that Drill solves is the need to Extract, Transform, Load (ETL) data into an analytic tool before actually doing analysis of that data. This ETL process adds no value to anything really, and costs large enterprises literally millions of dollars as well as adding unnecessary delays between the time data is ingested and when the data is actually available for analysis. In security applications, this delay directly translates into risk. The longer it takes to make your data available, the more time it will take to potentially find malicious activity and hence, more risk. Therefore, if you're able to query the data without having to do any kind of ETL or ingestion, you are lowering your risk as well as potentially saving millions of dollars.
Unfortunately, when I started using Drill, I saw this potential, but I couldn't get it to work. My next step from here was to try to get assistance at my company. I pitched the ideas to my company leadership, but it proved very difficult to get the company to pull Java developers from revenue generating projects to work on this "pie-in-the-sky", unproven project. After spending several months on this, I got really frustrated and decided that I was going to try to do it myself, however, I really had no idea what I was doing. I hadn't coded in Java for at least 10 years at the time, and had zero experience with all the modern Java development tools such as Maven and Git. What I did have was persistence, so I started asking for help and decided that I was going to dive right in and start adding the functionality that I felt Drill needed to be useful in security applications. I started working on something that someone else started—the HTTPD format plugin for Drill. Most of the coding was done, but there was still enough there for me to get my hands dirty and start figuring things out.
What I learned
I still would not consider myself a developer, but after getting that particular item committed to the codebase, I learned a lot about how open source projects actually work as well as writing production quality code. Since then, I've tried to add at least one bit of new functionality to each Drill release. I would encourage anyone who is interested in contributing to an Open Source project at the Apache Software Foundation, to dive right in, and start. There are still a lot of ideas I have for Drill, and with time, I hope to have the time to see them through to implementation.
In conclusion, I'm fairly certain that my involvement with Drill and the Apache Software Foundation is really just beginning. I'm currently working on the O'Reilly book about Apache Drill with a fellow Drill committer. It is my hope that the book will spark additional interest in Apache Drill. Open Source software is at the heart of the ongoing data revolution which is dramatically expanding what is possible with data. I firmly believe that Apache Drill will have a role to play in this data revolution and I'm honored to have the opportunity to play a small role in developing Drill.
Charles Givre CISSP is a Lead Data Scientist at Deutsche Bank where he works in the Chief Information Security Office (CISO). Mr. Givre is an active data science instructor and regularly teaches classes about data science and security at various industry conferences, such as BlackHat. Mr. Givre is a committer for the Apache Drill project and together with Mr. Paul Rogers, is working on the forthcoming O’Reilly book about Apache Drill. He can be reached at cgivre(at)apache(dot)org.
= = =
"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache
# # #