BigData at the Commandline

In the past, Big Data and Agile did not seem to get along, but that is no longer the case. One of the most important concerns when processing data is data integrity. Suppose you are pulling data from an API (Application Programming Interface) and processing the results before dumping them as UTF-8, gzipped CSV files on Amazon S3. The task is to confirm that the files are properly encoded (UTF-8), that each file has the appropriate headers and that no row in any file has missing data, and finally to produce a report listing each file’s name, column count, record count and encoding type. Many languages could do this job, but speed is of great importance. We also want the checks to run as a Jenkins (Continuous Integration server) job.

I have decided to use Bash to perform these checks, and I will do it twice! First I will use basic Bash commands, and then csvkit. The other tool in the mix is the AWS command-line tool (aws-cli).
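As an illustration of what a plain-Bash pass over these checks could look like, here is a minimal sketch, not the finished tool. It assumes the files have already been copied down from S3 (for example with `aws s3 cp s3://YOUR-BUCKET/YOUR-PREFIX/ ./data --recursive`, where the bucket and prefix are placeholders), that every file should start with the hypothetical header `id,name,email`, and that no field contains a quoted comma. It validates UTF-8 with `iconv`, compares the header, counts rows with a wrong field count or an empty field, and prints one report line per file. It also exits non-zero if any file fails, which is what a Jenkins shell build step needs in order to mark the build as failed.

```bash
#!/usr/bin/env bash
# Sketch: validate gzipped CSV files and print a report.
# Assumptions (hypothetical, adjust to your data):
#   - the *.csv.gz files are already in a local directory
#   - EXPECTED_HEADER is the header line every file must start with
#   - fields never contain quoted commas, so plain comma splitting is enough
set -uo pipefail
shopt -s nullglob

EXPECTED_HEADER="${EXPECTED_HEADER:-id,name,email}"

# Print one report line for a file; return non-zero if any check fails.
check_file() {
  local f="$1" name="${1##*/}" encoding header_ok header
  local cols records bad_rows

  # UTF-8 check: iconv exits non-zero on an invalid byte sequence
  # (pipefail also catches a corrupt gzip stream).
  if gzip -dc -- "$f" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    encoding="UTF-8"
  else
    encoding="INVALID"
  fi

  # Header check against the first line, ignoring a Windows line ending.
  header="$(gzip -dc -- "$f" 2>/dev/null | head -n 1 | tr -d '\r')"
  if [[ "$header" == "$EXPECTED_HEADER" ]]; then header_ok="yes"; else header_ok="no"; fi

  # Column count comes from the header; a data row counts as missing data
  # when it has a different number of fields or any empty field.
  read -r cols records bad_rows < <(
    gzip -dc -- "$f" 2>/dev/null | tr -d '\r' | awk -F',' '
      NR == 1 { n = NF; next }
      {
        rows++
        if (NF != n) { bad++; next }
        for (i = 1; i <= NF; i++) if ($i == "") { bad++; break }
      }
      END { print n + 0, rows + 0, bad + 0 }'
  )

  printf '%s,%s,%s,%s,%s,%s\n' "$name" "$cols" "$records" \
    "$encoding" "$header_ok" "$bad_rows"
  [[ "$encoding" == "UTF-8" && "$header_ok" == "yes" && "$bad_rows" -eq 0 ]]
}

# Report on every *.csv.gz file in a directory (default: current one).
# A non-zero exit status lets a Jenkins build step fail the build.
report() {
  local dir="${1:-.}" f status=0
  echo "file,columns,records,encoding,header_ok,rows_missing_data"
  for f in "$dir"/*.csv.gz; do
    check_file "$f" || status=1
  done
  return "$status"
}

report "${1:-.}"
```

Saved as, say, `check_csv.sh` (a hypothetical name), it would be run as `EXPECTED_HEADER='id,name,email' ./check_csv.sh ./data`. Plain comma splitting is the weak spot of this approach; files with quoted fields are where a CSV-aware tool such as csvkit earns its place.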

Bash: List all files but ignore some

On the command line (terminal) in the *nix world, when you need to list all the files in a directory but ignore some based on their file extension (e.g. pdf, sh, tsv), the command below does the job. Remember to update the list of extensions you want to ignore, i.e. “sh|tsv|rb|properties”.

ls -l | grep -Ev '\.(sh|tsv|rb|properties)$' | column
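If you would rather not filter `ls` output with grep, Bash’s extended globbing can do the filtering itself. This is a sketch that assumes Bash (not plain sh) and reuses the same example extensions; `list_except` is a hypothetical helper name, and the extension list lives inside `@( ... )`.

```bash
#!/usr/bin/env bash
# Sketch: skip files by extension with Bash extended globbing instead of
# filtering `ls` output through grep. Edit the list inside @( ... ).
shopt -s extglob nullglob

# Long-list the entries of a directory (default: current one),
# leaving out *.sh, *.tsv, *.rb and *.properties files.
list_except() {
  (
    cd -- "${1:-.}" || exit 1
    files=( !(*.@(sh|tsv|rb|properties)) )
    # -d stops ls from listing the contents of matched subdirectories.
    if (( ${#files[@]} )); then
      ls -ld -- "${files[@]}"
    fi
  )
}

list_except "$@"
```

Because the shell expands the pattern before `ls` runs, file names containing spaces or newlines cannot confuse the filter, which is the usual argument against parsing `ls` output.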

Let me know if this is useful.

Project Automation Guide

Test Automation Guide

Having worked in various environments where test automation is a constant part of the process, and in some cases where teams were migrating from manual to automated testing, I am putting together a guide on how to introduce test automation, as a reminder for myself and for other testers or developers in test about to embark on such tasks. Here’s a snapshot of what I will cover:

  1. How to assess your current state of test automation readiness
  2. How to develop a plan that will optimize test automation, people and processes
  3. How a data-driven, scriptless test technology can benefit organizations of any size
  4. How to develop and maintain the high level of regression testing needed, using dynamic automated methods
  5. How to leverage the best mix of tools and methods to accelerate success
  6. How to quantify the value of automation for your business

Tried Selenium in Python or Scala before?

Selenium comes in many flavours (really, programming languages), and a recent encounter with Python pushed me to try in Python what I had already developed in Java and C#. The nature of my job and projects keeps nudging me to learn new programming languages, and that has given me the opportunity to embrace many of them: Java, C#, Python, Groovy, JavaScript, PHP, Ruby and, more recently, Scala. I am currently reading the book Scala in Action and will attempt to create a Selenium framework in Scala. Before I do that, I will take the Python route for a basic setup and a quick Selenium test.

First and foremost, let’s talk about the development environment. I love Macs! They make my life easy and cool. I will be setting up the Selenium (WebDriver) framework with Git as my source control management tool. Here is what you will need:


  1. Terminal
  2. Homebrew
  3. Pip
  4. Git
  5. Python IDE: PyCharm, Aptana or Eclipse; the choice is yours. Mine is PyCharm.
  6. Jenkins
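To see which of the command-line items in this list are already on your machine, `command -v` is enough. This is a hypothetical helper: the default command names (`brew`, `git`, `python3`, `pip3`) are assumptions, and the Terminal, the IDE and Jenkins are left out because they are not checked this way. Pass other names as arguments to check those instead.

```bash
#!/usr/bin/env bash
# Sketch: report which command-line tools are installed.
# Default names are assumptions; pass your own as arguments.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to something runnable.
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
      missing=1
    fi
  done
  return "$missing"
}

if (( $# )); then
  check_tools "$@"
else
  check_tools brew git python3 pip3
fi
```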


On your Mac, open a terminal session and type brew. If Homebrew is not already installed, you will get an error message (typically “command not found”). In that case, fire up your browser and head over to BREW to get your Mac brewed. Follow the steps and you should have that lovely tool up and running on your machine. Once it is installed, run the command: brew update.
It should tell you that nothing has been installed via Brew yet, or something similar.
That’s your first taste of brew (I don’t drink, by the way).
To keep my machine up to date, I use a very basic Bash script that I keep on GitHub. It does the job.
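The GitHub script itself is not reproduced here. As a stand-in, this is a minimal hypothetical sketch of such an update script: it checks that Homebrew is installed and then chains three standard Homebrew commands. `update_all` is an illustrative name, not the author’s.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a basic "keep my Mac up to date" script;
# not the author's GitHub script.
set -u

update_all() {
  # command -v fails when brew is not on the PATH, i.e. not installed.
  if ! command -v brew >/dev/null 2>&1; then
    echo "Homebrew is not installed: see https://brew.sh" >&2
    return 1
  fi
  # Refresh Homebrew and its package lists, upgrade every outdated
  # package, then remove old versions and cached downloads.
  brew update && brew upgrade && brew cleanup
}

update_all
```

Chaining with `&&` stops at the first failing step, so a failed `brew update` does not go on to upgrade against stale package lists.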

Next on the agenda is installing the IDE. That’s simple: in your browser, go to JetBrains, download the Community Edition and you should be fine. Once it is installed, move on to the next step.

Back in the terminal, let’s install pip, the Python package manager.
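A minimal sketch of this step, assuming `python3` is already on your PATH (for example from `brew install python`): check whether pip is available and, if it is not, bootstrap it with the standard-library `ensurepip` module. `ensure_pip` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: make sure python3 has pip, bootstrapping it if it is missing.
# Assumes python3 is on the PATH (e.g. installed via `brew install python`).
set -u

ensure_pip() {
  # `python3 -m pip --version` succeeds only when pip is importable.
  if python3 -m pip --version >/dev/null 2>&1; then
    echo "pip is already available"
  else
    # ensurepip ships with the standard library and installs a bundled pip.
    python3 -m ensurepip --upgrade
  fi
}

ensure_pip
```

With pip in place, `python3 -m pip install selenium` installs the Selenium bindings for Python. Some Python installations, including recent Homebrew ones, may refuse system-wide installs and expect you to do this inside a virtual environment created with `python3 -m venv`.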