Big Data and Agile did not seem to get along in the past, but that is no longer the case. One of the important points in processing data is data integrity. Assume you are pulling data from an API (Application Programming Interface) and performing some processing on the result before dumping it as UTF-8 encoded, gzipped CSV files on Amazon S3. The task is to confirm that the files are properly encoded (UTF-8), that each file has the appropriate headers, and that no row in any file has missing data, and finally to produce a report with filenames, column counts, record counts and encoding type. There are many languages available today and we could use any of them, but speed is of great importance. We also want to have a Jenkins (Continuous Integration server) job running.
I have decided to use Bash to perform these checks, and I will do it twice! First I will use basic Bash commands, and then I will use csvkit (http://csvkit.readthedocs.org/). The other tool in the mix is the AWS command-line tool (aws-cli).
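As a rough idea of what the basic Bash version of these checks could look like, here is a minimal sketch. It assumes the files have already been downloaded from S3 to a local directory, that fields contain no quoted commas or embedded newlines, and that iconv and awk are available. The function name check_csv_gz and the EXPECTED_HEADER value are made up for illustration.

```shell
# Sketch: validate one gzipped CSV and print a pipe-separated report line.
# Assumptions: simple comma-separated fields (no quoted commas), and a
# hypothetical expected header; adjust both to your own data.
EXPECTED_HEADER="id,name,email"

check_csv_gz() {
  local file="$1" header columns stats records missing encoding header_ok

  # Stop early if the archive itself is corrupt.
  if ! gzip -t "$file" 2>/dev/null; then
    echo "$file|corrupt gzip"
    return 1
  fi

  # iconv exits non-zero when the input is not valid UTF-8.
  if gzip -dc "$file" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    encoding="UTF-8"
  else
    encoding="INVALID"
  fi

  # The first line is the header; drop a trailing carriage return if present.
  header=$(gzip -dc "$file" | head -n 1 | tr -d '\r')
  columns=$(printf '%s\n' "$header" | awk -F',' '{ print NF }')

  # Count data rows, and rows with a wrong field count or an empty field.
  stats=$(gzip -dc "$file" | awk -F',' -v n="$columns" '
    { sub(/\r$/, "") }
    NR == 1 { next }
    {
      rows++
      bad = (NF != n)
      for (i = 1; i <= NF; i++) if ($i == "") bad = 1
      if (bad) incomplete++
    }
    END { print rows + 0, incomplete + 0 }')
  records=${stats% *}
  missing=${stats#* }

  header_ok="yes"
  [ "$header" = "$EXPECTED_HEADER" ] || header_ok="no"

  # Report: filename, column count, record count, encoding, then the checks.
  echo "$file|$columns|$records|$encoding|header_ok=$header_ok|incomplete_rows=$missing"
}
```

Running it over a directory could be as simple as for f in *.csv.gz; do check_csv_gz "$f"; done > report.txt, which is also easy to wire into a Jenkins job.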
On the command line (terminal) in the *nix world, you sometimes need to list all the files in a directory while ignoring some based on file extension, e.g. pdf, sh, tsv. The command below is quite appropriate for that. Remember to update the list of extensions you want to ignore, i.e. “sh|tsv|rb|properties”.
ls -l | grep -Ev '\.(sh|tsv|rb|properties)$' | column
Let me know if this is useful.
Test Automation Guide
Having worked in various environments where test automation is a constant part of the process, and in some cases where a migration from manual to automated testing was taking place, I am putting together a guide on how to introduce test automation, as a reminder for myself and for other testers or developer-testers about to embark on such tasks. Here’s a snapshot of what I will be covering:
- How to assess your current state of test automation readiness
- How to develop a plan that will optimize test automation, people and processes
- How a data-driven, scriptless test technology can benefit any sized organization
- How to develop and maintain the high levels of regression testing needed, using dynamic automated methods
- How to leverage the best mix of tools and methods to accelerate success
- How to quantify the value of automation for your business
First and foremost, let’s talk about the development environment. I love Macs! They make my life easy and cool. I will be setting up the Selenium (WebDriver) framework using Git as my source control management tool.
- Python IDE: PyCharm, Aptana, Eclipse – the choice is yours. PyCharm is my choice.
On your Mac, open a terminal session and type in brew. You should get an error message if Homebrew is not already installed. If it is not installed, fire up your browser and head over to BREW to get your Mac brewed. Follow the steps and you should have that lovely tool up and running on your machine. Once installed, run the command:
It should inform you that your machine has nothing installed via Brew, or something similar.
That’s your first taste of brew (I don’t drink by the way).
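If you would rather script the "is Homebrew installed?" check than read an error message, a small helper along these lines should do. This is only a sketch: check_tool is a hypothetical name of my own, and it simply asks the shell whether the command can be found.

```shell
# Sketch: report whether a command-line tool can be found by the shell.
# check_tool is a hypothetical helper name, not part of Homebrew.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 is installed"
  else
    echo "$1 is not installed"
  fi
}

check_tool brew
```

The same helper works for any other tool used in this guide, for example check_tool git.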
To keep your machine always up to date, I have a very basic Bash script on GitHub. It does the job.
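For readers who just want the general idea, an update routine of this kind might look like the sketch below. It is not the script from GitHub, only an illustration built from the standard brew update, brew upgrade and brew cleanup commands. The function name update_brew is an assumption.

```shell
# Sketch of a Homebrew maintenance routine (illustrative, not the GitHub script).
update_brew() {
  # Bail out early when Homebrew is not available.
  if ! command -v brew >/dev/null 2>&1; then
    echo "Homebrew is not installed" >&2
    return 1
  fi

  brew update  || return 1   # refresh Homebrew and its formula definitions
  brew upgrade || return 1   # upgrade every outdated package
  brew cleanup               # remove old versions and stale downloads
}

# Usage: call update_brew from a terminal, or from a scheduled job.
```

Stopping at the first failing step keeps a broken update from being followed by a cleanup.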
Next on the agenda is getting the IDE installed. That’s simple: in your browser, go to JetBrains and download the Community edition, and you should be fine. Once it is installed, move on to the next step.
Back in the terminal, let’s install pip, a Python package manager.