Python: Cleanup all Conda environments

Python: Cleanup all Conda environments

Conda is a popular open-source package manager and environment management system used for Python programming language. It is widely used by data scientists and machine learning engineers for managing packages and dependencies. In some cases, you may need to remove all environments in Conda. This can be helpful when you want to start with a clean slate or when you want to switch to a different version of Conda. In this blog post, we will discuss how to remove all environments in Conda.

Before we begin, it is important to note that removing all environments in Conda will delete all your existing environments and their associated packages. This process is irreversible, so it is recommended that you create a backup of your environment and package information before proceeding with the removal process. You can follow the steps below to achieve your goal.

Step 1: Open the Anaconda prompt or terminal The first step to removing all environments in Conda is to open the Anaconda prompt or terminal. You can do this by searching for “Anaconda prompt” or “Anaconda terminal” in your operating system’s search bar.

Step 2: Deactivate all active environments Before you can remove all environments, you must first deactivate all active environments. You can do this by running the following command in the Anaconda prompt or terminal:

conda deactivate

This will deactivate all active environments, and you will see that your prompt or terminal no longer displays the name of the active environment.

Step 3: Remove all environments To remove all environments in Conda, you can use the conda env list command to list all existing environments, and then use a loop to remove each environment one by one. Run the following commands in the Anaconda prompt or terminal:

conda env list

This command will list all the environments that you have created in Conda. Make sure that you have deactivated all active environments before running the following command:

for env in $(conda env list | awk '{print $1}' | grep -v '#' | grep -v 'base' | grep -v 'Name'); do
    conda remove --name $env --all -y
done

This loop will remove all environments except for the base environment, which is the default environment created by Conda.

Step 4: Confirm removal After running the loop to remove all environments, you can confirm that all environments have been removed by running the conda env list command again. This command should now display only the base environment.

In conclusion, removing all environments in Conda is a simple process that involves deactivating all active environments and then using a loop to remove each environment one by one. It is important to note that this process is irreversible, so it is recommended that you create a backup of your environment and package information before proceeding with the removal process.

Compare Millions of messages using Spark(PySpark)

Compare Millions of messages using Spark(PySpark)

Using the Power of Apache Spark(PySpark) to Verify Loads of Messages

Imagine a scenario where you have a system costing you about £30K to run and maintain coupled with the fact that it is a necessary component in your company’s audit trail hence it cannot deprecated just like that — as this will be an expensive business decision.

Overview of Legacy System

All events in the system emit messages for audit purposes and the messages are immediately available, for initially 7 days and then 30 days. The messages are archived once the 30 days period is achieved. Archive location of storage is simply AWS S3.

Stored messages can be retrieved from S3 as at when needed. 

Various teams in the organisation need access to these audit messages and the messages are all in JSON format with the content not sorted i.e:

Message 1:

{ “username”: “bonzo”, “location”: “London”, “subject”: “mathematics”, “activityTimestamp”: “2017–02–01:21:32:00Z”}

Message 2:

{ “username”: “iyabo”, , “subject”: “english literature”, “activityTimestamp”: “2017–05–09:06:03:09Z”, “location”: “Abeokuta”}

A project to replace this legacy system has been initiated and will use Apache Kafka as a platform to produce and consume these audit events. Kafka is an open-source platform and if you stick with the free version rather than subscribe to the services provided by Confluent , the parent company, then you have to play at a lot with this beauty called Kafka. If you want to know more about Kafka, simply visit Confluent.io (don’t click away now!)

The legacy system produces audit messages in JSON format hence the new system being will use the same message protocol. The real challenge is to ensure that the new system produces exactly the same message(s) generated by the legacy system so that when legacy system is switched-off, no audit message is missing. In short, like for like system.

The key question that must be answered is this:

“Do we have all messages?”

If yes, then let’s get some messages randomly from legacy systems and confirm our new system has the exact message.

To verify massive data (BigData), you use either Stir and Compare(Sampling) or Minus queries. In simple terms, Stir and Compare, takes a sample from source data and checks if that chunk of data is present in the target data. Simple? This strategy has limitations as we are talking about big data — it won’t fit in Excel spreadsheet (over 1 Million rows!).

As for Minus Query strategy, it’s simply using machines to do the job. Run a query on source data and target data, then subtract one result-set from the other, the outcome is the set of differences between the two data sources. Please note the queries may be in different query languages (SQL/HQL — data people you understand).

Back to the task.

You can go about setting up complex tools and running scripts and any other thing that makes you feel you’re using technology to solve the problem.

Hold on, there’s a simple solution: Apache Spark. You can checkout a quick overview and quick start on Spark here and the PySpark flavour can be found here.

Let’s assume you’re up and running with PySpark, then you need to load the data you need to compare. Let’s call the 2 sets of data, primary and secondary. Primary being the source data(legacy) and Secondary the new source of data.

In Spark, we use RDD(Resilient Distributed Datasets) to hold the data and in this case I will use rdd1 and rdd2.

Tried Selenium in Python or Scala before?

Selenium comes in many flavours – really programming languages, and my recent encounter with Python pushed me in the direction of trying what I had developed in Java and C# in Python. Due to the nature of my job and projects, I get the nudge to learn new programming languages and that has afforded me the opportunity to embrace many of them like Java, C#, Python, Groovy, Javascript, PHP, Ruby and more recently Scala. I am currently reading the Scala in Action book and will attempt to create a Selenium framework in Scala. Before I do that, I will go the way of Python for basic setup and a quick test in Selenium.

First and foremost, let talk about the development environment. I love Macs! They make my life easy and cool. I will be setting up the Selenium (WebDriver) framework using GIT as my source control management tool.

Tools

  1. Terminal
  2. Homebrew
  3. Pip
  4. GIT
  5. Python IDE : PyCharm, Aptana, Eclipse – the choice is yours. PyCharm is my choice.
  6. Jenkins

Setup

On your Mac, open a terminal session, type in brew. You should get some error message if Homebrew is not already installed. If Homebrew is not installed, fire up your browser and rush down to BREW to get your Mac brewed. Follow the steps and you should have that lovely tool up and running on your machine. Once installed, run the command: brew update.
It should inform you your machine has nothing installed via Brew…or something similar.
That’s your first taste of brew (I don’t drink by the way).
To keep your machine always upto date, I have a very basic BASH script in GitHub. It does the job.

Next on the agenda is getting the IDE installed. That’s simple, in your browser, go to JetBrains. Download the community version and you should be fine. Once installed, move to the next step.

Back to the terminal, let’s install Pip, a Python package manager.