When you run a self-managed Elasticsearch on AWS, you have to do everything from the console. AWS now has its own Elasticsearch offering, but this project was handed over to me and it runs an old instance of Elasticsearch that predates it.
Data pollution is a common problem, and you need to know exactly how to clean the data up when it happens. I had a case of polluted data that, left untreated, would have put my client in a very bad position: their customers could have sued them. First and foremost, the data pollution was not my fault. With that out of the way, I had to trace the journey of the data to identify the source of the pollution.
Let me describe the system a bit so you get the picture. The infrastructure has 4 main components. The first generates CSV files based on user searches. The second inserts each user search field value into a database (sort of). The third picks up the generated CSV files, populates an instance of Elasticsearch, and deletes each CSV file after 3 hours, by which time 2 new files have been added to the CSV repository.
## Elasticsearch Monitoring
# Cluster Health
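# Green: all primary and replica shards are allocated
# Yellow: all primary shards are allocated, but at least one replica is not
# Red: at least one primary shard is not allocated
curl -XGET 'http://localhost:9200/_cluster/health' | python -m json.tool
# Same check against a remote node (set ip_address first)
curl -XGET "http://${ip_address}:9200/_cluster/health" | python -m json.tool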
# Cluster health broken down per index
# (URLs are quoted so the shell does not treat ? as a glob character)
curl -XGET 'http://localhost:9200/_cluster/health?level=indices' | python -m json.tool
# Block until the cluster reaches the given status (green, yellow or red) or the request times out
curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=green' | python -m json.tool
# Cluster health broken down per shard
curl -XGET 'http://localhost:9200/_cluster/health?level=shards' | python -m json.tool
# Index stats for all indices
curl -XGET 'http://localhost:9200/_all/_stats' | python -m json.tool
# Index stats for bikes
curl -XGET 'http://localhost:9200/bike_deals/_stats' | python -m json.tool
# Index stats for cars
curl -XGET 'http://localhost:9200/car_deals/_stats' | python -m json.tool
# Index stats for multiple indices in one call
curl -XGET 'http://localhost:9200/bike_deals,car_deals/_stats' | python -m json.tool
# Check node stats (the endpoint is _nodes/stats, without an underscore before stats)
curl -XGET 'http://localhost:9200/_nodes/stats' | python -m json.tool
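The health calls return JSON meant for a human to read. For a cron job or an alerting script it can help to reduce that output to an exit code. Here is a minimal sketch, assuming bash and a Python interpreter on the box. `ES_URL`, `PYTHON`, `health_status` and `status_exit_code` are hypothetical names, not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch only: turn the JSON from /_cluster/health into a monitoring exit code.
# ES_URL, PYTHON and both function names are assumptions made for illustration.
ES_URL="${ES_URL:-http://localhost:9200}"

# Use whichever Python interpreter is installed (python or python3).
PYTHON="$(command -v python || command -v python3)"

# Read a cluster-health JSON document on stdin and print its "status" field.
health_status() {
    "$PYTHON" -c 'import json, sys; print(json.load(sys.stdin)["status"])'
}

# Map a status colour to an exit code: 0 = green, 1 = yellow, 2 = red or unknown.
status_exit_code() {
    case "$1" in
        green)  return 0 ;;
        yellow) return 1 ;;
        *)      return 2 ;;
    esac
}

# Usage against a live cluster (left commented out so the sketch has no side effects):
# status=$(curl -s "$ES_URL/_cluster/health" | health_status)
# status_exit_code "$status" || echo "cluster is $status" >&2
```

Keeping the parsing and the exit-code mapping in separate functions means each can be checked with a canned JSON string, without a running cluster.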
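# DELETE deals on a specific index
# Deletes the whole bike_deals index, mappings and settings included. Powerful! Be careful!
curl -XDELETE 'http://localhost:9200/bike_deals/?pretty=true' | python -m json.tool
# Deletes every document but keeps the index and its mappings
# (delete-by-query via DELETE .../_query was removed from core in Elasticsearch 2.0)
curl -XDELETE 'http://localhost:9200/bike_deals/_query' -d '{ "query" : { "match_all" : {} } }' | python -m json.tool
curl -XDELETE 'http://localhost:9200/car_deals/_query' -d '{ "query" : { "match_all" : {} } }' | python -m json.tool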
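Since these deletes cannot be undone, it may be worth checking how many documents an index holds and asking for an explicit confirmation before running one. The sketch below assumes bash, a Python interpreter, and the standard `_count` endpoint. `ES_URL`, `PYTHON`, `doc_count` and `confirm_delete` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: look at the document count and ask for confirmation before deleting.
# ES_URL, PYTHON, doc_count and confirm_delete are assumptions, not from the original setup.
ES_URL="${ES_URL:-http://localhost:9200}"

# Use whichever Python interpreter is installed (python or python3).
PYTHON="$(command -v python || command -v python3)"

# Read a _count response on stdin and print the number of matching documents.
doc_count() {
    "$PYTHON" -c 'import json, sys; print(json.load(sys.stdin)["count"])'
}

# Succeed only if the caller typed the (non-empty) index name back exactly.
confirm_delete() {
    local index="$1" typed="$2"
    [ -n "$index" ] && [ "$index" = "$typed" ]
}

# Usage against a live cluster (left commented out so the sketch deletes nothing):
# curl -s "$ES_URL/bike_deals/_count" | doc_count
# read -r -p "Type the index name to delete its documents: " answer
# confirm_delete bike_deals "$answer" &&
#     curl -XDELETE "$ES_URL/bike_deals/_query" -d '{ "query" : { "match_all" : {} } }'
```

Typing the index name back is a deliberately clumsy step. It makes it harder to wipe `car_deals` when you meant `bike_deals`.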