Cleaning Twitter location data

I came across an interesting data pre-processing problem while looking at the Pfizer vaccination tweets dataset from Kaggle. In this dataset, the location data contains (1) hashtags, (2) irrelevant words like “Your Bed” and “Global”, and (3) inconsistent location formats such as (country), (state, country), (country, country), and (country code). Here’s a snippet of what the data looks like: