Monday, October 10, 2016
Beloved musicians of 2000 still working it *snap snap* today?
For this case study, ‘working it’ is defined as the artist securing a spot on the Billboard Top 100. To determine if they made the cut, we must first import the data. I worked with a CSV file containing all the data from the year 2000. It is important in your research to make sure you’re working with relevant data AND that it is good data.
Seems like a no-brainer, but we wouldn’t want to import data from 1999. (At least not for this project.) Once you’ve correctly imported relevant data it is time to explore. Exploring your data is an important first step. You want to get comfortable in the data and know what exactly you have to work with.
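Here is a minimal sketch of that import-and-sanity-check step, assuming the data is loaded with pandas. The column names (artist, track, date_entered) and the rows are made up for illustration; the real file will look different.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; column names and rows are invented.
raw_csv = io.StringIO(
    "artist,track,date_entered\n"
    "Artist A,Song One,2000-02-12\n"
    "Artist B,Song Two,2000-07-01\n"
    "Artist C,Song Three,1999-12-25\n"
)

# With a real file, pass its path to pd.read_csv instead of the StringIO object.
charts = pd.read_csv(raw_csv, parse_dates=["date_entered"])

# Flag any rows that fall outside the year we care about.
wrong_year = charts[charts["date_entered"].dt.year != 2000]
print(f"{len(wrong_year)} of {len(charts)} rows are not from 2000")
```

If that count is anything other than zero, it is worth finding out why before going further.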
A great way to think about this: what are some quick snapshots you can grab to get the big picture quickly? My go-to commands for this are .describe(), .head(), .tail(), and .columns. Pulling these first gives me a good idea of what is contained in the data, especially if it is a massive dataset. When you glance at the head or tail, you start to form an idea about the meat of the data without being overwhelmed.
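To make those snapshots concrete, here is a small sketch that assumes the data sits in a pandas DataFrame (the commands above match pandas, though the post doesn't name the library). The frame below is invented purely to have something to look at.

```python
import pandas as pd

# Tiny made-up frame standing in for the real chart data.
charts = pd.DataFrame(
    {
        "artist": ["Artist A", "Artist B", "Artist C", "Artist D"],
        "track": ["Song One", "Song Two", "Song Three", "Song Four"],
        "peak_position": [1, 15, 42, 87],
    }
)

print(charts.columns.tolist())  # column names (an attribute, so no parentheses)
print(charts.head(2))           # first two rows
print(charts.tail(2))           # last two rows
print(charts.describe())        # summary statistics for the numeric columns
```

Note that .describe() only summarizes numeric columns by default, so text columns like artist won't show up there.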
Once you have an idea of what’s there, it’s time to find out what IS there and shouldn’t be, as well as what is missing. This step is called cleaning your data. Let’s face it: nobody likes dirty data. It includes duplicate records, incomplete or outdated data, and the improper parsing of record fields. By assessing the dirty data up front you will save yourself time and avoid incorrect outputs. To help you identify it, here are some examples of dirty data:
No boss or business is going to be happy losing money. Period. You will never regret the extra time it takes to clean your data.
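As a rough sketch of what checking for those kinds of dirty data can look like in pandas, the example below uses an invented frame with one duplicate row, one missing artist, and a numeric column that was parsed as text. The column names are assumptions, and dropping rows is only one of several reasonable ways to handle each problem.

```python
import pandas as pd

# Made-up data with a duplicate row, a missing artist, and numbers stored as text.
charts = pd.DataFrame(
    {
        "artist": ["Artist A", "Artist A", "Artist B", None],
        "track": ["Song One", "Song One", "Song Two", "Song Three"],
        "peak_position": ["1", "1", "n/a", "30"],
    }
)

print(charts.duplicated().sum())  # number of fully duplicated rows
print(charts.isna().sum())        # missing values per column

# Drop exact duplicates and rows with no artist, then work on a copy.
cleaned = charts.drop_duplicates().dropna(subset=["artist"]).copy()

# Fix the improperly parsed field: values that can't be read as numbers become NaN.
cleaned["peak_position"] = pd.to_numeric(cleaned["peak_position"], errors="coerce")
print(cleaned)
```

Counting the problems before fixing them is the useful part: it tells you how much of the dataset is affected.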
The next step is to visualize your data. Again, this is important where the boss is concerned. Let’s be honest: not many people like to look at numbers as much as we do. BUT, if we can’t make our insights and findings visually stimulating, we risk losing our stakeholders’ interest. That would be a shame, because the data don’t lie. What you’ve found is important and needs to be conveyed.
Visualizations are a great way to do that. Software like Tableau is great at taking spreadsheets and turning them into pretty, pretty pictures.
An added bonus is the time these visualizations save. They help the brain quickly grasp the big picture, which saves time and money. #HappyBoss
The next step is to create a problem statement. This is where you can let your curiosity and nosiness flow freely. After I looked at our initial data, I wondered, “Hmmm, I recognize some of these names. I wonder if any of them are still kicking today? How many of them just have sad Instagram accounts?”
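One way to turn that question into something you can compute is to compare the set of artists in the 2000 data against a list of artists charting today. This is only a sketch: both lists below are placeholders, and where the current list would come from is an open assumption.

```python
import pandas as pd

# Placeholder artist names; neither list is real chart data.
artists_2000 = pd.Series(["Artist A", "Artist B", "Artist C", "Artist B"])
artists_today = pd.Series(["Artist B", "Artist X", "Artist Y"])

# Artists who appear in both lists are the ones still 'working it'.
still_working_it = sorted(set(artists_2000) & set(artists_today))

# Share of the distinct 2000 artists who are still charting.
share = len(still_working_it) / artists_2000.nunique()

print(still_working_it)
print(f"{share:.0%} of the 2000 artists are still charting")
```

Exact string matching like this will miss artists whose names are spelled differently in the two sources, so the names may need the same cleaning treatment first.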
I’ve found that once you are comfortable with the data and the data is clean, it is much easier to dive deep and find insights that will effect change within your project. Once there’s a good foundation, you’re free to follow your questions and afterwards make some bomb visualizations to illustrate what you found.
I’d love to stay and chat more—but it’s time to find out if Limp Bizkit has an Instagram account.
--Megan