Monday, October 10, 2016
Beloved musicians of 2000 still working it *snap snap* today?
For this case study, ‘working it’ is defined as the artist securing a spot on the Billboard Top 100. To determine who made the cut, we first have to import the data. I worked with a CSV file containing all of the chart data from the year 2000. It is important in your research to make sure the data you’re working with is relevant AND that it is good data. Seems like a no-brainer, but we wouldn’t want to import data from 1999. (At least not for this project.)

Once you’ve correctly imported relevant data, it’s time to explore. Exploring your data is an important first step: you want to get comfortable in the data and know exactly what you have to work with. A great way to think about this is to ask what quick snapshots you can grab to get the big picture fast. My go-to commands for this are .describe(), .head(), .tail(), and .columns. Pulling these first gives me a good idea of what is contained in the data, especially if it is a massive dataset. When you glance at the head or tail, you start to form an idea about the meat of the data without being overwhelmed.

Once you have an idea of what’s there, it’s time to find out what IS there and shouldn’t be, as well as what is missing. This step is called cleaning your data. Let’s face it, nobody likes dirty data. It includes duplicate records, incomplete or outdated data, and improperly parsed record fields. By assessing the dirty data up front you will save yourself time and spare yourself bad outputs later. To help you identify it, here are some examples of dirty data:
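The import, explore, and first-pass cleaning checks described above can be sketched with pandas. This is a minimal sketch, not the actual project code: the column names and records below are made-up stand-ins for the Billboard CSV (fed in via an inline string so the snippet runs on its own), and in the real workflow you would point `pd.read_csv()` at your file instead.

```python
import io

import pandas as pd

# Stand-in for the year-2000 Billboard CSV; in practice this would be
# pd.read_csv("billboard_2000.csv") with the real file. The duplicate
# row and the empty field below are planted examples of dirty data.
raw = io.StringIO(
    "artist,track,weeks_on_chart\n"
    "Destiny's Child,Say My Name,32\n"
    "Destiny's Child,Say My Name,32\n"  # duplicate record
    "Santana,Maria Maria,26\n"
    "Creed,Higher,\n"                   # incomplete record (missing value)
)
df = pd.read_csv(raw)

# Quick snapshots to get the big picture
print(df.head())      # first few rows
print(df.tail())      # last few rows
print(df.describe())  # summary stats for numeric columns
print(df.columns)     # what fields you have to work with

# First-pass cleaning checks: duplicates and missing values
print(df.duplicated().sum())  # count of duplicate rows
print(df.isnull().sum())      # missing values per column
```

Running the cleaning checks on this toy data flags one duplicate row and one missing `weeks_on_chart` value, which is exactly the kind of thing you want to catch before any analysis.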