ASSIGNMENT #1
Attached Files:
BAS120 – Assignment #1.docx (26.731 KB)
Carvana_Data_Dictionary.txt (2.747 KB)
used car auction.xlsx (1,008.768 KB)
Using the attached dataset, complete the following tasks:
- Provide a thorough overview of each variable, along with possible values, any irregularities between values, and notes about formatting
- Write a robust list of questions and ideas about the dataset (at least 5)
- Tell 2 highly detailed stories, using specific rows in the dataset as the facts
- Create 4 new columns correctly: Model – adj, Miles per year, Price difference, Vehicle Type
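As a rough illustration of a derived column, here is a minimal Python sketch of one possible reading of "Miles per year": odometer reading divided by vehicle age. The exact definitions of all four columns are in the attached assignment document, so treat this formula as an assumption. The column names `VehOdo` and `VehicleAge` are also assumptions (check the data dictionary), and the rows are made up, not taken from the dataset.

```python
def miles_per_year(odometer_miles, age_years):
    """Average miles driven per year; None when the age is missing or zero."""
    # Assumed formula: odometer / age. A car with age 0 would divide by zero,
    # so it is left blank here instead of being given an arbitrary value.
    if odometer_miles is None or not age_years:
        return None
    return odometer_miles / age_years


# Made-up rows for illustration only; column names are assumptions.
rows = [
    {"VehOdo": 60000, "VehicleAge": 4},
    {"VehOdo": 45000, "VehicleAge": 0},
]

for row in rows:
    row["MilesPerYear"] = miles_per_year(row["VehOdo"], row["VehicleAge"])
```

The same decision comes up in a spreadsheet: whatever formula you use, decide how to handle cars whose age is 0 or blank, and note that choice in your write-up.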
The data comes from a Kaggle competition, and you can read more about it here: https://www.kaggle.com/c/DontGetKicked. Essentially, the data consists of cars that were sold at auction, usually to used-car dealerships. All of the variables other than “IsBadBuy” were captured at the moment of sale. Dealers would know the make, model, mileage, and so on, and would use that information to make a bid. “VehCost” is the winning bid. The “IsBadBuy” variable was added later, according to some logic that determined whether the car turned out to have major issues. We don’t know exactly what the issues were or how extensive they were, but they were enough to receive the flag (IsBadBuy = 1 means “Yes, it was a bad buy”; only cars with a value of 1 are “kicked”).
This boolean variable plays the same role as the Survived variable in the Titanic dataset, and we will do a lot of analysis on it in Weeks 4 and 5. It’s what we call a “target variable” in predictive modeling (also called a “response variable”): it’s what we’ll be trying to predict. That is, can we use all of the other values in a row to predict the IsBadBuy value, and how accurate are our predictions? (We won’t be making any predictions in this class, but we will explore how other variables affect the IsBadBuy variable.)
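To make "explore how other variables affect IsBadBuy" concrete, here is a minimal Python sketch that computes the share of bad buys within each group of another variable. The function name `bad_buy_rate_by` and the `Make` column are assumptions for illustration, and the sample rows are made up, not taken from the dataset.

```python
from collections import defaultdict


def bad_buy_rate_by(rows, key):
    """Return {group value: fraction of rows in that group with IsBadBuy == 1}."""
    totals = defaultdict(int)
    kicked = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        kicked[row[key]] += row["IsBadBuy"]  # IsBadBuy is 0 or 1
    return {group: kicked[group] / totals[group] for group in totals}


# Made-up rows for illustration only.
sample = [
    {"Make": "FORD", "IsBadBuy": 1},
    {"Make": "FORD", "IsBadBuy": 0},
    {"Make": "HONDA", "IsBadBuy": 0},
    {"Make": "HONDA", "IsBadBuy": 0},
]

rates = bad_buy_rate_by(sample, "Make")
```

Comparing these rates across groups is descriptive only: it shows where bad buys are more common, not why, and groups with very few rows can produce misleading rates.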