Data Analysis Programming


Python Pandas Reference Resources

Consult these learning and reference resources as needed. Of course you may also search the wider web, but the pandas documentation is excellent and worth searching first.

Python Pandas Udemy Course

This is still the best all-around introduction to pandas attributes and methods, with great explanations and illustrative examples. Now you can also start using it topically: browse the video titles to find what you need.

Reference Resources from the Pandas Documentation

These reference resources are well organized, with convenient options for searching and scanning relevant attributes and methods. The examples are brief, but they provide a good reference and reminder as you become more confident.

3.1 Project: NBA Data Prep plus Calculated Fields

Overview

We have been tasked with preparing the NBA data set for analysis. Specifically, we have been asked to create two new columns as calculated fields: Height_Inches and BMI (Body Mass Index).

Required Learning Resources

Data Analysis with Pandas and Python – Udemy Course

https://www.udemy.com/course/data-analysis-with-pandas/learn

Required Videos:

  • 60. Drop DataFrame Rows with Null Values
  • 62. Convert DataFrame Column Types with the astype Method
  • 97. More DataFrame String Methods — strip, lstrip, and rstrip
  • 99. Split Strings by Characters with the str.split Method
  • Other videos as needed

Other Sources to Consult as Needed

Consult the pandas documentation reference resources described above whenever you need to look up an attribute or method.

Project Guide

Download the project guide:

3.1 NBA Data Prep plus Calculated Fields.docx

Video Tips

Below is a series of short videos that help with specific parts of this project. As always, feel free to do as much as you can on your own, and then consult my videos as needed to help you over a hump.

If you’re still stuck on something after watching my videos (that can happen!), please share your case in our project discussion board.

Creating a Table of Contents for your Jupyter Notebook

I’d like you to start using markdown cells to provide useful detail regarding your projects. Here’s a quick tip on setting up a table of contents at the top of a notebook — something I’d like you to do from now on.

https://www.loom.com/share/d1983a7e2b394b93ac41ba0a7bf5ae7b

Reviewing Data with .info() and .describe()

In this 7-minute video, I highlight a few things I look for — data types, null values, summary statistics — when loading and reviewing data. Toward the end of the video, I discuss the custom code snippet I use to make the statistical summary from .describe() easier and quicker to read.

https://www.loom.com/share/22c6ed0059634cd4b33f5b7019d00190
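
For quick reference, here is a minimal sketch of that kind of review. The sample rows and column names are made up for illustration, and the transpose-and-round step is just one possible way to make .describe() easier to scan; it is not necessarily the snippet shown in the video.

```python
import pandas as pd

# Hypothetical sample rows; in the project you would load the NBA data instead.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", None],
    "Number": [0.0, 99.0, 30.0, None],
    "Weight": [180.0, 235.0, 205.0, None],
})

# Data types and non-null counts for every column.
df.info()

# Summary statistics for the numeric columns.
# Transposing gives one row per column; rounding trims long decimals.
summary = df.describe().T.round(2)
print(summary)
```

Things to look for: columns whose non-null count is lower than the row count, and numeric-looking columns stored with an unexpected data type.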

Dropping Useless Records and Reviewing the Results

In this very quick video, I briefly review the .dropna() parameters we needed for this step, and I review the updated data set.

https://www.loom.com/share/b693c85aebe646df9d4f93e9c989babe
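
As a reminder of how the .dropna() parameters behave, here is a small sketch on made-up rows. Which parameters this step actually needs depends on the data and the project guide, so treat the choices below as assumptions.

```python
import pandas as pd

# Hypothetical sample rows; the last row is entirely null.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", None],
    "College": ["State U", None, None],
    "Weight": [180.0, 235.0, None],
})

# how="all" drops only rows where every value is null.
cleaned = df.dropna(how="all")

# subset=[...] drops rows that are null in the listed columns only.
with_college = df.dropna(subset=["College"])

print(len(df), len(cleaned), len(with_college))
```

Comparing the row counts before and after is a quick way to confirm that you dropped only what you intended.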

Converting Jersey Numbers to Strings

There’s a little challenge in this process. I walk through an efficient way to tackle it.

https://www.loom.com/share/7b5fd95c6e7e4a00b00a98e565d261d8

Part 2 of the above: Replacing the Current Number Column with the New Data Type

https://www.loom.com/share/6e76fbaa02cd44318549d5413fd6942b
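
Here is a hedged sketch of one way this conversion can go. I’m assuming the jersey numbers were loaded as floats (which pandas does when a numeric column contains nulls) and that the null rows have already been dropped; the sample values are made up.

```python
import pandas as pd

# Hypothetical sample rows; "Number" is float, as it would be after loading
# a numeric column that originally contained null values.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Number": [0.0, 99.0, 30.0],
})

# Converting straight to str would keep the decimal point (e.g. "99.0"),
# so convert to int first, then to str.
# Note: astype(int) raises an error if any nulls remain in the column.
df["Number"] = df["Number"].astype(int).astype(str)

print(df["Number"].tolist())
```

Assigning the result back to df["Number"] replaces the existing column with the new data type.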

Create the Height_Inches Column

This is one of the thornier problems. I’ll walk you through it. Be sure to consult and learn about the .str.split() method!

https://www.loom.com/share/33a5140a47ef448f8a8e29c7ce890b5c
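
A minimal sketch of the idea, assuming heights are stored as feet-inches text such as "6-2" in a column named Height. Both the format and the column name are assumptions, so check your own data first.

```python
import pandas as pd

# Hypothetical sample rows with heights as "feet-inches" strings.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Height": ["6-2", "7-0", "5-11"],
})

# expand=True returns a DataFrame: column 0 holds feet, column 1 holds inches.
parts = df["Height"].str.split("-", expand=True).astype(int)

# Total height in inches = feet * 12 + inches.
df["Height_Inches"] = parts[0] * 12 + parts[1]

print(df)
```

The pieces come back from .str.split() as strings, which is why they are converted to integers before doing the arithmetic.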

Create the BMI Column

After what we just did, this step is pretty easy! We’ll use the Weight and Height_Inches columns to create a new BMI column.

https://www.loom.com/share/5f4f6b09682f4437ac4355dd24ab894d
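
Here is a small sketch of the calculation. It assumes Weight is in pounds and uses the standard imperial BMI formula, 703 × weight (lb) ÷ height (in)². If the project guide gives a different formula or units, follow the guide. The sample values are made up.

```python
import pandas as pd

# Hypothetical sample rows; Weight assumed to be in pounds.
df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Weight": [180.0, 235.0],
    "Height_Inches": [74, 84],
})

# Imperial BMI: 703 * pounds / inches squared, rounded to one decimal place.
df["BMI"] = (703 * df["Weight"] / df["Height_Inches"] ** 2).round(1)

print(df)
```

Because pandas applies the arithmetic to whole columns at once, no loop is needed.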

Organize Columns

Here’s a handy way to reorganize the columns in your dataframe.

https://www.loom.com/share/fa0a266987f34e8287422bbe9c24b8fe
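
One common way to do this (not necessarily the one in the video) is to select the columns with a list in the order you want. The column names and values below are illustrative.

```python
import pandas as pd

# Hypothetical sample row; new columns are appended at the far right by default.
df = pd.DataFrame({
    "Name": ["Player A"],
    "Height": ["6-2"],
    "Weight": [180.0],
    "Height_Inches": [74],
    "BMI": [23.1],
})

# List the columns in the order you want, then select with that list.
new_order = ["Name", "Height", "Height_Inches", "Weight", "BMI"]
df = df[new_order]

print(df.columns.tolist())
```

Any column left out of the list is dropped from the result, so double-check that the list is complete.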

You handle the rest!

I’m confident you can do it. Be sure to consult your learning and reference resources as needed. (And as always, well-written web searches are your friend.)

3.1 Discussion

Use this discussion board to share tips, recommendations, and discussion regarding the assigned project(s).

To foster dialogue, I’m requiring everyone to post something — whether a question, a reflection, or feedback to other users. Here are recommended ideas to prompt your contributions:

  • Have you gotten stuck? You’re undoubtedly not alone. Please describe your problem and include screenshots if relevant.
  • Can you provide help in response to another student’s question? If so, please do!
  • Did you discover any helpful tips in this process? Please share them. Provide relevant screenshots, links, etc.

3.2 Notes: Descriptive Statistics Concepts

In our upcoming projects we’ll use the pandas .describe() method to see a statistical summary of the numerical columns in our data sets.

To prepare for that, I’d like you to load up on — or refresh your memory regarding — a few key statistical concepts. The insights we gain from these summary statistics can be important and helpful!
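
As a small illustration of why these concepts matter, here is a sketch with made-up numbers that compares the mean and the median when one value is much larger than the rest. The Series name and values are hypothetical.

```python
import pandas as pd

# Hypothetical salaries in millions; one value is far larger than the others.
salaries = pd.Series([1.0, 2.0, 3.0, 4.0, 40.0], name="Salary_Millions")

stats = salaries.describe()
print(stats)

# The "50%" row of .describe() is the median.
print("mean:", stats["mean"], "median:", stats["50%"])
```

The single large value pulls the mean well above the median, which is the kind of insight a statistical summary can surface quickly.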

Project Guide

Download and follow the project guide:

Descriptive Statistics Concepts.docx
