Python

Basketball stats tool. Cleaning data.

I'm super lost on how to even start cleaning the data. I'm stuck at line 11: "# clean the player data without changing the original data".

from constants import PLAYERS, TEAMS


def clean_data(players):
    # clean the player data without changing the original data
    # Data to be cleaned: Height: This should be saved as an integer. Experience: This should be saved as a boolean value (True or False)

    # save it to a new collection - build a new collection with what you have learned up to this point.
    players_new_collection = []  # stub: nothing is cleaned yet
    return players_new_collection


if __name__ == "__main__":
    clean_data(PLAYERS)

    # Create a balance_teams function:
        # Now that the player data has been cleaned, balance the players across the three teams: Panthers, Bandits and Warriors. Make sure the teams have the same number of total players on them when your team balancing function has finished.
        # HINT: To find out how many players should be on each team, divide the length of players by the number of teams. Ex: num_players_team = len(PLAYERS) / len(TEAMS)

    # Console readability matters: When the menu or stats display to the console, it should display in a nice readable format. Use extra spaces or line breaks ('\n') to break up lines if needed. For example, '\nThis will start on a newline.'

    # Displaying the stats: When displaying the selected teams' stats to the screen you will want to include:
    # Team's name as a string
    # Total players on that team as an integer
    # The player names as strings separated by commas

1 Answer

Jennifer Nordell
STAFF
Treehouse Teacher

Hi there, matthew mahoney! It's easy to feel lost on this, so don't worry. I have a feeling that many people read something overly complex into what it means to "clean the data". Really, we just mean changing the data so that it's in a consistent format.

One thing you will probably want to do here is make a copy of the original data. Using copy would be fine for a structure containing only simple values like strings or integers, but for a list of dictionaries a plain copy still shares the inner dictionaries with the original, so you will want to make a "deepcopy". Take a look at the Python documentation for deepcopy.

Let's take a simple example.

from copy import deepcopy

students = [
    {
        "first_name": "Matthew",
        "treehouse_student": "Yes"
    },
    {
        "first_name": "Jennifer",
        "treehouse_student": True
    }
]

students_copy = deepcopy(students)

original_matthew = students[0]["treehouse_student"]
print("=" * 10)
print(f"Original value: {original_matthew}")

for student in students_copy:  # loop through the copy
    if student["treehouse_student"] == "Yes":  # if that student has treehouse_student set to the string "Yes"
        student["treehouse_student"] = True    # change it to True

matthew = students_copy[0]["treehouse_student"]
print(f"New value: {matthew}")

This starts with a couple of students in a list. Each "student" is a dict. We make a deepcopy, loop through the copy, and change the value at a given key, while the original list stays untouched.
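If it helps to see why deepcopy rather than copy matters for a list of dicts, here is a small sketch that compares the two. The variable names (shallow, deep, after_deep_edit, after_shallow_edit) are just made up for this illustration.

```python
from copy import copy, deepcopy

students = [{"first_name": "Matthew", "treehouse_student": "Yes"}]

shallow = copy(students)    # new outer list, but it holds the SAME inner dicts
deep = deepcopy(students)   # new outer list AND new inner dicts

# Editing the deep copy leaves the original alone.
deep[0]["treehouse_student"] = True
after_deep_edit = students[0]["treehouse_student"]
print(f"Original after editing the deep copy: {after_deep_edit}")

# Editing the shallow copy reaches through to the original dict.
shallow[0]["treehouse_student"] = True
after_shallow_edit = students[0]["treehouse_student"]
print(f"Original after editing the shallow copy: {after_shallow_edit}")
```

The first print should still show the original string, while the second shows that the original was changed through the shallow copy, which is exactly what "without changing the original data" is meant to rule out.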

But here's the point of the whole thing. Imagine that the players listed in the players file weren't just 18 kids, but several hundred. Also imagine that we had sent out spreadsheets to many, many schools across the counties. We told them to fill in the "experience" column with True and False, but when we got them back, only a little over half the schools had actually done that. The rest had some combination of "Y" and "N" or "YES" and "NO" or "yes" and "no". The point is to make the data consistent across the column even when the information may have come from multiple sources.
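To make that concrete, here is a rough sketch of what a cleaning function could look like. Treat it as one possible approach under some assumptions, not the project's required solution: the sample records, the "42 inches" height format, and the TRUTHY set are all made up for illustration, and the real PLAYERS data in constants.py may look different.

```python
from copy import deepcopy

# Hypothetical sample records; the real PLAYERS data may be shaped differently.
raw_players = [
    {"name": "Player A", "experience": "YES", "height": "42 inches"},
    {"name": "Player B", "experience": "n", "height": "45 inches"},
    {"name": "Player C", "experience": True, "height": "40 inches"},
]

# Assumed set of spellings that should count as "has experience".
TRUTHY = {"y", "yes", "true"}


def clean_data(players):
    """Return a cleaned copy of players; the input list is left unchanged."""
    cleaned = deepcopy(players)
    for player in cleaned:
        # Normalize any yes/no spelling (or an existing bool) to a boolean.
        player["experience"] = str(player["experience"]).strip().lower() in TRUTHY
        # Assumes height looks like "42 inches": keep only the leading number.
        player["height"] = int(str(player["height"]).split()[0])
    return cleaned


cleaned_players = clean_data(raw_players)
print(cleaned_players[1])  # cleaned record
print(raw_players[1])      # original record, untouched
```

One thing to keep in mind with this sketch: any value that isn't in TRUTHY quietly becomes False, so with real data from many sources you might prefer to raise an error or log anything unexpected instead.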

Hope this helps! ✨