Lesson Plan: Chapter 10

Connecting to CSTA Standards

Grades: 6-8
Concept: Data & Analysis
Subconcept: Collection Visualization & Transformation
Standard Number: 2-DA-08
Practice: Testing and Refining Computational Artifacts: 6.3

Collect data using computational tools and transform the data to make it more useful and reliable.

As students continue to build on their ability to organize and present data visually to support a claim, they will need to understand when and how to transform data for this purpose. Students should transform data to remove errors, highlight or expose relationships, and/or make it easier for computers to process. The cleaning of data is an important transformation for ensuring consistent format and reducing noise and errors (e.g., removing irrelevant responses in a survey). An example of a transformation that highlights a relationship is representing males and females as percentages of a whole instead of as individual counts.
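The two transformations described above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the survey question, the column name `favorite_season`, and all the responses are invented for the example and do not come from the lesson's dataset.

```python
import pandas as pd

# Hypothetical survey responses; the column name and values are made up.
responses = pd.DataFrame(
    {
        "favorite_season": [
            "summer", "Winter", "summer ", "pizza", None, "winter", "summer",
        ]
    }
)

# Cleaning: make the format consistent (trim spaces, use lowercase).
cleaned = responses["favorite_season"].str.strip().str.lower()

# Cleaning: keep only valid answers, which drops "pizza" and the missing value.
valid_seasons = ["spring", "summer", "fall", "winter"]
cleaned = cleaned[cleaned.isin(valid_seasons)]

# Transformation: show each answer as a percentage of the whole, not a count.
percentages = (cleaned.value_counts(normalize=True) * 100).round(1)
print(percentages)
```

Students can compare the percentages with the raw counts from `cleaned.value_counts()` and discuss which view makes the relationship easier to see.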


Learning Outcomes/Goals

In this chapter, we take a sometimes tedious task, cleaning data, and turn it into a moment to learn about the very nature of data. Students can use Python code with the pandas library to comb through their dataset for inconsistencies. Students ages 13 and over can use Kaggle.com, a site that offers online editable notebooks, one of the data scientist's everyday tools, to work with a dataset about stones. Kaggle is an excellent learning environment with challenges and educational opportunities.
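As a rough sketch of what "combing through a dataset for inconsistencies" can look like in a notebook, the checks below look for missing values, duplicate rows, mixed spellings, and impossible measurements. The table and its column names (`name`, `type`, `weight_g`) are hypothetical stand-ins, not the columns of the dataset used in the lesson.

```python
import pandas as pd

# Hypothetical rock records; these column names are NOT from the lesson's dataset.
rocks = pd.DataFrame(
    {
        "name": ["Rock A", "Rock B", "Rock B", "Rock C", "Rock D"],
        "type": ["igneous", "Igneous", "Igneous", "sedimentary", None],
        "weight_g": [120.5, 98.0, 98.0, -4.0, 310.2],
    }
)

# How many values are missing in each column?
missing_per_column = rocks.isna().sum()

# How many rows are exact copies of an earlier row?
duplicate_rows = int(rocks.duplicated().sum())

# Which spellings appear in a text column? Mixed capitalization shows up here.
type_spellings = sorted(rocks["type"].dropna().unique())

# Which measurements are impossible, such as a negative weight?
impossible_weights = rocks[rocks["weight_g"] < 0]

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print("Spellings of type:", type_spellings)
print(impossible_weights)
```

Each check only reports a problem; deciding what to do about it (fix, fill, or remove) is a separate step that students should be able to justify.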


Differentiated Instruction

Lower level students: Can create a notebook in Kaggle and follow the project recipe to clean the given dataset
Higher level students: Can experiment with other datasets in Kaggle to try new data wrangling tasks

Transfer Learning

Cleaning is one of the first tasks of the data scientist: almost all raw data must be cleaned and shaped to reveal its 'secrets', or the meanings behind the relationships between various data points. Any dataset on Kaggle likely needs to be cleaned, so encourage students to experiment with more heavy-handed or light-touch cleaning of a given dataset.
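One way to make the difference between heavy-handed and light-touch cleaning concrete is to treat the same missing values in two ways and compare what is left. The sketch below assumes a small invented table; the column names and the choice of the median as a fill value are illustrative assumptions, not a recommended rule.

```python
import pandas as pd

# Hypothetical measurements with gaps; names and values are invented.
data = pd.DataFrame(
    {
        "sample": ["s1", "s2", "s3", "s4"],
        "color": ["gray", None, "black", "gray"],
        "weight_g": [12.0, 15.5, None, 9.0],
    }
)

# Heavy-handed: throw away every row that has any missing value.
heavy = data.dropna()

# Light-touch: keep every row and fill the gaps instead.
light = data.fillna(
    {"color": "unknown", "weight_g": data["weight_g"].median()}
)

print(len(data), "rows before cleaning")
print(len(heavy), "rows after heavy-handed cleaning")
print(len(light), "rows after light-touch cleaning")
```

The heavy-handed version loses half of the rows, while the light-touch version keeps them all but introduces values that were never measured. Neither is automatically right, which makes the trade-off a useful discussion point.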


Vocabulary

  • Data science: A field of inquiry that uses various methods to extract insights from data
  • Data visualization: The process of creating visual representations of data such as charts or graphs
  • pandas: The 'Python Data Analysis Library', pandas is a tool used to analyze data using the Python language

Assessment

Demonstrate knowledge of how to visually represent data

Formative: Research the various methods used to make 'pictures of data', or data visualizations, in a software program such as a notebook or Excel
Summative: Write a summary of the various methods that a data scientist uses to visualize data, and the lessons that can be extrapolated from these visualizations

Quiz Answers

Why would you need to clean your data?

a. Flawed data guarantees flawed results

b. Decisions made by analyzing data should be based on high-quality datasets

c. Both of these

What types of errors might need to be removed from data you collect?

a. Missing data

b. Obsolete data

c. Foreign data

What is data visualization used to do?

a. Create memes

b. Build GIFs and JPEGs

c. Learn more about your data by looking at it as a visual artifact


More Resources/Materials


Solution Code

The 500 Polar Rocks dataset

The Kaggle Notebook


Assignment and Rubric: Data Tidy-Up

Making data usable for analysis is a critical part of data science. Practice transforming data to remove errors, highlight or expose relationships, and make it more useful for analysis. Go to Kaggle and pick a dataset that you’d like to tidy up, as you practiced in this lesson. Pick four things to clean in your data and build a notebook on Kaggle that shows your work as you clean the data.

Exemplary: The student picks a dataset from Kaggle and cleans it, explaining in a Kaggle notebook the process they follow and the lessons they learn about the dataset
Adequate: The student creates a notebook and only partially cleans and explains the data
Needs Improvement: The student cleans data without explaining lessons learned from the process
