Lesson Plan: Chapter 10
Connecting to CSTA Standards
|6-8||Data & Analysis||Collection, Visualization & Transformation||2-DA-08||Testing and Refining Computational Artifacts: 6.3|
Collect data using computational tools and transform the data to make it more useful and reliable.
As students continue to build on their ability to organize and present data visually to support a claim, they will need to understand when and how to transform data for this purpose. Students should transform data to remove errors, highlight or expose relationships, and/or make it easier for computers to process. The cleaning of data is an important transformation for ensuring consistent format and reducing noise and errors (e.g., removing irrelevant responses in a survey). An example of a transformation that highlights a relationship is representing males and females as percentages of a whole instead of as individual counts.
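The percentage transformation mentioned above can be sketched in a few lines of pandas. This is only an illustration: the counts and the column names `group`, `count`, and `percent` are invented for the example and do not come from any real survey.

```python
import pandas as pd

# Hypothetical survey counts, invented for illustration only.
counts = pd.DataFrame({"group": ["female", "male"], "count": [30, 20]})

# Transform raw counts into percentages of the whole,
# which makes the relationship between the groups easier to see.
total = counts["count"].sum()
counts["percent"] = counts["count"] * 100 / total

print(counts)
```

Showing the `percent` column next to the original `count` column lets students compare the two representations side by side.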
In this chapter, we take a sometimes tedious task - cleaning data - and turn it into a moment to learn about the very nature of data. Students can use Python code with the pandas library to comb through their dataset for inconsistencies. Students ages 13 and over can use Kaggle.com, a site where they can experiment with online editable notebooks, the tools of the data scientist, to work with a dataset about stones. Kaggle is an excellent learning environment with challenges and educational opportunities.
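As a rough sketch of what "combing through a dataset for inconsistencies" might look like in a notebook, the example below checks a small table for missing values, repeated rows, and inconsistent spellings. The sample rows and the column names `name` and `hardness` are assumptions made for illustration; the actual Kaggle dataset may be structured differently.

```python
import pandas as pd

# Hypothetical sample rows; the real Kaggle dataset's columns may differ.
stones = pd.DataFrame({
    "name": ["Quartz", "quartz ", "Opal", "Opal", None],
    "hardness": [7.0, 7.0, 5.5, 5.5, 6.0],
})

# Count missing values in each column.
missing_per_column = stones.isna().sum()

# Count rows that exactly repeat an earlier row.
duplicate_rows = stones.duplicated().sum()

# List the distinct spellings to spot inconsistent capitalization or spacing.
name_variants = stones["name"].dropna().unique().tolist()

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print("Name variants:", name_variants)
```

Each check only reports a problem; deciding what to do about it is left to the student.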
|Lower level students||Higher level students|
|Can create a notebook in Kaggle and follow the project recipe to clean the given dataset||Can experiment with other datasets in Kaggle to try new data wrangling tasks|
Cleaning data is one of the first tasks of the data scientist. Almost all raw data must be cleaned and shaped to reveal its 'secrets', or the meanings behind the relationships between various data points. Any dataset on Kaggle likely needs to be cleaned, so encourage students to experiment with both heavy-handed and light-touch cleaning of a given dataset.
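One possible way to contrast light-touch and heavy-handed cleaning is sketched below. The data and the column names `name` and `weight_g` are hypothetical, and the split between "light" and "heavy" is just one reasonable interpretation, not a fixed definition.

```python
import pandas as pd

# Hypothetical raw data with stray spaces, mixed capitalization, and gaps.
raw = pd.DataFrame({
    "name": [" Quartz", "quartz", "Opal", None],
    "weight_g": [12.0, None, 8.5, 3.0],
})

# Light touch: tidy the text but keep every row, gaps included.
light = raw.copy()
light["name"] = light["name"].str.strip().str.title()

# Heavy hand: additionally drop any row that has a missing value.
heavy = light.dropna().reset_index(drop=True)

print(len(light), "rows after light-touch cleaning")
print(len(heavy), "rows after heavy-handed cleaning")
```

Comparing the two results opens a discussion about the trade-off: the heavier approach yields a tidier table but discards rows that may still hold useful information.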
- Data science: A field of inquiry that uses various methods to extract insights from data
- Data visualization: The process of creating visual representations of data such as charts or graphs
- pandas: The 'Python Data Analysis Library', a tool used to analyze data using the Python language
Demonstrate knowledge of how to visually represent data
|Research the various methods used to make 'pictures of data' or display data visualizations in a software program such as a notebook or Excel||Write a summary of the various methods that a data scientist uses to visualize data, and the lessons that can be extrapolated from these visualizations|
Why would you need to clean your data?
a. Flawed data guarantees flawed results
b. Decisions made by analyzing data should be based on high-quality datasets
c. Both of these
What types of errors might need to be removed from data you collect?
a. Missing data
b. Obsolete data
c. Foreign data
What is data visualization used to do?
a. Create memes
b. Build GIFs and JPEGs
c. Learn more about your data by looking at it as a visual artifact
- Beginner's Guide to Kaggle
- Data Visualization Lessons
- Examples of beautiful DataVis
- Data Visualization lessons on LinkedIn Learning
The Kaggle Notebook
Assignment and Rubric: Data Tidy-Up
Making data usable for analysis is a critical part of data science. Practice transforming data to remove errors, highlight or expose relationships, and make it more useful for analysis. Go to Kaggle and pick a dataset that you’d like to tidy up, as you practiced in this lesson. Pick four things to clean in your data and build a notebook on Kaggle that shows your work as you clean the data.
|The student picks a dataset from Kaggle and cleans it, explaining in a Kaggle notebook the process they follow and the lessons they learn about the dataset||The student creates a notebook and only partially cleans and explains the data||The student cleans data without explaining lessons learned from the process|