This analysis uses a list of the recognized moons of the planets and of the largest potential dwarf planets of the Solar System. The data is available at https://en.wikipedia.org/wiki/List_of_natural_satellites#List and is processed and analyzed in Python, covering the data processing activities from start to finish.
Key tasks performed:
- Imported the data from the list on the website into a DataFrame and printed the first ten rows.
- Performed the following data cleaning/validation tasks:
  a. Renamed the columns with clear names, so that someone unfamiliar with the data set would understand the meaning of each column.
  b. Reordered the columns in a way that makes the DataFrame easy to read and understand.
  c. Set an index for the DataFrame using an appropriate column or set of columns, such that each observation in the data set can be uniquely identified.
  d. Chose three columns to clean. Some ideas were converting the column's data type, removing bad characters, etc.
- Created a research question that can be answered using the data. Then, performed the appropriate analysis needed to address the question.
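
The tasks above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the raw column names, the sample rows (made-up moons and values that imitate footnote markers and uncertainty suffixes), the cleaned column names, the helper `clean_moons`, and the example research question are all assumptions. The real import step would likely use `pandas.read_html`, which needs network access and an HTML parser, so it appears only as a comment.

```python
import pandas as pd

# Import step (not run here; needs network access and an HTML parser such as lxml):
#   tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_natural_satellites")
#   raw = max(tables, key=len)  # assumption: the moon list is the longest table
#   print(raw.head(10))

# Hypothetical stand-in for the raw table; names and values are made up.
raw = pd.DataFrame({
    "Name": ["Alpha[a]", "Beta", "Gamma[2]"],
    "Mean radius (km)": ["1,200.5", "11.3±0.2", "6"],
    "Discovery year": ["Prehistoric", "1877", "2003[3]"],
    "Parent": ["Planet X", "Planet X", "Planet Y"],
})


def clean_moons(df: pd.DataFrame) -> pd.DataFrame:
    """Rename, reorder, clean three columns, and index the moon table."""
    # a. Rename columns to self-explanatory names (assumed naming scheme).
    out = df.rename(columns={
        "Name": "moon_name",
        "Parent": "parent_body",
        "Mean radius (km)": "mean_radius_km",
        "Discovery year": "discovery_year",
    })

    # d. Clean three columns.
    # Strip footnote markers such as "[a]" or "[2]" from the moon name.
    out["moon_name"] = (
        out["moon_name"].str.replace(r"\[.*?\]", "", regex=True).str.strip()
    )
    # Drop thousands separators, keep the first number, convert to float.
    radius = (
        out["mean_radius_km"]
        .str.replace(",", "", regex=False)
        .str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    )
    out["mean_radius_km"] = pd.to_numeric(radius)
    # Keep a four-digit year if present; anything else becomes missing.
    year = out["discovery_year"].str.extract(r"(\d{4})", expand=False)
    out["discovery_year"] = pd.to_numeric(year).astype("Int64")

    # b. Reorder columns, then c. index by (parent body, moon name).
    out = out[["parent_body", "moon_name", "mean_radius_km", "discovery_year"]]
    return out.set_index(["parent_body", "moon_name"])


moons = clean_moons(raw)

# Example research question (an assumption): which parent body has the
# largest moons on average?
mean_radius_by_parent = moons.groupby(level="parent_body")["mean_radius_km"].mean()
print(moons)
print(mean_radius_by_parent.sort_values(ascending=False))
```

Indexing on the pair of parent body and moon name is one reasonable choice here, since a moon name alone is not guaranteed to be unique in every version of the table.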