IBM Data Science Capstone — “The Battle of Neighborhoods”

Introduction

Cover image: https://fineartamerica.com/featured/dusk-in-the-city-of-lights-paris-france-sydspics-photography.html
Maya wants to live like a Parisian during her stay, in an arrondissement with plenty of:
  • Cafes and Brasseries
  • Plazas and Gardens
  • Art Museums
  • French Restaurants and Wine Bars

Data

Geo-Coordinate Data: Republic of France Open Platform Public Data
Venue Data: Foursquare API

  1. Map the arrondissements of Paris using geo-location data
  2. Request venue information from Foursquare within the city of Paris
  3. Bind the venues to their respective arrondissements using the results of Step 1
  4. Use K-Means clustering to find groups of arrondissements that share similarities but are not explicitly labeled as similar
  5. Identify which cluster(s) best match the lifestyle outlined above
  6. Explore the narrowed selection of clusters in detail with Python’s Matplotlib library to find the arrondissement(s) with the best fit

Methodology

  • First, we’ll import the Python libraries needed to collect, parse, and analyze the data. Then, we’ll load the Paris arrondissement data from the converted JSON file and plot it on a map.
Import Python Libraries for Data Analysis in a Jupyter Notebook
Convert the JSON file to a CSV file and read it into a Pandas data frame, displaying information for each arrondissement of Paris
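The JSON-to-CSV step can be sketched with only Python’s standard library. This is a minimal sketch, not the notebook’s code: it assumes the file is a GeoJSON FeatureCollection whose features carry the properties `c_ar` (arrondissement number), `l_aroff` (official name) and `geom_x_y` ([latitude, longitude] of the centroid). Those property names are assumptions to check against the real dataset (reference 2), and the inline sample and its coordinates are illustrative only.

```python
import csv
import io
import json

# Tiny inline sample standing in for the downloaded file.
# ASSUMPTION: property names c_ar, l_aroff, geom_x_y; verify against the real dataset.
# Coordinates are approximate and for illustration only.
SAMPLE_GEOJSON = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"c_ar": 4, "l_aroff": "Hôtel-de-Ville", "geom_x_y": [48.8543, 2.3576]}},
  {"type": "Feature",
   "properties": {"c_ar": 1, "l_aroff": "Louvre", "geom_x_y": [48.8626, 2.3363]}}
]}
"""

FIELDS = ["arrondissement", "name", "latitude", "longitude"]


def features_to_rows(geojson_text):
    """Flatten each GeoJSON feature into one row per arrondissement, sorted by number."""
    data = json.loads(geojson_text)
    rows = []
    for feature in data["features"]:
        props = feature["properties"]
        lat, lng = props["geom_x_y"]  # assumed to be [latitude, longitude]
        rows.append({
            "arrondissement": int(props["c_ar"]),
            "name": props["l_aroff"],
            "latitude": float(lat),
            "longitude": float(lng),
        })
    return sorted(rows, key=lambda row: row["arrondissement"])


def rows_to_csv(rows):
    """Serialize the rows as CSV text (the intermediate file that is read back later)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = features_to_rows(SAMPLE_GEOJSON)
csv_text = rows_to_csv(rows)
```

With Pandas installed, `pd.read_csv(io.StringIO(csv_text))` would load the result into a data frame with one row per arrondissement.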
SQL workbook example
Using geo-coordinates and CSV data, create a map of Paris and mark each Arrondissement
  • Using the Foursquare API, we will request venues in Paris and return each venue’s name, location, and category.
API call to request venue information
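A minimal sketch of the request and of parsing its response, assuming the legacy Foursquare v2 `venues/explore` endpoint that the IBM course used at the time. That endpoint is an assumption: Foursquare has since introduced a newer Places API with different authentication, so the URL and response shape may no longer apply. The credentials, version date, radius and limit are placeholders, and the sample payload is hand-made in the v2 response shape rather than real API output.

```python
from urllib.parse import urlencode

# ASSUMPTION: legacy Foursquare v2 "explore" endpoint, as used in the IBM course.
# It may no longer accept requests; check the current Foursquare documentation.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=50, version="20200101"):
    """Build the request URL for venues within `radius` metres of one point."""
    params = {
        "client_id": client_id,          # placeholder credentials
        "client_secret": client_secret,
        "v": version,                    # v2 API version date (placeholder value)
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_explore_response(payload):
    """Return (name, lat, lng, category) tuples from a decoded v2 explore response."""
    venues = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": "Unknown"}]
        venues.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],  # keep only the first listed category
        ))
    return venues


# Offline example with a hand-made payload in the assumed v2 response shape.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Café",
               "location": {"lat": 48.855, "lng": 2.358},
               "categories": [{"name": "Café"}]}},
]}]}}
url = build_explore_url("CLIENT_ID", "CLIENT_SECRET", 48.8543, 2.3576)
venues = parse_explore_response(fake_payload)
```

In a live notebook the payload would come from something like `requests.get(url).json()`, called for example once per arrondissement center; real credentials are best kept out of the notebook itself.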
  • Using a Pandas data frame, we can build the table below, associating each venue with its respective arrondissement.
Code to build data frame with previously called Foursquare data
Data frame header (first 30 rows)
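Binding venues to arrondissements can be sketched as flattening the per-arrondissement search results into one row per venue. This sketch assumes one venue search per arrondissement center (the radius-search pattern from the course labs) rather than a point-in-polygon test; the column names and sample values are hypothetical.

```python
def bind_venues(arrondissements, venues_by_arrondissement):
    """Attach each venue to the arrondissement whose search returned it.

    arrondissements: dicts with "arrondissement", "latitude", "longitude".
    venues_by_arrondissement: maps an arrondissement number to a list of
    (venue_name, venue_lat, venue_lng, category) tuples.
    """
    rows = []
    for arr in arrondissements:
        for name, v_lat, v_lng, category in venues_by_arrondissement.get(arr["arrondissement"], []):
            rows.append({
                "arrondissement": arr["arrondissement"],
                "arr_latitude": arr["latitude"],
                "arr_longitude": arr["longitude"],
                "venue": name,
                "venue_latitude": v_lat,
                "venue_longitude": v_lng,
                "venue_category": category,
            })
    return rows


# Hypothetical sample inputs.
arrondissements = [
    {"arrondissement": 1, "latitude": 48.8626, "longitude": 2.3363},
    {"arrondissement": 4, "latitude": 48.8543, "longitude": 2.3576},
]
venues_by_arrondissement = {
    1: [("Venue A", 48.862, 2.336, "Art Museum")],
    4: [("Venue B", 48.855, 2.358, "Café"), ("Venue C", 48.856, 2.356, "Wine Bar")],
}
venue_rows = bind_venues(arrondissements, venues_by_arrondissement)
```

`pd.DataFrame(venue_rows)` would turn this into a one-row-per-venue data frame. Because each search covers a radius around a center, a venue near a border can be returned for two neighboring arrondissements; the sketch keeps such duplicates rather than guessing which one is right.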
  • Let’s pause here for a quick check of how many venues were returned for each arrondissement.
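This check is a count of venue rows per arrondissement. It matters because an arrondissement with only a handful of venues may produce noisy category frequencies that distort the clustering later. A minimal sketch over hypothetical sample rows; in Pandas, `venues_df.groupby("arrondissement").size()` (with `venues_df` a hypothetical name for the venue data frame) would give the same numbers.

```python
from collections import Counter

# Hypothetical sample rows in the one-row-per-venue shape.
venue_rows = [
    {"arrondissement": 1, "venue_category": "Art Museum"},
    {"arrondissement": 4, "venue_category": "Cafe"},
    {"arrondissement": 4, "venue_category": "Wine Bar"},
]


def venues_per_arrondissement(rows):
    """Count how many venues the search returned for each arrondissement."""
    return Counter(row["arrondissement"] for row in rows)


counts = venues_per_arrondissement(venue_rows)
for arrondissement, n in sorted(counts.items()):
    print(f"Arrondissement {arrondissement}: {n} venues")
```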
  • We will create a data frame that shows the top 10 most common venues in each arrondissement:
Data frame code
Resulting data frame
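A common way to build this table, and presumably what the notebook did with Pandas, is to one-hot encode the venue categories, average them per arrondissement, and read off the highest values. Averaging one-hot columns is the same as dividing each category’s count by the arrondissement’s total, which this standard-library sketch computes directly. The sample rows are hypothetical, and ties are broken alphabetically, a choice the original may not share.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Share of each venue category among an arrondissement's venues."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["arrondissement"]][row["venue_category"]] += 1
    result = {}
    for arr, counter in counts.items():
        total = sum(counter.values())
        # Equivalent to the mean of the one-hot category columns.
        result[arr] = {cat: n / total for cat, n in counter.items()}
    return result


def top_venues(freqs, n=10):
    """The n most frequent categories per arrondissement; ties broken alphabetically."""
    return {
        arr: [cat for cat, _ in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for arr, shares in freqs.items()
    }


# Hypothetical sample: six venues in arrondissement 4.
sample_rows = (
    [{"arrondissement": 4, "venue_category": "Café"}] * 3
    + [{"arrondissement": 4, "venue_category": "French Restaurant"}] * 2
    + [{"arrondissement": 4, "venue_category": "Art Museum"}]
)
freqs = category_frequencies(sample_rows)
top_two = top_venues(freqs, n=2)
```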
  • Based on the information collected and parsed for Paris and its venues, we have sufficient data to build our model. First, we’ll use K-Means clustering to group arrondissements by similar venue categories and check whether it surfaces a couple of strong candidates for our criteria. We’ll then present our observations and findings, using Python’s Matplotlib library to compare the clustered arrondissements in detail. From this, we will make a recommendation for Maya’s stay.
  • We’ll code and execute a K-Means Clustering model below:
K-Means clustering code and the resulting data frame header
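Given the libraries listed in the references, the notebook presumably used scikit-learn’s `KMeans` (for example `KMeans(n_clusters=k, random_state=0).fit(X)` followed by `.labels_`). The sketch below instead spells out Lloyd’s algorithm in plain Python on per-arrondissement category-frequency vectors. It seeds the centroids with the first k points so results are repeatable, whereas scikit-learn uses k-means++ seeding by default; the sample frequencies and k are hypothetical, and the k used in the original notebook is not stated here.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def to_vectors(freqs):
    """Turn {arrondissement: {category: share}} into aligned feature vectors."""
    categories = sorted({cat for shares in freqs.values() for cat in shares})
    arrs = sorted(freqs)
    vectors = [[freqs[arr].get(cat, 0.0) for cat in categories] for arr in arrs]
    return arrs, categories, vectors


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c, p=p: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical frequency profiles for four arrondissements.
sample_freqs = {
    1: {"Art Museum": 0.6, "Café": 0.4},
    2: {"Art Museum": 0.5, "Café": 0.5},
    3: {"Wine Bar": 0.7, "Café": 0.3},
    4: {"Wine Bar": 0.6, "Café": 0.4},
}
arrs, categories, vectors = to_vectors(sample_freqs)
labels, centroids = kmeans(vectors, k=2)
clusters = dict(zip(arrs, labels))
```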
  • Now, we’ll map our clusters and color-code them to visualize similar arrondissements and their locations throughout Paris.
Resulting clusters mapped for visualization

Results

Exploring the results, we can see that Cluster 4 contains the most promising arrondissements, those with a high frequency of the following venue types:

  • Cafes and Brasseries
  • Plazas and Gardens
  • Art Museums
  • French Restaurants and Wine Bars
  • Cocktail Bars and Bakeries
Cluster 4 detail
  • Below, we clean the data by removing details and venues that are of less importance and interest to Maya.
Our new data frame shows the results most pertinent to meeting our client’s expectations
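The cleaning step can be sketched as a filter over the frequency table: keep only the arrondissements in the chosen cluster and only the venue categories Maya cares about. The labels in `CATEGORIES_OF_INTEREST` are assumed Foursquare names for the venue types listed above, and the sample frequencies and cluster members are hypothetical.

```python
# ASSUMED category labels for the venue types Maya is interested in.
CATEGORIES_OF_INTEREST = {
    "Café", "Brasserie", "Plaza", "Garden", "Art Museum",
    "French Restaurant", "Wine Bar", "Cocktail Bar", "Bakery",
}


def keep_relevant(freqs, arrondissements, categories=CATEGORIES_OF_INTEREST):
    """Keep only the chosen arrondissements and the categories of interest."""
    return {
        arr: {cat: share for cat, share in freqs[arr].items() if cat in categories}
        for arr in arrondissements
        if arr in freqs
    }


# Hypothetical frequency table and cluster membership.
sample_freqs = {
    3: {"Café": 0.3, "Wine Bar": 0.5, "Hotel": 0.2},
    4: {"Café": 0.4, "Clothing Store": 0.6},
    9: {"Café": 1.0},
}
cluster_members = [3, 4]
relevant = keep_relevant(sample_freqs, cluster_members)
```

The shares are deliberately not renormalized after filtering, so each value still reads as a fraction of all venues in that arrondissement and stays comparable across arrondissements.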
  • Next, we’ll create a visual to evaluate the arrondissements in the cluster with more detail via a stacked bar chart:
Stacked bar chart showing the frequency of the specified venues in each arrondissement previously identified in Cluster 4.
  • Arrondissement 4, the “Hôtel-de-Ville” neighborhood of Paris, appears to offer the most of what Maya wants in order to live like a Parisian on her trip. Let’s see how many venues of each type the data shows in this arrondissement.
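Counting the venues of each type in a single arrondissement is a filtered tally. A minimal sketch with hypothetical rows; in Pandas, something like `venues_df[venues_df["arrondissement"] == 4]["venue_category"].value_counts()` (with `venues_df` a hypothetical data frame name) would do the same.

```python
from collections import Counter


def venue_counts(rows, arrondissement, categories=None):
    """Count venues per category in one arrondissement, optionally limited to `categories`."""
    return Counter(
        row["venue_category"]
        for row in rows
        if row["arrondissement"] == arrondissement
        and (categories is None or row["venue_category"] in categories)
    )


# Hypothetical sample rows.
sample_rows = [
    {"arrondissement": 4, "venue_category": "Café"},
    {"arrondissement": 4, "venue_category": "Café"},
    {"arrondissement": 4, "venue_category": "Wine Bar"},
    {"arrondissement": 4, "venue_category": "Clothing Store"},
    {"arrondissement": 3, "venue_category": "Café"},
]
counts = venue_counts(sample_rows, 4, categories={"Café", "Wine Bar"})
all_counts = venue_counts(sample_rows, 4)
```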

Discussion

As shown above, the 4th Arrondissement has a wide selection of venues in every category from our original search criteria, including the venue types that matter most to Maya. Based on this work, we recommend this arrondissement for her stay.

  • To give a sense of scope, we’ll create a pie chart showing the proportion of each venue type within the neighborhood.
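The proportions behind such a pie chart are each venue type’s count divided by the total. A minimal sketch with hypothetical counts:

```python
def percentages(counts, decimals=1):
    """Convert raw venue counts into percentage shares for pie-chart labels."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, decimals) for cat, n in counts.items()}


# Hypothetical venue counts for one arrondissement.
shares = percentages({"Café": 3, "Art Museum": 1})
```

Matplotlib’s `plt.pie(list(counts.values()), labels=list(counts), autopct="%1.1f%%")` computes these shares itself from raw counts, so explicit percentages are mainly useful for a legend or a summary table. After rounding, they may not sum to exactly 100.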

Conclusion

As a result of our analysis, we identified 7 main neighborhoods, or arrondissements, that are suitable areas for Maya’s accommodation. Digging in further, we found that Arrondissement 4 checks all of her preference boxes and is an ideal area for her dream Paris vacation.

References and Resources

  1. IBM Data Science Professional Certificate (https://www.coursera.org/professional-certificates/ibm-data-science)
  2. French Arrondissements JSON (https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e)
  3. Foursquare Developer Documentation (https://developer.foursquare.com/)
  4. IBM Cloud: IBM Watson Studio and Cloud Object Storage, used for this project (https://www.ibm.com/cloud)
  5. Jupyter Notebooks (https://jupyter.org/)
  6. GitHub Repository (https://github.com/flutieflakes/Coursera_Capstone)
  7. Python programming language (https://www.python.org/)
       • Pandas: library for data analysis
       • NumPy: library for vectorized numerical data handling
       • json: Python module for handling JSON files
       • geopy: library to retrieve location data
       • Requests: library for HTTP requests
       • Matplotlib: Python plotting library
       • scikit-learn (sklearn): Python machine learning library
       • Folium: map rendering library

Assignment Requirements

From IBM Data Science Final Assessment:
