Capstone Project: The Battle of Neighborhoods, a Business in London
1. Introduction
1.1 Background
As one of the largest English cities, London has one of the most ethnically diverse populations in the world. It is considered to be the world's cultural capital, with a diverse range of cultures and people. Even though English is the official language, over 300 languages are spoken in the city. According to the 2011 census, over 36.7% of London residents (2,998,254 people) are foreign-born, giving the city the second-largest immigrant population in the world, behind only New York. Some of the largest ethnic groups in the city include Arabs, Chinese, Bangladeshis, Pakistanis, Indians, and Africans.
Because of this multicultural character, London attracts investors from all over the world and is one of the best cities in the world in which to invest.
1.2 Problem
My friend asked me to help him explore London and choose the best location for his business: he is going to open an Arabic coffee shop in the city. He wants to serve authentic Arabic coffee to Arabs and non-Arabs alike, so that the community gets to know the ancient, original Arab culture, and he believes there is not much competition in this field.
Arabic coffee is a version of brewed coffee made from Coffea arabica beans. Most Arab countries throughout the Middle East have developed distinct methods for brewing and preparing coffee. Cardamom is often added as a spice, but the coffee can alternatively be served plain.
It was agreed that the location of the business should be close to the most popular venues and neighborhoods in London; the presence of Arab and Turkish restaurants nearby would also be an advantage.
In this project, we will scrape London borough, neighborhood, and postcode data from Wikipedia, then use the geocoder and geopy libraries to convert postcode districts into their equivalent latitude and longitude values.
After that, we will get the top 10 venues for each neighborhood using the Foursquare API, and apply the k-means clustering algorithm to group the neighborhoods into clusters. We will also visualize the neighborhoods of London and their emerging clusters using the Folium library.
Finally, we will choose the best location for the business based on the criteria mentioned above.
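Section 4 will query venues through the Foursquare API. Below is a minimal sketch of the classic "explore" call it relies on; CLIENT_ID, CLIENT_SECRET, and the helper get_nearby_venues are placeholders, not part of the original notebook.
import requests
# Hypothetical credentials: replace with your own Foursquare keys
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'  # API version date
def get_nearby_venues(lat, lng, radius=500, limit=10):
    # Query the Foursquare "explore" endpoint around the given coordinates
    url = ('https://api.foursquare.com/v2/venues/explore'
           '?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'
           ).format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, limit)
    items = requests.get(url).json()['response']['groups'][0]['items']
    # Return (venue name, venue category) pairs
    return [(i['venue']['name'], i['venue']['categories'][0]['name']) for i in items]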
2. Data acquisition and cleaning
2.1 Data sources
This project relies on public data: a Wikipedia page that lists the areas of London with their postcodes, and the Foursquare API, which supplies the most common venues for each neighborhood. Together, these data sets let us explore London's neighborhoods and find the best location for the Arabic coffee shop.
2.2 Data cleaning
Data downloaded or scraped from these sources was combined into one table. Many unwanted columns and rows were then removed using the pandas library.
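As a minimal sketch of the kind of cleaning involved (the column names mirror the Wikipedia table; the row values are illustrative):
import pandas as pd
# Illustrative one-row table mirroring the scraped Wikipedia columns
df_example = pd.DataFrame({'Location': ['Abbey Wood'],
                           'London borough': ['Bexley, Greenwich'],
                           'OS grid ref': ['TQ465785']})
df_example = df_example.drop(['OS grid ref'], axis=1)          # drop unwanted columns
df_example = df_example.dropna(how='any')                      # drop incomplete rows
df_example.columns = df_example.columns.str.replace(' ', '_')  # normalise column names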
3. Methodology
3.1 Analysis methodology
To find the most suitable location for this business, we will follow the methodology below:
- Scrape the list of areas (neighborhoods) in London, with postcodes, from Wikipedia.
- Get the latitude and longitude of each neighborhood in London.
- Select the borough with the maximum number of neighborhoods ("Barnet").
- Highlight the top 10 neighborhoods in Barnet by number of venues.
- Get the ten most common venues for each neighborhood in Barnet.
- Cluster all neighborhoods in Barnet based on their location (a sketch of the selection and clustering steps follows this list).
- Select the cluster that contains Middle Eastern restaurants.
- Nominate the neighborhood that is also among the top ten neighborhoods in Barnet.
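A sketch of the borough-selection and clustering steps above, assuming the dataframe df2 with London_borough, Latitude, and Longitude columns built in section 4 (the cluster count of 5 is illustrative):
from sklearn.cluster import KMeans
# Borough with the most neighborhoods (expected to be "Barnet")
top_borough = df2['London_borough'].value_counts().idxmax()
barnet = df2[df2['London_borough'] == top_borough].copy()
# Cluster Barnet's neighborhoods on their coordinates
kmeans = KMeans(n_clusters=5, random_state=0).fit(barnet[['Latitude', 'Longitude']])
barnet['Cluster'] = kmeans.labels_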
4. Exploratory Data Analysis
4.1 Scraping data
To find the most suitable location, we begin by scraping the list of areas (neighborhoods) in London, with their postcodes, from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_areas_of_London
# Let's first import the main libraries
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
# Scraping London data from the Wikipedia source link: (https://en.wikipedia.org/wiki/List_of_areas_of_London)
url = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
s = requests.Session()
response = s.get(url, timeout=10)
response
# If the request is successful, the response status code is 200.
# Parse the response content as HTML
soup = BeautifulSoup(response.content, 'html.parser')
# View the page content in pretty-printed HTML format
pretty_soup = soup.prettify()
# getting Wikipedia page title
soup.title.string
# Find all the tables on the Wikipedia page
all_tables = soup.find_all('table')
# Get the right table to scrape
right_table = soup.find('table', {"class": 'wikitable sortable'})
# Number of columns in the table
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
len(cells)
# number of rows in the table including header
rows = right_table.findAll("tr")
len(rows)
# header attributes of the table
header = [th.text.rstrip() for th in rows[0].find_all('th')]
print(header)
print(len(header))
# Getting the table data
lst_data = []
for row in rows[1:]:
    data = [d.text.rstrip() for d in row.find_all('td')]
    lst_data.append(data)
# Convert the data into pandas dataframe
df = pd.DataFrame(lst_data)
df
# Adding header information to the df
df.columns = header
df.head()
# Renaming column names (assign the result back, otherwise rename has no effect);
# "London borough" becomes "London_borough" after the space replacement below
df = df.rename(columns={"Location": "neighbourhood"})
df = df.drop(['OS grid ref'], axis=1)
df
# Replace spaces in column names with underscores and strip stray whitespace
df.columns = [x.strip().replace(' ', '_') for x in df.columns]
df
# df1 is used below as the cleaned dataframe; define it here as a copy of df
df1 = df.copy()
df1
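As an aside, the same table could have been pulled in one step with pandas; a minimal alternative sketch (requires lxml, and the table index may vary with the page layout):
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_areas_of_London')
df_alt = tables[1]  # pick the wikitable containing the list of areas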
# Installing required packages
!pip -q install geopy
!pip -q install geocoder
# Importing required libraries
import geocoder
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
# Geocoder function
def get_latlng(arcgis_geocoder):
    # Initialize the location (lat. and long.) to None
    lat_lng_coords = None
    # Loop until the ArcGIS geocoder returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, London, United Kingdom'.format(arcgis_geocoder))
        lat_lng_coords = g.latlng
    return lat_lng_coords
# Geocoder ends here # Thanks to Mr. Dayo John
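# Example usage (illustrative): 'NW4' is the Hendon district in Barnet;
# the returned coordinates should be roughly [51.58, -0.22]
print(get_latlng('NW4'))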
df2 = df1.copy()  # work on a copy so df1 stays unchanged
import time
start = time.time()
postal_codes = df2['Postcode_district']
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]
end = time.time()
print("Time of execution: ", end - start, "seconds")
# Adding lat/long information corresponding to each postcode district
df2_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df2['Latitude'] = df2_coordinates['Latitude']
df2['Longitude'] = df2_coordinates['Longitude']
# define an instance of the geocoder
address = 'London, United Kingdom'
geolocator = Nominatim(user_agent="London_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))
df2
# Create a map of London using the latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)
# Add markers to the map
for lat, lng, London_borough, neighbourhood in zip(df2['Latitude'], df2['Longitude'], df2['London_borough'], df2['neighbourhood']):
    label = '{}, {}'.format(neighbourhood, London_borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)
map_london
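The map renders inline in a Jupyter notebook; to keep a standalone copy, Folium can also save it to an HTML file (the file name here is illustrative):
map_london.save('london_neighbourhoods.html')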