Capstone Project_The Battle of Neighborhood business in London

The Battle of Neighborhoods: Arabic Coffee Shop in London

By: Ahmed B Darwish

abdarwish@outlook.com

1. Introduction

1.1 Background

As one of the largest English cities, London has one of the most ethnically diverse populations in the world. It is often considered to be the world’s cultural capital, with a diverse range of cultures and people. Even though English is the official language in London, over 300 languages are spoken in the city. According to the 2011 census, 36.7% of London residents, or 2,998,254 people, are foreign-born, which is the second largest population of immigrants in the world, right behind New York. Some of the largest ethnic groups in the city include Arabs, Chinese, Bangladeshis, Pakistanis, Indians, and Africans.

Because of this multitude of cultures, London attracts investors from all over the world and is regarded as one of the best cities in the world for investment.

1.2 Problem

A friend asked me to help him explore London and choose the best location for his business, as he plans to open an Arabic coffee shop there. He wants to serve authentic Arabic coffee to Arab and non-Arab customers alike, so that the community can get to know traditional Arab culture, and he believes that there is not much competition in this field.

Arabic coffee is a version of the brewed coffee of Coffea arabica beans. Most Arab countries throughout the Middle East have developed distinct methods for brewing and preparing coffee. Cardamom is an often-added spice, but it can alternatively be served plain.

We agreed that the business should be located close to the most popular venues and neighborhoods in London, and that the presence of Arab and Turkish restaurants nearby would be an advantage.

In this project, we will scrape London borough, neighborhood and postcode data from Wikipedia, then use the geocoder and geopy libraries to convert the postcodes into their equivalent latitude and longitude values.

After that, we will get the top 10 venues for each neighborhood using the Foursquare API and use the K-Means clustering algorithm to group the neighborhoods into clusters. We will also visualize the neighborhoods of London and their emerging clusters using the Folium library.
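To make the clustering idea concrete, here is a minimal sketch of K-Means applied to venue-category frequencies. The neighbourhood labels, the three categories, the frequency values and the choice of two clusters are all hypothetical and are not taken from the project data; the sketch only assumes scikit-learn's KMeans, which the notebook imports later.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical share of each venue category per neighbourhood (rows sum to 1).
# Columns: [Coffee Shop, Middle Eastern Restaurant, Park]
neighbourhoods = ["A", "B", "C", "D"]
frequencies = np.array([
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.10, 0.80],
    [0.05, 0.15, 0.80],
])

# Group the neighbourhoods into two clusters; random_state fixes the seed
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(frequencies)

# Map each neighbourhood to its cluster label
labels = dict(zip(neighbourhoods, kmeans.labels_))
print(labels)
```

Neighbourhoods with a similar mix of venues (A with B, C with D) should end up in the same cluster; the numeric cluster ids themselves carry no meaning.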

Finally, we will choose the best location for the business based on the criteria mentioned above.

2. Data acquisition and cleaning

2.1 Data sources

This project relies on public data from Wikipedia, which provides the list of areas of London with their postcodes, and on the Foursquare API, which provides the most common venues for each neighborhood. This data will help us explore London’s neighborhoods and find the best location for the Arabic coffee shop.

2.2 Data cleaning

Data downloaded or scraped from multiple sources were combined into one table. Many unwanted columns and rows were removed using the pandas library.
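As an illustration of one cleaning step, the scraped borough cells carry Wikipedia citation markers such as "[8]" and sometimes list several boroughs in one cell. The sketch below shows one possible way to tidy such values; the helper name clean_borough is hypothetical and this step is not shown in the original notebook.

```python
import re

def clean_borough(raw):
    """Strip citation markers such as '[8]' and split multi-borough cells."""
    # Remove markers like "[7]" together with any whitespace before them
    without_refs = re.sub(r"\s*\[\d+\]", "", raw)
    # A cell may list several boroughs separated by commas
    return [part.strip() for part in without_refs.split(",") if part.strip()]

print(clean_borough("Ealing, Hammersmith and Fulham[8]"))
print(clean_borough("Bexley, Greenwich [7]"))
```

Returning a list keeps every borough of a multi-borough area, so a later step can decide whether to keep only the first one or to duplicate the row per borough.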

3. Methodology

3.1 Analysis methodology

To find the most suitable location for this business, we will follow the steps below:

  1. Scrape the list of areas (neighborhoods) in London with their postcodes from Wikipedia.
  2. Get the latitude and longitude of each neighborhood in London.
  3. Select the borough that has the largest number of neighborhoods (Barnet).
  4. Highlight the top 10 neighborhoods in Barnet based on their number of venues.
  5. Get the ten most common venues for each neighborhood in Barnet.
  6. Cluster all neighborhoods in Barnet based on their location.
  7. Select the cluster that has Middle Eastern restaurants.
  8. Nominate the neighborhood that is among the top ten neighborhoods in Barnet.
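Steps 4 and 5 can be sketched with pandas as follows. The venue records, the neighbourhood labels N1 and N2 and the column names are hypothetical; in the project the records would come from the Foursquare API.

```python
import pandas as pd

# Hypothetical venue records, one row per venue
venues = pd.DataFrame({
    "neighbourhood": ["N1", "N1", "N1", "N1", "N2", "N2", "N2"],
    "category": ["Coffee Shop", "Coffee Shop", "Coffee Shop", "Park",
                 "Pub", "Pub", "Turkish Restaurant"],
})

# Step 4: rank neighbourhoods by their number of venues
venue_counts = venues.groupby("neighbourhood").size().sort_values(ascending=False)

# Step 5: most common venue categories per neighbourhood (top 2 here, top 10 in the project)
top_categories = {
    name: group["category"].value_counts().head(2).index.tolist()
    for name, group in venues.groupby("neighbourhood")
}

print(venue_counts)
print(top_categories)
```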

4. Exploratory Data Analysis

4.1 Scraping data

We start by scraping the list of London areas and their postcodes from Wikipedia.

In [5]:
# First, import the main libraries
import numpy as np 
import pandas as pd 
import requests
from bs4 import BeautifulSoup
In [2]:
# Scraping London data from Wikipedia, source link: (https://en.wikipedia.org/wiki/List_of_areas_of_London)
 
url = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
s = requests.Session()
response = s.get(url, timeout=10)
response
# If the request is successful, the response output is <Response [200]>.
Out[2]:
<Response [200]>
In [6]:
# parse the response content as HTML
soup = BeautifulSoup(response.content, 'html.parser')

# to view the content in html format
pretty_soup = soup.prettify()
In [7]:
# getting Wikipedia page title
soup.title.string
Out[7]:
'List of areas of London - Wikipedia'
In [8]:
# find all the tables in the wikipedia link
all_tables=soup.find_all('table')

# get the right table to scrape
right_table=soup.find('table', {"class":'wikitable sortable'})
In [9]:
# Number of columns in the table (cells in the last data row)
cells = right_table.find_all("tr")[-1].find_all("td")
len(cells)
Out[9]:
6
In [10]:
# number of rows in the table, including the header row
rows = right_table.find_all("tr")
len(rows)
Out[10]:
533
In [11]:
# header attributes of the table
header = [th.text.rstrip() for th in rows[0].find_all('th')]
print(header)
print(len(header))
['Location', 'London\xa0borough', 'Post town', 'Postcode\xa0district', 'Dial\xa0code', 'OS grid ref']
6
In [12]:
# Getting the table data (every row after the header)
lst_data = []
for row in rows[1:]:
    data = [d.text.rstrip() for d in row.find_all('td')]
    lst_data.append(data)
In [13]:
# Convert the data into pandas dataframe
df = pd.DataFrame(lst_data)
df
Out[13]:
     0                1                                  2               3                  4          5
0    Abbey Wood       Bexley, Greenwich [7]              LONDON          SE2                020        TQ465785
1    Acton            Ealing, Hammersmith and Fulham[8]  LONDON          W3, W4             020        TQ205805
2    Addington        Croydon[8]                         CROYDON         CR0                020        TQ375645
3    Addiscombe       Croydon[8]                         CROYDON         CR0                020        TQ345665
4    Albany Park      Bexley                             BEXLEY, SIDCUP  DA5, DA14          020        TQ478728
...  ...              ...                                ...             ...                ...        ...
527  Woolwich         Greenwich                          LONDON          SE18               020        TQ435795
528  Worcester Park   Sutton, Kingston upon Thames       WORCESTER PARK  KT4                020        TQ225655
529  Wormwood Scrubs  Hammersmith and Fulham             LONDON          W12                020        TQ225815
530  Yeading          Hillingdon                         HAYES           UB4                020        TQ115825
531  Yiewsley         Hillingdon                         WEST DRAYTON    UB7                020        TQ063804

532 rows × 6 columns

In [14]:
# Adding the header information to the df
df.columns = header
df.head()
Out[14]:
     Location         London borough                     Post town       Postcode district  Dial code  OS grid ref
0    Abbey Wood       Bexley, Greenwich [7]              LONDON          SE2                020        TQ465785
1    Acton            Ealing, Hammersmith and Fulham[8]  LONDON          W3, W4             020        TQ205805
2    Addington        Croydon[8]                         CROYDON         CR0                020        TQ375645
3    Addiscombe       Croydon[8]                         CROYDON         CR0                020        TQ345665
4    Albany Park      Bexley                             BEXLEY, SIDCUP  DA5, DA14          020        TQ478728
In [15]:
# Renaming column names (rename returns a new DataFrame, so assign the result)
df = df.rename(columns={"Location": "neighbourhood"})
df
Out[15]:
     neighbourhood    London borough                     Post town       Postcode district  Dial code  OS grid ref
0    Abbey Wood       Bexley, Greenwich [7]              LONDON          SE2                020        TQ465785
1    Acton            Ealing, Hammersmith and Fulham[8]  LONDON          W3, W4             020        TQ205805
2    Addington        Croydon[8]                         CROYDON         CR0                020        TQ375645
3    Addiscombe       Croydon[8]                         CROYDON         CR0                020        TQ345665
4    Albany Park      Bexley                             BEXLEY, SIDCUP  DA5, DA14          020        TQ478728
...  ...              ...                                ...             ...                ...        ...
527  Woolwich         Greenwich                          LONDON          SE18               020        TQ435795
528  Worcester Park   Sutton, Kingston upon Thames       WORCESTER PARK  KT4                020        TQ225655
529  Wormwood Scrubs  Hammersmith and Fulham             LONDON          W12                020        TQ225815
530  Yeading          Hillingdon                         HAYES           UB4                020        TQ115825
531  Yiewsley         Hillingdon                         WEST DRAYTON    UB7                020        TQ063804

532 rows × 6 columns

In [16]:
# Drop the grid-reference column, which is not needed for the analysis.
# The borough column keeps its original name because later cells refer to it as London_borough.
df = df.drop(['OS grid ref'], axis=1)

In [20]:
df
Out[20]:
     neighbourhood    London borough                     Post town       Postcode district  Dial code
0    Abbey Wood       Bexley, Greenwich [7]              LONDON          SE2                020
1    Acton            Ealing, Hammersmith and Fulham[8]  LONDON          W3, W4             020
2    Addington        Croydon[8]                         CROYDON         CR0                020
3    Addiscombe       Croydon[8]                         CROYDON         CR0                020
4    Albany Park      Bexley                             BEXLEY, SIDCUP  DA5, DA14          020
...  ...              ...                                ...             ...                ...
527  Woolwich         Greenwich                          LONDON          SE18               020
528  Worcester Park   Sutton, Kingston upon Thames       WORCESTER PARK  KT4                020
529  Wormwood Scrubs  Hammersmith and Fulham             LONDON          W12                020
530  Yeading          Hillingdon                         HAYES           UB4                020
531  Yiewsley         Hillingdon                         WEST DRAYTON    UB7                020

532 rows × 5 columns

In [21]:
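# Replace spaces in the column names with underscores
# (the scraped headers also contain non-breaking spaces, '\xa0', which this does not match)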
df.columns = df.columns.str.replace(' ', '_')
df
Out[21]:
     neighbourhood    London borough                     Post_town       Postcode district  Dial code
0    Abbey Wood       Bexley, Greenwich [7]              LONDON          SE2                020
1    Acton            Ealing, Hammersmith and Fulham[8]  LONDON          W3, W4             020
2    Addington        Croydon[8]                         CROYDON         CR0                020
3    Addiscombe       Croydon[8]                         CROYDON         CR0                020
4    Albany Park      Bexley                             BEXLEY, SIDCUP  DA5, DA14          020
...  ...              ...                                ...             ...                ...
527  Woolwich         Greenwich                          LONDON          SE18               020
528  Worcester Park   Sutton, Kingston upon Thames       WORCESTER PARK  KT4                020
529  Wormwood Scrubs  Hammersmith and Fulham             LONDON          W12                020
530  Yeading          Hillingdon                         HAYES           UB4                020
531  Yiewsley         Hillingdon                         WEST DRAYTON    UB7                020

532 rows × 5 columns

In [22]:
# Strip stray whitespace and also convert the non-breaking spaces ('\xa0') left in the scraped headers
df.columns = [x.strip().replace('\xa0', '_').replace(' ', '_') for x in df.columns]
df
Out[22]:
     neighbourhood    London_borough                     Post_town       Postcode_district  Dial_code
0    Abbey Wood       Bexley, Greenwich [7]              LONDON          SE2                020
1    Acton            Ealing, Hammersmith and Fulham[8]  LONDON          W3, W4             020
2    Addington        Croydon[8]                         CROYDON         CR0                020
3    Addiscombe       Croydon[8]                         CROYDON         CR0                020
4    Albany Park      Bexley                             BEXLEY, SIDCUP  DA5, DA14          020
...  ...              ...                                ...             ...                ...
527  Woolwich         Greenwich                          LONDON          SE18               020
528  Worcester Park   Sutton, Kingston upon Thames       WORCESTER PARK  KT4                020
529  Wormwood Scrubs  Hammersmith and Fulham             LONDON          W12                020
530  Yeading          Hillingdon                         HAYES           UB4                020
531  Yiewsley         Hillingdon                         WEST DRAYTON    UB7                020

532 rows × 5 columns

In [26]:
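# df1 holds the cleaned table; the cell that creates it is not shown here
# (the 'Unnamed: 0' column suggests it was reloaded from a saved CSV file)
df1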
Out[26]:
     Unnamed: 0  neighbourhood    London_borough                     Post_town       Postcode_district  Dial_code
0    0           Abbey Wood       Bexley, Greenwich [7]              LONDON          SE2                20
1    1           Acton            Ealing, Hammersmith and Fulham[8]  LONDON          W3, W4             20
2    2           Addington        Croydon[8]                         CROYDON         CR0                20
3    3           Addiscombe       Croydon[8]                         CROYDON         CR0                20
4    4           Albany Park      Bexley                             BEXLEY, SIDCUP  DA5, DA14          20
...  ...         ...              ...                                ...             ...                ...
527  527         Woolwich         Greenwich                          LONDON          SE18               20
528  528         Worcester Park   Sutton, Kingston upon Thames       WORCESTER PARK  KT4                20
529  529         Wormwood Scrubs  Hammersmith and Fulham             LONDON          W12                20
530  530         Yeading          Hillingdon                         HAYES           UB4                20
531  531         Yiewsley         Hillingdon                         WEST DRAYTON    UB7                20

532 rows × 6 columns

In [27]:
# Installing required packages
!pip -q install geopy
!pip -q install geocoder
In [28]:
#importing required libraries
import geocoder
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium 
In [29]:
# Geocoder function
def get_latlng(postal_code, max_attempts=5):
    """Return [latitude, longitude] for a London postcode district."""
    # Retry a bounded number of times (max_attempts is an added safeguard)
    # so that a lookup that keeps failing cannot loop forever
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postal_code))
        if g.latlng is not None:
            return g.latlng
    # No result: return empty coordinates so the row can be handled later
    return [None, None]
# Geocoder ends here # Thanks to Mr. Dayo John
In [40]:
df2 = df1.copy()  # work on a copy so that df1 stays unchanged
In [31]:
import time
start = time.time()
postal_codes = df2['Postcode_district']    
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]
end = time.time()
print("Time of execution: ", end - start, "seconds")
Time of execution:  481.7160575389862 seconds
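Since several neighbourhoods share a postcode district (Addington and Addiscombe are both CR0, for example), the number of lookups could probably be reduced by geocoding each distinct value only once. A minimal sketch follows; the helper name geocode_unique and the stand-in fake_lookup are hypothetical, and in the notebook the lookup argument would be get_latlng.

```python
def geocode_unique(postcodes, lookup):
    """Geocode each distinct postcode once and reuse the cached result."""
    cache = {}
    results = []
    for code in postcodes:
        if code not in cache:
            cache[code] = lookup(code)  # only called for unseen postcodes
        results.append(cache[code])
    return results

# Stand-in for the real geocoder so the sketch runs without network access
lookup_calls = []
def fake_lookup(code):
    lookup_calls.append(code)
    return [0.0, 0.0]

coords = geocode_unique(["SE2", "CR0", "CR0"], fake_lookup)
print(len(coords), len(lookup_calls))
```

The result list keeps the same order and length as the input, so it can still be turned into the Latitude and Longitude columns in the same way.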
In [37]:
# Adding latitude / longitude information corresponding to each postcode district
df2 = df1.copy()
df2_coordinates = pd.DataFrame(coordinates, columns=['Latitude', 'Longitude'])
df2['Latitude'] = df2_coordinates['Latitude']
df2['Longitude'] = df2_coordinates['Longitude']
In [34]:
# Geocode the centre of London with a Nominatim geocoder instance
address = 'London, United Kingdom'

geolocator = Nominatim(user_agent="London_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London City are {}, {}.'.format(latitude, longitude))
The geographical coordinates of London City are 51.5073219, -0.1276474.
In [168]:
df2
Out[168]:
     Unnamed: 0  neighbourhood    London_borough                     Post_town       Postcode_district  Dial_code  Latitude   Longitude
0    0           Abbey Wood       Bexley, Greenwich [7]              LONDON          SE2                20         51.492450  0.121270
1    1           Acton            Ealing, Hammersmith and Fulham[8]  LONDON          W3, W4             20         51.513240  -0.267460
2    2           Addington        Croydon[8]                         CROYDON         CR0                20         51.384755  -0.051498
3    3           Addiscombe       Croydon[8]                         CROYDON         CR0                20         51.384755  -0.051498
4    4           Albany Park      Bexley                             BEXLEY, SIDCUP  DA5, DA14          20         51.452068  0.172207
...  ...         ...              ...                                ...             ...                ...        ...        ...
527  527         Woolwich         Greenwich                          LONDON          SE18               20         51.482070  0.071430
528  528         Worcester Park   Sutton, Kingston upon Thames       WORCESTER PARK  KT4                20         51.506420  -0.127210
529  529         Wormwood Scrubs  Hammersmith and Fulham             LONDON          W12                20         51.506450  -0.236910
530  530         Yeading          Hillingdon                         HAYES           UB4                20         51.506420  -0.127210
531  531         Yiewsley         Hillingdon                         WEST DRAYTON    UB7                20         51.506420  -0.127210

532 rows × 8 columns
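In the table above, three different postcode districts (KT4, UB4 and UB7) share exactly the same coordinates, which lie very close to the centre of London. This may indicate that the geocoder fell back to a generic result for those queries, although the notebook does not confirm it. A minimal sketch of a check for this pattern follows; the helper name is hypothetical and the sample rows are copied from the tables above.

```python
from collections import defaultdict

def coordinates_shared_across_postcodes(rows):
    """Return coordinate pairs that were assigned to more than one postcode."""
    postcodes_by_point = defaultdict(set)
    for _name, postcode, lat, lng in rows:
        postcodes_by_point[(lat, lng)].add(postcode)
    return {point: sorted(codes)
            for point, codes in postcodes_by_point.items()
            if len(codes) > 1}

# (neighbourhood, postcode district, latitude, longitude)
sample = [
    ("Addington", "CR0", 51.384755, -0.051498),
    ("Addiscombe", "CR0", 51.384755, -0.051498),
    ("Worcester Park", "KT4", 51.506420, -0.127210),
    ("Yeading", "UB4", 51.506420, -0.127210),
    ("Yiewsley", "UB7", 51.506420, -0.127210),
]
shared = coordinates_shared_across_postcodes(sample)
print(shared)
```

Rows that share a postcode district, such as the two CR0 rows, are not flagged, because identical coordinates are expected there.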

In [35]:
# create a map of London using the latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per neighbourhood to the map
for lat, lng, London_borough, neighbourhood in zip(df2['Latitude'], df2['Longitude'], df2['London_borough'], df2['neighbourhood']):
    label = '{}, {}'.format(neighbourhood, London_borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)

map_london
Out[35]:
[Interactive Folium map of London with one marker per neighbourhood]