Sources and descriptions of datasets

TPS MCI 2014 to 2018

Major Crime Indicators (MCI)

Toronto Police Service Public Safety Data Portal

“MCI_2014_to_2018.csv”

http://data.torontopolice.on.ca/pages/glossary:

For the most part, the statistics on the following pages use an incident-based counting method. Generally, each type of major crime that occurred during an incident will be counted. For example, if an assault and a break and enter took place in the same incident, they would be counted once in each category. Statistics Canada also presents incident-based crime statistics, but generally counts only the most serious offence per incident. Some other police services present their crime statistics using the offence-based method, which counts all offences in each incident. It is important to note these differences when comparing our crime statistics to those provided by Statistics Canada or by other police agencies.
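
The three counting methods described in the glossary can be compared on a toy example. This is only a sketch: the incidents are invented, and the `seriousness` ranking (1 = most serious) is an assumption made for illustration, not an official scale.

```r
library(data.table)

# Hypothetical data: one row per offence recorded within an incident.
offences <- data.table(
  incident    = c("A", "A", "A", "B"),
  category    = c("Assault", "Assault", "Break and Enter", "Robbery"),
  seriousness = c(2, 2, 3, 1)  # assumed ranking, 1 = most serious
)

# TPS-style incident-based count: each category counted once per incident.
tps_counts <- unique(offences[, .(incident, category)])[, .N, by = category]

# Statistics Canada style: only the most serious offence per incident.
statcan_counts <- offences[order(seriousness), .SD[1], by = incident][, .N, by = category]

# Offence-based count: every offence in every incident.
offence_counts <- offences[, .N, by = category]

tps_counts
statcan_counts
offence_counts
```

Incident A contributes one Assault and one Break and Enter under the TPS method, one Assault only under the most-serious-offence method, and two Assaults plus one Break and Enter under the offence-based method.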

Assault. The direct or indirect application of force to another person, or the attempt or threat to apply force to another person, without that person’s consent.

Robbery. The act of taking property from another person or business by the use of force or intimidation in the presence of the victim.

Break and Enter. The act of entering a place with the intent to commit an indictable offence therein.

Auto Theft. The act of taking another person’s vehicle (not including attempts). Auto Theft figures represent the number of vehicles stolen.

Theft Over. The act of stealing property in excess of $5,000 (excluding auto theft).

City of Toronto Neighbourhood Profiles

Neighbourhood Profiles

Statistics Canada does not release data at the level of Toronto’s social planning neighbourhoods. Neighbourhood level data for 2016 are initially calculated by summing data for the Census Tracts which comprise each neighbourhood.

“neighbourhood-profiles-2016-csv.csv”

Loading, understanding and cleaning datasets

library(data.table) #fread, setcolorder, rbindlist
library(sp) #used by rgdal
library(rgdal) #readOGR
library(ggplot2) #fortify, scale_fill_distiller
library(plyr) #join
library(scales) #percent_format, pretty_breaks
library(ggmap) #theme_nothing
library(rgeos) #gCentroid
library(forecast) #autoplot ts, auto.arima

MCI dataset and definitions of UCR codes

MCI_dt <- fread("MCI_2014_to_2018.csv")
str(MCI_dt)
Classes ‘data.table’ and 'data.frame':  167525 obs. of  29 variables:
 $ X                  : num  -79.3 -79.5 -79.5 -79.6 -79.5 ...
 $ Y                  : num  43.7 43.8 43.7 43.7 43.7 ...
 $ Index_             : int  214 215 216 217 218 219 220 221 222 223 ...
 $ event_unique_id    : chr  "GO-20141948968" "GO-20141950728" "GO-20141956416" "GO-20141956867" ...
 $ occurrencedate     : chr  "2014-04-24T11:29:00.000Z" "2014-04-24T13:00:00.000Z" "2014-04-25T13:20:00.000Z" "2014-04-24T17:00:00.000Z" ...
 $ reporteddate       : chr  "2014-04-24T12:46:00.000Z" "2014-04-24T15:58:00.000Z" "2014-04-25T13:52:00.000Z" "2014-04-25T10:30:00.000Z" ...
 $ premisetype        : chr  "Commercial" "House" "Apartment" "Outside" ...
 $ ucr_code           : int  1610 2120 1430 1430 1430 1430 1430 1420 1420 1420 ...
 $ ucr_ext            : int  200 200 100 100 100 100 100 100 100 100 ...
 $ offence            : chr  "Robbery - Mugging" "B&E" "Assault" "Assault" ...
 $ reportedyear       : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ reportedmonth      : chr  "April" "April" "April" "April" ...
 $ reportedday        : int  24 24 25 25 25 25 3 3 3 3 ...
 $ reporteddayofyear  : int  114 114 115 115 115 115 123 123 123 123 ...
 $ reporteddayofweek  : chr  "Thursday" "Thursday" "Friday" "Friday" ...
 $ reportedhour       : int  12 15 13 10 16 22 3 4 4 4 ...
 $ occurrenceyear     : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ occurrencemonth    : chr  "April" "April" "April" "April" ...
 $ occurrenceday      : int  24 24 25 24 25 25 3 3 3 3 ...
 $ occurrencedayofyear: int  114 114 115 114 115 115 123 123 123 123 ...
 $ occurrencedayofweek: chr  "Thursday" "Thursday" "Friday" "Thursday" ...
 $ occurrencehour     : int  11 13 13 17 16 22 1 4 4 4 ...
 $ MCI                : chr  "Robbery" "Break and Enter" "Assault" "Assault" ...
 $ Division           : chr  "D55" "D31" "D12" "D23" ...
 $ Hood_ID            : int  68 24 30 4 114 73 64 79 79 79 ...
 $ Neighbourhood      : chr  "North Riverdale (68)" "Black Creek (24)" "Brookhaven-Amesbury (30)" "Rexdale-Kipling (4)" ...
 $ Lat                : num  43.7 43.8 43.7 43.7 43.7 ...
 $ Long               : num  -79.3 -79.5 -79.5 -79.6 -79.5 ...
 $ ObjectId           : int  1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, ".internal.selfref")=<externalptr> 
unique(MCI_dt$premisetype)
[1] "Commercial" "House"      "Apartment"  "Outside"    "Other"     
sort(unique(MCI_dt$ucr_code))
 [1] 1410 1420 1430 1440 1450 1455 1457 1460 1461 1462 1470 1475 1480 1610 2120 2121 2125 2130 2132
[20] 2133 2135
sort(unique(MCI_dt$Hood_ID))
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
 [25]  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
 [49]  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96
 [97]  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
[121] 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
unique(MCI_dt$MCI)
[1] "Robbery"         "Break and Enter" "Assault"         "Theft Over"      "Auto Theft"     

Uniform Crime Reporting Survey (UCR). UCR Incident-Based Survey: RDC User Manual:

  • 1410 - Aggravated Assault – Level 3
  • 1420 - Assault with Weapon or Causing Bodily Harm – Level 2
  • 1430 - Assault – Level 1
  • 1440 - Unlawfully Causing Bodily Harm
  • 1450 - Discharge Firearm with Intent
  • 1455 - Using Firearm/Imitation of Firearm in commission of offence
  • 1457 - Pointing a Firearm
  • 1460 - Assault Against Peace-Public Officer
  • 1461 - Assault against Peace Officer with a Weapon or Causing Bodily Harm
  • 1462 - Aggravated Assault against Peace Officer
  • 1470 - Criminal Negligence Causing Bodily Harm
  • 1475 - Trap Likely To or Causing Bodily Harm
  • 1480 - Other Assaults
  • 1610 - Robbery
  • 2120 - Break and Enter
  • 2121 - Break and Enter to Steal Firearm
  • 2125 - Break and Enter of a Motor Vehicle to obtain a Firearm
  • 2130 - Theft over $5,000
  • 2132 - Theft over $5,000 from a Motor Vehicle
  • 2133 - Shoplifting over $5,000
  • 2135 - Theft of a Motor Vehicle
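
The code list can be turned into a lookup table and attached to the records. The sketch below uses four of the codes listed above; the toy records and the unmatched code 1999 are hypothetical, and `ucr_description` is a column name I introduce for illustration.

```r
library(data.table)

# A few entries from the UCR list above.
ucr_labels <- data.table(
  ucr_code        = c(1430L, 1610L, 2120L, 2135L),
  ucr_description = c("Assault - Level 1", "Robbery",
                      "Break and Enter", "Theft of a Motor Vehicle")
)

# Hypothetical records; 1999 is an invented code with no label.
records <- data.table(ucr_code = c(1610L, 2120L, 1430L, 1430L, 1999L))

# Update join: add the description by reference, keeping the row order.
records[ucr_labels, on = "ucr_code", ucr_description := i.ucr_description]
records
```

Codes without an entry in the lookup table get `NA`, which makes gaps in the label list easy to spot.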

Statistics Canada:

Assault refers to three levels of physical assaults which include the following categories:

Common assault, (section 265). This includes the Criminal Code category assault (level 1). This is the least serious form of assault and includes pushing, slapping, punching, and face-to-face verbal threats.

Major assault levels 2 and 3, (sections 267, 268). This includes more serious forms of assault, i.e. assault with a weapon or causing bodily harm (level 2) and aggravated assault (level 3). Assault level 2 involves carrying, using or threatening to use a weapon against someone or causing someone bodily harm. Assault level 3 involves wounding, maiming, disfiguring or endangering the life of someone.

Criminal Code (R.S.C., 1985, c. C-46)

bodily harm means any hurt or injury to a person that interferes with the health or comfort of the person and that is more than merely transient or trifling in nature; (lésions corporelles)

Criminal negligence
219 (1) Every one is criminally negligent who
(a) in doing anything, or
(b) in omitting to do anything that it is his duty to do,
shows wanton or reckless disregard for the lives or safety
of other persons.

Assault
265 (1) A person commits an assault when
(a) without the consent of another person, he applies force intentionally to that other person, directly or indirectly;
(b) he attempts or threatens, by an act or a gesture, to apply force to another person, if he has, or causes that other person to believe on reasonable grounds that he has, present ability to effect his purpose; or
(c) while openly wearing or carrying a weapon or an imitation thereof, he accosts or impedes another person or begs.

Breaking and entering with intent, committing offence or breaking out
348 (1) Every one who
(a) breaks and enters a place with intent to commit an indictable offence therein,
(b) breaks and enters a place and commits an indictable offence therein, or
(c) breaks out of a place after
(i) committing an indictable offence therein, or
(ii) entering the place with intent to commit an indictable offence therein,
is guilty
(d) if the offence is committed in relation to a dwelling-house, of an indictable offence and liable to imprisonment for life, and
(e) if the offence is committed in relation to a place other than a dwelling-house, of an indictable offence and liable to imprisonment for a term not exceeding ten years or of an offence punishable on summary conviction.

Robbery
343 Every one commits robbery who
(a) steals, and for the purpose of extorting whatever is stolen or to prevent or overcome resistance to the stealing, uses violence or threats of violence to a person or property;
(b) steals from any person and, at the time he steals or immediately before or immediately thereafter, wounds, beats, strikes or uses any personal violence to that person;
(c) assaults any person with intent to steal from him; or
(d) steals from any person while armed with an offensive weapon or imitation thereof.

Breaking and Entering in Canada - 2002, Juristat, Statistics Canada – Catalogue no. 85-002-XPE, Vol. 24, no. 5, page 1:

In 2002, over 31,000 persons were charged with B&E, the vast majority of whom were male (91%). Four in ten persons charged with B&E were youths. For property and violent crimes overall, youths represented 26% and 16% of persons charged, respectively.

Mathieu Charron, Neighbourhood Characteristics and the Distribution of Police-reported Crime in the City of Toronto, Canadian Centre For Justice Statistics, Statistics Canada, Catalogue no. 85-561-M, no. 18.
p. 11:

Crimes reported to the police are not randomly distributed throughout Toronto, but are concentrated in certain areas. An examination of local crime rates (the relationship between the number of crimes and the population at a local level) shows that the rates of violent crime are higher near the downtown core and in the east and northwest areas of the city (Map 5; See ‘Mapping techniques’ in the Methodology section for technical details.), which correspond roughly to the neighbourhoods along the Canadian National railway and to the areas where residents earn the lowest individual incomes (Map 3). There are some hot spots within these areas that have higher rates. Some of these are Danforth, downtown east side and the intersections of Lawrence and Morningside, Jane and Finch, and Jane and Eglinton.

p. 12:

In contrast, in the north area along Yonge Street, where residents earn a higher income, the violent crime rate is much lower than average. The business district—the Bay Street area where most of the workers in the finance and insurance industry are employed—has a violent crime rate well below the average for the city of Toronto. This differs from most of the other Canadian cities that have been the focus of studies, where the violent crime rate in the centre was high (Fitzgerald et al. 2004; Wallace et al. 2006; Kitchen 2006; Charron 2008). A similar situation was noted in Montréal, where the crime hot spots were spread out in many areas of the city (Savoie et al. 2006). The results suggest that the complex social geography of large cities like Toronto and Montréal is related to the spatial organization of crime.

pp. 12-13:

Several neighbourhood characteristics vary according to the local police-reported crime rate. Neighbourhoods with a high rate of violent crime are more densely populated and have a higher percentage of residents living in multi-unit dwellings. They also have the highest percentages of children (under the age of 15), renters, single-parent families and visible minorities. The residents of these neighbourhoods are also less likely to have a university degree, more likely to earn a lower wage, and more likely to live in low-income households.

p. 23:

As for demographic characteristics, rates of harassment and common assault increase with the proportion of children (under 15) and of young men (aged 20 to 29). Rates of sexual assaults, threats, major assaults and robberies decrease as the proportion of people aged 65 and older increases.

p. 24:

Motor vehicle theft rates are higher in neighbourhoods with higher proportions of children (under 15) and young men aged 20 to 29. They are also higher in neighbourhoods where access to socio-economic resources is limited or where there is a subway or train station, as well as in clusters of commercial and manufacturing activity.

p. 25:

The spatial structure of breaking and entering varies essentially with urban and economic activity characteristics. More specifically, results show that breaking and entering is relatively more frequent in central neighbourhoods, with high commercial activity, but less so in areas with high numbers of office jobs (Table 9).

p. 26:

Uttering threats, major assault and drug offences showed the closest association with access to socio-economic resources. Other strong links were noted for mischief, motor vehicle theft, robbery, sexual assault and common assault. Only other thefts (which exclude shoplifting, theft from a motor vehicle and motor vehicle theft) and breaking and entering were not significantly associated with access to socio-economic resources. Economic vulnerability was associated with generally serious violent crimes: robbery, major assault, sexual assault and uttering threats. It was not related to common assault, harassment or any type of property crime.

City of Toronto Neighbourhood Profiles

Neighbourhoods are aggregated from census tracts.

neighbourhoods_dt <- fread("neighbourhood-profiles-2016-csv.csv")

neighbourhoods_dt <- neighbourhoods_dt[, -c("_id", "Category", "Topic", "Data Source", "City of Toronto")]

Select census variables.

v <- c("Neighbourhood Number",
       "Population, 2016",
       "Land area in square kilometres",
       "Children (0-14 years)",
       "Seniors (65+ years)",
       "Private households by household size",
       "Average household size",
       "In low income based on the Low-income cut-offs, after tax (LICO-AT)",
       "Prevalence of low income based on the Low-income cut-offs, after tax (LICO-AT) (%)",
       "Renter",
       "Spending 30% or more of income on shelter costs",
       "University certificate, diploma or degree at bachelor level or above",
       "Unemployed (Males)",
       "Unemployment rate (Males)",
       "Public transit",
       "Walked",
       "Employment income: Average amount ($)",
       "Social assistance benefits: Population with an amount")

Check that “Renter” occurs only once among the characteristics, so that it is counted for households only and not for both households and persons.

length(grep("Renter", neighbourhoods_dt$Characteristic))
[1] 1
neighbourhoods_v <-
neighbourhoods_dt[Characteristic %in% v,]

neighbourhoods_v <- transpose(neighbourhoods_v)

head(neighbourhoods_v)

Land_area is in square kilometres. Children are persons aged 0 to 14. Households_unaffordable is the number of households spending 30% or more of income on shelter costs: see Canada Mortgage and Housing Corporation, About Affordable Housing in Canada.

neighbourhoods_census <- neighbourhoods_v[!1,.(id=V1, Population=V2, Land_area=V3, Children=V4,
                                              Seniors=V5, Households=V6, Average_household_size=V7,
                                              LICO=V8, LICO_prevalence=V9, Renters=V10,
                                              Households_unaffordable=V11,
                                              Unemployed_males=V12, Unemployment_rate_males=V13,
                                              Public_transit_to_work=V14, Walk_to_work=V15,
                                              Average_employment_income=V16,
                                              Social_assistance_recipients=V17)]
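
The renaming above relies on column positions (V1, V2, ...), which follow the row order of the CSV file rather than the order of `v`. A possible alternative, sketched here on an invented miniature table, is to let `transpose()` carry the characteristic names across and then rename by name. The neighbourhood names and values below are hypothetical.

```r
library(data.table)

# Hypothetical miniature of the profile table: one row per characteristic,
# one column per neighbourhood.
profile <- data.table(
  Characteristic = c("Neighbourhood Number", "Population, 2016", "Renter"),
  `Hood A` = c("1", "12,345", "2,100"),
  `Hood B` = c("2", "9,876", "1,050")
)

# keep.names stores the old column names in a new column;
# make.names uses the Characteristic column as the new header.
census <- transpose(profile, keep.names = "Neighbourhood",
                    make.names = "Characteristic")

# Rename by characteristic rather than by position.
setnames(census,
         old = c("Neighbourhood Number", "Population, 2016", "Renter"),
         new = c("id", "Population", "Renters"))
census
```

With this approach a missing or reordered characteristic raises an error in `setnames()` instead of silently shifting every later column.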

head(neighbourhoods_census) 
neighbourhoods_census$id <- as.integer(neighbourhoods_census$id)

neighbourhoods_census$Population <- as.integer(gsub(",", "", neighbourhoods_census$Population))

neighbourhoods_census$Land_area <- as.numeric(neighbourhoods_census$Land_area)

neighbourhoods_census$Children <- as.integer(gsub(",", "", neighbourhoods_census$Children))

neighbourhoods_census$Seniors <- as.integer(gsub(",", "", neighbourhoods_census$Seniors))

neighbourhoods_census$Households <- as.integer(neighbourhoods_census$Households)

neighbourhoods_census$Average_household_size <- as.numeric(neighbourhoods_census$Average_household_size)

neighbourhoods_census$LICO <- as.integer(gsub(",", "", neighbourhoods_census$LICO))

neighbourhoods_census$LICO_prevalence <- as.numeric(neighbourhoods_census$LICO_prevalence)

neighbourhoods_census$Renters <- as.integer(gsub(",", "", neighbourhoods_census$Renters))

neighbourhoods_census$Households_unaffordable <- as.integer(gsub(",", "", neighbourhoods_census$Households_unaffordable))

neighbourhoods_census$Unemployed_males <- as.integer(gsub(",", "", neighbourhoods_census$Unemployed_males))

neighbourhoods_census$Unemployment_rate_males <- as.numeric(neighbourhoods_census$Unemployment_rate_males)

neighbourhoods_census$Public_transit_to_work <- as.integer(gsub(",", "", neighbourhoods_census$Public_transit_to_work))

neighbourhoods_census$Walk_to_work <- as.integer(gsub(",", "", neighbourhoods_census$Walk_to_work))

neighbourhoods_census$Average_employment_income <- as.numeric(gsub(",", "", neighbourhoods_census$Average_employment_income))

neighbourhoods_census$Social_assistance_recipients <- as.integer(gsub(",", "", neighbourhoods_census$Social_assistance_recipients))
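
The column-by-column conversions above could also be written once and applied to groups of columns. A minimal sketch, assuming the raw values are character strings with comma thousands separators; `to_number` is a helper name I introduce here, and the toy values are the first two rows of the cleaned table with commas added back for illustration.

```r
library(data.table)

# Hypothetical helper: strip thousands separators, then convert.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

toy <- data.table(
  id         = c("1", "2"),
  Population = c("33,312", "32,954"),
  Land_area  = c("29.81", "4.52")
)

int_cols <- c("id", "Population")
num_cols <- "Land_area"

# Convert each group of columns in place.
toy[, (int_cols) := lapply(.SD, function(x) as.integer(to_number(x))), .SDcols = int_cols]
toy[, (num_cols) := lapply(.SD, to_number), .SDcols = num_cols]

str(toy)
```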

neighbourhoods_census <- neighbourhoods_census[order(id)]

str(neighbourhoods_census)
Classes ‘data.table’ and 'data.frame':  140 obs. of  17 variables:
 $ id                          : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Population                  : int  33312 32954 10360 10529 9456 22000 22156 10948 15535 11051 ...
 $ Land_area                   : num  29.81 4.52 3.31 2.49 2.86 ...
 $ Children                    : int  5060 7090 1730 1640 1805 4240 3555 1450 2120 1770 ...
 $ Seniors                     : int  4980 3560 1880 1730 1275 3585 4905 3045 3290 2025 ...
 $ Households                  : int  10280 9880 3280 3845 3220 7785 8510 4135 6260 3865 ...
 $ Average_household_size      : num  3.2 3.32 3.09 2.69 2.93 2.82 2.6 2.45 2.43 2.86 ...
 $ LICO                        : int  4550 7140 1485 1640 1695 4340 2470 1090 1250 660 ...
 $ LICO_prevalence             : num  13.8 21.8 14.7 15.8 17.9 19.7 11.2 10.8 8.2 6 ...
 $ Renters                     : int  3275 5455 1245 1685 1470 3735 3925 1620 2745 595 ...
 $ Households_unaffordable     : int  3270 3715 1065 1185 1080 2730 2645 1325 1900 750 ...
 $ Unemployed_males            : int  870 890 260 290 245 475 440 215 250 205 ...
 $ Unemployment_rate_males     : num  9.2 11.4 9.8 10.4 10.5 8.8 7.8 8.4 5.7 6.7 ...
 $ Public_transit_to_work      : int  4380 4110 1030 1345 1330 2665 2380 1200 2010 950 ...
 $ Walk_to_work                : int  425 385 110 150 70 270 140 65 175 75 ...
 $ Average_employment_income   : num  33340 28126 34385 35988 33188 ...
 $ Social_assistance_recipients: int  1290 2915 650 720 705 1710 840 410 370 145 ...
 - attr(*, ".internal.selfref")=<externalptr> 

Data manipulation

MCI

MCI_2018 <- MCI_dt[reportedyear==2018]

MCI_2018_nbd <- MCI_2018[, c("MCI", "Hood_ID")]

str(MCI_2018_nbd)
Classes ‘data.table’ and 'data.frame':  36303 obs. of  2 variables:
 $ MCI    : chr  "Assault" "Robbery" "Break and Enter" "Break and Enter" ...
 $ Hood_ID: int  75 86 132 121 121 1 122 77 86 31 ...
 - attr(*, ".internal.selfref")=<externalptr> 

The MCI dataset classifies reports as Assault, Auto Theft, Break and Enter, Robbery, and Theft Over.

MCI_2018_grouped <- MCI_2018_nbd[,.(Number_of_reports=.N),by=.(id=Hood_ID, category=MCI)]

MCI_2018_grouped <- MCI_2018_grouped[order(id)]
Assault_MCI <- MCI_2018_grouped[category=="Assault", .(Assault_reports=sum(Number_of_reports)), by=.(id)]

Auto_theft_MCI <- MCI_2018_grouped[category=="Auto Theft", .(Auto_theft_reports=sum(Number_of_reports)), by=.(id)]

BE_MCI <- MCI_2018_grouped[category=="Break and Enter", .(BE_reports=sum(Number_of_reports)), by=.(id)]

Robbery_MCI <- MCI_2018_grouped[category=="Robbery", .(Robbery_reports=sum(Number_of_reports)), by=.(id)]

Theft_over_MCI <- MCI_2018_grouped[category=="Theft Over", .(Theft_over_reports=sum(Number_of_reports)), by=.(id)]
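
The five per-category tables could alternatively be produced in one step by casting the reports to wide format. This is a sketch on invented rows, not the method used in the rest of this notebook; the resulting column names are the MCI categories themselves.

```r
library(data.table)

# Hypothetical reports: one row per report.
toy <- data.table(
  Hood_ID = c(1L, 1L, 1L, 2L, 2L),
  MCI     = c("Assault", "Assault", "Robbery", "Assault", "Auto Theft")
)

# One row per neighbourhood, one column per category.
# With fun.aggregate = length, absent combinations are filled with 0.
wide <- dcast(toy, Hood_ID ~ MCI, fun.aggregate = length, value.var = "MCI")
wide
```

Because missing combinations come out as 0 rather than `NA`, no separate fill step would be needed after merging.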

Merge demographic table and MCI table

neighbourhoods_merged <- neighbourhoods_census

neighbourhoods_merged <- merge(neighbourhoods_merged, Assault_MCI, by="id", all=TRUE)

neighbourhoods_merged <- merge(neighbourhoods_merged, Robbery_MCI, by="id", all=TRUE)

neighbourhoods_merged <- merge(neighbourhoods_merged, BE_MCI, by="id", all=TRUE)

neighbourhoods_merged <- merge(neighbourhoods_merged, Theft_over_MCI, by="id", all=TRUE)

neighbourhoods_merged <- merge(neighbourhoods_merged, Auto_theft_MCI, by="id", all=TRUE)

# Neighbourhoods with no Robbery or Theft Over reports in 2018 are NA after the merge; set them to 0
neighbourhoods_merged[is.na(neighbourhoods_merged)] <- 0
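
The chain of merges and the `NA` replacement can be condensed. The sketch below uses `Reduce()` to fold a list of tables into one and `setnafill()` to fill only the count columns; the three toy tables are invented. Filling with the integer `0L` should leave the count columns as integers, whereas the matrix-style assignment above turns the affected columns into `num`.

```r
library(data.table)

# Hypothetical inputs: neighbourhood 2 has no robbery reports.
census  <- data.table(id = 1:3, Population = c(100L, 200L, 300L))
assault <- data.table(id = 1:3, Assault_reports = c(5L, 2L, 7L))
robbery <- data.table(id = c(1L, 3L), Robbery_reports = c(1L, 4L))

# Full outer merge of all tables on id.
merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE),
                 list(census, assault, robbery))

# Replace NA with 0 in the count columns only, by reference.
setnafill(merged, type = "const", fill = 0L,
          cols = c("Assault_reports", "Robbery_reports"))
merged
```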

str(neighbourhoods_merged)
Classes ‘data.table’ and 'data.frame':  140 obs. of  22 variables:
 $ id                          : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Population                  : int  33312 32954 10360 10529 9456 22000 22156 10948 15535 11051 ...
 $ Land_area                   : num  29.81 4.52 3.31 2.49 2.86 ...
 $ Children                    : int  5060 7090 1730 1640 1805 4240 3555 1450 2120 1770 ...
 $ Seniors                     : int  4980 3560 1880 1730 1275 3585 4905 3045 3290 2025 ...
 $ Households                  : int  10280 9880 3280 3845 3220 7785 8510 4135 6260 3865 ...
 $ Average_household_size      : num  3.2 3.32 3.09 2.69 2.93 2.82 2.6 2.45 2.43 2.86 ...
 $ LICO                        : int  4550 7140 1485 1640 1695 4340 2470 1090 1250 660 ...
 $ LICO_prevalence             : num  13.8 21.8 14.7 15.8 17.9 19.7 11.2 10.8 8.2 6 ...
 $ Renters                     : int  3275 5455 1245 1685 1470 3735 3925 1620 2745 595 ...
 $ Households_unaffordable     : int  3270 3715 1065 1185 1080 2730 2645 1325 1900 750 ...
 $ Unemployed_males            : int  870 890 260 290 245 475 440 215 250 205 ...
 $ Unemployment_rate_males     : num  9.2 11.4 9.8 10.4 10.5 8.8 7.8 8.4 5.7 6.7 ...
 $ Public_transit_to_work      : int  4380 4110 1030 1345 1330 2665 2380 1200 2010 950 ...
 $ Walk_to_work                : int  425 385 110 150 70 270 140 65 175 75 ...
 $ Average_employment_income   : num  33340 28126 34385 35988 33188 ...
 $ Social_assistance_recipients: int  1290 2915 650 720 705 1710 840 410 370 145 ...
 $ Assault_reports             : int  284 259 56 72 75 101 75 46 18 17 ...
 $ Robbery_reports             : num  69 73 11 25 15 18 16 6 7 2 ...
 $ BE_reports                  : int  154 28 18 28 7 40 65 31 44 22 ...
 $ Theft_over_reports          : num  50 3 2 4 1 3 4 3 5 5 ...
 $ Auto_theft_reports          : int  495 73 46 54 37 57 51 16 18 20 ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr "id"

Calculate ratios of MCI to population

neighbourhoods_merged$Assault_ratio <- neighbourhoods_merged$Assault_reports/neighbourhoods_merged$Population

neighbourhoods_merged$Auto_theft_ratio <- neighbourhoods_merged$Auto_theft_reports/neighbourhoods_merged$Population

neighbourhoods_merged$BE_ratio <- neighbourhoods_merged$BE_reports/neighbourhoods_merged$Population

neighbourhoods_merged$Robbery_ratio <- neighbourhoods_merged$Robbery_reports/neighbourhoods_merged$Population

neighbourhoods_merged$Theft_over_ratio <- neighbourhoods_merged$Theft_over_reports/neighbourhoods_merged$Population
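
The same ratios can be computed for several columns at once. A minimal sketch on the first two rows of the merged table (values taken from the `str()` output above); the `_ratio` names are derived from the `_reports` names.

```r
library(data.table)

toy <- data.table(
  id              = 1:2,
  Population      = c(33312L, 32954L),
  Assault_reports = c(284L, 259L),
  Robbery_reports = c(69, 73)
)

report_cols <- c("Assault_reports", "Robbery_reports")
ratio_cols  <- sub("_reports$", "_ratio", report_cols)

# Divide each report count by the neighbourhood population.
toy[, (ratio_cols) := lapply(.SD, function(x) x / Population), .SDcols = report_cols]
toy
```

Multiplying the ratios by 1,000 or 100,000 would give rates per 1,000 or per 100,000 residents, which are often easier to read than raw proportions.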

Calculate ratios of census variables to population, households or the Toronto average

toronto_avg_household_size <- neighbourhoods_merged[, mean(Average_household_size)]

toronto_avg_employment_income <- neighbourhoods_merged[, mean(Average_employment_income)]

toronto_avg_unemployment_rate_males <- neighbourhoods_merged[, mean(Unemployment_rate_males)]

toronto_avg_household_size
[1] 2.491643
toronto_avg_employment_income
[1] 55698.18
toronto_avg_unemployment_rate_males
[1] 8.108571
neighbourhoods_merged$Children_ratio <- neighbourhoods_merged$Children/neighbourhoods_merged$Population

neighbourhoods_merged$Seniors_ratio <- neighbourhoods_merged$Seniors/neighbourhoods_merged$Population

neighbourhoods_merged$Renters_ratio <- neighbourhoods_merged$Renters/neighbourhoods_merged$Households

neighbourhoods_merged$Households_unaffordable_ratio <- neighbourhoods_merged$Households_unaffordable/neighbourhoods_merged$Households

neighbourhoods_merged$Public_transit_to_work_ratio <- neighbourhoods_merged$Public_transit_to_work/neighbourhoods_merged$Population

neighbourhoods_merged$Social_assistance_recipients_ratio <- neighbourhoods_merged$Social_assistance_recipients/neighbourhoods_merged$Population

neighbourhoods_merged$Average_household_size_ratio <- neighbourhoods_merged$Average_household_size/toronto_avg_household_size

neighbourhoods_merged$Average_employment_income_ratio <- neighbourhoods_merged$Average_employment_income/toronto_avg_employment_income

neighbourhoods_merged$Unemployment_rate_males_ratio <- neighbourhoods_merged$Unemployment_rate_males/toronto_avg_unemployment_rate_males
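
Note that the Toronto averages in this section are unweighted means over the 140 neighbourhoods, so a small neighbourhood counts as much as a large one. A weighted mean may be closer to a city-wide figure. The sketch below compares the two for household size, using the first three rows of the table (values from the `str()` output above) and weighting by number of households; the choice of weight is my assumption.

```r
# First three neighbourhoods from the cleaned census table.
avg_household_size <- c(3.2, 3.32, 3.09)
households         <- c(10280, 9880, 3280)

# Unweighted: every neighbourhood counts equally.
unweighted <- mean(avg_household_size)

# Weighted: larger neighbourhoods contribute more.
weighted <- weighted.mean(avg_household_size, w = households)

c(unweighted = unweighted, weighted = weighted)
```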

Choropleths by neighbourhood

Shapefile

Neighbourhoods (WGS84). City of Toronto, Social Development, Finance & Administration

https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#a45bd45a-ede8-730e-1abc-93105b2c439f

“neighbourhoods_planning_areas_wgs84.zip”

nbds <- readOGR("C:/Users/14165/Desktop/Shapefiles/neighbourhoods_planning_areas_wgs84", "NEIGHBORHOODS_WGS84")
OGR data source with driver: ESRI Shapefile 
Source: "C:\Users\14165\Desktop\Shapefiles\neighbourhoods_planning_areas_wgs84", layer: "NEIGHBORHOODS_WGS84"
with 140 features
It has 2 fields

Add “id” column

nbds@data$id <- as.integer(nbds@data$AREA_S_CD)

Make centroids of each neighbourhood, for placing labels when plotting

nbds.centroids  <- as.data.frame(gCentroid(nbds, byid = TRUE))

Add “id” column

nbds.centroids$id <- nbds@data$id

Shapefile processing

nbds.points = fortify(nbds, region = "id")

nbds.df = join(nbds.points, nbds@data, by = "id")

Merge neighbourhood shapefile and dataframe

nbds_MCI <- merge(nbds.df, neighbourhoods_merged, by = "id")
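
The rgdal and rgeos packages used here were retired from CRAN in 2023, so this pipeline may not run on a fresh installation. A possible replacement is the sf package, which is a different technique from the `readOGR()`/`fortify()` route above. The sketch below is self-contained: it builds two hypothetical square "neighbourhoods" without a coordinate reference system instead of reading the shapefile (with the real data, `sf::st_read()` would take the place of `readOGR()`), and the ratio values are illustrative.

```r
library(sf)
library(ggplot2)

# Helper for a unit square with lower-left corner (x0, y0); toy geometry only.
unit_square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Two hypothetical neighbourhoods with an id column.
nbds_sf <- st_sf(id = 1:2,
                 geometry = st_sfc(unit_square(0, 0), unit_square(1, 0)))

# Illustrative statistics to attach.
stats <- data.frame(id = 1:2, Assault_ratio = c(0.0085, 0.0079))

# merge() keeps the geometry column, so no fortify/join step is needed.
nbds_stats <- merge(nbds_sf, stats, by = "id")

p <- ggplot(nbds_stats) +
  geom_sf(aes(fill = Assault_ratio), color = "black") +
  geom_sf_text(aes(label = id), size = 3)
p
```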

Low income cut-off (LICO) prevalence

Make and plot choropleth

p.LICO_percent <- ggplot() +
  geom_polygon(data = nbds_MCI, 
               aes(x = long, y = lat, group = group, fill = LICO_prevalence/100), 
               color = "black", size = 0.2) + 
  coord_map() + 
  scale_fill_distiller(name="LICO prevalence", labels=percent_format(accuracy=1), palette = "RdPu", trans = "reverse", breaks = pretty_breaks(n = 10)) + 
  theme_nothing(legend = TRUE) + 
  labs(title="LICO-AT prevalence in Toronto by neighbourhood, 2016") + 
  geom_text(aes(x=x,y=y, group=NULL, label=id), data = nbds.centroids, size = 2)

p.LICO_percent + guides(fill = guide_legend(reverse = TRUE))

Assaults

Make and plot choropleth

p.assaults <- ggplot() +
  geom_polygon(data = nbds_MCI, 
               aes(x = long, y = lat, group = group, fill = Assault_reports), 
               color = "black", size = 0.2) + 
  coord_map() + 
  scale_fill_distiller(name="Assaults", palette = "YlOrRd", trans = "reverse", breaks = pretty_breaks(n = 8)) + 
  theme_nothing(legend = TRUE) + 
  labs(title="Number of assault reports in Toronto by neighbourhood, 2018") + 
  geom_text(aes(x=x,y=y, group=NULL, label=id), data = nbds.centroids, size = 2)

p.assaults + guides(fill = guide_legend(reverse = TRUE))

p.assaults_ratio <- ggplot() +
  geom_polygon(data = nbds_MCI, 
               aes(x = long, y = lat, group = group, fill = Assault_ratio), 
               color = "black", size = 0.2) + 
  coord_map() + 
  scale_fill_distiller(name="Assaults/Population", palette = "Reds", trans = "reverse", breaks = pretty_breaks(n = 8)) + 
  theme_nothing(legend = TRUE) + 
  labs(title="Number of assault reports/population count in Toronto by neighbourhood, 2018") + 
  geom_text(aes(x=x,y=y, group=NULL, label=id), data = nbds.centroids, size = 2)

p.assaults_ratio + guides(fill = guide_legend(reverse = TRUE))
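
Since the choropleths differ only in the fill variable, palette and labels, the plotting code could be wrapped in a function. This is a sketch under assumptions: `plot_choropleth` is a name I introduce, the toy polygons stand in for `nbds_MCI`, and it uses `coord_quickmap()`, `theme_void()` and `direction = 1` (darker colour for higher values) in place of `coord_map()`, `theme_nothing()` and the reversed scale used above, so that only ggplot2 and scales are required.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: one choropleth from fortified polygons.
plot_choropleth <- function(polygons, fill_col, legend_name, title, palette = "Reds") {
  ggplot(polygons) +
    geom_polygon(aes(x = long, y = lat, group = group, fill = .data[[fill_col]]),
                 color = "black") +
    coord_quickmap() +
    scale_fill_distiller(name = legend_name, palette = palette, direction = 1,
                         breaks = pretty_breaks(n = 8)) +
    theme_void() +
    labs(title = title)
}

# Toy fortified data: two adjacent squares with illustrative values.
toy <- data.frame(
  long  = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("1.1", "2.1"), each = 4),
  Assault_ratio = rep(c(0.0085, 0.0079), each = 4)
)

p <- plot_choropleth(toy, "Assault_ratio", "Assaults/Population",
                     "Assault reports per resident (toy data)")
p
```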

Density Maps of MCI Using stat_density2d

https://stats.stackexchange.com/questions/31726/scatterplot-with-contour-heat-overlay

https://gist.github.com/lmullen/8375785

https://gis.stackexchange.com/questions/165974/r-fortify-causing-polygons-to-tear

MCI_Assault_XY <- MCI_dt[MCI=="Assault",.(X,Y)]

MCI_Auto_Theft_XY <- MCI_dt[MCI=="Auto Theft",.(X,Y)]

MCI_BE_XY <- MCI_dt[MCI=="Break and Enter",.(X,Y)]

MCI_Robbery_XY <- MCI_dt[MCI=="Robbery", .(X,Y)]

MCI_Theft_Over_XY <- MCI_dt[MCI=="Theft Over", .(X,Y)]

Shapefiles

Read shapefiles

torontoBoundary_wgs84 <- readOGR("C:/Users/14165/Desktop/Shapefiles/torontoBoundary_wgs84", "citygcs_regional_mun_wgs84")
OGR data source with driver: ESRI Shapefile 
Source: "C:\Users\14165\Desktop\Shapefiles\torontoBoundary_wgs84", layer: "citygcs_regional_mun_wgs84"
with 1 features
It has 3 fields
Integer64 fields read as strings:  AREA_ID OBJECTID 
TTC_subway_lines_wgs84 <- readOGR("C:/Users/14165/Desktop/Shapefiles/TTC_subway_lines_wgs84", "TTC_SUBWAY_LINES_WGS84")
OGR data source with driver: ESRI Shapefile 
Source: "C:\Users\14165\Desktop\Shapefiles\TTC_subway_lines_wgs84", layer: "TTC_SUBWAY_LINES_WGS84"
with 4 features
It has 3 fields
Integer64 fields read as strings:  RID 
centreline_wgs84 <- readOGR("C:/Users/14165/Desktop/Shapefiles/centreline_wgs84", "CENTRELINE_WGS84")
OGR data source with driver: ESRI Shapefile 
Source: "C:\Users\14165\Desktop\Shapefiles\centreline_wgs84", layer: "CENTRELINE_WGS84"
with 69378 features
It has 17 fields
Integer64 fields read as strings:  GEO_ID LFN_ID FNODE TNODE 

Every linear feature has a feature code (FCODE), defined as follows:

201100 Highway
201101 Highway Ramp
201200 Major Arterial Road
201201 Major Arterial Road Ramp
201300 Minor Arterial Road
201301 Minor Arterial Road Ramp
201400 Collector Road
201401 Collector Road Ramp
201500 Local Road
201600 Other Road
201601 Other Ramp
201700 Laneways
201800 Pending
201803 Access Road
201801 Busway
202001 Major Railway
202002 Minor Railway
202003 Railway under construction/proposed
203001 River
203002 Creek/Tributary
204001 Trail
204002 Walkway
205001 Hydro Line
206001 Major Shoreline
206002 Minor Shoreline (Land locked)

centreline_wgs84_major <- centreline_wgs84[centreline_wgs84@data$FCODE %in% c(201100, 201200, 201300, 201400),]
torontoBoundary_wgs84.df <- fortify(torontoBoundary_wgs84)
Regions defined for each Polygons
TTC_subway_lines_wgs84.df <- fortify(TTC_subway_lines_wgs84)

centreline_wgs84_major.df <- fortify(centreline_wgs84_major)
ggplot()+geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .2)

Assault Reports in Toronto, 2014-2018

ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Assault_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red")+
 theme_nothing(legend = TRUE) + 
  labs(title="Assault Reports in Toronto, 2014-2018")

Auto Theft Reports in Toronto, 2014-2018

ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Auto_Theft_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
  theme_nothing(legend = TRUE) + 
  labs(title="Auto Theft Reports in Toronto, 2014-2018")

Change bandwidth parameter h

ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Auto_Theft_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, h=0.05, n=300, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
  theme_nothing(legend = TRUE) + 
  labs(title="Auto Theft Reports in Toronto, 2014-2018. h=0.05")
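To see what the bandwidth does on its own, the density can be estimated directly with `MASS::kde2d`, which is the estimator the ggplot2 documentation names for `stat_density2d`. The sketch below uses synthetic points around a made-up centre rather than the MCI coordinates, and the two bandwidth values are chosen only for contrast. Because `X` and `Y` are longitude and latitude, `h` is expressed in degrees.

```r
library(MASS)  # kde2d

set.seed(1)
# Synthetic cluster of points (not real report locations)
x <- rnorm(500, mean = -79.4, sd = 0.02)
y <- rnorm(500, mean = 43.7, sd = 0.02)

# Same evaluation grid for both estimates so they are comparable
lims <- c(-79.6, -79.2, 43.5, 43.9)

narrow <- kde2d(x, y, h = 0.05, n = 100, lims = lims)  # small bandwidth: sharper peaks
wide   <- kde2d(x, y, h = 0.20, n = 100, lims = lims)  # large bandwidth: smoother surface

# The narrow estimate concentrates more density at its peak
c(narrow = max(narrow$z), wide = max(wide$z))
```

A smaller `h` separates nearby hot spots at the cost of a noisier surface; raising `n` only refines the grid on which the surface is evaluated.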

Break and Enter Reports in Toronto, 2014-2018

ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_BE_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
  theme_nothing(legend = TRUE) + 
  labs(title="Break and Enter Reports in Toronto, 2014-2018")

Robbery Reports in Toronto, 2014-2018

ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Robbery_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
  theme_nothing(legend = TRUE) + 
  labs(title="Robbery Reports in Toronto, 2014-2018")

Theft Over $5000 Reports in Toronto, 2014-2018

ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Theft_Over_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
  theme_nothing(legend = TRUE) + 
  labs(title="Theft Over $5000 Reports in Toronto, 2014-2018")

Correlations Between MCI and Demographics

str(neighbourhoods_merged)
Classes ‘data.table’ and 'data.frame':  140 obs. of  36 variables:
 $ id                                : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Population                        : int  33312 32954 10360 10529 9456 22000 22156 10948 15535 11051 ...
 $ Land_area                         : num  29.81 4.52 3.31 2.49 2.86 ...
 $ Children                          : int  5060 7090 1730 1640 1805 4240 3555 1450 2120 1770 ...
 $ Seniors                           : int  4980 3560 1880 1730 1275 3585 4905 3045 3290 2025 ...
 $ Households                        : int  10280 9880 3280 3845 3220 7785 8510 4135 6260 3865 ...
 $ Average_household_size            : num  3.2 3.32 3.09 2.69 2.93 2.82 2.6 2.45 2.43 2.86 ...
 $ LICO                              : int  4550 7140 1485 1640 1695 4340 2470 1090 1250 660 ...
 $ LICO_prevalence                   : num  13.8 21.8 14.7 15.8 17.9 19.7 11.2 10.8 8.2 6 ...
 $ Renters                           : int  3275 5455 1245 1685 1470 3735 3925 1620 2745 595 ...
 $ Households_unaffordable           : int  3270 3715 1065 1185 1080 2730 2645 1325 1900 750 ...
 $ Unemployed_males                  : int  870 890 260 290 245 475 440 215 250 205 ...
 $ Unemployment_rate_males           : num  9.2 11.4 9.8 10.4 10.5 8.8 7.8 8.4 5.7 6.7 ...
 $ Public_transit_to_work            : int  4380 4110 1030 1345 1330 2665 2380 1200 2010 950 ...
 $ Walk_to_work                      : int  425 385 110 150 70 270 140 65 175 75 ...
 $ Average_employment_income         : num  33340 28126 34385 35988 33188 ...
 $ Social_assistance_recipients      : int  1290 2915 650 720 705 1710 840 410 370 145 ...
 $ Assault_reports                   : int  284 259 56 72 75 101 75 46 18 17 ...
 $ Robbery_reports                   : num  69 73 11 25 15 18 16 6 7 2 ...
 $ BE_reports                        : int  154 28 18 28 7 40 65 31 44 22 ...
 $ Theft_over_reports                : num  50 3 2 4 1 3 4 3 5 5 ...
 $ Auto_theft_reports                : int  495 73 46 54 37 57 51 16 18 20 ...
 $ Assault_ratio                     : num  0.00853 0.00786 0.00541 0.00684 0.00793 ...
 $ Auto_theft_ratio                  : num  0.01486 0.00222 0.00444 0.00513 0.00391 ...
 $ BE_ratio                          : num  0.00462 0.00085 0.00174 0.00266 0.00074 ...
 $ Robbery_ratio                     : num  0.00207 0.00222 0.00106 0.00237 0.00159 ...
 $ Theft_over_ratio                  : num  0.001501 0.000091 0.000193 0.00038 0.000106 ...
 $ Children_ratio                    : num  0.152 0.215 0.167 0.156 0.191 ...
 $ Seniors_ratio                     : num  0.149 0.108 0.181 0.164 0.135 ...
 $ Renters_ratio                     : num  0.319 0.552 0.38 0.438 0.457 ...
 $ Households_unaffordable_ratio     : num  0.318 0.376 0.325 0.308 0.335 ...
 $ Public_transit_to_work_ratio      : num  0.1315 0.1247 0.0994 0.1277 0.1407 ...
 $ Social_assistance_recipients_ratio: num  0.0387 0.0885 0.0627 0.0684 0.0746 ...
 $ Average_household_size_ratio      : num  1.28 1.33 1.24 1.08 1.18 ...
 $ Average_employment_income_ratio   : num  0.599 0.505 0.617 0.646 0.596 ...
 $ Unemployment_rate_males_ratio     : num  1.13 1.41 1.21 1.28 1.29 ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr "id"
neighbourhoods_ratios <-
  neighbourhoods_merged[, c("Assault_ratio",
                            "Auto_theft_ratio",
                            "BE_ratio",
                            "Robbery_ratio",
                            "Theft_over_ratio",
                            "Average_household_size",
                            "LICO_prevalence",
                            "Children_ratio",
                            "Seniors_ratio",
                            "Renters_ratio",
                            "Public_transit_to_work_ratio",
                            "Social_assistance_recipients_ratio",
                            "Average_household_size_ratio",
                            "Average_employment_income_ratio",
                            "Unemployment_rate_males_ratio")]

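Compute correlations. Between the MCI ratios and the demographic ratios I have selected, the correlations are weak to moderate: the largest in absolute value is 0.54, between Assault_ratio and LICO_prevalence. Average_household_size and Average_household_size_ratio have correlation 1, so one of the two is redundant.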

cor(as.matrix(neighbourhoods_ratios))
                                   Assault_ratio Auto_theft_ratio     BE_ratio Robbery_ratio
Assault_ratio                          1.0000000      0.123934572  0.644161811     0.8466125
Auto_theft_ratio                       0.1239346      1.000000000  0.177476235     0.2621782
BE_ratio                               0.6441618      0.177476235  1.000000000     0.5913394
Robbery_ratio                          0.8466125      0.262178218  0.591339393     1.0000000
Theft_over_ratio                       0.6465254      0.356012795  0.692406847     0.5872684
Average_household_size                -0.3220854      0.337037837 -0.318775580    -0.2446828
LICO_prevalence                        0.5444735     -0.115264927  0.273804880     0.4082138
Children_ratio                        -0.3295457      0.127253992 -0.454609155    -0.2649427
Seniors_ratio                         -0.3511057      0.028090974 -0.050108996    -0.2796248
Renters_ratio                          0.3868284     -0.194433863  0.179501393     0.2721201
Public_transit_to_work_ratio           0.1356827     -0.260180807 -0.026893672     0.1262704
Social_assistance_recipients_ratio     0.4263891      0.003831698  0.002276327     0.3276343
Average_household_size_ratio          -0.3220854      0.337037837 -0.318775580    -0.2446828
Average_employment_income_ratio       -0.1899689     -0.095820295  0.158920915    -0.1389625
Unemployment_rate_males_ratio          0.1713490      0.118402893 -0.067685554     0.1195029
                                   Theft_over_ratio Average_household_size LICO_prevalence
Assault_ratio                            0.64652538            -0.32208540       0.5444735
Auto_theft_ratio                         0.35601280             0.33703784      -0.1152649
BE_ratio                                 0.69240685            -0.31877558       0.2738049
Robbery_ratio                            0.58726840            -0.24468277       0.4082138
Theft_over_ratio                         1.00000000            -0.25953552       0.2641209
Average_household_size                  -0.25953552             1.00000000      -0.1531719
LICO_prevalence                          0.26412090            -0.15317188       1.0000000
Children_ratio                          -0.42530132             0.62537285      -0.1074141
Seniors_ratio                           -0.12776436             0.20648236      -0.3889159
Renters_ratio                            0.13890737            -0.55429267       0.6618480
Public_transit_to_work_ratio            -0.01837898            -0.55228205       0.2120016
Social_assistance_recipients_ratio      -0.01122991             0.05802879       0.6612614
Average_household_size_ratio            -0.25953552             1.00000000      -0.1531719
Average_employment_income_ratio          0.03930987            -0.24239320      -0.4525759
Unemployment_rate_males_ratio           -0.05914277             0.46818268       0.5901170
                                   Children_ratio Seniors_ratio Renters_ratio
Assault_ratio                         -0.32954565   -0.35110573     0.3868284
Auto_theft_ratio                       0.12725399    0.02809097    -0.1944339
BE_ratio                              -0.45460916   -0.05010900     0.1795014
Robbery_ratio                         -0.26494269   -0.27962482     0.2721201
Theft_over_ratio                      -0.42530132   -0.12776436     0.1389074
Average_household_size                 0.62537285    0.20648236    -0.5542927
LICO_prevalence                       -0.10741409   -0.38891587     0.6618480
Children_ratio                         1.00000000   -0.14210117    -0.1488113
Seniors_ratio                         -0.14210117    1.00000000    -0.4107354
Renters_ratio                         -0.14881132   -0.41073539     1.0000000
Public_transit_to_work_ratio          -0.18119889   -0.45232803     0.5736128
Social_assistance_recipients_ratio     0.29844040   -0.42513857     0.5328017
Average_household_size_ratio           0.62537285    0.20648236    -0.5542927
Average_employment_income_ratio       -0.06997932    0.18820202    -0.2084157
Unemployment_rate_males_ratio          0.38403635   -0.11106102     0.2099765
                                   Public_transit_to_work_ratio Social_assistance_recipients_ratio
Assault_ratio                                        0.13568268                        0.426389126
Auto_theft_ratio                                    -0.26018081                        0.003831698
BE_ratio                                            -0.02689367                        0.002276327
Robbery_ratio                                        0.12627040                        0.327634301
Theft_over_ratio                                    -0.01837898                       -0.011229909
Average_household_size                              -0.55228205                        0.058028794
LICO_prevalence                                      0.21200157                        0.661261447
Children_ratio                                      -0.18119889                        0.298440398
Seniors_ratio                                       -0.45232803                       -0.425138569
Renters_ratio                                        0.57361284                        0.532801690
Public_transit_to_work_ratio                         1.00000000                        0.189770279
Social_assistance_recipients_ratio                   0.18977028                        1.000000000
Average_household_size_ratio                        -0.55228205                        0.058028794
Average_employment_income_ratio                     -0.11905638                       -0.514634851
Unemployment_rate_males_ratio                       -0.16862450                        0.604488407
                                   Average_household_size_ratio Average_employment_income_ratio
Assault_ratio                                       -0.32208540                     -0.18996891
Auto_theft_ratio                                     0.33703784                     -0.09582029
BE_ratio                                            -0.31877558                      0.15892091
Robbery_ratio                                       -0.24468277                     -0.13896248
Theft_over_ratio                                    -0.25953552                      0.03930987
Average_household_size                               1.00000000                     -0.24239320
LICO_prevalence                                     -0.15317188                     -0.45257594
Children_ratio                                       0.62537285                     -0.06997932
Seniors_ratio                                        0.20648236                      0.18820202
Renters_ratio                                       -0.55429267                     -0.20841573
Public_transit_to_work_ratio                        -0.55228205                     -0.11905638
Social_assistance_recipients_ratio                   0.05802879                     -0.51463485
Average_household_size_ratio                         1.00000000                     -0.24239320
Average_employment_income_ratio                     -0.24239320                      1.00000000
Unemployment_rate_males_ratio                        0.46818268                     -0.43807384
                                   Unemployment_rate_males_ratio
Assault_ratio                                         0.17134895
Auto_theft_ratio                                      0.11840289
BE_ratio                                             -0.06768555
Robbery_ratio                                         0.11950289
Theft_over_ratio                                     -0.05914277
Average_household_size                                0.46818268
LICO_prevalence                                       0.59011703
Children_ratio                                        0.38403635
Seniors_ratio                                        -0.11106102
Renters_ratio                                         0.20997653
Public_transit_to_work_ratio                         -0.16862450
Social_assistance_recipients_ratio                    0.60448841
Average_household_size_ratio                          0.46818268
Average_employment_income_ratio                      -0.43807384
Unemployment_rate_males_ratio                         1.00000000
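The full matrix is hard to scan, so it can help to keep only the crime-by-demographic block and rank it by absolute value. The sketch below uses random stand-in data with a few of the column names from `neighbourhoods_ratios`; the values are not the real ones. The cut-off `r_crit` is the usual two-sided 5% threshold for a Pearson correlation with n observations, offered here as a rough guide only, since neighbourhoods are not independent samples.

```r
set.seed(42)
n <- 140  # number of neighbourhoods in neighbourhoods_merged

# Random stand-in data; column names borrowed from neighbourhoods_ratios
toy <- data.frame(
  Assault_ratio   = runif(n),
  Robbery_ratio   = runif(n),
  LICO_prevalence = runif(n),
  Renters_ratio   = runif(n)
)

crime <- c("Assault_ratio", "Robbery_ratio")
demo  <- setdiff(names(toy), crime)

# Rows are crime ratios, columns are demographic ratios
block <- cor(toy[, crime], toy[, demo])

# Long format, sorted by strength of association
ranked <- as.data.frame(as.table(block))
names(ranked) <- c("crime", "demographic", "r")
ranked <- ranked[order(-abs(ranked$r)), ]

# Smallest |r| that differs from zero at the 5% level for this n
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(t_crit^2 + n - 2)
ranked$above_cutoff <- abs(ranked$r) > r_crit
ranked
```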

Time Series

str(MCI_dt)
Classes ‘data.table’ and 'data.frame':  167525 obs. of  29 variables:
 $ X                  : num  -79.3 -79.5 -79.5 -79.6 -79.5 ...
 $ Y                  : num  43.7 43.8 43.7 43.7 43.7 ...
 $ Index_             : int  214 215 216 217 218 219 220 221 222 223 ...
 $ event_unique_id    : chr  "GO-20141948968" "GO-20141950728" "GO-20141956416" "GO-20141956867" ...
 $ occurrencedate     : chr  "2014-04-24T11:29:00.000Z" "2014-04-24T13:00:00.000Z" "2014-04-25T13:20:00.000Z" "2014-04-24T17:00:00.000Z" ...
 $ reporteddate       : chr  "2014-04-24T12:46:00.000Z" "2014-04-24T15:58:00.000Z" "2014-04-25T13:52:00.000Z" "2014-04-25T10:30:00.000Z" ...
 $ premisetype        : chr  "Commercial" "House" "Apartment" "Outside" ...
 $ ucr_code           : int  1610 2120 1430 1430 1430 1430 1430 1420 1420 1420 ...
 $ ucr_ext            : int  200 200 100 100 100 100 100 100 100 100 ...
 $ offence            : chr  "Robbery - Mugging" "B&E" "Assault" "Assault" ...
 $ reportedyear       : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ reportedmonth      : chr  "April" "April" "April" "April" ...
 $ reportedday        : int  24 24 25 25 25 25 3 3 3 3 ...
 $ reporteddayofyear  : int  114 114 115 115 115 115 123 123 123 123 ...
 $ reporteddayofweek  : chr  "Thursday" "Thursday" "Friday" "Friday" ...
 $ reportedhour       : int  12 15 13 10 16 22 3 4 4 4 ...
 $ occurrenceyear     : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ occurrencemonth    : chr  "April" "April" "April" "April" ...
 $ occurrenceday      : int  24 24 25 24 25 25 3 3 3 3 ...
 $ occurrencedayofyear: int  114 114 115 114 115 115 123 123 123 123 ...
 $ occurrencedayofweek: chr  "Thursday" "Thursday" "Friday" "Thursday" ...
 $ occurrencehour     : int  11 13 13 17 16 22 1 4 4 4 ...
 $ MCI                : chr  "Robbery" "Break and Enter" "Assault" "Assault" ...
 $ Division           : chr  "D55" "D31" "D12" "D23" ...
 $ Hood_ID            : int  68 24 30 4 114 73 64 79 79 79 ...
 $ Neighbourhood      : chr  "North Riverdale (68)" "Black Creek (24)" "Brookhaven-Amesbury (30)" "Rexdale-Kipling (4)" ...
 $ Lat                : num  43.7 43.8 43.7 43.7 43.7 ...
 $ Long               : num  -79.3 -79.5 -79.5 -79.6 -79.5 ...
 $ ObjectId           : int  1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "index")= int 
  ..- attr(*, "__reportedyear")= int  1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, "__MCI")= int  3 4 5 6 7 8 9 10 11 12 ...
MCI_dt_dates <- MCI_dt[,.(reportedyear,reportedmonth,MCI)]

MCI_dt_dates$reportedmonth <- match(MCI_dt_dates$reportedmonth, month.name)

str(MCI_dt_dates)
Classes ‘data.table’ and 'data.frame':  167525 obs. of  3 variables:
 $ reportedyear : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ reportedmonth: int  4 4 4 4 4 4 5 5 5 5 ...
 $ MCI          : chr  "Robbery" "Break and Enter" "Assault" "Assault" ...
 - attr(*, ".internal.selfref")=<externalptr> 
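`match()` against `month.name` is exact and case-sensitive, so any month string that is abbreviated or spelled differently becomes `NA` without a warning. A small sketch with made-up inputs shows the behaviour and a guard that could follow the conversion:

```r
# Two valid names, one lower-case name, one abbreviation (illustrative inputs)
months_chr <- c("April", "May", "april", "Sept")
months_int <- match(months_chr, month.name)
months_int  # 4 5 NA NA

# Guard: stop early if any month name failed to convert
clean <- match(c("April", "May"), month.name)
stopifnot(!anyNA(clean))
```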

Assault Reports Time Series

Assault_dt <- MCI_dt_dates[MCI=="Assault",.N, by = .(reportedyear,reportedmonth)]

Assault_dt <- Assault_dt[order(reportedyear,reportedmonth)]

str(Assault_dt)
Classes ‘data.table’ and 'data.frame':  60 obs. of  3 variables:
 $ reportedyear : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ reportedmonth: int  1 2 3 4 5 6 7 8 9 10 ...
 $ N            : int  1188 1162 1228 1232 1502 1556 1377 1469 1490 1409 ...
 - attr(*, ".internal.selfref")=<externalptr> 
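`ts()` assumes the values are consecutive months. The 60 rows above cover every month from 2014 to 2018, but grouping with `.N` produces no row for a month without reports, which would silently shift a rarer series. The following base R sketch, on made-up counts, fills such gaps with zeros before building the series:

```r
# Toy monthly counts with missing months (made-up numbers)
counts <- data.frame(
  reportedyear  = c(2014L, 2014L, 2015L),
  reportedmonth = c(1L, 3L, 2L),
  N             = c(2L, 1L, 1L)
)

# Every year-month combination that the series should contain
grid <- expand.grid(reportedmonth = 1:12, reportedyear = 2014:2015)

# Left join onto the grid, then replace missing counts with zero
full <- merge(grid, counts, by = c("reportedyear", "reportedmonth"), all.x = TRUE)
full$N[is.na(full$N)] <- 0L
full <- full[order(full$reportedyear, full$reportedmonth), ]

toy.ts <- ts(full$N, start = c(2014, 1), frequency = 12)
toy.ts
```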
Assault.ts <- ts(Assault_dt$N, start = 2014, frequency = 12)
autoplot(Assault.ts)

Assault.ts.components <- decompose(Assault.ts)
autoplot(Assault.ts.components)

Assault.ts.stl <- stl(Assault.ts, s.window = "periodic")
autoplot(Assault.ts.stl)
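The two decompositions can be inspected numerically as well as plotted. This sketch runs both on a synthetic monthly series (a linear trend plus a sine-shaped seasonal term plus noise, all invented) so that the pieces returned by `decompose()` and `stl()` are easy to recognise:

```r
set.seed(1)
months <- 1:60

# Synthetic series: level + trend + yearly cycle + noise (not the assault counts)
toy <- ts(1200 + 5 * months + 150 * sin(2 * pi * months / 12) + rnorm(60, sd = 40),
          start = c(2014, 1), frequency = 12)

# Classical additive decomposition based on moving averages
parts <- decompose(toy)
round(parts$figure)       # one seasonal effect per calendar month
sum(is.na(parts$trend))   # the centred moving average leaves both ends undefined

# Loess-based decomposition; "periodic" keeps the seasonal pattern identical every year
fit <- stl(toy, s.window = "periodic")
colnames(fit$time.series)
```

With `s.window = "periodic"` the seasonal component cannot drift over time, which is a reasonable simplification for a five-year series.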

Assault.ts.arima <- auto.arima(Assault.ts)

Assault.ts.arima
Series: Assault.ts 
ARIMA(0,1,2)(1,1,0)[12] 

Coefficients:
          ma1     ma2     sar1
      -1.1118  0.3465  -0.5257
s.e.   0.1713  0.1833   0.1308

sigma^2 estimated as 5311:  log likelihood=-269.35
AIC=546.7   AICc=547.65   BIC=554.1
Assault.ts.arima.forecast <- forecast(Assault.ts.arima, level = c(95), h = 12)

autoplot(Assault.ts.arima.forecast)
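In the notation ARIMA(p,d,q)(P,D,Q)[m], the model selected above, ARIMA(0,1,2)(1,1,0)[12], takes one ordinary difference and one seasonal difference at lag 12, then fits two moving-average terms (`ma1`, `ma2`) and one seasonal autoregressive term (`sar1`). The sketch below fits the same structure with base R's `arima()` on the built-in `AirPassengers` series, which is only a stand-in for the assault counts, and builds a normal-approximation 95% interval comparable to `level = c(95)`.

```r
# Built-in monthly series used as stand-in data
y <- log(AirPassengers)

# Same orders as the model chosen by auto.arima for the assault series
fit <- arima(y,
             order = c(0, 1, 2),
             seasonal = list(order = c(1, 1, 0), period = 12),
             method = "ML")
coef(fit)  # named ma1, ma2, sar1

# Twelve-month-ahead point forecasts with standard errors
fc <- predict(fit, n.ahead = 12)
lower <- fc$pred - qnorm(0.975) * fc$se
upper <- fc$pred + qnorm(0.975) * fc$se
cbind(forecast = fc$pred, lower = lower, upper = upper)
```

The interval widens with the horizon because the standard errors accumulate after differencing.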

Theft Over $5000 Time Series

Theft_Over_dt <- MCI_dt_dates[MCI=="Theft Over",.N, by = .(reportedyear,reportedmonth)]

Theft_Over_dt <- Theft_Over_dt[order(reportedyear,reportedmonth)]

Theft_Over.ts <- ts(Theft_Over_dt$N, start = 2014, frequency = 12)

Theft_Over.ts.components <- decompose(Theft_Over.ts)

Theft_Over.ts.stl <- stl(Theft_Over.ts, s.window = "periodic")
autoplot(Theft_Over.ts.components)

autoplot(Theft_Over.ts.stl)

Theft_Over.ts.arima <- auto.arima(Theft_Over.ts)

Theft_Over.ts.arima
Series: Theft_Over.ts 
ARIMA(1,1,1) 

Coefficients:
         ar1      ma1
      0.3119  -0.8859
s.e.  0.1429   0.0635

sigma^2 estimated as 176.4:  log likelihood=-235.79
AIC=477.58   AICc=478.02   BIC=483.81
---
title: "Toronto Police Service Major Crime Indicators 2014 to 2018"
output: 
  html_notebook: 
    toc: yes
---

# Author

* Jordan Bell
* July 10, 2019
* <https://jordanbell2357.github.io/MCI.nb.html>

# Sources and descriptions of datasets

## TPS MCI 2014 to 2018

Major Crime Indicators (MCI)

[Toronto Police Service Public Safety Data Portal](http://data.torontopolice.on.ca/)

"MCI_2014_to_2018.csv"

<http://data.torontopolice.on.ca/pages/glossary>:

> For the most part, the statistics on the following pages use an incident-based counting method. Generally, each type of major crime that occurred during an incident will be counted. For example, if an assault and a break and enter took place in the same incident, they would be counted once in each category. Statistics Canada also presents incident-based crime statistics, but generally counts only the most serious offence per incident. Some other police services present their crime statistics using the offence-based method, which counts all offences in each incident. It is important to note these differences when comparing our crime statistics to those provided by Statistics Canada or by other police agencies.

> **Assault**. The direct or indirect application of force to another person, or the attempt or threat to apply force to another person, without that person’s consent.
>
> **Robbery**. The act of taking property from another person or business by the use of force or intimidation in the presence of the victim.
>
> **Break and Enter**. The act of entering a place with the intent to commit an indictable offence therein.
>
> **Auto Theft**. The act of taking another person's vehicle (not including attempts). Auto Theft figures represent the number of vehicles stolen.
>
> **Theft Over**. The act of stealing property in excess of $5,000 (excluding auto theft).


## City of Toronto Neighbourhood Profiles

[Neighbourhood Profiles](https://portal0.cf.opendata.inter.sandbox-toronto.ca/dataset/neighbourhood-profiles/)

> Statistics Canada does not release data at the level of Toronto's social planning neighbourhoods. Neighbourhood level data for 2016 are initially calculated by summing data for the Census Tracts which comprise each neighbourhood.

"neighbourhood-profiles-2016-csv.csv"



# Loading, understanding and cleaning datasets

```{r}
library(data.table) #fread, setcolorder, rbindlist
library(sp) #used by rgdal
library(rgdal) #readOGR
library(ggplot2) #fortify
library(plyr) #join
library(scales) #scale_fill_distiller
library(ggmap) #theme_nothing
library(rgeos) #gCentroid
library(forecast) #autoplot ts, auto.arima
```


## MCI dataset and definitions of UCR codes

```{r}
MCI_dt <- fread("MCI_2014_to_2018.csv")
```

```{r}
str(MCI_dt)
```
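The `occurrencedate` and `reporteddate` columns are stored as character strings such as "2014-04-24T11:29:00.000Z". If a proper date-time is needed, they can be parsed as sketched below. For the first record, `occurrencedate` is "2014-04-24T11:29:00.000Z" and `occurrencehour` is 11, so the hour columns appear to match the clock time as written in the string. Whether the trailing `Z` really denotes UTC is not settled by the source, so `tz = "UTC"` is an assumption made only to keep the clock time unchanged.

```r
# Two timestamps copied from the first record of the dataset
stamp <- c("2014-04-24T11:29:00.000Z", "2014-04-24T12:46:00.000Z")

# %OS reads seconds with a fractional part; the trailing Z is matched literally
parsed <- as.POSIXct(stamp, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Hour and month recovered from the parsed values
hours  <- as.integer(format(parsed, "%H"))
month  <- as.integer(format(parsed, "%m"))
hours  # 11 12
```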

```{r}
unique(MCI_dt$premisetype)

sort(unique(MCI_dt$ucr_code))

sort(unique(MCI_dt$Hood_ID))

unique(MCI_dt$MCI)
```
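The glossary quoted earlier distinguishes three ways of counting. A toy example makes the difference concrete; the incidents are invented, and the `severity` ranking used for the "most serious offence" rule is hypothetical, chosen only so the code has an ordering to work with.

```r
# Invented offences: incident A has two assaults and a break and enter
offences <- data.frame(
  event_unique_id = c("A", "A", "A", "B", "C", "C"),
  MCI = c("Assault", "Assault", "Break and Enter", "Robbery", "Assault", "Theft Over"),
  stringsAsFactors = FALSE
)

# Offence-based: every offence row counts
offence_based <- table(offences$MCI)

# Incident-based as described in the glossary: each MCI type once per incident
incident_based <- table(unique(offences[, c("event_unique_id", "MCI")])$MCI)

# Most serious offence per incident, using a HYPOTHETICAL severity order
severity <- c("Robbery", "Assault", "Break and Enter", "Theft Over", "Auto Theft")
rank <- match(offences$MCI, severity)
most_serious <- table(severity[tapply(rank, offences$event_unique_id, min)])

offence_based
incident_based
most_serious
```

Here the three methods give 3, 2 and 2 assaults respectively, and only the first two record the break and enter at all, which is why totals from different agencies should not be compared directly.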

Uniform Crime Reporting Survey (UCR). [UCR Incident-Based Survey: RDC User Manual](https://gsg.uottawa.ca/data/teaching/crm/rdc_users_nanual_final_2013_feb_7rvd.pdf):

>* 1410 - Aggravated Assault – Level 3
>* 1420 - Assault with Weapon or Causing Bodily Harm – Level 2
>* 1430 - Assault – Level 1
>* 1440 - Unlawfully Causing Bodily Harm
>* 1450 - Discharge Firearm with Intent
>* 1455 - Using Firearm/Imitation of Firearm in commission of offence
>* 1457 - Pointing a Firearm
>* 1460 - Assault Against Peace-Public Officer
>* 1461 - Assault against Peace Officer with a Weapon or Causing Bodily Harm
>* 1462 - Aggravated Assault against Peace Officer
>* 1470 - Criminal Negligence Causing Bodily Harm
>* 1475 - Trap Likely To or Causing Bodily Harm
>* 1480 - Other Assaults
>* 1610 - Robbery
>* 2120 - Break and Enter
>* 2121 - Break and Enter to Steal Firearm
>* 2125 - Break and Enter of a Motor Vehicle to obtain a Firearm
>* 2130 - Theft over $5,000
>* 2132 - Theft over $5,000 from a Motor Vehicle
>* 2133 - Shoplifting over $5,000
>* 2135 - Theft of a Motor Vehicle

[Statistics Canada](https://www150.statcan.gc.ca/n1/pub/85-224-x/2008000/dd-eng.htm):

>Assault refers to three levels of physical assaults which include the following categories:
>
>Common assault, (section 265). This includes the Criminal Code category assault (level 1). This is the least serious form of assault and includes pushing, slapping, punching, and face-to-face verbal threats.
>
>Major assault levels 2 and 3, (sections 267, 268). This includes more serious forms of assault, i.e. assault with a weapon or causing bodily harm (level 2) and aggravated assault (level 3). Assault level 2 involves carrying, using or threatening to use a weapon against someone or causing someone bodily harm. Assault level 3 involves wounding, maiming, disfiguring or endangering the life of someone.

[Criminal Code (R.S.C., 1985, c. C-46)](https://laws-lois.justice.gc.ca/PDF/C-46.pdf)

> **bodily harm** means any hurt or injury to a person that
interferes with the health or comfort of the person and
that is more than merely transient or trifling in nature; (*lésions corporelles*)

> **Criminal negligence**  
> 219 (1) Every one is criminally negligent who  
> (a) in doing anything, or  
> (b) in omitting to do anything that it is his duty to do,  
> shows wanton or reckless disregard for the lives or safety  
> of other persons.

> **Assault**  
> 265 (1) A person commits an assault when  
> (a) without the consent of another person, he applies
force intentionally to that other person, directly or indirectly;  
> (b) he attempts or threatens, by an act or a gesture, to
apply force to another person, if he has, or causes that
other person to believe on reasonable grounds that he
has, present ability to effect his purpose; or  
> (c) while openly wearing or carrying a weapon or an
imitation thereof, he accosts or impedes another person or begs.

> **Breaking and entering with intent, committing offence or breaking out**  
> 348 (1) Every one who  
> (a) breaks and enters a place with intent to commit an indictable offence therein,  
> (b) breaks and enters a place and commits an indictable offence therein, or  
> (c) breaks out of a place after  
> (i) committing an indictable offence therein, or  
> (ii) entering the place with intent to commit an indictable offence therein,
is guilty
> (d) if the offence is committed in relation to a
dwelling-house, of an indictable offence and liable to
imprisonment for life, and  
> (e) if the offence is committed in relation to a place
other than a dwelling-house, of an indictable offence
and liable to imprisonment for a term not exceeding
ten years or of an offence punishable on summary
conviction.

> **Robbery**  
> 343 Every one commits robbery who  
> (a) steals, and for the purpose of extorting whatever is
stolen or to prevent or overcome resistance to the
stealing, uses violence or threats of violence to a person or property;  
> (b) steals from any person and, at the time he steals or
immediately before or immediately thereafter,
wounds, beats, strikes or uses any personal violence to
that person;  
> (c) assaults any person with intent to steal from him; or  
> (d) steals from any person while armed with an offensive weapon or imitation thereof.


[Breaking and Entering in Canada - 2002](http://www.publications.gc.ca/Collection-R/Statcan/85-002-XIE/0050485-002-XIE.pdf), Juristat, Statistics Canada, Catalogue no. 85-002-XPE, Vol. 24, no. 5, page 1:

> In 2002, over 31,000 persons were charged with B&E, the vast majority of whom were male (91%). Four in ten persons charged with B&E were youths. For property and violent crimes overall, youths represented 26% and 16% of persons charged, respectively.

Mathieu Charron, [Neighbourhood Characteristics and the Distribution of Police-reported Crime in the City of Toronto](https://www150.statcan.gc.ca/n1/en/pub/85-561-m/85-561-m2009018-eng.pdf?st=KOlsEaoK),
Canadian Centre for Justice Statistics, Statistics Canada,
Catalogue no. 85-561-M, no. 18.  
p. 11:

> Crimes reported to the police are not randomly distributed throughout Toronto, but are concentrated in certain
areas. An examination of local crime rates (the relationship between the number of crimes and the population
at a local level) shows that the rates of violent crime are higher near the downtown core and in the east and
northwest areas of the city (Map 5; See 'Mapping techniques' in the Methodology section for technical details.),
which correspond roughly to the neighbourhoods along the Canadian National railway and to the areas where
residents earn the lowest individual incomes (Map 3). There are some hot spots within these areas that have
higher rates. Some of these are Danforth, downtown east side and the intersections of Lawrence and Morningside,
Jane and Finch, and Jane and Eglinton.

p. 12:  

> In contrast, in the north area along Yonge Street, where residents earn a higher income, the violent crime rate is
much lower than average. The business district—the Bay Street area where most of the workers in the finance
and insurance industry are employed—has a violent crime rate well below the average for the city of Toronto. This
differs from most of the other Canadian cities that have been the focus of studies, where the violent crime rate in the
centre was high (Fitzgerald et al. 2004; Wallace et al. 2006; Kitchen 2006; Charron 2008). A similar situation was
noted in Montréal, where the crime hot spots were spread out in many areas of the city (Savoie et al. 2006). The
results suggest that the complex social geography of large cities like Toronto and Montréal is related to the spatial
organization of crime.

pp. 12-13:

> Several neighbourhood characteristics vary according to the local police-reported crime rate. Neighbourhoods with a
high rate of violent crime are more densely populated and have a higher percentage of residents living in multi-unit
dwellings. They also have the highest percentages of children (under the age of 15), renters, single-parent families
and visible minorities. The residents of these neighbourhoods are also less likely to have a university degree, more
likely to earn a lower wage, and more likely to live in low-income households.

p. 23:

> As for demographic characteristics, rates of harassment and common assault increase with the proportion of children (under 15) and of young men (aged 20 to 29). Rates of sexual assaults, threats, major assaults and robberies
decrease as the proportion of people aged 65 and older increases.

p. 24:  

> Motor vehicle theft rates are higher in neighbourhoods with higher proportions of children (under 15) and young men
aged 20 to 29. They are also higher in neighbourhoods where access to socio-economic resources is limited or
where there is a subway or train station, as well as in clusters of commercial and manufacturing activity.

p. 25:

> The spatial structure of breaking and entering varies essentially with urban and economic activity characteristics. More specifically, results show that breaking and entering is relatively more frequent in central neighbourhoods, with
high commercial activity, but less so in areas with high numbers of office jobs (Table 9).

p. 26:

> Uttering threats, major assault and drug offences showed the closest association with access to socio-economic
resources. Other strong links were noted for mischief, motor vehicle theft, robbery, sexual assault and common
assault. Only other thefts (which exclude shoplifting, theft from a motor vehicle and motor vehicle theft) and breaking
and entering were not significantly associated with access to socio-economic resources.
Economic vulnerability was associated with generally serious violent crimes: robbery, major assault, sexual assault
and uttering threats. It was not related to common assault, harassment or any type of property crime.


## City of Toronto Neighbourhood Profiles

Neighbourhoods are aggregated from census tracts.

```{r}
neighbourhoods_dt <- fread("neighbourhood-profiles-2016-csv.csv")

neighbourhoods_dt <- neighbourhoods_dt[, -c("_id", "Category", "Topic", "Data Source", "City of Toronto")]
```

Select census variables.
```{r}
v <- c("Neighbourhood Number",
       "Population, 2016",
       "Land area in square kilometres",
       "Children (0-14 years)",
       "Seniors (65+ years)",
       "Private households by household size",
       "Average household size",
       "In low income based on the Low-income cut-offs, after tax (LICO-AT)",
       "Prevalence of low income based on the Low-income cut-offs, after tax (LICO-AT) (%)",
       "Renter",
       "Spending 30% or more of income on shelter costs",
       "University certificate, diploma or degree at bachelor level or above",
       "Unemployed (Males)",
       "Unemployment rate (Males)",
       "Public transit",
       "Walked",
       "Employment income: Average amount ($)",
       "Social assistance benefits: Population with an amount")
```

Check that "Renter" matches exactly one row of `Characteristic`, so that it is counted for households only
and not for both households and persons.

```{r}
length(grep("Renter", neighbourhoods_dt$Characteristic))
```

```{r}
neighbourhoods_v <-
neighbourhoods_dt[Characteristic %in% v,]

neighbourhoods_v <- transpose(neighbourhoods_v)

head(neighbourhoods_v)
```

`Land_area` is in square kilometres.  
`Children` is the number of children aged 0 to 14.  
`Households_unaffordable` is the number of households spending 30% or more of income on shelter
costs: see Canada Mortgage and Housing Corporation, [About Affordable Housing in Canada](https://www.cmhc-schl.gc.ca/en/developing-and-renovating/develop-new-affordable-housing/programs-and-information/about-affordable-housing-in-canada).

```{r}
neighbourhoods_census <- neighbourhoods_v[!1,.(id=V1, Population=V2, Land_area=V3, Children=V4,
                                              Seniors=V5, Households=V6, Average_household_size=V7,
                                              LICO=V8, LICO_prevalence=V9, Renters=V10,
                                              Households_unaffordable=V11,
                                              Unemployed_males=V12, Unemployment_rate_males=V13,
                                              Public_transit_to_work=V14, Walk_to_work=V15,
                                              Average_employment_income=V16,
                                              Social_assistance_recipients=V17)]

head(neighbourhoods_census) 
```
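
The positional names `V1` to `V17` above depend on the order in which the selected rows appear in the file, because `%in%` keeps file order rather than the order of `v`. A minimal sketch of a name-based alternative follows, using `transpose()` with `make.names` and `keep.names`. The table `toy_profiles`, the neighbourhood names and all values are made up for illustration, and the sketch assumes each selected characteristic name is unique.

```r
library(data.table)

# Made-up miniature of the profile table: one row per characteristic,
# one column per neighbourhood (names and values are hypothetical).
toy_profiles <- data.table(
  Characteristic = c("Neighbourhood Number", "Population, 2016", "Renter"),
  `Hood A` = c("1", "1,200", "300"),
  `Hood B` = c("2", "950", "410")
)

# make.names turns the Characteristic values into column names;
# keep.names stores the old column names (the neighbourhoods) in a new column.
toy_wide <- transpose(toy_profiles,
                      keep.names = "Neighbourhood",
                      make.names = "Characteristic")

# Rename by name, so the result does not depend on row order in the file.
setnames(toy_wide,
         old = c("Neighbourhood Number", "Population, 2016", "Renter"),
         new = c("id", "Population", "Renters"))

toy_wide
```

With this approach a characteristic that is missing or misspelled raises an error in `setnames()` instead of silently shifting the columns.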

```{r}
neighbourhoods_census$id <- as.integer(neighbourhoods_census$id)

neighbourhoods_census$Population <- as.integer(gsub(",", "", neighbourhoods_census$Population))

neighbourhoods_census$Land_area <- as.numeric(neighbourhoods_census$Land_area)

neighbourhoods_census$Children <- as.integer(gsub(",", "", neighbourhoods_census$Children))

neighbourhoods_census$Seniors <- as.integer(gsub(",", "", neighbourhoods_census$Seniors))

neighbourhoods_census$Households <- as.integer(gsub(",", "", neighbourhoods_census$Households))

neighbourhoods_census$Average_household_size <- as.numeric(neighbourhoods_census$Average_household_size)

neighbourhoods_census$LICO <- as.integer(gsub(",", "", neighbourhoods_census$LICO))

neighbourhoods_census$LICO_prevalence <- as.numeric(neighbourhoods_census$LICO_prevalence)

neighbourhoods_census$Renters <- as.integer(gsub(",", "", neighbourhoods_census$Renters))

neighbourhoods_census$Households_unaffordable <- as.integer(gsub(",", "", neighbourhoods_census$Households_unaffordable))

neighbourhoods_census$Unemployed_males <- as.integer(gsub(",", "", neighbourhoods_census$Unemployed_males))

neighbourhoods_census$Unemployment_rate_males <- as.numeric(neighbourhoods_census$Unemployment_rate_males)

neighbourhoods_census$Public_transit_to_work <- as.integer(gsub(",", "", neighbourhoods_census$Public_transit_to_work))

neighbourhoods_census$Walk_to_work <- as.integer(gsub(",", "", neighbourhoods_census$Walk_to_work))

neighbourhoods_census$Average_employment_income <- as.numeric(gsub(",", "", neighbourhoods_census$Average_employment_income))

neighbourhoods_census$Social_assistance_recipients <- as.integer(gsub(",", "", neighbourhoods_census$Social_assistance_recipients))

neighbourhoods_census <- neighbourhoods_census[order(id)]

str(neighbourhoods_census)
```
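
The same conversions can be written more compactly by grouping the columns by target type. This is only a sketch on a made-up table (`toy_census` and its values are hypothetical); the helper `strip_commas` is an illustrative name, not part of data.table.

```r
library(data.table)

# Hypothetical character columns, as they look after transposing the profile table.
toy_census <- data.table(id         = c("2", "1"),
                         Population = c("1,200", "950"),
                         Land_area  = c("3.5", "2.25"))

int_cols <- c("id", "Population")  # counts and identifiers
num_cols <- c("Land_area")         # measurements and rates

# Remove thousands separators before converting.
strip_commas <- function(x) gsub(",", "", x, fixed = TRUE)

toy_census[, (int_cols) := lapply(.SD, function(x) as.integer(strip_commas(x))),
           .SDcols = int_cols]
toy_census[, (num_cols) := lapply(.SD, function(x) as.numeric(strip_commas(x))),
           .SDcols = num_cols]

setorder(toy_census, id)
str(toy_census)
```

Applying `strip_commas` to every column is harmless for values without commas, and it avoids an `NA` when a count unexpectedly contains a thousands separator.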


# Data manipulation

## MCI

```{r}
MCI_2018 <- MCI_dt[reportedyear==2018]

MCI_2018_nbd <- MCI_2018[, c("MCI", "Hood_ID")]

str(MCI_2018_nbd)
```

The MCI dataset classifies reports as
Assault, Auto Theft, Break and Enter, Robbery, and Theft Over.

```{r}
MCI_2018_grouped <- MCI_2018_nbd[,.(Number_of_reports=.N),by=.(id=Hood_ID, category=MCI)]

MCI_2018_grouped <- MCI_2018_grouped[order(id)]
```

```{r}
Assault_MCI <- MCI_2018_grouped[category=="Assault", .(Assault_reports=sum(Number_of_reports)), by=.(id)]

Auto_theft_MCI <- MCI_2018_grouped[category=="Auto Theft", .(Auto_theft_reports=sum(Number_of_reports)), by=.(id)]

BE_MCI <- MCI_2018_grouped[category=="Break and Enter", .(BE_reports=sum(Number_of_reports)), by=.(id)]

Robbery_MCI <- MCI_2018_grouped[category=="Robbery", .(Robbery_reports=sum(Number_of_reports)), by=.(id)]

Theft_over_MCI <- MCI_2018_grouped[category=="Theft Over", .(Theft_over_reports=sum(Number_of_reports)), by=.(id)]
```
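
As an alternative to five filtered tables, the counts can be reshaped to one row per neighbourhood in a single step with `dcast()`. The sketch below uses a made-up table `toy_mci` with the same two columns as `MCI_2018_nbd`; the values are hypothetical.

```r
library(data.table)

# Hypothetical reports: one row per report, as in MCI_2018_nbd.
toy_mci <- data.table(
  Hood_ID = c(1, 1, 1, 2, 2, 3),
  MCI     = c("Assault", "Assault", "Robbery", "Assault", "Theft Over", "Auto Theft")
)

# One row per neighbourhood, one column per category;
# categories with no reports in a neighbourhood are filled with 0.
toy_wide_counts <- dcast(toy_mci, Hood_ID ~ MCI,
                         fun.aggregate = length,
                         value.var = "MCI",
                         fill = 0L)

toy_wide_counts
```

Because missing combinations are filled with zero during the reshape, no separate `NA` replacement is needed for the count columns afterwards.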


## Merge demographic table and MCI table

```{r}
neighbourhoods_merged <- neighbourhoods_census

neighbourhoods_merged <- merge(neighbourhoods_merged, Assault_MCI, by="id", all=TRUE)

neighbourhoods_merged <- merge(neighbourhoods_merged, Robbery_MCI, by="id", all=TRUE)

neighbourhoods_merged <- merge(neighbourhoods_merged, BE_MCI, by="id", all=TRUE)

neighbourhoods_merged <- merge(neighbourhoods_merged, Theft_over_MCI, by="id", all=TRUE)

neighbourhoods_merged <- merge(neighbourhoods_merged, Auto_theft_MCI, by="id", all=TRUE)

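# Robbery and Theft Over have no reports in some neighbourhoods, so the merge leaves NA there; count those as 0.
# Note: this replaces every NA in the table, including any NA in the census columns.
neighbourhoods_merged[is.na(neighbourhoods_merged)] <- 0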

str(neighbourhoods_merged)
```
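
The chain of merges can also be expressed with `Reduce()`, and the zero fill can be limited to the report columns with `setnafill()`, so that a genuine `NA` in a census column would not be overwritten. A minimal sketch on made-up tables (all names prefixed `toy_` and all values are hypothetical):

```r
library(data.table)

toy_census  <- data.table(id = 1:3, Population = c(1000L, 2000L, 1500L))
toy_assault <- data.table(id = 1:3, Assault_reports = c(10L, 5L, 7L))
toy_robbery <- data.table(id = c(1L, 3L), Robbery_reports = c(2L, 1L))  # no row for id 2

# Full outer merge of all tables on "id", one after the other.
toy_merged <- Reduce(function(x, y) merge(x, y, by = "id", all = TRUE),
                     list(toy_census, toy_assault, toy_robbery))

# Replace NA by 0 only in the report columns (modifies toy_merged by reference).
count_cols <- c("Assault_reports", "Robbery_reports")
setnafill(toy_merged, fill = 0L, cols = count_cols)

toy_merged
```
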
Calculate the ratio of MCI reports to population for each neighbourhood.
```{r}
neighbourhoods_merged$Assault_ratio <- neighbourhoods_merged$Assault_reports/neighbourhoods_merged$Population

neighbourhoods_merged$Auto_theft_ratio <- neighbourhoods_merged$Auto_theft_reports/neighbourhoods_merged$Population

neighbourhoods_merged$BE_ratio <- neighbourhoods_merged$BE_reports/neighbourhoods_merged$Population

neighbourhoods_merged$Robbery_ratio <- neighbourhoods_merged$Robbery_reports/neighbourhoods_merged$Population

neighbourhoods_merged$Theft_over_ratio <- neighbourhoods_merged$Theft_over_reports/neighbourhoods_merged$Population
```

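Calculate ratios of census variables. Counts are divided by the neighbourhood's population or number of households; averages and rates are divided by the mean of that variable across neighbourhoods.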

```{r}
toronto_avg_household_size <- neighbourhoods_merged[, mean(Average_household_size)]

toronto_avg_employment_income <- neighbourhoods_merged[, mean(Average_employment_income)]

toronto_avg_unemployment_rate_males <- neighbourhoods_merged[, mean(Unemployment_rate_males)]

toronto_avg_household_size

toronto_avg_employment_income

toronto_avg_unemployment_rate_males
```
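
The `toronto_avg_*` values above are unweighted means across neighbourhoods, so a small neighbourhood counts as much as a large one. If a city-wide figure is wanted instead, a weighted mean is one option. The sketch below uses two made-up neighbourhoods; weighting by the number of households is an assumption that suits average household size, and other variables would need a different weight.

```r
# Two hypothetical neighbourhoods of very different size.
toy <- data.frame(Households             = c(100, 300),
                  Average_household_size = c(2, 3))

# Unweighted mean of the neighbourhood averages: (2 + 3) / 2
unweighted <- mean(toy$Average_household_size)

# Weighted by households: (100 * 2 + 300 * 3) / 400
weighted <- weighted.mean(toy$Average_household_size, w = toy$Households)

c(unweighted = unweighted, weighted = weighted)
```

Either choice works as a scaling constant for the ratios that follow, since dividing by a constant does not change correlations; the distinction only matters when the value is read as a Toronto-wide average.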

```{r}
neighbourhoods_merged$Children_ratio <- neighbourhoods_merged$Children/neighbourhoods_merged$Population

neighbourhoods_merged$Seniors_ratio <- neighbourhoods_merged$Seniors/neighbourhoods_merged$Population

neighbourhoods_merged$Renters_ratio <- neighbourhoods_merged$Renters/neighbourhoods_merged$Households

neighbourhoods_merged$Households_unaffordable_ratio <- neighbourhoods_merged$Households_unaffordable/neighbourhoods_merged$Households

neighbourhoods_merged$Public_transit_to_work_ratio <- neighbourhoods_merged$Public_transit_to_work/neighbourhoods_merged$Population

neighbourhoods_merged$Social_assistance_recipients_ratio <- neighbourhoods_merged$Social_assistance_recipients/neighbourhoods_merged$Population

neighbourhoods_merged$Average_household_size_ratio <- neighbourhoods_merged$Average_household_size/toronto_avg_household_size

neighbourhoods_merged$Average_employment_income_ratio <- neighbourhoods_merged$Average_employment_income/toronto_avg_employment_income

neighbourhoods_merged$Unemployment_rate_males_ratio <- neighbourhoods_merged$Unemployment_rate_males/toronto_avg_unemployment_rate_males
```


# Choropleths by neighbourhood

## Shapefile

**Neighbourhoods (WGS84)**. City of Toronto, Social Development, Finance & Administration   

<https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#a45bd45a-ede8-730e-1abc-93105b2c439f>  

"neighbourhoods_planning_areas_wgs84.zip"

```{r}
nbds <- readOGR("C:/Users/14165/Desktop/Shapefiles/neighbourhoods_planning_areas_wgs84", "NEIGHBORHOODS_WGS84")
```
Add "id" column
```{r}
nbds@data$id <- as.integer(nbds@data$AREA_S_CD)
```
Make centroids of each neighbourhood, for placing labels when plotting
```{r}
nbds.centroids  <- as.data.frame(gCentroid(nbds, byid = TRUE))
```
Add "id" column
```{r}
nbds.centroids$id <- nbds@data$id
```
Convert the polygons to a data frame for ggplot2 and join the attribute table back on "id"
```{r}
nbds.points = fortify(nbds, region = "id")

nbds.df = join(nbds.points, nbds@data, by = "id")
```

## Merge neighbourhood shapefile and dataframe

```{r}
nbds_MCI <- merge(nbds.df, neighbourhoods_merged, by = "id")
```

## Low income cut-off (LICO) prevalence

Make and plot choropleth

```{r}
p.LICO_percent <- ggplot() +
  geom_polygon(data = nbds_MCI, 
               aes(x = long, y = lat, group = group, fill = LICO_prevalence/100), 
               color = "black", size = 0.2) + 
  coord_map() + 
  scale_fill_distiller(name="LICO prevalence", labels=percent_format(accuracy=1), palette = "RdPu", trans = "reverse", breaks = pretty_breaks(n = 10)) + 
  theme_nothing(legend = TRUE) + 
  labs(title="Prevalence of low income (LICO-AT) in Toronto by neighbourhood, 2016 Census") + 
  geom_text(aes(x=x,y=y, group=NULL, label=id), data = nbds.centroids, size = 2)

p.LICO_percent + guides(fill = guide_legend(reverse = TRUE))
```

## Assaults

Make and plot choropleth
```{r}
p.assaults <- ggplot() +
  geom_polygon(data = nbds_MCI, 
               aes(x = long, y = lat, group = group, fill = Assault_reports), 
               color = "black", size = 0.2) + 
  coord_map() + 
  scale_fill_distiller(name="Assaults", palette = "YlOrRd", trans = "reverse", breaks = pretty_breaks(n = 8)) + 
  theme_nothing(legend = TRUE) + 
  labs(title="Number of assault reports in Toronto by neighbourhood, 2018") + 
  geom_text(aes(x=x,y=y, group=NULL, label=id), data = nbds.centroids, size = 2)

p.assaults + guides(fill = guide_legend(reverse = TRUE))
```

```{r}
p.assaults_ratio <- ggplot() +
  geom_polygon(data = nbds_MCI, 
               aes(x = long, y = lat, group = group, fill = Assault_ratio), 
               color = "black", size = 0.2) + 
  coord_map() + 
  scale_fill_distiller(name="Assaults/Population", palette = "Reds", trans = "reverse", breaks = pretty_breaks(n = 8)) + 
  theme_nothing(legend = TRUE) + 
  labs(title="Number of assault reports/population count in Toronto by neighbourhood, 2018") + 
  geom_text(aes(x=x,y=y, group=NULL, label=id), data = nbds.centroids, size = 2)

p.assaults_ratio + guides(fill = guide_legend(reverse = TRUE))
```



# Density Maps of MCI Using stat_density2d

<https://stats.stackexchange.com/questions/31726/scatterplot-with-contour-heat-overlay>

<https://gist.github.com/lmullen/8375785>

<https://gis.stackexchange.com/questions/165974/r-fortify-causing-polygons-to-tear>

```{r}
MCI_Assault_XY <- MCI_dt[MCI=="Assault",.(X,Y)]

MCI_Auto_Theft_XY <- MCI_dt[MCI=="Auto Theft",.(X,Y)]

MCI_BE_XY <- MCI_dt[MCI=="Break and Enter",.(X,Y)]

MCI_Robbery_XY <- MCI_dt[MCI=="Robbery", .(X,Y)]

MCI_Theft_Over_XY <- MCI_dt[MCI=="Theft Over", .(X,Y)]
```

## Shapefiles

Read shapefiles

```{r}
torontoBoundary_wgs84 <- readOGR("C:/Users/14165/Desktop/Shapefiles/torontoBoundary_wgs84", "citygcs_regional_mun_wgs84")

TTC_subway_lines_wgs84 <- readOGR("C:/Users/14165/Desktop/Shapefiles/TTC_subway_lines_wgs84", "TTC_SUBWAY_LINES_WGS84")

centreline_wgs84 <- readOGR("C:/Users/14165/Desktop/Shapefiles/centreline_wgs84", "CENTRELINE_WGS84")
```

> Every linear feature has feature code (FCODE) defined as follow:
>
> 201100	Highway  
> 201101	Highway Ramp  
> 201200	Major Arterial Road  
> 201201	Major Arterial Road Ramp  
> 201300	Minor Arterial Road  
> 201301	Minor Arterial Road Ramp  
> 201400	Collector Road  
> 201401	Collector Road Ramp  
> 201500	Local Road  
> 201600	Other Road  
> 201601	Other Ramp  
> 201700	Laneways  
> 201800	Pending  
> 201803  Access Road  
> 201801  Busway  
> 202001	Major Railway  
> 202002	Minor Railway  
> 202003	Railway under construction/proposed  
> 203001	River  
> 203002	Creek/Tributary  
> 204001	Trail  
> 204002	Walkway  
> 205001	Hydro Line  
> 206001	Major Shoreline  
> 206002	Minor Shoreline (Land locked)  


```{r}
centreline_wgs84_major <- centreline_wgs84[centreline_wgs84@data$FCODE %in% c(201100, 201200, 201300, 201400),]
```

```{r}
torontoBoundary_wgs84.df <- fortify(torontoBoundary_wgs84)

TTC_subway_lines_wgs84.df <- fortify(TTC_subway_lines_wgs84)

centreline_wgs84_major.df <- fortify(centreline_wgs84_major)
```

```{r}
ggplot()+geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .2)
```

## Assault Reports in Toronto, 2014-2018

```{r}
ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Assault_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red")+
 theme_nothing(legend = TRUE) + 
  labs(title="Assault Reports in Toronto, 2014-2018")
```

## Auto Theft Reports in Toronto, 2014-2018

```{r}
ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Auto_Theft_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
 theme_nothing(legend = TRUE) + 
  labs(title="Auto Theft Reports in Toronto, 2014-2018")
```

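Repeat the map with an explicit bandwidth `h = 0.05` (in the units of `X` and `Y`) and a finer evaluation grid `n = 300`, instead of the default bandwidth.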

```{r}
ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Auto_Theft_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, h=0.05, n=300, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
 theme_nothing(legend = TRUE) + 
  labs(title="Auto Theft Reports in Toronto, 2014-2018. h=0.05")
```
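
When `h` is not given, `stat_density2d` estimates the bandwidth with `MASS::bandwidth.nrd()` and evaluates the density with `MASS::kde2d()`. The sketch below reproduces that step outside ggplot2 on simulated points (the coordinates are random numbers, not MCI data), which makes it possible to see what the default bandwidth would be before overriding it.

```r
library(MASS)

set.seed(1)
# Simulated point cloud standing in for the X and Y columns (hypothetical values).
x <- rnorm(500, mean = -79.4, sd = 0.10)
y <- rnorm(500, mean = 43.7,  sd = 0.05)

# Default bandwidth: one value per axis, from the normal reference rule.
h_default <- c(bandwidth.nrd(x), bandwidth.nrd(y))
h_default

# Density on a 50 x 50 grid with the default and with a fixed bandwidth.
dens_default <- kde2d(x, y, h = h_default, n = 50)
dens_fixed   <- kde2d(x, y, h = 0.05, n = 50)

# Compare the peak heights of the two estimates.
c(default = max(dens_default$z), fixed = max(dens_fixed$z))
```

A bandwidth smaller than the default gives a bumpier surface with more local peaks, and a larger one gives a smoother surface; comparing `h_default` with the chosen value shows in which direction the map was changed.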


## Break and Enter Reports in Toronto, 2014-2018

```{r}
ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_BE_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
 theme_nothing(legend = TRUE) + 
  labs(title="Break and Enter Reports in Toronto, 2014-2018")
```

## Robbery Reports in Toronto, 2014-2018

```{r}
ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Robbery_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
 theme_nothing(legend = TRUE) + 
  labs(title="Robbery Reports in Toronto, 2014-2018")
```

## Theft Over $5000 Reports in Toronto, 2014-2018

```{r}
ggplot() + 
  geom_polygon(data = torontoBoundary_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'black', size = 1, fill=NA) +
  geom_path(data = TTC_subway_lines_wgs84.df, aes(x = long, y = lat, group = group),
          color = 'red', size = 1) +
  geom_path(data = centreline_wgs84_major.df, aes(x = long, y = lat, group = group),
          color = 'black', size = .1) +
  stat_density2d(data=MCI_Theft_Over_XY, aes(x=X, y=Y, fill=..level..), alpha=0.2, geom = 'polygon', colour = 'black', contour=TRUE) +
  scale_fill_continuous(low="yellow",high="red") +
 theme_nothing(legend = TRUE) + 
  labs(title="Theft Over $5000 Reports in Toronto, 2014-2018")
```
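
The density maps above differ only in the point table, the title and, in one case, the bandwidth, so the shared layers could be wrapped in a function. This is a sketch, not the code used for the figures: `plot_mci_density` is a hypothetical helper, `theme_void()` stands in for `theme_nothing(legend = TRUE)` so that only ggplot2 is required, `after_stat(level)` is the newer spelling of `..level..`, and the toy data frames are made up.

```r
library(ggplot2)

# Hypothetical helper: boundary, subway and road layers plus a 2D density overlay.
plot_mci_density <- function(points, title, boundary, subway, roads, h = NULL) {
  ggplot() +
    geom_polygon(data = boundary, aes(x = long, y = lat, group = group),
                 colour = "black", fill = NA) +
    geom_path(data = subway, aes(x = long, y = lat, group = group), colour = "red") +
    geom_path(data = roads, aes(x = long, y = lat, group = group), colour = "black") +
    stat_density_2d(data = points, aes(x = X, y = Y, fill = after_stat(level)),
                    geom = "polygon", alpha = 0.2, colour = "black", h = h) +
    scale_fill_continuous(low = "yellow", high = "red") +
    theme_void() +
    labs(title = title)
}

# Made-up inputs with the same column names as the fortified shapefiles.
set.seed(42)
toy_points   <- data.frame(X = rnorm(200, -79.4, 0.05), Y = rnorm(200, 43.7, 0.03))
toy_boundary <- data.frame(long = c(-79.6, -79.2, -79.2, -79.6),
                           lat  = c(43.6, 43.6, 43.8, 43.8), group = 1)
toy_line     <- data.frame(long = c(-79.5, -79.3), lat = c(43.7, 43.7), group = 1)

p <- plot_mci_density(toy_points, "Toy density map",
                      toy_boundary, toy_line, toy_line)
```

With the real objects, a call such as `plot_mci_density(MCI_Robbery_XY, "Robbery Reports in Toronto, 2014-2018", torontoBoundary_wgs84.df, TTC_subway_lines_wgs84.df, centreline_wgs84_major.df)` would replace one of the chunks above.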


# Correlations Between MCI and Demographics

```{r}
str(neighbourhoods_merged)
```

```{r}
neighbourhoods_ratios <-
  neighbourhoods_merged[, c("Assault_ratio",
                            "Auto_theft_ratio",
                            "BE_ratio",
                            "Robbery_ratio",
                            "Theft_over_ratio",
                            "Average_household_size",
                            "LICO_prevalence",
                            "Children_ratio",
                            "Seniors_ratio",
                            "Renters_ratio",
                            "Public_transit_to_work_ratio",
                            "Social_assistance_recipients_ratio",
                            "Average_household_size_ratio",
                            "Average_employment_income_ratio",
                            "Unemployment_rate_males_ratio")]
```

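Compute the pairwise Pearson correlations (the default method of `cor`). The correlations between the MCI ratios and the demographic ratios I have selected are only weak. Note that `Average_household_size` and `Average_household_size_ratio` differ only by a constant factor, so they are perfectly correlated with each other.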

```{r}
cor(as.matrix(neighbourhoods_ratios))
```
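
The full matrix also contains the correlations among the demographic variables themselves. To read off only the crime-versus-demographic block, `cor()` can be given two matrices, and `cor.test()` adds a confidence interval for a single pair. The sketch below runs on random numbers (the table `toy_ratios` is made up, so its correlations are meaningless); only the column names are borrowed from `neighbourhoods_ratios`.

```r
library(data.table)

set.seed(1)
# Random stand-in for neighbourhoods_ratios (hypothetical values).
toy_ratios <- data.table(Assault_ratio   = runif(20),
                         Robbery_ratio   = runif(20),
                         LICO_prevalence = runif(20),
                         Renters_ratio   = runif(20))

crime_cols <- c("Assault_ratio", "Robbery_ratio")
demo_cols  <- c("LICO_prevalence", "Renters_ratio")

# Rows are crime ratios, columns are demographic ratios.
crime_vs_demo <- cor(as.matrix(toy_ratios[, ..crime_cols]),
                     as.matrix(toy_ratios[, ..demo_cols]))
round(crime_vs_demo, 2)

# Test and 95% confidence interval for one pair.
ct <- cor.test(toy_ratios$Assault_ratio, toy_ratios$LICO_prevalence)
ct$conf.int
```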


# Time Series

```{r}
str(MCI_dt)
```

```{r}
MCI_dt_dates <- MCI_dt[,.(reportedyear,reportedmonth,MCI)]

MCI_dt_dates$reportedmonth <- match(MCI_dt_dates$reportedmonth, month.name)

str(MCI_dt_dates)
```
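
`match(x, month.name)` converts English month names to the numbers 1 to 12. Since `ts()` assumes one observation for every month, and counting with `.N` produces no row for a month without reports, it can be worth joining the counts to a complete year-month grid first. A minimal sketch on a made-up table (`toy_reports` and its values are hypothetical):

```r
library(data.table)

# Hypothetical reports: none in April 2014.
toy_reports <- data.table(
  reportedyear  = c(2014, 2014, 2014, 2014),
  reportedmonth = c("January", "January", "March", "February")
)

# Month name -> month number
toy_reports[, month_num := match(reportedmonth, month.name)]

toy_counts <- toy_reports[, .N, by = .(reportedyear, month_num)]

# Complete grid for the period of interest (here January to April 2014).
toy_grid <- CJ(reportedyear = 2014, month_num = 1:4)

# Keep every grid row; months without reports get N = 0.
toy_full <- toy_counts[toy_grid, on = .(reportedyear, month_num)]
toy_full[is.na(N), N := 0L]

toy_ts <- ts(toy_full$N, start = c(2014, 1), frequency = 12)
toy_ts
```

For frequent categories such as Assault every month is likely to have reports, so the join changes nothing; it acts as a safeguard for rarer categories.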

## Assault Reports Time Series

```{r}
Assault_dt <- MCI_dt_dates[MCI=="Assault",.N, by = .(reportedyear,reportedmonth)]

Assault_dt <- Assault_dt[order(reportedyear,reportedmonth)]

str(Assault_dt)
```

```{r}
Assault.ts <- ts(Assault_dt$N, start = 2014, frequency = 12)
```

```{r}
autoplot(Assault.ts)
```

```{r}
Assault.ts.components <- decompose(Assault.ts)
```

```{r}
autoplot(Assault.ts.components)
```

```{r}
Assault.ts.stl <- stl(Assault.ts, s.window = "periodic")
```

```{r}
autoplot(Assault.ts.stl)
```

```{r}
Assault.ts.arima <- auto.arima(Assault.ts)

Assault.ts.arima
```

```{r}
Assault.ts.arima.forecast <- forecast(Assault.ts.arima, level = c(95), h = 12)

autoplot(Assault.ts.arima.forecast)
```

## Theft Over $5000 Time Series

```{r}
Theft_Over_dt <- MCI_dt_dates[MCI=="Theft Over",.N, by = .(reportedyear,reportedmonth)]

Theft_Over_dt <- Theft_Over_dt[order(reportedyear,reportedmonth)]

Theft_Over.ts <- ts(Theft_Over_dt$N, start = 2014, frequency = 12)

Theft_Over.ts.components <- decompose(Theft_Over.ts)

Theft_Over.ts.stl <- stl(Theft_Over.ts, s.window = "periodic")
```

```{r}
autoplot(Theft_Over.ts.components)
```

```{r}
autoplot(Theft_Over.ts.stl)
```


```{r}
Theft_Over.ts.arima <- auto.arima(Theft_Over.ts)

Theft_Over.ts.arima
```
