ismailolatunji/Crime-Analysis-Using-PySpark-and-Python

Crime Analysis Using PySpark and Python

Project Overview

This project performs crime data analysis using Apache Spark and Python. The dataset includes police-recorded crimes and offense codes. The goal is to process, clean, and analyze the data to identify key crime trends and patterns, which can aid law enforcement agencies, policymakers, and researchers in understanding crime dynamics and making data-driven decisions.

Significance

Understanding crime patterns is crucial for law enforcement and urban planning. By analyzing historical crime data, authorities can:

  • Identify high-crime areas and allocate resources accordingly.

  • Detect seasonal or time-based crime trends.

  • Improve public safety policies through data-driven insights.

  • Assist in predictive policing and crime prevention strategies.

  • Leverage machine learning for predictive modeling of future crime trends.

Datasets Used

  1. Police Recorded Crime (PRC) Dataset: Contains records of reported crimes across different categories from 2013 to 2023.

  2. Offense Codes Dataset: Provides descriptions and classifications for different types of offenses.

Features and Steps

Environment Setup:

  • Installation of Java, Apache Spark, and necessary Python libraries.

  • Configuration of SparkSession for distributed data processing.
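
The session setup could look roughly like the sketch below. The application name, master URL, and shuffle-partition count are assumptions chosen for a single-machine run, not values taken from this repository.

```python
# Hypothetical settings for a local, single-machine run; tune for a real cluster.
SPARK_SETTINGS = {
    "spark.app.name": "CrimeAnalysis",     # assumed application name
    "spark.master": "local[*]",            # use every local core
    "spark.sql.shuffle.partitions": "8",   # fewer shuffle partitions than the default 200
}


def create_spark_session(settings=SPARK_SETTINGS):
    """Build (or reuse) a SparkSession from a dict of Spark properties."""
    # Imported here so the settings can be inspected without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `spark = create_spark_session()` returns the active session; `spark.stop()` releases it once the analysis is finished.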

Data Preprocessing

  • Cleaning the dataset.
  • Renaming and standardising the data.
  • Handling missing/null values to ensure data integrity.
  • Merging multiple data sources into a unified dataset.
  • Creating Spark DataFrames and temporary SQL tables for analysis.
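
The preprocessing steps above might be sketched as follows. The column names (`offence_code`, `number_of_offences`), the view name `crimes`, and both helper functions are hypothetical; they assume CSV headers such as "Offence Code" and "Number of Offences", which may not match the actual files.

```python
import re


def standardise_column(name):
    """Convert a raw header such as 'Force Name' to snake_case: 'force_name'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def preprocess(crimes, offence_codes, view_name="crimes"):
    """Clean, merge, and register two Spark DataFrames for SQL analysis.

    The column names used here are assumptions about the CSV headers.
    """
    # Rename every column to a consistent snake_case form.
    crimes = crimes.toDF(*[standardise_column(c) for c in crimes.columns])
    offence_codes = offence_codes.toDF(
        *[standardise_column(c) for c in offence_codes.columns]
    )

    # Drop rows that lack the join key or the offence count.
    crimes = crimes.na.drop(subset=["offence_code", "number_of_offences"])

    # A left join keeps crime rows even when the lookup has no matching code.
    merged = crimes.join(offence_codes, on="offence_code", how="left")

    # Expose the merged data to Spark SQL as a temporary view.
    merged.createOrReplaceTempView(view_name)
    return merged
```

If both inputs share columns other than the join key, select or rename them before the join so the merged DataFrame has no ambiguous column names.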

Exploratory Data Analysis

  • Identifying the most common offenses.

  • Aggregating crime counts across various dimensions (time, location, type of crime).

  • Visualizing trends using Matplotlib and Seaborn.
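
One possible shape for the aggregate-then-plot workflow is sketched below. It is not the repository's code: the `number_of_offences` column and all three helper names are assumptions, and the heavy aggregation stays in Spark while only the small result is collected for plotting.

```python
def shorten(label, width=30):
    """Trim long offence descriptions so axis labels stay readable."""
    label = str(label)
    return label if len(label) <= width else label[: width - 3] + "..."


def top_totals(df, dimension, n=10, count_col="number_of_offences"):
    """Sum offence counts per value of `dimension` and keep the n largest."""
    from pyspark.sql import functions as F  # lazy import: needs PySpark

    return (
        df.groupBy(dimension)
        .agg(F.sum(count_col).alias("total_offences"))
        .orderBy(F.desc("total_offences"))
        .limit(n)
    )


def plot_totals(df, dimension, n=10):
    """Draw a horizontal bar chart of the n largest totals."""
    import matplotlib.pyplot as plt  # lazy imports: plotting dependencies
    import seaborn as sns

    # The aggregated result has at most n rows, so collecting it is cheap.
    pdf = top_totals(df, dimension, n).toPandas()
    pdf[dimension] = pdf[dimension].map(shorten)

    ax = sns.barplot(data=pdf, x="total_offences", y=dimension, color="steelblue")
    ax.set_xlabel("Total recorded offences")
    ax.set_ylabel(dimension.replace("_", " ").title())
    plt.tight_layout()
    return ax
```

For example, `plot_totals(merged, "offence_description")` would chart the ten most common offence types, assuming `merged` is the cleaned Spark DataFrame.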

Key Research Questions:

1. What is the most frequently committed offense?

2. Which law enforcement agencies (Force Names) have the highest number of recorded offences?

3. How does the distribution of offences vary across different financial quarters?
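
The three questions above could be expressed as Spark SQL queries along these lines. The view name `crimes` and the columns `offence_description`, `force_name`, `financial_quarter`, and `number_of_offences` are assumptions. The queries also assume each row holds an aggregated count, so totals are computed with `SUM` rather than `COUNT(*)`.

```python
# Assumed temporary view name; change it to match the registered table.
VIEW = "crimes"

QUERIES = {
    # Question 1: the single offence type with the highest total.
    "most_frequent_offence": f"""
        SELECT offence_description, SUM(number_of_offences) AS total_offences
        FROM {VIEW}
        GROUP BY offence_description
        ORDER BY total_offences DESC
        LIMIT 1
    """,
    # Question 2: the ten police forces with the most recorded offences.
    "offences_by_force": f"""
        SELECT force_name, SUM(number_of_offences) AS total_offences
        FROM {VIEW}
        GROUP BY force_name
        ORDER BY total_offences DESC
        LIMIT 10
    """,
    # Question 3: totals per financial quarter, in quarter order.
    "offences_by_quarter": f"""
        SELECT financial_quarter, SUM(number_of_offences) AS total_offences
        FROM {VIEW}
        GROUP BY financial_quarter
        ORDER BY financial_quarter
    """,
}


def run_queries(spark, queries=QUERIES):
    """Run each query and return a dict mapping its name to the result rows."""
    return {name: spark.sql(sql).collect() for name, sql in queries.items()}
```

Given an active session, `run_queries(spark)["most_frequent_offence"]` would return a one-row list answering the first question.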

Technologies Used

  • Apache Spark: For large-scale distributed data processing.

  • Python: Using Pandas, PySpark, Matplotlib, and Seaborn for analysis and visualization.

  • SQL Queries: For data aggregation, filtering, and trend analysis.

  • Machine Learning (PySpark MLlib): Predictive modeling to anticipate future crime trends.

Recommendations

  • Resource Allocation: Deploy more law enforcement personnel in high-crime areas based on identified trends.

  • Public Awareness Campaigns: Educate citizens on crime-prone areas and preventive measures.

  • Predictive Policing: Use data insights to anticipate and prevent future crimes.

  • Policy Implementation: Governments can use the findings to introduce policies that address high-risk crime categories.

  • Further Research: Conduct deeper analysis by incorporating socioeconomic and demographic factors to understand root causes.

Insights and Findings

  • The dataset highlights specific high-frequency offenses, enabling better resource distribution.

  • Crime occurrences show seasonal and geographical trends, which can aid in crime prevention strategies.

  • Data visualization provides insights into crime distributions, helping in policymaking and law enforcement strategies.

About

Distributed analysis of 10+ years of UK police crime data using PySpark to identify geographic concentration patterns and long-term trend dynamics.
