Data Bolt - Codedex Hackathon 2024

Some Context...

Data Bolt visualizes Olympic data and predicts the future performance trends of countries. Using historical data and machine learning models, it generates graphs and predictions that help analysts and enthusiasts understand and forecast Olympic outcomes.

Predictive Analysis with Linear Regression

To predict the expected performance of countries in future Olympic events, we employed linear regression models. Linear regression is a simple yet powerful statistical method for modeling the relationship between a dependent variable and one or more independent variables. In this project, the historical performance data of countries, including the number of medals won in past Olympics, served as our dataset.

Process:

- Data Collection: We gathered historical data on Olympic performances from various sources, including medal counts, host countries, and athlete details.
- Data Cleaning: The data was cleaned and preprocessed to handle missing values and outliers and to ensure consistency.
- Feature Selection: Relevant features such as past performance, GDP, population, and other socio-economic indicators were selected.
- Model Training: Using the cleaned data, we trained a linear regression model to learn the patterns and trends.
- Prediction: The trained model was used to estimate the number of medals countries might win in upcoming Olympic events.
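
The data cleaning step above could look something like the following. This is only a minimal sketch: the sample rows are made up, the `clean_medals` helper is hypothetical, and the choices of filling gaps with a per-country median and capping outliers with the 1.5 x IQR rule are assumptions, not necessarily what the project scripts do.

```python
import pandas as pd

# Made-up sample rows (not real Olympic data); the real columns may differ.
raw = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Year": [2016, 2016, 2016, 2020, 2016, 2020],
    "Medal_Count": [40, 40, 25, None, 5, 900],
})


def clean_medals(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, fill missing counts, and cap extreme values."""
    out = df.drop_duplicates().copy()

    # Assumed policy: fill a missing count with that country's median.
    out["Medal_Count"] = out.groupby("Country")["Medal_Count"].transform(
        lambda s: s.fillna(s.median())
    )

    # Assumed policy: cap values outside 1.5 * IQR of the whole column.
    q1 = out["Medal_Count"].quantile(0.25)
    q3 = out["Medal_Count"].quantile(0.75)
    iqr = q3 - q1
    out["Medal_Count"] = out["Medal_Count"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


cleaned = clean_medals(raw)
print(cleaned)
```

Capping rather than dropping outliers keeps every country in the dataset, which matters when each country has only a handful of rows.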

Example Code Snippet:


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import pandas as pd

# Load data
data = pd.read_csv('Data/Olympics/olympic_medals.csv')

# Feature selection and target variable
X = data[['Year', 'GDP', 'Population']]
y = data['Medal_Count']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
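
To estimate medals for an upcoming Games rather than for held-out test rows, the model needs projected feature values for that year. The sketch below illustrates the idea with synthetic data for one hypothetical country; the numbers and the projected `GDP` and `Population` values are assumptions chosen for illustration only.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic history for one hypothetical country (not real Olympic data).
history = pd.DataFrame({
    "Year": [2000, 2004, 2008, 2012, 2016, 2020],
    "GDP": [1.0, 1.5, 1.7, 2.4, 2.6, 3.1],        # arbitrary units
    "Population": [50, 51, 53, 54, 56, 57],        # millions
    "Medal_Count": [10, 16, 20, 28, 32, 38],
})

features = ["Year", "GDP", "Population"]
model = LinearRegression()
model.fit(history[features], history["Medal_Count"])

# Assumed projections of the features for the next Games.
future = pd.DataFrame({"Year": [2024], "GDP": [3.3], "Population": [59]})
predicted = float(model.predict(future[features])[0])

# Medal counts are non-negative whole numbers, so round and floor at zero.
estimate = max(0, int(round(predicted)))
print(f"Estimated medals in 2024: {estimate}")
```

Because linear regression can extrapolate below zero for countries with declining counts, flooring the estimate at zero keeps the output meaningful.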

Automation with Python Scripting

To streamline the data processing and analysis, we utilized Python scripting to automate various tasks, including compiling results into metadata, generating graphs, and creating reports.

Steps:

- Data Compilation: Automated scripts read data from multiple sources and compile them into a unified dataset.
- Graph Generation: Visualization scripts use libraries like matplotlib and seaborn to create graphs that represent the data visually.
- Report Generation: Scripts generate detailed reports that include graphs, statistical summaries, and predictions.
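
As an illustration of the data compilation step, the sketch below joins a medal table with a host table on `Year`. The two in-memory frames stand in for source files, and the column names `Host_Country` and `Is_Host` as well as the `compile_dataset` helper are hypothetical; the real files may use a different schema.

```python
import pandas as pd

# Made-up stand-ins for two of the source files (not real Olympic data).
medals = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Year": [2020, 2020, 2020],
    "Medal_Count": [30, 12, 7],
})
hosts = pd.DataFrame({
    "Year": [2016, 2020],
    "Host_Country": ["D", "B"],
})


def compile_dataset(medals: pd.DataFrame, hosts: pd.DataFrame) -> pd.DataFrame:
    """Attach host information to every medal row."""
    # validate= raises if a year appears more than once in the host table.
    merged = medals.merge(hosts, on="Year", how="left", validate="many_to_one")
    merged["Is_Host"] = merged["Country"] == merged["Host_Country"]
    return merged


unified = compile_dataset(medals, hosts)
print(unified)
```

A left join keeps every medal row even when a year is missing from the host table, so gaps show up as missing values instead of silently dropped rows.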

Example Code Snippet for Graph Generation:


import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load data
data = pd.read_csv('Data/Olympics/olympic_medals.csv')

# Plot one line per country showing medal count by year
plt.figure(figsize=(10, 6))
sns.lineplot(data=data, x='Year', y='Medal_Count', hue='Country')
plt.title('Olympic Medal Count Over the Years')
plt.xlabel('Year')
plt.ylabel('Medal Count')
plt.legend(title='Country')

# Save the figure where the web page expects it
plt.savefig('Webpage/static/Figure_1.png')
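
The report generation step could then combine a statistical summary with a pointer to the saved figure. The following is a rough sketch under assumptions: the sample data is made up, and the `build_report` helper and the plain-text layout are hypothetical rather than the format the project scripts produce.

```python
import pandas as pd

# Made-up sample data (not real Olympic data).
data = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2016, 2020, 2016, 2020],
    "Medal_Count": [10, 14, 30, 26],
})


def build_report(df: pd.DataFrame, figure_path: str) -> str:
    """Return a plain-text report with per-country medal statistics."""
    summary = df.groupby("Country")["Medal_Count"].agg(["mean", "min", "max"])
    lines = [
        "Olympic Medal Report",
        "",
        "Summary by country:",
        summary.to_string(),
        "",
        f"Figure: {figure_path}",
    ]
    return "\n".join(lines)


report = build_report(data, "Webpage/static/Figure_1.png")
print(report)
```

Returning the report as a string keeps the function easy to test; a caller can write it to disk with `pathlib.Path.write_text` if a file is needed.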

Getting Started

Dependencies

- Python 3.6 or higher
- pandas
- numpy
- seaborn
- matplotlib
- scikit-learn

To install the necessary Python libraries, you can use the following command:


pip install pandas numpy seaborn matplotlib scikit-learn

Installing

Clone the repository:


git clone https://github.com/yourusername/databolt.git

Navigate to the project directory:
```
cd databolt
```
Ensure the data files are in the correct directory structure as expected by the scripts.

Executing program

Ensure the data files are in the following paths:
- `Data/Olympics/olympic_hosts.csv`
- `Data/Olympics/olympic_medals.csv`
- `Data/Olympics/olympic_results.csv`
- `Data/Olympics/olympic_athletes.csv`
- `Data/Olympics/Summer-Olympic-medals-1976-to-2008.csv`
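
Before running the scripts, a quick check along these lines can confirm that the files listed above are in place. It is a small sketch using only the standard library; the `missing_data_files` helper is hypothetical and not part of the repository.

```python
from pathlib import Path

# Paths the scripts expect, relative to the project root.
EXPECTED_FILES = [
    "Data/Olympics/olympic_hosts.csv",
    "Data/Olympics/olympic_medals.csv",
    "Data/Olympics/olympic_results.csv",
    "Data/Olympics/olympic_athletes.csv",
    "Data/Olympics/Summer-Olympic-medals-1976-to-2008.csv",
]


def missing_data_files(root="."):
    """Return the expected data files that do not exist under root."""
    base = Path(root)
    return [path for path in EXPECTED_FILES if not (base / path).is_file()]


missing = missing_data_files()
if missing:
    for path in missing:
        print(f"Missing: {path}")
else:
    print("All data files found.")
```

Run it from the project directory so the relative paths resolve the same way they do for the scripts.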

Run the visualization script:


python DataSetVisualiser9000.py

Run the prediction script:


python OlymPicsGenerator500.py

The first script (DataSetVisualiser9000) generates a single graph, which you can save if you wish. The second script (OlymPicsGenerator500) generates plots from the provided data and saves them to an output folder.

Help

For help with pip itself, for example when a library fails to install, run:


python -m pip help

If a script fails, check that all required libraries are installed and that the data paths are correct.
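
One way to check for missing libraries is to ask Python whether each module can be found. The sketch below uses only the standard library; the `missing_libraries` helper is hypothetical, and the mapping reflects the fact that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# pip package name -> module name used in import statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def missing_libraries(required=REQUIRED):
    """Return the pip names of required libraries that cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_libraries()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required libraries are installed.")
```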

Authors

Contributors' names and contact info:

- Faisal Mujawar (LinkedIn)
- Mei Li Garcia O’Neil (LinkedIn)
- Anthony Padilla (LinkedIn)

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

Inspiration, code snippets, etc.:

Olympics 2024 Prediction

Brought to you by Data Bolt