This repository contains code and data for analyzing real estate trends, predicting house prices, estimating time on the market, and building an interactive dashboard for visualization. It is structured to cater to data scientists, real estate analysts, and developers looking to understand property market dynamics.
- Introduction
- Project Structure
- House Price Prediction
- Time on Market Prediction
- Real Estate Dashboard
- Data Description
- How to Run
- Results and Visualizations
Real estate prices and sale times are shaped by factors such as location, property features, market trends, and economic conditions. This project aims to predict:
- House prices using machine learning techniques.
- Time a house will remain on the market.
Additionally, an interactive dashboard is built to visualize real estate data trends and predictions.
- House_price_prediction.ipynb: Notebook for house price prediction using regression models.
- Time_on_market_prediction.ipynb: Notebook to estimate the time a property will stay on the market.
- House_price_prediction_dash_app.ipynb: Code to create a Dash app for real estate data visualization.
- train.csv: Training dataset containing property details and market data.
- data_description.txt: Detailed description of each feature in the dataset.
The notebook `House_price_prediction.ipynb` uses regression techniques to predict house prices based on features such as location, size, condition, and amenities.
- Data Preprocessing:
  - Handle missing values.
  - Encode categorical features.
  - Scale numeric features.
- Model Building:
  - XGBoost, LightGBM, and CatBoost regressors.
  - Combine predictions using an ensemble approach.
- Evaluation:
  - RMSE and R² scores.
- XGBoost: RMSE: 26656.93, R²: 90.74%
- LightGBM: RMSE: 28984.71, R²: 89.05%
- CatBoost: RMSE: 27045.08, R²: 90.46%
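The three preprocessing steps listed above can be sketched with pandas alone. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the median and `"Missing"` fill strategies, and the tiny sample frame are all assumptions made for the example.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute, standardize, and one-hot encode a feature table (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").astype(float)
    categorical = df.drop(columns=numeric.columns)

    # Handle missing values: median for numeric columns, a "Missing" label otherwise.
    numeric = numeric.fillna(numeric.median())
    categorical = categorical.fillna("Missing")

    # Scale numeric features to zero mean and unit variance.
    std = numeric.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    numeric = (numeric - numeric.mean()) / std

    # Encode categorical features as one-hot indicator columns.
    encoded = pd.get_dummies(categorical, dtype=float)
    return pd.concat([numeric, encoded], axis=1)


# Illustrative rows only; the column names come from the data description.
sample = pd.DataFrame({
    "LotFrontage": [65.0, None, 80.0],
    "GrLivArea": [1710, 1262, 1786],
    "MSZoning": ["RL", None, "RM"],
})
features = preprocess(sample)
print(features)
```

When a separate validation or test set is involved, the medians, means, and standard deviations would normally be computed on the training rows only and then reused, so that no information leaks from the held-out data.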
The final predictions are averaged across the three models and saved to `submission.csv`.
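The averaging step and the two metrics can be illustrated with NumPy. The prediction arrays below are made-up numbers standing in for the outputs of the three regressors, and the helper names `rmse` and `r2` are hypothetical; the sketch only shows the arithmetic of an equal-weight ensemble.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical validation targets and per-model predictions (illustrative only).
y_val = np.array([200000.0, 150000.0, 320000.0, 275000.0])
preds = {
    "xgboost": np.array([210000.0, 140000.0, 310000.0, 280000.0]),
    "lightgbm": np.array([190000.0, 160000.0, 330000.0, 270000.0]),
    "catboost": np.array([205000.0, 155000.0, 315000.0, 265000.0]),
}

# Equal-weight ensemble: element-wise mean of the three prediction vectors.
ensemble = np.mean(np.vstack(list(preds.values())), axis=0)

for name, p in {**preds, "ensemble": ensemble}.items():
    print(f"{name}: RMSE={rmse(y_val, p):.2f}, R2={r2(y_val, p):.4f}")
```

Note that R² is a fraction here, so a value of 0.9074 corresponds to the 90.74% style used in the results list.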
`Time_on_market_prediction.ipynb` estimates the number of days a property will stay on the market using features like price, location, and condition.
- Feature Engineering:
  - Derive features like price per square foot.
  - Create binary variables for high-demand neighborhoods.
- Modeling:
  - Random Forest Regressor with 500 estimators.
- Evaluation:
  - RMSE and R² scores.
- Time on Market Prediction:
  - RMSE: 8.03
  - R²: 99.94%
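The two engineered features named above might look like the following in pandas. This is a sketch under assumptions: the new column names `PricePerSqFt` and `HighDemandNeighborhood`, the helper `add_market_features`, and the set of "high-demand" neighborhoods are all hypothetical, since the source does not say how the notebook defines them.

```python
import pandas as pd

# Hypothetical choice of high-demand neighborhoods, for illustration only.
HIGH_DEMAND = {"NridgHt", "StoneBr"}


def add_market_features(df: pd.DataFrame, high_demand=HIGH_DEMAND) -> pd.DataFrame:
    """Return a copy of df with two derived columns (hypothetical helper)."""
    out = df.copy()
    # Price per square foot of above-grade living area.
    out["PricePerSqFt"] = out["SalePrice"] / out["GrLivArea"]
    # 1 if the property is in a neighborhood flagged as high demand, else 0.
    out["HighDemandNeighborhood"] = out["Neighborhood"].isin(high_demand).astype(int)
    return out


listings = pd.DataFrame({
    "SalePrice": [208500, 181500, 320000],
    "GrLivArea": [1710, 1262, 2000],
    "Neighborhood": ["CollgCr", "Veenker", "NridgHt"],
})
engineered = add_market_features(listings)
print(engineered)
```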
The predictions are saved to `time_on_market_submission.csv`.
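For orientation, a Random Forest with 500 estimators can be fitted and scored with scikit-learn as shown below. Everything about the data is an assumption: the features and the days-on-market target are synthetic, generated only so that the example runs, and they say nothing about the notebook's real inputs or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): three features and a noisy target.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(80, 250, n),   # price per square foot
    rng.integers(0, 2, n),     # high-demand neighborhood flag
    rng.integers(1, 11, n),    # overall quality rating
])
y = 30 + 0.4 * X[:, 0] - 15 * X[:, 1] - 2 * X[:, 2] + rng.normal(0, 5, n)

# Hold out 20% of the rows so the metrics reflect unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = float(r2_score(y_test, pred))
print(f"RMSE: {rmse:.2f} days, R2: {r2:.4f}")
```

Scores computed on the rows the model was trained on tend to look much better than held-out scores, which is why the sketch evaluates on a separate split.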
The notebook `House_price_prediction_dash_app.ipynb` creates an interactive dashboard using the Dash framework.
- Interactive visualizations for:
  - Price trends by neighborhood.
  - Correlation between features and price.
  - Renovation trends and quality analysis.
  - Geospatial maps for SalePrice distribution.
- Filters to explore specific property types, locations, or price ranges.
- SalePrice Over Time (Year vs. Month): A heatmap showing average `SalePrice` distribution by `Year` and `Month`.
- Living Area vs SalePrice: Interactive scatter plot with a dropdown filter for Neighborhood.
- Overall Quality vs SalePrice: Boxplot showing the impact of `Overall Quality` on `SalePrice` with a slider for construction year.
- Average SalePrice by Neighborhood: Horizontal bar chart ranking neighborhoods by average SalePrice.
- SalePrice Distribution by Garage Type and Building Type: Boxplots categorizing SalePrice based on `GarageType` and `BldgType`.
- Price Impact of Renovations: Visualizing SalePrice difference for renovated vs. non-renovated properties.
- Neighborhood Map: Geospatial map displaying SalePrice distribution with latitude and longitude filters.
- Correlation Heatmap: Heatmap showing feature correlations.
- Properties Built Per Year: Histogram of property counts by construction year.
- Yearly Renovation Trends: Bar chart showing the number of renovations across years.
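The data behind the first chart in this list, the year-by-month heatmap, is essentially a pivot table of mean prices. The sketch below shows one way to build that grid; the helper `average_price_grid` is hypothetical, and the column names `Year` and `Month` follow the description above (the raw Kaggle Ames data usually stores the sale date as `YrSold` and `MoSold`, so a rename step is assumed).

```python
import pandas as pd


def average_price_grid(df: pd.DataFrame,
                       year_col: str = "Year",
                       month_col: str = "Month") -> pd.DataFrame:
    """Mean SalePrice with one row per year and one column per month."""
    return df.pivot_table(
        index=year_col, columns=month_col, values="SalePrice", aggfunc="mean"
    )


# Illustrative sales only.
sales = pd.DataFrame({
    "Year": [2008, 2008, 2008, 2009],
    "Month": [5, 5, 6, 5],
    "SalePrice": [200000, 220000, 180000, 250000],
})
grid = average_price_grid(sales)
print(grid)
```

Year and month combinations with no sales come out as `NaN`, which a heatmap would typically render as an empty cell.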
The dashboard is built using Dash and Plotly. Key components include:

    from dash import Input, Output
    import plotly.express as px

    # `app` (the Dash instance) and `data` (the property DataFrame) are
    # assumed to be defined earlier in the notebook.
    @app.callback(
        Output("scatter_plot", "figure"),
        Input("neighborhood_filter", "value"),
    )
    def update_scatter(neighborhoods):
        # An empty or cleared selection shows every neighborhood.
        filtered_data = data if not neighborhoods else data[data["Neighborhood"].isin(neighborhoods)]
        fig = px.scatter(
            filtered_data, x="GrLivArea", y="SalePrice", color="SalePrice",
            color_continuous_scale="Spectral", title="Living Area vs. SalePrice",
            labels={"GrLivArea": "Living Area (sqft)", "SalePrice": "Sale Price"},
        )
        fig.update_layout(plot_bgcolor="#F5F5DC", title_font_color="#4B0082")
        return fig
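The filtering rule used inside the callback can be pulled out and checked without starting a Dash server. The helper name `filter_by_neighborhood` is hypothetical. The sketch also normalizes a single string to a list, on the assumption that the dropdown might be configured for single selection; `Series.isin` rejects a bare string, so a multi-select dropdown (which passes a list) is what the original expression expects.

```python
import pandas as pd


def filter_by_neighborhood(data: pd.DataFrame, neighborhoods) -> pd.DataFrame:
    """Return rows in the selected neighborhoods, or all rows if nothing is selected."""
    if not neighborhoods:  # None or an empty list means "no filter"
        return data
    if isinstance(neighborhoods, str):  # single-select dropdowns pass a string
        neighborhoods = [neighborhoods]
    return data[data["Neighborhood"].isin(neighborhoods)]


# Illustrative rows only.
homes = pd.DataFrame({
    "Neighborhood": ["CollgCr", "Veenker", "NridgHt"],
    "GrLivArea": [1710, 1262, 2000],
    "SalePrice": [208500, 181500, 320000],
})

print(len(filter_by_neighborhood(homes, None)))
print(len(filter_by_neighborhood(homes, ["CollgCr", "NridgHt"])))
```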
- Open the `House_price_prediction_dash_app.ipynb` notebook and run its cells.
- Alternatively, if the notebook has been exported to a Python script, start the Dash server with `python house_price_prediction_dash_app.py`.
- Open the browser at http://127.0.0.1:8050 to interact with the dashboard.
The dataset `train.csv` is accompanied by `data_description.txt`, which provides detailed metadata for each column. Below are some key features:
- MSSubClass: Identifies the type of dwelling involved in the sale.
- MSZoning: General zoning classification (e.g., Residential, Commercial).
- LotFrontage: Linear feet of street connected to the property.
- GrLivArea: Above grade (ground) living area square feet.
- OverallQual: Rates the overall material and finish of the house.
- YearBuilt: Original construction date of the house.
- YearRemodAdd: Remodel date (if no remodeling, equals YearBuilt).
- Neighborhood: Physical locations within Ames city limits.
- SalePrice: Sale price of the property (target variable).
For a complete description, refer to `data_description.txt`.
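Because `YearRemodAdd` equals `YearBuilt` when a house was never remodeled, a renovation flag can be derived directly from those two columns. The following is a small sketch with made-up rows; the column name `Renovated` and the helper `add_renovation_flag` are assumptions, not names taken from the notebooks.

```python
import pandas as pd


def add_renovation_flag(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a boolean Renovated column (hypothetical helper)."""
    out = df.copy()
    # A remodel year different from the build year indicates a renovation.
    out["Renovated"] = out["YearRemodAdd"] != out["YearBuilt"]
    return out


# Illustrative rows only.
houses = pd.DataFrame({
    "YearBuilt": [1990, 1975, 2005],
    "YearRemodAdd": [1990, 2002, 2005],
    "SalePrice": [150000, 210000, 260000],
})
flagged = add_renovation_flag(houses)

# Mean sale price for renovated vs. non-renovated properties.
avg_price = flagged.groupby("Renovated")["SalePrice"].mean()
print(avg_price)
```

A comparison like this is purely descriptive: renovated houses also differ in age and location, so the gap in averages should not be read as the effect of renovating.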
- Clone the repository: `git clone https://github.com/yourusername/real-estate-analysis.git`
- Install dependencies: `pip install -r requirements.txt`
- Run Jupyter notebooks for prediction: `jupyter notebook`
- Launch the dashboard: `python house_price_prediction_dash_app.py`
This project is licensed under the Apache License. See `LICENSE` for more details.
Contributions are welcome. Please create a pull request or open an issue for suggestions and improvements.