Geospatial Data Analysis and Mapping Guide

Analyze location-based datasets to uncover geographic patterns, build heat maps, optimize service areas, and make data-driven decisions about physical expansion or logistics.

Prompt Template

You are a geospatial data analyst. Help me analyze location-based data to uncover geographic patterns and make strategic decisions.

**Dataset description:** [e.g., 50,000 customer orders with lat/long coordinates, timestamps, and order values across 15 cities]
**Business context:** [e.g., food delivery startup deciding where to open 3 new dark kitchens]
**Available tools:** [e.g., Python with pandas, geopandas, folium; or SQL + Google BigQuery GIS]
**Key questions to answer:** [e.g., Where are demand hotspots? Which areas are underserved? What's the optimal radius for each kitchen?]
**Constraints:** [e.g., each kitchen covers max 5km radius, budget for 3 locations]

Provide:
1. **Data Preparation** — cleaning steps for geographic data (handling invalid coordinates, projections, geocoding)
2. **Exploratory Spatial Analysis** — code to generate:
   - Point density heat maps
   - Cluster analysis (DBSCAN or K-means on coordinates)
   - Spatial autocorrelation (Moran's I)
3. **Service Area Optimization** — algorithm to find optimal facility locations:
   - Weighted center-of-gravity method
   - Coverage maximization with radius constraints
   - Demand-weighted scoring per candidate location
4. **Visualization Code** — interactive maps with folium/plotly showing:
   - Customer density layers
   - Proposed locations with coverage circles
   - Revenue potential per zone
5. **Decision Framework** — ranked location recommendations with supporting metrics
6. **Monitoring Plan** — how to track location performance post-launch

Example Output

Geospatial Analysis: Dark Kitchen Location Optimization

1. Data Preparation

```python
import pandas as pd
import geopandas as gpd

# Clean coordinates: drop rows with out-of-range lat/long
df = df[df['lat'].between(-90, 90) & df['lng'].between(-180, 180)]

# Build point geometries (x = longitude, y = latitude)
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['lng'], df['lat']),
    crs='EPSG:4326',
)

# Project to a local CRS so distances come out in true meters.
# Avoid Web Mercator (EPSG:3857) here: it inflates distances away from
# the equator (~30% at NYC's latitude). Use the local UTM zone instead.
gdf_proj = gdf.to_crs(gdf.estimate_utm_crs())
```
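The example output skips section 2, but the cluster analysis the template asks for (DBSCAN on projected coordinates) can be sketched briefly. The data below is synthetic, and the `eps`/`min_samples` values are illustrative assumptions — in practice, tune them to your city's geography:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic projected coordinates (meters): two tight demand clusters plus
# scattered noise, standing in for gdf_proj's (x, y) values.
rng = np.random.default_rng(42)
cluster_a = rng.normal(loc=(0, 0), scale=300, size=(200, 2))
cluster_b = rng.normal(loc=(8000, 8000), scale=300, size=(200, 2))
noise = rng.uniform(-20000, 20000, size=(20, 2))
coords = np.vstack([cluster_a, cluster_b, noise])

# eps shares units with the coordinates (meters here): points within 500 m
# of each other, with at least 10 neighbors, form a demand hotspot.
db = DBSCAN(eps=500, min_samples=10).fit(coords)
labels = db.labels_  # -1 marks noise points
n_hotspots = len(set(labels)) - (1 if -1 in labels else 0)
print(n_hotspots)
```

Because `eps` is interpreted in coordinate units, this only works sensibly on projected (meter-based) coordinates — another reason for the UTM reprojection in the data-preparation step.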

3. Optimal Location Analysis

```python
import numpy as np
from sklearn.cluster import KMeans

# Demand-weighted K-means (k=3 for 3 kitchens): weighting each order by its
# value pulls the centroids toward high-revenue areas.
coords = np.column_stack([gdf_proj.geometry.x, gdf_proj.geometry.y])
weights = gdf_proj['order_value'].to_numpy()

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(coords, sample_weight=weights)
candidate_sites = kmeans.cluster_centers_  # projected (x, y) in meters
```
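The centroids above still need checking against the 5 km radius constraint from the brief. A minimal coverage check, using hypothetical projected coordinates (meters) for orders and candidate sites:

```python
import numpy as np

# Hypothetical projected coordinates (meters) for orders and two candidate
# kitchen sites — e.g. centroids from the K-means step above.
orders = np.array([[0, 0], [1000, 2000], [4000, 3000], [9000, 9000]])
order_values = np.array([40.0, 55.0, 30.0, 70.0])
kitchens = np.array([[500, 500], [9500, 8500]])

# Distance from every order to every kitchen: shape (n_orders, n_kitchens).
dists = np.linalg.norm(orders[:, None, :] - kitchens[None, :, :], axis=2)

# An order is covered if at least one kitchen lies within the 5 km radius.
covered = (dists <= 5000).any(axis=1)
coverage_rate = covered.mean()
revenue_covered = order_values[covered].sum()
print(coverage_rate, revenue_covered)
```

Comparing `coverage_rate` and `revenue_covered` across alternative site sets is the basis of the coverage-maximization step the template asks for.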

5. Ranked Recommendations

| Rank | Location | Orders in 5km | Revenue Potential | Competition |
|------|----------|---------------|-------------------|-------------|
| 1 | Downtown East (40.72, -73.99) | 12,400 | $890K/yr | Low |
| 2 | Midtown West (40.76, -73.98) | 9,800 | $720K/yr | Medium |
| 3 | Brooklyn Heights (40.69, -73.99) | 8,200 | $615K/yr | Low |
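A ranking like this can come from a simple demand-weighted score over each candidate's metrics. The sketch below uses the example table's figures; the competition penalty weights are illustrative assumptions, not part of the original analysis:

```python
# Illustrative competition penalties (assumed, not from the source data).
penalty = {"Low": 1.0, "Medium": 0.8, "High": 0.6}

# (name, orders within 5 km, annual revenue potential, competition level)
candidates = [
    ("Downtown East",    12_400, 890_000, "Low"),
    ("Midtown West",      9_800, 720_000, "Medium"),
    ("Brooklyn Heights",  8_200, 615_000, "Low"),
]

max_orders = max(c[1] for c in candidates)
max_rev = max(c[2] for c in candidates)

# Score = normalized demand x normalized revenue x competition penalty.
scores = {
    name: round((orders / max_orders) * (rev / max_rev) * penalty[comp], 3)
    for name, orders, rev, comp in candidates
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Normalizing each metric before multiplying keeps any single factor (e.g. raw revenue) from dominating the score.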

Tips for Best Results

  • 💡 Always validate coordinate data before analysis — even 1% invalid points can skew cluster centers significantly
  • 💡 Use projected coordinate systems (meters) for distance calculations, not raw lat/long degrees
  • 💡 Combine geospatial analysis with temporal patterns — demand at 6pm vs 6am may suggest very different optimal locations
  • 💡 Interactive maps are far more persuasive to stakeholders than static charts — use folium or Kepler.gl for presentations
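The second tip is easy to demonstrate: at New York's latitude, one degree of longitude spans far fewer meters than one degree of latitude, so Euclidean math on raw degrees systematically distorts east–west distances. A quick check with the haversine great-circle formula (pure stdlib, starting from the example's Downtown East point):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points."""
    r = 6_371_000  # mean Earth radius, meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# One degree north vs one degree east from Downtown East (40.72, -73.99):
north = haversine_m(40.72, -73.99, 41.72, -73.99)  # ~111 km
east = haversine_m(40.72, -73.99, 40.72, -72.99)   # ~84 km
print(round(north / 1000), round(east / 1000))
```

Both moves are "1 degree" in raw coordinates, yet they differ by roughly 25% on the ground — which is exactly the error a projected CRS avoids.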