Modern GIS toolkit for Python - Simplifying geospatial workflows with built-in data sources, intelligent caching, and fluent APIs
The Universal IO System is the cornerstone of PyMapGIS, providing a unified interface for reading geospatial data from any source through the pmg.read()
function. This system abstracts away the complexity of different data formats, sources, and protocols behind a simple, consistent API.
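In practice, a single call covers local files and hosted datasets alike. A minimal sketch of the entry point, reusing example URLs from this page (the local path is illustrative):

import pymapgis as pmg

# Local shapefile (illustrative path)
counties = pmg.read("file://./data/counties.shp")

# Census ACS 5-year data via the census:// scheme
population = pmg.read("census://acs/acs5?year=2022&geography=county&variables=B01003_001E")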
Universal IO System
├── read() Function # Main entry point
├── DataSourceRegistry # Plugin management
├── DataSourcePlugin # Base plugin class
├── FormatDetector # Automatic format detection
├── CacheManager # Intelligent caching
└── URLParser # URL scheme handling
User Request → URL Parsing → Plugin Selection →
Cache Check → Data Retrieval → Format Processing →
Validation → Caching → Return GeoDataFrame/DataArray
Supported URL schemes:
census:// - US Census Bureau data (ACS, Decennial)
tiger:// - TIGER/Line geographic boundaries
file:// - Local file system
http:// / https:// - Remote HTTP resources
s3:// - Amazon S3 storage
gs:// - Google Cloud Storage
azure:// - Azure Blob Storage
ftp:// - FTP servers

General URL structure:
scheme://[authority]/[path]?[query]#[fragment]

Examples:
census://acs/acs5?year=2022&geography=county&variables=B01003_001E
tiger://county?year=2022&state=06
file://./data/counties.shp
s3://bucket/path/to/data.geojson
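For illustration only, Python's standard urllib.parse decomposes one of these URLs into the components named above; this demonstrates the URL anatomy, not PyMapGIS's internal URLParser:

from urllib.parse import urlparse, parse_qs

parsed = urlparse("census://acs/acs5?year=2022&geography=county&variables=B01003_001E")
print(parsed.scheme)           # 'census'
print(parsed.netloc)           # 'acs'    -> the [authority] component
print(parsed.path)             # '/acs5'  -> the [path] component
print(parse_qs(parsed.query))  # {'year': ['2022'], 'geography': ['county'], 'variables': ['B01003_001E']}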
Data source plugins implement the following base interface:

from abc import ABC, abstractmethod
from typing import Any, Dict, List, Union

import geopandas as gpd
import xarray as xr


class DataSourcePlugin(ABC):
    """Base class for data source plugins."""

    @property
    @abstractmethod
    def schemes(self) -> List[str]:
        """URL schemes handled by this plugin."""
        pass

    @abstractmethod
    def can_handle(self, url: str) -> bool:
        """Check if plugin can handle the given URL."""
        pass

    @abstractmethod
    def read(self, url: str, **kwargs) -> Union[gpd.GeoDataFrame, xr.DataArray]:
        """Read data from the given URL."""
        pass

    @abstractmethod
    def get_metadata(self, url: str) -> Dict[str, Any]:
        """Get metadata about the data source."""
        pass
Census Plugin (census://)
Purpose: Access US Census Bureau data
URL Format:
census://dataset/table?parameters
Examples:
census://acs/acs5?year=2022&geography=county&variables=B01003_001E
census://decennial/pl?year=2020&geography=state&variables=P1_001N
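A minimal read sketch using the first ACS example above (B01003_001E is the ACS total population estimate; a Census API key may be required, see Configuration below):

import pymapgis as pmg

# County-level total population from the 2022 ACS 5-year estimates
acs = pmg.read("census://acs/acs5?year=2022&geography=county&variables=B01003_001E")
print(acs.head())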
TIGER Plugin (tiger://)
Purpose: Access TIGER/Line geographic boundaries
URL Format:
tiger://geography?parameters
Examples:
tiger://county?year=2022&state=06
tiger://tract?year=2022&state=06&county=001
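For example, reading 2022 county boundaries for California (state FIPS 06) with the first URL above; a sketch only:

import pymapgis as pmg

# California county boundaries from TIGER/Line, 2022 vintage
ca_counties = pmg.read("tiger://county?year=2022&state=06")
ca_counties.plot()  # standard GeoPandas plotting on the returned GeoDataFrame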
File Plugin (file://)
Purpose: Read local geospatial files
URL Format:
file://path/to/file
Examples:
file://./data/counties.shp
file:///absolute/path/to/data.geojson
HTTP Plugin (http://, https://)
Purpose: Read remote geospatial data
URL Format:
https://example.com/path/to/data.format
Examples:
https://example.com/data/counties.geojson
https://api.example.com/data?format=geojson
Cloud Storage Plugins
Purpose: Access cloud-stored geospatial data

S3 Plugin (s3://):
# Configuration
pmg.settings.aws_access_key_id = "your_key"
pmg.settings.aws_secret_access_key = "your_secret"
# Usage
data = pmg.read("s3://bucket/path/to/data.geojson")

GCS Plugin (gs://):
# Configuration
pmg.settings.google_credentials = "path/to/credentials.json"
# Usage
data = pmg.read("gs://bucket/path/to/data.geojson")
The system automatically detects data formats from the file extension, including:
GeoJSON (.geojson, .json)
Shapefile (.shp + supporting files)
GeoPackage (.gpkg)
KML/KMZ (.kml, .kmz)
GML (.gml)
CSV (.csv)
GeoTIFF (.tif, .tiff)
NetCDF (.nc)
Zarr (.zarr)
HDF5 (.h5, .hdf5)

# Vector data returns GeoDataFrame
counties = pmg.read("file://counties.shp")
assert isinstance(counties, gpd.GeoDataFrame)
# Raster data returns DataArray
elevation = pmg.read("file://elevation.tif")
assert isinstance(elevation, xr.DataArray)
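Because pmg.read() can return either type, downstream code can branch on the result; a small sketch (the path is illustrative):

import geopandas as gpd
import xarray as xr
import pymapgis as pmg

data = pmg.read("file://./data/input.tif")
if isinstance(data, gpd.GeoDataFrame):
    print("vector:", len(data), "features")
else:  # raster/array sources come back as xarray.DataArray
    print("raster:", data.shape)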
Cache keys are generated from the request URL and parameters, and the cache can be inspected and managed programmatically:
import pymapgis as pmg
# Check cache status
stats = pmg.cache.stats()
# Clear specific cache
pmg.cache.clear(pattern="census://*")
# Purge old cache entries
pmg.cache.purge(older_than="7d")
class PyMapGISError(Exception):
    """Base exception for PyMapGIS."""
    pass

class DataSourceError(PyMapGISError):
    """Error in data source operations."""
    pass

class FormatError(PyMapGISError):
    """Error in format detection or processing."""
    pass

class CacheError(PyMapGISError):
    """Error in caching operations."""
    pass
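Assuming these exceptions propagate out of pmg.read() (the raising sites are not shown here) and are re-exported at the package level, callers can catch a specific subclass or the common base:

import pymapgis as pmg

try:
    data = pmg.read("census://acs/acs5?year=2022&geography=county&variables=B01003_001E")
except pmg.DataSourceError as exc:  # assumed package-level re-export
    print(f"data source failed: {exc}")
except pmg.PyMapGISError as exc:
    print(f"other PyMapGIS error: {exc}")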
Custom data source plugins subclass DataSourcePlugin and register with pmg.io:

import pymapgis as pmg
from pymapgis.io.base import DataSourcePlugin


class CustomPlugin(DataSourcePlugin):
    @property
    def schemes(self):
        return ["custom"]

    def can_handle(self, url):
        return url.startswith("custom://")

    def read(self, url, **kwargs):
        # Custom implementation: fetch the data and return a GeoDataFrame
        return gdf

    def get_metadata(self, url):
        return {"source": "custom", "format": "geojson"}


# Register plugin
pmg.io.register_plugin(CustomPlugin())
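After registration, the new scheme is readable through the same entry point (the URL below is illustrative):

data = pmg.read("custom://example/resource")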
Custom format handlers follow the same pattern:

import pymapgis as pmg
from pymapgis.io.formats.base import FormatHandler


class CustomFormatHandler(FormatHandler):
    @property
    def extensions(self):
        return [".custom"]

    def can_handle(self, path_or_url):
        return path_or_url.endswith(".custom")

    def read(self, path_or_url, **kwargs):
        # Custom format reading logic: parse the file and return a GeoDataFrame
        return gdf


# Register handler
pmg.io.register_format_handler(CustomFormatHandler())
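Once registered, files with the new extension flow through the normal read path (illustrative path):

data = pmg.read("file://./data/observations.custom")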
import pymapgis as pmg
# Cache configuration
pmg.settings.cache_dir = "./custom_cache"
pmg.settings.cache_ttl = "1d"
# Network configuration
pmg.settings.request_timeout = 30
pmg.settings.max_retries = 3
# Data source configuration
pmg.settings.census_api_key = "your_key"
pmg.settings.default_crs = "EPSG:4326"
# Cache settings
export PYMAPGIS_CACHE_DIR="./cache"
export PYMAPGIS_CACHE_TTL="1d"
# API keys
export CENSUS_API_KEY="your_key"
# Network settings
export PYMAPGIS_REQUEST_TIMEOUT="30"
export PYMAPGIS_MAX_RETRIES="3"
from pymapgis.testing import MockDataSource

# Create mock for testing; test_geodataframe is a GeoDataFrame prepared by the test
mock_source = MockDataSource(
    scheme="test",
    data=test_geodataframe,
)

with mock_source:
    data = pmg.read("test://sample")
    assert data.equals(test_geodataframe)
Next: Vector Operations for spatial vector processing details