
Embeddings

2026

  • Streaming millions of TESSERA tiles over HTTP with Zarr v3: Describes how TESSERA's geospatial embedding system was restructured from millions of individual NumPy files into sharded Zarr v3 stores per year, enabling efficient HTTP range requests for single-pixel to regional data retrieval with xarray/dask compatibility. [Keywords: TESSERA Zarr embeddings HTTP geospatial xarray dask cloud native]

  • The Technical Debt of Earth Embedding Products: Examines fragmentation and interoperability challenges in Earth embedding products, arguing that standardizing how embeddings are distributed, stored, and accessed is the real bottleneck for geospatial foundation models. [Keywords: embeddings geospatial foundation models interoperability technical debt cloud native]

2025

  • GeoVibes: A geospatial tool for evaluating embedding models through interactive similarity search, using GeoParquet and Python for nearest-neighbor queries and binary classifier training with spatial cross-validation. [Keywords: embeddings geospatial similarity search geoparquet Python classification]

  • SkyScript: A large, semantically diverse image-text dataset for remote sensing containing 5.2 million image-text pairs with 29,000+ semantic tags, designed for vision-language model (CLIP) development. [Keywords: VLM CLIP satellite imagery text remote sensing embeddings dataset]

  • Scalable Geospatial Data Generation Using AlphaEarth Foundations Model: Paper on using AlphaEarth foundation model embeddings for transfer learning in forest monitoring applications. [Keywords: foundation model embeddings AlphaEarth forest transfer learning]

  • TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis: Foundation model paper generating 128-dimensional embeddings from satellite time series for land classification and canopy height prediction, with global coverage at 10-meter resolution. [Keywords: foundation model embeddings time series Sentinel-2 land classification canopy height]

  • TESSERA GitHub: Open-source implementation of the TESSERA foundation model that processes satellite time-series imagery to generate embeddings for Earth observation tasks. [Keywords: foundation model embeddings satellite Python open source]

  • What Do Embeddings Actually Encode in Earth Observation Foundation Models?: LinkedIn post discussing what semantic information EO foundation model embeddings actually capture. [Keywords: embeddings foundation models Earth observation semantics]

  • Air Quality Using Satellite Embedding: Preprint on using satellite-derived embeddings for air quality estimation and monitoring. [Keywords: air quality satellite embeddings remote sensing]

  • Text Embeddings for Semantic Search with Overture: Research on text-embedding-based semantic search over the Overture Maps places dataset. [Keywords: embeddings semantic search Overture Maps NLP geospatial]

  • OSM Embeddings - SRAI: SRAI (Spatial Representations for AI) Python library for geospatial machine learning on vector geometries, enabling spatial data download, regionalization, and vector embeddings for ML tasks. [Keywords: OSM embeddings spatial AI Python geospatial ML]

Earlier

  • AlphaEarthFire: AlphaEarth × MODIS burn dataset builder and model trainer that uses AlphaEarth Foundations (AEF) embeddings to model slow fire variables and predict forest fires. [Keywords: embeddings fire prediction MODIS AlphaEarth foundation model Python]