API Documentation

Backend API and data services

Key Features

  • Temperature data endpoints
  • Data processing and analysis
  • Authentication and authorization

TempHist API

A FastAPI backend for historical temperature data using Visual Crossing with comprehensive caching, rate limiting, and monitoring capabilities.

πŸš€ Features

  • Historical Temperature Data: 50 years of temperature data for any location
  • Enhanced Caching: Cloudflare-optimized caching with ETags and conditional requests
  • Async Job Processing: Heavy computations handled asynchronously with job tracking
  • Rate Limiting: Built-in protection against API abuse and misuse
  • Weather Forecasts: Current weather data and forecasts
  • Performance Monitoring: Built-in profiling and monitoring tools
  • Cache Prewarming: Automated cache warming for popular locations
  • CORS Enabled: Ready for web applications
  • Production Ready: Deploy on Railway or Render

πŸ“š Documentation

πŸ“‹ Requirements

  • Python 3.8+
  • Redis server (local or cloud) - Required
  • PostgreSQL database (local or cloud) - Optional, for persistent cache with location aliasing
  • Visual Crossing API key

βš™οΈ Configuration

Create a .env file with the following variables:

Required API Keys

VISUAL_CROSSING_API_KEY=your_key_here
OPENWEATHER_API_KEY=your_key_here
API_ACCESS_TOKEN=your_key_here

Redis Configuration

REDIS_URL=redis://localhost:6379  # Optional, defaults to localhost
CACHE_ENABLED=true                # Optional, defaults to true

PostgreSQL Configuration (for persistent cache)

TEMPHIST_PG_DSN=postgresql://user:password@localhost:5432/temphist # Optional

Or use DATABASE_URL (Heroku/Railway standard)

DATABASE_URL=postgresql://user:password@localhost:5432/temphist # Optional

Debugging

DEBUG=false # Set to true for development

Rate Limiting Configuration

RATE_LIMIT_ENABLED=true      # Defaults to true
MAX_LOCATIONS_PER_HOUR=10    # Defaults to 10
MAX_REQUESTS_PER_HOUR=100    # Defaults to 100
RATE_LIMIT_WINDOW_HOURS=1    # Defaults to 1

Service Token Rate Limiting (for API_ACCESS_TOKEN)

High limits for cache warming, but protects against abuse if token is compromised

SERVICE_TOKEN_REQUESTS_PER_HOUR=5000   # Defaults to 5000 requests/hour
SERVICE_TOKEN_LOCATIONS_PER_HOUR=500   # Defaults to 500 unique locations/hour
SERVICE_TOKEN_WINDOW_HOURS=1           # Defaults to 1 hour sliding window

CORS Configuration

CORS_ORIGINS=https://yourdomain.com,https://staging.yourdomain.com  # Comma-separated list of allowed origins
CORS_ORIGIN_REGEX=^https://.*\.yourdomain\.com$  # Regex pattern for allowed origins

IP Address Management

IP_WHITELIST=192.168.1.100,10.0.0.5   # IPs exempt from rate limiting
IP_BLACKLIST=192.168.1.200,10.0.0.99  # IPs blocked entirely

Data Filtering

FILTER_WEATHER_DATA=true # Filter to essential temperature data only

Authentication Tokens

API_ACCESS_TOKEN=your_api_token_here # API access token for automated systems

πŸ› οΈ Installation

  1. Clone the repository

   git clone 
   cd TempHist-api

  2. Create virtual environment

   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies

   pip install -r requirements.txt

  4. Set up environment

   cp .env.example .env  # Create from template
   # Edit .env with your API keys

  5. Start Redis (if running locally)

   redis-server

  6. Start PostgreSQL (if running locally and using persistent cache)

   # macOS with Homebrew
   brew services start postgresql

   # Linux with systemd
   sudo systemctl start postgresql

   # Or use Docker
   docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=password postgres:15

   # Create database
   createdb temphist

  7. Start the job worker service (in a separate terminal)

   python worker_service.py

  8. Run the development server (in another terminal)

   uvicorn main:app --reload

πŸš€ Enhanced Caching System

The API now includes a comprehensive multi-layer caching system optimized for Cloudflare and high performance:

Multi-Layer Caching Architecture

The API employs a sophisticated caching strategy with multiple layers:

  1. PostgreSQL Persistent Cache - Long-term storage for historical temperature data with location aliasing
  2. Redis Cache Layer - Fast in-memory cache for frequently accessed data
  3. Improved Temporal Tolerance Cache - Smart approximate matching for near-identical requests
  4. CDN Edge Caching - Cloudflare-optimized headers for edge caching

Cache Features

#### Core Capabilities

  • Strong Cache Headers: ETags, Last-Modified, and Cache-Control headers
  • Conditional Requests: 304 Not Modified responses for unchanged data
  • Single-Flight Protection: Prevents cache stampedes with Redis locks
  • Canonical Cache Keys: Normalized keys for maximum hit rates
  • Async Job Processing: Heavy computations handled asynchronously
  • Cache Metrics: Hit/miss counters and performance monitoring
  • Cache Prewarming: Automated warming for popular locations

#### PostgreSQL Persistent Cache with Location Aliasing

The API uses PostgreSQL as a persistent cache for historical temperature data with intelligent location deduplication:

Key Features:
  • Location Canonicalization: Automatically merges nearby locations (within 25km) to reduce duplicate API calls
  • Persistent Aliases: Once a location is resolved, future requests use the alias without re-running matching logic
  • Coordinate-Based Deduplication: Locations with similar coordinates share the same cached data
  • Preapproved Location Support: Maintains compatibility with curated location list
  • Race-Safe Operations: Uses ON CONFLICT clauses to handle concurrent requests
Schema:
-- Canonical locations table
CREATE TABLE locations (
    id BIGSERIAL PRIMARY KEY,
    original_name TEXT NOT NULL,
    normalized_name TEXT NOT NULL UNIQUE,
    resolved_name TEXT,
    latitude DOUBLE PRECISION,
    longitude DOUBLE PRECISION,
    timezone TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Aliases mapping for location deduplication
CREATE TABLE location_aliases (
    alias_normalized_name TEXT PRIMARY KEY,
    location_id BIGINT REFERENCES locations(id) ON DELETE CASCADE,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Daily temperature cache
CREATE TABLE daily_temperatures (
    location_id BIGINT REFERENCES locations(id) ON DELETE CASCADE,
    day DATE NOT NULL,
    temp_c DOUBLE PRECISION,
    temp_max_c DOUBLE PRECISION,
    temp_min_c DOUBLE PRECISION,
    payload JSONB NOT NULL,
    source TEXT NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (location_id, day)
);
Location Resolution Order:
  1. Check location_aliases table for existing alias
  2. Check locations table for direct match
  3. Apply preapproved location metadata if available
  4. Find nearby existing location within 25km using haversine distance
  5. Create new canonical location as fallback
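The nearby-location step (step 4) can be sketched as a plain haversine check. This is a minimal illustration only: the production code in utils/daily_temperature_store.py also applies a bounding-box prefilter, and the function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def find_nearby(lat, lon, candidates, max_km: float = 25.0):
    """Return the id of the closest candidate within max_km, or None.

    candidates is an iterable of (id, lat, lon) tuples, e.g. rows from
    the locations table already narrowed by a bounding-box prefilter.
    """
    best, best_dist = None, max_km
    for loc_id, clat, clon in candidates:
        d = haversine_km(lat, lon, clat, clon)
        if d <= best_dist:
            best, best_dist = loc_id, d
    return best
```

A request for a point a kilometre from an existing canonical location would resolve to that location's id instead of creating a duplicate row.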
Configuration:

PostgreSQL connection

TEMPHIST_PG_DSN=postgresql://user:password@host:5432/dbname

Or use DATABASE_URL (standard Heroku/Railway variable)

DATABASE_URL=postgresql://user:password@host:5432/dbname
Current Year Handling:

The API treats the current year differently from historical years to ensure data freshness:

  • Relaxed Coverage Thresholds: Current year requires only 50% coverage (vs 90% for historical years)
      - Weekly: 4 of 7 days (57% vs 86%)
      - Monthly: 19 of 31 days (61% vs 90%)
      - Yearly: 183 of 365 days (50% vs 90%)
  • No Caching When Missing: Responses missing the current year are not cached, forcing fresh fetches
  • Smart Retries: Each request retries until current year data is available
  • Historical Gaps Preserved: Missing historical years (e.g., 1995-2004 for some locations) are cached normally

This ensures current year data appears as soon as it's available while accepting permanent gaps in historical data.
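The coverage rule above can be expressed as a small predicate. The 50% and 90% thresholds come from this section; the function name and signature are hypothetical, not the actual implementation.

```python
from datetime import date
from typing import Optional

HISTORICAL_COVERAGE = 0.90    # historical years need 90% of expected days
CURRENT_YEAR_COVERAGE = 0.50  # current year is relaxed to 50%

def is_cacheable(year: int, days_present: int, days_expected: int,
                 today: Optional[date] = None) -> bool:
    """Decide whether a year's data slice is complete enough to cache.

    Current-year slices use the relaxed threshold; responses that still
    miss current-year data are not cached, so the next request refetches.
    """
    today = today or date.today()
    threshold = CURRENT_YEAR_COVERAGE if year == today.year else HISTORICAL_COVERAGE
    return days_expected > 0 and days_present / days_expected >= threshold
```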

#### Improved Temporal Tolerance Caching

Smart caching with temporal tolerance reduces redundant API calls:

Location Canonicalization:
  • Consistent keys for location variations: "London, England, United Kingdom" β†’ "london_england"
  • Handles international characters: SΓ£o Paulo, MΓΌnchen, etc.
  • Removes common country suffixes automatically
Temporal Tolerance:
  • Yearly Data: Β±7 days tolerance for yearly aggregations
  • Monthly Data: Β±2 days tolerance for monthly aggregations
  • Daily Data: Exact match only (no tolerance)
  • Smart Fallback: Uses Redis sorted sets to find nearest cached date within tolerance
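The fallback logic can be sketched as a nearest-date search. The tolerance values match the bullets above; the production cache tracks cached end dates in a Redis sorted set, so the plain list here is a stand-in and the helper name is hypothetical.

```python
from datetime import date
from typing import List, Optional

# Tolerance windows per aggregation, as documented above
TOLERANCE_DAYS = {"yearly": 7, "monthly": 2, "daily": 0}

def nearest_cached_date(requested: date, cached: List[date], agg: str) -> Optional[date]:
    """Pick the cached end_date closest to the request, within tolerance.

    Returns None when no cached date is close enough, which forces a
    fresh fetch from the upstream API.
    """
    tol = TOLERANCE_DAYS.get(agg, 0)
    candidates = [d for d in cached if abs((d - requested).days) <= tol]
    if not candidates:
        return None
    return min(candidates, key=lambda d: abs((d - requested).days))
```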
Metadata Tracking: When serving approximate data, comprehensive metadata is included:
{
  "meta": {
    "requested": {
      "location": "London, England",
      "end_date": "2024-01-15"
    },
    "served_from": {
      "canonical_location": "london_england",
      "end_date": "2024-01-16",
      "temporal_delta_days": 1
    },
    "approximate": {
      "temporal": true
    }
  }
}
Performance Benefits:
  • Cache hit rate: Improved from ~60-70% to ~85-95%
  • Response time: Reduced by 200-500ms average
  • API cost reduction: ~40-60% fewer Visual Crossing requests

Cache Endpoints

#### Regular Endpoints (with enhanced caching)

All existing endpoints now include enhanced caching

GET /v1/records/{period}/{location}/{identifier}

#### Async Job Endpoints (New)

Create async job for heavy computations

POST /v1/records/{period}/{location}/{identifier}/async

Check job status and retrieve results

GET /v1/jobs/{job_id}

Job status values: pending, processing, ready, error

Async Job Processing

The API now supports asynchronous processing for heavy computations:

#### Job Lifecycle
  1. Submit Job: POST to async endpoint returns 202 Accepted with job_id
  2. Monitor Progress: GET /v1/jobs/{job_id} to check status
  3. Retrieve Results: When status is ready, results are available
  4. Error Handling: Failed jobs return status error with details

#### Example Async Usage

Start async job for heavy computation

curl -X POST "https://api.temphist.com/v1/records/daily/london/01-15/async" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response: {"job_id": "job_abc123", "status": "pending", "message": "Job queued"}

Check job status

curl "https://api.temphist.com/v1/jobs/job_abc123" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response when ready:

{
  "job_id": "job_abc123",
  "status": "ready",
  "result": { /* temperature data */ },
  "created_at": "2024-01-15T10:00:00Z",
  "completed_at": "2024-01-15T10:02:30Z"
}

#### Job Worker Process

The system includes a background worker that processes async jobs:

Start the job worker (typically run as a separate service)

python job_worker.py

The worker will:

  • Poll Redis for pending jobs
  • Process heavy computations
  • Cache results automatically
  • Update job status to 'ready' or 'error'

Cache Management

View cache statistics and performance

GET /cache-stats

Reset cache statistics

POST /cache-stats/reset

Invalidate specific cache entries

DELETE /cache/invalidate/key/{cache_key}
DELETE /cache/invalidate/location/{location}
DELETE /cache/invalidate/pattern

Cache health check

GET /cache-stats/health

Cache Prewarming

Prewarm popular locations for various endpoints

python prewarm.py --locations 20 --days 7

Run comprehensive load tests

python load_test_script.py --requests 1000 --concurrent 10

Test cache performance specifically

python load_test_script.py --endpoint cache --requests 500

Cache Headers and ETag Support

All endpoints now support conditional requests:

First request returns data with ETag

curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://api.temphist.com/v1/records/daily/london/01-15"

Response includes: ETag: "abc123", Cache-Control: "public, max-age=3600"

Subsequent requests with ETag return 304 if unchanged

curl -H "Authorization: Bearer YOUR_TOKEN" \
  -H "If-None-Match: abc123" \
  "https://api.temphist.com/v1/records/daily/london/01-15"

Response: 304 Not Modified (use cached data)

For detailed Cloudflare optimization guidance, see CLOUDFLARE_OPTIMIZATION.md.

πŸ”Œ Client Integration Guide

Authentication

The API supports multiple authentication methods depending on your use case:

#### 1. Firebase Authentication (Users)

For end-user applications, use Firebase authentication tokens:

Include Firebase token in Authorization header

curl -H "Authorization: Bearer FIREBASE_ID_TOKEN" \
  https://api.temphist.com/v1/records/daily/New%20York/01-15

#### 2. API Access Token (Automated Systems)

For automated systems like cron jobs, server-side prefetching, or internal services, use the API access token:

Include API access token in Authorization header

curl -H "Authorization: Bearer $API_ACCESS_TOKEN" \
  https://api.temphist.com/v1/records/daily/New%20York/01-15
Benefits of API Access Token:
  • βœ… No Firebase authentication overhead
  • βœ… Bypasses rate limiting for efficient automated operations
  • βœ… Identified as system/admin usage in logs
  • βœ… Efficient for automated prefetching and cache warming
  • βœ… Perfect for cron jobs and background tasks

#### 3. Test Token (Development)

For development and testing:

Include test token in Authorization header

curl -H "Authorization: Bearer $API_ACCESS_TOKEN" \
  http://localhost:8000/v1/records/daily/New%20York/01-15

Base URLs

Production

BASE_URL=https://api.temphist.com

Development

BASE_URL=http://localhost:8000

Client Implementation Examples

#### JavaScript/TypeScript Client

class TempHistClient {
  private baseUrl: string;
  private apiToken: string;

  constructor(baseUrl: string, apiToken: string) {
    this.baseUrl = baseUrl;
    this.apiToken = apiToken;
  }

  // Basic request with caching support
  async getTemperatureData(period: string, location: string, identifier: string) {
    const response = await fetch(
      `${this.baseUrl}/v1/records/${period}/${encodeURIComponent(location)}/${identifier}`,
      {
        headers: {
          Authorization: `Bearer ${this.apiToken}`,
          Accept: "application/json",
        },
      }
    );

    // Handle 304 Not Modified (cached response)
    if (response.status === 304) {
      return null; // Use your local cache
    }

    return await response.json();
  }

  // Async job processing
  async getTemperatureDataAsync(period: string, location: string, identifier: string) {
    // Create job
    const jobResponse = await fetch(
      `${this.baseUrl}/v1/records/${period}/${encodeURIComponent(location)}/${identifier}/async`,
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${this.apiToken}`,
          "Content-Type": "application/json",
        },
      }
    );

    const job = await jobResponse.json();

    // Poll for completion
    return await this.pollJobStatus(job.job_id);
  }

  private async pollJobStatus(jobId: string): Promise<any> {
    while (true) {
      const response = await fetch(`${this.baseUrl}/v1/jobs/${jobId}`, {
        headers: {
          Authorization: `Bearer ${this.apiToken}`,
        },
      });

      const status = await response.json();

      if (status.status === "ready") {
        return status.result;
      } else if (status.status === "error") {
        throw new Error(`Job failed: ${status.error}`);
      }

      // Wait before polling again
      await new Promise((resolve) => setTimeout(resolve, 3000));
    }
  }
}

#### Python Client

import requests
import time
from typing import Optional, Dict, Any

class TempHistClient:
    def __init__(self, base_url: str, api_token: str):
        self.base_url = base_url.rstrip('/')
        self.api_token = api_token
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {api_token}',
            'Accept': 'application/json'
        })

    def get_temperature_data(self, period: str, location: str, identifier: str,
                             etag: Optional[str] = None) -> Optional[Dict[str, Any]]:
        """Get temperature data with optional ETag support."""
        headers = {}
        if etag:
            headers['If-None-Match'] = etag

        response = self.session.get(
            f"{self.base_url}/v1/records/{period}/{location}/{identifier}",
            headers=headers
        )

        # Handle 304 Not Modified
        if response.status_code == 304:
            return None  # Use cached data

        response.raise_for_status()
        return response.json()

    def get_temperature_data_async(self, period: str, location: str, identifier: str) -> Dict[str, Any]:
        """Get temperature data using async job processing."""
        # Create job
        job_response = self.session.post(
            f"{self.base_url}/v1/records/{period}/{location}/{identifier}/async"
        )
        job_response.raise_for_status()
        job = job_response.json()

        # Poll for completion
        return self._poll_job_status(job['job_id'])

    def _poll_job_status(self, job_id: str) -> Dict[str, Any]:
        """Poll job status until completion."""
        while True:
            response = self.session.get(f"{self.base_url}/v1/jobs/{job_id}")
            response.raise_for_status()
            status = response.json()

            if status['status'] == 'ready':
                return status['result']
            elif status['status'] == 'error':
                raise Exception(f"Job failed: {status.get('error', 'Unknown error')}")

            time.sleep(3)  # Wait 3 seconds before polling again

Error Handling

#### HTTP Status Codes

| Code | Description      | Action                        |
| ---- | ---------------- | ----------------------------- |
| 200  | Success          | Process response data         |
| 202  | Accepted         | Job created (async endpoints) |
| 304  | Not Modified     | Use cached data               |
| 400  | Bad Request      | Check request parameters      |
| 401  | Unauthorized     | Verify API token              |
| 404  | Not Found        | Check endpoint URL            |
| 422  | Validation Error | Fix request format            |
| 429  | Rate Limited     | Wait and retry                |
| 500  | Server Error     | Retry with backoff            |

#### Error Response Format

{
  "detail": "Error message",
  "status_code": 400,
  "error_type": "validation_error"
}

Performance Best Practices

#### Use Appropriate Endpoints

For real-time data - use regular endpoints

GET /v1/records/daily/New%20York/01-15

For heavy computations - use async jobs

POST /v1/records/daily/New%20York/01-15/async

For repeated requests - implement ETag caching

If-None-Match: "abc123def456"

#### Retry Logic with Exponential Backoff

import random
import time

import requests

def retry_with_backoff(func, max_retries=3, base_delay=1):
    """Retry function with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return func()
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise

            # Exponential backoff with jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)

πŸ“‘ API Endpoints

API Versioning

Current Version: v1

The API uses URL-based versioning:

  • v1 endpoints (/v1/records/*): Current stable API version
  • Legacy endpoints (/weather/, /forecast/): Deprecated, maintained for backward compatibility
Versioning Policy:
  • New features and breaking changes will be introduced in new version numbers (v2, v3, etc.)
  • Previous versions will be maintained for at least 6 months after a new version is released
  • Deprecation warnings will be sent via response headers (Deprecation, Sunset, Link)
  • Legacy endpoints will eventually be removed - migrate to v1 API as soon as possible
Migration Guide: See MIGRATION_GUIDE.md for details on migrating from legacy endpoints.

V1 API (Recommended)

The new v1 API provides a unified structure for accessing temperature records across different time periods.

#### Main Record Endpoints

| Endpoint                                                 | Description                 | Format                       |
| -------------------------------------------------------- | --------------------------- | ---------------------------- |
| GET /v1/records/{period}/{location}/{identifier}         | Complete temperature record | See identifier formats below |
| GET /v1/records/{period}/{location}/{identifier}/average | Average temperature data    | Subresource                  |
| GET /v1/records/{period}/{location}/{identifier}/trend   | Temperature trend data      | Subresource                  |
| GET /v1/records/{period}/{location}/{identifier}/summary | Text summary                | Subresource                  |

#### Async Job Endpoints (New)

| Endpoint                                                | Description                  | Response     |
| ------------------------------------------------------- | ---------------------------- | ------------ |
| POST /v1/records/{period}/{location}/{identifier}/async | Create async job for records | 202 Accepted |
| GET /v1/jobs/{job_id}                                   | Check job status and results | Job status   |

#### Period Types and Identifier Formats

All periods use the same MM-DD identifier format, representing the end date of the period:

| Period  | Identifier Format | Example | Description                                      |
| ------- | ----------------- | ------- | ------------------------------------------------ |
| daily   | MM-DD             | 01-15   | January 15th across all years                    |
| weekly  | MM-DD             | 01-15   | 7 days ending on January 15th across all years   |
| monthly | MM-DD             | 01-15   | 30 days ending on January 15th across all years  |
| yearly  | MM-DD             | 01-15   | 365 days ending on January 15th across all years |
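The mapping from period plus MM-DD identifier to a concrete date window can be sketched as below. The window lengths come from the table above; the helper name and the idea of resolving against an explicit year are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import Tuple

# Window length per period, per the identifier-format table
PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 30, "yearly": 365}

def period_window(period: str, identifier: str, year: int) -> Tuple[date, date]:
    """Return (start, end) dates for a period ending on MM-DD in `year`."""
    month, day = map(int, identifier.split("-"))
    end = date(year, month, day)
    start = end - timedelta(days=PERIOD_DAYS[period] - 1)
    return start, end
```

For example, a weekly record for 01-15 covers January 9th through January 15th of each year in the range.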

#### Example V1 Requests

Get daily record for January 15th

GET /v1/records/daily/london/01-15

Get weekly record ending on January 15th

GET /v1/records/weekly/london/01-15

Get monthly record ending on January 15th

GET /v1/records/monthly/london/01-15

Get yearly record ending on January 15th

GET /v1/records/yearly/london/01-15

Get just the average for a daily record

GET /v1/records/daily/london/01-15/average

Get just the trend for a daily record

GET /v1/records/daily/london/01-15/trend

Removed Endpoints

❌ These endpoints have been removed. Please use v1 endpoints instead.

| Removed Endpoint                    | V1 Equivalent                                    |
| ----------------------------------- | ------------------------------------------------ |
| GET /data/{location}/{month_day}    | /v1/records/daily/{location}/{month_day}         |
| GET /average/{location}/{month_day} | /v1/records/daily/{location}/{month_day}/average |
| GET /trend/{location}/{month_day}   | /v1/records/daily/{location}/{month_day}/trend   |
| GET /summary/{location}/{month_day} | /v1/records/daily/{location}/{month_day}/summary |

Note: Removed endpoints return 410 Gone with migration information.

Other Endpoints

| Endpoint                       | Description               | Format     |
| ------------------------------ | ------------------------- | ---------- |
| GET /                          | API information           | -          |
| GET /weather/{location}/{date} | Weather for specific date | YYYY-MM-DD |
| GET /forecast/{location}       | Current weather forecast  | -          |

Monitoring Endpoints

| Endpoint               | Description                      | Access |
| ---------------------- | -------------------------------- | ------ |
| GET /rate-limit-status | Current rate limiting status     | Public |
| GET /rate-limit-stats  | Overall rate limiting statistics | Admin  |
| GET /test-redis        | Redis connection test            | Public |
| GET /health            | Health check                     | Public |

V1 API Response Format

GET /v1/records/daily/london/01-15 returns:
{
  "period": "daily",
  "location": "london",
  "identifier": "01-15",
  "range": {
    "start": "1975-01-15",
    "end": "2024-01-15",
    "years": 50
  },
  "unit_group": "metric",
  "values": [
    {
      "date": "1975-01-15",
      "year": 1975,
      "temperature": 15.0,
      "temp_min": null,
      "temp_max": null
    }
  ],
  "average": {
    "mean": 12.5,
    "temp_min": null,
    "temp_max": null,
    "unit": "celsius",
    "data_points": 50
  },
  "trend": {
    "slope": 0.25,
    "unit": "Β°C/decade",
    "data_points": 50,
    "r_squared": null
  },
  "summary": "15.0Β°C. It is 2.5Β°C warmer than average today.",
  "metadata": {
    "total_years": 50,
    "available_years": 50,
    "missing_years": [],
    "completeness": 100.0
  }
}

Legacy Response Format (Deprecated)

GET /data/London/01-15 returns:
{
  "weather": {
    "data": [
      { "x": 1975, "y": 15.0 },
      { "x": 1976, "y": 15.5 },
      { "x": 1977, "y": 16.0 },
      { "x": 2024, "y": 17.0 }
    ],
    "metadata": {
      "total_years": 50,
      "available_years": 50,
      "missing_years": [],
      "completeness": 100.0
    }
  },
  "summary": "17.0Β°C. This is the warmest 15th January on record. It is 2.0Β°C warmer than average today.",
  "trend": {
    "slope": 0.3,
    "units": "Β°C/decade"
  },
  "average": {
    "average": 15.0,
    "unit": "celsius",
    "data_points": 50,
    "year_range": { "start": 1975, "end": 2024 },
    "missing_years": [],
    "completeness": 100.0
  }
}

#### Preapproved Locations Endpoint

The API provides access to a curated list of preapproved locations that are guaranteed to work with the weather endpoints.

Endpoint: GET /v1/locations/preapproved

Query Parameters:
  • country_code (optional): Filter by ISO 3166-1 alpha-2 country code (e.g., "US", "GB")
  • tier (optional): Filter by location tier (e.g., "global")
  • limit (optional): Limit results (1-500, default: no limit)
Response Headers:
  • Cache-Control: public, max-age=3600, s-maxage=86400
  • ETag: Stable ETag for conditional requests
  • Last-Modified: File modification timestamp
Example Requests:

Get all preapproved locations

GET /v1/locations/preapproved

Get locations in the United States

GET /v1/locations/preapproved?country_code=US

Get global tier locations

GET /v1/locations/preapproved?tier=global

Get first 10 locations

GET /v1/locations/preapproved?limit=10

Combined filters

GET /v1/locations/preapproved?country_code=GB&tier=global&limit=5
Response Format:
{
  "version": 1,
  "count": 20,
  "generated_at": "2024-01-15T10:30:00Z",
  "locations": [
    {
      "id": "london",
      "slug": "london",
      "name": "London",
      "admin1": "England",
      "country_name": "United Kingdom",
      "country_code": "GB",
      "latitude": 51.5074,
      "longitude": -0.1278,
      "timezone": "Europe/London",
      "tier": "global"
    }
  ]
}
Caching and Performance:
  • Full response cached in Redis for 24 hours
  • Filtered responses cached separately for optimal performance
  • Supports conditional requests with ETag and Last-Modified headers
  • Rate limited to 60 requests per minute per IP
Status Endpoint: GET /v1/locations/preapproved/status

Returns service health and configuration information.

πŸ›‘οΈ Rate Limiting

The API includes comprehensive rate limiting to prevent abuse:

Exemptions

The following are exempt from rate limiting:

  • Service Jobs: Requests using API_ACCESS_TOKEN (automated systems, cron jobs, cache warming)
  • Whitelisted IPs: IPs configured in IP_WHITELIST

> Note: Rate limiting only applies to end users with Firebase authentication. Automated systems using API_ACCESS_TOKEN can operate without rate limits for efficient prefetching and cache warming.

Location Diversity Limits

  • Max unique locations per hour: 10 (configurable)
  • Purpose: Prevents requesting data for too many different locations
  • Applied to: Firebase-authenticated user requests only

Request Rate Limits

  • Max requests per hour: 100 (configurable)
  • Purpose: Prevents excessive API calls overall
  • Applied to: Firebase-authenticated user requests only

IP Management

  • Whitelist: IPs exempt from all rate limiting
  • Blacklist: IPs blocked entirely (HTTP 403)

Rate Limit Responses

When limits are exceeded:

  • HTTP 429 (Too Many Requests)
  • Retry-After header with retry time
  • Detailed error message explaining the limit
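The hourly request limit behaves like a sliding-window counter. The sketch below is an in-memory stand-in for illustration only: the API's actual limiter tracks per-user request timestamps in Redis, and the class and method names here are hypothetical.

```python
import time
from collections import deque
from typing import Dict, Optional

class SlidingWindowLimiter:
    """In-memory sketch of a sliding-window rate limiter.

    Keeps a deque of request timestamps per key and rejects a request
    once the window already holds max_requests hits.
    """
    def __init__(self, max_requests: int = 100, window_seconds: int = 3600):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits: Dict[str, deque] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and q[0] <= now - self.window:
            q.popleft()  # Drop timestamps that fell out of the window
        if len(q) >= self.max_requests:
            return False  # Caller would respond 429 with a Retry-After header
        q.append(now)
        return True
```

Because old timestamps age out continuously, a client throttled at the top of the hour regains capacity gradually rather than all at once.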

Checking Rate Limit Status

Check your current rate limit status

curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.temphist.com/rate-limit-status

Response includes:

  • service_job: true if using API_ACCESS_TOKEN
  • whitelisted: true if IP is whitelisted
  • rate_limited: false if exempt from rate limiting

πŸš€ Caching System

Multi-Layer Cache Strategy

The API implements a sophisticated multi-layer caching system:

#### 1. PostgreSQL Persistent Cache
  • Historical temperature data: Long-term storage with location aliasing
  • Location deduplication: Nearby locations (within 25km) share cached data
  • Persistent aliases: Stable mappings reduce repeated Visual Crossing lookups
  • Duration: Permanent (until data becomes stale or is manually invalidated)

#### 2. Redis Fast Cache

  • Today's data: 1 hour cache duration
  • Historical data: 1 week cache duration
  • Temporal tolerance: Serves approximate data within tolerance windows
  • Smart invalidation: Automatic cleanup of expired entries

#### 3. Improved Temporal Tolerance Cache
  • Canonical location keys: Normalized for maximum hit rates
  • Temporal matching: Β±7 days for yearly, Β±2 days for monthly data
  • Metadata tracking: Clear indication when serving approximate data

Cache Keys

  • PostgreSQL: Location ID-based keying with alias resolution
  • Redis Weather data: {location}_{date}
  • Redis Series data: series_{location}_{month}_{day}
  • Redis Analysis data: {type}_{location}_{month}_{day}
  • Improved cache: improved:cache:{agg}:{canonical_location}:{end_date}
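Canonical key construction for the improved cache can be sketched as follows. The key format matches the bullet above; the normalization helper and the suffix list are simplified illustrations of the canonicalization described earlier, not the actual implementation.

```python
import re
import unicodedata

# Illustrative subset; the real suffix list is presumably longer
COUNTRY_SUFFIXES = {"united kingdom", "united states", "uk", "usa"}

def canonical_location(raw: str) -> str:
    """Normalize a location string into a canonical cache-key component.

    e.g. "London, England, United Kingdom" -> "london_england"
    """
    # Fold accented characters: "São Paulo" -> "Sao Paulo"
    folded = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode()
    parts = [p.strip().lower() for p in folded.split(",") if p.strip()]
    parts = [p for p in parts if p not in COUNTRY_SUFFIXES]
    return "_".join(re.sub(r"[^a-z0-9]+", "_", p).strip("_") for p in parts)

def improved_cache_key(agg: str, location: str, end_date: str) -> str:
    """Build a key in the documented improved-cache format."""
    return f"improved:cache:{agg}:{canonical_location(location)}:{end_date}"
```

Variations such as "London, England" and "London, England, United Kingdom" collapse to the same key, which is what drives the higher hit rates.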

Cache Monitoring

Check Redis cache status

curl http://localhost:8000/test-redis

View cache statistics and performance

curl http://localhost:8000/cache-stats

Check cache health

curl http://localhost:8000/cache-stats/health

Monitor rate limit and cache performance

curl http://localhost:8000/rate-limit-stats

PostgreSQL Cache Configuration

Required for persistent cache

TEMPHIST_PG_DSN=postgresql://user:password@host:5432/dbname

Or use standard DATABASE_URL

DATABASE_URL=postgresql://user:password@host:5432/dbname

Cache will be disabled if neither variable is set

The application will log a warning and continue without persistent cache

πŸ“Š Performance & Monitoring

Built-in Profiling

The API includes comprehensive performance monitoring:

Run performance tests

python performance_test.py

Profile specific functions

python -c "
import cProfile
from main import calculate_historical_average
cProfile.run('calculate_historical_average([{\"x\": 2020, \"y\": 15.5}])')
"

Performance Metrics

  • URL Building: ~2.5M operations/second
  • Historical Average: ~58K operations/second
  • Trend Calculation: ~23K operations/second
  • Memory Usage: Minimal impact

Key Performance Optimizations

  • PostgreSQL Pool Tuning (utils/daily_temperature_store.py): raises min_size/max_size, adds timeouts, and recycles idle connections for faster, more reliable queries.
  • Targeted Date Queries (utils/daily_temperature_store.py): switches BETWEEN scans to = ANY($2::date[]), improving index usage when fetching sparse days.
  • Database Index Suite (utils/daily_temperature_store.py): adds indexes for location aliases, coordinate lookups, and current-year slices to keep hot data efficient.
  • Geospatial Prefiltering (utils/daily_temperature_store.py): applies latitude/longitude bounding boxes before haversine checks, shrinking candidate sets for nearby-location matching.
  • Response-Timing Middleware (main.py): emits X-Response-Time headers and logs requests slower than 1β€―s to surface bottlenecks early.
  • Redis Pipelining (cache_utils.py, job_worker.py): batches multi-year cache writes/reads to cut network round trips and latency.
  • Tiered Cache TTLs (cache_utils.py, job_worker.py): extends TTL for older years (up to 365β€―days) while keeping recent data fresh.
  • Visual Crossing Timeout Control (main.py, config.py, utils/visual_crossing_timeline.py): separates HTTP_TIMEOUT_VISUAL_CROSSING (default 30β€―s) so slow upstream calls fail fast.
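The tiered-TTL idea can be sketched as a small lookup on the age of the cached year. The 1-hour, 1-week, and 365-day durations come from this document; the exact tier boundaries (which ages count as "recent") and the function name are assumptions.

```python
from datetime import date
from typing import Optional

def cache_ttl_seconds(year: int, today: Optional[date] = None) -> int:
    """Pick a Redis TTL based on how old the cached year is.

    Current-year data stays fresh (1 hour), recent years get a week,
    and settled history is kept for up to 365 days.
    """
    today = today or date.today()
    age = today.year - year
    if age <= 0:
        return 3600              # current year: 1 hour
    if age <= 2:                 # assumed boundary for "recent"
        return 7 * 86400         # recent years: 1 week
    return 365 * 86400           # older years: up to 365 days
```

Longer TTLs on old years cut Redis churn and upstream refetches without risking stale current-year responses.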

Monitoring Tools

  • Real-time logs: tail -f temphist.log
  • Error tracking: grep "ERROR" temphist.log
  • Performance metrics: Built-in timing middleware

πŸ§ͺ Testing

Quick Start

Run all tests (140+ tests)

pytest -v

Run specific test categories

pytest test_main.py -k "rate" -v         # Rate limiting tests
pytest test_main.py -k "cache" -v        # Caching tests
pytest test_main.py -k "performance" -v  # Performance tests
pytest test_cache.py -v                  # Enhanced caching system tests
pytest tests/routers/ -v                 # Router-specific tests

Test Categories

  • Unit Tests: Individual component testing (test_main.py)
  • Enhanced Caching Tests: Cache utilities, ETags, jobs (test_cache.py)
  • Integration Tests: Full API endpoint testing
  • Performance Tests: Load and stress testing
  • Rate Limiting Tests: Rate limit validation
  • Router Tests: Endpoint-specific functionality

Enhanced Caching Tests

Test cache key normalization

pytest test_cache.py::TestCacheKeyBuilder -v

Test ETag generation and conditional requests

pytest test_cache.py::TestETagGenerator -v

Test async job processing

pytest test_cache.py::TestJobWorker -v

Test single-flight protection

pytest test_cache.py::TestSingleFlightLock -v

Test cache performance

pytest test_cache.py::TestPerformance -v

Load Testing

Run comprehensive load tests

python load_test_script.py --requests 1000 --concurrent 10

Test specific endpoints

python load_test_script.py --endpoint records --requests 500
python load_test_script.py --endpoint cache --requests 200

Test async job performance

python load_test_script.py --endpoint async --requests 50

Manual Testing

Test rate limiting

curl http://localhost:8000/rate-limit-status

Test weather data

curl http://localhost:8000/v1/records/daily/London/01-15

Test with authentication

curl -H "Authorization: Bearer $API_ACCESS_TOKEN" \
  http://localhost:8000/v1/records/daily/London/01-15

Test async job processing

curl -X POST -H "Authorization: Bearer $API_ACCESS_TOKEN" \
  http://localhost:8000/v1/records/daily/London/01-15/async

Test cache headers

curl -v -H "Authorization: Bearer $API_ACCESS_TOKEN" \
  http://localhost:8000/v1/records/daily/London/01-15

Test conditional requests (304 Not Modified)

curl -H "Authorization: Bearer $API_ACCESS_TOKEN" \
  -H "If-None-Match: your_etag_here" \
  http://localhost:8000/v1/records/daily/London/01-15
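On the client side, the conditional-request flow amounts to remembering the ETag from a 200 response and replaying it as If-None-Match; a 304 means the cached body is still valid. A standard-library sketch (the function name is an assumption, not part of this project):

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

def fetch_with_etag(url: str, token: str,
                    etag: Optional[str] = None) -> Tuple[int, Optional[str], Optional[bytes]]:
    """GET a resource, sending If-None-Match when a cached ETag is known.

    Returns (status, etag, body); body is None on 304 Not Modified,
    meaning the locally cached copy can be reused.
    """
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    if etag:
        req.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.headers.get("ETag"), resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return 304, etag, None
        raise
```

`urlopen` raises `HTTPError` for any 3xx/4xx/5xx status, so the 304 case is handled in the exception branch rather than the normal return path.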

Cache Prewarming

Prewarm popular locations

python prewarm.py --locations 20 --days 7

Prewarm with verbose output

python prewarm.py --locations 10 --days 3 --verbose

πŸ“ Logging

Log Levels

  • DEBUG: Detailed debugging information (development only)
  • INFO: General application flow
  • WARNING: Issues that don't stop execution
  • ERROR: Serious problems requiring attention

Log Configuration

Development

DEBUG=true # Enables debug logging and file output

Production

DEBUG=false # Disables debug overhead

Log Monitoring

Watch logs in real-time

tail -f temphist.log

Search for specific events

grep "ERROR" temphist.log
grep "CACHE" temphist.log
grep "RATE" temphist.log

πŸš€ Deployment

For complete deployment instructions, see DEPLOYMENT.md

Quick Start - Railway (Recommended)

  1. Create Railway project with Redis database
  2. Deploy from GitHub repository
  3. Set environment variables
  4. Deploy!

πŸ“‹ Environment Variables

Hosting Requirements

Minimum Requirements:
  • Python 3.8+ runtime environment
  • Redis database (version 6.0+ recommended) - Required
  • PostgreSQL database (version 12+ recommended) - Optional, for persistent cache
  • Memory: 512MB RAM minimum, 1GB+ recommended (2GB+ if using PostgreSQL)
  • Storage: 100MB+ for application, 500MB+ recommended if using PostgreSQL persistent cache
  • Network: Outbound HTTPS access to weather APIs
Supported Platforms:
  • Railway, Render, Heroku, DigitalOcean App Platform
  • AWS Elastic Beanstalk, Google Cloud Run, Azure Container Instances
  • VPS/Cloud servers (Ubuntu 20.04+, CentOS 8+)
  • Docker containers

Environment Variables

#### πŸ”‘ Required Variables

API Keys (Required for main service):
VISUAL_CROSSING_API_KEY=your_visual_crossing_key    # Primary weather data source
OPENWEATHER_API_KEY=your_openweather_key            # Backup weather data source
Database:

Redis (Required for fast caching and rate limiting)

REDIS_URL=redis://localhost:6379 # Redis connection string

Examples:

redis://username:password@host:port/db

rediss://username:password@host:port/db (SSL)

redis://default:password@redis.internal:6379 (Railway)

PostgreSQL (Optional - for persistent cache with location aliasing)

TEMPHIST_PG_DSN=postgresql://user:password@host:5432/dbname

Or use standard DATABASE_URL (Heroku/Railway)

DATABASE_URL=postgresql://user:password@host:5432/dbname

If not set, persistent cache will be disabled

#### πŸ”§ Main API Service Variables

Core Configuration:

Server settings

PORT=8000                            # Server port (default: 8000)
BASE_URL=https://your-api-domain.com # Public API URL for job callbacks

Caching

CACHE_ENABLED=true # Enable/disable caching (default: true)

Logging & Debug

DEBUG=false           # Enable debug logging (default: false)
LOG_VERBOSITY=normal  # minimal|normal|verbose (default: normal)
Rate Limiting & Security:

Rate limiting

RATE_LIMIT_ENABLED=true    # Enable rate limiting (default: true)
MAX_LOCATIONS_PER_HOUR=10  # Max unique locations per IP/hour (default: 10)
MAX_REQUESTS_PER_HOUR=100  # Max requests per IP/hour (default: 100)
RATE_LIMIT_WINDOW_HOURS=1  # Rate limit window in hours (default: 1)

IP filtering (comma-separated lists)

IP_WHITELIST=192.168.1.100,10.0.0.5   # Bypass rate limits (optional)
IP_BLACKLIST=192.168.1.200,10.0.0.99  # Block specific IPs (optional)

API access

API_ACCESS_TOKEN=your_secure_token # Token for automated access (optional)
Data Processing:
FILTER_WEATHER_DATA=true                           # Filter invalid weather data (default: true)
UNIT_GROUP=celsius                                 # Default temperature unit (default: celsius)
CORS Configuration:
CORS_ORIGINS=https://yourdomain.com,https://staging.yourdomain.com  # Comma-separated allowed origins
CORS_ORIGIN_REGEX=^https://.*\.yourdomain\.com$                     # Regex pattern for allowed origins
Firebase Integration (Optional):
FIREBASE_SERVICE_ACCOUNT={"type":"service_account",...}  # Firebase service account JSON

OR

FIREBASE_SERVICE_ACCOUNT_JSON={"type":"service_account",...} # Alternative variable name

#### βš™οΈ Job Worker Service Variables

Worker Configuration:

All main API variables above, plus:

DEBUG=false                          # Worker debug logging
BASE_URL=https://your-api-domain.com # API URL for job callbacks
Note: The job worker service uses the same environment variables as the main API service, as it needs access to the same Redis instance and API keys.

#### πŸ—„οΈ Cache Management Variables

Cache Warming:
CACHE_WARMING_ENABLED=true                         # Enable automatic cache warming (default: true)
CACHE_WARMING_INTERVAL_HOURS=4                     # Hours between warming cycles (default: 4)
CACHE_WARMING_DAYS_BACK=7                          # Days of data to warm (default: 7)
CACHE_WARMING_CONCURRENT_REQUESTS=3                # Concurrent warming requests (default: 3)
CACHE_WARMING_MAX_LOCATIONS=15                     # Max locations to warm (default: 15)

The warmer now sources its base location list from data/preapproved_locations.json, so no additional environment variable is required.

Cache Statistics:
CACHE_STATS_ENABLED=true                           # Enable cache statistics (default: true)
CACHE_STATS_RETENTION_HOURS=24                     # Hours to retain stats (default: 24)
CACHE_HEALTH_THRESHOLD=0.7                         # Hit rate threshold for health (default: 0.7)
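A sketch of how the hit-rate threshold might be applied to the collected statistics; the function and counter names are illustrative, not the project's actual code:

```python
import os

def cache_health(hits: int, misses: int) -> dict:
    """Compare the observed hit rate against CACHE_HEALTH_THRESHOLD (default 0.7)."""
    threshold = float(os.getenv("CACHE_HEALTH_THRESHOLD", "0.7"))
    total = hits + misses
    hit_rate = hits / total if total else 0.0
    return {
        "hit_rate": round(hit_rate, 3),
        "healthy": hit_rate >= threshold,
    }
```

A hit rate that stays below the threshold usually points at short TTLs, key churn, or traffic to locations the prewarmer does not cover.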
Cache Invalidation:
CACHE_INVALIDATION_ENABLED=true                    # Enable cache invalidation (default: true)
CACHE_INVALIDATION_DRY_RUN=false                  # Test invalidation without executing (default: false)
CACHE_INVALIDATION_BATCH_SIZE=100                 # Batch size for invalidation (default: 100)
Usage Tracking:
USAGE_TRACKING_ENABLED=true                        # Enable usage tracking (default: true)
USAGE_RETENTION_DAYS=7                             # Days to retain usage data (default: 7)

Platform-Specific Examples

Railway:
REDIS_URL=redis://default:password@redis.internal:6379
Render:
REDIS_URL=redis://username:password@host:port
Heroku:
REDIS_URL=redis://username:password@host:port
Docker:
REDIS_URL=redis://redis:6379
Local Development:
REDIS_URL=redis://localhost:6379
DEBUG=true
LOG_VERBOSITY=verbose

Health Checks

  • Health endpoint: /health
  • Redis check: /test-redis
  • Rate limit status: /rate-limit-status

See DEPLOYMENT.md for:

  • Complete Railway setup guide
  • Environment variable configuration
  • Firebase credentials setup
  • Multi-service deployment
  • Migration from Render
  • Troubleshooting deployment issues

πŸ”§ Troubleshooting

For detailed troubleshooting, see TROUBLESHOOTING.md

Quick Diagnostics

Health checks

curl https://your-app.com/health
curl https://your-app.com/test-redis
curl https://your-app.com/rate-limit-status

Enable debug logging

DEBUG=true uvicorn main:app --reload

See TROUBLESHOOTING.md for help with:

  • Deployment issues (502 errors, environment variables)
  • Async jobs and background worker problems
  • Redis and caching issues
  • Rate limiting problems
  • API errors (401, 422, 429, 500)
  • Performance issues

πŸ“ˆ Performance Optimization

Caching Improvements

  • Implemented Redis caching for all endpoints
  • Smart cache durations based on data type
  • Automatic cache invalidation

Rate Limiting Optimization

  • Memory-efficient data structures
  • Automatic cleanup of old entries
  • Configurable limits and time windows

API Optimization

  • Async/await for I/O operations
  • Connection pooling for Redis
  • Response compression

πŸ› οΈ Development

Code Structure

main.py              # Main FastAPI application
test_main.py         # Comprehensive test suite
performance_test.py  # Performance testing utilities
requirements.txt     # Python dependencies
render.yaml          # Render deployment configuration

Development Workflow

  1. Make changes to the code
  2. Run tests with pytest
  3. Check performance with python performance_test.py
  4. Test rate limiting with manual requests
  5. Deploy to Render

Managing Preapproved Locations Data

The preapproved locations are stored in data/preapproved_locations.json and loaded at startup.

#### Adding New Locations
  1. Edit the data file:

   # Add a new location entry to data/preapproved_locations.json
   # (country_code must be a valid ISO 3166-1 alpha-2 code;
   #  JSON itself does not allow inline comments)
   {
     "id": "new_city",
     "slug": "new-city",
     "name": "New City",
     "admin1": "State/Province",
     "country_name": "Country Name",
     "country_code": "CC",
     "latitude": 0.0,
     "longitude": 0.0,
     "timezone": "Continent/City",
     "tier": "global"
   }
   

  2. Restart the application to load new data:

   # The cache will be automatically warmed on startup
   python main.py
   

  3. Verify the data loaded:
   curl http://localhost:8000/v1/locations/preapproved/status
   

#### Cache Management

  • Warm cache: warming runs automatically on startup; restart the application to trigger it again
  • Clear cache: use Redis commands or restart the application
  • Monitor cache: check Redis keys matching the pattern preapproved:v1:*

#### Data Validation

The application validates all location data against the LocationItem schema:
  • Country codes must be valid ISO 3166-1 alpha-2 format
  • Coordinates must be valid numbers
  • All required fields must be present
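The real checks live in the LocationItem schema; a plain-Python sketch that mirrors the rules listed above (the helper name is illustrative) might look like:

```python
from typing import List

REQUIRED_FIELDS = (
    "id", "slug", "name", "admin1", "country_name",
    "country_code", "latitude", "longitude", "timezone", "tier",
)

def validate_location(item: dict) -> List[str]:
    """Return a list of validation errors; an empty list means the entry is valid."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in item]
    if errors:
        return errors
    cc = item["country_code"]
    if not (isinstance(cc, str) and len(cc) == 2 and cc.isalpha() and cc.isupper()):
        errors.append("country_code must be ISO 3166-1 alpha-2")
    for field, lo, hi in (("latitude", -90.0, 90.0), ("longitude", -180.0, 180.0)):
        value = item[field]
        if not isinstance(value, (int, float)) or not lo <= value <= hi:
            errors.append(f"{field} out of range")
    return errors
```

Running a new `data/preapproved_locations.json` entry through a check like this before restarting saves a failed startup on malformed data.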

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

πŸ“š Additional Resources

  • API Documentation: http://localhost:8000/docs (when running locally)
  • Interactive API: http://localhost:8000/redoc
  • Rate Limit Status: /rate-limit-status
  • Performance Metrics: Built-in profiling tools

πŸ“„ License

MIT License - see LICENSE file for details

--- Note: This API is designed for production use with comprehensive monitoring, rate limiting, and caching. Built-in profiling reports low memory usage and high sustained throughput.