Detailed Methodology for Progressive Bias Investigation
1. Mathematical Framework
1.1 Bias Definition
For each station i at time t, we define the adjustment bias as:
B(i,t) = T_F52(i,t) - T_TOB(i,t)
Where:
- T_F52(i,t) = Fully adjusted (F52) temperature for station i at time t
- T_TOB(i,t) = Time-of-observation (TOB) adjusted temperature for station i at time t
- B(i,t) = Non-TOB adjustment applied by NOAA
This isolates the adjustments beyond TOB corrections, including:
- Homogenization adjustments
- Station move corrections
- Equipment change adjustments
- Urban heat island "corrections"
1.2 Trend Calculation
For each station, we calculate the temporal trend using ordinary least squares regression:
B(i,t) = β₀(i) + β₁(i) × t + ε(i,t)
Where:
- β₁(i) = Bias trend for station i (°C/year)
- β₀(i) = Intercept (baseline bias)
- ε(i,t) = Residual error
The key metric is the trend per decade, β₁(i) × 10 (°C/decade).
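As an illustration of the regression in this section, here is a minimal sketch of a closed-form ordinary least squares fit for one station. The function name `ols_trend` and the sample series are hypothetical and not taken from the analysis code.

```python
def ols_trend(years, bias):
    """Fit bias = b0 + b1 * year by ordinary least squares; return (b0, b1)."""
    n = len(years)
    if n < 2 or n != len(bias):
        raise ValueError("need at least two paired observations")
    mean_t = sum(years) / n
    mean_b = sum(bias) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("all time values are identical")
    sxy = sum((t - mean_t) * (b - mean_b) for t, b in zip(years, bias))
    slope = sxy / sxx                      # beta_1, in deg C per year
    intercept = mean_b - slope * mean_t    # beta_0
    return intercept, slope


# Hypothetical annual bias series rising by exactly 0.002 deg C per year.
years = list(range(1900, 1910))
bias = [0.10 + 0.002 * (y - 1900) for y in years]

b0, b1 = ols_trend(years, bias)
trend_per_decade = b1 * 10  # the key metric, in deg C per decade
```

Because the synthetic series is exactly linear, the fitted slope recovers the 0.002 °C/year used to build it, which corresponds to 0.02 °C/decade.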
1.3 Network-Wide Analysis
The network-wide bias trend is calculated as:
β̄₁ = (1/N) × Σᵢ β₁(i)
With standard error:
SE(β̄₁) = σ(β₁) / √N
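A short sketch of the two formulas above, under one assumption the text leaves open: σ(β₁) is taken here as the sample standard deviation (N − 1 denominator). The function name `network_trend` and the trend values are illustrative only.

```python
import math
import statistics


def network_trend(station_trends):
    """Return (mean trend, standard error of the mean) across stations."""
    n = len(station_trends)
    if n < 2:
        raise ValueError("need at least two station trends")
    mean = statistics.fmean(station_trends)
    # Assumption: sigma is the sample standard deviation (N - 1 denominator).
    sd = statistics.stdev(station_trends)
    return mean, sd / math.sqrt(n)


# Hypothetical per-station trends in deg C per year.
station_trends = [0.01, 0.02, 0.03, 0.04]
mean_trend, se = network_trend(station_trends)
```

This standard error treats station trends as independent. If neighboring stations are correlated, as the limitations in Section 7.2 suggest, the value is likely to understate the true uncertainty.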
2. Data Processing Steps
2.1 Data Loading and Matching
- Load TOB adjusted data for all stations
- Load F52 adjusted data for all stations
- Match records by:
  - Station ID
  - Year
  - Month
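The matching step amounts to an inner join on (station ID, year, month). The sketch below assumes each dataset has already been loaded into a dictionary keyed by that triple; the loader, the function name `match_records`, and the station IDs are hypothetical.

```python
def match_records(tob, f52):
    """Inner-join two {(station_id, year, month): temperature} mappings."""
    shared_keys = tob.keys() & f52.keys()
    # Keep only records present in both datasets, as (TOB, F52) pairs.
    return {key: (tob[key], f52[key]) for key in sorted(shared_keys)}


# Hypothetical records; the station IDs are placeholders.
tob = {
    ("STN0001", 1900, 1): 1.2,
    ("STN0001", 1900, 2): 2.0,
    ("STN0002", 1900, 1): -3.1,
}
f52 = {
    ("STN0001", 1900, 1): 1.5,
    ("STN0002", 1900, 1): -3.0,
    ("STN0002", 1900, 2): -1.0,
}

matched = match_records(tob, f52)
```

Records that appear in only one dataset are dropped, so a bias value is never computed from an unpaired temperature.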
2.2 Quality Control
- Temporal Coverage: Require minimum 10 years of data
- Annual Completeness: Require ≥6 months per year
- Trend Reliability: Require ≥30 data points for trend calculation
- Start Year Filter (for urban/rural analysis): Require first year ≤1905 to ensure consistent long-term coverage from 1895 baseline
2.3 Bias Calculation
For each matched record:
1. Calculate monthly bias: B = F52 - TOB
2. Aggregate to annual means (requiring ≥6 months)
3. Apply linear regression to annual time series
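Steps 1 and 2 above can be sketched as follows, assuming matched records are held as a dictionary mapping (station, year, month) to a (TOB, F52) pair. That layout, the function name `annual_bias`, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

MIN_MONTHS = 6  # annual completeness threshold from the steps above


def annual_bias(matched):
    """Map {(station, year, month): (tob, f52)} to {(station, year): mean bias}."""
    monthly = defaultdict(list)
    for (station, year, _month), (tob, f52) in matched.items():
        monthly[(station, year)].append(f52 - tob)  # B = F52 - TOB
    # Keep only station-years with enough months to form an annual mean.
    return {
        key: sum(values) / len(values)
        for key, values in monthly.items()
        if len(values) >= MIN_MONTHS
    }


# Hypothetical station "A": six months in 1900, only five in 1901.
matched = {("A", 1900, m): (10.0, 10.2) for m in range(1, 7)}
matched.update({("A", 1901, m): (10.0, 10.3) for m in range(1, 6)})

annual = annual_bias(matched)
```

In this example 1900 is retained with a mean bias of about 0.2 °C, while 1901 is dropped because it has fewer than six months.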
3. Statistical Tests
3.1 Individual Station Significance
For each station trend, we test:
- H₀: β₁(i) = 0 (no temporal trend)
- H₁: β₁(i) ≠ 0 (nonzero temporal trend)
We use a two-sided t-test on the slope with significance level α = 0.05.
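A minimal sketch of the slope test, computing the t statistic and its degrees of freedom with the standard library only. Converting the statistic to a p-value requires a Student t distribution with n − 2 degrees of freedom (for example `scipy.stats.t.sf`), which is deliberately left out here. The function name and data are hypothetical.

```python
import math


def slope_t_statistic(years, bias):
    """Return (slope, standard error, t statistic, degrees of freedom)."""
    n = len(years)
    if n < 3:
        raise ValueError("need at least three observations")
    mean_t = sum(years) / n
    mean_b = sum(bias) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (b - mean_b) for t, b in zip(years, bias))
    slope = sxy / sxx
    intercept = mean_b - slope * mean_t
    # Residual sum of squares around the fitted line.
    sse = sum((b - (intercept + slope * t)) ** 2 for t, b in zip(years, bias))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf
    return slope, se_slope, t_stat, n - 2


# Hypothetical five-year bias series.
years = [0, 1, 2, 3, 4]
bias = [0.0, 0.1, 0.1, 0.3, 0.3]
slope, se_slope, t_stat, dof = slope_t_statistic(years, bias)
```

H₀ is rejected at α = 0.05 when |t| exceeds the two-sided critical value of the t distribution with `dof` degrees of freedom.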
3.2 Network-Wide Tests
Mann-Kendall Test
Non-parametric test for monotonic trends in the network-wide mean bias:
S = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ sgn(Bⱼ - Bᵢ)
Where sgn is the signum function.
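The S statistic can be computed directly from its definition. The sketch below also adds the usual normal approximation for a two-sided p-value, using the no-ties variance n(n − 1)(2n + 5)/18. The tie correction is omitted, which is a simplification and not necessarily what the analysis code does.

```python
import math
from statistics import NormalDist


def sgn(x):
    """Signum function: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def mann_kendall(series):
    """Return (S, Z, two-sided p) using the no-ties normal approximation."""
    n = len(series)
    s = sum(sgn(series[j] - series[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18  # assumes no tied values
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)      # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p


# Hypothetical network-mean bias series that increases every year.
series = [0.01 * k for k in range(10)]
s, z, p = mann_kendall(series)
```

For a strictly increasing series of length 10, every one of the 45 pairs contributes +1, so S reaches its maximum of 45.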
Breakpoint Detection
We use Pettitt's test to identify abrupt changes in adjustment patterns:
Uₖ = Σᵢ₌₁ᵏ Σⱼ₌ₖ₊₁ⁿ sgn(Bⱼ - Bᵢ)
The most probable breakpoint occurs at k where |Uₖ| is maximum.
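A direct, unoptimized sketch of the Uₖ statistic as defined above. The approximate p-value p ≈ 2·exp(−6K² / (n³ + n²)), with K = max |Uₖ|, is the commonly quoted approximation for Pettitt's test; it is included here as an assumption, since the text does not say how significance is assessed.

```python
import math


def sgn(x):
    """Signum function: -1, 0, or 1."""
    return (x > 0) - (x < 0)


def pettitt(series):
    """Return (k, U_k, approximate p) for the most probable breakpoint.

    k is the number of observations before the break, so the series is
    split into series[:k] and series[k:].
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    best_k, best_u = 0, 0
    for k in range(1, n):
        u_k = sum(sgn(series[j] - series[i])
                  for i in range(k) for j in range(k, n))
        if best_k == 0 or abs(u_k) > abs(best_u):
            best_k, best_u = k, u_k
    p = min(1.0, 2 * math.exp(-6 * best_u ** 2 / (n ** 3 + n ** 2)))
    return best_k, best_u, p


# Hypothetical series with a step change after the tenth value.
series = [0.0] * 10 + [1.0] * 10
k, u, p = pettitt(series)
```

The triple loop is cubic in the series length, which is acceptable for annual series of a few hundred points at most; a cumulative formulation would be preferable for longer inputs.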
3.3 Spatial Analysis
Test for spatial autocorrelation using Moran's I:
I = (N/W) × (Σᵢ Σⱼ wᵢⱼ(βᵢ - β̄)(βⱼ - β̄)) / (Σᵢ(βᵢ - β̄)²)
Where wᵢⱼ are spatial weights based on distance, W = Σᵢ Σⱼ wᵢⱼ is the sum of all weights, and β̄ is the mean station trend.
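A small sketch of Moran's I as written above. The text does not specify the weight function, so this example assumes binary weights (1 if two stations lie within a threshold distance, else 0) on planar coordinates; both the weighting scheme and the names are illustrative.

```python
import math


def threshold_weights(coords, max_dist):
    """Binary weights: 1 if two distinct points are within max_dist, else 0."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= max_dist else 0.0
             for j in range(n)]
            for i in range(n)]


def morans_i(values, weights):
    """Moran's I for values with an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    sum_sq = sum(d * d for d in dev)
    if w_total == 0 or sum_sq == 0:
        raise ValueError("weights or values have no variation")
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_total) * (cross / sum_sq)


# Hypothetical stations on a line with trends that increase steadily.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
trends = [1.0, 2.0, 3.0, 4.0]
weights = threshold_weights(coords, max_dist=1.0)
moran = morans_i(trends, weights)
```

Under no spatial autocorrelation the expected value of I is −1/(N − 1), so the positive value obtained here reflects neighboring stations having similar trends.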
4. Control Analyses
4.1 Urban vs Rural Stratification
Station Classification
Stations are classified based on distance to cities with population >50,000:
- Urban Core: ≤25 km from large city
- Urban: 25-50 km from large city
- Suburban: 50-100 km from large city
- Rural: >100 km from any large city
For analysis, these are simplified to a binary classification:
- Urban: Stations within 100 km of a city with population >50,000 (includes urban core, urban, and suburban)
- Rural: Stations >100 km from any city with population >50,000
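The classification rules can be sketched with a great-circle distance to the nearest large city. The haversine formula, the treatment of boundary values (each upper limit is inclusive), and the coordinates below are assumptions; the text does not state how distances are computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def classify_station(lat, lon, cities):
    """Four-way class from distance to the nearest city with population >50,000."""
    nearest = min(haversine_km(lat, lon, c_lat, c_lon) for c_lat, c_lon in cities)
    if nearest <= 25:
        return "Urban Core"
    if nearest <= 50:
        return "Urban"
    if nearest <= 100:
        return "Suburban"
    return "Rural"


def binary_class(label):
    """Collapse the four-way class to the urban/rural split used in analysis."""
    return "Rural" if label == "Rural" else "Urban"


# Hypothetical city and station coordinates (latitude, longitude).
cities = [(40.0, -100.0)]
labels = [classify_station(lat, lon, cities)
          for lat, lon in [(40.1, -100.0), (40.0, -99.5),
                           (40.7, -100.0), (42.0, -100.0)]]
```

The four sample stations sit roughly 11 km, 43 km, 78 km, and 222 km from the city, so they fall into the four classes in order.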
Temporal Filtering for Urban/Rural Analysis
To ensure consistent time series comparison, stations are filtered by data availability:
- Inclusion criterion: Station must have first year of data ≤ 1905 (within 10 years of the 1895 start)
- Rationale: Ensures all analyzed stations cover most of the study period (1895-2023)
- Impact: Excludes 269 stations (22.1%) that began recording after 1905
- Result: 949 stations analyzed (668 urban, 281 rural)
This temporal filtering prevents bias from comparing stations with vastly different recording periods and ensures robust long-term trend analysis.
We test if bias trends differ between groups using Welch's t-test and Mann-Whitney U test.
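Both group comparisons can be sketched with the standard library. The code below computes the Welch t statistic with Welch-Satterthwaite degrees of freedom and the Mann-Whitney U statistics; p-values are omitted because they need the t and U reference distributions (SciPy's `ttest_ind(..., equal_var=False)` and `mannwhitneyu` provide them). The sample trends are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


def mann_whitney_u(a, b):
    """Return (U_a, U_b); ties between samples count one half."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return u_a, len(a) * len(b) - u_a


# Hypothetical station trends in deg C per decade.
urban = [0.02, 0.03, 0.04, 0.05]
rural = [0.00, 0.01, 0.02, 0.03]

t_stat, dof = welch_t(urban, rural)
u_urban, u_rural = mann_whitney_u(urban, rural)
```

Welch's test compares group means without assuming equal variances, while Mann-Whitney compares ranks, so agreement between the two gives some reassurance that a result is not driven by outliers.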
4.2 Regional Stratification
Stations are grouped by climate regions to control for:
- Regional climate variations
- Policy implementation differences
- Data density effects
4.3 Temporal Stratification
Analysis is performed for multiple time periods:
- Full period (1895-present)
- Early period (1895-1950)
- Middle period (1951-1980)
- Recent period (1981-present)
5. Interpretation Framework
5.1 Evidence of Systematic Bias
Strong evidence would include:
1. Consistent Direction: >70% of stations show positive trends
2. Statistical Significance: Network-wide trend p < 0.01
3. Large Magnitude: Mean trend >0.01°C/decade
4. Acceleration: Trends stronger in recent decades
5. Spatial Coherence: Similar trends across regions
5.2 Evidence Against Systematic Bias
Evidence against would include:
1. Random Distribution: ~50% positive, ~50% negative trends
2. No Significance: Network-wide p > 0.05
3. Small Magnitude: |Mean trend| <0.005°C/decade
4. Temporal Stability: No acceleration over time
5. Spatial Heterogeneity: Regional differences dominate
5.3 Alternative Explanations
Consider whether patterns could result from:
1. Legitimate adjustments: Documented station changes
2. Urban growth: Increasing UHI requiring correction
3. Network changes: Shifting station composition
4. Methodology artifacts: Statistical or computational issues
6. Uncertainty Quantification
6.1 Sources of Uncertainty
- Measurement uncertainty: ±0.1°C typical
- Adjustment uncertainty: Undocumented changes
- Sampling uncertainty: Incomplete station coverage
- Trend uncertainty: Regression standard errors
6.2 Propagation Methods
- Bootstrap resampling: 1000 iterations with replacement
- Jackknife: Leave-one-out station analysis
- Monte Carlo: Simulate measurement errors
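The first two methods can be sketched as follows. The percentile interval, the default seed, and the function names are assumptions; only the 1000 iterations with replacement and the leave-one-out scheme come from the list above.

```python
import random
import statistics


def bootstrap_ci(trends, n_iter=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean station trend."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(trends)
    means = sorted(statistics.fmean(rng.choices(trends, k=n))
                   for _ in range(n_iter))
    lower = means[round(alpha / 2 * n_iter)]
    upper = means[round((1 - alpha / 2) * n_iter) - 1]
    return lower, upper


def jackknife_means(trends):
    """Network mean recomputed with each station left out in turn."""
    total, n = sum(trends), len(trends)
    return [(total - x) / (n - 1) for x in trends]


# Hypothetical per-station trends in deg C per decade.
trends = [0.01, 0.02, 0.03, 0.04, 0.05]
ci_low, ci_high = bootstrap_ci(trends)
loo_means = jackknife_means(trends)
```

The spread of the leave-one-out means shows how much any single station influences the network average, which complements the bootstrap interval.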
6.3 Reporting
All results are reported with:
- 95% confidence intervals
- Standard errors
- Sample sizes
- Significance levels
7. Limitations
7.1 Data Limitations
- Station coverage varies over time
- Missing data may not be random
- Adjustment algorithms have changed over time
7.2 Methodological Limitations
- Linear trends may oversimplify
- Spatial correlation not fully modeled
- Cannot separate all adjustment types
7.3 Interpretive Limitations
- Correlation does not imply causation
- Multiple hypotheses increase false positive risk
- Results specific to USHCN network
8. Validation Approaches
8.1 Sensitivity Analysis
Test robustness to:
- Minimum data requirements
- Trend calculation methods
- Time period selection
- Station selection criteria
8.2 Cross-Validation
- Compare with independent temperature datasets
- Check against satellite era (1979+)
- Validate with reanalysis products
8.3 Reproducibility
All analysis code:
- Is version controlled
- Is fully documented
- Uses fixed random seeds
- Includes data version information