NeahNew/CRITICAL_ISSUE_ANALYSIS.md
2026-01-08 16:17:13 +01:00

7.6 KiB

Critical Issue: Infinite Token Refresh Loop

Problem Analysis

What's Happening

  1. Initial Load: App starts successfully, user authenticated
  2. Session Invalidation: Keycloak session becomes invalid (user logged out elsewhere, session expired, etc.)
  3. Refresh Storm: Every API request triggers:
    • JWT callback execution
    • Token refresh attempt
    • Refresh failure (session invalid)
    • Token cleared, but error state persists
    • Next request repeats the cycle

Root Cause

The JWT callback in app/api/auth/options.ts has no circuit breaker:

// Current problematic flow:
if (expiresAt && Date.now() < expiresAt) {
  return token; // Token valid
}

// Token expired - ALWAYS tries to refresh
const refreshedToken = await refreshAccessToken(token);

// If refresh fails, clears tokens but...
// Next request will see expired token and try again!

The Problem:

  • No cooldown period after failed refresh
  • No "session invalid" cache/flag
  • Every request triggers refresh attempt
  • Multiple widgets = multiple parallel requests = refresh storm

Impact

  • Performance: Excessive Keycloak API calls
  • Server Load: CPU/memory spike from refresh attempts
  • User Experience: App appears broken, constant loading
  • Logs: Spam with "Keycloak session invalidated" messages
  • Security: Potential DoS on Keycloak server

Solution: Circuit Breaker Pattern

Implementation Strategy

  1. Add Refresh Cooldown: Don't retry refresh for X seconds after failure
  2. Session Invalid Flag: Cache the "session invalid" state
  3. Early Return: If session known to be invalid, skip refresh attempt
  4. Client-Side Detection: Stop making requests when session invalid

Code Changes Needed

1. Add Circuit Breaker to JWT Callback

// Add to app/api/auth/options.ts

// Track last failed refresh attempt
const lastFailedRefresh = new Map<string, number>();
const REFRESH_COOLDOWN = 5000; // 5 seconds

async jwt({ token, account, profile }) {
  // ... existing initial sign-in logic ...
  
  // Check if token is expired
  const expiresAt = token.accessTokenExpires as number;
  if (expiresAt && Date.now() < expiresAt) {
    return token; // Token still valid
  }

  // CIRCUIT BREAKER: Check if we recently failed to refresh
  const userId = token.sub || 'unknown';
  const lastFailure = lastFailedRefresh.get(userId) || 0;
  const timeSinceFailure = Date.now() - lastFailure;
  
  if (timeSinceFailure < REFRESH_COOLDOWN) {
    // Too soon after failure, return error token immediately
    logger.debug('Refresh cooldown active, skipping refresh attempt', {
      userId,
      timeSinceFailure,
    });
    return {
      ...token,
      error: "SessionNotActive",
      accessToken: undefined,
      refreshToken: undefined,
      idToken: undefined,
    };
  }

  // Try to refresh
  if (!token.refreshToken) {
    return {
      ...token,
      error: "NoRefreshToken",
      accessToken: undefined,
      refreshToken: undefined,
      idToken: undefined,
    };
  }

  const refreshedToken = await refreshAccessToken(token);
  
  // If refresh failed, record the failure time
  if (refreshedToken.error === "SessionNotActive") {
    lastFailedRefresh.set(userId, Date.now());
    logger.info("Keycloak session invalidated, setting cooldown", { userId });
    
    // Clean up old entries (prevent memory leak)
    if (lastFailedRefresh.size > 1000) {
      const now = Date.now();
      for (const [key, value] of lastFailedRefresh.entries()) {
        if (now - value > REFRESH_COOLDOWN * 10) {
          lastFailedRefresh.delete(key);
        }
      }
    }
    
    return {
      ...refreshedToken,
      accessToken: undefined,
      refreshToken: undefined,
      idToken: undefined,
    };
  }
  
  // Success - clear any previous failure record
  lastFailedRefresh.delete(userId);
  
  return refreshedToken;
}

2. Use Redis for Distributed Circuit Breaker

For multi-instance deployments, use Redis:

// lib/services/auth-circuit-breaker.ts
import { getRedisClient } from '@/lib/redis';

const REFRESH_COOLDOWN = 5000; // 5 seconds
const CIRCUIT_BREAKER_KEY = (userId: string) => `auth:refresh:cooldown:${userId}`;

export async function isRefreshInCooldown(userId: string): Promise<boolean> {
  const redis = getRedisClient();
  const key = CIRCUIT_BREAKER_KEY(userId);
  const lastFailure = await redis.get(key);
  
  if (!lastFailure) {
    return false;
  }
  
  const timeSinceFailure = Date.now() - parseInt(lastFailure, 10);
  return timeSinceFailure < REFRESH_COOLDOWN;
}

export async function recordRefreshFailure(userId: string): Promise<void> {
  const redis = getRedisClient();
  const key = CIRCUIT_BREAKER_KEY(userId);
  await redis.set(key, Date.now().toString(), 'EX', Math.ceil(REFRESH_COOLDOWN / 1000));
}

export async function clearRefreshCooldown(userId: string): Promise<void> {
  const redis = getRedisClient();
  const key = CIRCUIT_BREAKER_KEY(userId);
  await redis.del(key);
}

3. Client-Side Request Stopping

Add to components to stop making requests when session invalid:

// hooks/use-session-guard.ts
import { useSession } from 'next-auth/react';
import { useEffect, useRef } from 'react';

export function useSessionGuard() {
  const { status, data: session } = useSession();
  const hasInvalidSession = useRef(false);
  
  useEffect(() => {
    if (status === 'unauthenticated' && !hasInvalidSession.current) {
      hasInvalidSession.current = true;
      // Stop all refresh intervals
      // Clear any pending requests
    }
  }, [status]);
  
  return {
    shouldMakeRequests: status === 'authenticated' && !hasInvalidSession.current,
    isInvalid: hasInvalidSession.current,
  };
}

Immediate Fix (Quick)

Option 1: Add Simple Cooldown (In-Memory)

Add this to app/api/auth/options.ts:

// At top of file
const refreshCooldown = new Map<string, number>();
const COOLDOWN_MS = 5000; // 5 seconds

// In jwt callback, before refresh attempt:
const userId = token.sub || 'unknown';
const lastFailure = refreshCooldown.get(userId) || 0;

if (Date.now() - lastFailure < COOLDOWN_MS) {
  // Skip refresh, return error immediately
  return {
    ...token,
    error: "SessionNotActive",
    accessToken: undefined,
    refreshToken: undefined,
    idToken: undefined,
  };
}

// After failed refresh:
if (refreshedToken.error === "SessionNotActive") {
  refreshCooldown.set(userId, Date.now());
  // ... rest of error handling
}

Option 2: Early Return on Known Invalid Session

// In jwt callback, check token error first:
if (token.error === "SessionNotActive") {
  // Already know session is invalid, don't try again
  return {
    ...token,
    accessToken: undefined,
    refreshToken: undefined,
    idToken: undefined,
    error: "SessionNotActive",
  };
}

  1. Immediate: Add simple in-memory cooldown (Option 1)
  2. Short-term: Migrate to Redis-based circuit breaker
  3. Long-term: Add client-side session guard to stop requests

Testing

After implementing:

  1. Test Scenario 1: Logout from Keycloak admin console

    • Should see: 1-2 refresh attempts, then cooldown
    • Should NOT see: Infinite loop
  2. Test Scenario 2: Expire session manually

    • Should see: Cooldown prevents refresh storm
    • Should see: User redirected to sign-in
  3. Test Scenario 3: Multiple widgets loading

    • Should see: All widgets respect cooldown
    • Should see: No refresh storm

Monitoring

Add metrics:

  • Refresh attempt count
  • Refresh failure count
  • Cooldown activations
  • Session invalidations per user