7.6 KiB
7.6 KiB
Critical Issue: Infinite Token Refresh Loop
Problem Analysis
What's Happening
- Initial Load: App starts successfully, user authenticated
- Session Invalidation: Keycloak session becomes invalid (user logged out elsewhere, session expired, etc.)
- Refresh Storm: Every API request triggers:
- JWT callback execution
- Token refresh attempt
- Refresh failure (session invalid)
- Token cleared, but error state persists
- Next request repeats the cycle
Root Cause
The JWT callback in app/api/auth/options.ts has no circuit breaker:
// Current problematic flow:
if (expiresAt && Date.now() < expiresAt) {
return token; // Token valid
}
// Token expired - ALWAYS tries to refresh
const refreshedToken = await refreshAccessToken(token);
// If refresh fails, clears tokens but...
// Next request will see expired token and try again!
The Problem:
- No cooldown period after failed refresh
- No "session invalid" cache/flag
- Every request triggers refresh attempt
- Multiple widgets = multiple parallel requests = refresh storm
Impact
- Performance: Excessive Keycloak API calls
- Server Load: CPU/memory spike from refresh attempts
- User Experience: App appears broken, constant loading
- Logs: Spam with "Keycloak session invalidated" messages
- Security: Potential DoS on Keycloak server
Solution: Circuit Breaker Pattern
Implementation Strategy
- Add Refresh Cooldown: Don't retry refresh for X seconds after failure
- Session Invalid Flag: Cache the "session invalid" state
- Early Return: If session known to be invalid, skip refresh attempt
- Client-Side Detection: Stop making requests when session invalid
Code Changes Needed
1. Add Circuit Breaker to JWT Callback
// Add to app/api/auth/options.ts
// Track last failed refresh attempt
const lastFailedRefresh = new Map<string, number>();
const REFRESH_COOLDOWN = 5000; // 5 seconds
async jwt({ token, account, profile }) {
// ... existing initial sign-in logic ...
// Check if token is expired
const expiresAt = token.accessTokenExpires as number;
if (expiresAt && Date.now() < expiresAt) {
return token; // Token still valid
}
// CIRCUIT BREAKER: Check if we recently failed to refresh
const userId = token.sub || 'unknown';
const lastFailure = lastFailedRefresh.get(userId) || 0;
const timeSinceFailure = Date.now() - lastFailure;
if (timeSinceFailure < REFRESH_COOLDOWN) {
// Too soon after failure, return error token immediately
logger.debug('Refresh cooldown active, skipping refresh attempt', {
userId,
timeSinceFailure,
});
return {
...token,
error: "SessionNotActive",
accessToken: undefined,
refreshToken: undefined,
idToken: undefined,
};
}
// Try to refresh
if (!token.refreshToken) {
return {
...token,
error: "NoRefreshToken",
accessToken: undefined,
refreshToken: undefined,
idToken: undefined,
};
}
const refreshedToken = await refreshAccessToken(token);
// If refresh failed, record the failure time
if (refreshedToken.error === "SessionNotActive") {
lastFailedRefresh.set(userId, Date.now());
logger.info("Keycloak session invalidated, setting cooldown", { userId });
// Clean up old entries (prevent memory leak)
if (lastFailedRefresh.size > 1000) {
const now = Date.now();
for (const [key, value] of lastFailedRefresh.entries()) {
if (now - value > REFRESH_COOLDOWN * 10) {
lastFailedRefresh.delete(key);
}
}
}
return {
...refreshedToken,
accessToken: undefined,
refreshToken: undefined,
idToken: undefined,
};
}
// Success - clear any previous failure record
lastFailedRefresh.delete(userId);
return refreshedToken;
}
2. Use Redis for Distributed Circuit Breaker
For multi-instance deployments, use Redis:
// lib/services/auth-circuit-breaker.ts
import { getRedisClient } from '@/lib/redis';
const REFRESH_COOLDOWN = 5000; // 5 seconds
const CIRCUIT_BREAKER_KEY = (userId: string) => `auth:refresh:cooldown:${userId}`;
export async function isRefreshInCooldown(userId: string): Promise<boolean> {
const redis = getRedisClient();
const key = CIRCUIT_BREAKER_KEY(userId);
const lastFailure = await redis.get(key);
if (!lastFailure) {
return false;
}
const timeSinceFailure = Date.now() - parseInt(lastFailure, 10);
return timeSinceFailure < REFRESH_COOLDOWN;
}
export async function recordRefreshFailure(userId: string): Promise<void> {
const redis = getRedisClient();
const key = CIRCUIT_BREAKER_KEY(userId);
await redis.set(key, Date.now().toString(), 'EX', Math.ceil(REFRESH_COOLDOWN / 1000));
}
export async function clearRefreshCooldown(userId: string): Promise<void> {
const redis = getRedisClient();
const key = CIRCUIT_BREAKER_KEY(userId);
await redis.del(key);
}
3. Client-Side Request Stopping
Add to components to stop making requests when session invalid:
// hooks/use-session-guard.ts
import { useSession } from 'next-auth/react';
import { useEffect, useRef } from 'react';
export function useSessionGuard() {
const { status, data: session } = useSession();
const hasInvalidSession = useRef(false);
useEffect(() => {
if (status === 'unauthenticated' && !hasInvalidSession.current) {
hasInvalidSession.current = true;
// Stop all refresh intervals
// Clear any pending requests
}
}, [status]);
return {
shouldMakeRequests: status === 'authenticated' && !hasInvalidSession.current,
isInvalid: hasInvalidSession.current,
};
}
Immediate Fix (Quick)
Option 1: Add Simple Cooldown (In-Memory)
Add this to app/api/auth/options.ts:
// At top of file
const refreshCooldown = new Map<string, number>();
const COOLDOWN_MS = 5000; // 5 seconds
// In jwt callback, before refresh attempt:
const userId = token.sub || 'unknown';
const lastFailure = refreshCooldown.get(userId) || 0;
if (Date.now() - lastFailure < COOLDOWN_MS) {
// Skip refresh, return error immediately
return {
...token,
error: "SessionNotActive",
accessToken: undefined,
refreshToken: undefined,
idToken: undefined,
};
}
// After failed refresh:
if (refreshedToken.error === "SessionNotActive") {
refreshCooldown.set(userId, Date.now());
// ... rest of error handling
}
Option 2: Early Return on Known Invalid Session
// In jwt callback, check token error first:
if (token.error === "SessionNotActive") {
// Already know session is invalid, don't try again
return {
...token,
accessToken: undefined,
refreshToken: undefined,
idToken: undefined,
error: "SessionNotActive",
};
}
Recommended Implementation
- Immediate: Add simple in-memory cooldown (Option 1)
- Short-term: Migrate to Redis-based circuit breaker
- Long-term: Add client-side session guard to stop requests
Testing
After implementing:
-
Test Scenario 1: Logout from Keycloak admin console
- Should see: 1-2 refresh attempts, then cooldown
- Should NOT see: Infinite loop
-
Test Scenario 2: Expire session manually
- Should see: Cooldown prevents refresh storm
- Should see: User redirected to sign-in
-
Test Scenario 3: Multiple widgets loading
- Should see: All widgets respect cooldown
- Should see: No refresh storm
Monitoring
Add metrics:
- Refresh attempt count
- Refresh failure count
- Cooldown activations
- Session invalidations per user