NeahNew/NOTIFICATION_FLOW_ANALYSIS.md
2026-01-06 19:18:29 +01:00

Complete Notification Flow Analysis

Date: 2026-01-06
Purpose: Trace the entire notification system flow to identify issues and improvements


🔍 FLOW 1: Initial Page Load & Count Display

Step-by-Step Flow:

  1. Component Mount (notification-badge.tsx)

    • useNotifications() hook initializes
    • useEffect triggers when status === 'authenticated'
    • Calls fetchNotificationCount(true) (force refresh)
    • Calls fetchNotifications()
    • Starts polling every 60 seconds
  2. Count Fetch (use-notifications.ts → /api/notifications/count)

    • Hook calls /api/notifications/count?_t=${Date.now()} (cache-busting)
    • API route authenticates user
    • Calls NotificationService.getNotificationCount(userId)
  3. Service Layer (notification-service.ts)

    • Checks Redis cache first (notifications:count:${userId})
    • If cached: Returns cached data immediately
    • If not cached: Fetches from adapters
  4. Adapter Layer (leantime-adapter.ts)

    • getNotificationCount() calls getNotifications(userId, 1, 100)
    • ⚠️ ISSUE: Only fetches first 100 notifications for counting
    • Filters unread: notifications.filter(n => !n.isRead).length
    • Returns count object
  5. Cache Storage

    • Service stores count in Redis with 30-second TTL
    • Returns to API route
    • API returns to hook
    • Hook updates React state: setNotificationCount(data)
  6. UI Update

    • Badge displays notificationCount.unread
    • Shows "65" if 65 unread notifications
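
The cache-first read in steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the actual service code: `createInMemoryCache`, the `CountCache` interface, and the `fetchFromAdapters` callback are hypothetical stand-ins for the Redis client and the adapter layer, and the `NotificationCount` shape is assumed.

```typescript
// Hypothetical shape: the real NotificationCount may carry more fields.
interface NotificationCount {
  total: number;
  unread: number;
}

// Minimal cache contract; the real service talks to Redis instead.
interface CountCache {
  get(key: string): Promise<NotificationCount | null>;
  set(key: string, value: NotificationCount, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in for Redis with per-entry expiry, for illustration only.
function createInMemoryCache(now: () => number = () => Date.now()): CountCache {
  const entries = new Map<string, { value: NotificationCount; expiresAt: number }>();
  return {
    async get(key) {
      const entry = entries.get(key);
      if (!entry || entry.expiresAt <= now()) return null;
      return entry.value;
    },
    async set(key, value, ttlSeconds) {
      entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    },
  };
}

const COUNT_TTL_SECONDS = 30; // the 30-second TTL described above

async function getNotificationCount(
  userId: string,
  cache: CountCache,
  fetchFromAdapters: (userId: string) => Promise<NotificationCount>,
): Promise<NotificationCount> {
  const key = `notifications:count:${userId}`;
  const cached = await cache.get(key);
  if (cached) return cached; // cache hit: adapters are not contacted

  const fresh = await fetchFromAdapters(userId);
  await cache.set(key, fresh, COUNT_TTL_SECONDS);
  return fresh;
}
```

The consequence that matters for the issues below: within the TTL, a cached count is returned without consulting any adapter, so it stays unchanged until the key expires or is deleted.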

🔍 FLOW 2: Mark Single Notification as Read

Step-by-Step Flow:

  1. User Action (notification-badge.tsx)

    • User clicks "Mark as read" button
    • Calls handleMarkAsRead(notificationId)
    • Calls markAsRead(notificationId) from hook
  2. Hook Action (use-notifications.ts)

    • Makes POST to /api/notifications/${notificationId}/read
    • Optimistic UI Update:
      • Updates notification in state: isRead: true
      • Decrements count: unread: Math.max(0, prev.unread - 1)
    • Waits 100ms, then calls fetchNotificationCount(true)
  3. API Route (app/api/notifications/[id]/read/route.ts)

    • Authenticates user
    • Extracts notification ID: leantime-2732 → splits to get source and ID
    • Calls NotificationService.markAsRead(userId, notificationId)
  4. Service Layer (notification-service.ts)

    • Extracts source: leantime from ID
    • Gets adapter: this.adapters.get('leantime')
    • Calls adapter.markAsRead(userId, notificationId)
  5. Adapter Layer (leantime-adapter.ts)

    • Gets user email from session: getUserEmail()
    • Gets Leantime user ID: getLeantimeUserId(email)
    • ⚠️ CRITICAL ISSUE: If getLeantimeUserId() fails → returns false
    • If successful: Calls Leantime API markNotificationRead
    • Returns success/failure
  6. Cache Invalidation (notification-service.ts)

    • If markAsRead() returns true:
      • Calls invalidateCache(userId)
      • Deletes count cache: notifications:count:${userId}
      • Deletes all list caches: notifications:list:${userId}:*
    • If returns false: Cache NOT invalidated
  7. Count Refresh (use-notifications.ts)

    • After 100ms delay, calls fetchNotificationCount(true)
    • Fetches fresh count from API
    • ⚠️ ISSUE: If cache wasn't invalidated, might get stale count
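
The optimistic update in step 2 can be expressed as a pure state transition. The sketch below assumes a simplified state shape (`NotificationState` and its fields are illustrative, not the hook's real types). It also varies slightly from the flow described above: it decrements only when the notification was actually unread, so a double click cannot decrement twice.

```typescript
// Simplified, assumed shapes; the hook's real state likely has more fields.
interface NotificationItem {
  id: string;
  isRead: boolean;
}

interface NotificationState {
  notifications: NotificationItem[];
  count: { total: number; unread: number };
}

function applyOptimisticMarkAsRead(
  state: NotificationState,
  notificationId: string,
): NotificationState {
  const target = state.notifications.find(n => n.id === notificationId);
  // Guard: only decrement if the notification is loaded and still unread.
  const wasUnread = target !== undefined && !target.isRead;

  return {
    notifications: state.notifications.map(n =>
      n.id === notificationId ? { ...n, isRead: true } : n,
    ),
    count: {
      ...state.count,
      unread: wasUnread ? Math.max(0, state.count.unread - 1) : state.count.unread,
    },
  };
}
```

Because the function returns a new object instead of mutating its input, the previous state remains available if the server call later fails.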

🔍 FLOW 3: Mark All Notifications as Read

Step-by-Step Flow:

  1. User Action (notification-badge.tsx)

    • User clicks "Mark all read" button
    • Calls handleMarkAllAsRead()
    • Calls markAllAsRead() from hook
  2. Hook Action (use-notifications.ts)

    • Makes POST to /api/notifications/read-all
    • Optimistic UI Update:
      • Sets all notifications: isRead: true
      • Sets count: unread: 0
    • Waits 200ms, then calls fetchNotificationCount(true)
  3. API Route (app/api/notifications/read-all/route.ts)

    • Authenticates user
    • Calls NotificationService.markAllAsRead(userId)
  4. Service Layer (notification-service.ts)

    • Loops through all adapters
    • For each adapter:
      • Checks if configured
      • Calls adapter.markAllAsRead(userId)
    • Collects results: [true/false, ...]
    • Determines: success = results.every(r => r), anySuccess = results.some(r => r)
    • Cache Invalidation:
      • If anySuccess === true: Invalidates cache
      • If anySuccess === false: Cache NOT invalidated
  5. Adapter Layer (leantime-adapter.ts)

    • Gets user email: getUserEmail()
    • Gets Leantime user ID: getLeantimeUserId(email)
    • ⚠️ CRITICAL ISSUE: If this fails → returns false immediately
    • If successful:
      • Fetches all notifications directly from API (up to 1000)
      • Filters unread: rawNotifications.filter(n => n.read === 0)
      • Marks each individually using markNotificationRead
      • Returns success if any were marked
  6. Cache Invalidation (notification-service.ts)

    • Only happens if anySuccess === true
    • ⚠️ ISSUE: If getLeantimeUserId() fails, anySuccess = false
    • Cache stays stale → count remains 65
  7. Count Refresh (use-notifications.ts)

    • After 200ms, calls fetchNotificationCount(true)
    • ⚠️ ISSUE: If cache wasn't invalidated, gets stale count from cache
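
The service-layer aggregation in step 4 can be sketched like this. It mirrors the current behavior described above, including the conditional invalidation. The `NotificationAdapter` interface, `isConfigured()`, and the `invalidateCache` callback are assumed names for illustration.

```typescript
// Assumed adapter contract, reduced to what this flow needs.
interface NotificationAdapter {
  isConfigured(): Promise<boolean>;
  markAllAsRead(userId: string): Promise<boolean>;
}

interface MarkAllOutcome {
  success: boolean;    // every configured adapter succeeded
  anySuccess: boolean; // at least one configured adapter succeeded
}

async function markAllAsRead(
  userId: string,
  adapters: NotificationAdapter[],
  invalidateCache: (userId: string) => Promise<void>,
): Promise<MarkAllOutcome> {
  const results: boolean[] = [];
  for (const adapter of adapters) {
    if (!(await adapter.isConfigured())) continue;
    try {
      results.push(await adapter.markAllAsRead(userId));
    } catch {
      results.push(false); // treat a thrown error like a reported failure
    }
  }

  const success = results.every(r => r);
  const anySuccess = results.some(r => r);

  // Current behavior: the cache survives when every adapter failed.
  if (anySuccess) await invalidateCache(userId);
  return { success, anySuccess };
}
```

Note that `every()` returns true for an empty array, so with no configured adapters this sketch reports `success: true` while `anySuccess` is false and nothing is invalidated.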

🔍 FLOW 4: Fetch Notification List

Step-by-Step Flow:

  1. User Opens Dropdown (notification-badge.tsx)

    • handleOpenChange(true) called
    • Calls manualFetch() which calls fetchNotifications(1, 10)
  2. Hook Action (use-notifications.ts)

    • Makes GET to /api/notifications?page=1&limit=20
    • Updates state: setNotifications(data.notifications)
  3. API Route (app/api/notifications/route.ts)

    • Authenticates user
    • Calls NotificationService.getNotifications(userId, page, limit)
  4. Service Layer (notification-service.ts)

    • Checks Redis cache first: notifications:list:${userId}:${page}:${limit}
    • If cached: Returns cached data immediately
    • If not cached: Fetches from adapters
  5. Adapter Layer (leantime-adapter.ts)

    • Gets user email and Leantime user ID
    • Calls Leantime API getAllNotifications with pagination
    • Transforms notifications to our format
    • Returns array
  6. Cache Storage

    • Service stores list in Redis with 5-minute TTL
    • Returns to API
    • API returns to hook
    • Hook updates React state
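
A small sketch of the key scheme used by the count and list caches, with a `Map` standing in for Redis. The helper names (`countKey`, `listKey`, `invalidateUserCache`) are hypothetical. Redis `DEL` does not accept patterns, so the real service presumably enumerates the keys matching `notifications:list:${userId}:*` before deleting them; the prefix scan below imitates that.

```typescript
const countKey = (userId: string): string => `notifications:count:${userId}`;

const listKey = (userId: string, page: number, limit: number): string =>
  `notifications:list:${userId}:${page}:${limit}`;

// Removes the count entry and every list entry of one user.
// Returns the number of deleted keys.
function invalidateUserCache(store: Map<string, unknown>, userId: string): number {
  // The trailing colon keeps "u1" from matching keys of "u10".
  const listPrefix = `notifications:list:${userId}:`;
  let removed = 0;
  for (const key of Array.from(store.keys())) {
    if (key === countKey(userId) || key.startsWith(listPrefix)) {
      store.delete(key);
      removed++;
    }
  }
  return removed;
}
```

Since page and limit are part of the key, requests with different limits populate separate cache entries. The flow above mentions both fetchNotifications(1, 10) and limit=20; whichever the code actually uses, each distinct combination is cached and expires independently.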

🐛 IDENTIFIED ISSUES

Issue #1: getLeantimeUserId() Fails Inconsistently

Problem:

  • getLeantimeUserId() works in getNotifications() and getNotificationCount()
  • But fails in markAllAsRead() and sometimes in markAsRead()
  • Logs show: "User not found in Leantime: a.tmiri@clm.foundation"

Root Cause:

  • getLeantimeUserId() calls Leantime API getAll users endpoint
  • Fetches ALL users, then searches for matching email
  • Possible causes:
    1. Race condition: the lookups run at different times and may see different API responses
    2. Session timing: the session (and thus the email) might differ between calls
    3. API rate limiting: Leantime might throttle the repeated full user-list requests
    4. No caching: every call repeats the full user ID lookup

Impact:

  • Mark all as read fails → cache not invalidated → count stays 65
  • Mark single as read might fail → cache not invalidated → count doesn't update

Solution:

  • Cache Leantime user ID in Redis with longer TTL
  • Add retry logic with exponential backoff
  • Add better error handling and logging
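
The retry suggestion could look roughly like the helper below. `withRetry` and its options are hypothetical names; the `sleep` function is injectable so the delays can be observed in tests. The sketch assumes the wrapped lookup signals failure by throwing.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  // Injectable for tests; defaults to a real timer.
  sleep?: (ms: number) => Promise<void>;
}

async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms)));
  let lastError: unknown;

  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt}/${options.maxAttempts} failed`, error);
      if (attempt < options.maxAttempts) {
        // Exponential backoff: base, 2x base, 4x base, ...
        await sleep(options.baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  throw lastError; // all attempts failed
}
```

With `maxAttempts: 3` and `baseDelayMs: 100`, the waits between attempts are 100 ms and 200 ms. If getLeantimeUserId() reports failure by returning null instead of throwing, the wrapped operation would need to convert that into an error first.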

Issue #2: Cache Invalidation Only on Success

Problem:

  • Cache is only invalidated if markAsRead() or markAllAsRead() returns true
  • If operation fails (e.g., getLeantimeUserId() fails), cache stays stale
  • Count remains at old value (65)

Root Cause:

if (success) {
  await this.invalidateCache(userId);
}

Impact:

  • User sees stale count even after attempting to mark as read
  • UI shows optimistic update, but server count doesn't match

Solution:

  • Always invalidate cache after marking attempt (even on failure)
  • Or: Invalidate cache before marking, then refresh after
  • Or: Use optimistic updates with eventual consistency
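
The first option (always invalidate) maps naturally onto try/finally. In this sketch, `markAsRead` and `invalidateCache` are passed in as callbacks so the example stays self-contained; in the real service they would be the adapter call and `this.invalidateCache`.

```typescript
async function markAsReadAndInvalidate(
  userId: string,
  notificationId: string,
  markAsRead: (userId: string, notificationId: string) => Promise<boolean>,
  invalidateCache: (userId: string) => Promise<void>,
): Promise<boolean> {
  try {
    return await markAsRead(userId, notificationId);
  } catch (error) {
    console.error(`markAsRead failed for ${notificationId}`, error);
    return false;
  } finally {
    // Runs on success, on a false result, and after a thrown error,
    // so the next count fetch always goes back to the adapters.
    await invalidateCache(userId);
  }
}
```

One caveat: if `invalidateCache` itself throws, that error replaces the return value, so a production version may want to catch and log it separately.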

Issue #3: Count Based on First 100 Notifications

Problem:

  • getNotificationCount() only fetches first 100 notifications
  • If a user has 200 notifications and all 66 unread ones fall within the first 100, the count correctly shows 66
  • But any unread notifications beyond the first 100 are never counted, so the displayed count is too low

Root Cause:

const notifications = await this.getNotifications(userId, 1, 100);
const unreadCount = notifications.filter(n => !n.isRead).length;

Impact:

  • Count might be inaccurate if >100 notifications exist
  • User might see "66 unread" but only 10 displayed (pagination)

Solution:

  • Use dedicated count API if Leantime provides one
  • Or: Fetch all notifications for counting (up to reasonable limit)
  • Or: Show a "+" suffix (e.g. "66+ unread") when the fetch returns the full 100 notifications
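
One reading of the "+" option is sketched below: when the fetch comes back full, the unread count is treated as a lower bound. `summarizeUnread` and the `mayHaveMore` flag are hypothetical names, and the notification shape is reduced to what the count needs.

```typescript
interface NotificationItem {
  id: string;
  isRead: boolean;
}

const COUNT_FETCH_LIMIT = 100; // the page size used for counting today

function summarizeUnread(
  fetched: NotificationItem[],
  limit: number = COUNT_FETCH_LIMIT,
): { unread: number; mayHaveMore: boolean; label: string } {
  const unread = fetched.filter(n => !n.isRead).length;
  // A full page means older notifications may exist that were not counted.
  const mayHaveMore = fetched.length >= limit;
  return {
    unread,
    mayHaveMore,
    label: mayHaveMore ? `${unread}+ unread` : `${unread} unread`,
  };
}
```

This keeps the single request per count while no longer presenting a possibly truncated number as exact.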

Issue #4: Race Condition Between Cache Invalidation and Count Fetch

Problem:

  • Hook calls fetchNotificationCount(true) after 100-200ms delay
  • But cache invalidation might not be complete
  • Count fetch might still get stale cache

Root Cause:

setTimeout(() => {
  fetchNotificationCount(true);
}, 200);

Impact:

  • Count might not update immediately after marking
  • User sees optimistic update, then stale count

Solution:

  • Increase delay to 500ms
  • Or: Poll count until it matches expected value
  • Or: Use WebSocket/SSE for real-time updates
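
The polling option could replace the fixed timeout with a bounded loop. In this sketch, `pollUntilCount` is a hypothetical helper and `fetchUnread` stands in for a call that resolves to the current unread count (for example, a wrapper around fetchNotificationCount(true)).

```typescript
interface PollOptions {
  maxAttempts: number;
  intervalMs: number;
  // Injectable for tests; defaults to a real timer.
  sleep?: (ms: number) => Promise<void>;
}

async function pollUntilCount(
  fetchUnread: () => Promise<number>,
  expectedUnread: number,
  options: PollOptions,
): Promise<{ unread: number; matched: boolean }> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms)));
  let unread = -1; // sentinel, only returned if maxAttempts < 1

  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    unread = await fetchUnread();
    if (unread === expectedUnread) return { unread, matched: true };
    if (attempt < options.maxAttempts) await sleep(options.intervalMs);
  }
  // Give up and report the last value the server returned.
  return { unread, matched: false };
}
```

After "mark all read" the expected value is 0. If the loop ends with `matched: false`, the UI can show the server's value and surface an error instead of silently jumping back.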

Issue #5: No Caching of Leantime User ID

Problem:

  • getLeantimeUserId() fetches ALL users from Leantime API every time
  • No caching, so repeated calls are slow and might fail
  • Different calls might get different results (race condition)

Root Cause:

  • No Redis cache for user ID mapping
  • Each call makes full API request

Impact:

  • Slow performance
  • Inconsistent results
  • API rate limiting issues

Solution:

  • Cache user ID in Redis: leantime:userid:${email} with 1-hour TTL
  • Invalidate cache only when user changes or on explicit refresh
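
A possible shape for the cached lookup, assuming a string-valued cache with per-key TTL. `StringCache`, `getLeantimeUserIdCached`, and the `lookupFromApi` callback are illustrative names; the callback represents the existing full user-list search. The sketch deliberately does not cache misses, since the "User not found" failures described above appear to be transient.

```typescript
// Assumed minimal cache contract (Redis in the real service).
interface StringCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const USER_ID_TTL_SECONDS = 60 * 60; // 1 hour, as proposed above

async function getLeantimeUserIdCached(
  email: string,
  cache: StringCache,
  lookupFromApi: (email: string) => Promise<number | null>,
): Promise<number | null> {
  const key = `leantime:userid:${email}`;

  const cached = await cache.get(key);
  if (cached !== null) return Number(cached);

  const userId = await lookupFromApi(email);
  // Only cache successful lookups; a miss is retried on the next call.
  if (userId !== null) {
    await cache.set(key, String(userId), USER_ID_TTL_SECONDS);
  }
  return userId;
}
```

Once an ID is cached, markAsRead() and markAllAsRead() resolve the same user ID as the read paths for the next hour without calling the user-list endpoint again.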

Issue #6: getNotificationCount Uses Cached getNotifications

Problem:

  • getNotificationCount() calls getNotifications(userId, 1, 100)
  • getNotifications() uses cache if available
  • Count might be based on stale cached notifications

Root Cause:

async getNotificationCount(userId: string): Promise<NotificationCount> {
  const notifications = await this.getNotifications(userId, 1, 100);
  // Uses cached data if available
}

Impact:

  • Count might be stale even if notifications were marked as read
  • Cache TTL mismatch: count cache (30s) vs list cache (5min)

Solution:

  • Fetch notifications directly from API for counting (bypass cache)
  • Or: Use dedicated count endpoint
  • Or: Invalidate list cache when count cache is invalidated

Issue #7: Optimistic Updates Don't Match Server State

Problem:

  • Hook optimistically updates count: unread: 0
  • But server count might still be 65 (cache not invalidated)
  • After refresh, count jumps back to 65

Root Cause:

  • Optimistic update happens immediately
  • Server cache invalidation might fail
  • Count refresh gets stale data

Impact:

  • Confusing UX: count goes to 0, then back to 65
  • User thinks operation failed when it might have succeeded

Solution:

  • Only show optimistic update if we're confident operation will succeed
  • Or: Show loading state until server confirms
  • Or: Poll until count matches expected value
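
A further variation, not listed above, keeps the immediate optimistic update but rolls it back when the server does not confirm. The helper below is a generic sketch: `runOptimistic` and its parameters are hypothetical, with `getState`/`setState` standing in for the hook's React state accessors.

```typescript
async function runOptimistic<S>(
  getState: () => S,
  setState: (next: S) => void,
  optimistic: (current: S) => S,
  confirm: () => Promise<boolean>,
): Promise<boolean> {
  const snapshot = getState();
  setState(optimistic(snapshot)); // show the expected result immediately

  let confirmed = false;
  try {
    confirmed = await confirm();
  } catch {
    confirmed = false; // a thrown error counts as "not confirmed"
  }

  // Restore the snapshot on failure. Note this would also discard any
  // unrelated state change made while the request was in flight.
  if (!confirmed) setState(snapshot);
  return confirmed;
}
```

With this shape the count only stays at 0 when the API reports success, and a failure can be paired with an error message instead of an unexplained jump back to 65.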

Priority 1: Fix getLeantimeUserId() Reliability

  1. Cache User ID Mapping

    // Cache key: leantime:userid:${email}
    // TTL: 1 hour
    // Invalidate on user update or explicit refresh
    
  2. Add Retry Logic

    // Retry 3 times with exponential backoff
    // Log each attempt
    // Return cached value if API fails
    
  3. Better Error Handling

    // Log full error details
    // Return null only after all retries fail
    // Don't fail entire operation on user ID lookup failure
    

Priority 2: Always Invalidate Cache After Marking

  1. Invalidate Before Marking

    // Invalidate cache first
    // Then mark as read
    // Then refresh count
    
  2. Or: Always Invalidate After Attempt

    // Always invalidate cache after marking attempt
    // Even if operation failed
    // This ensures fresh data on next fetch
    

Priority 3: Fix Count Accuracy

  1. Use Dedicated Count API (if available)

    // Check if Leantime has count-only endpoint
    // Use that instead of fetching all notifications
    
  2. Or: Fetch All for Counting

    // Fetch up to 1000 notifications for counting
    // Or use pagination to count all
    
  3. Or: Show "66+ unread" if limit reached

    // If the fetch returned the full 100 notifications, append "+" (e.g. "66+ unread")
    // Indicate there might be more
    

Priority 4: Improve Cache Strategy

  1. Unified Cache Invalidation

    // When count cache is invalidated, also invalidate list cache
    // When list cache is invalidated, also invalidate count cache
    // Keep them in sync
    
  2. Shorter Cache TTLs

    // Count cache: 10 seconds (currently 30s)
    // List cache: 1 minute (currently 5min)
    // More frequent updates
    
  3. Cache Tags/Versioning

    // Use cache version numbers
    // Increment on invalidation
    // Check version before using cache
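
The versioning idea can be sketched with a per-user version number embedded in every key. Bumping the version makes all older entries unreachable in one step, without enumerating list keys. `createVersionedCache` and its key layout are assumptions for illustration; with Redis, the version could live in its own key and be incremented with INCR, leaving old entries to expire through their TTL.

```typescript
function createVersionedCache<V>() {
  const versions = new Map<string, number>(); // userId -> current version
  const entries = new Map<string, V>();

  const versionOf = (userId: string): number => versions.get(userId) ?? 0;
  const keyFor = (userId: string, suffix: string): string =>
    `notifications:v${versionOf(userId)}:${userId}:${suffix}`;

  return {
    get(userId: string, suffix: string): V | undefined {
      return entries.get(keyFor(userId, suffix));
    },
    set(userId: string, suffix: string, value: V): void {
      entries.set(keyFor(userId, suffix), value);
    },
    // Count and list entries become stale together: both are keyed
    // by the same version, so one bump invalidates them in sync.
    invalidate(userId: string): void {
      versions.set(userId, versionOf(userId) + 1);
    },
  };
}
```

This also covers the unified-invalidation point above, since the count entry and every list page of a user share the same version.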
    

Priority 5: Better Error Recovery

  1. Graceful Degradation

    // If mark as read fails, still invalidate cache
    // Show error message to user
    // Allow retry
    
  2. Retry Logic

    // Retry failed operations automatically
    // Exponential backoff
    // Max 3 retries
    

📊 FLOW DIAGRAM: Current vs Improved

Current Flow (Mark All As Read):

User clicks → Hook → API → Service → Adapter
  ↓
getLeantimeUserId() → FAILS ❌
  ↓
Returns false → Service: anySuccess = false
  ↓
Cache NOT invalidated ❌
  ↓
Count refresh → Gets stale cache → Shows 65 ❌

Improved Flow (Mark All As Read):

User clicks → Hook → API → Service → Adapter
  ↓
getLeantimeUserId() → Check cache first
  ↓
If cached: Use cached ID ✅
If not cached: Fetch from API → Cache result ✅
  ↓
Mark all as read → Success ✅
  ↓
Always invalidate cache (even on partial failure) ✅
  ↓
Count refresh → Gets fresh data → Shows 0 ✅

🚀 IMPLEMENTATION PRIORITY

  1. Fix getLeantimeUserId() caching (High Priority)

    • Add Redis cache for user ID mapping
    • Add retry logic
    • Better error handling
  2. Always invalidate cache (High Priority)

    • Invalidate cache even on failure
    • Or invalidate before marking
  3. Fix count accuracy (Medium Priority)

    • Use dedicated count API or fetch all
    • Show "66+ unread" if limit reached
  4. Improve cache strategy (Medium Priority)

    • Unified invalidation
    • Shorter TTLs
    • Cache versioning
  5. Better error recovery (Low Priority)

    • Graceful degradation
    • Retry logic
    • Better UX

Status: Analysis complete. Ready for implementation.