NeahNew/NOTIFICATION_FLOW_ANALYSIS.md
2026-01-06 19:18:29 +01:00


# Complete Notification Flow Analysis
**Date**: 2026-01-06

**Purpose**: Trace the entire notification system flow to identify issues and improvements

---
## 🔍 **FLOW 1: Initial Page Load & Count Display**
### Step-by-Step Flow:
1. **Component Mount** (`notification-badge.tsx`)
   - `useNotifications()` hook initializes
   - `useEffect` triggers when `status === 'authenticated'`
   - Calls `fetchNotificationCount(true)` (force refresh)
   - Calls `fetchNotifications()`
   - Starts polling every 60 seconds
2. **Count Fetch** (`use-notifications.ts` → `/api/notifications/count`)
   - Hook calls `/api/notifications/count?_t=${Date.now()}` (cache-busting; this only defeats HTTP/browser caching, since the Redis cache in step 3 is still checked first)
   - API route authenticates user
   - Calls `NotificationService.getNotificationCount(userId)`
3. **Service Layer** (`notification-service.ts`)
   - **Checks Redis cache first** (`notifications:count:${userId}`)
   - If cached: Returns cached data immediately
   - If not cached: Fetches from adapters
4. **Adapter Layer** (`leantime-adapter.ts`)
   - `getNotificationCount()` calls `getNotifications(userId, 1, 100)`
   - **⚠️ ISSUE**: Only fetches the first 100 notifications for counting
   - Filters unread: `notifications.filter(n => !n.isRead).length`
   - Returns count object
5. **Cache Storage**
   - Service stores count in Redis with 30-second TTL
   - Returns to API route
   - API returns to hook
   - Hook updates React state: `setNotificationCount(data)`
6. **UI Update**
   - Badge displays `notificationCount.unread`
   - E.g. shows "65" for 65 unread notifications
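
Steps 3 to 5 describe a cache-aside lookup. A minimal sketch, assuming a key/value cache with a per-entry TTL: `TtlCache` is a hypothetical in-memory stand-in for Redis (the real client calls are async), and `NotificationCount` is reduced to the `unread` field the badge reads; neither is the project's actual type. The injectable clock exists only so the 30-second expiry can be exercised without waiting.

```typescript
// Hypothetical shape: only the field the badge reads (notificationCount.unread).
interface NotificationCount {
  unread: number;
}

// In-memory stand-in for Redis GET / SET-with-expiry (not the real client).
class TtlCache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  constructor(private now: () => number = Date.now) {}

  get(key: string): string | null {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (entry.expiresAt <= this.now()) {
      this.store.delete(key); // lazily drop expired entries
      return null;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.store.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

const COUNT_TTL_SECONDS = 30; // the 30-second TTL described in step 5

// Cache-aside: return the cached count if present, otherwise compute and store it.
async function getNotificationCount(
  cache: TtlCache,
  userId: string,
  fetchFromAdapters: (userId: string) => Promise<NotificationCount>,
): Promise<NotificationCount> {
  const key = `notifications:count:${userId}`;
  const cached = cache.get(key);
  if (cached !== null) return JSON.parse(cached) as NotificationCount;

  const fresh = await fetchFromAdapters(userId);
  cache.set(key, JSON.stringify(fresh), COUNT_TTL_SECONDS);
  return fresh;
}
```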
---
## 🔍 **FLOW 2: Mark Single Notification as Read**
### Step-by-Step Flow:
1. **User Action** (`notification-badge.tsx`)
   - User clicks "Mark as read" button
   - Calls `handleMarkAsRead(notificationId)`
   - Calls `markAsRead(notificationId)` from hook
2. **Hook Action** (`use-notifications.ts`)
   - Makes POST to `/api/notifications/${notificationId}/read`
   - **Optimistic UI Update**:
     - Updates notification in state: `isRead: true`
     - Decrements count: `unread: Math.max(0, prev.unread - 1)`
   - Waits 100ms, then calls `fetchNotificationCount(true)`
3. **API Route** (`app/api/notifications/[id]/read/route.ts`)
   - Authenticates user
   - Extracts notification ID: `leantime-2732` → splits to get source and ID
   - Calls `NotificationService.markAsRead(userId, notificationId)`
4. **Service Layer** (`notification-service.ts`)
   - Extracts source: `leantime` from ID
   - Gets adapter: `this.adapters.get('leantime')`
   - Calls `adapter.markAsRead(userId, notificationId)`
5. **Adapter Layer** (`leantime-adapter.ts`)
   - **Gets user email from session**: `getUserEmail()`
   - **Gets Leantime user ID**: `getLeantimeUserId(email)`
   - **⚠️ CRITICAL ISSUE**: If `getLeantimeUserId()` fails → returns `false`
   - If successful: Calls Leantime API `markNotificationRead`
   - Returns success/failure
6. **Cache Invalidation** (`notification-service.ts`)
   - If `markAsRead()` returns `true`:
     - Calls `invalidateCache(userId)`
     - Deletes count cache: `notifications:count:${userId}`
     - Deletes all list caches: `notifications:list:${userId}:*`
   - If it returns `false`: **Cache NOT invalidated**
7. **Count Refresh** (`use-notifications.ts`)
   - After a 100ms delay, calls `fetchNotificationCount(true)`
   - Fetches fresh count from API
   - **⚠️ ISSUE**: If the cache wasn't invalidated, it may return a stale count
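
The service-side dispatch in steps 3, 4 and 6 can be sketched as follows. It deliberately reproduces the *current* behavior, where the cache is invalidated only when the adapter returns `true`. The `Adapter` interface and the `invalidateCache` callback are hypothetical simplifications, and splitting at the first `-` is an assumption about how `leantime-2732` is parsed.

```typescript
// Hypothetical minimal adapter contract (the real adapters expose more methods).
interface Adapter {
  markAsRead(userId: string, notificationId: string): Promise<boolean>;
}

// Split "leantime-2732" into { source: "leantime", id: "2732" } at the first hyphen.
function parseNotificationId(notificationId: string): { source: string; id: string } {
  const sep = notificationId.indexOf("-");
  if (sep <= 0) throw new Error(`Malformed notification id: ${notificationId}`);
  return { source: notificationId.slice(0, sep), id: notificationId.slice(sep + 1) };
}

// Mirrors the current flow: the cache is invalidated ONLY when the adapter reports success.
async function markAsRead(
  adapters: Map<string, Adapter>,
  invalidateCache: (userId: string) => Promise<void>,
  userId: string,
  notificationId: string,
): Promise<boolean> {
  const { source } = parseNotificationId(notificationId);
  const adapter = adapters.get(source);
  if (!adapter) return false;

  const success = await adapter.markAsRead(userId, notificationId);
  if (success) {
    await invalidateCache(userId); // skipped entirely when success === false
  }
  return success;
}
```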
---
## 🔍 **FLOW 3: Mark All Notifications as Read**
### Step-by-Step Flow:
1. **User Action** (`notification-badge.tsx`)
   - User clicks "Mark all read" button
   - Calls `handleMarkAllAsRead()`
   - Calls `markAllAsRead()` from hook
2. **Hook Action** (`use-notifications.ts`)
   - Makes POST to `/api/notifications/read-all`
   - **Optimistic UI Update**:
     - Sets all notifications: `isRead: true`
     - Sets count: `unread: 0`
   - Waits 200ms, then calls `fetchNotificationCount(true)`
3. **API Route** (`app/api/notifications/read-all/route.ts`)
   - Authenticates user
   - Calls `NotificationService.markAllAsRead(userId)`
4. **Service Layer** (`notification-service.ts`)
   - Loops through all adapters
   - For each adapter:
     - Checks if configured
     - Calls `adapter.markAllAsRead(userId)`
   - Collects results: `[true/false, ...]`
   - Computes: `success = results.every(r => r)`, `anySuccess = results.some(r => r)`
   - **Cache Invalidation**:
     - If `anySuccess === true`: Invalidates cache ✅
     - If `anySuccess === false`: **Cache NOT invalidated**
5. **Adapter Layer** (`leantime-adapter.ts`)
   - **Gets user email**: `getUserEmail()`
   - **Gets Leantime user ID**: `getLeantimeUserId(email)`
   - **⚠️ CRITICAL ISSUE**: If this fails → returns `false` immediately
   - If successful:
     - Fetches all notifications directly from API (up to 1000)
     - Filters unread: `rawNotifications.filter(n => n.read === 0)`
     - Marks each individually using `markNotificationRead`
     - Returns success if any were marked
6. **Cache Invalidation** (`notification-service.ts`)
   - Only happens if `anySuccess === true`
   - **⚠️ ISSUE**: If `getLeantimeUserId()` fails and Leantime is the only configured adapter, `anySuccess = false`
   - Cache stays stale → count remains 65
7. **Count Refresh** (`use-notifications.ts`)
   - After 200ms, calls `fetchNotificationCount(true)`
   - **⚠️ ISSUE**: If the cache wasn't invalidated, it gets the stale count from the cache
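
The aggregation in step 4, sketched under a hypothetical adapter shape (`isConfigured()` and `markAllAsRead()` are illustrative names for "checks if configured" and the mark-all call). It shows why one failed user-ID lookup is enough to skip invalidation when Leantime is the only configured adapter: `results` is `[false]`, so `anySuccess` is `false`. One side effect worth checking in the real code: with no configured adapters, `results` is empty, so `success` is `true` (`every` on an empty array) while `anySuccess` is `false`.

```typescript
// Hypothetical adapter contract for the mark-all path.
interface MarkAllAdapter {
  isConfigured(): boolean;
  markAllAsRead(userId: string): Promise<boolean>;
}

// Mirrors the current flow: collect per-adapter results, invalidate only if any succeeded.
async function markAllAsRead(
  adapters: MarkAllAdapter[],
  invalidateCache: (userId: string) => Promise<void>,
  userId: string,
): Promise<{ success: boolean; anySuccess: boolean }> {
  const results: boolean[] = [];
  for (const adapter of adapters) {
    if (!adapter.isConfigured()) continue; // unconfigured adapters are skipped
    results.push(await adapter.markAllAsRead(userId));
  }

  const success = results.every((r) => r);
  const anySuccess = results.some((r) => r);
  if (anySuccess) {
    await invalidateCache(userId);
  }
  return { success, anySuccess };
}
```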
---
## 🔍 **FLOW 4: Fetch Notification List**
### Step-by-Step Flow:
1. **User Opens Dropdown** (`notification-badge.tsx`)
   - `handleOpenChange(true)` called
   - Calls `manualFetch()`, which calls `fetchNotifications(1, 10)`
2. **Hook Action** (`use-notifications.ts`)
   - Makes GET to `/api/notifications?page=1&limit=20`
   - **⚠️ Inconsistency**: step 1 passes a limit of 10, but this request uses `limit=20`
   - Updates state: `setNotifications(data.notifications)`
3. **API Route** (`app/api/notifications/route.ts`)
   - Authenticates user
   - Calls `NotificationService.getNotifications(userId, page, limit)`
4. **Service Layer** (`notification-service.ts`)
   - **Checks Redis cache first**: `notifications:list:${userId}:${page}:${limit}`
   - If cached: Returns cached data immediately
   - If not cached: Fetches from adapters
5. **Adapter Layer** (`leantime-adapter.ts`)
   - Gets user email and Leantime user ID
   - Calls Leantime API `getAllNotifications` with pagination
   - Transforms notifications to our format
   - Returns array
6. **Cache Storage**
   - Service stores list in Redis with 5-minute TTL
   - Returns to API
   - API returns to hook
   - Hook updates React state
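
The cache keys used by the count and list paths, together with the `notifications:list:${userId}:*` deletion from Flow 2 step 6, can be sketched like this. The `Map` is an in-memory stand-in for Redis and the helper names are hypothetical; against real Redis, a wildcard delete is usually done by iterating `SCAN` with a `MATCH` pattern rather than `KEYS`, which blocks the server on large keyspaces. The trailing `:` in the prefix keeps user `u1` from matching `u10`.

```typescript
// Key builders matching the patterns quoted in Flows 1 and 4.
const countKey = (userId: string) => `notifications:count:${userId}`;
const listKey = (userId: string, page: number, limit: number) =>
  `notifications:list:${userId}:${page}:${limit}`;

// Delete the count key and every list key for one user; returns how many were removed.
// The prefix check emulates the glob pattern `notifications:list:${userId}:*`.
function invalidateUserCache(store: Map<string, string>, userId: string): number {
  const listPrefix = `notifications:list:${userId}:`;
  let deleted = store.delete(countKey(userId)) ? 1 : 0;
  for (const key of Array.from(store.keys())) {
    if (key.startsWith(listPrefix)) {
      store.delete(key);
      deleted++;
    }
  }
  return deleted;
}
```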
---
## 🐛 **IDENTIFIED ISSUES**
### **Issue #1: getLeantimeUserId() Fails Inconsistently**
**Problem**:
- `getLeantimeUserId()` works in `getNotifications()` and `getNotificationCount()`
- But fails in `markAllAsRead()` and sometimes in `markAsRead()`
- Logs show: `"User not found in Leantime: a.tmiri@clm.foundation"`
**Root Cause**:
- `getLeantimeUserId()` calls Leantime API `getAll` users endpoint
- Fetches ALL users, then searches for matching email
- **Possible causes**:
  1. **Race condition**: API call happens at different times
  2. **Session timing**: Session might be different between calls
  3. **API rate limiting**: Leantime API might throttle requests
  4. **Caching issue**: No caching of user ID lookup
**Impact**:
- Mark all as read fails → cache not invalidated → count stays 65
- Mark single as read might fail → cache not invalidated → count doesn't update
**Solution**:
- Cache Leantime user ID in Redis with longer TTL
- Add retry logic with exponential backoff
- Add better error handling and logging
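
A minimal sketch of the "retry logic with exponential backoff" suggestion. `withRetry` is hypothetical, and the attempt count and base delay are illustrative defaults, not values from the codebase. It treats both a thrown error and a `null` result (the "User not found in Leantime" case) as a failure worth retrying, which assumes that failure is transient, as the possible causes above suggest.

```typescript
// Retry `fn` up to `attempts` times, doubling the delay after each failure.
// A `null` result is treated as a failure, like getLeantimeUserId()'s "not found" case.
async function withRetry<T>(
  fn: () => Promise<T | null>,
  attempts = 3,
  baseDelayMs = 200,
  log: (msg: string) => void = (msg) => console.warn(msg),
): Promise<T | null> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const result = await fn();
      if (result !== null) return result;
      log(`attempt ${attempt}/${attempts}: no result`);
    } catch (err) {
      log(`attempt ${attempt}/${attempts} failed: ${String(err)}`);
    }
    if (attempt < attempts) {
      const delay = baseDelayMs * 2 ** (attempt - 1); // 200ms, 400ms, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  return null; // all attempts exhausted
}
```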
---
### **Issue #2: Cache Invalidation Only on Success**
**Problem**:
- Cache is only invalidated if `markAsRead()` or `markAllAsRead()` returns `true`
- If operation fails (e.g., `getLeantimeUserId()` fails), cache stays stale
- Count remains at old value (65)
**Root Cause**:
```typescript
if (success) {
  await this.invalidateCache(userId);
}
```
**Impact**:
- User sees stale count even after attempting to mark as read
- UI shows optimistic update, but server count doesn't match
**Solution**:
- Always invalidate cache after marking attempt (even on failure)
- Or: Invalidate cache before marking, then refresh after
- Or: Use optimistic updates with eventual consistency
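
The first option amounts to moving the invalidation into a `finally` block, so it runs whether the adapter returns `true`, returns `false`, or throws. A minimal sketch; `markAndAlwaysInvalidate` and its two callbacks are hypothetical and stand in for the service's adapter call and `invalidateCache(userId)`.

```typescript
// Invalidate the user's cache after every marking attempt, whether it
// succeeded, returned false, or threw. The original result or error is preserved.
async function markAndAlwaysInvalidate(
  mark: () => Promise<boolean>,
  invalidateCache: () => Promise<void>,
): Promise<boolean> {
  try {
    return await mark();
  } finally {
    await invalidateCache(); // runs on success, on `false`, and on exceptions
  }
}
```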
---
### **Issue #3: Count Based on First 100 Notifications**
**Problem**:
- `getNotificationCount()` only fetches the first 100 notifications
- Unread notifications beyond the first 100 are never counted
- Example: with 200 notifications, 66 of them unread and 16 of those beyond the first 100, the badge shows 50 instead of 66
**Root Cause**:
```typescript
const notifications = await this.getNotifications(userId, 1, 100);
const unreadCount = notifications.filter(n => !n.isRead).length;
```
**Impact**:
- Count might be inaccurate if >100 notifications exist
- User might see "66 unread" but only 10 displayed (pagination)
**Solution**:
- Use dedicated count API if Leantime provides one
- Or: Fetch all notifications for counting (up to reasonable limit)
- Or: Show "N+ unread" (e.g. "66+ unread") when the fetch returns the full 100 notifications, since more may exist beyond the limit
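
The "N+" option can be sketched as a small formatting helper. The key point is the condition: the count may be truncated when the *fetched page* is full (100 items), not when the unread count itself equals 100. `formatUnreadLabel` is a hypothetical name, and the items are reduced to the `isRead` flag.

```typescript
const COUNT_FETCH_LIMIT = 100; // page size used by getNotificationCount()

// Build the badge label from one page of notifications.
// If the page is full, more notifications may exist beyond it, so append "+".
function formatUnreadLabel(page: { isRead: boolean }[], limit = COUNT_FETCH_LIMIT): string {
  const unread = page.filter((n) => !n.isRead).length;
  const possiblyTruncated = page.length >= limit;
  return possiblyTruncated ? `${unread}+ unread` : `${unread} unread`;
}
```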
---
### **Issue #4: Race Condition Between Cache Invalidation and Count Fetch**
**Problem**:
- Hook calls `fetchNotificationCount(true)` after 100-200ms delay
- But cache invalidation might not be complete
- Count fetch might still get stale cache
**Root Cause**:
```typescript
setTimeout(() => {
  fetchNotificationCount(true);
}, 200);
```
**Impact**:
- Count might not update immediately after marking
- User sees optimistic update, then stale count
**Solution**:
- Increase delay to 500ms
- Or: Poll count until it matches expected value
- Or: Use WebSocket/SSE for real-time updates
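
A sketch of the "poll until it matches" option, replacing the fixed `setTimeout` delay. If the hook awaits the POST response before refreshing, the delay should already be unnecessary in the success path, because in Flows 2 and 3 the service invalidates the cache before the route responds; polling then only covers the remaining cases. `pollCount`, its parameters, and the defaults are hypothetical.

```typescript
// Re-fetch the unread count until it reaches `expected` or `maxAttempts` fetches are made.
// Returns the last observed value so the caller can reconcile the UI with the server.
async function pollCount(
  fetchUnread: () => Promise<number>,
  expected: number,
  maxAttempts = 5,
  intervalMs = 300,
): Promise<number> {
  let latest = await fetchUnread();
  for (let attempt = 1; attempt < maxAttempts && latest !== expected; attempt++) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    latest = await fetchUnread();
  }
  return latest;
}

// Hypothetical usage in the hook after "mark all as read":
// await fetch("/api/notifications/read-all", { method: "POST" }); // wait for the server first
// const unread = await pollCount(getUnreadFromApi, 0);            // getUnreadFromApi is illustrative
```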
---
### **Issue #5: No Caching of Leantime User ID**
**Problem**:
- `getLeantimeUserId()` fetches ALL users from Leantime API every time
- No caching, so repeated calls are slow and might fail
- Different calls might get different results (race condition)
**Root Cause**:
- No Redis cache for user ID mapping
- Each call makes full API request
**Impact**:
- Slow performance
- Inconsistent results
- API rate limiting issues
**Solution**:
- Cache user ID in Redis: `leantime:userid:${email}` with 1-hour TTL
- Invalidate cache only when user changes or on explicit refresh
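
A sketch of the proposed `leantime:userid:${email}` cache with a 1-hour TTL. `KeyValueCache` is a hypothetical async stand-in for the Redis client, `lookupFromLeantime` stands for the existing "fetch all users and match by email" call, and the numeric user ID and case-insensitive email match are assumptions. Only successful lookups are cached, so a transient "not found" is not pinned for an hour.

```typescript
// Hypothetical async key/value interface (e.g. a thin wrapper over a Redis client).
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const USER_ID_TTL_SECONDS = 60 * 60; // 1 hour, as proposed above

async function getLeantimeUserIdCached(
  cache: KeyValueCache,
  email: string,
  lookupFromLeantime: (email: string) => Promise<number | null>,
): Promise<number | null> {
  // Assumption: emails are compared case-insensitively.
  const key = `leantime:userid:${email.trim().toLowerCase()}`;
  const cached = await cache.get(key);
  if (cached !== null) return Number(cached);

  const id = await lookupFromLeantime(email);
  if (id !== null) {
    // Cache only positive results so a transient "not found" is retried next time.
    await cache.set(key, String(id), USER_ID_TTL_SECONDS);
  }
  return id;
}
```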
---
### **Issue #6: getNotificationCount Uses Cached getNotifications**
**Problem**:
- `getNotificationCount()` calls `getNotifications(userId, 1, 100)`
- `getNotifications()` uses cache if available
- Count might be based on stale cached notifications
**Root Cause**:
```typescript
async getNotificationCount(userId: string): Promise<NotificationCount> {
  const notifications = await this.getNotifications(userId, 1, 100);
  // Uses cached data if available
  // ...
}
```
**Impact**:
- Count might be stale even if notifications were marked as read
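- Cache TTL mismatch: count cache (30s) vs list cache (5min); once the count cache expires, the count is recomputed from a list cache entry that can be up to 5 minutes old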
**Solution**:
- Fetch notifications directly from API for counting (bypass cache)
- Or: Use dedicated count endpoint
- Or: Invalidate list cache when count cache is invalidated
---
### **Issue #7: Optimistic Updates Don't Match Server State**
**Problem**:
- Hook optimistically updates count: `unread: 0`
- But server count might still be 65 (cache not invalidated)
- After refresh, count jumps back to 65
**Root Cause**:
- Optimistic update happens immediately
- Server cache invalidation might fail
- Count refresh gets stale data
**Impact**:
- Confusing UX: count goes to 0, then back to 65
- User thinks operation failed when it might have succeeded
**Solution**:
- Only show optimistic update if we're confident operation will succeed
- Or: Show loading state until server confirms
- Or: Poll until count matches expected value
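
A common middle ground between these options is to keep the optimistic update but roll it back when the server reports failure, so a failure shows up immediately (ideally alongside an error message) instead of as a later jump from 0 back to 65 when the count refresh lands. This sketch uses a hypothetical plain-object state with a getter/setter pair instead of the hook's actual React state; `markAllOptimistic` and `sendToServer` are illustrative names.

```typescript
interface NotificationItem {
  id: string;
  isRead: boolean;
}

interface BadgeState {
  notifications: NotificationItem[];
  unread: number;
}

// Apply "mark all as read" optimistically, then keep it or roll back based on the server result.
async function markAllOptimistic(
  getState: () => BadgeState,
  setState: (s: BadgeState) => void,
  sendToServer: () => Promise<boolean>,
): Promise<boolean> {
  const snapshot = getState(); // kept unmodified so it can be restored on failure
  setState({
    notifications: snapshot.notifications.map((n) => ({ ...n, isRead: true })),
    unread: 0,
  });

  let ok = false;
  try {
    ok = await sendToServer();
  } catch {
    ok = false; // treat network/server errors like an explicit failure
  }
  if (!ok) setState(snapshot); // roll back instead of waiting for a stale refresh
  return ok;
}
```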
---
## 🎯 **RECOMMENDED IMPROVEMENTS**
### **Priority 1: Fix getLeantimeUserId() Reliability**
1. **Cache User ID Mapping**
```typescript
// Cache key: leantime:userid:${email}
// TTL: 1 hour
// Invalidate on user update or explicit refresh
```
2. **Add Retry Logic**
```typescript
// Retry 3 times with exponential backoff
// Log each attempt
// Return cached value if API fails
```
3. **Better Error Handling**
```typescript
// Log full error details
// Return null only after all retries fail
// Don't fail entire operation on user ID lookup failure
```
---
### **Priority 2: Always Invalidate Cache After Marking**
1. **Invalidate Before Marking**
```typescript
// Invalidate cache first
// Then mark as read
// Then refresh count
```
2. **Or: Always Invalidate After Attempt**
```typescript
// Always invalidate cache after marking attempt
// Even if operation failed
// This ensures fresh data on next fetch
```
---
### **Priority 3: Fix Count Accuracy**
1. **Use Dedicated Count API** (if available)
```typescript
// Check if Leantime has count-only endpoint
// Use that instead of fetching all notifications
```
2. **Or: Fetch All for Counting**
```typescript
// Fetch up to 1000 notifications for counting
// Or use pagination to count all
```
3. **Or: Show "66+ unread" if limit reached**
```typescript
// If the fetch returned the full 100 notifications, show `${unread}+ unread`
// (e.g. "66+ unread"): there might be more beyond the limit
```
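
The "use pagination to count all" option can be sketched as a loop that keeps fetching pages until a short page signals the end, with a hard page cap as a safety limit. `fetchPage` stands for the adapter's paginated `getNotifications(userId, page, limit)`; `countAllUnread`, the page size, and the cap are illustrative. When the cap is hit, `complete: false` tells the UI to fall back to the "N+" label.

```typescript
// Count unread notifications across all pages.
// Stops at the first page shorter than `pageSize` (no more data) or after `maxPages`.
async function countAllUnread(
  fetchPage: (page: number, limit: number) => Promise<{ isRead: boolean }[]>,
  pageSize = 100,
  maxPages = 10, // safety cap: at most pageSize * maxPages items are inspected
): Promise<{ unread: number; complete: boolean }> {
  let unread = 0;
  for (let page = 1; page <= maxPages; page++) {
    const items = await fetchPage(page, pageSize);
    unread += items.filter((n) => !n.isRead).length;
    if (items.length < pageSize) return { unread, complete: true };
  }
  return { unread, complete: false }; // cap reached; the UI could show "N+"
}
```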
---
### **Priority 4: Improve Cache Strategy**
1. **Unified Cache Invalidation**
```typescript
// When count cache is invalidated, also invalidate list cache
// When list cache is invalidated, also invalidate count cache
// Keep them in sync
```
2. **Shorter Cache TTLs**
```typescript
// Count cache: 10 seconds (currently 30s)
// List cache: 1 minute (currently 5min)
// More frequent updates
```
3. **Cache Tags/Versioning**
```typescript
// Use cache version numbers
// Increment on invalidation
// Check version before using cache
```
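
The versioning idea can be sketched as a per-user version counter folded into every cache key: one increment makes both the count and list entries written under the old version unreachable, and they simply age out via their TTLs. The key layout and the `VersionedKeys` class are hypothetical, and the counter here lives in process memory only for illustration; with several server instances it would need to live in shared storage (with Redis, typically a key updated via `INCR`).

```typescript
// In-memory stand-in for a per-user version counter (shared storage in practice).
class VersionedKeys {
  private versions = new Map<string, number>();

  private version(userId: string): number {
    return this.versions.get(userId) ?? 0;
  }

  countKey(userId: string): string {
    return `notifications:v${this.version(userId)}:count:${userId}`;
  }

  listKey(userId: string, page: number, limit: number): string {
    return `notifications:v${this.version(userId)}:list:${userId}:${page}:${limit}`;
  }

  // One increment invalidates count and list keys together, keeping them in sync.
  invalidate(userId: string): void {
    this.versions.set(userId, this.version(userId) + 1);
  }
}
```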
---
### **Priority 5: Better Error Recovery**
1. **Graceful Degradation**
```typescript
// If mark as read fails, still invalidate cache
// Show error message to user
// Allow retry
```
2. **Retry Logic**
```typescript
// Retry failed operations automatically
// Exponential backoff
// Max 3 retries
```
---
## 📊 **FLOW DIAGRAM: Current vs Improved**
### **Current Flow (Mark All As Read)**:
```
User clicks → Hook → API → Service → Adapter
getLeantimeUserId() → FAILS ❌
Returns false → Service: anySuccess = false
Cache NOT invalidated ❌
Count refresh → Gets stale cache → Shows 65 ❌
```
### **Improved Flow (Mark All As Read)**:
```
User clicks → Hook → API → Service → Adapter
getLeantimeUserId() → Check cache first
If cached: Use cached ID ✅
If not cached: Fetch from API → Cache result ✅
Mark all as read → Success ✅
Always invalidate cache (even on partial failure) ✅
Count refresh → Gets fresh data → Shows 0 ✅
```
---
## 🚀 **IMPLEMENTATION PRIORITY**
1. **Fix getLeantimeUserId() caching** (High Priority)
   - Add Redis cache for user ID mapping
   - Add retry logic
   - Better error handling
2. **Always invalidate cache** (High Priority)
   - Invalidate cache even on failure
   - Or invalidate before marking
3. **Fix count accuracy** (Medium Priority)
   - Use dedicated count API or fetch all
   - Show "66+ unread" if limit reached
4. **Improve cache strategy** (Medium Priority)
   - Unified invalidation
   - Shorter TTLs
   - Cache versioning
5. **Better error recovery** (Low Priority)
   - Graceful degradation
   - Retry logic
   - Better UX
---
**Status**: Analysis complete. Ready for implementation.