# Complete Notification Flow Analysis

**Date**: 2026-01-06
**Purpose**: Trace the entire notification system flow to identify issues and improvements

---

## 🔍 **FLOW 1: Initial Page Load & Count Display**

### Step-by-Step Flow:

1. **Component Mount** (`notification-badge.tsx`)
   - `useNotifications()` hook initializes
   - `useEffect` triggers when `status === 'authenticated'`
   - Calls `fetchNotificationCount(true)` (force refresh)
   - Calls `fetchNotifications()`
   - Starts polling every 60 seconds

2. **Count Fetch** (`use-notifications.ts` → `/api/notifications/count`)
   - Hook calls `/api/notifications/count?_t=${Date.now()}` (cache-busting)
   - API route authenticates user
   - Calls `NotificationService.getNotificationCount(userId)`

3. **Service Layer** (`notification-service.ts`)
   - **Checks Redis cache first** (`notifications:count:${userId}`)
   - If cached: Returns cached data immediately
   - If not cached: Fetches from adapters

4. **Adapter Layer** (`leantime-adapter.ts`)
   - `getNotificationCount()` calls `getNotifications(userId, 1, 100)`
   - **⚠️ ISSUE**: Only fetches first 100 notifications for counting
   - Filters unread: `notifications.filter(n => !n.isRead).length`
   - Returns count object

5. **Cache Storage**
   - Service stores count in Redis with 30-second TTL
   - Returns to API route
   - API returns to hook
   - Hook updates React state: `setNotificationCount(data)`

6. **UI Update**
   - Badge displays `notificationCount.unread`
   - Shows "65" if 65 unread notifications

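Steps 3 to 5 amount to a cache-aside lookup. The sketch below illustrates that pattern under stated assumptions: `CountCache` and `fetchFromAdapters` are hypothetical stand-ins for the Redis client and the adapter calls, and the `NotificationCount` shape is inferred from `notificationCount.unread`. Only the key format and the 30-second TTL come from the flow above.

```typescript
// Hypothetical shapes for illustration; the real service and Redis client may differ.
interface NotificationCount {
  total: number;
  unread: number;
}

interface CountCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const COUNT_TTL_SECONDS = 30; // TTL from step 5

async function getCountCacheAside(
  userId: string,
  cache: CountCache,
  fetchFromAdapters: (userId: string) => Promise<NotificationCount>,
): Promise<NotificationCount> {
  const key = `notifications:count:${userId}`;

  // Cache hit: return the stored value without touching the adapters.
  const cached = await cache.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as NotificationCount;
  }

  // Cache miss: fetch, store with a TTL, then return the fresh value.
  const fresh = await fetchFromAdapters(userId);
  await cache.set(key, JSON.stringify(fresh), COUNT_TTL_SECONDS);
  return fresh;
}
```

With a lookup of this shape, the `_t` query parameter only defeats HTTP-level caching; the Redis entry is still served until it expires or is deleted.
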
---

## 🔍 **FLOW 2: Mark Single Notification as Read**

### Step-by-Step Flow:

1. **User Action** (`notification-badge.tsx`)
   - User clicks "Mark as read" button
   - Calls `handleMarkAsRead(notificationId)`
   - Calls `markAsRead(notificationId)` from hook

2. **Hook Action** (`use-notifications.ts`)
   - Makes POST to `/api/notifications/${notificationId}/read`
   - **Optimistic UI Update**:
     - Updates notification in state: `isRead: true`
     - Decrements count: `unread: Math.max(0, prev.unread - 1)`
   - Waits 100ms, then calls `fetchNotificationCount(true)`

3. **API Route** (`app/api/notifications/[id]/read/route.ts`)
   - Authenticates user
   - Extracts notification ID: `leantime-2732` → splits to get source and ID
   - Calls `NotificationService.markAsRead(userId, notificationId)`

4. **Service Layer** (`notification-service.ts`)
   - Extracts source: `leantime` from ID
   - Gets adapter: `this.adapters.get('leantime')`
   - Calls `adapter.markAsRead(userId, notificationId)`

5. **Adapter Layer** (`leantime-adapter.ts`)
   - **Gets user email from session**: `getUserEmail()`
   - **Gets Leantime user ID**: `getLeantimeUserId(email)`
   - **⚠️ CRITICAL ISSUE**: If `getLeantimeUserId()` fails → returns `false`
   - If successful: Calls Leantime API `markNotificationRead`
   - Returns success/failure

6. **Cache Invalidation** (`notification-service.ts`)
   - If `markAsRead()` returns `true`:
     - Calls `invalidateCache(userId)`
     - Deletes count cache: `notifications:count:${userId}`
     - Deletes all list caches: `notifications:list:${userId}:*`
   - If returns `false`: **Cache NOT invalidated** ❌

7. **Count Refresh** (`use-notifications.ts`)
   - After 100ms delay, calls `fetchNotificationCount(true)`
   - Fetches fresh count from API
   - **⚠️ ISSUE**: If cache wasn't invalidated, might get stale count

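Steps 3 and 4 both depend on splitting a prefixed ID such as `leantime-2732` into a source and a source-side ID. A minimal sketch of that split, assuming the prefix never contains a hyphen; `parseNotificationId` and `ParsedNotificationId` are hypothetical names, not taken from the codebase.

```typescript
// Hypothetical helper; the real route and service may parse IDs differently.
interface ParsedNotificationId {
  source: string;   // e.g. "leantime", used to pick the adapter
  sourceId: string; // e.g. "2732", the ID inside the source system
}

function parseNotificationId(notificationId: string): ParsedNotificationId | null {
  // Split on the FIRST hyphen so source-side IDs that contain hyphens survive intact.
  const separatorIndex = notificationId.indexOf("-");

  // Reject IDs with no separator, an empty source, or an empty source-side ID.
  if (separatorIndex <= 0 || separatorIndex === notificationId.length - 1) {
    return null;
  }

  return {
    source: notificationId.slice(0, separatorIndex),
    sourceId: notificationId.slice(separatorIndex + 1),
  };
}
```

Returning `null` for malformed IDs lets the caller distinguish "bad request" from "adapter failed", which matters when deciding whether to invalidate the cache.
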
---

## 🔍 **FLOW 3: Mark All Notifications as Read**

### Step-by-Step Flow:

1. **User Action** (`notification-badge.tsx`)
   - User clicks "Mark all read" button
   - Calls `handleMarkAllAsRead()`
   - Calls `markAllAsRead()` from hook

2. **Hook Action** (`use-notifications.ts`)
   - Makes POST to `/api/notifications/read-all`
   - **Optimistic UI Update**:
     - Sets all notifications: `isRead: true`
     - Sets count: `unread: 0`
   - Waits 200ms, then calls `fetchNotificationCount(true)`

3. **API Route** (`app/api/notifications/read-all/route.ts`)
   - Authenticates user
   - Calls `NotificationService.markAllAsRead(userId)`

4. **Service Layer** (`notification-service.ts`)
   - Loops through all adapters
   - For each adapter:
     - Checks if configured
     - Calls `adapter.markAllAsRead(userId)`
   - Collects results: `[true/false, ...]`
   - Determines: `success = results.every(r => r)`, `anySuccess = results.some(r => r)`
   - **Cache Invalidation**:
     - If `anySuccess === true`: Invalidates cache ✅
     - If `anySuccess === false`: **Cache NOT invalidated** ❌

5. **Adapter Layer** (`leantime-adapter.ts`)
   - **Gets user email**: `getUserEmail()`
   - **Gets Leantime user ID**: `getLeantimeUserId(email)`
   - **⚠️ CRITICAL ISSUE**: If this fails → returns `false` immediately
   - If successful:
     - Fetches all notifications directly from API (up to 1000)
     - Filters unread: `rawNotifications.filter(n => n.read === 0)`
     - Marks each individually using `markNotificationRead`
     - Returns success if any were marked

6. **Cache Invalidation** (`notification-service.ts`)
   - Only happens if `anySuccess === true`
   - **⚠️ ISSUE**: If `getLeantimeUserId()` fails, `anySuccess = false`
   - Cache stays stale → count remains 65

7. **Count Refresh** (`use-notifications.ts`)
   - After 200ms, calls `fetchNotificationCount(true)`
   - **⚠️ ISSUE**: If cache wasn't invalidated, gets stale count from cache

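The aggregation in step 4 decides whether the cache is invalidated, so it helps to see it in isolation. The sketch below mirrors the `every`/`some` rule described above; `summarizeMarkAllResults` and `MarkAllOutcome` are hypothetical names used only for illustration.

```typescript
// Hypothetical summary type; mirrors the flags described in step 4.
interface MarkAllOutcome {
  success: boolean;          // every adapter reported true
  anySuccess: boolean;       // at least one adapter reported true
  shouldInvalidate: boolean; // current rule: invalidate only when anySuccess is true
}

function summarizeMarkAllResults(results: boolean[]): MarkAllOutcome {
  const success = results.every((r) => r);
  const anySuccess = results.some((r) => r);
  return { success, anySuccess, shouldInvalidate: anySuccess };
}
```

With a single configured adapter that returns `false` (for example because the user ID lookup failed), `results` is `[false]`, so `anySuccess` is `false` and the cache is left untouched.
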
---

## 🔍 **FLOW 4: Fetch Notification List**

### Step-by-Step Flow:

1. **User Opens Dropdown** (`notification-badge.tsx`)
   - `handleOpenChange(true)` called
   - Calls `manualFetch()` which calls `fetchNotifications(1, 10)`

2. **Hook Action** (`use-notifications.ts`)
   - Makes GET to `/api/notifications?page=1&limit=20`
   - Updates state: `setNotifications(data.notifications)`

3. **API Route** (`app/api/notifications/route.ts`)
   - Authenticates user
   - Calls `NotificationService.getNotifications(userId, page, limit)`

4. **Service Layer** (`notification-service.ts`)
   - **Checks Redis cache first**: `notifications:list:${userId}:${page}:${limit}`
   - If cached: Returns cached data immediately
   - If not cached: Fetches from adapters

5. **Adapter Layer** (`leantime-adapter.ts`)
   - Gets user email and Leantime user ID
   - Calls Leantime API `getAllNotifications` with pagination
   - Transforms notifications to our format
   - Returns array

6. **Cache Storage**
   - Service stores list in Redis with 5-minute TTL
   - Returns to API
   - API returns to hook
   - Hook updates React state

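Because `page` and `limit` are part of the list cache key, every distinct combination is cached as a separate entry. The helpers below are a small sketch of that key scheme; the function names are hypothetical, and only the key formats come from the flows in this document.

```typescript
// Hypothetical helpers; only the key formats are taken from this document.
function listCacheKey(userId: string, page: number, limit: number): string {
  return `notifications:list:${userId}:${page}:${limit}`;
}

// Pattern matching every list entry for a user, as used when invalidating.
function listCachePattern(userId: string): string {
  return `notifications:list:${userId}:*`;
}
```

A request with `limit=10` and one with `limit=20` therefore never share a cache entry, which is why invalidation has to use the wildcard pattern rather than a single key.
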
---

## 🐛 **IDENTIFIED ISSUES**

### **Issue #1: getLeantimeUserId() Fails Inconsistently**

**Problem**:
- `getLeantimeUserId()` works in `getNotifications()` and `getNotificationCount()`
- But fails in `markAllAsRead()` and sometimes in `markAsRead()`
- Logs show: `"User not found in Leantime: a.tmiri@clm.foundation"`

**Root Cause**:
- `getLeantimeUserId()` calls Leantime API `getAll` users endpoint
- Fetches ALL users, then searches for matching email
- **Possible causes**:
  1. **Race condition**: API call happens at different times
  2. **Session timing**: Session might be different between calls
  3. **API rate limiting**: Leantime API might throttle requests
  4. **Caching issue**: No caching of user ID lookup

**Impact**:
- Mark all as read fails → cache not invalidated → count stays 65
- Mark single as read might fail → cache not invalidated → count doesn't update

**Solution**:
- Cache Leantime user ID in Redis with longer TTL
- Add retry logic with exponential backoff
- Add better error handling and logging

---

### **Issue #2: Cache Invalidation Only on Success**

**Problem**:
- Cache is only invalidated if `markAsRead()` or `markAllAsRead()` returns `true`
- If operation fails (e.g., `getLeantimeUserId()` fails), cache stays stale
- Count remains at old value (65)

**Root Cause**:
```typescript
if (success) {
  await this.invalidateCache(userId);
}
```

**Impact**:
- User sees stale count even after attempting to mark as read
- UI shows optimistic update, but server count doesn't match

**Solution**:
- Always invalidate cache after marking attempt (even on failure)
- Or: Invalidate cache before marking, then refresh after
- Or: Use optimistic updates with eventual consistency

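The first option, invalidating after every attempt, can be sketched with `try`/`finally`. This is an illustration under assumptions rather than the service's actual code: `MarkDeps` is a hypothetical dependency bundle standing in for the adapter call and `invalidateCache()`.

```typescript
// Hypothetical dependencies, injected so the sketch stays self-contained.
interface MarkDeps {
  markAsRead(userId: string, notificationId: string): Promise<boolean>;
  invalidateCache(userId: string): Promise<void>;
}

async function markAsReadAlwaysInvalidate(
  deps: MarkDeps,
  userId: string,
  notificationId: string,
): Promise<boolean> {
  try {
    return await deps.markAsRead(userId, notificationId);
  } catch {
    // Treat a thrown adapter error as a failed attempt.
    return false;
  } finally {
    // Runs whether the adapter returned true, returned false, or threw.
    await deps.invalidateCache(userId);
  }
}
```

The trade-off is one extra cache miss after a failed attempt, in exchange for the next count fetch reflecting the real server state.
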
---

### **Issue #3: Count Based on First 100 Notifications**

**Problem**:
- `getNotificationCount()` only fetches first 100 notifications
- If a user has 200 notifications and all 66 unread ones fall within the first 100, the count correctly shows 66
- But any unread notifications beyond the first 100 are never counted, so the count is too low

**Root Cause**:
```typescript
const notifications = await this.getNotifications(userId, 1, 100);
const unreadCount = notifications.filter(n => !n.isRead).length;
```

**Impact**:
- Count might be inaccurate if >100 notifications exist
- User might see "66 unread" but only 10 displayed (pagination)

**Solution**:
- Use dedicated count API if Leantime provides one
- Or: Fetch all notifications for counting (up to reasonable limit)
- Or: Show "100+ unread" if count reaches 100

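The last option is a display-only change and is easy to sketch. `formatUnreadBadge` is a hypothetical helper; the limit of 100 is the page size used by `getNotificationCount()` as described above.

```typescript
const COUNT_FETCH_LIMIT = 100; // page size used for counting, per the analysis

// Hypothetical helper: marks the count as a lower bound once it reaches the fetch limit.
function formatUnreadBadge(unread: number, limit: number = COUNT_FETCH_LIMIT): string {
  return unread >= limit ? `${limit}+` : String(unread);
}
```

Note that this only flags the case where every fetched notification is unread. A count below the limit can still be an undercount when older unread notifications sit beyond the first page, so a dedicated count endpoint remains the more accurate fix.
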
---

### **Issue #4: Race Condition Between Cache Invalidation and Count Fetch**

**Problem**:
- Hook calls `fetchNotificationCount(true)` after 100-200ms delay
- But cache invalidation might not be complete
- Count fetch might still get stale cache

**Root Cause**:
```typescript
setTimeout(() => {
  fetchNotificationCount(true);
}, 200);
```

**Impact**:
- Count might not update immediately after marking
- User sees optimistic update, then stale count

**Solution**:
- Increase delay to 500ms
- Or: Poll count until it matches expected value
- Or: Use WebSocket/SSE for real-time updates

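The polling option replaces a single fixed delay with a bounded retry loop. The following is a rough sketch, assuming a hypothetical `fetchUnread` callback that returns the server's unread count; `sleep` is injectable so the loop can be tested without real timers.

```typescript
// Hypothetical helper: re-fetches the unread count until it matches or attempts run out.
async function pollUntilUnreadMatches(
  fetchUnread: () => Promise<number>,
  expectedUnread: number,
  maxAttempts: number = 5,
  delayMs: number = 200,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  let latest = await fetchUnread();
  for (let attempt = 1; attempt < maxAttempts && latest !== expectedUnread; attempt++) {
    await sleep(delayMs);
    latest = await fetchUnread();
  }
  // May still differ from expectedUnread if the server never converged.
  return latest;
}
```

The caller should treat a non-matching return value as "operation probably failed" and roll back the optimistic state rather than keep polling indefinitely.
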
---

### **Issue #5: No Caching of Leantime User ID**

**Problem**:
- `getLeantimeUserId()` fetches ALL users from Leantime API every time
- No caching, so repeated calls are slow and might fail
- Different calls might get different results (race condition)

**Root Cause**:
- No Redis cache for user ID mapping
- Each call makes full API request

**Impact**:
- Slow performance
- Inconsistent results
- API rate limiting issues

**Solution**:
- Cache user ID in Redis: `leantime:userid:${email}` with 1-hour TTL
- Invalidate cache only when user changes or on explicit refresh

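A cached lookup along these lines could look like the sketch below. The `StringCache` interface and the `lookupUserId` callback are hypothetical stand-ins for the Redis client and the existing `getAll` users call, and representing the user ID as a string is an assumption; the key format and 1-hour TTL are the ones proposed above.

```typescript
// Hypothetical cache interface standing in for the Redis client.
interface StringCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

const USER_ID_TTL_SECONDS = 60 * 60; // 1 hour, as proposed above

async function getLeantimeUserIdCached(
  email: string,
  cache: StringCache,
  lookupUserId: (email: string) => Promise<string | null>,
): Promise<string | null> {
  const key = `leantime:userid:${email}`;

  const cached = await cache.get(key);
  if (cached !== null) {
    return cached; // no Leantime API call needed
  }

  const userId = await lookupUserId(email);
  if (userId !== null) {
    // Only cache successful lookups, so a transient failure is not remembered for an hour.
    await cache.set(key, userId, USER_ID_TTL_SECONDS);
  }
  return userId;
}
```

Not caching `null` results is a deliberate choice here: the failures described in Issue #1 look intermittent, and caching a miss would turn one bad response into an hour of failed operations.
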
---

### **Issue #6: getNotificationCount Uses Cached getNotifications**

**Problem**:
- `getNotificationCount()` calls `getNotifications(userId, 1, 100)`
- `getNotifications()` uses cache if available
- Count might be based on stale cached notifications

**Root Cause**:
```typescript
async getNotificationCount(userId: string): Promise<NotificationCount> {
  const notifications = await this.getNotifications(userId, 1, 100);
  // Uses cached data if available
}
```

**Impact**:
- Count might be stale even if notifications were marked as read
- Cache TTL mismatch: count cache (30s) vs list cache (5min)

**Solution**:
- Fetch notifications directly from API for counting (bypass cache)
- Or: Use dedicated count endpoint
- Or: Invalidate list cache when count cache is invalidated

---

### **Issue #7: Optimistic Updates Don't Match Server State**

**Problem**:
- Hook optimistically updates count: `unread: 0`
- But server count might still be 65 (cache not invalidated)
- After refresh, count jumps back to 65

**Root Cause**:
- Optimistic update happens immediately
- Server cache invalidation might fail
- Count refresh gets stale data

**Impact**:
- Confusing UX: count goes to 0, then back to 65
- User thinks operation failed when it might have succeeded

**Solution**:
- Only show optimistic update if we're confident operation will succeed
- Or: Show loading state until server confirms
- Or: Poll until count matches expected value

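One way to keep the optimistic update without the confusing jump is to hold on to the previous count and settle explicitly once the server responds. This is a sketch of the state handling only, with hypothetical names (`applyOptimisticMarkAll`, `settleMarkAll`); wiring it into the hook is left out.

```typescript
// Hypothetical shapes; NotificationCount is inferred from notificationCount.unread.
interface NotificationCount {
  total: number;
  unread: number;
}

interface PendingUpdate {
  optimistic: NotificationCount; // what the UI shows while the request is in flight
  rollback: NotificationCount;   // the state to restore if the server reports failure
}

function applyOptimisticMarkAll(prev: NotificationCount): PendingUpdate {
  return { optimistic: { ...prev, unread: 0 }, rollback: prev };
}

function settleMarkAll(pending: PendingUpdate, serverConfirmed: boolean): NotificationCount {
  return serverConfirmed ? pending.optimistic : pending.rollback;
}
```

Rolling back on an explicit failure response, together with an error message, tells the user what happened instead of letting a later poll silently restore the old number.
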
---

## 🎯 **RECOMMENDED IMPROVEMENTS**

### **Priority 1: Fix getLeantimeUserId() Reliability**

1. **Cache User ID Mapping**
   ```typescript
   // Cache key: leantime:userid:${email}
   // TTL: 1 hour
   // Invalidate on user update or explicit refresh
   ```

2. **Add Retry Logic**
   ```typescript
   // Retry 3 times with exponential backoff
   // Log each attempt
   // Return cached value if API fails
   ```

3. **Better Error Handling**
   ```typescript
   // Log full error details
   // Return null only after all retries fail
   // Don't fail entire operation on user ID lookup failure
   ```

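The retry item can be sketched as a generic wrapper. The names `backoffDelayMs` and `withRetry`, the 200ms base delay, and the reading of "retry 3 times" as one initial attempt plus three retries are all assumptions made for illustration.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseMs: number): number {
  return baseMs * 2 ** attempt;
}

// Hypothetical wrapper: runs `operation`, retrying on thrown errors with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  retries: number = 3,
  baseMs: number = 200,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt + 1} of ${retries + 1} failed`, error);
      if (attempt < retries) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

This wrapper only retries when the operation throws. If the user lookup signals "not found" by returning `null`, the caller would need to convert that into an error (or retry on `null` explicitly) for the backoff to apply.
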
---

### **Priority 2: Always Invalidate Cache After Marking**

1. **Invalidate Before Marking**
   ```typescript
   // Invalidate cache first
   // Then mark as read
   // Then refresh count
   ```

2. **Or: Always Invalidate After Attempt**
   ```typescript
   // Always invalidate cache after marking attempt
   // Even if operation failed
   // This ensures fresh data on next fetch
   ```

---

### **Priority 3: Fix Count Accuracy**

1. **Use Dedicated Count API** (if available)
   ```typescript
   // Check if Leantime has count-only endpoint
   // Use that instead of fetching all notifications
   ```

2. **Or: Fetch All for Counting**
   ```typescript
   // Fetch up to 1000 notifications for counting
   // Or use pagination to count all
   ```

3. **Or: Show "100+ unread" if limit reached**
   ```typescript
   // If count === 100, show "100+ unread"
   // Indicate there might be more
   ```

---

### **Priority 4: Improve Cache Strategy**

1. **Unified Cache Invalidation**
   ```typescript
   // When count cache is invalidated, also invalidate list cache
   // When list cache is invalidated, also invalidate count cache
   // Keep them in sync
   ```

2. **Shorter Cache TTLs**
   ```typescript
   // Count cache: 10 seconds (currently 30s)
   // List cache: 1 minute (currently 5min)
   // More frequent updates
   ```

3. **Cache Tags/Versioning**
   ```typescript
   // Use cache version numbers
   // Increment on invalidation
   // Check version before using cache
   ```

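Cache versioning can be illustrated with versioned keys: bumping a per-user version makes every older entry unreachable without deleting anything. The `VersionedKeys` class is a hypothetical, in-memory sketch; in the real service the version counter would presumably live in Redis alongside the cached data.

```typescript
// Hypothetical in-memory sketch of versioned cache keys.
class VersionedKeys {
  private versions = new Map<string, number>();

  private versionOf(userId: string): number {
    return this.versions.get(userId) ?? 0;
  }

  // Count key for the user's current version.
  countKey(userId: string): string {
    return `notifications:count:${userId}:v${this.versionOf(userId)}`;
  }

  // List key for the user's current version.
  listKey(userId: string, page: number, limit: number): string {
    return `notifications:list:${userId}:v${this.versionOf(userId)}:${page}:${limit}`;
  }

  // "Invalidate" both count and list caches at once by moving to a new version.
  invalidate(userId: string): void {
    this.versions.set(userId, this.versionOf(userId) + 1);
  }
}
```

Because count and list keys share one version, a single bump keeps them in sync, and stale entries simply age out through their TTLs.
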
---

### **Priority 5: Better Error Recovery**

1. **Graceful Degradation**
   ```typescript
   // If mark as read fails, still invalidate cache
   // Show error message to user
   // Allow retry
   ```

2. **Retry Logic**
   ```typescript
   // Retry failed operations automatically
   // Exponential backoff
   // Max 3 retries
   ```

---

## 📊 **FLOW DIAGRAM: Current vs Improved**

### **Current Flow (Mark All As Read)**:
```
User clicks → Hook → API → Service → Adapter
    ↓
getLeantimeUserId() → FAILS ❌
    ↓
Returns false → Service: anySuccess = false
    ↓
Cache NOT invalidated ❌
    ↓
Count refresh → Gets stale cache → Shows 65 ❌
```

### **Improved Flow (Mark All As Read)**:
```
User clicks → Hook → API → Service → Adapter
    ↓
getLeantimeUserId() → Check cache first
    ↓
If cached: Use cached ID ✅
If not cached: Fetch from API → Cache result ✅
    ↓
Mark all as read → Success ✅
    ↓
Always invalidate cache (even on partial failure) ✅
    ↓
Count refresh → Gets fresh data → Shows 0 ✅
```

---

## 🚀 **IMPLEMENTATION PRIORITY**

1. **Fix getLeantimeUserId() caching** (High Priority)
   - Add Redis cache for user ID mapping
   - Add retry logic
   - Better error handling

2. **Always invalidate cache** (High Priority)
   - Invalidate cache even on failure
   - Or invalidate before marking

3. **Fix count accuracy** (Medium Priority)
   - Use dedicated count API or fetch all
   - Show "100+ unread" if limit reached

4. **Improve cache strategy** (Medium Priority)
   - Unified invalidation
   - Shorter TTLs
   - Cache versioning

5. **Better error recovery** (Low Priority)
   - Graceful degradation
   - Retry logic
   - Better UX

---

**Status**: Analysis complete. Ready for implementation.