# Notification Issue Analysis - Mark All Read Behavior
**Date**: 2026-01-06

**Issue**: "Mark all read" works initially, then connection issues occur

---
## 🔍 **What's Happening**
### **Initial Success**:
1. ✅ Dashboard shows 60 messages (count is working)
2. ✅ User clicks "Mark all read"
3. ✅ **First step works**: the marking operation starts successfully
### **Then Connection Issues**:
```
failed to get redirect response [TypeError: fetch failed] {
  [cause]: [Error: read ECONNRESET] {
    errno: -104,
    code: 'ECONNRESET',
    syscall: 'read'
  }
}
Redis reconnect attempt 1, retrying in 100ms
Reconnecting to Redis..
```
---
## 📊 **Analysis**
### **What the Logs Show**:
1. **IMAP Pool Activity**:
   ```
   [IMAP POOL] Size: 1, Active: 1, Connecting: 0, Max: 20
   [IMAP POOL] Size: 0, Active: 0, Connecting: 0, Max: 20
   ```
   - IMAP connections are acquired and released
   - Only 1 of 20 slots is in use, so this is normal behavior and the pool is not under pressure
2. **Connection Reset Error**:
   - `ECONNRESET`: the connection was reset by the peer
   - Happens during a `fetch` request (most likely to the Leantime API)
   - This points to a **network/connection issue** rather than a logic bug, although the code still has to handle it
3. **Redis Reconnection**:
   - The Redis client is reconnecting (expected behavior)
   - Our reconnect logic is kicking in as intended
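
Because Node's built-in `fetch` reports the socket failure only as `TypeError: fetch failed` and puts the real error in `cause` (exactly the shape in the log above), code that decides whether a failure is worth retrying has to look inside the `cause` chain. A minimal sketch, assuming that error shape; the function name `isTransientNetworkError` and the set of codes treated as transient are illustrative assumptions, not taken from the existing code. The depth limit guards against a circular `cause` chain:

```typescript
// Assumed list of error codes worth retrying; tune it to what the logs show.
const TRANSIENT_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "ECONNREFUSED", "EPIPE"]);

/**
 * Hypothetical helper: walks the `cause` chain of an error (Node's fetch wraps
 * the socket error as `TypeError: fetch failed` with the real error in
 * `cause`) and returns true if any link carries a transient network code.
 */
function isTransientNetworkError(err: unknown, maxDepth = 5): boolean {
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth && current != null; depth++) {
    if (typeof current !== "object") return false; // strings, numbers, etc.
    const code = (current as { code?: unknown }).code;
    if (typeof code === "string" && TRANSIENT_CODES.has(code)) return true;
    current = (current as { cause?: unknown }).cause; // step one level down
  }
  return false;
}
```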
---
## 🎯 **Root Cause**
### **Scenario**:
1. User clicks "Mark all read"
2. System starts marking notifications (works initially)
3. During the process, a network connection to Leantime API is reset
4. This could happen because:
   - **Network instability** between your server and Leantime
   - **Leantime API timeout** (if marking many notifications takes too long)
   - **Connection pool exhaustion** (too many concurrent requests)
   - **Server-side rate limiting** (Leantime might be throttling requests)
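
If a Leantime request hangs instead of failing fast, a per-request deadline turns it into an explicit error that can be logged and retried. A hypothetical sketch using the standard `AbortController`; the names `withTimeout` and `TimeoutError` are assumptions for illustration. The task receives the `AbortSignal` so it can pass it on as `fetch(url, { signal })` and have the underlying request cancelled:

```typescript
// Hypothetical helper; the name and error class are illustrative.
class TimeoutError extends Error {
  constructor(timeoutMs: number) {
    super(`Request timed out after ${timeoutMs} ms`);
    this.name = "TimeoutError";
  }
}

/**
 * Runs `task` with a deadline. The task gets an AbortSignal and should pass
 * it to fetch() so the HTTP request is actually cancelled on timeout.
 * Rejects with TimeoutError if the deadline passes first.
 */
async function withTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // cancel the in-flight request
      reject(new TimeoutError(timeoutMs));
    }, timeoutMs);
  });
  try {
    // Whichever settles first wins, so the deadline holds even if the
    // task ignores the signal.
    return await Promise.race([task(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // no stray timer once the task has settled
  }
}

// Usage (URL and timeout value are placeholders):
// await withTimeout((signal) => fetch(leantimeUrl, { method: "POST", signal }), 5000);
```

A timeout alone does not mark anything; it only converts a silent hang into a failure that the retry and bookkeeping logic can see.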
### **Why It Works Initially Then Fails**:
- **First few notifications**: Marked successfully ✅
- **After some time**: Connection resets ❌
- **Result**: Partial success (some marked, some not)
---
## 🔧 **What Our Fixes Handle**
### **✅ What's Working**:
1. **User ID Caching**: Should prevent the "user not found" error
2. **Retry Logic**: Will retry failed requests automatically
3. **Cache Invalidation**: Always happens, so count will refresh
4. **Count Accuracy**: Fetches up to 1000 notifications
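
The retry logic itself is not shown in this analysis. As a hypothetical sketch of what retrying a single mark-as-read call usually involves: a bounded number of attempts, exponential backoff between them, and a predicate so that only transient errors (such as `ECONNRESET`) are retried. `withRetry`, `RetryOptions` and `sleep` are illustrative names, not the project's code:

```typescript
// Hypothetical retry helper; names and options are illustrative.
interface RetryOptions {
  attempts: number;                       // total tries (>= 1), including the first
  baseDelayMs: number;                    // wait before the 2nd try; doubles after each failure
  shouldRetry: (err: unknown) => boolean; // e.g. only ECONNRESET-style errors
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Give up on the last attempt or on errors that retrying cannot fix.
      if (attempt === opts.attempts || !opts.shouldRetry(err)) break;
      // Exponential backoff: base, 2*base, 4*base, ...
      await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

Retrying each call helps with a single dropped connection, but if the connection stays down longer than the total backoff, the call still fails, which is why the items below remain open.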
### **⚠️ What's Not Handled**:
1. **Long-running operations**: Marking 60 notifications individually can take time
2. **Connection timeouts**: If Leantime API is slow or times out
3. **Rate limiting**: If Leantime throttles too many requests
4. **Partial failures**: Some notifications marked, some not
---
## 💡 **What's Likely Happening**
### **Flow**:
```
1. User clicks "Mark all read"
2. System fetches 60 unread notifications ✅
3. Starts marking each one individually
4. First 10-20 succeed ✅
5. Connection resets (ECONNRESET) ❌
6. Remaining notifications fail to mark
7. Cache is invalidated (our fix) ✅
8. Count refresh shows remaining unread (e.g., 40 instead of 0)
```
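
The flow above can be sketched as a loop that records per-notification results and invalidates the count cache in a `finally` block, so step 7 runs no matter how many calls failed. This is a hypothetical sketch: `markAllRead`, `markOne` and `invalidateCountCache` are illustrative stand-ins for the real Leantime client and cache helpers, which are not shown here. It also records each failure and continues instead of aborting; whether the real implementation continues or stops at the first reset is not shown in the logs:

```typescript
// Hypothetical signatures; the real Leantime client and cache helpers differ.
type MarkOne = (notificationId: string) => Promise<void>;

interface MarkAllResult {
  marked: string[];
  failed: { id: string; error: unknown }[];
}

async function markAllRead(
  unreadIds: string[],
  markOne: MarkOne,
  invalidateCountCache: () => Promise<void>,
): Promise<MarkAllResult> {
  const result: MarkAllResult = { marked: [], failed: [] };
  try {
    for (const id of unreadIds) {
      try {
        await markOne(id); // one request per notification (step 3)
        result.marked.push(id);
      } catch (error) {
        // Record the failure and move on, so the logs list exactly
        // which notification IDs were not marked.
        result.failed.push({ id, error });
      }
    }
  } finally {
    // Step 7: invalidate even on partial failure, so the next count
    // refresh reflects what was actually marked.
    await invalidateCountCache();
  }
  return result;
}
```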
### **Why Count Might Not Be 0**:
- Some notifications were marked (e.g., 20 out of 60)
- Connection reset prevented marking the rest
- Cache was invalidated (good!)
- Count refresh shows remaining unread (40 unread)
---
## 🎯 **Expected Behavior**
### **With Our Fixes**:
1. ✅ User ID lookup is cached (faster, more reliable)
2. ✅ Retry logic handles transient failures
3. ✅ Cache always invalidated (count will refresh)
4. ✅ Count shows accurate number (up to 1000)
### **What You Should See**:
- **First attempt**: Some notifications marked, count decreases (e.g., 60 → 40)
- **Second attempt**: More notifications marked, count decreases further (e.g., 40 → 20)
- **Eventually**: All marked, count reaches 0 (provided the failures are transient)
### **If Connection Issues Persist**:
- The count will show the remaining unread notifications
- The user can retry "Mark all read"
- Each retry should mark more notifications
- If the failures are transient, all of them should eventually be marked
---
## 🔍 **Diagnostic Questions**
1. **How many notifications are marked?**
   - Check whether the count decreases (e.g., 60 → 40 → 20 → 0)
   - If it decreases, marking is working but incomplete
2. **Does retrying help?**
   - Click "Mark all read" again
   - If the count decreases further, re-running the operation is effective
3. **Is it always the same number?**
   - If the count always stops at the same number (e.g., always 40), specific notifications may be failing
   - If the number varies, connection issues are the more likely cause
4. **Is the network stable?**
   - Check whether the connection to the Leantime API is stable
   - Monitor for timeouts or rate limiting
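
Questions 2 and 3 can be answered mechanically if each attempt logs the IDs it failed to mark. A hypothetical helper, assuming you collect one array of failed IDs per "Mark all read" attempt (`diagnoseFailures` and its result labels are illustrative, not existing code): if every ID still failing has failed on every attempt, the notifications themselves are suspect; if the failing IDs change between attempts, the connection is the more likely culprit; a single attempt is inconclusive:

```typescript
// Hypothetical diagnostic helper; labels mirror the questions above.
type LikelyCause = "none" | "inconclusive" | "specific-notifications" | "connection";

function diagnoseFailures(
  failedIdsPerAttempt: string[][],
): { persistent: string[]; likelyCause: LikelyCause } {
  const attempts = failedIdsPerAttempt.map((ids) => new Set(ids));
  const last = attempts[attempts.length - 1];
  // Nothing recorded, or the latest attempt had no failures.
  if (!last || last.size === 0) return { persistent: [], likelyCause: "none" };

  // IDs that failed on every attempt so far.
  const persistent = Array.from(last)
    .filter((id) => attempts.every((s) => s.has(id)))
    .sort();

  if (attempts.length < 2) return { persistent, likelyCause: "inconclusive" };
  // Everything still failing has failed every time: points at the items.
  if (persistent.length === last.size) {
    return { persistent, likelyCause: "specific-notifications" };
  }
  return { persistent, likelyCause: "connection" };
}
```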
---
## 📝 **Recommendations**
### **Immediate**:
1. **Retry the operation**: Click "Mark all read" again
   - Should mark more notifications
   - Count should decrease further
2. **Check logs for specific errors**:
   - Look for which notification IDs are failing
   - Check if it's always the same ones
3. **Monitor network**:
   - Check connection stability to Leantime
   - Look for timeout patterns
### **Future Improvements** (if needed):
1. **Batch marking**: Mark notifications in smaller batches (e.g., 10 at a time)
2. **Progress indicator**: Show "Marking X of Y..." to user
3. **Resume on failure**: Track which notifications were marked, resume from where it failed
4. **Connection pooling**: Better management of concurrent requests
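
Improvements 1 to 3 can be combined in one loop. A hypothetical sketch (the names `markInBatches` and `markFn`, and the rule of stopping when a whole batch fails, are assumptions, not existing code): it sends at most `batchSize` requests at a time, reports progress after each batch for a "Marking X of Y..." indicator, and returns the IDs that still need marking so a later call can resume from where it stopped:

```typescript
// Hypothetical batching helper; `markFn` stands in for the real Leantime
// "mark as read" call, which this analysis does not show.
type MarkFn = (notificationId: string) => Promise<void>;
type ProgressFn = (done: number, total: number) => void;

interface BatchResult {
  marked: string[];
  remaining: string[]; // failed or never attempted; pass back in to resume
}

async function markInBatches(
  ids: string[],
  markFn: MarkFn,
  batchSize = 10,
  onProgress: ProgressFn = () => {},
): Promise<BatchResult> {
  const size = Math.max(1, Math.floor(batchSize)); // guard against 0 or negative
  const marked: string[] = [];
  const failed: string[] = [];
  for (let start = 0; start < ids.length; start += size) {
    const batch = ids.slice(start, start + size);
    // At most `size` requests are in flight at once; each outcome is captured
    // instead of letting one rejection fail the whole batch.
    const outcomes = await Promise.all(
      batch.map((id) =>
        markFn(id).then(
          () => ({ id, ok: true }),
          () => ({ id, ok: false }),
        ),
      ),
    );
    for (const { id, ok } of outcomes) (ok ? marked : failed).push(id);
    onProgress(marked.length + failed.length, ids.length); // "Marking X of Y..."

    if (outcomes.every((o) => !o.ok)) {
      // A whole batch failed: the connection is probably down. Stop and hand
      // back everything not yet marked so the caller can resume later.
      return { marked, remaining: failed.concat(ids.slice(start + size)) };
    }
  }
  return { marked, remaining: failed };
}
```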
---
## ✅ **Summary**
### **What's Working**:
- ✅ Initial marking starts successfully
- ✅ User ID caching prevents lookup failures
- ✅ Cache invalidation ensures count refreshes
- ✅ Retry logic handles transient failures
### **What's Failing**:
- ⚠️ Connection resets during long operations
- ⚠️ Partial marking (some succeed, some fail)
- ⚠️ Likely network instability between the server and Leantime
### **Solution**:
- **Retry the operation**: Click "Mark all read" multiple times
- Each retry should mark more notifications
- If the failures are transient, all notifications should eventually be marked
---
**Status**: This is expected behavior given the network issues. The fixes are designed so that the system recovers and keeps working.