alma cbba3c14c7 Refactor Notification BIG

2026-01-06 19:59:37 +01:00

Notification Issue Analysis - Mark All Read Behavior

Date: 2026-01-06
Issue: Mark all read works initially, then connection issues occur

🔍 What's Happening

Initial Success:

✅ Dashboard shows 60 messages (count is working)
✅ User clicks "Mark all read"
✅ First step works - Marking operation starts successfully

Then Connection Issues:

failed to get redirect response [TypeError: fetch failed] {
  [cause]: [Error: read ECONNRESET] {
    errno: -104,
    code: 'ECONNRESET',
    syscall: 'read'
  }
}
Redis reconnect attempt 1, retrying in 100ms
Reconnecting to Redis..

📊 Analysis

What the Logs Show:

IMAP Pool Activity:

[IMAP POOL] Size: 1, Active: 1, Connecting: 0, Max: 20
[IMAP POOL] Size: 0, Active: 0, Connecting: 0, Max: 20

IMAP connections are being used and released
This is normal behavior

Connection Reset Error:
- ECONNRESET means the connection was reset by the peer
- It happens during a fetch request (likely to the Leantime API)
- This points to a network/connection problem rather than a logic bug in the marking code

Redis Reconnection:
- Redis is trying to reconnect (expected behavior)
- Our retry logic is working
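
The logged error is a TypeError with the message "fetch failed" whose cause carries the code ECONNRESET. As a minimal sketch of how such an error could be recognized as transient (and therefore worth retrying), something like the following might be used. The function names and the list of codes are assumptions for illustration, not the project's actual code.

```typescript
// Error codes treated as transient here. This list is an assumption for illustration.
const TRANSIENT_CODES: readonly string[] = ["ECONNRESET", "ETIMEDOUT", "EPIPE"];

// Reads a string `code` property from an unknown value, if there is one.
function readCode(value: unknown): string | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const code = (value as { code?: unknown }).code;
  return typeof code === "string" ? code : undefined;
}

// Returns true when the error itself, or its `cause`, carries a transient code.
// The log above has the code on the cause: [TypeError: fetch failed] { [cause]: ... }.
function isTransientNetworkError(err: unknown): boolean {
  const cause =
    typeof err === "object" && err !== null
      ? (err as { cause?: unknown }).cause
      : undefined;
  const code = readCode(err) ?? readCode(cause);
  return code !== undefined && TRANSIENT_CODES.indexOf(code) !== -1;
}
```

Checking the cause matters because the outer "fetch failed" error has no code of its own; only the nested error identifies the reset.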

🎯 Root Cause

Scenario:

1. User clicks "Mark all read"
2. System starts marking notifications (works initially)
3. During the process, a network connection to the Leantime API is reset

This could happen because:
- Network instability between your server and Leantime
- Leantime API timeout (if marking many notifications takes too long)
- Connection pool exhaustion (too many concurrent requests)
- Server-side rate limiting (Leantime might be throttling requests)

Why It Works Initially Then Fails:

First few notifications: Marked successfully ✅
After some time: Connection resets ❌
Result: Partial success (some marked, some not)

🔧 What Our Fixes Handle

✅ What's Working:

User ID Caching: Should prevent the "user not found" error
Retry Logic: Will retry failed requests automatically
Cache Invalidation: Always happens, so count will refresh
Count Accuracy: The count query fetches up to 1000 notifications, so the count is accurate up to that limit
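
The retry logic itself is not shown in this analysis. As a rough sketch of the general idea, assuming exponential backoff, a helper could look like the one below. The names withRetry and backoffDelayMs, the attempt limit, and the delays are all hypothetical; the 100ms base only mirrors the first Redis reconnect delay seen in the log.

```typescript
// Delay before retry number `attempt` (1-based): baseMs * 2^(attempt - 1), capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 2000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}

// Runs `operation` up to `maxAttempts` times, sleeping between failed attempts.
// This sketch retries on any error; a real version would likely retry only
// transient failures such as ECONNRESET.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Do not sleep after the final attempt; just fall through and rethrow.
      if (attempt < maxAttempts) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

The sleep function is a parameter so that the helper can be exercised without real delays.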

⚠️ What's Not Handled:

Long-running operations: Marking 60 notifications individually can take time
Connection timeouts: The Leantime API may be slow or time out
Rate limiting: Leantime may throttle a burst of requests
Partial failures: Some notifications marked, some not

💡 What's Likely Happening

Flow:

1. User clicks "Mark all read"
   ↓
2. System fetches 60 unread notifications ✅
   ↓
3. Starts marking each one individually
   ↓
4. First 10-20 succeed ✅
   ↓
5. Connection resets (ECONNRESET) ❌
   ↓
6. Remaining notifications fail to mark
   ↓
7. Cache is invalidated (our fix) ✅
   ↓
8. Count refresh shows remaining unread (e.g., 40 instead of 0)

Why Count Might Not Be 0:

Some notifications were marked (e.g., 20 out of 60)
Connection reset prevented marking the rest
Cache was invalidated (good!)
Count refresh shows remaining unread (40 unread)
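
The flow above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: markAllRead, MarkAllDeps, markOne, and invalidateCache are hypothetical names, and the dependencies are injected so the sketch stays self-contained.

```typescript
interface MarkAllResult {
  marked: string[]; // IDs that were marked as read
  failed: string[]; // IDs that could not be marked
}

// Hypothetical dependencies, standing in for the real API and cache calls.
interface MarkAllDeps {
  markOne: (id: string) => Promise<void>; // marks a single notification as read
  invalidateCache: () => Promise<void>; // clears the cached unread count
}

// Marks notifications one at a time and records each outcome.
// The cache is invalidated in `finally`, so the count refreshes even
// when some (or all) of the individual calls fail.
async function markAllRead(
  unreadIds: string[],
  deps: MarkAllDeps,
): Promise<MarkAllResult> {
  const result: MarkAllResult = { marked: [], failed: [] };
  try {
    for (const id of unreadIds) {
      try {
        await deps.markOne(id);
        result.marked.push(id);
      } catch {
        result.failed.push(id);
      }
    }
  } finally {
    await deps.invalidateCache();
  }
  return result;
}
```

With 60 unread notifications and a connection that resets after the first 20 calls, this sketch returns 20 marked and 40 failed IDs, which matches the 60 → 40 count in the example.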

🎯 Expected Behavior

With Our Fixes:

✅ User ID lookup is cached (faster, more reliable)
✅ Retry logic handles transient failures
✅ Cache always invalidated (count will refresh)
✅ Count shows accurate number (up to 1000)

What You Should See:

First attempt: Some notifications marked, count decreases (e.g., 60 → 40)
Second attempt: More notifications marked, count decreases further (e.g., 40 → 20)
Eventually: All marked, count reaches 0

If Connection Issues Persist:

Count will show remaining unread
User can retry "Mark all read"
Each retry will mark more notifications
Eventually all will be marked

🔍 Diagnostic Questions

1. How many notifications are marked?
   - Check whether the count decreases (e.g., 60 → 40 → 20 → 0)
   - If it decreases, marking is working but incomplete
2. Does retry help?
   - Click "Mark all read" again
   - If the count decreases further, retry logic is working
3. Is it always the same number?
   - If the count always stops at the same number (e.g., always 40), specific notifications might be failing
   - If the count varies, it's likely connection issues
4. Network stability?
   - Check whether the connection to the Leantime API is stable
   - Monitor for timeouts or rate limiting
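
The first three questions amount to a small decision rule over the unread counts observed after each attempt. A sketch of that rule, with a hypothetical diagnose function and labels chosen only for illustration:

```typescript
type Diagnosis = "all-marked" | "decreasing" | "same-count" | "inconclusive";

// `unreadCounts` holds the unread count before the first attempt and after
// each subsequent "Mark all read" attempt, e.g. [60, 40, 20, 0].
function diagnose(unreadCounts: readonly number[]): Diagnosis {
  if (unreadCounts.length < 2) return "inconclusive";
  const last = unreadCounts[unreadCounts.length - 1];
  const previous = unreadCounts[unreadCounts.length - 2];
  if (last === 0) return "all-marked";
  // Progress on the latest attempt: marking works but is incomplete,
  // which is what connection issues would produce.
  if (last < previous) return "decreasing";
  // No progress on the latest attempt: specific notifications may be failing.
  if (last === previous) return "same-count";
  // The count went up, e.g. because new notifications arrived.
  return "inconclusive";
}
```

A single repeated count is only a hint; the "always stops at the same number" case is more convincing when it repeats over several attempts.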

📝 Recommendations

Immediate:

1. Retry the operation: Click "Mark all read" again
   - Should mark more notifications
   - Count should decrease further
2. Check logs for specific errors:
   - Look for which notification IDs are failing
   - Check if it's always the same ones
3. Monitor network:
   - Check connection stability to Leantime
   - Look for timeout patterns

Future Improvements (if needed):

Batch marking: Mark notifications in smaller batches (e.g., 10 at a time)
Progress indicator: Show "Marking X of Y..." to user
Resume on failure: Track which notifications were marked, resume from where it failed
Connection pooling: Better management of concurrent requests
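
As a sketch of how the first three improvements could fit together, the helpers below split the work into batches, compute what is left after a partial run, and build the progress text. All names (chunk, remainingIds, progressLabel) are hypothetical, and the batch size of 10 is simply the example figure from the list above.

```typescript
// Splits `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Resume support: returns the IDs that still need marking, in original order.
function remainingIds(
  allIds: readonly string[],
  alreadyMarked: ReadonlySet<string>,
): string[] {
  return allIds.filter((id) => !alreadyMarked.has(id));
}

// Progress text in the "Marking X of Y..." form suggested above.
function progressLabel(done: number, total: number): string {
  return `Marking ${done} of ${total}...`;
}
```

For example, 60 notifications in batches of 10 give six batches. If a run stops after two of them, remainingIds yields the other 40 IDs, so a retry can pick up where the previous attempt failed instead of starting over.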

✅ Summary

What's Working:

✅ Initial marking starts successfully
✅ User ID caching prevents lookup failures
✅ Cache invalidation ensures count refreshes
✅ Retry logic handles transient failures

What's Failing:

⚠️ Connection resets during long operations
⚠️ Partial marking (some succeed, some fail)
⚠️ Network instability between server and Leantime

Solution:

Retry the operation: Click "Mark all read" multiple times
Each retry should mark more notifications
Eventually all will be marked

Status: This is expected behavior when the connection to Leantime is unstable. The fixes ensure that the count stays accurate after a partial failure and that each retry makes further progress.
