ChatGPT Extension
Enhancing ChatGPT with a custom browser extension
Overview
If you keep ChatGPT open while juggling tasks, the slowest part of the workflow is waiting for a reply that finishes off-screen. This extension closes that gap: it tracks the request, notifies you when the stream stops, and shows exactly which question is still pending.
The implementation stays deliberately small. A Manifest V3 content script sits in the ChatGPT tab, clocks the time between prompt and completion, and the service worker plays the audio cue. No background polling, no extra permissions—just a thin layer of observability on top of the UI.
Building it came down to a short checklist: detect when a prompt is submitted, monitor the DOM for the completion signal, and persist the timing so it is visible the next time you focus the tab. Every feature either reduces the "did it finish yet?" loop or captures the answer latency for later review.
What you end up with is a dependable helper that records the prompt, chimes when the response is done, and syncs a compact history via chrome.storage.sync.
Extension Architecture
Ship it like any other Manifest V3 project: a service worker and one declarative content script scoped to https://chat.openai.com.
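A sketch of the manifest that layout implies; the file names and the exact permission list are assumptions rather than the project's real values:

```json
{
  "manifest_version": 3,
  "name": "ChatGPT Notifier",
  "version": "0.1.0",
  "permissions": ["storage"],
  "background": { "service_worker": "service-worker.js" },
  "content_scripts": [
    {
      "matches": ["https://chat.openai.com/*"],
      "js": ["content.js"]
    }
  ],
  "action": { "default_title": "ChatGPT Notifier" }
}
```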
Inside the tab, the script maps the conversation ID from the URL,
mirrors the active textarea, and overlays a "Waiting for response…"
banner while the model streams. Every DOM observation feeds a simple
state machine so we always know where the request sits:
Idle → Prompted → Streaming → Complete.
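In TypeScript the whole machine fits in a few lines; the state names are the ones above, while the transition table is a guess at the shape:

```typescript
// The request lifecycle is a simple cycle, so each state has exactly
// one legal successor.
type RequestState = "Idle" | "Prompted" | "Streaming" | "Complete";

const NEXT: Record<RequestState, RequestState> = {
  Idle: "Prompted",
  Prompted: "Streaming",
  Streaming: "Complete",
  Complete: "Idle",
};

let state: RequestState = "Idle";

function transition(next: RequestState): void {
  if (NEXT[state] !== next) {
    // Hard-fail on illegal transitions so drift is loud, not silent.
    throw new Error(`Illegal transition: ${state} -> ${next}`);
  }
  state = next;
}
```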
Packaging was refreshingly dull. Rollup compiles the TypeScript into a single content bundle, and the rest is stock chrome.scripting and chrome.action wiring.
The content script proxies every prompt into a tiny state machine: on submit it marks the tab as Prompted and starts the clock.
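A plausible way to catch that submit moment, assuming a hypothetical send-button test id (verify it against the live DOM before trusting it):

```typescript
// content.ts — mark the tab as Prompted the moment a prompt goes out.
let promptedAt = 0;

function onPromptSubmitted(): void {
  promptedAt = Date.now(); // start of the prompt-to-completion clock
  transition("Prompted");  // state machine from the sketch above
}

document.addEventListener("click", (event) => {
  const el = event.target as HTMLElement;
  // "send-button" is an illustrative hook, not a guaranteed one.
  if (el.closest('[data-testid="send-button"]')) onPromptSubmitted();
});

document.addEventListener("keydown", (event) => {
  // Enter without Shift submits from the composer textarea.
  const el = event.target as HTMLElement;
  if (event.key === "Enter" && !event.shiftKey && el.matches("textarea")) {
    onPromptSubmitted();
  }
});
```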
While ChatGPT streams tokens, the script watches DOM mutations and keeps timers alive. Nothing is persisted yet; we only emit metrics once a response is truly complete.
Completion hands control to the service worker, which plays the notification chime and stores the timings in chrome.storage.sync.
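On the worker side, persisting the timing is a few lines of chrome.storage.sync; the history key and record shape here are invented for illustration:

```typescript
// service-worker.ts — persist completion timings per conversation.
chrome.runtime.onMessage.addListener((message) => {
  if (message?.type !== "chatgpt-response-complete") return;
  chrome.storage.sync.get({ history: [] }, ({ history }) => {
    history.push({
      conversationId: message.conversationId,
      completedAt: message.completedAt,
    });
    // chrome.storage.sync has tight quotas, so keep the log compact.
    chrome.storage.sync.set({ history: history.slice(-50) });
  });
});
```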
Detecting ChatGPT Response Completion
Completion detection is the only tricky part because ChatGPT provides no "done" event. The approach here is mechanical: wire a MutationObserver to the conversation container, filter new nodes where data-message-author-role is assistant, then check whether the node still contains [data-testid="result-streaming-indicator"]. When the indicator is gone, the response is finished.
Defensive guards keep the observer resilient: bail out if the container disappears, validate every added node before trusting it, and re-attach when navigation swaps the conversation. Those guardrails stop the extension from reporting false positives when OpenAI shuffles class names or introduces new layout experiments.
Submitting a prompt moves the machine from Idle to Prompted. Once the first assistant token lands we enter Streaming; when the observer confirms the turn is finished we pass through Complete and settle back at Idle.
The transition into Complete is the one that does the work: crossing it emits the completion event.
Once the event fires, the service worker plays the chime, updates the overlay, and increments a per-conversation counter. The next time you focus the tab you see the outstanding prompt, the elapsed time, and a button to replay the notification if you missed it.
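One plausible shape for that counter, surfaced through the toolbar badge (an assumption; the article does not say where the counter lives):

```typescript
// service-worker.ts — per-conversation completion counts on the badge.
// Note: the Map resets when the MV3 worker sleeps; persist via
// chrome.storage if the counts need to survive.
const pendingCounts = new Map<string, number>();

function recordCompletion(conversationId: string, tabId: number): void {
  const count = (pendingCounts.get(conversationId) ?? 0) + 1;
  pendingCounts.set(conversationId, count);
  chrome.action.setBadgeText({ tabId, text: String(count) });
}
```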
Anchor the observer to the same container ChatGPT uses for each turn, then climb one level so subtree mutations capture every new message stream. The data-testid attribute is the steadiest hook the UI exposes.
Hard-failing on layout drift is intentional: when the DOM no longer matches expectations we would rather throw, surface the error in the console, and disable notifications than silently miss completions.
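A minimal sketch of that wiring, with an illustrative test-id prefix standing in for the real one:

```typescript
// content.ts — attach the observer one level above the turn container so
// subtree mutations cover every message stream.
const TURN_SELECTOR = '[data-testid^="conversation-turn"]';

function attachObserver(onMutations: MutationCallback): MutationObserver {
  const turn = document.querySelector(TURN_SELECTOR);
  if (!turn?.parentElement) {
    // Hard-fail on layout drift: a loud error beats a silent miss.
    throw new Error("ChatGPT layout changed: conversation container not found");
  }
  const observer = new MutationObserver(onMutations);
  observer.observe(turn.parentElement, { childList: true, subtree: true });
  return observer;
}
```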
Each mutation batch is filtered down to newly added assistant turns that lack the streaming indicator. Leaning on semantic attributes keeps this resilient when class names churn, and the early returns avoid needless work during rapid-fire token streams.
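In sketch form the filter might look like this; the two attributes are the ones named above, everything else is assumed:

```typescript
declare function emitCompletion(turn: HTMLElement): void; // next sketch

// Type guard: true only for a finished assistant turn.
function isCompletedAssistantTurn(node: Node): node is HTMLElement {
  if (!(node instanceof HTMLElement)) return false; // skip text nodes early
  if (node.getAttribute("data-message-author-role") !== "assistant") return false;
  // Complete = the streaming indicator is no longer present in the turn.
  return !node.querySelector('[data-testid="result-streaming-indicator"]');
}

const onMutations: MutationCallback = (batch) => {
  for (const mutation of batch) {
    for (const added of mutation.addedNodes) {
      if (isCompletedAssistantTurn(added)) emitCompletion(added);
    }
  }
};
```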
The emitted payload is the contract with the service worker: a normalized event type, the active conversation ID, and a millisecond timestamp so the history log can compute latencies later.
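A sketch of that contract; the field and event names are chosen here for illustration:

```typescript
// Shared message shape between the content script and the service worker.
interface CompletionEvent {
  type: "chatgpt-response-complete";
  conversationId: string; // parsed from the /c/<id> URL segment
  completedAt: number;    // milliseconds, so the history can compute latency
}

function emitCompletion(turn: HTMLElement): void {
  const id = location.pathname.match(/\/c\/([^/]+)/)?.[1] ?? "unknown";
  const event: CompletionEvent = {
    type: "chatgpt-response-complete",
    conversationId: id,
    completedAt: Date.now(),
  };
  chrome.runtime.sendMessage(event);
}
```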
Watching both childList and subtree means we catch list-item insertions as well as mutations deeper in the message hierarchy, which protects us from minor DOM reshuffles.
Closing Thoughts
Nothing here is fancy—just a few well-placed observers and a standard MV3 setup—but the payback shows up immediately. You stop checking tabs, you capture real latency numbers, and you can tweak the workflow without waiting on OpenAI to expose new hooks. When the UI shifts again, swap out the selectors and keep moving.