Kitchen RGBT Annotation

Kitchen RGBT Video Annotation

1. Select Video

Hover on thermal video (T/RT) to see exact temperature from IR sensor

2. Select Modality

3. Question Type

4. Question Intent & Generation

5. Options (MCQ)

Thermal range (optional): °C → °C (helps generate better options)

5. Reference Answer (Open-ended)

LLM Review (GPT-5.4)

6. Taxonomy

Run LLM Review above to auto-fill, or select manually.

Annotation Prompt

Copy this prompt into your AI assistant (ChatGPT, Claude, etc.) along with the video context. It will help you write good questions.

You are helping create questions for THERMAL-KITCHENS, a benchmark
that tests whether AI models can understand cooking videos using both
RGB (visible) and thermal infrared cameras.

The core challenge: thermal cameras reveal physical heat state that
RGB cannot. The best questions are ones where RGB appearance is
misleading or insufficient, and thermal evidence is required to reach
the correct answer.

Key concepts in this benchmark:
- Latent physical state: heat properties invisible to RGB even under
  good lighting (residual heat, hotspots, contact traces)
- Cross-modal conflict: RGB appearance actively misleads
  (visible steam but cold liquid, seared surface but raw interior,
  reheated food that looks hot but is unevenly heated)
- Thermal grounding: connecting thermal evidence to specific objects,
  regions, or actions in the scene
- Hazard persistence: dangerous heat that remains after the visible
  cause has ended (burner off but surface still hot)
- Temporal thermal evolution: how heat state changes, spreads,
  or dissipates over the course of a clip

---

## What makes a good question

The question must require thermal evidence that RGB cannot provide.
Ask about heat states, temperature distributions, or thermal changes
that are invisible to the naked eye.

Good scenarios (aligned with benchmark design):
- Residual heat after action ends: burner off, pan still hot,
  RGB shows no active heat source
- Cross-modal conflict: visible steam/smoke suggests heat,
  thermal reveals actual low temperature
- Uneven reheating: food looks uniformly warm from RGB,
  thermal shows cold center
- Contact hazard persistence: object touched or heated,
  hand has left, thermal trace remains
- New cold object introduced: fresh ingredient or cold utensil
  placed into hot environment, thermal contrast visible
- Heat transfer between objects: conduction through cookware,
  thermal spread across surfaces

---

## What makes a bad question

Avoid these patterns — they will be rejected:

- Commonsense bypass: the answer is obvious without watching the video
  ("Is the oil hot after frying for 10 minutes?" Everyone knows the
  answer is yes.)
- Single-frame sufficient: a single thermal frame answers the question;
  no need to watch the clip at all
- RGB bypass: visible cues (bubbling, smoke, color change, action)
  already answer the question without thermal
- Food science dependency: the answer requires knowing specific
  temperature thresholds (starch gelatinization, safe internal temps)
  rather than reading observable evidence from the video
- Threshold language: avoid "boiling point", "near-boiling",
  "safe temperature" — use relative descriptions instead
  ("hotter than", "cooler than", "no observable change")
- Temporal mismatch: the question claims to require the full clip
  but a single end frame would suffice
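
The threshold-language rule above can be screened mechanically before a question is submitted. The following is a minimal sketch, not part of the annotation tool: the function name `find_threshold_terms` and the phrase list are assumptions, seeded only with the three phrases named in the rule.

```python
# Phrases taken from the "Threshold language" rule; extend as needed.
# This list and the function name are illustrative assumptions.
THRESHOLD_PHRASES = ("boiling point", "near-boiling", "safe temperature")


def find_threshold_terms(question: str) -> list[str]:
    """Return the discouraged threshold phrases found in a question.

    Matching is a case-insensitive substring search, so it only catches
    the literal phrases listed above, not paraphrases of them.
    """
    lowered = question.lower()
    return [phrase for phrase in THRESHOLD_PHRASES if phrase in lowered]
```

An empty result does not mean the question is acceptable; it only means none of the listed phrases appear. Paraphrased thresholds still need a human check.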

---

## Option design rules

All four options must be on the same judgment dimension.
Do not mix:
- Present-state descriptions with historical trajectory claims
- Temperature thresholds with relative comparisons
- Yes/No prefixes with standalone statements

At least one distractor should reflect the natural RGB-only
misinterpretation — what a model would answer if it only saw the
visible camera.

Options must be mutually exclusive. If two options could both be
true at the same time, redesign them.
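
Some of the option rules above are structural and can be checked automatically. The sketch below is an assumption-laden illustration (the function name `check_options` and the messages are hypothetical): it checks the option count, exact duplicates, and mixed Yes/No prefixes. It cannot judge whether options are semantically mutually exclusive or share a judgment dimension.

```python
import re

# Matches an option that starts with the whole word "Yes" or "No".
YES_NO_PREFIX = re.compile(r"^(yes|no)\b", re.IGNORECASE)


def check_options(options: list[str]) -> list[str]:
    """Return a list of structural problems found in a set of MCQ options."""
    problems = []
    if len(options) != 4:
        problems.append("expected exactly four options")
    normalized = [opt.strip().lower() for opt in options]
    if len(set(normalized)) != len(normalized):
        problems.append("duplicate options")
    # Either every option carries a Yes/No prefix or none does.
    prefixed = [bool(YES_NO_PREFIX.match(opt.strip())) for opt in options]
    if any(prefixed) and not all(prefixed):
        problems.append("mixes Yes/No prefixes with standalone statements")
    return problems
```

For example, the set "Yes, it is hotter" / "No, it is cooler" / "It stays the same" / "It cools then reheats" is flagged for mixing prefixes, while four standalone statements pass these checks.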

---

## Examples

GOOD question (cross-modal conflict):
"Based on both the visible appearance and the thermal evidence,
what is the most accurate assessment of the liquid's temperature
trend over this period?"
- Works because: RGB (steam) suggests heat, thermal may contradict
- Requires: both modalities, neither alone suffices

BAD question (commonsense bypass):
"After stir-frying for 10 minutes, is the wok hot?"
- Fails because: anyone would answer yes without watching the video

GOOD question (heat distribution):
"Based on the thermal evidence, how does the thermal state of the
newly added layer compare to the existing layers by the end of
the clip?"
- Works because: requires observing thermal contrast between objects
- RGB only shows that something was added, not its temperature

BAD question (food science dependency):
"Has the chicken reached a safe internal cooking temperature?"
- Fails because: requires knowing the safe temperature threshold,
  not reading observable thermal evidence

GOOD question (temporal, residual heat):
"Over the course of the clip, which surface retains the highest
thermal reading the longest after the heat source is removed?"
- Works because: requires tracking thermal change across time
- No actions or appliances named in the stem

BAD question (single-frame sufficient):
"By the end of the clip, is the pan hot or cold?"
- Fails because: one thermal frame at the end answers this completely

GOOD question (contact hazard persistence):
"Based on the thermal evidence, is there any observable thermal
trace on the surface after the last contact event in the clip?"
- Works because: requires temporal observation of thermal persistence
- RGB cannot detect residual heat from touch

BAD question (RGB bypass):
"After the lid is placed on the pot, does steam escape?"
- Fails because: steam is visible in RGB, no thermal needed
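
The "single-frame sufficient" pattern can sometimes be screened with a simple heuristic when per-frame readings are available. The sketch below assumes a hypothetical data layout, one dictionary per frame mapping a region name to its thermal reading; the tool's actual export format is not described here. It applies only to questions of the form "which region is hottest".

```python
def hottest_region(frame: dict[str, float]) -> str:
    """Return the name of the region with the highest reading in one frame."""
    return max(frame, key=frame.get)


def end_frame_suffices(frames: list[dict[str, float]]) -> bool:
    """Return True if the hottest region never changes during the clip.

    In that case the final frame alone gives the same answer as the whole
    clip, which suggests the question does not need temporal evidence.
    """
    if not frames:
        raise ValueError("need at least one frame")
    final = hottest_region(frames[-1])
    return all(hottest_region(frame) == final for frame in frames)
```

A clip where the pan starts hottest but cools below the pot by the end returns False, meaning the ranking changes over time and a temporal question may be justified.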