The Jupyter notebook bug that only crashes for other people

May 1, 2026 · 5 min read

Cell 0 uses df. Cell 1 defines df.

Notebook works for you because your kernel ran the cells in some other order and the variable's still in memory. You commit. Someone clones the repo, hits Restart and Run All, dies on cell 0.

Standard Python linters can't catch this. ruff, flake8, mypy operate on one source file at a time. A notebook is N cells whose execution order in your kernel may have nothing to do with their order on disk. The bug isn't inside any single cell. It's in the relationship between cells.

nborder is a static linter for that relationship.

Rules

| Code | Flags |
| --- | --- |
| NB101 | `execution_count` decreases in source order |
| NB201 | Name used in cell N, only defined in cell M where M > N |
| NB102 | Name used somewhere, never defined anywhere |
| NB103 | Stochastic call (numpy, torch, tensorflow, stdlib `random`) before any seed |

How the cross-cell analysis works

Each cell gets parsed with libCST. A visitor extracts symbol definitions (assignments, function defs, class defs, imports) and symbol uses (name references, attribute roots) per cell. Connect them across cells in source order, you get a dataflow graph at notebook scope.

NB201 findings are uses whose nearest matching definition lives in a later cell. NB102 findings are uses with no matching definition anywhere.
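The per-cell extraction can be sketched with the stdlib `ast` module (nborder itself uses libCST, and its real visitor is richer; the builtin filter and the example cells here are illustrative):

```python
import ast
import builtins

def defs_and_uses(src):
    """Collect names a cell defines and names it references."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else uses).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, ast.Import):
            defs.update(a.asname or a.name.split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            defs.update(a.asname or a.name for a in node.names)
    return defs, uses

def check_order(cells):
    """NB201: used before its defining cell. NB102: never defined at all."""
    per_cell = [defs_and_uses(c) for c in cells]
    all_defs = set().union(*(d for d, _ in per_cell))
    seen, findings = set(), []
    for i, (defs, uses) in enumerate(per_cell):
        for name in sorted(uses - seen - defs):
            if name in vars(builtins):  # don't flag print, len, ...
                continue
            findings.append(("NB201" if name in all_defs else "NB102", i, name))
        seen |= defs
    return findings

print(check_order([
    "result = df.head()",
    "import pandas as pd\ndf = pd.DataFrame({'a': [1, 2, 3]})",
]))
# [('NB201', 0, 'df')]
```

A sketch-level simplification: within one cell, define-and-use order is not checked, only the cross-cell relationship.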

The graph also makes the auto-fix safe. When NB201 fires, the fixer runs a topological sort over cell dependency edges. Sort succeeds, cells get reordered to respect dataflow and execution counts get cleared. Cycle detected, fixer bails with an explicit message naming the cycle.
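A minimal version of that fixer step, using the stdlib `graphlib`. The dependency-edge format here is an assumption for illustration; nborder derives the edges from its dataflow graph:

```python
from graphlib import TopologicalSorter, CycleError

def reorder_cells(cells, deps):
    """Reorder cells so every cell comes after the cells it depends on.

    deps maps a cell index to the indices of cells defining names it uses.
    Raises with the cycle named when no valid order exists.
    """
    graph = {i: deps.get(i, set()) for i in range(len(cells))}
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"cannot fix: dependency cycle among cells {err.args[1]}") from err
    return [cells[i] for i in order]

# Cell 0 uses df, which cell 1 defines: edge 0 -> 1.
fixed = reorder_cells(
    ["result = df.head()", "import pandas as pd\ndf = pd.DataFrame({'a': [1]})"],
    {0: {1}},
)
```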

NB201 fix example

Input:

# cell 0
result = df.head()

# cell 1
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})


Run nborder check --fix notebook.ipynb:

notebook.ipynb:cell_0:1:10: NB201 Variable `df` used in cell 0 is only defined in cell 1. The notebook will fail on Restart-and-Run-All. [*]
Fix outcomes:
  reorder: applied (reordered 2 cells and cleared execution counts)


Output:

# cell 0
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})

# cell 1
result = df.head()


Cell IDs preserved. Execution counts cleared. Second nborder check exits 0.

NB103 and seed injection

NB103 walks the same graph for stochastic calls (np.random.rand, torch.rand, tf.random.normal, random.random) firing before any matching seed. The fix injects a single seed cell at the right position. Multi-library notebooks get one cell:

import numpy as np
np.random.seed(42)
rng = np.random.default_rng(42)
import torch
torch.manual_seed(42)


Alias-aware. import numpy as numpy_lib produces a seed line using numpy_lib, not a redundant fresh import. After fixing a NumPy notebook, computed cell outputs are byte-identical across consecutive jupyter nbconvert --execute runs.
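A sketch of that alias lookup, assuming a small template table over the notebook's joined source (the seed value 42 and the template strings are illustrative, not nborder's actual tables):

```python
import ast

# Canonical library name -> seed statement template. Illustrative only.
SEED_LINES = {
    "numpy": "{alias}.random.seed(42)",
    "torch": "{alias}.manual_seed(42)",
    "random": "{alias}.seed(42)",
}

def seed_cell(notebook_source):
    """Build one seed cell using whatever aliases the notebook already imports."""
    aliases = {}  # canonical name -> alias actually used in the notebook
    for node in ast.walk(ast.parse(notebook_source)):
        if isinstance(node, ast.Import):
            for a in node.names:
                root = a.name.split(".")[0]
                if root in SEED_LINES:
                    aliases.setdefault(root, a.asname or root)
    return "\n".join(SEED_LINES[lib].format(alias=al) for lib, al in aliases.items())

print(seed_cell("import numpy as numpy_lib\nx = numpy_lib.random.rand(3)"))
# numpy_lib.random.seed(42)
```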

JAX and scikit-learn get diagnostic-only handling. JAX needs PRNGKey threading through call signatures. sklearn random_state=None needs a value chosen against your testing strategy. Neither is a single line you can inject.

Byte-stable writer

Parse a notebook, modify nothing, write it back, bytes match exactly. Verified against nbformat v4.0, v4.4, v4.5 fixtures plus a real-world notebook corpus. When the writer does mutate during a fix, only the cells that actually changed get rewritten. Cell IDs, metadata, and unrelated cells stay verbatim.
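The property itself is easy to state as code. A minimal sketch with a canonical JSON writer (`write_nb` is a stand-in: nborder's real writer goes further and preserves the incoming file's original formatting rather than imposing its own):

```python
import json

def write_nb(nb: dict) -> bytes:
    """One fixed serialization: the same dict always yields the same bytes."""
    return (json.dumps(nb, indent=1, ensure_ascii=False, sort_keys=True) + "\n").encode()

def is_byte_stable(raw: bytes) -> bool:
    """The no-op round-trip property: parse, change nothing, write back."""
    return write_nb(json.loads(raw)) == raw

nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
assert is_byte_stable(write_nb(nb))
```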

Outputs

Four reporters:

  • text: ruff-style path:cell:line:col: NB### message
  • json: machine-readable
  • github: ::error file=...,line=...,title=NB201:: annotations for PR inline comments
  • sarif: SARIF 2.1.0, schema-validated
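The github reporter's output is GitHub's documented workflow-command syntax printed to stdout. A hypothetical one-liner for the shape of it (`github_annotation` is not nborder's API, just an illustration):

```python
def github_annotation(path: str, line: int, code: str, message: str) -> str:
    """Format one finding as a GitHub Actions error annotation.

    A production reporter must also escape %, carriage returns, and
    newlines in the message per GitHub's workflow-command rules.
    """
    return f"::error file={path},line={line},title={code}::{message}"

print(github_annotation("notebook.ipynb", 1, "NB201",
                        "Variable `df` used in cell 0 is only defined in cell 1."))
```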

Pre-commit hook and a composite GitHub Action included:

- uses: moonrunnerkc/nborder@v0.1.4
  with:
    path: notebooks/
    select: NB201,NB103


What it doesn't do

  • Doesn't execute notebooks. Pair with nbval or papermill for kernel-level validation.
  • Doesn't lint cell-internal style. That's nbqa.
  • Dynamic name resolution (exec, getattr, **kwargs, monkey-patching) is invisible. Same limitation as any static analyzer.
  • Cell magics are stripped before analysis. Names introduced by %%capture get tracked. Anything magic-internal does not.

Install

pip install nborder
nborder check path/to/notebooks/


Python 3.10+.

moonrunnerkc / nborder

A fast, opinionated linter and auto-fixer for Jupyter notebook hidden-state and execution-order bugs.

What this catches

| Code | Name | One-line example |
| --- | --- | --- |
| NB101 | Non-monotonic execution counts | Cell 1 ran with `In [3]:` after cell 0 ran with `In [5]:`. |
| NB102 | Won't survive Restart-and-Run-All | `print(df)` references a name no cell in the notebook defines. |
| NB201 | Use-before-assign across cells | Cell 0 uses `df`; `df = ...` only appears in cell 1. |
| NB103 | Stochastic library used without seed | `np.random.rand(3)` runs with no seed call before it. |

Each rule has a docs page under docs/rules/ explaining the bug class, a bad and good example, and the auto-fix behaviour. The four sections below walk through each rule with the diagnostic nborder actually emits.

NB101: out-of-order execution

The execution_count field on each cell records the order Jupyter actually ran cells in, not the order they appear in the file. When those orders disagree, the recorded…

View on GitHub


Source: Dev.to
