Support for Jupyter notebooks in excess of 1 MB #6722

@hansec

At present, linguist on GitHub appears to handle large files (> 1 MB) by reporting their byte count (or something similar) in lieu of their actual line count. For my repo https://github.com/hansec/OpenFUSIONToolkit, this manifests as the primary language being reported as "Jupyter Notebook", even though the number of lines of code of this type is significantly less than that of the actual primary language (Fortran).

When running linguist on the repo in a container, I find only two Jupyter files with non-zero line counts. Counted individually, these total 1,054 lines of code (as reported by linguist). However, when linguist is run on the full repo, it reports 6677629 lines of Jupyter code. This is comparable to, but greater than, the total size of all Jupyter files at 6260766 bytes.

Jupyter Notebook:
src/examples/Marklin/cylinder/Marklin_ex1.ipynb
src/examples/TokaMaker/ITER/ITER_baseline_ex.ipynb
src/examples/TokaMaker/ITER/ITER_mesh_ex.ipynb
src/examples/TokaMaker/fixed_boundary/fixed_boundary_ex1.ipynb
src/examples/TokaMaker/fixed_boundary/fixed_boundary_ex2.ipynb
57.75%  6677629    Jupyter Notebook
36.04%  4167016    Fortran
4.46%   515698     Python
0.62%   72091      CSS
0.38%   44332      CMake
0.29%   33881      C
0.19%   21918      Makefile
0.14%   16676      C++
0.11%   13267      HTML
0.00%   520        Shell
.../src/examples/Marklin/cylinder/Marklin_ex1.ipynb: 594 lines (594 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook

.../src/examples/TokaMaker/ITER/ITER_baseline_ex.ipynb: 460 lines (460 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook

.../src/examples/TokaMaker/ITER/ITER_mesh_ex.ipynb: 0 lines (0 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook
  blob is too large to be shown

.../src/examples/TokaMaker/fixed_boundary/fixed_boundary_ex1.ipynb: 0 lines (0 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook
  blob is too large to be shown

.../src/examples/TokaMaker/fixed_boundary/fixed_boundary_ex2.ipynb: 0 lines (0 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook
  blob is too large to be shown
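
For anyone wanting to audit a longer report of this kind, here is a minimal Python sketch that splits per-file output into files that received a line count and files flagged as too large. The function name `split_report` and the regular expression are assumptions inferred only from the output shown above; they are not part of linguist.

```python
import re

# Header line of a per-file report, e.g. "path/file.ipynb: 594 lines (594 sloc)".
# This pattern is inferred from the output above, not from linguist's source.
HEADER = re.compile(r"^(?P<path>\S.*): (?P<lines>\d+) lines \((?P<sloc>\d+) sloc\)$")
TOO_LARGE = "blob is too large to be shown"


def split_report(text):
    """Return (counted, skipped).

    counted maps path -> reported line count for files that were analyzed;
    skipped lists paths whose entry carries the "too large" marker.
    """
    counted, skipped = {}, []
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        match = HEADER.match(line)
        if match:
            # A new file entry starts; remember it until the next header.
            current = match.group("path")
            counted[current] = int(match.group("lines"))
        elif line.strip() == TOO_LARGE and current in counted:
            # Move the current file from "counted" to "skipped".
            del counted[current]
            skipped.append(current)
    return counted, skipped
```

Passing the per-file text above to `split_report` should leave the two notebooks with non-zero counts in `counted` and the three flagged notebooks in `skipped`.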

There appears to be a 1 MB limit above which notebooks and other files are no longer handled in the standard way. While I certainly understand the need for a file size limit, 1 MB seems particularly low for Jupyter notebooks, which may include one or more embedded images.
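
To check which notebooks in a checkout would cross such a limit, a short Python sketch along these lines may help. The threshold of 1024 * 1024 bytes is an assumption (the exact cutoff is not confirmed in this report), and `oversized_notebooks` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed threshold: this report only says "1 MB"; the exact cutoff
# applied by linguist is not confirmed here.
ASSUMED_LIMIT_BYTES = 1024 * 1024


def oversized_notebooks(root, limit=ASSUMED_LIMIT_BYTES):
    """Return (path, size_in_bytes) for each .ipynb under root larger than limit, biggest first."""
    hits = []
    for path in Path(root).rglob("*.ipynb"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size > limit:
            hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling `oversized_notebooks(".")` from the repository root lists the notebooks that exceed the assumed limit, which can then be compared against the files marked "blob is too large to be shown".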
