Support for Jupyter notebooks in excess of 1 MB #6722

At present it appears that `linguist` on GitHub handles large files (> 1 MB) by reporting their byte count (or something similar) in lieu of their actual line count. For my repo [https://github.com/hansec/OpenFUSIONToolkit](https://github.com/hansec/OpenFUSIONToolkit), this manifests as the primary language being reported as "Jupyter Notebook", even though the repo contains far fewer lines of notebook code than of its actual primary language (Fortran).

When I run `linguist` on the repo in a container, only two of the five notebooks report non-zero line counts. Counted individually, these total 1,054 lines of code (594 + 460, as reported by linguist). However, when linguist is run on the full repo it reports `6677629` lines of Jupyter code. This is comparable to, but greater than, the total size of all Jupyter files at `6260766` bytes.

```
Jupyter Notebook:
src/examples/Marklin/cylinder/Marklin_ex1.ipynb
src/examples/TokaMaker/ITER/ITER_baseline_ex.ipynb
src/examples/TokaMaker/ITER/ITER_mesh_ex.ipynb
src/examples/TokaMaker/fixed_boundary/fixed_boundary_ex1.ipynb
src/examples/TokaMaker/fixed_boundary/fixed_boundary_ex2.ipynb
```

```
57.75%  6677629    Jupyter Notebook
36.04%  4167016    Fortran
4.46%   515698     Python
0.62%   72091      CSS
0.38%   44332      CMake
0.29%   33881      C
0.19%   21918      Makefile
0.14%   16676      C++
0.11%   13267      HTML
0.00%   520        Shell
```

```
.../src/examples/Marklin/cylinder/Marklin_ex1.ipynb: 594 lines (594 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook

.../src/examples/TokaMaker/ITER/ITER_baseline_ex.ipynb: 460 lines (460 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook

.../src/examples/TokaMaker/ITER/ITER_mesh_ex.ipynb: 0 lines (0 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook
  blob is too large to be shown

.../src/examples/TokaMaker/fixed_boundary/fixed_boundary_ex1.ipynb: 0 lines (0 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook
  blob is too large to be shown

.../src/examples/TokaMaker/fixed_boundary/fixed_boundary_ex2.ipynb: 0 lines (0 sloc)
  type:      Text
  mime type: text/plain
  language:  Jupyter Notebook
  blob is too large to be shown
```

It appears that there is a [1 MB limit](https://github.com/github-linguist/linguist/blob/559a6426942abcae16b6d6b328147476432bf6cb/lib/linguist/blob_helper.rb#L191) above which notebooks and other files are no longer handled in the standard way. While I certainly understand the need for a file size limit, 1 MB seems particularly low for Jupyter notebooks, which may include one or more embedded images.
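For anyone who wants to check whether their own repo is affected, here is a rough sketch that lists every `.ipynb` file under a directory with its on-disk size and flags those above the limit. It is not part of linguist: the helper name `notebook_sizes` is made up for illustration, and the threshold of `1024 * 1024` bytes is my assumption about what "1 MB" means in the linked code.

```python
import os
import sys

# Assumption: the "1 MB" limit is 1024 * 1024 bytes; adjust if linguist differs.
LIMIT_BYTES = 1024 * 1024


def notebook_sizes(root, limit=LIMIT_BYTES):
    """Return sorted (path, size_in_bytes, over_limit) tuples for each .ipynb under root."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".ipynb"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                # A file is flagged only when it is strictly larger than the limit.
                results.append((path, size, size > limit))
    return sorted(results)


if __name__ == "__main__":
    # Usage: python notebook_sizes.py [repo_root]
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    rows = notebook_sizes(root)
    for path, size, over in rows:
        flag = "over limit" if over else "ok"
        print(f"{size:>10}  {flag:<10}  {path}")
    print(f"{sum(size for _, size, _ in rows):>10}  total bytes")
```

Comparing the flagged files against the per-file `linguist` output above should show whether the notebooks reported as `0 lines (0 sloc)` are exactly the ones over the threshold.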

