Skip to content

ETT-477 Update under_server references in imgsrv#196

Open
moseshll wants to merge 3 commits intomainfrom
ETT-477_under_server
Open

ETT-477 Update under_server references in imgsrv#196
moseshll wants to merge 3 commits intomainfrom
ETT-477_under_server

Conversation

@moseshll
Copy link
Copy Markdown
Contributor

  • Remove code that de-restricts when not under server (e.g., $self->restricted(0) unless ( Debug::DUtils::under_server() );)
    • SRV/Article/EPUB.pm
    • SRV/Article/PDF.pm
    • SRV/Volume/Base.pm
    • SRV/Volume/EPUB.pm
    • SRV/Volume/HTML.pm
    • SRV/Volume/PDF.pm
    • SRV/Volume/Remediated/Bundle.pm
  • Remove conditional use of SRV::Utils::Stream for output (e.g., if ( ! $self->output_filename || Debug::DUtils::under_server() ))
    • SRV/Article/PDF.pm
    • SRV/Volume/PDF.pm
  • Completely de-restrict SRV/Cover.pm
  • Always translate to character entities in DEBUG output
    • SRV/SearchUtils.pm
  • Explicitly initialize super to undef in _default_params() since I believe the existing code is wrong
    • SRV/Volume/Base.pm
  • Miscellaneous logic simplifications
    • SRV/Image.pm
    • SRV/Utils.pm
  • Update psgi apps to use SRV::Utils::under_server
    • FIXME: do we even need that?
  • Remove Article/JATS support

- Remove code that de-restricts when not under server (e.g., `$self->restricted(0) unless ( Debug::DUtils::under_server() );`)
  - `SRV/Article/EPUB.pm`
  - `SRV/Article/PDF.pm`
  - `SRV/Volume/Base.pm`
  - `SRV/Volume/EPUB.pm`
  - `SRV/Volume/HTML.pm`
  - `SRV/Volume/PDF.pm`
  - `SRV/Volume/Remediated/Bundle.pm`
- Remove conditional use of `SRV::Utils::Stream` for output (e.g., `if ( ! $self->output_filename || Debug::DUtils::under_server() )`)
  - `SRV/Article/PDF.pm`
  - `SRV/Volume/PDF.pm`
- Completely de-restrict SRV/Cover.pm
- Always translate to character entities in DEBUG output
  - `SRV/SearchUtils.pm`
- Explicitly initialize `super` to `undef` in `_default_params()` since I believe the existing code is wrong
  - `SRV/Volume/Base.pm`
- Miscellaneous logic simplifications
  - `SRV/Image.pm`
  - `SRV/Utils.pm`
- Update psgi apps to use `SRV::Utils::under_server`
  - FIXME: do we even need that?
Comment thread imgsrv/apps/download.psgi
builder {

if ( under_server() ) {
if ( SRV::Utils::under_server() ) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is, do we even need these at all? Applies to this file and imgsrv.psgi as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about URLFixer in general. If there are percent-escaped ';' or '=' characters in the URL, then it will unescape the entire URL and redirect to that. That's a problem we see with the ChatGPT-created queries to ssd.

I could imagine some kind of redirection attack though -- basically tricking imgsrv to redirecting to something unexpected - but I'm not sure exactly what the threat model would be.

It may be worthwhile seeing if there's evidence of legit requests to imgsrv that would trigger this (i.e. requests to imgsrv containing %3B or %3D), and if we can't find any, we just disable URLFixer entirely.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly can't tell. It appears that the redirect is sticking the URL with escaped values into the http_referer, so I can't see what the upstream source is, that is, whether it's one of ours or something else that might hold some meaning. I will rely on some more guidance to answer this question, perhaps, and expect to return to this question.

Copy link
Copy Markdown
Contributor Author

@moseshll moseshll Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: we can't completely remove under_server or URLFixer as things stand now. So here are two sample requests from the playwright tests. The second is called with Plack::App::Command so the actual download action is not under_server(). URLFixer is only enabled when under_server(), the (second, below) request that comes back from the frontend contains %3d and... for some reason it's bad to fix it??? It looks like doing so would result in decoding a part of the URL (download_url) that is meant to be left encoded. Enabling URLFixer across the board kills the playwright tests for some reason. Not sure why. I'm not sure if this is in scope for the current ticket. Why on earth are these URLs so bloody convoluted?

 - /cgi/imgsrv/download/epub?id=test.pd_open&callback=tunnelCallback&_=1774990829729
 - /epub?progress_filepath=/htapps/babel/cache/download/2K16.118abf7093dae5452990834aa55749f1430a1454497252f63dab189fcec3707780/2K16.118abf7093dae5452990834aa55749f1430a1454497252f63dab189fcec3707780__progress&download_url=/cgi/imgsrv/download/epub?id=test.pd_open%3Bmarker=2K16.118abf7093dae5452990834aa55749f1430a1454497252f63dab189fcec3707780%3Battachment=1&id=test.pd_open

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some tracing I understand better what is happening. image.psgi and download.psgi can both be called via Plack::App::Command->run and thus with under_server FALSE. I didn't realize that background processing was done by the bucket chain of

SRV::Volume::Base->_background (or subclass)
SRV::Utils::run_command
IPC::Run::run
bin/start.sh
bin/pdf (or similar)
Plack::App::Command->run(..., apps/download.psgi) (or imgsrv.psgi)

and by the time this gets to URLFixer (yes ultimately we're talking about that) the REQUEST_URI in the environment is munged all to heck. So I would guess that's why that particular middleware is selectively enabled.

Copy link
Copy Markdown
Contributor Author

@moseshll moseshll Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To help answer the original question, however, %3B and %3D only seem to be showing up in unauthenticated requests from outside the US. I would tentatively recommend disabling URLFixer, though perhaps not removing it right away.

# this can only be passed from the command line
$params{super} = undef;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the existing code should be if instead of unless because it makes no sense otherwise based on the comment. I don't know if this change is more secure big picture wise, but it is simpler.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this doesn't make any sense. We should check if we remove it that debug=super still works & works only for logged-in users with appropriate permissions.

@moseshll moseshll marked this pull request as ready for review March 20, 2026 16:14
@moseshll moseshll requested a review from aelkiss March 20, 2026 16:14
Copy link
Copy Markdown
Member

@aelkiss aelkiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a number of questions, mostly if we should get rid of some of the other under_server checks entirely, and making sure my understanding is the same.

Comment thread imgsrv/apps/download.psgi
builder {

if ( under_server() ) {
if ( SRV::Utils::under_server() ) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about URLFixer in general. If there are percent-escaped ';' or '=' characters in the URL, then it will unescape the entire URL and redirect to that. That's a problem we see with the ChatGPT-created queries to ssd.

I could imagine some kind of redirection attack though -- basically tricking imgsrv to redirecting to something unexpected - but I'm not sure exactly what the threat model would be.

It may be worthwhile seeing if there's evidence of legit requests to imgsrv that would trigger this (i.e. requests to imgsrv containing %3B or %3D), and if we can't find any, we just disable URLFixer entirely.

Comment thread imgsrv/apps/download.psgi
enable "PopulateENV", app_name => 'imgsrv';

enable_if { (under_server() && $ENV{HT_DEV}) } 'StackTrace';
enable_if { (SRV::Utils::under_server() && $ENV{HT_DEV}) } 'StackTrace';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we think of any reason not to enable solely based on HT_DEV or (maybe better) the plack environment?

Comment thread imgsrv/apps/download.psgi
enable_if { (SRV::Utils::under_server() && ! $ENV{HT_DEV}) } "HTTPExceptions", rethrow => 0;

if ( under_server() ) {
if ( SRV::Utils::under_server() ) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be inclined to just always enable these -- given we do local development in docker, I can't think of a case where we wouldn't be under_server.

# this can only be passed from the command line
$params{super} = undef;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this doesn't make any sense. We should check if we remove it that debug=super still works & works only for logged-in users with appropriate permissions.

Comment thread imgsrv/lib/SRV/Cover.pm
my $ext = shift;
my $output_filename =
SRV::Utils::generate_output_filename($env, [ $self->file, $self->mode, 'full', $self->size, '0', $self->quality, $self->restricted, 'ZZZ' ], $ext);
SRV::Utils::generate_output_filename($env, [ $self->file, $self->mode, 'full', $self->size, '0', $self->quality, 'ZZZ' ], $ext);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just removing $self->restricted here threw me for a loop, but it looks like generate_output_filename just makes a hash of all the options, and if covers are never restricted then having it in the hash doesn't matter. It does change the cache key for existing cover images, though, so I might be inclined to keep it in the hash?

Comment thread imgsrv/lib/SRV/Cover.pm
my $restricted = $self->restricted;
unless ( defined $restricted ) {
# $restricted = $C->get_object('Access::Rights')->assert_final_access_status($C, $gId) ne 'allow';
$restricted = $$env{'psgix.restricted'};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be other ways (throttling, etc) where psgix.restricted gets set, even though in general covers should be unrestricted?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I guess the point is that restricted could be set, but it doesn't matter because we serve the cover anyway?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants