Correctly handling unsupported content #359
starrify wants to merge 1 commit into scrapinghub:master
```python
def _on_unsupported_content(self, reply):
    self.logger.log('Unsupported content detected', min_level=3)
    self._is_unsupported_content = True
```
This attribute is set per browser tab, and a browser tab can be used to download multiple resources. If you set this attribute here, will it affect other requests going through the same webpage? E.g. a script doing:
```lua
function main(splash)
    splash:go("url_with_unsupported_content") -- this will set _is_unsupported_content to True
    splash:go("url with supported content") -- will this go through?
end
```

Another thing worth noting is that multiple resources may be downloaded while rendering. For example, `splash:go("some page")` may issue 10 requests where the first 6 have supported content but the 7th has unsupported content. What will happen with resources 7-10? Will they still be downloaded correctly?
If only one among many resources is unsupported (e.g. there are many stylesheets and only one is corrupted), what response will the user get?
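One way to avoid the leak described above is to scope the state to a single navigation and record *which* URL triggered the event, rather than keeping one tab-wide boolean. The sketch below is hypothetical: `NavigationState`, `Tab`, `go()` and `main_resource_unsupported` are illustrative names, not Splash's API, and it assumes the handler can learn the URL of the offending reply (in Qt, via `QNetworkReply.url()`).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NavigationState:
    """Hypothetical per-navigation record of unsupported-content events."""
    main_url: str
    # URLs of every reply that was flagged as unsupported during this navigation.
    unsupported_urls: List[str] = field(default_factory=list)

    def on_unsupported_content(self, url: str) -> None:
        # Record the offending resource; other resources keep loading normally.
        self.unsupported_urls.append(url)

    @property
    def main_resource_unsupported(self) -> bool:
        # Assumed policy: only the navigated URL itself makes the render fail.
        return self.main_url in self.unsupported_urls


class Tab:
    """Hypothetical stand-in for a browser tab; not Splash's real class."""

    def __init__(self) -> None:
        self.navigation: Optional[NavigationState] = None

    def go(self, url: str) -> NavigationState:
        # Fresh state on every navigation, so an earlier go() cannot leak
        # into a later one the way a single tab-level flag would.
        self.navigation = NavigationState(main_url=url)
        return self.navigation
```

With this shape, the second `splash:go()` in the script above starts from a clean state, and a single broken stylesheet ends up in `unsupported_urls` without failing the whole page. Treating subresource failures as non-fatal is only one possible answer to the question of what the user should get; reporting them (e.g. in the HAR) is another.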
In the mockserver there is a child resource called "subresources"; you could probably add a similar test resource that tries to fetch one resource with unsupported content while rendering.
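A test resource along those lines might look like the sketch below. Splash's mockserver is built on Twisted; this version uses the standard library's `http.server` only to stay self-contained, so the handler class, helper functions and paths (`/unsupported-subresource`, `.../style.css`) are hypothetical. It also assumes that a subresource served as `application/octet-stream` with `Content-Disposition: attachment` is the kind of response that reaches the unsupported-content handler; whether QtWebKit actually fires that signal for a stylesheet is exactly what such a test would need to confirm.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical paths; the real mockserver's resource names may differ.
PAGE_PATH = "/unsupported-subresource"
BINARY_PATH = "/unsupported-subresource/style.css"

PAGE_HTML = (
    b"<html><head>"
    b'<link rel="stylesheet" href="/unsupported-subresource/style.css">'
    b"</head><body>ok</body></html>"
)


class UnsupportedSubresourceHandler(BaseHTTPRequestHandler):
    """Serves an HTML page whose stylesheet is delivered as a binary download."""

    def do_GET(self):
        if self.path == PAGE_PATH:
            self._send(200, "text/html; charset=utf-8", PAGE_HTML)
        elif self.path == BINARY_PATH:
            # Assumed trigger: a download instead of text/css.
            self._send(200, "application/octet-stream", b"\x00\x01\x02",
                       extra={"Content-Disposition": "attachment; filename=style.css"})
        else:
            self._send(404, "text/plain", b"not found")

    def _send(self, status, content_type, body, extra=None):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        for name, value in (extra or {}).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_server(host="127.0.0.1"):
    """Start the server on a free port in a daemon thread; return (server, base_url)."""
    server = ThreadingHTTPServer((host, 0), UnsupportedSubresourceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://%s:%d" % (host, server.server_address[1])


def fetch(url):
    """Return (content_type, content_disposition, body) for a GET request."""
    with urllib.request.urlopen(url) as resp:
        return (resp.headers.get("Content-Type"),
                resp.headers.get("Content-Disposition"),
                resp.read())
```

A rendering test would then point Splash at `base_url + PAGE_PATH` and check both the response and the resulting HAR entries for the stylesheet.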
For now I cannot think of a good design for returning the downloaded unsupported content back to the client via the existing endpoints (e.g. `/html`, `/png` and `/har`). The good news: the `/har` endpoint now works as expected. Maybe later we can introduce something like `splash:raw_content` in the scripting support to deliver the content.
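If something like `splash:raw_content` were added, it would need somewhere to keep the bytes of the unsupported reply. The sketch below shows one possible server-side buffer; `RawContentStore`, its methods and the size cap are all hypothetical, not part of Splash. It assumes the body is fed in chunks as data arrives (e.g. from a `readyRead`-style callback) and base64-encodes it, since raw binary cannot be placed directly in a JSON response.

```python
import base64
from typing import Dict, Optional


class RawContentStore:
    """Hypothetical buffer for bodies of replies flagged as unsupported."""

    def __init__(self, max_bytes: int = 10 * 1024 * 1024):
        # Assumed cap so a large download cannot exhaust server memory.
        self.max_bytes = max_bytes
        self._bodies: Dict[str, bytearray] = {}

    def append(self, url: str, chunk: bytes) -> None:
        # Check the limit before extending, so a rejected chunk leaves the
        # previously stored data untouched.
        body = self._bodies.setdefault(url, bytearray())
        if len(body) + len(chunk) > self.max_bytes:
            raise ValueError("unsupported content for %s exceeds %d bytes"
                             % (url, self.max_bytes))
        body.extend(chunk)

    def raw_content(self, url: str) -> Optional[bytes]:
        # None means no unsupported content was captured for this URL.
        body = self._bodies.get(url)
        return None if body is None else bytes(body)

    def as_json_value(self, url: str) -> Optional[str]:
        # Binary data is base64-encoded so it can be embedded in JSON.
        body = self.raw_content(url)
        return None if body is None else base64.b64encode(body).decode("ascii")
```

Keeping the bytes keyed by URL fits the per-navigation tracking discussed above, and the size cap reflects that unsupported content is often a file download of unknown length.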