Hi team,
Thanks for migrating this gem to Tika and replacing the mimemagic gem. We're using the latest gem version in production and so far so good. Great work, and thank you!
We just found that certain xlsx and docx files uploaded by our users are being misdetected as application/zip, the same as in issue #35.
However, it only happens with some files larger than 64 KB.
Summary:
There were 3 xlsx files:
- test.xlsx => 5kb => mimetype: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- test2.xlsx => 30kb => mimetype: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- test3.xlsx => 368kb => mimetype: application/zip
The root cause of the third case is that the match for [Content_Types].xml is only attempted within the offset range 30:65536, while files from Google Docs/Sheets have the fingerprint entries at the end of the file, so the match fails.
Could we implement a negative offset that reads from the end of the file for these cases?
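
To illustrate the idea, here is a rough sketch of what a negative-offset match could look like. This is only an assumption about how it might work, not the gem's actual matching code: `magic_match?` and its parameters are hypothetical names, and the sample data is a fake file rather than a real xlsx.

```ruby
require "stringio"

# Hypothetical helper, not part of this gem's API: returns true if `needle`
# starts anywhere within the byte offsets start..stop of `io`.
# Negative offsets count back from the end of the file.
def magic_match?(io, needle, start, stop)
  size  = io.size
  first = start.negative? ? [size + start, 0].max : start
  last  = stop.negative?  ? [size + stop, 0].max  : stop
  return false if first > last || first >= size

  io.seek(first)
  # Read far enough that a needle starting at `last` still fits in the window.
  window = io.read(last - first + needle.bytesize)
  !window.nil? && window.b.include?(needle.b)
end

# Fake ~100 KB file with the fingerprint near the end.
data = ("\0" * 100_000) + "[Content_Types].xml" + ("\0" * 50)
io   = StringIO.new(data)

forward  = magic_match?(io, "[Content_Types].xml", 30, 65_536)   # false: outside the range
from_end = magic_match?(io, "[Content_Types].xml", -65_536, -1)  # true: found in the last 64 KB
```

With something like this, the existing 30:65536 check could stay as the first attempt, and the end-of-file range would only be tried when it fails.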