feat: make pyarrow hot fix optional#11977
feat: make pyarrow hot fix optional#11977aandrestrumid wants to merge 5 commits intoibis-project:mainfrom
Conversation
|
This makes me wonder: should ibis, as a library, even be applying the hotfix for users? Or should we leave that up to the user to control when and how to do that? From the comment, it looks like the hotfix is a patch for https://nvd.nist.gov/vuln/detail/cve-2023-47248, which is a RCE vulnerability when reading malicious files. In all of our tests and CI environments, I don't think we read arbitrary files, so I don't think we need the hotfix for CI for OUR protection. I assume we are applying it more as a courtesy to end users. There is something to be said for applying it when an ibis maintainer is mucking around with their dev env and might try reading some malicious file, but in our lockfile we use pyarrow 23, so we shouldn't be exposed to this unless the maintainer intentionally installs an old version pf pyarrow, but I think that is rare enough that we should pretty much ignore it. So, what if we
This would
I took the download counts by pyarrow version over the last 3 months from https://pepy.tech/projects/pyarrow and it does look like ~20% of users are still downloading unsafe versions of pyarrow Details of Pyarrow downloads by versionAny thoughts? If we don't do this direction I suggest, then this PR looks mostly right to me, I can take another more thorough review and then approve. |
|
Like, this change only removes the pyarrow-hotfix dep for CI. For users, they ALWAYS get it installed, because python dep specification doesn't allow us to define the semantics of "package X is required IFF package Y version < 14.0.0". So I'm curious, what is the actual reason behind this PR? What is the benefit of making pyarrow hotfix optional only in this one CI job? If that is the only benefit, then I think the current implementation is simpler, and I don't really see the downside? Is there some downside for a user to apply the hotfix when they have a pyarrow version > 14? |
|
Hi @NickCrews, thanks for looking into this.
For context, I am using ibis with bigquery, but I had it installed as It was working for me because I had the other dependencies installed already (by coincidence). But then I removed I think for this PR to make sense we should also remove all However I do agree that it's currently simpler. The overhead of |
|
Thanks, that story of how you ended up here makes sense. I bet other users could have that same experience. What about this: # ibis/__init__.py
def try_apply_pyarrow_hotfix():
"""If pyarrow and pyarrow_hotifx are available, apply on pyarrow < 14.0.1.
This is a patch for https://nvd.nist.gov/vuln/detail/cve-2023-47248
"""
# Optional config, IDK if we want this or if there is a better way
import os
if os.env.get("IBIS_PYARROW_HOTFIX_SKIP"):
return
try:
import pyarrow as pa
except ImportError:
return
if tuple(int(x) for x in pa.__version__.split(".")[:3]) > (14, 0, 0):
return
try:
import pyarrow_hotfix # noqa: F401
except ImportError:
warnings.warn(f"""You are using an old version of pyarrow ({pa.__version__}) that is vulnerable to security vulnerability https://nvd.nist.gov/vuln/detail/cve-2023-47248. Either upgrade pyarrow to version >14, or install `pyarrow-hotfix` to patch the old version. Set `os.env["IBIS_PYARROW_HOTFIX_SKIP"] = 'true'` to skip this autopatch and silence this warning.""")
try_apply_pyarrow_hotfix()
If we wanted to avoid the above perf impact, and only import when needed, we could instead put the above implementation in |
|
I've updated the PR to apply the hot fix with a simple import To summarize I see 5 solutions:
|
7a3a587 to
73b02b5
Compare
| gen_name, | ||
| normalize_filename, | ||
| normalize_filenames, | ||
| warn_deprecated, |
There was a problem hiding this comment.
If it's easy, revert this change that is just restyling so our git diff is cleaner. But not a blocker if you don't get to it.
| ) | ||
| from ibis.backends.sql import SQLBackend | ||
| from ibis.backends.sql.compilers.base import STAR, AlterTable, C, RenameTable | ||
| from ibis.common import import_to_try_pyarrow_hotfix # noqa: F401 |
There was a problem hiding this comment.
This got moved up here to the top level. So now as soon as you import this file, you need pyarrow installed. I'm guessing you did this to reduce the number of imports from 3 to 1, which is a nice simplification. But I think we should remain more conservative, and only import_to_try_pyarrow_hotfix lazily when we actually DO need pyarrow. So move this back to the original callsites.
The one exception is I don't think we need to import_to_try_pyarrow_hotfix in the if TYPE_CHECKING block?? So I think you can just delete that one??
|
Thanks for your work here. Those summaries are great. I think we go ahead with your solution 5, the one that this PR implements. Let's switch the actual implementation to this though:
# ibis/common/import_to_try_pyarrow_hotfix
def try_apply_pyarrow_hotfix():
"""If pyarrow and pyarrow_hotifx are available, apply on pyarrow < 14.0.0.
This is a patch for https://nvd.nist.gov/vuln/detail/cve-2023-47248.
See https://github.com/ibis-project/ibis/pull/11977 for discussion
"""
# Optional config, IDK if we want this or if there is a better way
import os
if os.env.get("IBIS_PYARROW_HOTFIX_SKIP"):
return
import pyarrow as pa
if tuple(int(x) for x in pa.__version__.split(".")[:3]) > (14, 0, 0):
return
try:
import pyarrow_hotfix # noqa: F401
except ImportError:
warnings.warn(f"""You are using an old version of pyarrow ({pa.__version__}) that is vulnerable to security vulnerability https://nvd.nist.gov/vuln/detail/cve-2023-47248. Either upgrade pyarrow to version >14, or install `pyarrow-hotfix` to patch the old version. Set `os.env["IBIS_PYARROW_HOTFIX_SKIP"] = 'true'` to skip this autopatch and silence this warning.""")
try_apply_pyarrow_hotfix()After this:
Once you switch the implementation to the above, and fix the two other very small nits I had, then I will merge this. |

Description of changes
Make pyarrow-hotfix optional and only needed for pyarrow<=14.1 where it is effective.