Changes from 10 commits
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
# Changelog

## [Version 1.3.1](https://github.com/dataiku/dss-plugin-sharepoint-online/releases/tag/v1.3.1) - Security release - 2026-04-01

- Adding optional whitelist mechanism on certificate preset

## [Version 1.3.0](https://github.com/dataiku/dss-plugin-sharepoint-online/releases/tag/v1.3.0) - Security release - 2026-02-26

- Increase the version of the package cryptography to 46.0.5 and msal to 1.34.0 in response to CVE-2026-26007
1 change: 1 addition & 0 deletions custom-recipes/sharepoint-online-append-list/recipe.py
@@ -61,6 +61,7 @@ def convert_date_format(json_row):
metadata_to_retrieve.append("Title")
display_metadata = len(metadata_to_retrieve) > 0
client = SharePointClient(config)
client.assert_can_write_list(sharepoint_list_title)

sharepoint_writer = client.get_writer({"columns": input_schema}, None, None, max_workers, batch_size, write_mode)
with output_dataset.get_writer() as writer:
68 changes: 68 additions & 0 deletions parameter-sets/app-certificate/parameter-set.json
@@ -63,6 +63,74 @@
"type": "PASSWORD",
"description": "If required by private key",
"mandatory": false
},
{
"name": "activate_whitelist",
"label": "Whitelists",
"type": "BOOLEAN",
"description": "Not advised: access rights should be handled at Azure app level"
},
{
"name": "libraries_whitelist",
"label": "Whitelisted libraries",
Review comment (Contributor):

From a naming POV this is a bit confusing. Also, "whitelist" is no longer considered the best term for this on "Inclusion and Diversity" grounds: moving away from it is standard practice today, and keeping it is considered a risk for Enterprise & B2B software.

Moving away from it also means we can avoid using the word "list" where it could be confusing.

For activation param I suggest:
"name": "activate_allowlists",
"label": "Allowlists",

So, re
"name": "libraries_whitelist",
This param seems to be about folders, not libraries. The example here:
"description": "/sites/YourSite/Shared Documents/your folder path"
is pointing to a folder, not a document library. "Shared Documents" is a document library; anything under it is not. Perhaps the param covers both, but calling it "Library" is probably confusing. I think "folder" is better even if it refers to both.

So here I suggest:
"name": "allowed_folders",
"label": "Allowed folders",

Re
"name": "whitelist_name",
"label": "Library name",

As mentioned above this seems to be a folder path not a name, so I suggest:
"name": "allowed_folder_path",
"label": "Folder path",
"description": "/sites/YourSite/Shared Documents/your folder path"

For the rights subparam:
"name": "allowed_folder_rights",
"label": "Access rights",

For the lists param, I'd suggest for naming the param as a whole:
"name": "allowed_lists",
"label": "Allowed lists"

For name subparam:
There's a mention of libraries here, but I think it is an accident?
I suggest
"name": "allowed_list_name",
"label": "List name",
"description": "List name from the list's SharePoint URL"
(Are we sure this is the name, so not the display name or ID? A GUID ID also exists for the list, so we should be clear we are not talking about that.)

For rights subparam:
"name": "allowed_list_rights",
"label": "Access rights",
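
To make the renaming above concrete, here is a minimal sketch of how a loader could read the proposed keys. Only the key names come from the suggestions in this comment; the `load_allowlists` helper, its return shape and the example preset are hypothetical and are not part of the plugin.

```python
def load_allowlists(preset):
    """Build lookup tables from a preset dict that uses the proposed names.

    Hypothetical helper: only the key names come from the suggestion above.
    """
    folders, lists = {}, {}
    if not preset.get("activate_allowlists", False):
        return folders, lists
    for entry in preset.get("allowed_folders", []):
        # Folder paths are stored without surrounding slashes, in lower case
        path = entry.get("allowed_folder_path", "").strip("/").lower()
        folders[path] = entry.get("allowed_folder_rights", [])
    for entry in preset.get("allowed_lists", []):
        name = entry.get("allowed_list_name", "").lower()
        lists[name] = entry.get("allowed_list_rights", [])
    return folders, lists


# Made-up preset values, for illustration only
example_preset = {
    "activate_allowlists": True,
    "allowed_folders": [
        {
            "allowed_folder_path": "/sites/YourSite/Shared Documents/your folder path",
            "allowed_folder_rights": ["read"],
        }
    ],
    "allowed_lists": [
        {"allowed_list_name": "Tasks", "allowed_list_rights": ["read", "write"]}
    ],
}
folders, lists = load_allowlists(example_preset)
```

With these names, a folder entry and a list entry can no longer be mistaken for each other when reading the config.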

"type": "OBJECT_LIST",
"description": "",
"visibilityCondition": "model.activate_whitelist === true",
"subParams": [
{
"name": "whitelist_name",
"label": "Library name",
"type": "STRING",
"description": "/sites/YourSite/Shared Documents/your folder path"
},
{
"name": "whitelist_rights",
"label": "Access rights",
"type": "MULTISELECT",
"description": "",
"selectChoices": [
{
"value": "read",
"label": "Read"
},
{
"value": "write",
"label": "Write"
}
]
}
]
},
{
"name": "lists_whitelist",
"label": "Whitelisted lists",
"type": "OBJECT_LIST",
"description": "",
"visibilityCondition": "model.activate_whitelist === true",
"subParams": [
{
"name": "whitelist_name",
"label": "Library name",
Review comment (Contributor):

Is this a typo? Should it say list name?

Review comment (Contributor):

(will be taken care of automatically if naming changed)

"type": "STRING",
"description": "List ID from the list's SharePoint URL"
},
{
"name": "whitelist_rights",
"label": "Access rights",
"type": "MULTISELECT",
"description": "",
"selectChoices": [
{
"value": "read",
"label": "Read"
},
{
"value": "write",
"label": "Write"
}
]
}
]
}
]
}
2 changes: 1 addition & 1 deletion plugin.json
@@ -1,6 +1,6 @@
{
"id": "sharepoint-online",
"version": "1.3.0",
"version": "1.3.1",
"meta": {
"label": "SharePoint Online",
"description": "Read and write data from/to your SharePoint Online account",
2 changes: 2 additions & 0 deletions python-connectors/sharepoint-online_lists/connector.py
@@ -59,6 +59,7 @@ def is_column_expendable(column):

def generate_rows(self, dataset_schema=None, dataset_partitioning=None,
partition_id=None, records_limit=-1):
self.client.assert_can_read_list(self.sharepoint_list_title)
if self.client.column_ids == {}:
self.client.get_read_schema()

@@ -114,6 +115,7 @@ def format_row(self, row):
def get_writer(self, dataset_schema=None, dataset_partitioning=None,
partition_id=None, write_mode="OVERWRITE"):
assert_list_title(self.sharepoint_list_title)
self.client.assert_can_write_list(self.sharepoint_list_title)
if write_mode != "APPEND":
write_mode = SharePointConstants.WRITE_MODE_CREATE
return self.client.get_writer(dataset_schema, dataset_partitioning, partition_id, self.max_workers, self.batch_size, write_mode)
Original file line number Diff line number Diff line change
@@ -50,6 +50,7 @@ def close(self):
def stat(self, path):
assert_valid_sharepoint_path(path)
full_path = get_lnt_path(self.get_full_path(path))
self.client.assert_can_read_path(full_path)
logger.info('stat:path="{}", full_path="{}"'.format(path, full_path))
files = self.client.get_files(full_path)
folders = self.client.get_folders(full_path)
@@ -94,6 +95,7 @@ def browse(self, path):
path = get_rel_path(path)
full_path = get_lnt_path(self.get_full_path(path))
logger.info('browse:path="{}", full_path="{}"'.format(path, full_path))
self.client.assert_can_read_path(full_path)

folders = self.client.get_folders(full_path)
files = self.client.get_files(full_path)
@@ -157,6 +159,7 @@ def enumerate(self, path, first_non_empty):
assert_valid_sharepoint_path(path)
full_path = get_lnt_path(self.get_full_path(path))
logger.info('enumerate:path="{}",fullpath="{}", first_non_empty="{}"'.format(path, full_path, first_non_empty))
self.client.assert_can_read_path(full_path)
path_to_item, item_name = os.path.split(full_path)
is_file = self.client.is_file(full_path)
if is_file:
@@ -192,6 +195,7 @@ def delete_recursive(self, path):
assert_valid_sharepoint_path(path)
full_path = self.get_full_path(path)
logger.info('delete_recursive:path={},fullpath={}'.format(path, full_path))
self.client.assert_can_write_path(full_path)
assert_path_is_not_root(full_path)
path_to_item, item_name = os.path.split(full_path.rstrip("/"))
files = self.client.get_files(path_to_item)
@@ -220,6 +224,8 @@ def move(self, from_path, to_path):
full_from_path = self.get_full_path(from_path)
full_to_path = self.get_full_path(to_path)
logger.info('move:from={},to={}'.format(full_from_path, full_to_path))
self.client.assert_can_read_path(full_from_path)
self.client.assert_can_write_path(full_to_path)

self.client.move_file(full_from_path, full_to_path)
# SP Online now returns {'odata.null': True}
@@ -229,6 +235,7 @@ def read(self, path, stream, limit):
assert_valid_sharepoint_path(path)
full_path = self.get_full_path(path)
logger.info('read:full_path={}'.format(full_path))
self.client.assert_can_read_path(full_path)
response = self.client.get_file_content(full_path)
bio = BytesIO(response.content)
shutil.copyfileobj(bio, stream)
@@ -237,6 +244,7 @@ def write(self, path, stream):
assert_valid_sharepoint_path(path)
full_path = self.get_full_path(path)
logger.info('write:path="{}", full_path="{}"'.format(path, full_path))
self.client.assert_can_write_path(full_path)
bio = BytesIO()
shutil.copyfileobj(stream, bio)
bio.seek(0)
2 changes: 1 addition & 1 deletion python-lib/dss_constants.py
@@ -38,7 +38,7 @@ class DSSConstants(object):
"sharepoint_oauth": "The access token is missing"
}
PATH = 'path'
PLUGIN_VERSION = "1.3.0"
PLUGIN_VERSION = "1.3.1"
SECRET_PARAMETERS_KEYS = ["Authorization", "sharepoint_username", "sharepoint_password", "client_secret", "client_certificate", "passphrase"]
SITE_APP_DETAILS = {
"sharepoint_tenant": "The tenant name is missing",
23 changes: 23 additions & 0 deletions python-lib/sharepoint_client.py
@@ -20,6 +20,7 @@
is_empty_path, get_lnt_path,
format_private_key, format_certificate_thumbprint, url_encode
)
from sharepoint_whitelist import WhiteList
from safe_logger import SafeLogger


@@ -50,6 +51,7 @@ def __init__(self, config, root_name_overwrite_legacy_mode=False):
self.column_entity_property_name = {}
self.columns_to_format = []
self.column_sharepoint_type = {}
self.whitelist = WhiteList()

if config.get('auth_type') == DSSConstants.AUTH_OAUTH:
logger.info("SharePointClient:sharepoint_oauth")
@@ -117,6 +119,7 @@ def __init__(self, config, root_name_overwrite_legacy_mode=False):
elif config.get('auth_type') == DSSConstants.AUTH_APP_CERTIFICATE:
logger.info("SharePointClient:app-certificate")
login_details = config.get('app_certificate')
self.whitelist = WhiteList(login_details)
self.assert_login_details(DSSConstants.APP_CERTIFICATE_DETAILS, login_details)
self.setup_sharepoint_online_url(login_details)
self.setup_login_details(login_details)
@@ -1095,6 +1098,26 @@ def is_column_displayable(self, column, display_metadata=False, metadata_to_retr
return True
return (not column[SharePointConstants.HIDDEN_COLUMN])

    def assert_can_read_path(self, path):
        full_path = self.get_site_path(path)
        full_path = "/" + full_path.strip("/")
        logger.info("Testing read access to path '{}'".format(full_path))
        self.whitelist.assert_can_read_path(full_path)

    def assert_can_write_path(self, path):
        full_path = self.get_site_path(path)
        full_path = "/" + full_path.strip("/")
        logger.info("Testing write access to path '{}'".format(full_path))
        self.whitelist.assert_can_write_path(full_path)

    def assert_can_read_list(self, list_name):
        logger.info("Testing read access to list '{}'".format(list_name))
        self.whitelist.assert_can_read_list(list_name)

    def assert_can_write_list(self, list_name):
        logger.info("Testing write access to list '{}'".format(list_name))
        self.whitelist.assert_can_write_list(list_name)


class SharePointSession():

66 changes: 66 additions & 0 deletions python-lib/sharepoint_whitelist.py
@@ -0,0 +1,66 @@
from safe_logger import SafeLogger

logger = SafeLogger("sharepoint-online plugin")


class WhiteList():
    def __init__(self, config=None):
        self.config = config or {}
        self.activate_white_list = self.config.get("activate_whitelist", False)
        self.libraries_whitelist = {}
        self.lists_whitelist = {}
        libraries_whitelist = self.config.get("libraries_whitelist", [])
        if self.activate_white_list:
            for library in libraries_whitelist:
                library_path = library.get("whitelist_name", "").strip("/").lower()
                library_rights = library.get("whitelist_rights", [])
                self.libraries_whitelist[library_path] = library_rights
            lists_whitelist = self.config.get("lists_whitelist", [])
            for list_item in lists_whitelist:
                list_name = list_item.get("whitelist_name", "").lower()
                list_rights = list_item.get("whitelist_rights", [])
                self.lists_whitelist[list_name] = list_rights
            logger.info("Whitelisting with libraries:{} and lists:{}".format(self.libraries_whitelist, self.lists_whitelist))

    def assert_can_read_path(self, path):
        if not self.can_read_path(path):
            raise Exception("This preset does not have read access to '{}'".format(path))
Review comment (Contributor):

❓ I wonder, should we make it clear that this comes from the preset config rather than from the rights it has on SharePoint? I guess we shouldn't, even though it might be easier from a troubleshooting POV?
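
If the team did decide to spell out where the refusal comes from, one option could look like the sketch below. It is only an illustration: the `PresetAccessError` class, the `assert_allowed` helper and the message wording are hypothetical, and the PR itself raises a plain `Exception`.

```python
class PresetAccessError(Exception):
    """Hypothetical exception type; the PR raises a plain Exception."""


def assert_allowed(allowed, right, target):
    # `allowed` is the outcome of the preset check, computed elsewhere
    if not allowed:
        raise PresetAccessError(
            "The preset configuration does not grant {} access to '{}'. "
            "This check is independent of the permissions set on SharePoint.".format(right, target)
        )


try:
    assert_allowed(False, "read", "/sites/YourSite/Shared Documents")
except PresetAccessError as error:
    message = str(error)
```

The trade-off is the one raised above: a more explicit message helps troubleshooting but also reveals how the restriction is implemented.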


    def assert_can_write_path(self, path):
        if not self.can_write_path(path):
            raise Exception("This preset does not have write access to '{}'".format(path))

    def assert_can_read_list(self, list_name):
        if not self.can_read_list(list_name):
            raise Exception("This preset does not have read access to the list '{}'".format(list_name))

    def assert_can_write_list(self, list_name):
        if not self.can_write_list(list_name):
            raise Exception("This preset does not have write access to the list '{}'".format(list_name))

    def can_read_path(self, path):
        return self.can_do("read", self.libraries_whitelist, path.strip("/").lower().split("/"))

    def can_write_path(self, path):
        return self.can_do("write", self.libraries_whitelist, path.strip("/").lower().split("/"))

    def can_read_list(self, list_name):
        return self.can_do("read", self.lists_whitelist, list_name.lower())

    def can_write_list(self, list_name):
        return self.can_do("write", self.lists_whitelist, list_name.lower())

    def can_do(self, required_right, rights, path_to_test):
Review comment (Contributor):

⛏️ path_to_test isn't always a path; sometimes it is a SharePoint list name. Might be worth making that clearer somehow?
Maybe just call it item_to_test, and say that folder paths will be of type list and SharePoint lists will be strings.
I mean, it is obvious once you know, but it was confusing at first glance...
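
A sketch of what that renaming could look like, with the two accepted types stated in the signature and docstring. This is a standalone function for illustration only: the activation check is left out, and the type hints are an addition that the PR code does not have.

```python
from typing import Dict, List, Union


def can_do(required_right: str,
           rights: Dict[str, List[str]],
           item_to_test: Union[List[str], str]) -> bool:
    """Check a right against the configured entries.

    item_to_test is either:
    - a list of path tokens, for a folder path, or
    - a plain string, for a SharePoint list name.
    """
    if isinstance(item_to_test, list):
        # Folder path: the right may be set on the path itself or on any parent
        for size in range(len(item_to_test), 0, -1):
            candidate = "/".join(item_to_test[:size])
            if required_right in rights.get(candidate, []):
                return True
        return False
    # SharePoint list: exact match on the name
    return required_right in rights.get(item_to_test, [])
```

For example, `can_do("read", {"sites/a/docs": ["read"]}, ["sites", "a", "docs", "sub"])` is true because the parent folder carries the right, while a list name has to match exactly.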

        if not self.activate_white_list:
            return True
        if isinstance(path_to_test, list):
            for path_size in range(len(path_to_test) + 1, 0, -1):
Review comment (Contributor):

✍️ Maybe put a comment just to say: match with the path or a subpath (anything underneath is considered allowed). I was feeling dense and didn't get it straight away.

also ⛏️
We don't need the +1 here:
len(path_to_test) + 1,
It's confusing
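
To back up the point about the `+ 1`, the sketch below lists the candidate paths the loop tries with and without it. The `prefixes` helper and the example path are made up for illustration; the slicing and `range` arguments mirror the ones in the PR.

```python
def prefixes(tokens, extra=0):
    # Candidate paths tried by the loop, longest first,
    # down to the first token on its own
    return ["/".join(tokens[0:size]) for size in range(len(tokens) + extra, 0, -1)]


tokens = "sites/yoursite/shared documents/reports".split("/")
with_plus_one = prefixes(tokens, extra=1)
without_plus_one = prefixes(tokens)
```

Slicing past the end of a list returns the whole list, so the `+ 1` only makes the loop test the full path twice. The set of candidates is the same either way.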

                tokens_in_path = path_to_test[0:path_size]
                path_chunk_to_test = "/".join(tokens_in_path)
                right_for_path = rights.get(path_chunk_to_test, [])
                if required_right in right_for_path:
                    return True
            return False
        else:
            right_for_path = rights.get(path_to_test, [])
            return required_right in right_for_path
4 changes: 4 additions & 0 deletions tests/python/integration/test_scenario.py
@@ -61,3 +61,7 @@ def test_run_sharepoint_online_256_plus_chars_strings(user_dss_clients):

def test_run_sharepoint_online_app_username_password_auth(user_dss_clients):
dss_scenario.run(user_dss_clients, project_key=TEST_PROJECT_KEY, scenario_id="APPUSERNAMEPASSWORDAUTH")


def test_run_sharepoint_online_whitelisting(user_dss_clients):
dss_scenario.run(user_dss_clients, project_key=TEST_PROJECT_KEY, scenario_id="WHITELISTING")