Open
57 commits
f07cf60
332_DOI Ingest Logic Refactor
Oct 13, 2022
d45c656
refactor and add comments
CarsonDavis Oct 17, 2022
705c3fe
add template logic for draft filtering logic
CarsonDavis Oct 19, 2022
0209003
partially fleshed out update process
CarsonDavis Oct 19, 2022
33a79cd
Added conditions to update drafts
Oct 21, 2022
c4b54f9
autoformating
alekhya-pottimuthi Oct 21, 2022
e68c7ac
formated
alekhya-pottimuthi Oct 21, 2022
3e6cd90
Addressed flake8 errors
alekhya-pottimuthi Oct 21, 2022
d6f4fc1
Incorporated review comments
alekhya-pottimuthi Oct 25, 2022
b1bec81
Addressed Flake8 errors
alekhya-pottimuthi Oct 25, 2022
244068d
add cmr tests structure
CarsonDavis Oct 31, 2022
e4fea4b
add sample test data creation
CarsonDavis Oct 31, 2022
acda25c
Added test Cases
alekhya-pottimuthi Nov 7, 2022
ad36fa8
addressed flake8 errors
alekhya-pottimuthi Nov 7, 2022
a70b302
addressed flake8 errors
alekhya-pottimuthi Nov 7, 2022
e843872
testcases
alekhya-pottimuthi Nov 18, 2022
0e94406
create a test setup func and update the cmr rec generator
CarsonDavis Nov 18, 2022
d999f2b
add bulk db update, add test_no_drafts
CarsonDavis Nov 18, 2022
5908f5d
added initial tests inside test_unpublished_create
CarsonDavis Nov 18, 2022
36205ed
add notes to the cmr test file
CarsonDavis Nov 18, 2022
07a96c2
adding test_unpublished_update and test_published_create test cases
alekhya-pottimuthi Nov 30, 2022
764a5bc
added test_published_create test case
alekhya-pottimuthi Nov 30, 2022
aa1f1e5
create a dedicated function to generate cmr_metadata
CarsonDavis Dec 1, 2022
091450e
add pre-run cmr metadata response for ACES
CarsonDavis Dec 1, 2022
dc25f8b
create unpublished update draft
CarsonDavis Dec 1, 2022
78f34eb
work in progress test_unplished_update
alekhya-pottimuthi Dec 2, 2022
72c60a7
Updated test_unpublished_update
alekhya-pottimuthi Dec 12, 2022
4a1ffba
added docstrings
alekhya-pottimuthi Dec 16, 2022
5af0af6
Added the doc strings and comments
alekhya-pottimuthi Jan 6, 2023
dec2d37
add test for cmr_test_data and improve docstrings
CarsonDavis Jan 10, 2023
0a2d750
add cmr_test_data and script to generate it
CarsonDavis Jan 10, 2023
7bc90e6
minor fixes in tests file
CarsonDavis Jan 11, 2023
6582728
updating docstrings
svbagwell Jan 11, 2023
abffba3
Merge branch 'enhc-doi_logic' of https://github.com/NASA-IMPACT/admg-…
svbagwell Jan 11, 2023
bb47fd7
updated docstrings for non-test case functions
svbagwell Jan 12, 2023
fd2a086
update doctring for test_no_drafts
svbagwell Jan 12, 2023
ae2edee
added context in docstrings for test
svbagwell Jan 12, 2023
a980fb3
improve fields_to_ignore and add test for doi_field_coverage
CarsonDavis Jan 13, 2023
d876d58
add explanatory comments about field usage
CarsonDavis Jan 13, 2023
4966e26
Merge branch 'enhc-doi_logic' of github.com:NASA-IMPACT/admg_webapp i…
CarsonDavis Jan 13, 2023
ff7de09
Merge branch 'dev' into enhc-doi_logic
CarsonDavis Feb 23, 2023
f4ac3f5
fix doi draft merging behavior
CarsonDavis Feb 23, 2023
2d9410b
Merge branch 'dev' into enhc-doi_logic
CarsonDavis Mar 6, 2023
c132cd8
add praveen's code
CarsonDavis Mar 7, 2023
500d132
Fixed Flake error
Mar 7, 2023
69aa644
Fix code style issues with Black
lint-action Mar 7, 2023
2f656dd
merge dev in to doi
CarsonDavis Apr 11, 2023
f7decca
Fix code style issues with Black
lint-action Apr 11, 2023
98bed3b
change DOI creation logic to match new workflow
CarsonDavis Apr 11, 2023
21dc45d
Merge branch 'enhc-doi_logic' of github.com:NASA-IMPACT/admg_webapp i…
CarsonDavis Apr 11, 2023
0916152
Fix code style issues with Black
lint-action Apr 11, 2023
925cef6
fix errors in code
CarsonDavis Apr 11, 2023
62b5940
Merge branch 'enhc-doi_logic' of github.com:NASA-IMPACT/admg_webapp i…
CarsonDavis Apr 11, 2023
210de39
Merge branch 'enhc-doi_logic' into feature-update_doi_metadata
CarsonDavis Apr 13, 2023
5d6b4e6
Created an updated readme file for documetation
May 11, 2023
36ee16f
Edited readme file, added a documentation and diagrams folders
May 12, 2023
d2b418b
Added content to documentation, edited API doc
Jun 26, 2023
130 changes: 3 additions & 127 deletions api_app/api_documentation.md
@@ -1,130 +1,6 @@
# Overview
## Detailed Fields

CASEI is built on top of a PostgreSQL database with multiple tables that each contain fields and foreign keys. Each endpoint in the API will point to a corresponding table. All the available models and fields can be seen at the bottom of this page.
See the [CASEI API documentation](https://nasa-impact.github.io/admg-backend/) for an overview of the API and instructions on running different queries.

Requesting a bare table endpoint, such as `https://admg.nasa-impact.net/api/campaign` will return a list of all the metadata items in the table, in this case, for every campaign in the inventory. Specific objects can be retrieved by adding a known UUID after the table name, and if you don't know the UUID, string match searching is available for most fields. Further details on all search types as well as example queries can be found below.
Below is a detailed list of fields users can query with the API.

# Queries
## Full Table Query
As mentioned above, the most basic query returns the full data from a table. For example, `https://admg.nasa-impact.net/api/campaign` will return a list of all published campaign items in the database.

Below is a contrived example of the results from a `campaign` query, with ... indicating the continuation of additional metadata and additional campaigns. Here you can see abbreviated metadata for two campaigns, OLYMPEX and ACES.

```
{
"success": true,
"message": "",
"data": [
{
"uuid": "2552174b-213c-4bfc-b36a-632fb16c5ec2",
"short_name": "OLYMPEX",
"long_name": "Olympic Mountains Experiment",
"start_date": "2015-11-01",
"partner_orgs": [
"d6ffd2fa-1230-4971-a0a4-832b27b3a6c1"
],
...
},
{
"uuid": "30ba471c-0844-447a-91fd-b63a2f42b715",
"short_name": "ACES",
"long_name": "Altus Cumulus Electrification Study",
"start_date": "2002-08-02",
"partner_orgs": [],
...
},
...
]
}
```
## UUIDs and Related Objects

In the example results from the Campaign table, we saw several UUIDs listed: one `uuid` that identifies each campaign, and on OLYMPEX a UUID in the `partner_orgs` list.

This is because every item has its own identifying UUID, and related objects linked from other tables are always specified using a UUID. For example, each Campaign might have been conducted in conjunction with a Partner Org. However, Partner Org is not a simple string value. It is an independent object with its own table and additional metadata. So the Campaign API response lists each Partner Org as the UUID of the relevant object.

If you would like to see the details on that Partner Org, you must query the `partner_org` endpoint with the given UUID. Using the metadata shown above, we would make the following query:
```
https://admg.nasa-impact.net/api/partner_org/d6ffd2fa-1230-4971-a0a4-832b27b3a6c1
```
This would return the metadata for the related Partner Org, in this case, ECCC.
```
{
"success": true,
"message": "",
"data": {
"uuid": "d6ffd2fa-1230-4971-a0a4-832b27b3a6c1",
"aliases": [
"14aa21a2-de5a-4987-af11-d2c7c7a0c20f"
],
"campaigns": [
"1d26f72f-d9d5-45cc-b5ac-1c91ed05b76f",
"118d9d82-5e90-466b-8b82-530276b76ecc",
"2552174b-213c-4bfc-b36a-632fb16c5ec2"
],
"short_name": "ECCC",
"long_name": "Environment and Climate Change Canada",
"website": "https://www.canada.ca/en/environment-climate-change.html"
}
}
```
As you can see, the Partner Org has its own useful metadata such as a long name, aliases, a website, and even a handy list of all the campaigns it appears on.

Observant readers will have noticed that `campaign` had a plural field called `partner_orgs` but the table name was the singular `partner_org`. Table names are *always* singular, but related fields can be a singular or plural version of the table name depending on whether only one or many items are linked.
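The lookup described above can be sketched in Python. This is only an illustration: `related_object_urls` is a hypothetical helper, not part of the API, and it builds URLs without sending any requests. The base URL is taken from the examples in this document.

```python
API_ROOT = "https://admg.nasa-impact.net/api"  # base URL as used in the examples


def related_object_urls(record, field, table):
    """Build one detail URL per UUID found in a related field.

    `field` is the field name on the record (singular or plural), while
    `table` is the endpoint name, which is always singular.
    """
    value = record.get(field) or []
    # a single-item relation holds one UUID string; normalize it to a list
    uuids = [value] if isinstance(value, str) else list(value)
    return [f"{API_ROOT}/{table}/{uuid}" for uuid in uuids]


# abbreviated OLYMPEX record from the campaign example
olympex = {
    "short_name": "OLYMPEX",
    "partner_orgs": ["d6ffd2fa-1230-4971-a0a4-832b27b3a6c1"],
}
urls = related_object_urls(olympex, "partner_orgs", "partner_org")
print(urls)
```

Passing the table name separately from the field name keeps the singular/plural distinction explicit instead of guessing it from the field.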

But what if you don't know the UUID of the item you want to query?

## String Match Queries
In practice, it is unlikely that you will know the UUID of the Campaign or Partner Org you are interested in. Instead you will probably know the short name, long name, or maybe a keyword from the description.

Because all data types are serialized into strings, most fields can be searched using basic string matching. For example, a native date value becomes the searchable string `2022-01-15`.

By default, searches are not case sensitive and use "contains" logic. For example, the field value `the yellow clouds` matches the search string `cloud`.
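The default matching rule can be illustrated with a few lines of Python. This is a sketch of the behaviour as described here, not the API's actual implementation (which the `search_type` options suggest is backed by PostgreSQL text search); `contains_match` is a hypothetical name.

```python
def contains_match(field_value, search_term):
    """Case-insensitive substring test on the serialized field value."""
    return search_term.lower() in str(field_value).lower()


print(contains_match("the yellow clouds", "cloud"))
print(contains_match("The Yellow Clouds", "CLOUD"))
print(contains_match("the yellow clouds", "rain"))
```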

The following parameters are used when constructing a query.

### search_term
- Contains the actual search string, for example: `aces`, `cloud`, `2022-01-05`

### search_type
- Optional, default=plain
- plain: terms are treated as separate keywords
- phrase: terms are treated as a single phrase
- raw: formatted search query with terms and operators
- websearch: formatted search query, similar to the one used by web search engines.
- Refer to the [PostgreSQL docs](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES) for more details on differences and syntax

### search_fields
- Optional, defaults to predefined fields in each model
- Specifies the exact field to be searched: `short_name`, `description`, `start_date`



## Example Queries
We've seen a few examples already above, but in this section we will demonstrate all the common use cases.

### Query an entire table
This query will return metadata for all the campaigns.
```
https://admg.nasa-impact.net/api/campaign
```
### Query by UUID
This query will return metadata for the exact campaign specified by UUID.
```
https://admg.nasa-impact.net/api/campaign/30ba471c-0844-447a-91fd-b63a2f42b715
```
### Query by default search fields
Each table has a list of default search fields, usually `short_name`, `long_name`, `description`, and any other text fields. This query will search all of those fields for the listed term.
```
https://admg.nasa-impact.net/api/campaign/search_term=ACES
```
### Query by specific field
If you know the exact field and want to search it specifically, use the search_fields parameter. Here we are looking for the term `ACES` in the `short_name` field of any campaign.
```
https://admg.nasa-impact.net/api/campaign/search_term=ACES&search_fields=short_name
```
### Query by specific field list
You can also search a specific list of fields by joining them with commas. In this example we are searching for the term `ice` anywhere in the `short_name` or `description` field of any campaign.
```
https://admg.nasa-impact.net/api/campaign/search_term=ice&search_fields=short_name,description
```
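The example queries can also be assembled programmatically. The following is a minimal sketch, assuming the search parameters are sent as a standard URL query string; `build_query_url` is a hypothetical helper and only constructs the URL, it does not call the API.

```python
from urllib.parse import urlencode

API_ROOT = "https://admg.nasa-impact.net/api"  # base URL as used in the examples


def build_query_url(table, uuid=None, search_term=None, search_type=None, search_fields=None):
    """Return the URL for a full-table, UUID, or string-match query."""
    url = f"{API_ROOT}/{table}"
    if uuid:
        # a known UUID identifies exactly one object, so no search parameters apply
        return f"{url}/{uuid}"

    params = {}
    if search_term:
        params["search_term"] = search_term
    if search_type:
        params["search_type"] = search_type
    if search_fields:
        # multiple fields are joined with commas
        params["search_fields"] = ",".join(search_fields)

    # assumption: parameters are passed as an ordinary query string
    return f"{url}?{urlencode(params, safe=',')}" if params else url


full_table = build_query_url("campaign")
by_uuid = build_query_url("campaign", uuid="30ba471c-0844-447a-91fd-b63a2f42b715")
by_fields = build_query_url("campaign", search_term="ice", search_fields=["short_name", "description"])
print(full_table, by_uuid, by_fields, sep="\n")
```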
183 changes: 117 additions & 66 deletions cmr/doi_matching.py
@@ -7,7 +7,8 @@
from django.contrib.contenttypes.models import ContentType
from django.core import serializers

from api_app.models import Change
from admg_webapp.users.models import User
from api_app.models import Change, ApprovalLog
from cmr.cmr import query_and_process_cmr
from cmr.utils import clean_table_name, purify_list
from data_models.models import DOI
@@ -19,6 +20,24 @@ class DoiMatcher:
def __init__(self):
self.uuid_to_aliases = {}
self.table_to_valid_uuids = {}
self.core_cmr_fields = [
'cmr_short_name',
'cmr_entry_title',
'cmr_projects',
'cmr_dates',
'cmr_plats_and_insts',
'cmr_science_keywords',
'cmr_abstract',
'cmr_data_formats',
'doi', # TODO: do we want to not autopublish if this field is different?
]
self.previously_curated_fields = [
'campaigns',
'instruments',
'platforms',
'collection_periods',
'long_name',
]

def universal_get(self, table_name, uuid):
"""Queries the database for a uuid within a table name, but searches
@@ -281,88 +300,120 @@ def supplement_metadata(self, metadata_list, development=False):

return supplemented_metadata_list

def add_to_db(self, doi):
"""After cmr has been queried and each dataproduct has received recommended UUID
matches, each of these is added to the database. Because DOIs might already exist
as drafts or db objects, this function will create an update for existing DOIs or
a second draft in the case of duplicates. When updating, freshly queried metadata
is prioritized, but previously existing UUID links are preserved.
def is_core_metadata_changed(self, recent_draft, recommendation):
"""Takes a doi_recommendation that includes metadata from CMR and a doi_draft from
the admg database and compares specific fields to find a mismatch.

Args:
doi (dict): DOI metadata dictionary containing original CMR metadata and recommended
UUID links.

Raises:
ValueError: If objects have been added to the database outside of the expected
approval workflow, it is possible to have a nonsensical object creation
history and this error might be raised. This should only happen in local
and staging environments and should never occur in production.
recent_draft (Change): most recent Change object whose content type is the doi model
recommendation (dict): metadata from cmr

Returns:
str: String indicating action taken by the function
bool: True if there was a mismatch
"""

# search db for existing items with concept_id
existing_doi_uuids = self.valid_object_list_generator(
"doi", query_parameter="concept_id", query_value=doi["concept_id"]
return any(
[
recommendation.get(field) != recent_draft.update.get(field)
for field in self.core_cmr_fields
]
)
# this check can fail for some complicated reasons that will be addressed in a future PR
# if len(existing_doi_uuids)>1:
# raise ValueError('There has been an internal database error')

# if none exist add normally as a draft
if not existing_doi_uuids:
doi_obj = Change(
content_type=ContentType.objects.get(model="doi"),
model_instance_uuid=None,
update=json.loads(json.dumps(doi)),
status=Change.Statuses.CREATED,
action=Change.Actions.CREATE,
)
doi_obj.save()

return "Draft created for DOI"

uuid = existing_doi_uuids[0]
existing_doi = self.universal_get("doi", uuid)
# if item exists as a draft, directly update using db functions with same methodology as above
if existing_doi.get("change_object"):
for field in ["campaigns", "instruments", "platforms", "collection_periods"]:
doi[field].extend(existing_doi.get(field))
doi[field] = list(set(doi[field]))

draft = Change.objects.get(uuid=uuid)
draft.update = doi
draft.save()

return f"DOI already exists as a draft. Existing draft updated. {uuid}"

# if db item exists, replace cmr metadata fields and append suggestion fields as an update
existing_doi = DOI.objects.all().filter(uuid=uuid).first()
existing_campaigns = [str(c.uuid) for c in existing_doi.campaigns.all()]
existing_instruments = [str(c.uuid) for c in existing_doi.instruments.all()]
existing_platforms = [str(c.uuid) for c in existing_doi.platforms.all()]
existing_collection_periods = [str(c.uuid) for c in existing_doi.collection_periods.all()]
def create_merged_draft(self, recent_draft, doi_recommendation):
"""Takes an existing doi draft and a newly generated doi_recommendation and
returns a merged object that represents the most up-to-date data, retaining
the originally curated fields but updating any core CMR values.
"""
doi_recommendation = json.loads(json.dumps(doi_recommendation))
for field in self.previously_curated_fields:
doi_recommendation[field] = recent_draft.update[field]

doi["campaigns"].extend(existing_campaigns)
doi["instruments"].extend(existing_instruments)
doi["platforms"].extend(existing_platforms)
doi["collection_periods"].extend(existing_collection_periods)
return doi_recommendation

for field in ["campaigns", "instruments", "platforms", "collection_periods"]:
doi[field] = list(set(doi[field]))
def make_create_draft(self, doi_recommendation):
doi_obj = Change(
content_type=ContentType.objects.get(model="doi"),
model_instance_uuid=None,
update=json.loads(json.dumps(doi_recommendation)),
status=Change.Statuses.CREATED,
action=Change.Actions.CREATE,
)
doi_obj.save()
return doi_obj

def make_update_draft(self, merged_draft, linked_object):
doi_obj = Change(
content_type=ContentType.objects.get(model="doi"),
model_instance_uuid=str(uuid),
update=json.loads(json.dumps(doi)),
model_instance_uuid=linked_object,
update=json.loads(json.dumps(merged_draft)),
status=Change.Statuses.CREATED,
action=Change.Actions.UPDATE,
)

doi_obj.save()
return doi_obj

return f"DOI already exists in database. Update draft created. {uuid}"
def get_published_uuid(self, recent_draft):
if recent_draft.action == Change.Actions.UPDATE:
return recent_draft.model_instance_uuid
else:
# this must be a published create draft, whose uuid will match the published uuid
return recent_draft.uuid

def add_to_db(self, doi_recommendation):
"""After cmr has been queried and each dataproduct has received recommended UUID
matches, each of these is added to the database. Because DOIs might already exist
as drafts or db objects, this function will create an update for existing DOIs or
a second draft in the case of duplicates. When updating, freshly queried metadata
is prioritized, but previously existing UUID links are preserved.

Args:
doi_recommendation (dict): DOI metadata dictionary containing original CMR metadata and recommended
UUID links.

Returns:
None: drafts are created, updated, or published as a side effect
"""

# search db for the most recently worked on draft that matches our concept_id
recent_draft = (
Change.objects.filter(
content_type__model='doi',
action__in=[Change.Actions.CREATE, Change.Actions.UPDATE],
update__concept_id=doi_recommendation['concept_id'],
)
.order_by("-updated_at")
.first()
)

if not recent_draft:
# no DOI draft exists yet for this concept_id, so we create one
self.make_create_draft(doi_recommendation)

# TODO: handle delete drafts?

elif self.is_core_metadata_changed(recent_draft, doi_recommendation):
# a doi draft of some kind exists, and it's different from the new data
generic_admin_user = User.objects.get(username='nimda')
merged = self.create_merged_draft(recent_draft, doi_recommendation)
if recent_draft.status == Change.Statuses.PUBLISHED:
# recommendations have been previously approved and we are just updating
# to the latest CMR metadata
published_uuid = self.get_published_uuid(recent_draft)
doi_obj = self.make_update_draft(merged, published_uuid)
doi_obj.publish(generic_admin_user, notes='CMR metadata updated')
else:
# an update or create draft is in progress and recommendations are
# not yet approved, so we need to fix the in progress object
recent_draft.update = merged
recent_draft.status = Change.Statuses.CREATED
recent_draft.save()
approval_log = ApprovalLog.objects.create(
change=recent_draft,
user=generic_admin_user,
action=ApprovalLog.Actions.REJECT,
notes="New CMR metadata added, needs to be re-reviewed",
)
approval_log.save()

def generate_recommendations(self, table_name, uuid, development=False):
"""This is the overarching parent function which takes a table_name and a uuid and
@@ -401,7 +452,7 @@ def generate_recommendations(self, table_name, uuid, development=False):
pickle.dump(metadata_list, open(f"metadata_{uuid}", "wb"))

supplemented_metadata_list = self.supplement_metadata(metadata_list, development)

json.dump(supplemented_metadata_list, open('cmr_data.json', 'w'))
for doi in supplemented_metadata_list:
logger.debug(self.add_to_db(doi))
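
The branching in the new `add_to_db` can be summarized as a small pure function. This is a sketch for review purposes only: plain dicts stand in for Django `Change` objects, the status strings and the `decide_action` and `merge` names are hypothetical, and only a subset of `core_cmr_fields` is listed. Unlike `create_merged_draft`, the sketch uses `.get()` so a missing curated field does not raise.

```python
# subset of self.core_cmr_fields, for illustration only
CORE_CMR_FIELDS = ["cmr_short_name", "cmr_entry_title", "cmr_abstract", "doi"]
CURATED_FIELDS = ["campaigns", "instruments", "platforms", "collection_periods", "long_name"]


def is_core_metadata_changed(draft_update, recommendation):
    """True if any core CMR field differs between the draft and fresh CMR data."""
    return any(recommendation.get(f) != draft_update.get(f) for f in CORE_CMR_FIELDS)


def merge(draft_update, recommendation):
    """Take fresh CMR values but keep the previously curated fields."""
    merged = dict(recommendation)
    for field in CURATED_FIELDS:
        merged[field] = draft_update.get(field)
    return merged


def decide_action(recent_draft, recommendation):
    """recent_draft is None or a dict with 'status' and 'update' keys."""
    if recent_draft is None:
        return "create_draft"
    if not is_core_metadata_changed(recent_draft["update"], recommendation):
        return "no_change"
    if recent_draft["status"] == "published":
        # previously approved: create an update draft and publish it directly
        return "publish_update"
    # still in review: overwrite the draft and send it back for re-review
    return "reset_in_progress_draft"


draft = {"status": "published", "update": {"cmr_short_name": "aces1", "campaigns": ["uuid-1"]}}
fresh = {"cmr_short_name": "aces1-v2", "campaigns": []}
action = decide_action(draft, fresh)
merged = merge(draft["update"], fresh)
print(action, merged["campaigns"], merged["cmr_short_name"])
```

Writing the decision as a function of plain data makes the four outcomes easy to unit test without a database.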

Empty file added cmr/tests/__init__.py
Empty file.
25 changes: 25 additions & 0 deletions cmr/tests/generate_cmr_test_data.py
@@ -0,0 +1,25 @@
import json
from cmr.cmr import query_and_process_cmr


def generate_cmr_response():
    """Many of our processes rely on first getting information from the CMR API.
    This function is only run once and it saves a sample CMR file to the repository.
    All test functions that rely on CMR will use this file, and there is a separate test
    that evaluates whether CMR still gives the same response.

    To run this function and generate the file, use a manage.py shell
    """

    return query_and_process_cmr('campaign', ['ACES'])


def save_cmr_response(cmr_metadata):
    """Saves the CMR response generated by generate_cmr_response."""

    # use a context manager so the file is flushed and closed after writing
    with open('cmr/tests/cmr_response_aces.json', 'w') as f:
        json.dump(cmr_metadata, f)


if __name__ == '__main__':
    cmr_response = generate_cmr_response()
    save_cmr_response(cmr_response)
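
The docstring above mentions a separate test that checks whether CMR still gives the same response. That test is not shown in this part of the diff, so the following is only a sketch of how such a comparison could look; `cmr_response_matches` is a hypothetical name, and the sample record is placeholder data rather than real CMR output.

```python
import json
import os
import tempfile


def cmr_response_matches(saved_path, fresh_response):
    """Compare a saved JSON response with a freshly queried one."""
    with open(saved_path) as f:
        saved = json.load(f)
    # round-trip the fresh data through JSON so both sides use the same types
    return saved == json.loads(json.dumps(fresh_response))


# demo with a temporary file standing in for cmr/tests/cmr_response_aces.json
sample = [{"concept_id": "C0000000000-EXAMPLE", "cmr_short_name": "example"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cmr_response.json")
    with open(path, "w") as f:
        json.dump(sample, f)
    unchanged = cmr_response_matches(path, sample)
    changed = not cmr_response_matches(path, sample + [{"concept_id": "new"}])

print(unchanged, changed)
```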