NASA-IMPACT · EJwalker13 · Oct 13, 2022 · Oct 17, 2022 · Oct 19, 2022 · Oct 19, 2022
diff --git a/api_app/api_documentation.md b/api_app/api_documentation.md
@@ -1,130 +1,6 @@
-# Overview
+## Detailed Fields 
 
-CASEI is built on top of a PostgreSQL database with multiple tables that each contain fields and foreign keys. Each endpoint in the API will point to a corresponding table. All the available models and fields can be seen at the bottom of this page.
+See the [CASEI API documentation](https://nasa-impact.github.io/admg-backend/) for details on how to run different queries.
 
-Requesting a bare table endpoint, such as `https://admg.nasa-impact.net/api/campaign` will return a list of all the metadata items in the table, in this case, for every campaign in the inventory. Specific objects can be retrieved by adding a known UUID after the table name, and if you don't know the UUID, string match searching is available for most fields. Further details on all search types as well as example queries can be found below.
+Below is a detailed list of fields users can query with the API.
 
-# Queries
-## Full Table Query
-As mentioned above, the most basic query returns the full data from a table. For example `https://admg.nasa-impact.net/api/campaign` will return a list of all published campaign items in the database.
-
-Below is a contrived example of the results from a `campaign` query, with ... indicating the continuation of additional metadata and additional campaigns. Here you can see abbreviated metadata for two campaigns, OLYMPEX and ACES.
-
-```
-{ 
-    "success": True, 
-    "message": ", 
-    "data": [
-        { 
-            "uuid": "2552174b-213c-4bfc-b36a-632fb16c5ec2",
-            "short_name": "OLYMPEX",
-            "long_name": "Olympic Mountains Experiment",
-            "start_date": "2015-11-01",
-            "partner_orgs": [
-                "d6ffd2fa-1230-4971-a0a4-832b27b3a6c1"
-            ],
-            ...
-        }, 
-        { 
-            "uuid": "30ba471c-0844-447a-91fd-b63a2f42b715",
-            "short_name": "ACES",
-            "long_name": "Altus Cumulus Electrification Study",
-            "start_date": "2002-08-02"
-            "partner_orgs": [],
-            ...
-        }, 
-    ...
-    ]
-}
-```
-## UUIDs and Related Objects
-
-In the example results from the Campaign table, we saw several UUIDs listed: one `uuid` that identifies each campaign, and on OLYMPEX a UUID in the `partner_orgs` list.
-
-This is because every item has its own identifying UUID, and related objects linked from other tables are always specified using a UUID. For example, each Campaign might have been conducted in conjunction with a Partner Org. However, Partner Org is not a simple string value. It is an independent object with its own table and additional metadata. So the Campaign API response will list Partner Org as a UUID to the relevant object.
-
-If you would like to see the details on that Partner Org, you must query the `partner_org` endpoint with the given UUID. Using the metadata shown above, we would make the following query:
-```
-https://admg.nasa-impact.net/api/partner_org/d6ffd2fa-1230-4971-a0a4-832b27b3a6c1
-```
-This would return the metadata for the related Partner Org, in this case, ECCC.
-```
-{
-  "success": true,
-  "message": "",
-  "data": {
-    "uuid": "d6ffd2fa-1230-4971-a0a4-832b27b3a6c1",
-    "aliases": [
-      "14aa21a2-de5a-4987-af11-d2c7c7a0c20f"
-    ],
-    "campaigns": [
-      "1d26f72f-d9d5-45cc-b5ac-1c91ed05b76f",
-      "118d9d82-5e90-466b-8b82-530276b76ecc",
-      "2552174b-213c-4bfc-b36a-632fb16c5ec2"
-    ],
-    "short_name": "ECCC",
-    "long_name": "Environment and Climate Change Canada",
-    "website": "https://www.canada.ca/en/environment-climate-change.html"
-  }
-}
-```
-As you can see, the Partner Org has its own useful metadata such as a long name, aliases, a website, and even a handy list of all the campaigns it appears on.
-
-Observant readers will have noticed that `campaign` had a plural field called `partner_orgs` but the table name was the singular `partner_org`. Table names are *always* singular, but related fields can be a singular or plural version of the table name depending on whether only one or many items are linked.
-
-But what if you don't know the UUID of the item you want to query?
-
-## String Match Queries
-In practice, it is unlikely that you will know the UUID of the Campaign or Partner Org you are interested in. Instead you will probably know the short name, long name, or maybe a keyword from the description.
-
-Because all datatypes are serialized into strings, most fields can be searched using basic string matching. A native datetype becomes the searchable string `2022-01-15`. 
-
-By default, searches are not case sensitive and use a contain logic. For example, the field value of `the yellow clouds` will match to the search string `cloud`. 
-
-The following parameters are used when constructing a query.
-
-### search_term
-- Contains the actual search string, for example: `aces`, `cloud`, `2022-01-05`
-
-### search_type
-- Optional, default=plain
-- plain: terms are treated as separate keywords
-- phrase: terms are treated as a single phrase
-- raw: formatted search query with terms and operators
-- websearch: formatted search query, similar to the one used by web search engines. 
-- Refer to the [PostgreSQL docs](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES) for more details on differences and syntax
-
-### search_fields
-- Optional, defaults to predefined fields in each model
-- Specifies the exact field to be searched: `short_name`, `description`, `start_date`
-
-
-
-## Example Queries
-We've seen a few examples already above, but in this section we will demonstrate all the common use cases.
-
-### Query an entire table
-This query will return metadata for all the campaigns.
-```
-https://admg.nasa-impact.net/api/campaign
-```
-### Query by UUID
-This query will return metadata for the exact campaign specified by UUID.
-```
-https://admg.nasa-impact.net/api/campaign/30ba471c-0844-447a-91fd-b63a2f42b715
-```
-### Query by default search fields
-Each table has a list of default search fields, usually `short_name`, `long_name`, `description`, and any other text fields. This query will search all of those fields for the listed term.
-```
-https://admg.nasa-impact.net/api/campaign/search_term=ACES
-```
-### Query by specific field
-If you know the exact field and want to search it specifically, use the search_fields parameter. Here we are looking for the term `ACES` in the `short_name` field of any campaign. 
-```
-https://admg.nasa-impact.net/api/campaign/search_term=ACES&search_fields=short_name
-```
-### Query by specific field list
-You can also search by a specific list of fields, just join them with a comma. In this example we are searching for the term `ice` anywhere in the `short_name` or `description` field of any campaign.
-```
-https://admg.nasa-impact.net/api/campaign/search_term=ice&search_fields=short_name,description
-```
diff --git a/cmr/doi_matching.py b/cmr/doi_matching.py
@@ -7,7 +7,8 @@
 from django.contrib.contenttypes.models import ContentType
 from django.core import serializers
 
-from api_app.models import Change
+from admg_webapp.users.models import User
+from api_app.models import Change, ApprovalLog
 from cmr.cmr import query_and_process_cmr
 from cmr.utils import clean_table_name, purify_list
 from data_models.models import DOI
@@ -19,6 +20,24 @@ class DoiMatcher:
     def __init__(self):
         self.uuid_to_aliases = {}
         self.table_to_valid_uuids = {}
+        self.core_cmr_fields = [
+            'cmr_short_name',
+            'cmr_entry_title',
+            'cmr_projects',
+            'cmr_dates',
+            'cmr_plats_and_insts',
+            'cmr_science_keywords',
+            'cmr_abstract',
+            'cmr_data_formats',
+            'doi',  # TODO: do we want to not autopublish if this field is different?
+        ]
+        self.previously_curated_fields = [
+            'campaigns',
+            'instruments',
+            'platforms',
+            'collection_periods',
+            'long_name',
+        ]
 
     def universal_get(self, table_name, uuid):
         """Queries the database for a uuid within a table name, but searches
@@ -281,88 +300,120 @@ def supplement_metadata(self, metadata_list, development=False):
 
         return supplemented_metadata_list
 
-    def add_to_db(self, doi):
-        """After cmr has been queried and each dataproduct has received recommended UUID
-        matches, each of this is added to the database. Because DOIs might already exist
-        as drafts or db objects, this function will create an update for existing DOIs or
-        a second draft in the case of duplicates. When updating, freshly queried metadata
-        is prioritized, but previously existing UUID links are preserved.
+    def is_core_metadata_changed(self, recent_draft, recommendation):
+        """Takes a doi_recommendation that includes metadata from CMR and a doi_draft from
+        the admg database and compares specific fields to find a mismatch.
 
         Args:
-            doi (dict): DOI metadata dictionary containing original CMR metadata and recommended
-                UUID links.
-
-        Raises:
-            ValueError: If objects have been added to the database outside of the expected
-                approval workflow, it is possible to have a nonsensical object creation
-                history and this error might be raised. This should only happen in local
-                and staging environments and should never occur in production.
+            recent_draft (Change): most recent Change object with content type doi
+            recommendation (dict): DOI metadata from CMR
 
         Returns:
-            str: String indicating action taken by the function
+            bool: True if there was a mismatch
         """
 
-        # search db for existing items with concept_id
-        existing_doi_uuids = self.valid_object_list_generator(
-            "doi", query_parameter="concept_id", query_value=doi["concept_id"]
+        return any(
+            [
+                recommendation.get(field) != recent_draft.update.get(field)
+                for field in self.core_cmr_fields
+            ]
         )
-        # this check can fail for some complicated reasons that will be addressed in a future PR
-        # if len(existing_doi_uuids)>1:
-        #     raise ValueError('There has been an internal database error')
-
-        # if none exist add normally as a draft
-        if not existing_doi_uuids:
-            doi_obj = Change(
-                content_type=ContentType.objects.get(model="doi"),
-                model_instance_uuid=None,
-                update=json.loads(json.dumps(doi)),
-                status=Change.Statuses.CREATED,
-                action=Change.Actions.CREATE,
-            )
-            doi_obj.save()
-
-            return "Draft created for DOI"
 
-        uuid = existing_doi_uuids[0]
-        existing_doi = self.universal_get("doi", uuid)
-        # if item exists as a draft, directly update using db functions with same methodology as above
-        if existing_doi.get("change_object"):
-            for field in ["campaigns", "instruments", "platforms", "collection_periods"]:
-                doi[field].extend(existing_doi.get(field))
-                doi[field] = list(set(doi[field]))
-
-            draft = Change.objects.get(uuid=uuid)
-            draft.update = doi
-            draft.save()
-
-            return f"DOI already exists as a draft. Existing draft updated. {uuid}"
-
-        # if db item exists, replace cmr metadata fields and append suggestion fields as an update
-        existing_doi = DOI.objects.all().filter(uuid=uuid).first()
-        existing_campaigns = [str(c.uuid) for c in existing_doi.campaigns.all()]
-        existing_instruments = [str(c.uuid) for c in existing_doi.instruments.all()]
-        existing_platforms = [str(c.uuid) for c in existing_doi.platforms.all()]
-        existing_collection_periods = [str(c.uuid) for c in existing_doi.collection_periods.all()]
+    def create_merged_draft(self, recent_draft, doi_recommendation):
+        """Takes an existing doi draft and a newly generated doi_recommendation and
+        returns a merged object that represents the most up-to-date data, retaining
+        the originally curated fields but updating any core CMR values.
+        """
+        doi_recommendation = json.loads(json.dumps(doi_recommendation))
+        for field in self.previously_curated_fields:
+            doi_recommendation[field] = recent_draft.update[field]
 
-        doi["campaigns"].extend(existing_campaigns)
-        doi["instruments"].extend(existing_instruments)
-        doi["platforms"].extend(existing_platforms)
-        doi["collection_periods"].extend(existing_collection_periods)
+        return doi_recommendation
 
-        for field in ["campaigns", "instruments", "platforms", "collection_periods"]:
-            doi[field] = list(set(doi[field]))
+    def make_create_draft(self, doi_recommendation):
+        doi_obj = Change(
+            content_type=ContentType.objects.get(model="doi"),
+            model_instance_uuid=None,
+            update=json.loads(json.dumps(doi_recommendation)),
+            status=Change.Statuses.CREATED,
+            action=Change.Actions.CREATE,
+        )
+        doi_obj.save()
+        return doi_obj
 
+    def make_update_draft(self, merged_draft, linked_object):
         doi_obj = Change(
             content_type=ContentType.objects.get(model="doi"),
-            model_instance_uuid=str(uuid),
-            update=json.loads(json.dumps(doi)),
+            model_instance_uuid=linked_object,
+            update=json.loads(json.dumps(merged_draft)),
             status=Change.Statuses.CREATED,
             action=Change.Actions.UPDATE,
         )
-
         doi_obj.save()
+        return doi_obj
 
-        return f"DOI already exists in database. Update draft created. {uuid}"
+    def get_published_uuid(self, recent_draft):
+        if recent_draft.action == Change.Actions.UPDATE:
+            return recent_draft.model_instance_uuid
+        else:
+            # this must be a published create draft, whose uuid will match the published uuid
+            return recent_draft.uuid
+
+    def add_to_db(self, doi_recommendation):
+        """After cmr has been queried and each dataproduct has received recommended UUID
+        matches, each of these is added to the database. Because DOIs might already exist
+        as drafts or db objects, this function will create an update for existing DOIs or
+        a second draft in the case of duplicates. When updating, freshly queried metadata
+        is prioritized, but previously existing UUID links are preserved.
+
+        Args:
+            doi_recommendation (dict): DOI metadata dictionary containing original CMR metadata and recommended
+                UUID links.
+
+        Returns:
+            None: drafts are created, updated, or published as a side effect
+        """
+
+        # search db for the most recently worked on draft that matches our concept_id
+        recent_draft = (
+            Change.objects.filter(
+                content_type__model='doi',
+                action__in=[Change.Actions.CREATE, Change.Actions.UPDATE],
+                update__concept_id=doi_recommendation['concept_id'],
+            )
+            .order_by("-updated_at")
+            .first()
+        )
+
+        if not recent_draft:
+            # no DOI draft exists yet for this concept_id, so we create one
+            self.make_create_draft(doi_recommendation)
+
+        # TODO: handle delete drafts?
+
+        elif self.is_core_metadata_changed(recent_draft, doi_recommendation):
+            # a doi draft of some kind exists, and it's different from the new data
+            generic_admin_user = User.objects.get(username='nimda')
+            merged = self.create_merged_draft(recent_draft, doi_recommendation)
+            if recent_draft.status == Change.Statuses.PUBLISHED:
+                # recommendations have been previously approved and we are just updating
+                # to the latest CMR metadata
+                published_uuid = self.get_published_uuid(recent_draft)
+                doi_obj = self.make_update_draft(merged, published_uuid)
+                doi_obj.publish(generic_admin_user, notes='CMR metadata updated')
+            else:
+                # an update or create draft is in progress and recommendations are
+                # not yet approved, so we need to fix the in progress object
+                recent_draft.update = merged
+                recent_draft.status = Change.Statuses.CREATED
+                recent_draft.save()
+                approval_log = ApprovalLog.objects.create(
+                    change=recent_draft,
+                    user=generic_admin_user,
+                    action=ApprovalLog.Actions.REJECT,
+                    notes="New CMR metadata added, needs to be re-reviewed",
+                )
+                approval_log.save()
 
     def generate_recommendations(self, table_name, uuid, development=False):
         """This is the overarching parent function which takes a table_name and a uuid and
@@ -401,7 +452,7 @@ def generate_recommendations(self, table_name, uuid, development=False):
             pickle.dump(metadata_list, open(f"metadata_{uuid}", "wb"))
 
         supplemented_metadata_list = self.supplement_metadata(metadata_list, development)
-
+        json.dump(supplemented_metadata_list, open('cmr_data.json', 'w'))
         for doi in supplemented_metadata_list:
             logger.debug(self.add_to_db(doi))
 

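The comparison and merge steps added to `DoiMatcher` above can be tried outside Django. What follows is a minimal sketch, not the project's implementation: it assumes the draft's `update` payload is a plain dict, copies the two field lists from this diff, and uses made-up sample values (`aces_example`, `uuid-a`, `uuid-b`) purely for illustration.

```python
import json

# Field lists copied from DoiMatcher.__init__ in this diff.
CORE_CMR_FIELDS = [
    'cmr_short_name', 'cmr_entry_title', 'cmr_projects', 'cmr_dates',
    'cmr_plats_and_insts', 'cmr_science_keywords', 'cmr_abstract',
    'cmr_data_formats', 'doi',
]
PREVIOUSLY_CURATED_FIELDS = [
    'campaigns', 'instruments', 'platforms', 'collection_periods', 'long_name',
]


def is_core_metadata_changed(draft_update, recommendation):
    """Return True if any core CMR field differs between the two dicts."""
    return any(
        recommendation.get(field) != draft_update.get(field)
        for field in CORE_CMR_FIELDS
    )


def create_merged_draft(draft_update, recommendation):
    """Start from fresh CMR data, then restore the curated fields."""
    # JSON round trip gives an independent copy, as in the diff.
    merged = json.loads(json.dumps(recommendation))
    for field in PREVIOUSLY_CURATED_FIELDS:
        merged[field] = draft_update[field]
    return merged


# Hypothetical sample data, not real CMR output.
draft_update = {
    'cmr_short_name': 'aces_example',
    'cmr_abstract': 'old abstract',
    'campaigns': ['uuid-a'],
    'instruments': [],
    'platforms': [],
    'collection_periods': [],
    'long_name': 'Curated name',
}
recommendation = dict(
    draft_update,
    cmr_abstract='new abstract',
    campaigns=['uuid-a', 'uuid-b'],
    long_name='CMR name',
)

changed = is_core_metadata_changed(draft_update, recommendation)
merged = create_merged_draft(draft_update, recommendation)
```

In this sketch, `merged` carries the new `cmr_abstract` while `campaigns` and `long_name` keep their curated values. Because the curated fields are read with `draft_update[field]`, a draft missing one of them raises `KeyError`, which mirrors the indexing used in `create_merged_draft` above.
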
diff --git a/cmr/tests/__init__.py b/cmr/tests/__init__.py
diff --git a/cmr/tests/generate_cmr_test_data.py b/cmr/tests/generate_cmr_test_data.py
@@ -0,0 +1,25 @@
+import json
+from cmr.cmr import query_and_process_cmr
+
+
+def generate_cmr_response():
+    """Many of our processes rely on first getting information from the CMR API.
+    This function is only run once and it saves a sample CMR file to the repository.
+    All test functions that rely on CMR will use this file, and there is a separate test
+    that evaluates whether CMR still gives the same response.
+
+    To run this function and generate the file, use a manage.py shell
+    """
+
+    return query_and_process_cmr('campaign', ['ACES'])
+
+
+def save_cmr_response(cmr_metadata):
+    """Saves the CMR response generated by generate_cmr_response."""
+
+    json.dump(cmr_metadata, open('cmr/tests/cmr_response_aces.json', 'w'))
+
+
+if __name__ == '__main__':
+    cmr_response = generate_cmr_response()
+    save_cmr_response(cmr_response)
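
Both `json.dump(..., open(..., 'w'))` calls in this diff leave the file handle to be closed by the garbage collector. One possible alternative is a context manager, sketched below. The helper names `save_json` and `load_json`, the file name, and the sample payload are hypothetical and not part of the PR.

```python
import json
import os
import tempfile


def save_json(data, path):
    """Write data as JSON and close the file deterministically."""
    with open(path, 'w') as json_file:
        json.dump(data, json_file)


def load_json(path):
    """Read a JSON fixture back from disk."""
    with open(path) as json_file:
        return json.load(json_file)


# Hypothetical payload standing in for a processed CMR response.
sample = [{'concept_id': 'C0000000000-EXAMPLE', 'cmr_short_name': 'example'}]

with tempfile.TemporaryDirectory() as tmp_dir:
    fixture_path = os.path.join(tmp_dir, 'cmr_response_example.json')
    save_json(sample, fixture_path)
    round_tripped = load_json(fixture_path)
```

The round trip through a temporary directory keeps the example from touching the repository. A real fixture would be written to a path such as `cmr/tests/cmr_response_aces.json`, as in `save_cmr_response`.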