Commit e29ced1

Merge pull request #1 from nikhilk/main
Initial commit for metadata-as-code and enrichment
2 parents 6fc1714 + 49b07a1 commit e29ced1

51 files changed

Lines changed: 17139 additions & 0 deletions

toolbox/README.md

Lines changed: 9 additions & 0 deletions
@@ -1,3 +1,12 @@
# Toolbox

Contains tools to work with Knowledge Catalog metadata.

* [Metadata as Code](./mdcode/README.md)
  Provides the ability to manage metadata in the form of source code artifacts that
  can be synced with metadata in Knowledge Catalog.

* [Enrichment Agent](./enrichment/README.md)
  Provides a ready-to-use agent and customizable harness to produce, evolve, improve,
  and maintain metadata within Knowledge Catalog and make it ready for consumption by
  agents.

toolbox/enrichment/.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
build/
dist/
demo/
node_modules/

toolbox/enrichment/README.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# Enrichment Agent

The enrichment agent for Knowledge Catalog provides a customizable agentic
workflow for extracting information from various sources to build metadata
about data assets, which agents can then use as context.

## Usage

### Prerequisites

The enrichment agent depends on the [Metadata as Code](../mdcode/README.md) capability.
Follow the instructions on that page for using the `kcmd` tool.

### CLI

The package provides the `kcenrich` CLI tool, which is distributed as a standalone binary.

```bash
# Initialize a new catalog snapshot for a BigQuery dataset
kcmd init --bigquery-dataset <projectId>.<datasetId>

# Initialize a new catalog snapshot for a BigQuery dataset with specific types
kcmd init --bigquery-dataset <projectId>.<datasetId>

# Pull the latest catalog snapshot from the Knowledge Catalog service
kcmd pull

# Run the enrichment tool
kcenrich catalog --path . --config-path ../demo
```
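The `--bigquery-dataset` argument combines a project ID and a dataset ID as `<projectId>.<datasetId>`. As a minimal sketch, assuming a bash shell, the hypothetical `split_bigquery_dataset` helper below (not part of `kcmd`) splits that argument and checks the dataset part against BigQuery's dataset naming rules (letters, digits, and underscores only, up to 1,024 characters) before you pass it to `kcmd init`. It splits on the last `.` because dataset IDs cannot contain dots, while legacy domain-scoped project IDs (such as `example.com:my-project`) can.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of kcmd: split "<projectId>.<datasetId>"
# and validate the dataset ID before calling `kcmd init`.
split_bigquery_dataset() {
  local ref="$1"
  # Require at least one "." between the project and the dataset.
  if [[ "$ref" != *.* ]]; then
    echo "error: expected <projectId>.<datasetId>, got '$ref'" >&2
    return 1
  fi
  # Split on the last "." since dataset IDs never contain dots.
  local project="${ref%.*}"
  local dataset="${ref##*.}"
  # BigQuery dataset IDs: letters, digits, and underscores; at most 1,024 chars.
  if [[ -z "$project" || ! "$dataset" =~ ^[A-Za-z0-9_]+$ || ${#dataset} -gt 1024 ]]; then
    echo "error: invalid project or dataset ID in '$ref'" >&2
    return 1
  fi
  printf '%s %s\n' "$project" "$dataset"
}
```

For example, `split_bigquery_dataset my-project.demo_dataset` prints `my-project demo_dataset`, while `my-project.demo-dataset` is rejected because of the hyphen in the dataset ID.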

## Developer Workflow

### Setup

```bash
git clone https://github.com/googlecloudplatform/knowlege-catalog
cd knowlege-catalog/toolbox/mac
npm install
```

### Build

```bash
npm run build
```

### Test

```bash
npm run test
```

### Demo

The repository contains a self-contained demo. Running the demo involves creating a BigQuery dataset and a Dataplex EntryGroup within your cloud project.

**Initialize Environment**
```bash
export DEMO_CLOUD_PROJECT="<your-gcp-project-id>"
```
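Every later demo command expands `${DEMO_CLOUD_PROJECT}`, so an unset variable leads to confusing errors from `gcloud`, `bq`, or `kcmd`. As an optional safeguard, here is a minimal sketch assuming bash: the hypothetical `require_demo_project` function (not part of the repository) stops early with a clear message when the variable is unset or empty.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repository): fail fast when
# DEMO_CLOUD_PROJECT is unset or empty, since every later command expands it.
require_demo_project() {
  # ${VAR:-} keeps this safe under `set -u`.
  if [[ -z "${DEMO_CLOUD_PROJECT:-}" ]]; then
    echo "error: export DEMO_CLOUD_PROJECT=<your-gcp-project-id> first" >&2
    return 1
  fi
}
```

Calling `require_demo_project` right after the `export` line in a script run with `set -e` stops the script before any resources are created.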

**Initialize gcloud**
```bash
gcloud auth application-default login
gcloud config set project $DEMO_CLOUD_PROJECT
gcloud config set compute/region us
```

**Set up demo resources**
```bash
# Create a BigQuery dataset and table (dataset IDs may contain only letters, digits, and underscores)
bq mk ${DEMO_CLOUD_PROJECT}:demo_dataset
bq mk -t ${DEMO_CLOUD_PROJECT}:demo_dataset.demo-table name:string,value:string
```

**Create and populate a catalog snapshot**
```bash
mkdir -p catalog
cd catalog
kcmd init --bigquery-dataset ${DEMO_CLOUD_PROJECT}.demo_dataset
kcmd pull
```

**Enrich the metadata**
```bash
kcenrich catalog --path . --config-path ../config
```

**Clean up**
```bash
bq rm -r ${DEMO_CLOUD_PROJECT}:demo_dataset
```
