This directory includes the scripts to create the innovation benchmark dataset. Use it in 3 steps:
- Create a list of paper titles or keywords in the `papers_to_search` file.
- Run `python 0_crawl_paper.py` to collect papers and the related metadata.
- Run `1_create_inno_graph.py` to get the final innovation dataset.
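
The two script steps can also be chained from Python so that the graph step only runs after a successful crawl. This is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes both scripts are run from this directory without extra arguments.

```python
import subprocess
import sys

# Script names come from the steps above; running them without
# arguments from this directory is an assumption.
PIPELINE = ["0_crawl_paper.py", "1_create_inno_graph.py"]


def build_commands(python_exe=sys.executable, scripts=PIPELINE):
    """Return one argv list per script, in execution order."""
    return [[python_exe, script] for script in scripts]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the graph step never runs on an incomplete crawl.
        runner(cmd, check=True)
```

Calling `run_pipeline(build_commands())` runs both steps with the current interpreter. The `runner` parameter exists only so the sequence can be dry-run or tested without launching the scripts.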
Here are some important notes:
- If you want to collect papers related to some keywords, change `exact_match` to `False` in `0_crawl_paper.py`.
- You need to set your OpenAI key globally, or edit `utils/openai_utils.py`.
- Your final results are stored at `innovation_graph/innovation_graph_final.json`.
- The innovation dataset we used in our evaluation is `merged_papers_with_fields.json`. It has gone through some additional manual revision.
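
As a quick sanity check around the notes above, the sketch below verifies that an OpenAI key is visible in the environment and summarizes a result file after a run. It rests on assumptions: `OPENAI_API_KEY` is the variable the official OpenAI Python client reads by default, but whether `utils/openai_utils.py` uses it is not confirmed here. The helper names are hypothetical, and no particular JSON schema is assumed.

```python
import json
import os
from pathlib import Path

# Path taken from the notes above, relative to this directory.
RESULT_PATH = Path("innovation_graph/innovation_graph_final.json")


def openai_key_is_set(env=os.environ):
    """Return True if OPENAI_API_KEY (assumed variable name) is non-empty."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


def summarize_json(path):
    """Load a JSON file and return (top-level type name, number of entries)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # No schema is assumed: only lists and dicts have a meaningful length.
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size
```

For example, `summarize_json(RESULT_PATH)` reports whether the final graph is a list or a dictionary at the top level and how many entries it holds. The same call works on `merged_papers_with_fields.json`.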