Supplier Triplet Dataset
The supplier triplet dataset is designed to train and validate Large Language Models (LLMs) on extracting supplier triplets (subject, predicate, object) from unstructured text. The prompts consist of page text extracted from 1,000 supplier web links in North Carolina, United States, and the completions contain all the triplets extracted from these prompts. In the completions, subjects and predicates are predefined by the SUDOKN ontology. Most objects are taken directly from the web pages to preserve accuracy; the remaining objects are double-checked and standardized by Subject Matter Experts (SMEs) according to the SUDOKN ontology. SMEs also harmonize terms to industry standards, for example standardizing “Automotive-ICE” to “Automotive” and “Metal-Aluminum” to “Aluminum”.
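To make the prompt/completion structure and the harmonization step concrete, here is a minimal sketch under stated assumptions. The record layout (`prompt`/`completion` keys), the `(subject, predicate, object)` text serialization, the supplier name "Acme Metals", and the predicate labels `servesIndustry` and `processesMaterial` are all hypothetical and are not taken from the dataset or the SUDOKN ontology. Only the two harmonization pairs come from the description above.

```python
from dataclasses import dataclass

# Hypothetical harmonization table: only these two pairs come from the
# dataset description; a real SME-curated mapping would be much larger.
HARMONIZE = {
    "Automotive-ICE": "Automotive",
    "Metal-Aluminum": "Aluminum",
}


@dataclass(frozen=True)
class Triplet:
    subject: str    # subject term (predefined by SUDOKN in the real dataset)
    predicate: str  # relation label; the labels used below are hypothetical
    obj: str        # object, usually copied verbatim from the page text


def harmonize(triplet: Triplet) -> Triplet:
    """Replace a non-standard object term with its harmonized form, if known."""
    return Triplet(triplet.subject, triplet.predicate,
                   HARMONIZE.get(triplet.obj, triplet.obj))


def make_record(page_text: str, triplets: list[Triplet]) -> dict:
    """Build one prompt/completion pair (hypothetical record layout)."""
    cleaned = [harmonize(t) for t in triplets]
    return {
        "prompt": page_text,
        "completion": "\n".join(
            f"({t.subject}, {t.predicate}, {t.obj})" for t in cleaned
        ),
    }


# Usage with made-up page text and triplets.
record = make_record(
    "Acme Metals machines aluminum parts for the automotive industry.",
    [Triplet("Acme Metals", "servesIndustry", "Automotive-ICE"),
     Triplet("Acme Metals", "processesMaterial", "Metal-Aluminum")],
)
print(record["completion"])
# prints:
# (Acme Metals, servesIndustry, Automotive)
# (Acme Metals, processesMaterial, Aluminum)
```

Objects without a known mapping pass through unchanged, which mirrors the description: most objects are kept as they appear on the page, and only selected terms are standardized.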
Funding
Proto-OKN Theme 1 - Supply and Demand Open Knowledge Network (SUDOKN)
Directorate for Technology, Innovation and Partnerships