Assistant developed by Norwegian engineer Rukaya Johaadien helps transform spreadsheets into standardized GBIF-ready datasets; Planetary Knowledge Base and CoreTech Assistant place second and third in annual incentive prize.
ChatIPT, a chatbot that cleans and standardizes spreadsheets, creates basic metadata and guides students and researchers through the process of publishing data into the GBIF network, has won GBIF’s 2024 Ebbe Nielsen Challenge.
Developed by Rukaya Johaadien, head engineer at GBIF Norway in the Natural History Museum, University of Oslo, ChatIPT helps new or occasional data publishers without specialized knowledge transform a raw, unformatted spreadsheet into a standardized dataset and share it on GBIF.org. The chatbot provides conversation-style support to students and researchers who hold biodiversity data but are first-time or infrequent data publishers. Its prompts guide users as it cleans and standardizes spreadsheets, creates basic metadata, and publishes well-structured datasets on GBIF.org as Darwin Core Archives.
The second- and third-prize winners, respectively, are:
Planetary Knowledge Base, an automated transcription service for specimen labels developed by a team from the Natural History Museum, London, led by postdoc Gu “Hiris” Qianqian with Vince Smith and Ben Scott. This early prototype captures structured semantic data from specimen label images by leveraging large language models (LLMs) and Graph Convolutional Neural Networks (GCNNs). Through its innovative approach, the Planetary Knowledge Base (PKB) may transform processes for digitizing and analysing natural history collections.
CoreTech Assistant, an AI-based help desk for beginner-level data publishers built by Chen Yao, a Taiwanese crop scientist and volunteer at TaiBIF currently performing military service. Chen Yao’s multilingual chatbot prototype leverages Retrieval-Augmented Generation (RAG) technology to bridge users’ linguistic gaps while reducing the steep learning curve for the Darwin Core data standard.