WP 1 AI-ready database construction

This work package brings together all data required for the AISIT AI-ready database and implements quality assurance and quality control (QA/QC) processes to ensure data quality, consistency, transparency, and reusability. WP1 provides the foundational data infrastructure required for robust downstream AI applications.

WP 1.1 Consolidation of BIOPOLE data

BIOPOLE is a multi-institute programme funded by the Natural Environment Research Council (NERC) under the National Capability Science Multi-Centre award scheme (NC-SM2). A key component of BIOPOLE has been the generation of new freshwater tracer data from the polar regions, spanning rivers, glacial systems, coasts, and the ocean.

AISIT will bring together and standardise datasets generated by BIOPOLE field campaigns into a single AI-ready database, including metadata (e.g., platform, method, date) and essential ancillary data (e.g., location, depth, temperature and salinity).

WP 1.2 Survey of existing freshwater databases

This task will identify, evaluate, and integrate existing freshwater tracer datasets from a wide range of national and international sources. The work will address the following key elements:

Data liaison, methodology, and licensing: Seawater and freshwater δ¹⁸O and other tracer data have been generated by numerous projects and initiatives. These datasets are often regionally and seasonally focused, use different sampling and analytical methodologies, and are stored as isolated data products within national or institutional repositories. As a result, datasets may be disparate, partially overlapping, and subject to different licensing conditions.

AISIT will liaise with data holders to:

  • Assess data availability
  • Clarify licensing and usage constraints
  • Identify methodological differences and potential measurement biases

This engagement is essential for developing a harmonised AI-ready database of Arctic freshwater tracers.

Tracer and ancillary data integration: Salinity measurements are fundamental to understanding changes in the ocean freshwater system but on their own do not identify freshwater sources. When combined with stable oxygen isotopes (δ¹⁸O), salinity data can distinguish between sea-ice melt and meteoric freshwater inputs, including river runoff, glacial melt, and precipitation.
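The separation of sea-ice melt from meteoric water is commonly expressed as a three-end-member mass balance in salinity and δ¹⁸O. The following is a minimal sketch of that calculation, not the AISIT method: the end-member values, the function name `source_fractions`, and the `EndMember` type are illustrative assumptions, and real end-members are region-specific and carry uncertainty.

```python
from typing import NamedTuple


class EndMember(NamedTuple):
    salinity: float
    d18o: float  # per mil


# ASSUMPTION: illustrative end-member values only, not AISIT choices.
SEAWATER = EndMember(salinity=34.8, d18o=0.3)
METEORIC = EndMember(salinity=0.0, d18o=-20.0)
SEA_ICE_MELT = EndMember(salinity=4.0, d18o=1.0)


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def source_fractions(salinity, d18o):
    """Return (f_seawater, f_meteoric, f_sea_ice_melt) for one sample.

    Solves the three mass-balance equations:
        f_sw + f_met + f_sim = 1
        sum(f_i * S_i)       = salinity
        sum(f_i * d18O_i)    = d18o
    """
    ends = (SEAWATER, METEORIC, SEA_ICE_MELT)
    matrix = [
        [1.0, 1.0, 1.0],
        [e.salinity for e in ends],
        [e.d18o for e in ends],
    ]
    rhs = [1.0, salinity, d18o]
    det = det3(matrix)
    fractions = []
    for col in range(3):
        # Cramer's rule: swap one column for the right-hand side.
        replaced = [
            row[:col] + [rhs[r]] + row[col + 1:]
            for r, row in enumerate(matrix)
        ]
        fractions.append(det3(replaced) / det)
    return tuple(fractions)
```

With this formulation, a negative sea-ice melt fraction is usually read as net sea-ice formation rather than melt. Salinity alone cannot separate the two freshwater terms, which is why paired δ¹⁸O values are needed.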

Additional tracers, such as dissolved barium, neodymium, and rare earth elements (REEs), can provide further source resolution but are typically much sparser than salinity and δ¹⁸O datasets. These tracers may therefore be collated in the future, but AISIT focuses on δ¹⁸O. Correct interpretation of all tracer data depends on the availability of ancillary oceanographic measurements, including temperature and salinity.

Locating, collating, and harmonising these diverse data streams from multiple data holdings is a critical step in building the AISIT database.
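As a rough illustration of the harmonisation step, the sketch below maps differently named source columns onto one common schema. The alias table, the target field names, and the function `harmonise_record` are hypothetical; actual source conventions would be established through liaison with the data holders.

```python
# ASSUMPTION: hypothetical column aliases and target names, for illustration.
COLUMN_ALIASES = {
    "d18o": {"d18O", "delta18O", "delta_o18", "d18o_permil"},
    "salinity": {"sal", "psal", "salinity"},
    "temperature_c": {"temp", "temperature", "temp_c"},
    "depth_m": {"depth", "depth_m"},
    "latitude": {"lat", "latitude"},
    "longitude": {"lon", "long", "longitude"},
}

# Case-insensitive lookup from any known alias to its target field.
_LOOKUP = {
    alias.lower(): target
    for target, aliases in COLUMN_ALIASES.items()
    for alias in aliases
}


def harmonise_record(record, source):
    """Map one source record (a dict) onto the common schema.

    Unrecognised columns are kept under "unmapped" so nothing is
    silently dropped; empty values become None.
    """
    out = {"source": source}
    unmapped = {}
    for key, value in record.items():
        target = _LOOKUP.get(key.strip().lower())
        if target is None:
            unmapped[key] = value
        elif value in ("", None):
            out[target] = None
        else:
            out[target] = float(value)
    out["unmapped"] = unmapped
    return out
```

Keeping the originating source on every record, and retaining unmapped columns, preserves the provenance needed to trace methodological differences between data holdings later on.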

WP 1.3 Develop a metadata framework

High-quality, well-structured metadata are an essential prerequisite for data interpretation, reuse, and responsible AI deployment. AISIT will develop a comprehensive metadata framework that:

  • Follows community-accepted standards
  • Documents expert-led quality control, biases, errors, and missing values
  • Clearly states licensing terms and constraints on data use
  • Provides full traceability of data provenance

This approach minimises the effort and specialist expertise required by AI practitioners to understand appropriate data usage and ensures transparency and robustness in downstream applications.

AISIT will adopt the Croissant (meta)data standard, which is specifically designed to support discoverability and access to AI-ready datasets in line with FAIR principles.

By contributing to and engaging with the emerging trans-disciplinary community developing Croissant and its associated software ecosystem, AISIT will support efficient, trustworthy, and transparent AI practices. This framework will also enable the future ingestion and standardisation of third-party data sources.
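To give a sense of what a Croissant description might contain, the sketch below assembles a simplified JSON-LD document in Python. It is an unvalidated illustration under stated assumptions: the dataset name, file name, and field names are hypothetical, the `@context` is heavily abbreviated compared with the one the Croissant specification defines, and a production file would be checked with the Croissant tooling.

```python
import json


def make_field(name, description, data_type="sc:Float"):
    """Build one simplified Croissant-style field backed by a CSV column."""
    return {
        "@type": "cr:Field",
        "name": name,
        "description": description,
        "dataType": data_type,
        "source": {
            "fileObject": {"@id": "samples.csv"},
            "extract": {"column": name},
        },
    }


# ASSUMPTION: hypothetical names throughout; abbreviated context.
croissant_sketch = {
    "@context": {
        "@vocab": "https://schema.org/",
        "sc": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "sc:Dataset",
    "dct:conformsTo": "http://mlcommons.org/croissant/1.0",
    "name": "aisit-freshwater-tracers-example",
    "description": "Illustrative description of a freshwater tracer table.",
    "license": "https://www.nationalarchives.gov.uk/doc/"
               "open-government-licence/version/3/",
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "samples.csv",
            "name": "samples.csv",
            "contentUrl": "samples.csv",
            "encodingFormat": "text/csv",
        }
    ],
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "samples",
            "name": "samples",
            "field": [
                make_field("d18o", "Stable oxygen isotope ratio (per mil)"),
                make_field("salinity", "Salinity"),
                make_field("qc_flag", "Expert-assigned quality flag", "sc:Text"),
            ],
        }
    ],
}

croissant_json = json.dumps(croissant_sketch, indent=2)
```

Carrying the licence, the per-field descriptions, and a quality flag in one machine-readable file is what lets an AI practitioner judge appropriate use without consulting the original data holders.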

WP 1.4 Host the dataset on FAIR-compliant infrastructure

The AISIT database will be hosted using FAIR-compliant infrastructure. It will be assigned a permanent DOI and released openly under the Open Government Licence (OGL) v3. The dataset will be made available via the UK Polar Data Centre, with metadata compliant with the ISO 19115 Geographic Information Metadata standard.

Petra ten Hoopen

British Antarctic Survey

WP1 Lead