GISAID EpiCoV Updater

Regularly fetches updates from GISAID EpiCoV.

GISAID EpiCoV Updater is an RPA-driven software bot that automates the curation of GISAID EpiCoV updates into the ViraTrend datastore. Designed as a robust data-pipeline component, it delivers fast, reliable synchronization of new records with minimal manual effort.

See “Real-time Data Ingestion” for how this project supports ViraTrend’s development.

TL;DR

GISAID EpiCoV Updater is a software bot that automates curation of GISAID EpiCoV updates into the ViraTrend datastore. It showcases a typical data-pipeline use case powered by RPA.

Problem

  • Situation: GISAID EpiCoV access is slow and limited, so apps keep a local copy for performance

  • Challenges: reliably detect, curate, and merge new GISAID entries into the local dataset without gaps or duplication

Solution

An RPA bot that enables near real-time ingestion (sketched after this list) by:

  • Regularly checking for new EpiCoV records

  • Downloading and cleansing new entries

  • Normalising to the contract schema

  • Transactionally integrating into the existing dataset
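
A minimal sketch of that cycle, assuming hypothetical `gisaid` and `viratrend` client objects whose methods stand in for the bot's RPA and Backend API steps:

```python
import time

POLL_INTERVAL = 6 * 60 * 60  # hypothetical cadence: check for new records every six hours


def ingestion_cycle(gisaid, viratrend):
    """One poll: detect, download, cleanse, normalise, and commit new records."""
    missing = set(gisaid.list_record_ids()) - set(viratrend.list_record_ids())
    if not missing:
        return 0
    raw = gisaid.download_records(missing)              # only the new entries
    curated = [viratrend.normalise(viratrend.cleanse(r)) for r in raw]
    viratrend.commit_transactionally(curated)           # all-or-nothing write
    return len(curated)


def run_forever(gisaid, viratrend):
    while True:
        ingestion_cycle(gisaid, viratrend)
        time.sleep(POLL_INTERVAL)
```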

Outcome

  • Always-current local dataset with fast access

  • Accurate, deduplicated, and schema-aligned data

  • Up-to-date analyses with minimal manual effort

Why RPA

  • Excels at repetitive, rules-based tasks

  • Interacts with legacy systems to provide an API-like overlay

  • Enables rapid prototyping with minimal, low-code effort

  • Built-in auditing for transparent, accurate action tracing

Process Flow

  1. Logs into GISAID EpiCoV, searches the Australia dataset, and downloads the record‑ID list.

  2. Compares IDs against ViraTrend via the Backend API (see the ID-diff sketch after this list).

  3. Identifies and downloads missing records.

  4. Unzips and loads into a local cleansing datastore.

  5. Normalises data to the contract format.

  6. Commits updates transactionally via the Backend API.
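
A minimal sketch of steps 2-3 (the ID diff), assuming a hypothetical Backend API base URL and a hypothetical `/records/ids` endpoint that returns the IDs ViraTrend already holds:

```python
import requests

BACKEND_API = "https://viratrend.example/api"  # hypothetical base URL


def find_missing_ids(gisaid_ids, session=None):
    """Diff the downloaded GISAID record-ID list against ViraTrend's holdings."""
    http = session or requests.Session()
    # Hypothetical endpoint returning the record IDs already present in ViraTrend.
    resp = http.get(f"{BACKEND_API}/records/ids", timeout=30)
    resp.raise_for_status()
    known = set(resp.json())
    # Only IDs that GISAID has and ViraTrend lacks need to be downloaded (step 3).
    return sorted(set(gisaid_ids) - known)
```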

Features

  • Reversibility: Full table and record backups support rollbacks.

  • Immediacy: Updates take effect instantly without service restarts.

  • Auditability: Screen recordings, logs, and staged files enable full audits.

  • Data Integrity: Transactional commits ensure all-or-nothing updates when errors occur.

  • Flexibility: Tolerates variable download times, unresponsive UIs, and server errors.

  • Recoverability: Robust recovery resumes from the last checkpoint after major interruptions (sketched after this list).

  • Differential Updates: Fetches only new GISAID entries to save bandwidth and time.

  • Edge Computing: Heavy joins and correlated subqueries run on a local blueprint DB before pushing to production, reducing overhead.
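
A minimal sketch of the recoverability checkpointing, assuming a hypothetical `checkpoint.json` state file and per-step handler callables; the bot's actual checkpoint format is not specified here:

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")  # hypothetical location for run state
STEPS = ["download", "cleanse", "normalise", "commit"]


def load_checkpoint():
    """Return the last completed step, or None for a fresh run."""
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text()).get("last_completed")
    return None


def run(handlers):
    """Execute each step handler, skipping anything already completed."""
    last = load_checkpoint()
    start = STEPS.index(last) + 1 if last in STEPS else 0
    for step in STEPS[start:]:
        handlers[step]()  # hypothetical callable that performs this step
        CHECKPOINT.write_text(json.dumps({"last_completed": step}))
    CHECKPOINT.unlink(missing_ok=True)  # clear state once the run finishes
```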

Why is maintaining a local EpiCoV copy hard?

Because GISAID EpiCoV access is slow and constrained, applications often maintain a local copy for performance. The challenge is ensuring that this local dataset stays accurate and complete over time—detecting new entries promptly, curating them correctly, and merging them without gaps or duplicates. Doing this reliably at scale, amid variable portal behavior and intermittent errors, is non-trivial.

How does the bot keep data in sync?

The bot automates near real-time ingestion by regularly checking for new EpiCoV records, downloading and cleansing only the new entries, and normalising them to the agreed contract schema. It then integrates the updates transactionally into the existing dataset, ensuring consistency and preventing partial writes. This creates an API-like overlay on a legacy interface while preserving data integrity.
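
A minimal sketch of a transactional, deduplicating commit, using SQLite as a stand-in for the ViraTrend datastore and hypothetical column names (`accession`, `collected`, `lineage`):

```python
import sqlite3


def commit_transactionally(db_path, records):
    """Write curated records in one transaction, deduplicating on accession."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute(
                "CREATE TABLE IF NOT EXISTS sequences ("
                "accession TEXT PRIMARY KEY, collected TEXT, lineage TEXT)"
            )
            conn.executemany(
                # Upsert keeps re-runs idempotent: existing accessions are
                # refreshed, never duplicated.
                "INSERT INTO sequences (accession, collected, lineage) "
                "VALUES (:accession, :collected, :lineage) "
                "ON CONFLICT(accession) DO UPDATE SET "
                "collected = excluded.collected, lineage = excluded.lineage",
                records,
            )
    finally:
        conn.close()
```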

What benefits does the bot bring?

The bot delivers an always-current local dataset with fast access and minimal operational burden. Data remains accurate, deduplicated, and schema-aligned, enabling up-to-date analyses without manual stitching or rework. The result is a dependable, low-touch pipeline that scales as data volumes grow.

Why use RPA for this workflow?

RPA excels at repetitive, rules-based workflows and is ideal for orchestrating interactions with legacy systems that lack modern APIs. It supports rapid prototyping with minimal, low-code effort, and its built-in auditing provides transparent, precise tracing of every action. In this context, RPA transforms a constrained web workflow into a reliable, production-grade ingestion layer.

What steps does the bot follow?

The bot logs into GISAID EpiCoV, searches the Australia dataset, and retrieves the latest record-ID list. It compares those IDs against ViraTrend via the Backend API to identify gaps, then downloads only the missing records. New data is unzipped and loaded into a local cleansing datastore, normalised to the contract format, and committed transactionally through the Backend API. If interruptions occur, the bot resumes from the last checkpoint to maintain continuity.
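
A minimal sketch of the unzip-and-normalise stage, assuming the download is a ZIP of tab-separated metadata and a hypothetical mapping from raw column names to the contract schema:

```python
import csv
import io
import zipfile

# Hypothetical mapping from raw metadata columns to contract-schema fields.
CONTRACT_FIELDS = {
    "Accession ID": "accession",
    "Collection date": "collected",
    "Lineage": "lineage",
}


def extract_and_normalise(archive_path):
    """Unzip a downloaded metadata archive and yield contract-shaped records."""
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".tsv"):
                continue
            with archive.open(name) as raw:
                reader = csv.DictReader(
                    io.TextIOWrapper(raw, encoding="utf-8"), delimiter="\t"
                )
                for row in reader:
                    # Keep only contract fields, renamed and whitespace-stripped.
                    yield {
                        new: (row.get(old) or "").strip()
                        for old, new in CONTRACT_FIELDS.items()
                    }
```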

What makes the bot production-ready?

The bot is built for resilience and trust. Full table and record backups enable rollbacks, while transactional commits ensure all-or-nothing updates when errors arise. Screen recordings, logs, and staged files provide comprehensive auditability. Updates take effect immediately without service restarts, and the bot tolerates variable load times, unresponsive UIs, and transient server issues. Differential updates minimize bandwidth and processing, and heavy joins or correlated subqueries run on a local blueprint database before promotion to production, keeping the primary system lean.
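
A minimal sketch of the blueprint-DB pattern, assuming a local SQLite copy with a hypothetical `sequences` table (as in the commit sketch above) and a hypothetical summary endpoint on the Backend API; the heavy aggregation runs locally and only the small result set is pushed to production:

```python
import sqlite3

import requests

BACKEND_API = "https://viratrend.example/api"  # hypothetical base URL


def summarise_on_blueprint(blueprint_db):
    """Run the heavy aggregation on the local blueprint DB, not in production."""
    conn = sqlite3.connect(blueprint_db)
    try:
        rows = conn.execute(
            # Hypothetical rollup: lineage counts per collection month.
            "SELECT lineage, substr(collected, 1, 7) AS month, COUNT(*) AS n "
            "FROM sequences GROUP BY lineage, month"
        ).fetchall()
    finally:
        conn.close()
    return [{"lineage": l, "month": m, "count": n} for l, m, n in rows]


def push_summary(summary):
    """Send only the pre-computed result set to the production API."""
    resp = requests.post(
        f"{BACKEND_API}/summaries/lineage-monthly", json=summary, timeout=30
    )
    resp.raise_for_status()
```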
