GISAID EpiCoV Updater
Regularly fetches updates from GISAID EpiCoV.
GISAID EpiCoV Updater is an RPA-driven software bot that automates the curation of GISAID EpiCoV updates into the ViraTrend datastore. Designed as a robust data-pipeline component, it delivers fast, reliable synchronization of new records with minimal manual effort.
See “Real-time Data Ingestion” for how this project supports ViraTrend’s development.


TL;DR
GISAID EpiCoV Updater is a software bot that automates curation of GISAID EpiCoV updates into the ViraTrend datastore. It showcases a typical data-pipeline use case powered by RPA.
Problem
Situation: GISAID EpiCoV access is slow and limited, so apps keep a local copy for performance
Challenges: reliably detect, curate, and merge new GISAID entries into the local dataset without gaps or duplication
Solution
An RPA bot that enables near real-time ingestion by:
Regularly checking for new EpiCoV records
Downloading and cleansing new entries
Normalising to the contract schema
Transactionally integrating into the existing dataset
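A minimal sketch of one such update cycle is shown below. The `portal` and `backend` objects stand in for a GISAID session and the ViraTrend Backend API; every method and field name here is an illustrative placeholder, not the bot's actual code.

```python
# Minimal sketch of one update cycle. `portal` and `backend` are assumed client
# objects for the GISAID session and the ViraTrend Backend API; all method and
# field names here are illustrative placeholders.

def cleanse(record: dict) -> dict:
    """Placeholder cleansing step: trim whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

def to_contract_schema(record: dict) -> dict:
    """Placeholder normalisation to an assumed contract schema."""
    return {
        "record_id": record["accession_id"],
        "collection_date": record.get("collection_date"),
        "location": record.get("location"),
    }

def run_update_cycle(portal, backend) -> int:
    """One check -> download -> cleanse -> normalise -> commit pass."""
    remote_ids = set(portal.list_record_ids(region="Australia"))  # assumed method
    known_ids = set(backend.list_known_ids())                     # assumed method
    missing = sorted(remote_ids - known_ids)
    if not missing:
        return 0  # nothing new this cycle

    raw = portal.download_records(missing)                        # assumed method
    batch = [to_contract_schema(cleanse(r)) for r in raw]
    backend.commit_transaction(batch)  # all-or-nothing commit via the Backend API
    return len(batch)
```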
Outcome
Always-current local dataset with fast access
Accurate, deduplicated, and schema-aligned data
Up-to-date analyses with minimal manual effort
Why RPA
Excels at repetitive, rules-based tasks
Interacts with legacy systems to provide an API-like overlay
Enables rapid prototyping with minimal or low-code effort
Built-in auditing for transparent, accurate action tracing
Process Flow
Logs into GISAID EpiCoV, searches the Australia dataset, and downloads the record‑ID list.
Compares IDs against ViraTrend via the Backend API.
Identifies and downloads missing records.
Unzips and loads into a local cleansing datastore.
Normalises data to the contract format.
Commits updates transactionally via the Backend API.
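The unzip-and-load step might look roughly like the sketch below, using SQLite as the local cleansing datastore. The archive layout, file-name filter, and column names are assumptions about the export format, not a description of the real download.

```python
# Sketch of the unzip-and-load step, using SQLite as the local cleansing datastore.
# The archive layout, file-name filter, and column names are assumptions about the
# export format, not a description of the real download.
import csv
import io
import sqlite3
import zipfile

def load_into_staging(zip_path: str, db_path: str = "staging.db") -> int:
    """Extract tab-separated metadata from the archive and bulk-load it for cleansing."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS staging_records "
        "(accession_id TEXT PRIMARY KEY, collection_date TEXT, location TEXT)"
    )
    loaded = 0
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".tsv"):  # assumed: metadata ships as .tsv files
                continue
            with archive.open(name) as fh:
                reader = csv.DictReader(io.TextIOWrapper(fh, encoding="utf-8"), delimiter="\t")
                for row in reader:
                    con.execute(
                        "INSERT OR REPLACE INTO staging_records VALUES (?, ?, ?)",
                        (row.get("Accession ID"), row.get("Collection date"), row.get("Location")),
                    )
                    loaded += 1
    con.commit()
    con.close()
    return loaded
```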
Features
Reversibility: Full table and record backups support rollbacks.
Immediacy: Updates take effect instantly without service restarts.
Auditability: Screen recordings, logs, and staged files enable full audits.
Data Integrity: Transactional commits ensure all‑or‑nothing updates on errors.
Flexibility: Bot tolerates variable download times, unresponsive UIs, and server errors.
Recoverability: Robust recovery resumes from the last checkpoint after major interruptions.
Differential Updates: Bot fetches only new GISAID entries to save bandwidth and time.
Edge Computing: Heavy joins and correlated subqueries run on a local blueprint DB before pushing to production, reducing overhead.
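One way Reversibility and Data Integrity can work together is sketched below: snapshot the target table, then apply the whole batch inside a single transaction that rolls back on any error. Table and column names are illustrative, not taken from the actual datastore.

```python
# Sketch of a backup-then-commit pattern: snapshot the target table, then apply the
# batch in one transaction so a failure leaves the table untouched. Table and column
# names are illustrative only.
import sqlite3
from datetime import datetime, timezone

def commit_with_backup(db_path: str, batch: list[tuple]) -> str:
    """Apply `batch` atomically and return the name of the backup table created."""
    con = sqlite3.connect(db_path)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    backup_table = f"records_backup_{stamp}"
    try:
        with con:  # sqlite3 context manager: commit on success, roll back on exception
            con.execute(f"CREATE TABLE {backup_table} AS SELECT * FROM records")
            con.executemany(
                "INSERT OR REPLACE INTO records (accession_id, collection_date, location) "
                "VALUES (?, ?, ?)",
                batch,
            )
    finally:
        con.close()
    return backup_table  # retained so a later rollback can restore from the snapshot
```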
Why is maintaining a local EpiCoV copy hard?
Because GISAID EpiCoV access is slow and constrained, applications often maintain a local copy for performance. The challenge is ensuring that this local dataset stays accurate and complete over time—detecting new entries promptly, curating them correctly, and merging them without gaps or duplicates. Doing this reliably at scale, amid variable portal behavior and intermittent errors, is non-trivial.
How does the bot keep data in sync?
The bot automates near real-time ingestion by regularly checking for new EpiCoV records, downloading and cleansing only the new entries, and normalising them to the agreed contract schema. It then integrates the updates transactionally into the existing dataset, ensuring consistency and preventing partial writes. This creates an API-like overlay on a legacy interface while preserving data integrity.
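A rough sketch of that API-like overlay follows. Selenium is used purely for illustration, and every URL and element locator below is a placeholder rather than the real EpiCoV page structure.

```python
# Sketch of the API-like overlay: a thin class that hides the legacy web workflow
# behind ordinary method calls. Selenium is used for illustration, and every locator
# and URL below is a placeholder, not the real EpiCoV page structure.
from selenium import webdriver
from selenium.webdriver.common.by import By

class EpiCovOverlay:
    """Exposes login and search as methods so callers never touch the browser directly."""

    def __init__(self, username: str, password: str):
        self.driver = webdriver.Chrome()
        self.driver.get("https://example.org/epicov-login")                # placeholder URL
        self.driver.find_element(By.ID, "login_user").send_keys(username)  # placeholder locator
        self.driver.find_element(By.ID, "login_pass").send_keys(password)  # placeholder locator
        self.driver.find_element(By.ID, "login_button").click()            # placeholder locator

    def search_region(self, region: str = "Australia") -> None:
        """Drive the search form the way an analyst would by hand."""
        box = self.driver.find_element(By.NAME, "location")                # placeholder locator
        box.clear()
        box.send_keys(region)

    def close(self) -> None:
        self.driver.quit()
```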
What benefits does the bot bring?
The bot delivers an always-current local dataset with fast access and minimal operational burden. Data remains accurate, deduplicated, and schema-aligned, enabling up-to-date analyses without manual stitching or rework. The result is a dependable, low-touch pipeline that scales as data volumes grow.
Why use RPA for this workflow?
RPA excels at repetitive, rules-based workflows and is ideal for orchestrating interactions with legacy systems that lack modern APIs. It supports rapid prototyping with minimal or low-code effort, and its built-in auditing provides transparent, precise tracing of every action. In this context, RPA transforms a constrained web workflow into a reliable, production-grade ingestion layer.
What steps does the bot follow?
The bot logs into GISAID EpiCoV, searches the Australia dataset, and retrieves the latest record-ID list. It compares those IDs against ViraTrend via the Backend API to identify gaps, then downloads only the missing records. New data is unzipped and loaded into a local cleansing datastore, normalised to the contract format, and committed transactionally through the Backend API. If interruptions occur, the bot resumes from the last checkpoint to maintain continuity.
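A minimal sketch of such a checkpoint mechanism, assuming a small JSON state file; the step names and file layout are illustrative, not the bot's actual implementation.

```python
# Sketch of checkpoint-based recovery: a small JSON file records the last completed
# step and the record IDs still pending. Step names and file layout are illustrative.
import json
import os

CHECKPOINT_FILE = "checkpoint.json"
STEPS = ["list_ids", "diff_ids", "download", "stage", "normalise", "commit"]

def save_checkpoint(completed_step: str, pending_ids: list[str]) -> None:
    """Record progress after each step so an interrupted run can resume."""
    with open(CHECKPOINT_FILE, "w", encoding="utf-8") as fh:
        json.dump({"last_completed": completed_step, "pending_ids": pending_ids}, fh)

def resume_point() -> tuple[str, list[str]]:
    """Return the next step to run and any IDs still pending from a prior run."""
    if not os.path.exists(CHECKPOINT_FILE):
        return STEPS[0], []
    with open(CHECKPOINT_FILE, encoding="utf-8") as fh:
        state = json.load(fh)
    done_index = STEPS.index(state["last_completed"])
    next_step = STEPS[min(done_index + 1, len(STEPS) - 1)]
    return next_step, state.get("pending_ids", [])
```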
What makes the bot production-ready?
The bot is built for resilience and trust. Full table and record backups enable rollbacks, while transactional commits ensure all-or-nothing updates when errors arise. Screen recordings, logs, and staged files provide comprehensive auditability. Updates take effect immediately without service restarts, and the bot tolerates variable load times, unresponsive UIs, and transient server issues. Differential updates minimize bandwidth and processing, and heavy joins or correlated subqueries run on a local blueprint database before promotion to production, keeping the primary system lean.
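The blueprint-database step might look roughly like the sketch below: the expensive join resolves locally, and only the finished rows are handed to the Backend API for the transactional commit. The schema and the join itself are illustrative, not the production design.

```python
# Sketch of the local "blueprint" pre-processing: run the heavy join in a local SQLite
# database and return only finished rows for promotion. Table names, columns, and the
# join itself are illustrative, not the production schema.
import sqlite3

def prepare_promotion_batch(blueprint_db: str = "blueprint.db") -> list[tuple]:
    """Resolve the expensive join locally so production only receives ready rows."""
    con = sqlite3.connect(blueprint_db)
    rows = con.execute(
        """
        SELECT s.accession_id, s.collection_date, s.location, m.lineage
        FROM staging_records AS s
        JOIN lineage_metadata AS m ON m.accession_id = s.accession_id
        WHERE s.accession_id NOT IN (SELECT accession_id FROM already_promoted)
        """
    ).fetchall()
    con.close()
    return rows  # handed to the Backend API's transactional commit
```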