Tracking the emergence and spread of COVID-19 using sequence data

A summary of Sam Lycett's presentation at the SARS-CoV-2/COVID-19 workshop.

Phylogenetic analyses of SARS-CoV-2 sequences show that the virus is very similar to SARS-CoV, the virus that causes SARS, and also to a variety of other betacoronaviruses previously identified in bats. We are able to infer these relationships thanks to viral sequencing data, which are being shared globally on the publicly accessible GISAID database (https://www.gisaid.org). Throughout January and February, most of the sequences deposited were isolated in Asia, but in recent weeks we have seen an increasing number from Europe and North America.

SARS-CoV-2 sequences are very similar to each other, but we do see a few mutations between isolates. Whilst we do not expect these mutations to affect virulence, we can use them to trace the spread of the epidemic.
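To illustrate how a handful of mutations can separate otherwise near-identical isolates, here is a minimal Python sketch that counts the positions at which pairs of aligned sequences differ. The isolate names and the short sequences are invented for illustration and are not real SARS-CoV-2 data; the function names are also hypothetical. Real genomes are about 30,000 bases long and would need to be aligned first.

```python
from itertools import combinations

UNAMBIGUOUS_BASES = set("ACGT")


def count_differences(seq_a, seq_b):
    """Count aligned positions where both bases are unambiguous and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        # Skip gaps and ambiguous calls such as N rather than count them.
        if a in UNAMBIGUOUS_BASES and b in UNAMBIGUOUS_BASES and a != b
    )


def pairwise_differences(alignment):
    """Return {(name_a, name_b): number of differing sites} for every pair."""
    return {
        (a, b): count_differences(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Toy alignment: invented sequences, NOT real viral data.
toy_alignment = {
    "isolate_A": "ATGGCTAGCTAACGT",
    "isolate_B": "ATGGCTAGCTAACGT",
    "isolate_C": "ATGGCTAGTTAACGT",
    "isolate_D": "ATGACTAGTTAACNT",
}

for pair, n_diff in pairwise_differences(toy_alignment).items():
    print(pair, n_diff)
```

Isolates that share the same mutations end up with small pairwise counts, which is the signal a phylogenetic tree uses to group them into clusters.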

By building a time-scaled phylogenetic tree from all the available sequences, we can estimate that the virus originated in November 2019. The same tree indicates that the virus was introduced to the UK multiple times, with clusters spreading from each introduction. We can also see clear clusters in the sequences coming from different continents; for example, the sequences from Europe are all much more similar to each other than they are to the sequences from Oceania.
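The summary does not say how the time-scaled tree was built. As a rough illustration of how sampling dates can date the origin of an outbreak, the sketch below uses a much simpler technique, root-to-tip regression: each sequence's genetic distance from the root of the tree is regressed against its sampling date, the slope gives an evolutionary rate, and the date at which the fitted line crosses zero distance estimates the age of the root. The function name and all numbers are assumptions made up for this example, not values from the talk.

```python
def fit_root_to_tip(sampling_dates, root_to_tip_distances):
    """Least-squares fit of distance = rate * (date - root_date).

    sampling_dates: decimal years, e.g. 2020.1
    root_to_tip_distances: substitutions per site from the tree root
    Returns (rate per site per year, estimated root date).
    """
    n = len(sampling_dates)
    if n < 2 or n != len(root_to_tip_distances):
        raise ValueError("need at least two dates and matching distances")

    mean_t = sum(sampling_dates) / n
    mean_d = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sampling_dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum(
        (t - mean_t) * (d - mean_d)
        for t, d in zip(sampling_dates, root_to_tip_distances)
    )

    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")

    # The fitted line passes through (mean_t, mean_d); stepping back
    # mean_d / rate years reaches zero distance, i.e. the root.
    root_date = mean_t - mean_d / rate
    return rate, root_date


# Invented toy data: three samples lying exactly on one line.
toy_dates = [2020.0, 2020.1, 2020.2]
toy_distances = [0.0001, 0.0002, 0.0003]

rate, root_date = fit_root_to_tip(toy_dates, toy_distances)
print(f"rate = {rate:.2e} substitutions/site/year, root date = {root_date:.2f}")
```

Full phylogenetic dating methods also model the tree shape and the uncertainty in the estimate, so this regression is best read as a quick check of whether the data contain a temporal signal at all.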

These data can be overlaid onto a map of the world, which shows that in January and February the epicentre of viral spread was Asia, with the virus spreading from there into Europe, Australia and North America. As time has progressed, however, the epicentre has moved to Europe, from where the virus is spreading to South America and elsewhere.

As we collect more sequence data, we may be able to infer more details about the spread of the virus, such as differences in R0 between countries and continents. In turn, this may help to show which intervention strategies have been most effective.

Watch Sam's talk