
GEM 2014 – Day three

June 11, 2014

…crawl out of bed again and get on the bus even more bleary-eyed than yesterday.  Today I’m presenting my poster, which is an updated version of a poster I presented earlier this year and looks like this:


See also days one and two.  This morning we’re hearing about anopheles:

Session VIII

Dan Neafsey – Begins with acknowledgements.  Why is anopheles the only mosquito that transmits malaria?  Need more than one genome to answer that.  Phylogeny based on all the sequences they have, lots of species.  Quality improved over time, but variable, negative correlation between heterozygosity rate and contig size.  Comparison to Sanger-based assemblies across 3,000 ortholog genes looks mostly good.  Gene models good too.  Anophelines are losing introns even faster than other dipterans.  Chromosomal evolution – anchor 1/3rd to one half of scaffolds to chromosomal arms, which seem to have evolved largely intact although with some rearrangement. But within the arms there is a high degree of chromosomal rearrangement.  X is special as it has a distinctly higher rate of rearrangements (this is also seen in other species e.g. Drosophila), up to 3 times higher than autosomes.  Anophelines have 5-fold higher rate of gene gain/loss than drosophilids.  (There are caveats to this analysis, e.g. greater branch length, but result is same whichever way we do it).  ~80 gain/loss events per million years => hundreds of genes gained/lost between even the most closely related species in this collection.  Plot of this rate by gene category, gustatory and olfactory receptors are the most variable.  Olfactory genes (‘chemosensation’) may play a role in choosing hosts, shows pathway diagram from this.  Another CAFE analysis for gain/loss in olfactory receptors.  The mating plug is essential for sperm storage but is not conserved in anopheles.  Absence of mating plug in albimanus suggests no-plug may be ancestral state.  TG3 gene is evolving faster than its paralogues, and it’s present in albimanus even though it has no mating plugs.  LRR (Leucine-rich repeat) and TEP (Thioester-containing) proteins are the fastest-evolving immune genes.  
There are hundreds of cuticular proteins per species, and they show concerted evolution and physical clustering – probably indicating rapid duplication but conservation across copies.  Evidence of coexpression of cuticular proteins.  Maybe this is useful for preserving interchangeability of these proteins.  Absence of key genes: Head Peptide, IGF1, CYP18a1.  Concludes.

Questioner asks about coevolution with parasites, e.g. avian malarias only use a single mosquito strain too.  Answer is yes, it’s fascinating, and we need more genomes.

Nora Besansky – starting where Alistair Miles’ talk left off, about introgression.  Defined in 1938 by Anderson & Hubricht.  It was thought that cross-species hybridisation was rare, but that has changed recently e.g. human<-neanderthal introgression, also butterfly adaptations.  Introgression is hard to document, most common in closely related species.  Difficult to distinguish from incomplete lineage sorting (ILS).  Anopheles gambiae is a complex of 8 closely related species.  Introgression plausible; fertile hybrid females observed in nature. Includes 3 sympatric major vectors and 2 non-vector species.  Coluzzi pioneered use of inversions to infer phylogeny.  But X chromosome and autosomes suggest different things here and we thought the X chromosome story was introgression.  Talking about this which found 7 genes showing evidence of introgression.  10 years later have 16 anopheles genomes project which resolves this better. Also have sequencing of field samples.  Window-based approach shows autosome vs. X chromosome discordance (in 50kb windows, 85 of 945 possible rooted topologies observed).  X chromosome supports completely different topologies.  What is the real branching order?  Introgression reduces species divergence, so the ‘correct’ tree should be the one with greatest mean convergence height.  Trees and a ‘chromoplot’ for Gambiae, L, Arabiensis (I don’t know what ‘L’ is here).  We see greater mean tree height for the X.  So we think Gambiae and Arabiensis are not sister taxa (even though they appear most closely related, e.g. this).  Notes species and inversion-based phylogenies are congruent.  Introgression makes standard version wrong!  Now talking about how they distinguished ILS from introgression.  Used ‘D-statistics’, D_{\text{FOIL}}-statistics (ABBA-BABA, see here maybe, I think this is like a four-population test from here).  
Concludes gambiae-arabiensis introgression is absolutely crazy on the autosomes, but the X doesn’t introgress except the rDNA locus.  Also merus-quadriannulatus introgression.  Concludes introgression has potentially been a very important process in anopheles’ role as vectors.
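
For my own reference, here is a minimal sketch of the basic ABBA-BABA idea as I understand it: Patterson's D for a four-taxon tree (((P1, P2), P3), O).  This is not the D_{\text{FOIL}} method from the talk, just the simpler statistic it extends.  The function name and the toy sequences are made up for illustration, and it assumes one aligned haploid sequence per taxon.

```python
def abba_baba_d(p1, p2, p3, outgroup):
    """Patterson's D for the tree (((P1, P2), P3), O).

    Each argument is an aligned sequence of alleles, one per site.
    Returns (D, n_abba, n_baba).
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep biallelic sites only
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba), abba, baba


# Toy alignment (hypothetical data): three ABBA sites, one BABA site.
d, n_abba, n_baba = abba_baba_d("AAAGAA", "GGGAAA", "GGGGGA", "AAAAAA")
print(d, n_abba, n_baba)  # 0.5 3 1
```

Under incomplete lineage sorting alone the two patterns should be equally common, so D should be near zero; an excess of one pattern points to gene flow between P3 and one of the sister taxa.  In practice significance is assessed with a block jackknife, which this sketch leaves out.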

Frederic Tripet – Going experimental.  M and S type based on rDNA markers.  Picture of mosquito swarm, want to understand assortative mating.  Four classical scenarios of speciation: allopatric, peripatric, parapatric, sympatric.  These leave different genomic signals.  Allopatric has genome-wide differentiation and genome hitchhiking, sympatric gives a few loci that are islands of speciation, ‘divergence hitchhiking’.  Mechanisms of assortative mating: most mating is in swarms, males start swarms, females enter the swarms and males chase.  Then there is a sequence of physical interactions for copulation.  Could be a number of cues involved in mate choice.  Genetic basis of assortative mating.  Islands of M/S speciation found at end of Xag, 2L and 3L.  Look for ancestral genes that led to M/S split.  Use backcrossing to get X chromosome S type island in M background mosquitos and vice versa (this took a year or so).  Then cross them to look for assortative mating.  After removing some stuff, ‘M’ type females choose their own type almost always, but males don’t care.  Genome sequencing to prove the crossing worked.  Conclusion: assortative mating can only reasonably be explained by differences in that part of the X chromosome.  Now looking at field populations.  Is X island associated with swarming behaviour?  In the field see hybrid X island types (and they are quite informative).  Claim is that X island also determines swarm choice in males!  (This is weird, what’s going on?)  Conclusions.  Lab system good for studying female choice, points to X chromosome island.  Field system focusses on males and points to X island.

Chris Clarkson – Also about speciation islands.  Talking about this and this.  How much of 2L island has introgressed between Gambiae and Coluzzi?  This talk was nice but I haven’t made notes.

Session IX

Michael Slotman – I missed this talk.

Abdoulaye Diabate – about mosquito swarms.  Swarms are very stable, and occur in specific places which are maintained over days, months, and even years.  (In Burkina Faso some of the swarms are still occurring in the same place as ten years ago.)  Only plausible explanation is that there’s something about the location that the mosquitos like.  Map of swarms that occur in a village.  Swarms occur in clusters.  Kernel density estimate showing heatmap of swarm clustering – ‘hotspots’.  What about swarm size?  Map showing swarm size.  Larger swarms are in cluster hotspots.  Same picture of hotspots in July, August, September, October.  Why?  Answer is extremely simple – we don’t know.  Compares mosquito swarm selection to person choosing to enter a busy petrol station.  Should you stay or try somewhere else?  Most important factor seems to be ‘total number of markers’ (I didn’t understand at the time, but he means physical markers on the ground that mosquitos like to swarm above).  Now mating competitiveness.  What are the characteristics of the sexiest men?  Discusses peacock feathers, bird singing contests, and the spider ‘nuptial gift’, where spiders bring a gift to the female.  Used digital camera to follow a swarm for 6 months.  Count number of males in swarm and number of females visiting swarm.  Peak swarm size 20mins after sunset.  Female visitation distribution similar.  But big variation in swarm size between sites, and variation in how many mobile males and how many couples occur.  Mating success significantly differs between sites.  Suggestion is that swarm size is important in attracting females but not the only factor in mating success.  Mating males appear to have larger wing size.
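
The hotspot heatmap is a kernel density estimate of swarm locations.  Here is a rough sketch of how that works, assuming a plain 2D Gaussian kernel with a single bandwidth.  The coordinates and the bandwidth are invented for illustration and I don't know which kernel or software the speaker actually used.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function density(x, y) estimating point density.

    points: list of (x, y) locations; bandwidth: kernel width in the same units.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical swarm positions in metres: a cluster of three plus one outlier.
swarms = [(0.0, 0.0), (5.0, 3.0), (-4.0, 2.0), (100.0, 100.0)]
density = gaussian_kde_2d(swarms, bandwidth=10.0)
print(density(0.0, 0.0) > density(100.0, 100.0))  # True: the cluster is the hotspot
```

Evaluating `density` over a grid and colouring by value gives the heatmap; the bandwidth controls how far apart swarms can be and still merge into one hotspot.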

In response to question, speaker says that it seems that objects that form contrasts on the ground – especially black sheets – are good markers for swarms.  Does swarming occur inside houses?  Yes but speaker thinks this is very marginal.

Michelle Riehle – malaria has a durable disease transmission system.  Lots of heterogeneity – population and individual.  Using medium density reasonable-cost to track population structure in .  I did not really pay attention to this.

Martin Donnelly and Tiago Antao – The Anopheles gambiae 1000G project.  2012 initiation of parasite diversity network and also anopheles 1000G project.  Opportunistic sampling strategy – what have you got in your freezer? – about 100 individuals from each of 10 populations.  Rift valley is a substantial barrier to gene flow.  Region of high hybridisation in the west.  2161 mosquitos sequenced to 30X.  First tranche released.  More in 2015.  Martin gives over to Tiago.  Three problems: speciation, inversions, insecticide resistance.  PCA of 3L chromosome only (to avoid the inversions).  PCA does not really reflect geography.  Uganda is in the middle of Cameroon on the PC, with half of Burkina Faso, the other half of Burkina Faso is in a different part of the plot.  Can’t really interpret this as we would for humans.  Actually PC1 gets coluzzi versus gambiae, it is species that is important.  Kenya is especially bewildering.  It has lots of resistance but also a few sensitive genotypes at KDR.  Lots of runs of homozygosity / loss of diversity in resistant mosquitos.  Now PCA of 2La.  Four or five clusters, again countries are together, they cluster by karyotype (obviously.  PC2 represents M versus S again.)  Postscript: shows admixture plot based on a ‘sloppy’ way of filtering genotype calling.  But proper filtering (by Alistair) completely changes this.  Time spent in getting high-quality dataset is crucial.

Session X

Sarah Volkman – missed this.

Philip Bejon – Mobilis in Mobile.  Ross/Macdonald R_0.  R_0<1 is a parasite that will die out.  R_0>1 means it hangs on.  But 1<R_0<2 for most of Africa – it’s not that far from extinction.  But it doesn’t feel like that, one possible explanation is hotspots.  Blanket control measures won’t work very well if there are hotspots.  Trends in transmission in Kilifi, dropping since 1999, possibly rebound since 2008, but it’s probably not a rebound in transmission but a drop in immunity due to lower transmission.  Cool video of transmission intensity which shows quite a bit of change between years and a disproportionate southerly trend since 2008.  High transmission correlates with low bednet usage.  Hotspots within hotspots within hotspots (using something called ‘satscan’ maybe?).  A kind of fractal-like pattern.  For intervention we could look for hot homesteads, hot villages, hot counties.  Which we should do depends on how fast parasites are moving around.  If slowly, intervening in hotspots won’t help much, but it might if they move quickly.  Data from Gambia, Western Kenya (a single 7-week cross section), and Kilifi.  PCA shows little structure.  But map of homesteads coloured by PC1 shows red dots more often next to each other and blue dots more often next to each other.  “Moran’s I”, testing for auto-correlation, showing substantial autocorrelation over short distances but rapidly tailing off.  Or: pairwise analysis, can relatedness be predicted by separation in time or space?  Distance is informative but less informative as you get distance over time.   Summary.  What next?  Global temporal changes in malaria, Noor et al, 2014.  At many points in Africa we could really try to give elimination a go, but we are not at that point everywhere.
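
A quick sketch of Moran’s I, to make the ‘red dots next to red dots’ idea concrete.  It assumes the simplest possible weighting, where two homesteads count as neighbours (weight 1) if they lie within some distance threshold and otherwise get weight 0.  The function name, the threshold and the toy data are mine, not from the talk.

```python
import math


def morans_i(values, coords, max_dist):
    """Moran's I with binary weights: w_ij = 1 if i and j are within max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = 0.0
    cross = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                weight_sum += 1.0
                cross += dev[i] * dev[j]
    variance_sum = sum(d * d for d in dev)
    if weight_sum == 0 or variance_sum == 0:
        raise ValueError("need at least one neighbour pair and non-constant values")
    return (n / weight_sum) * cross / variance_sum


# Four hypothetical homesteads in a row, 1 unit apart, scored by something like PC1.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(morans_i([1, 1, -1, -1], coords, max_dist=1))  # similar values adjacent: positive
print(morans_i([1, -1, 1, -1], coords, max_dist=1))  # alternating values: negative
```

Repeating the calculation with neighbours defined by successive distance bands gives the kind of curve described in the talk, with autocorrelation high at short range and tailing off with distance.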

Questioner asks if hotspots are jumping or popping?  Answer is that intensity hotspots can be quite stable but because immunity is quickly developed you don’t see a burden of disease.  Febrile malaria hotspots, by contrast, jump around.  (They pop, I suppose, rather.)

Bryan Greenhouse – What can a (malaria transmission) network tell us?  Cartoon of example transmission networks (also using R_C which is like the R_0 that Philip used).  Identify hot ‘pops’: populations / groups driving transmission.  This has been done for flu, SARS, rabies in the USA, but not for malaria – because it is very hard.  Dense sampling required.  Pilot study in Zanzibar.  I’m not really listening.  This talk contains a molecular Mondrian.

Brandyce St. Laurent – shows complex map of global distribution of malaria vectors, saying ‘if only it were this simple’.  Shows 10 different anophelines they’ve caught in Cambodia.  One goal is to get colonies going.  A brief survey during rainy season, 2 weeks in three regions, found over 20 species of mosquito.  At least 14 found to definitely carry (some type of) malaria.  Human landing collections are labour intensive – tells us from personal experience – and in Cambodia doesn’t yield many mosquitos.  A cow in a tent works better and gives similar species distributions.  Plot of number of oocysts per midgut by parasite isolate and vector – in lab cultures taken from field isolates.  This is quite different between mosquito and parasite types.  Are the resistant parasites really going into different mosquito strains?  Initial results suggest not.  This was a pretty nice talk.

Session XI

Lisa J White – Says she’s aiming for worst joke award (see later).  Talking about a mathematical model of transmission in Cambodia.  ‘Stochastic patch model’.  Model of population behaviour and movement.  Assume people move around underneath clouds of mosquitos.  Model made in geographical patches.  Done before for different infections – e.g. H1N1 flu, SARS.  Talking about ‘effective’ (rather than geographical) distance – if you can measure it it is very useful.  Model.  Individuals have no immunity when born.  When they’re infected they get severe symptoms.  In subsequent infections they’re more likely to be asymptomatic.  Then include treatment and recrudescence.  Then include artemisinin resistance.  This model looks pretty complex, but cool.  Also mentions another model which includes partner drug resistance.  Fit to data.  But there is lots of missing data – what proportion are resistant to artemisinin or partner drugs?  Any ACT-resistant infections?  Now model different intervention strategies.  Preliminary results.  Model predicts monthly incidence and prevalence pretty well, dropping with treatment.  Can model effects of mass intervention.  Now the joke: Q: What do you get if you cross a mosquito and a mountain climber?  A: nothing – you can’t cross a vector and a scalar.
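
To fix the idea of a ‘stochastic patch model’ in my head, here is a toy version.  It is far simpler than the model in the talk: no immunity, treatment or resistance, just susceptible and infected people in a few patches, with a coupling parameter standing in for movement between them.  Every name and parameter value here is my own assumption.

```python
import math
import random


def simulate_patches(sizes, infected, beta, recovery, coupling, steps, seed=0):
    """Discrete-time stochastic SIS model over several patches.

    sizes, infected, beta: one entry per patch.
    recovery: per-step recovery probability.
    coupling: 0 means isolated patches, 1 means fully mixed.
    Returns the infected counts per patch at each step.
    """
    rng = random.Random(seed)
    infected = list(infected)
    history = [list(infected)]
    n_patches = len(sizes)
    for _ in range(steps):
        prevalence = [i / n for i, n in zip(infected, sizes)]
        mean_prev = sum(prevalence) / n_patches
        updated = []
        for k in range(n_patches):
            # Force of infection mixes local prevalence with the all-patch mean.
            force = beta[k] * ((1 - coupling) * prevalence[k] + coupling * mean_prev)
            p_infect = 1.0 - math.exp(-force)
            susceptible = sizes[k] - infected[k]
            gained = sum(rng.random() < p_infect for _ in range(susceptible))
            lost = sum(rng.random() < recovery for _ in range(infected[k]))
            updated.append(infected[k] + gained - lost)
        infected = updated
        history.append(list(infected))
    return history


# Two hypothetical patches; infection starts in the first and can spill over.
run = simulate_patches([200, 200], [10, 0], [0.4, 0.4], 0.1, 0.2, steps=50)
print(run[-1])
```

With `coupling` set to zero the second patch can never become infected, which is the simplest illustration of why the ‘effective distance’ between patches matters so much.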

Chris Drakeley – the human infectious reservoir of malaria.  Says he also has no data (this turns out to be false) and is aiming for worst graphic award.  Mention Ross quote about gametocytes.  What is the infectious reservoir?  Older people may have chronic infections that last longer, which is important for elimination considerations.  Okell et al Nature Communications 2012, comparison of PCR-based methods of malaria detection to microscopic detection.  In all locations, found submicroscopic infections, in some locations these were the only infections found.  Higher proportion of submicroscopic infections in older age groups.  More sensitive methods => find more parasites, but influenced by method, target gene, blood volume sampled.  These low-density infections can infect mosquitos (though they infect fewer mosquitos)!  (He does have data after all.)   Hypothesise that fewer mosquitos bite children.  How does this change with time and season?  Table of infectious reservoir studies to date, but limited by lack of standardisation across studies.  Now AFIRM studies.  Now data from Burkina Faso.  130 randomly selected individuals from all age groups.  But this is high transmission season, nearly 100% prevalence of merozoites, gametocytes a bit less, and fewer gametocytes in older age groups.  By end of season, microscopically detectable infections have dropped somewhat.   We think there’s a transient buildup of transmission-blocking immunity over the season.  Younger children less infectious in peak of season whereas older children more infective in peak of season – but cautions not to read too much into this.  Which parasites are infecting mosquitos?  For aedes biting is positively correlated with body size and time indoors.  So adults get most of the bites, e.g. in one house, 49 year-old woman had almost all the bites, children had very few.   
Summary: submicroscopic infections are around and contribute to reservoir, understanding dynamics of these infections is crucial.

Questioner asks whether anaemia might contribute to change in infectiousness over season.  Speaker says it might do, doesn’t know.

Dominic Kwiatkowski – “Nextgen MalariaGEN”.  Talking about history of MalariaGEN.  2004 Accra meeting which planned the GWAS.  Applied to Grand Challenges.  We haven’t achieved the aim yet but we might be on to something – more later.  Got the award funded from Gates Foundation and the Wellcome Trust 50-50.  45 investigators in 21 countries.  At that time had no tools for getting at parasites so we ignored it.  We formed all our projects into four consortial projects.  Say what you wanted to do and get people to come and provide that data, analogy with many companies coming together to build a bridge.  The ‘sovereignty of clinical data rests with original collector’ term helped get buy in (but it has also made it hard to do things without asking everyone for permission.)  MalariaGEN data fellows.  Web tools to collate data.  Taken a long time to actually execute all of that.  Partly because we are very inefficient – but even if not it would have taken a long time.   An early criticism was, would we get standardised phenotypic data?  We’ve shown lots of consistency across sites.  And GWAS story Chris Spencer gave yesterday.  We now think it’s genuinely difficult because one of our starting assumptions was wrong: thought malaria had such a strong effect on the human genome we would find many loci, but in fact that doesn’t translate to associations.  We now hope putting parasite diversity in there is going to reveal these signals.  Now talking about sequencing of parasites.  LookSeq.  2009 advanced course.  TRAC study.  Panoptes.  Unfortunately not all of this data is public, which has made some data release issues difficult.  Now pushing for ‘Pf3k consortium’ which will be completely open data on > 3000 Pf samples.  Now Anopheles gambiae 1000G project.  Plasmodium Diversity Network.  What next? We have been generously funded by the Wellcome Trust but this funding is about to run out, and they and we want to know what next.  
Will be very exciting over next few years – data scale up, getting deep into the biology.  Ends with audience survey of what shall we do next?

That’s it (and it was good).  Now drinks.  Then dinner (with drinks).  Then human table football (with drinks).  Then some more drinks.  Then I crashed.



GEM 2014 – Day two

June 11, 2014

…I crawl out of bed and get on the bus bleary-eyed. Eat breakfast and we’re into day two (see also days one and three):

Session IV

Alfred Amambua Ngwa – Claims last night’s stout did not affect his ability to prepare this talk.  Gambia is an independent country inside Senegal.  Mutual migration between Gambia and surrounding country.  Highest malaria prevalence in interior parts of country.  Considerable variation over time in frequency of repeats and indels.  Multiplicity of infection decreasing over time.  Shows structure plot showing drastic change in structure by year in South Korean vivax populations.  Using ‘simple sequence repeats’ (SSR) based on sequencing.  He is arguing that microsatellite diversity is higher than for SNPs so it’s a more powerful tool for determination of fine-resolution population structure.  Now signatures of positive selection.  pfmdr1, pfcrt, dhfr, something on chromosome 6, he shows a regional plot of this and some candidate genes.  In Gambia they still have blood samples in the freezer from 1980.  Expects to see a spatial / geographical spread of selection signals.  Concludes.

Daouda Ndiaye – Burden of malaria (in Senegal).  Mortality / morbidity has dropped substantially in last 10 years due to ACTs, RDT, IRS.  Bar plot showing that highest incidence is in south of Senegal.  Coartem (=artemether+lumefantrine) study: therapeutic efficacy study.  Also Thies-Slap study of resistance loci.  2 days clearance time in 2011.  But longer times in 2012-13.  Increasing IC50 as well.  (Sorry, I’ve got distracted by this.) Concludes.  In response to question notes that collection starts at beginning of September until end of April – the rainy season.

Zamin Iqbal – Population analysis of surface antigens with diverged haplotypes.  Wants to use a reference-free approach.  De Bruijn graph cartoon.  Graph contains all the read data, not just reads that map, etc. Lessons for P.falciparum crosses: across core genome, diversity is relatively low, and subtelomeres still broadly inaccessible.  But some regions have incredibly high diversity, shows an example of such a region which does not map to the reference.  These come in families, e.g. MSP3 family.  Talking about DBLMSP (also called MSP3.4) and MSP3.8.  Haplotype plot of these genes shows dimorphic behaviour.  Why?  Don’t really know.  Is it actually a dimorphism?  Is it multimorphism?  Ok: apply cortex to worldwide sample set and construct a ‘catalogue’ of variation.  Mixed infections are difficult, so find all the variations that are in non-mixed infections first.  Multiple sequence align, then go back and query all the samples.  Focus on Ghana Gambia Guinea and Cambodia.  See one long haplotype cluster (‘class 1’).  Reichenowi matches falciparum very closely in the other haplotype cluster suggesting balancing selection.  Recombination is happening but prevented from happening in certain regions.  Dimorphism exists in all four populations.  However, we see too much mixture – unbelievable levels, so there must be a paralogue which they are now trying to locate.  Now MSP3.8.  Again a major cluster of long haplotypes, this time reichenowi is more distant.  Here mixture picture is more realistic.  Natural question – what does it bind to?  Gavin Wright’s team showed that both genes bind to Immunoglobulin M!  A bit weird because to explain balancing selection, we need different properties for both genes (I didn’t quite follow this) – but both bind to IgM.  Next steps: track down putative paralogue of MSP3.4.  Understand balancing selection story.

Philip says it’s striking how often dimorphism occurs in relation to merozoite surface proteins, maybe the paralogues are a mechanism of avoiding immune detection.  Zam says maybe but would expect the immune system to have found two genes by now, so maybe there’s something else (other than IgM) that is going on.  Questioner asks about whether a small inversion (like the anopheles ones) might lead to dimorphism at least in MSP3.8.  Zam not sure but thinks not, Jason says MSP3.4 is not an inversion.

Special Extra – Bryan (not sure which one) talking about PlasmoDB.  PlasmoDB goes faster now.  Also has sample metadata.  Lastly integrating analysis tools e.g. Gene Ontology enrichment.  That’s it!

Session V

This was the parasite resistance story session, unfortunately I missed it due to meetings.

Session VI

Ric Price & Sarah Auburn – They are talking about microsatellite-based studies like APMEN.  Samples around South Korea, central China, Solomon Islands, Malaysia, Indonesia.  I missed most of Ric Price’s talk but Sarah Auburn is now talking about challenges of microsatellite data.  Centralising of variant calling, and the vivaxgen platform.  All pops show moderately high diversity.  An upside is that only a few markers should suffice for fingerprinting.  “Polyclonality may provide a better gauge of transmission intensity and relapse.”  Geographic differentiation plot based on F_{ST} of microsatellites.  Suggests SNPs should do better (didn’t Alfred suggest the opposite above?)  Greater diversity in vivax than falciparum in Indonesia.  Clonal expansions also detected in Sabah.  Conclusion: SNPs should give us greater depth to look at epidemiology.  Now preliminary analysis of SNP data in 175 patient isolates.  Around 13 SNPs with \text{MAF} > 10\% enough to fingerprint all distinct isolates.  Shows PCA plots with decreasing numbers of SNPs.  Conclusions, genotyping is going to play a key role in epidemiology of P.vivax, and probably not many SNPs needed.
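
A back-of-envelope way to see why a handful of SNPs can go a long way for fingerprinting.  The sketch assumes haploid single-clone infections, independent biallelic SNPs and unrelated isolates, none of which was stated in the talk, and the function name is mine.

```python
def match_probability(mafs):
    """Chance that two unrelated haploid isolates agree at every SNP.

    mafs: minor allele frequency of each SNP, assumed independent.
    """
    prob = 1.0
    for p in mafs:
        # Two isolates match if both carry the minor or both the major allele.
        prob *= p * p + (1.0 - p) * (1.0 - p)
    return prob


print(match_probability([0.5] * 13))  # 0.0001220703125, i.e. 1 in 8192
print(match_probability([0.1] * 13))  # much larger: rarer alleles discriminate less
```

So what matters is not just the number of SNPs but how close each one is to 50% frequency.  Real panels also have to cope with mixed infections and with SNPs in linkage, which this ignores.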

A questioner (me) asks why the conclusion is different from Alfred’s conclusion earlier today (that microsatellites are better for getting at deep structure).  The answer is that SNPs provide many practical benefits in a multi-centre study like this – in terms of cost, uniformity of typing, etc.

Jonathan J Juliano – Talking about P.vivax chloroquine resistance.  In Pf, genetic crosses using chimps and other primates were used in determination of antimalarial loci.  In vitro tools for Pv cross are lacking.  So novel methods for studying the progeny en masse are needed.  Did ‘linkage group selection analysis’.  Take resistant + sensitive parent, get unselected recombinant progeny, apply selection pressure to get selected progeny, see this.  Analysing microsatellites in pre- and post-treatment samples (treating with Chloroquine).  Plot where upward / downward peaks are selection for resistant / sensitive allele.  Chromosome 1 and 5 show the same direction (upward) in all monkeys.  So did targeted sequencing of these chromosomes.  In 6/11 monkeys done so far, dip in sensitive allele frequency right on pvcrt.  No coding changes in the parental alleles here.  But there is an insertion (16bp repeat) in 5′ UTR, and SNPs and a 39bp insertion in 9th intron.  Looked at 5′ UTR in 88 Cambodian samples, there’s a distribution of these in the population.

Paul Divis – About P.knowlesi (which infects monkeys in Malaysia, but also humans sometimes.)  Malaria has dropped in Malaysia over last 20 years.  10 years ago first report of major knowlesi infections in humans, there have been more since then.  Picture of macaque and vector geographical distribution.  Tree of mtDNA sequences.  They think infections are due to zoonosis, since mtDNA sequencing shows similar patterns in macaques and humans.  Study developed microsatellite marker assays, 10 in total, specific for P.knowlesi, 5-21 alleles per locus.  Tested on >400 human isolates across Malaysia and 44 Macaques.  Macaques have high multiplicity of infection relative to humans.  Allele frequency distributions showing, I think, differences varying with geography – not much difference between Macaques and humans.  Diversity as measured by mean expected heterozygosity is pretty flat, pairwise F_{ST} < 0.1 within Borneo (except in Miri where slightly higher).  Peninsular pops have higher F_{ST} from Borneo.  F_{ST}/\left(1-F_{ST}\right) grows with geographic distance.  (Vectors do not fly across the sea.  And neither do monkeys.)  Something different about population structure in Miri.  Concludes.  Future work looking for genes under natural selection.
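
The F_{ST}/\left(1-F_{ST}\right) versus distance plot is the usual isolation-by-distance check.  Here is a small sketch of the calculation, assuming an ordinary least-squares line through the population pairs.  The pairwise values below are invented, not taken from the talk, and a real analysis would test the slope with a Mantel test because pairs are not independent.

```python
def linearised_fst(fst):
    """Transform F_ST to F_ST / (1 - F_ST)."""
    return fst / (1.0 - fst)


def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical population pairs: distance in km and pairwise F_ST.
distances_km = [50, 200, 400, 900, 1500]
pairwise_fst = [0.02, 0.04, 0.06, 0.12, 0.20]
slope, intercept = fit_line(distances_km, [linearised_fst(f) for f in pairwise_fst])
print(slope > 0)  # True: differentiation grows with distance in this toy data
```

A positive slope is what you expect when gene flow is mostly local, which fits the remark about neither vectors nor monkeys crossing the sea.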

Questioner asks the prevalence of infection in Macaques.  Answer is about 100% of monkeys have malaria parasites, about 98% have knowlesi.  There does not seem to be evidence of transmission between humans or from humans to monkeys (but for this to happen infected humans would have to stay out in the forest for long periods and they generally don’t, so maybe it’s just that we don’t see it.)

Session VII

Chris Spencer – About genetic susceptibility to severe malaria (SM).  12,000 SM cases and 17,000 population controls from 11 countries.  Talking about this and this.  And this.  HBB, ABO, ATP2B4 previously picked out in our data by region-based test (and are known to be real associations due to other groups).  Now have 11 pops with Illumina HumanOmni 2.5M array data.  The GWAS recipe.  Bayesian approach to meta-analysis, mentioning our poster which I’ll talk about tomorrow.  This approach lets us average over disease model, effect size, similarity of effect between populations and gives a single summary measure of association and the ability to compare models.  Manhattan.  New annotated locus is FREM3 (SMARCA5) which he’ll talk about.  Forest plot for four modes of inheritance and Bayesian posterior bar plot for ATP2B4 and ABO, both are dominance effects the same across populations.  Shows two unnamed regions that have a het model and opposite sign of effect in East and West Africa.  Regional association plot in the FREM3 region.   Forest plot in the full CP1 sample set showing discovery and replication evidence at directly Sequenom-typed SNP.  P<1\times 10^{-10}, and something of a cline with 10% frequency in Kenya and very low frequency in West Africa.   Now talking about Ellen’s paper about ancient balancing selection.  Alleles that arose before human/chimp split and have been maintained in both lineages since then, suggesting a process maintaining the polymorphism.  ABO is like this.  And so is FREM3 association region!  Want to do something more statistically formal.  1. define association regions, grouped into tier 1 (top 100 odd) and tier 2 (next 300 odd).  Distance to nearest ABS locus shows peak within ~50kb – most explained by ABO and FREM3 but other loci including ERMAP.  Now pathway analysis using GO as well as genes nearest ABS polymorphisms, ABS are right up there, so are I-set / Immunoglobulin-like pathways.  
Tier 2 similarly shows support for these pathways.  Similar analysis from Ellen’s paper pulls out similar pathways.  Simulating GWAS SNPs leads to P<1E-3 for this pathway.   Concludes.

Questioner asks how malaria can have led to ancient balancing selection if it probably jumped to humans quite recently.  Answer I think is pleiotropy (e.g. with other infectious agents) or action by other organisms (e.g. other infectious agents).  Are these actually the same answer?

Nicole Soranzo – About hematological scans.  Cardiometabolic quantitative phenotypes e.g. narrowing of arteries, down to measurements of individual molecular components to form deeply phenotyped cohorts.  DNA, RNA, Protein, risk factors, epigenetic modifications, metabolites.  Large scale meta-analysis.  Now hematopoiesis traits.  RBC, HGB, HCT, MCV, MCH, MCHC, PLT, MCV (so like this paper.)  HaemGen-RBC: 135,367 samples, 46 cohorts, 193 authors.  Giving flavour of results.  Have now identified over 150 genetic loci associated with myeloid development.  Known variants and new candidates.  Informed guess of functional genes, take them into functional analysis in mice or zebrafish.  ~100 genes with evidence.  Molecular mechanisms – a handful of variants have evidence of this.  Now INTERVAL study (N=50,000) & UK BioBank (N=500,000).  Looking ahead.  Theme 1: Regulatory variation in the hematopoietic system.  Look at histone methylation and acetylation, enhancer, open chromatin, transcription factor markings.  E.g: example SNP does not lie under any of these peaks but maybe in LD with one that does, a candidate functional SNP.  FAIRE-seq shows enrichment of signals in open chromatin.  Building epigenomic maps for entire blood system – the “Blueprint epigenome”.  Theme 2: about iPS cells, transfect them with transcription factors, drive megakaryocytic cells.  (I don’t really follow this.)  HipSci project. Theme 3 – large-scale genetic discoveries based on sequencing to fill the medium frequency / medium penetrance space not covered by family or GWAS studies.  But seems to find no new associations in space where we are highly powered.  Initial results for hematological traits.  Concludes.

Dominic asks about developing batteries of tests for other red cell traits, e.g. efficiency of red cell pumps, or red cell adherence.  Answer is, these complex things can not be done very simply, could call participants back to the clinic for tests, or could do cell based assays.  But beginning to be quite good at knowing which genes predict these phenotypes, so can probably make a good prediction to help target assays.

Manoj Duraisingh – About RBC determinants of plasmodium invasion.  Picture of merozoite invading an rbc.  Tight junction and apical re-orientation denoted.  Diagram of known receptor / ligand pairs in falciparum, vivax and knowlesi (Is this complete?  I always wonder why things like SEMA7A/MTRAP are not on this diagram).  Both parasite and human side of this interaction are polymorphic.  Lots of exciting big data is being generated!  Want to know functionality.  In vitro genetics in enucleated red blood cell.  Mature erythrocytes from CD34+ hematopoietic stem cells.  Takes 18 days and surfaces look very similar to real RBCs, and parasites can get in and grow and re-invade.  Extreme example: crocodile icefish.  What makes us human?  Slide about CMAH which has been lost a few times in evolution (this slide has Arnie on it for some reason.  Has he lost CMAH?)  Julian Rayner’s lab looked at binding Neu5Gc treated macaque RBCs.  Knowlesi prefers these, falciparum can’t invade them.  Tries chimp-ifying the human rbcs.  Not much effect on falciparum, made knowlesi much better at invasion.  Pkbeta and gamma bind to sialic acid on rhesus but not human RBCs.  So CMAH loss restricts P.knowlesi invasion into human RBCs.  These variations date to ~2-2.5mya.  But knowlesi has an intrinsic ability to adapt to culture in normal human RBCs.  Adapted lines lose CMAH dependence.  Argues that loss of some ligands results in the use of others, and speciation.  Going back to falciparum receptors, many are blood groups.  Strain-dependent use of glycophorin A (Bei et al JID 2010).  And Basigin.  Interrogating the RBCome.  Forward genetic screening for erythrocyte determinants of erythrocyte growth.  Identifying host determinants of malaria infection.  Find blood groups (c.f. Ruth Sanger ‘Blood groups in Man’).  Genes required for falciparum to invade.  CD44 (Indian blood group), CD55 (Cromer blood group), knockdowns lead to reduction in invasion.  Soluble CD55 can inhibit P.falciparum invasion.  Concludes.

George Busby – Talking about haplotype diversity in African populations.  Aim is to understand it in order to find regions with outlying patterns of ancestry, which may well be to do with malaria.  Data on 8 African populations from GWAS that Chris Spencer talked about, subsampled in ethnic groups.  Aim is to find how malaria has affected the genome.  Principal components analysis of Africa.  First PCs pull apart ethnolinguistic groupings and geographical differences.  Description of chromosome painting, giving genome-wide copying vector.  Pictures of members of our group with (made-up I hope) copying vectors.   Plot of FineSTRUCTURE matrix across all samples.  Some red on the off-diagonal => some copying across all groups.  Closeup to show that FineSTRUCTURE clusters populations (almost) perfectly according to self-reported ethnicity, due to subtle differences in copying vectors.  Now model populations as mixtures of others, helps to clean up the copying vectors.  Again most stuff is on the diagonal but there is some on the off-diagonal, e.g. Malawi copies from southern African populations.  Examples of cleaning up copying vectors.  Now talking about Fula, who in this analysis show some European coancestry.  HAPMIX plot with peaks of most European-like and most African-like ancestry.  Lactase region seems to show an excess of European ancestry.  Haplotype plot + haplotype homozygosity.  Closeup of HAPMIX analysis showing painting.  Region with most African ancestry is DARC (Duffy antigen).  Concludes.

Questioner asks about timeframe – can we detect very recent selection?  Answer is that there must have been selection since these European-like haplotypes entered Africa – so if we can date that we could find out.

That’s it for the day.  Now drinks.  And dinner.  (With drinks).  And then some drinks.  Then I miss the bus home again so we get a taxi.



GEM 2014 – Day one

June 8, 2014

I’m here at the Genetic Epidemiology of Malaria 2014 meeting.  We’ve had lunch, we’ve blethered, it’s time to start.  (See also days two and three).

Session I

Bronwyn introduces Dyann Wirth, who says it’s 12 years since the (P.falciparum) genome was published.  Audience is mostly composed of people who can look at any gene they want to online!  We are in the era of big data.  Also transition of non-biologists into the subject.  Last time that happened was 1953 – when the structure of DNA was discovered by Watson and Crick (working in the Physics dept.).  Audience survey of what stuff we know now that we didn’t before.  And what will we have soon?  Asks audience, who duly answer (but not all that well.  Hey, we’ve only just kicked off).

Now the talks:

Dan Hartl – Genetic signs on the road to malaria elimination.  When he started out there was lots of theoretical population genetics but not enough data to actually get at it.  That has changed at an astonishing rate.  Can popgen signals serve to gauge and guide progress in malaria elimination?  Talking about malaria in Senegal.  Enhanced intervention in 2007-8, number of cases fell substantially.  Would expect bottlenecks, increased drift, increased allele sharing, regions of IBD, inbreeding.  Shows a cartoon of this.  Are these signals strong enough to detect and can we scale up detection across a population?  Data on 190 parasites from patient blood from Pikine, Thies and Velingara.  Good quality (46x depth across 20million sites).  In this data PCA shows little evidence of large-scale population structure.  Develop 24-SNP strain barcode and look at frequencies through time.  Estimate Ne between years, 402 in 2006-7, but much smaller afterwards.  Barcodes show much higher allele sharing after intervention.  Now sequencing data, looking at 100kb windows, lots of IBD blocks shared between strains.  Shows distribution of length of shared blocks which he says is fit well by an exponential (to me it looks like there’s a fat tail.)  Nice plot of frequency of barcodes across multiple seasons.  They are attempting to model this.  Hope is that these signals can be interpreted well enough to lead to policy options.

Dominic asks whether high inbreeding is more conducive to emergence of drug resistance.  Secondly can you see emergence of drug resistance with this type of data?  Answer is yes, the prediction is that when population is inbred a resistance allele can very quickly take the whole genome with it.  But smaller effective population size means resistance alleles are less likely to arise, because of random loss of potentially favourable resistance mutations.

Ken Vernick – About Anopheles.  He’s an organiser but thanks Bronwyn for getting him to talk.  Talking about an overview of issues and problems that influence question of how far are we to vector control?  Mosquito is 80,000 times more deadly than the shark –  the world’s deadliest animal.  (Well – it’s malaria that kills, but let’s not split hairs.)  Malaria is durable partly because of the widespread vectorial system – multiple Anopheles species (inc. gambiae and funestus), lots of substructure, chromosomal (Bamako, Mopti, Savanna…) and molecular (M, S, Goundry forms) ecotypes, which like different habitats.  We will know much more in a year’s time because of Ag1000G project.  Frequent founder events with adaptive introgression.  Habitats also changing; nice plots of rainfall across central West Africa, and African lake levels looking back to 20,000 years ago with high humidity ~5-10,000 years ago.  Some founder populations are old, most are probably ephemeral.  Plots from Lee et al PNAS 2013.  This all sounds pretty complex.  Sampling methods are also inadequate and highly biased toward indoor-resting mosquitos (you “spray-bomb” a house and pick the mosquitos off a sheet.)  Outdoor resting mosquitos difficult to sample and little studied.  The gold standard would be capture of human-landing mosquitos – obviously difficult.  Other options are door / light / resting traps. Or sampling larvae.  But in West Africa there is not much water around => not much choice in where to lay eggs.  So larval collection should sample all genetic subgroups.  This is good, but disadvantage is that it lacks direct epidemiological information – do they bite?  Also misses any sense of assortative mating etc.  Shows structure plot with k=2 with indoor samples all of one group (the Goundry group), but larval samples of both groups.  More than half of S form mosquitos don’t go inside!

Also there are high standing levels of genetic resistance to P.falciparum – e.g. Pfin loci.  Highly penetrant, explain infectibility very well.  However, this genetic resistance is not partitioned between A.gambiae subgroups.  Now consequences.  Substructure renders mating-dependent control strategies (e.g. spread of transgenic mosquitos) unrealistic.  And there is ‘a diffuse and interconnected gene pool that confers exceptional long-term stability under harsh and fluctuating conditions to populations that consequently may prove difficult to manipulate or eradicate’.  There is horizontal transfer, as in bacteria.  Sampling is biased which is problematic. And transgenic approaches (even if they spread) are unlikely to be better than natural population resistance to plasmodium.

Questioner suggests that since subgroups do transfer genetic material, that might help spread of transgenic mosquitos.  Speaker acknowledges this but is still sceptical – ‘you’re talking about a lightning strike, and lightning strike is not the same as sound policy’.

Arjen M Dondorp – About artemisinin resistance.  Pleads that we engage more with policy makers.  Doom scenario that happened with chloroquine resistance (which spread across the world within about ten years.)  We were saved by the Chinese who discovered artemisinins.  But parasite clearance times are now getting worse in West Cambodia even for combination therapies (data from Thai-Myanmar border and Cambodia).  It is becoming untreatable.  Now talking about “Global Plan for Artemisinin Resistance Containment” – GPARC.  Says urgency is not yet there because these are low-transmission regions so mortality is not yet increasing hugely.  TRAC study finds slow-clearing parasites as far away as India.  Now genetics.  Ariey et al found mutations in the kelch gene ‘propeller’ domain lead to slow clearance.  Of 24 mutations in K13-propeller, almost all lead to slow parasite half-life.  Clear geographical distribution of these mutations – in fact you find these mutations even in African populations, although they do not seem to have slow clearance phenotype.  Now we have to include K13 in our working definition of artemisinin resistance.  This changes the picture – all ‘tier 1’ resistance regions have K13 mutations, but there appear to be K13 mutations in non-resistance regions of Thailand + Cambodia.  Is it spreading or popping up?  Kelch mutation strains in Myanmar are clearly diverged from the Cambodian K13 strains.  This is important because spread can perhaps be controlled more easily.  Maybe there is a genetic ‘backbone’ of permissive or compensatory background mutations.  So we need containment approach as well as good quality drugs in areas where resistance is not yet present.  To contain, we must eliminate the parasite in resistant regions (otherwise the few remaining parasites will be the most resistant).  And there is a parasite reservoir in asymptomatic infected individuals, so you must treat everyone.

Mechanism of resistance is in ring stage parasites that are normally killed by artemisinin.  Leads to deceleration in ring stage development, suggesting that longer application of artesunate will still be effective – and it is.  Concludes.

Session II – short talks by travel awards recipients.  I didn’t try to report these.

Session III – Genomes and genome variation

Matt Berriman – Talking about the plasmodium genome.  Different genomes.  P.falciparum arose probably from Gorilla parasites.  P.reichenowi is the closest known parasite to falciparum.  Set of genes with highest polymorphisms / diversity ratio is enriched for genes involved in invasion.  Diagram of Rh2a / Rh2b / Rh6 / Rh7 locus between three plasmodium species.  falciparum has lost Rh6 and Rh7 and gained Rh2a relative to reichenowi.  (Interesting because Rh2a/b is thought to be involved in invasion.  I think a receptor is unknown.)  Suggests there is a change in expression levels of, particularly, pir genes in mosquito-transmitted versus in vitro parasites in P.chabaudi.  Parasites seem to be substructured in terms of type of pir genes – mainly short form or long form.  Plasmodium gallinaceum (chicken malaria) is an outgroup to laverania (and looks different from other malarias).  But characteristic var gene structure seems confined to laverania.

Will Hamilton – begins by talking about mutation in fairly general terms.  Then talks about his clone tree, mentioning Richard Lenski’s E.coli mutation experiment which has run since 1988 and tens of thousands of generations.  Three clone trees of lab strains, also two artemisinin-resistant field isolates from Cambodia.  Over 1,329 days in culture overall and 287 whole genome sequences – lots of data.  SAMtools/bcftools for SNP calling, Delly for structural variants, GATK for indels.  Total of 89 SNPs, point mutation rate pretty similar across strains – no evidence of higher mutation rate in resistant strains.  G/C -> A/T transitions are over-represented in this data, giving Ts:Tv=1.42.  This would reach equilibrium at about 19% GC content which is close to measurements.  Notes methylation, UV exposure, oxidation and bias in DNA repair can lead to G/C -> A/T mutations.  But looked and saw no evidence of the first two.  Now indels, they occur at a high rate in P.falciparum.  Shows an insertion in LookSeq.  Indels seem to occur at higher rate in resistant strains (I am not sure how significant this is).  Indels are caused by DNA polymerase slippage; consistent with this, called indels are in highly repetitive A/T rich sequences.  DNA repair mechanism may also be involved.  Now structural variation, which occurs almost entirely in var genes including var genes in both the telomeres and subtelomeric regions.  Non-allelic recombination is generating chimeric var genes.  Conclusions: in a ‘clonal’ infection with 10^10 parasites, after 48 hours there will be ~100 million base pair substitutions, 500 million indels, ~20 million recombinant var gene exon 1s.  E.g. C580Y kelch mutation happens 5 times in 48hrs in any such infection.
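An aside on the equilibrium GC claim: under a simple two-state mutation model (my assumption; I don't know what model the speaker actually used), the GC fraction settles where the flux of G/C -> A/T mutations balances the flux of A/T -> G/C mutations.  A minimal sketch, with a made-up rate ratio back-calculated to give roughly 19% rather than taken from the talk:

```python
def equilibrium_gc(gc_to_at_rate, at_to_gc_rate):
    """Equilibrium GC fraction f under a two-state mutation model.

    gc_to_at_rate: per-site rate at which G/C sites mutate to A/T.
    at_to_gc_rate: per-site rate at which A/T sites mutate to G/C.
    At equilibrium the two fluxes balance:
        f * gc_to_at_rate == (1 - f) * at_to_gc_rate
    which rearranges to the expression returned below.
    """
    return at_to_gc_rate / (gc_to_at_rate + at_to_gc_rate)


# Hypothetical ratio (not a number from the talk): G/C -> A/T about
# 4.26 times as frequent per site as A/T -> G/C.
print(round(equilibrium_gc(4.26, 1.0), 3))
```

Only the ratio of the two rates matters here, so observed mutation counts would first need dividing by the number of G/C and A/T sites respectively.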

A questioner points out that the resistant parasites have arisen on a KH1 background, so it’s those that are important (and they did show higher mutation rates).

Elizabeth Winzeler – sequencing of P.vivax multiclone infections.  Primaquine is not a great drug, it kills mice, people don’t generally want to take it.  Blood samples from infected man in 2009 and 2011 representing relapse of same parasite.  Relapses appear to be of a single strain from original mixed infection.  But clones in initial infection were probably meiotic siblings.  Now talking about samples from a gold mining area in Peru which looks like this.  Chloroquine and Primaquine are in use.  Use paired-end and mate pairs to identify individual clones from a multi-clonal infection.  This involves sequencing ~15kb pieces of DNA circularised with tags on. Shows signals of selection in expected genes.  With deep sequencing they could identify all four individual clones from initial infection.  Recombination rates seem similar to those in in vitro culture.  Strains seem to have diverged ~6.7 years or 817 generations ago.  Conclusions: vivax popgen may be more complex than thought, meiotic and mitotic recombination is occurring.  Will this be seen in P.falciparum infections?

Alistair Miles – About Anopheles gambiae.  Jim Stalker’s plot of base pairs generated over time – the last twelve months generated twice as much data as all of the time before that.  Now have 3995 whole mosquito genomes of about 30X, most from the Ag1000G project, also lab cross project and other collaborations.  Martin Donnelly and Taigo will talk about Ag1000G on Tuesday.  This is a funny talk with some personal anecdotes, like the one about Alistair’s dad who also worked on Anopheles, and it also goes into the history of understanding of Anopheles speciation.  There is speciation but, it turns out, also hybridisation and gene flow between ‘species’ and subspecies.  Talking about the large inversions that occur in anopheles.  These inversions prevent recombination between mutually inverted haplotypes.  Genome size – about 250Mb.  And anopheles has substantial genetic diversity.  There may be one SNP every 3-5 bases!  Now showing panoptes for anopheles data, which is newly released on the MalariaGEN website.

Dinner!  With wine!

Then speed dating!  With beer!

Then the bus goes home without us.  So we drink more and get a taxi.




Multinomial logistic regression in SNPTEST

May 30, 2014

I’ve been implementing multinomial logistic regression in SNPTEST (extending the existing binary logistic regression), testing it by comparison with multinom() from the nnet package in R.  (So far I’ve done some simple simulations: sample some effect sizes, then use a set of simulated genotypes to generate a phenotype for 1000 individuals.  Then I try to estimate effect sizes again using SNPTEST and nnet.  So far I’ve done just a few SNPs, for between 2 and 10 outcomes.)
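For the curious, the simulation step can be sketched roughly as follows.  This is an illustrative Python sketch, not the code I actually ran: the allele frequency, the normal distribution for effect sizes, the Hardy-Weinberg genotype sampling and all the names are assumptions made for the example.

```python
import math
import random


def simulate_phenotypes(genotypes, intercepts, effects, rng):
    """Draw one outcome per individual from a multinomial logit model.

    Outcome 0 is the baseline.  Outcome k >= 1 has log-odds
    intercepts[k-1] + effects[k-1] * genotype relative to the baseline.
    """
    phenotypes = []
    for g in genotypes:
        # Unnormalised weights; the baseline has weight exp(0) = 1.
        weights = [1.0] + [math.exp(a + b * g) for a, b in zip(intercepts, effects)]
        phenotypes.append(rng.choices(range(len(weights)), weights=weights)[0])
    return phenotypes


rng = random.Random(1)
n_individuals = 1000
n_outcomes = 4          # assumed; anything between 2 and 10 would do
allele_frequency = 0.3  # assumed

# Genotypes as counts (0, 1 or 2) of the non-reference allele,
# sampled as two independent alleles per individual.
genotypes = [
    sum(rng.random() < allele_frequency for _ in range(2))
    for _ in range(n_individuals)
]

# "True" parameters, one intercept and one effect per non-baseline outcome.
true_intercepts = [rng.gauss(0.0, 0.5) for _ in range(n_outcomes - 1)]
true_effects = [rng.gauss(0.0, 0.5) for _ in range(n_outcomes - 1)]

phenotypes = simulate_phenotypes(genotypes, true_intercepts, true_effects, rng)
```

The simulated genotypes and phenotypes would then be written out and fed to both programs, and the estimates compared with true_effects.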

The results seem ok: here are two plots, the left shows the SNPTEST-estimated effect size and confidence interval versus the true effect size (red confidence intervals are those that don’t contain the true effect), while the right shows SNPTEST’s estimates versus those from nnet:


i.e. it does what it should.  One thing that isn’t represented in these comparisons is that SNPTEST’s model should handle imputation uncertainty.

There’s a well-known problem with Newton-Raphson iterations for multinomial regression – computation of the second derivative is like O(M^2) if there are M possible outcomes so it gets sloooow as the number of outcomes increases.  My code suffers from that problem at the moment – there are ways round it that can be explored, though.
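To make the scaling concrete, here is a rough sketch of the second-derivative computation for a multinomial logit model with a baseline outcome.  It is plain Python written for illustration under my own assumptions (function names, parameter layout), not SNPTEST’s implementation.  With M outcomes and P predictors the matrix has (M-1)*P rows and columns, so the number of entries to fill grows with the square of M.

```python
import math


def outcome_probabilities(x, betas):
    """Multinomial logit probabilities for one individual.

    x is the predictor vector (length P); betas holds one coefficient
    vector per non-baseline outcome (M - 1 vectors of length P).
    """
    weights = [1.0] + [math.exp(sum(b * v for b, v in zip(beta, x))) for beta in betas]
    total = sum(weights)
    return [w / total for w in weights]


def negative_hessian(X, betas):
    """Negative second derivative of the log-likelihood.

    Returned as a square matrix (list of lists) of size (M - 1) * P.
    """
    n_free = len(betas)        # M - 1 non-baseline outcomes
    n_params = len(betas[0])   # P predictors
    size = n_free * n_params
    H = [[0.0] * size for _ in range(size)]
    for x in X:
        p = outcome_probabilities(x, betas)[1:]
        # One P x P block per pair of outcomes (j, k): (M - 1)^2 blocks.
        for j in range(n_free):
            for k in range(n_free):
                w = p[j] * ((1.0 if j == k else 0.0) - p[k])
                for a in range(n_params):
                    for b in range(n_params):
                        H[j * n_params + a][k * n_params + b] += w * x[a] * x[b]
    return H


# Tiny made-up design: an intercept column plus a genotype column.
X = [[1.0, float(g)] for g in (0, 1, 2, 1, 0)]
for n_outcomes in (2, 5, 10):
    betas = [[0.0, 0.0] for _ in range(n_outcomes - 1)]
    H = negative_hessian(X, betas)
    print(n_outcomes, "outcomes:", len(H) * len(H), "matrix entries")
```

Each Newton-Raphson step also has to solve a linear system of that size, which adds further cost on top of filling in the matrix.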

Unconscious bias

May 22, 2014

Just been to an HR presentation on unconscious bias – turns out I’ve an unconscious bias against HR presentations.  (All right, a conscious one.)  Let’s tick the stereotypes:

  • Bullet-pointed powerpoint slides.  Tick!
  • Lack of data.  Tick!  (Well, there was one table of data, namely, the data on intelligence from this page, which apparently shows that Brummies and Liverpudlians are perceived as lacking intelligence.  This was shown for about 5 seconds without further discussion.)
  • Come to think of it, there’s that joke about the Mersey Tunnel but, well, never mind.  Tick!
  • Solicitation of audience discussion.  Tick!
  • …but failure to get much response…   Tick!
  • …probably because of the insistence on presenting meaningless abstractions…  Tick!
  • …while avoiding discussion of any issues that might be emotive or, well, interesting.  Tick!

Got stereotypes?  Challenge them.

All right, this course encouraged me to challenge stereotypes, so here are some suggestions for you, the HR department. (You do read my blog, don’t you?)  Make some provocative statements!  Base them on real data!  95% of all clerical staff in the building are female!  But 80% of group heads are male!  (I made those numbers up.  What are the real numbers?)  Does unconscious bias contribute to those numbers?   Does conscious bias?  Can we do anything about that?  Should we?  Should you?  Reference the debate!  What do we think of publication bias?  Is the peer-review system even more biased?  Is Oxford University biased against black students?  Are Liverpudlians under-represented in academia?  Does that even make sense?  Make us unsure!  Make us wonder!  Make us laugh!  Make us cry!  Make us unbiased!

Kyoto feet

April 7, 2014

Kyoto feet 5
Kyoto feet 4
Kyoto feet 1
Kyoto feet 2
Kyoto feet 3

Mount Fuji taken from the train

April 7, 2014

Fuji from the train 5
Fuji from the train 2
Fuji from the train 4
Fuji from the train 2