Machine learning tools help unravel why human accelerated regions evolved so quickly
Humans and chimpanzees differ in only one percent of their DNA. Human accelerated regions (HARs) are parts of the genome with an unexpectedly large number of these differences. HARs were stable in mammals for millions of years but changed quickly in early humans. Scientists have long wondered why these bits of DNA changed so much, and how the variations set humans apart from other primates.
Now, researchers at Gladstone Institutes have analyzed thousands of human and chimpanzee HARs and discovered that many of the changes that accumulated during human evolution had opposing effects.
"This helps answer a longstanding question about why HARs evolved so quickly after being frozen for millions of years. An initial variation in a HAR might have turned up its activity too much, and then it needed to be turned down."
Katie Pollard, PhD, director of the Gladstone Institute of Data Science and Biotechnology and lead author of the new study
The findings, she says, have implications for understanding human evolution. In addition, because she and her team discovered that many HARs play roles in brain development, the study suggests that variations in human HARs could predispose people to psychiatric disease.
"These results required cutting-edge machine learning tools to integrate dozens of novel datasets generated by our team, providing a new lens to examine the evolution of HAR variants," says Sean Whalen, PhD, first author of the study and senior staff research scientist in Pollard's lab.
Enabled by machine learning
Pollard discovered HARs in 2006 when comparing the human and chimpanzee genomes. While these stretches of DNA are nearly identical among all humans, they differ between humans and other mammals. Pollard's lab went on to show that the vast majority of HARs are not genes but enhancers, regulatory regions of the genome that control the activity of genes.
More recently, Pollard's group wanted to study how human HARs differ from chimpanzee HARs in their enhancer function. In the past, this would have required testing HARs one at a time in mice, using a system that stains tissues when a HAR is active.
Instead, Whalen input hundreds of known human brain enhancers, and hundreds of other non-enhancer sequences, into a computer program so that it could identify patterns that predicted whether any given stretch of DNA was an enhancer. Then he used the model to predict that a third of HARs control brain development.
"Basically, the computer was able to learn the signatures of brain enhancers," says Whalen.
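To make the idea of learning enhancer "signatures" concrete, here is a minimal sketch of a sequence classifier. It is not the model used in the study, which the article does not describe in detail. As a stand-in, it scores short DNA words (k-mers) with naive Bayes-style log-odds. The toy sequences, the function names, and the choice of k = 3 are all assumptions made for illustration.

```python
from collections import Counter
from math import log

def kmer_counts(seq, k=3):
    """Count every overlapping k-mer (DNA word of length k) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def train_log_odds(positives, negatives, k=3, pseudocount=1.0):
    """Learn one log-odds weight per k-mer: positive means enhancer-like."""
    pos, neg = Counter(), Counter()
    for s in positives:
        pos.update(kmer_counts(s, k))
    for s in negatives:
        neg.update(kmer_counts(s, k))
    vocab = set(pos) | set(neg)
    # Pseudocounts keep the logarithm defined for k-mers seen in only one class.
    pos_total = sum(pos.values()) + pseudocount * len(vocab)
    neg_total = sum(neg.values()) + pseudocount * len(vocab)
    return {
        kmer: log((pos[kmer] + pseudocount) / pos_total)
              - log((neg[kmer] + pseudocount) / neg_total)
        for kmer in vocab
    }

def score(seq, weights, k=3):
    """Sum the learned weights over a sequence; unseen k-mers contribute 0."""
    return sum(weights.get(kmer, 0.0) * n
               for kmer, n in kmer_counts(seq, k).items())

# Toy training data (hypothetical, not real enhancer sequences).
enhancers = ["ACGTGATAAGC", "TTGATAAGGCA", "CGATAAGTTAC"]
non_enhancers = ["ACGTCCCGGGC", "TTCCCGGGGCA", "CGCCCGGTTAC"]

weights = train_log_odds(enhancers, non_enhancers)
enhancer_like = score("AAGATAAGAA", weights)  # shares k-mers with the enhancers
background_like = score("GGCCCGGGTT", weights)  # shares k-mers with the negatives
```

A score above zero means the sequence looks more like the enhancer examples than the background examples. A real model would be trained on hundreds of sequences, as the article describes, and would likely use richer features than k-mer counts.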
Knowing that each HAR has multiple differences between humans and chimpanzees, Pollard and her team asked how individual variants in a HAR affected its enhancer strength. For instance, if eight nucleotides of DNA differed between a chimpanzee and human HAR, did all eight push the enhancer in the same direction, making it either stronger or weaker?
"We've wondered for a long time if all the variants in HARs were required for it to function differently in humans, or if some changes were just hitchhiking along for the ride with more important ones," says Pollard, who is also chief of the division of bioinformatics in the Department of Epidemiology and Biostatistics at UC San Francisco (UCSF), as well as a Chan Zuckerberg Biohub investigator.
To test this, Whalen applied a second machine learning model, which was originally designed to determine if DNA differences from person to person affect enhancer activity. The computer predicted that 43 percent of HARs contain two or more variants with large opposing effects: some variants in a given HAR made it a stronger enhancer, while other changes made the HAR a weaker enhancer.
This result surprised the team, who had expected that all changes would push the enhancer in the same direction, or that some "hitchhiker" changes would have no impact on the enhancer at all.
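The "opposing effects" criterion can be sketched in a few lines. The example assumes each variant already has a predicted effect score from some model, with positive values strengthening the enhancer and negative values weakening it. The scores, the HAR names, and the threshold for a "large" effect are hypothetical; the article does not give the study's actual cutoff.

```python
def has_opposing_effects(effects, threshold=1.0):
    """True if at least one variant strongly raises and one strongly lowers activity."""
    strong_up = any(e >= threshold for e in effects)
    strong_down = any(e <= -threshold for e in effects)
    return strong_up and strong_down

# Hypothetical per-variant predicted effects for three made-up HARs.
predicted = {
    "HAR_a": [1.8, -1.5, 0.1],   # one strong increase, one strong decrease
    "HAR_b": [1.2, 0.9, 0.2],    # all changes push in the same direction
    "HAR_c": [-0.1, 0.05],       # only small, "hitchhiker"-like effects
}

flagged = sorted(name for name, effects in predicted.items()
                 if has_opposing_effects(effects))
fraction = len(flagged) / len(predicted)
```

Here only HAR_a is flagged. Applied across all HARs, the same kind of tally would give the fraction with large opposing effects, which the study reports as 43 percent.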
Measuring HAR strength
To validate this compelling prediction, Pollard collaborated with the laboratories of Nadav Ahituv, PhD, and Alex Pollen, PhD, at UCSF. The researchers fused each HAR to a small DNA barcode. Each time a HAR was active, enhancing the expression of a gene, the barcode was transcribed into a piece of RNA. Then, the researchers used RNA sequencing technology to analyze how much of that barcode was present in any cell, indicating how active the HAR had been in that cell.
"This method is much more quantitative because we have exact barcode counts instead of microscopy images," says Ahituv. "It's also much higher throughput; we can look at hundreds of HARs in a single experiment."
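The barcode readout described above amounts to counting. The sketch below tallies barcodes per HAR and turns the counts into an activity score. Normalizing RNA counts by DNA counts is a common practice in reporter assays and is an assumption here, since the article mentions only RNA barcode counts. The barcode sequences, names, and pseudocount are likewise invented for illustration.

```python
from collections import Counter
from math import log2

def count_barcodes(reads, barcode_to_har):
    """Tally reads per HAR, skipping barcodes that match no known HAR."""
    counts = Counter()
    for barcode in reads:
        har = barcode_to_har.get(barcode)
        if har is not None:
            counts[har] += 1
    return counts

def activity(rna_counts, dna_counts, pseudocount=1.0):
    """log2 ratio of RNA to DNA barcode counts (assumed normalization)."""
    return {
        har: log2((rna_counts[har] + pseudocount) / (dna_counts[har] + pseudocount))
        for har in dna_counts
    }

# Hypothetical barcodes, assumed to be already extracted from sequencing reads.
barcode_to_har = {"AAAC": "HAR_human", "GGTT": "HAR_chimp"}
rna_reads = ["AAAC"] * 7 + ["GGTT"] * 1 + ["NNNN"]  # "NNNN" is unassigned
dna_reads = ["AAAC"] * 3 + ["GGTT"] * 3

rna_counts = count_barcodes(rna_reads, barcode_to_har)
dna_counts = count_barcodes(dna_reads, barcode_to_har)
scores = activity(rna_counts, dna_counts)
```

In this toy example the human version scores higher than the chimpanzee version, the kind of comparison the experiment makes for each HAR.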
When the group carried out their lab experiments on over 700 HARs in precursors to human and chimpanzee brain cells, the data matched what the machine learning algorithms had predicted.
"We might not have discovered human HAR variants with opposing effects at all if the machine learning model hadn't produced these startling predictions," says Pollard.
Implications for understanding psychiatric disease
The idea that HAR variants played tug-of-war over enhancer levels fits in well with a theory that has already been proposed about human evolution: that the advanced cognition in our species is also what has given us psychiatric diseases.
"What this kind of pattern indicates is something called compensatory evolution," says Pollard. "A large change was made in an enhancer, but maybe it was too much and led to harmful side effects, so the change was tuned back down over time. That's why we see opposing effects."
If initial changes to HARs led to increased cognition, perhaps subsequent compensatory changes helped tune back down the risk of psychiatric diseases, Pollard speculates. Her data, she adds, can't directly prove or disprove that idea. But in the future, a better understanding of how HARs contribute to psychiatric disease could shed light not only on evolution but also on new treatments for these diseases.
"We can never wind the clock back and know exactly what happened in evolution," says Pollard. "But we can use all these scientific techniques to simulate what might have happened and identify which DNA changes are most likely to explain unique aspects of the human brain, including its propensity for psychiatric disease."
Whalen, S., et al. (2023) Machine learning dissection of Human Accelerated Regions in primate neurodevelopment. Neuron. doi.org/10.1016/j.neuron.2022.12.026.