Computational analysis reveals that many repetitive sequences are shared between proteins and are similar across species, from bacteria to humans

About 70 percent of human proteins contain at least one sequence consisting of a single amino acid repeated many times, with a few other amino acids sprinkled in. These “low-complexity regions,” or LCRs, are also found in most other organisms.

Proteins containing these sequences have many different functions, but biologists at MIT have now found a way to identify and study them as a unified set. Their technique allows them to analyze the similarities and differences between LCRs from different species, and helps them determine the functions of these sequences and the proteins in which they are found.

Using their method, the researchers analyzed all of the proteins found in eight different species, from bacteria to humans. They found that although LCRs can differ between proteins and species, they often share a similar role: helping the protein in which they are found to join a larger-scale assembly such as the nucleolus, an organelle found in nearly all human cells.

“Instead of looking at specific LCRs and their functions, which may appear distinct because they are involved in different processes, our broader approach allows us to see similarities between their properties, suggesting that LCR functions may not be so dissimilar after all,” says Byron Lee, an MIT graduate student.

The researchers also found some differences between LCRs of different species and showed that these species-specific LCR sequences correspond to species-specific functions, such as the formation of plant cell walls.

Lee and graduate student Nima Jaberi-Lashkari are the lead authors of the study, which appears today in eLife. Eliezer Calo, an assistant professor of biology at MIT, is the senior author of the paper.

Large-scale study

Previous research has revealed that LCRs are involved in a variety of cellular processes, including cell adhesion and DNA binding. These LCRs are often rich in a single amino acid such as alanine, lysine, or glutamic acid.

Finding these sequences and then studying their functions individually is a time-consuming process, so the MIT team decided to use bioinformatics, an approach that uses computational methods to analyze large sets of biological data, to evaluate them as a larger group.

“What we wanted to do was take a step back and, instead of looking at individual LCRs, try to look at all of them and see if we could notice some larger patterns that might help us understand what the ones that have assigned functions are doing, and also help us learn more about what the ones that don’t have assigned functions are doing,” Jaberi-Lashkari says.

To do this, the researchers used a technique called a dotplot matrix, a way of visually representing amino acid sequences, to generate an image of each protein under study. They then used computational image processing methods to compare thousands of these matrices at the same time.
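To make the idea of a dotplot concrete, here is a minimal Python sketch that compares a protein sequence against itself and marks every pair of matching positions. It is only an illustration, not the researchers' implementation: the function name `self_dotplot` and the toy sequence are assumptions made for this example, and the image processing step described above is not reproduced.

```python
import numpy as np


def self_dotplot(sequence: str) -> np.ndarray:
    """Return a square 0/1 matrix where entry (i, j) is 1 when residues i and j match."""
    residues = np.array(list(sequence))
    # Broadcasting compares every position with every other position.
    return (residues[:, None] == residues[None, :]).astype(np.uint8)


# Hypothetical toy sequence: a lysine-rich (K) stretch flanked by mixed residues.
toy_sequence = "MSTAKKKKEKKKKQLV"
matrix = self_dotplot(toy_sequence)

# Print the matrix as text; '#' marks a match and '.' a mismatch.
for row in matrix:
    print("".join("#" if cell else "." for cell in row))
```

In the printed grid, the repetitive lysine stretch shows up as a dense block of matches around the diagonal, while the mixed residues at either end contribute only isolated marks. That block-like signature is what makes repetitive regions stand out in this kind of representation.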

Using this technique, the researchers were able to categorize LCRs based on which amino acids were most frequently repeated within them. They also grouped LCR-containing proteins by the number of copies of each LCR type found in the protein. Analyzing these attributes helped the researchers learn more about the functions of the LCRs.
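The two attributes described here, the most frequent amino acid in a region and the number of copies of each region type per protein, can be illustrated with a short Python sketch. This is a simplified stand-in rather than the method used in the study: it assumes the regions have already been identified, and the function names and example regions are hypothetical.

```python
from collections import Counter


def dominant_residue(region: str) -> str:
    """Return the most frequent amino acid (one-letter code) in a region."""
    return Counter(region).most_common(1)[0][0]


def lcr_copy_numbers(regions: list[str]) -> Counter:
    """Count how many regions of each dominant-residue type a protein has."""
    return Counter(dominant_residue(region) for region in regions)


# Hypothetical, hand-written regions standing in for LCRs found in one protein.
example_regions = ["KKKKEKKKK", "KKAKKKSKK", "EEEEDEEEE"]
copy_numbers = lcr_copy_numbers(example_regions)
print(copy_numbers)
```

For this toy input, the protein is counted as having two lysine-rich (K) regions and one glutamic acid-rich (E) region, the kind of per-type copy number the article refers to.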

As a demonstration, the researchers selected a human protein, known as RPA43, that contains three lysine-rich LCRs. This protein is one of several subunits that make up an enzyme called RNA polymerase 1, which synthesizes ribosomal RNA. The researchers found that the copy number of lysine-rich LCRs is important in helping the protein integrate into the nucleolus, the organelle responsible for synthesizing ribosomes.

Biological assemblies

In a comparison of proteins found in eight different species, the researchers found that certain types of LCRs are highly conserved across species, meaning that the sequences have changed very little over evolutionary timescales. These sequences tend to be found in highly conserved proteins and cellular structures, such as the nucleolus.

“It appears that these LCRs are important for the assembly of certain parts of the nucleolus,” Lee says. “Some of the principles known to be important for higher-order assembly seem to be at play, because the copy number, which may control how many interactions a protein can make, is important for the protein to integrate into that compartment.”

The researchers also found differences between the LCRs seen in two different types of proteins involved in nucleolar assembly. They found that a nucleolar protein known as TCOF contains many glutamic acid-rich LCRs that can help scaffold the formation of assemblies, while nucleolar proteins with only a few of these glutamic acid-rich LCRs can be recruited as clients (proteins that interact with the scaffold).

Another structure that appears to contain many conserved LCRs is the nuclear speckle, which is found inside the cell nucleus. The researchers also found many similarities between LCRs that are involved in forming larger-scale assemblies such as the extracellular matrix, a network of molecules that provides structural support to cells in plants and animals.

The research team also found examples of structures with LCRs that appear to have diverged between species. For example, plants have distinctive LCR sequences in the proteins they use to scaffold their cell walls, and these LCRs are not seen in other types of organisms.

The researchers now plan to extend their LCR analysis to other species.

“There is a lot to explore, because we can extend this map to almost any species,” Lee says. “This gives us the opportunity and the framework to identify new biological assemblies.”

The research was funded by the National Institute of General Medical Sciences, the National Cancer Institute, the Ludwig Center at MIT, a pre-doctoral training grant from the National Institutes of Health, and the Pew Charitable Trusts.
