Genes on steroids? – Science-Education-Research

The high density of science communication

Keith S. Taber

Original photograph by Sabine Mondestin, double helix representation by OpenClipart-Vectors, from Pixabay

One of the recurring themes in this blog is the way science is communicated in teaching and through media, and in particular the role of language choices in effective communication.

I was listening to a podcast of the BBC Science in Action programme episode 'Radioactive Red Forest'. The item that especially attracted my attention (no, not the one about teaching fish to do sums) was summarised on the website as:

"Understanding the human genome has reached a new milestone, with a new analysis that digs deep into areas previously dismissed as 'junk DNA' but which may actually play a key role in diseases such as cancer and a range of developmental conditions. Karen Miga from the University of California, Santa Cruz is one of the leaders of the collaboration behind the new findings."
Website description of an item on 'Science in Action'

BBC 'Science in Action' episode first broadcast 3rd April 2022

They've really sequenced the human genome this time

The introductory part of this item is transcribed below.

Being 'once a science teacher, always a science teacher' (in mentality, at least), I reflected on how this dialogue is communicating important ideas to listeners. Before I comment in any detail, you may (and this is entirely optional, of course) wish to read through and consider:

What does a listener (reader) need to know to understand the intended meanings in this text?
What 'tactics', such as the use of figures of speech, do the speakers use to support the communication process?

Roland Pease (Presenter): "Good news! They sequenced, fully sequenced, the human genome.
'Hang on a minute,' you cry, 'you told us that in 2000, and 2003, and didn't I hear something similar in 2013?' Well, yes, yes, and yes, but no.
A single chromosome stretched out like a thread of DNA could be 6 or 8 cm long. Crammed with three hundred million [300 000 000] genetic letters. But to fit one inside a human cell, alongside forty five [45] others for the complete set, they each have to be wound up into extremely tight balls. And some of the resulting knots it turns out are pretty hard to untangle in the lab, and the genetic patterns there are often hard to decode as well. Which is what collaboration co-leader Karen Miga had to explain to me, when I also said 'hang on a minute'."
Karen Miga: "The celebrated release of the finished genome back in 2003 was really focused on the portions that we could at the time map and assemble. But there were big persistent gaps. Roughly about two hundred million [200 000 000] bases long that were missing. It was roughly eight percent [8%] of the genome was missing."
"And these were sort of hard to get at bits of genome, I mean are they like trying to find a coin in the bottom of your pocket that you can't quite pull out?"
"These regions are quite special, we think about tandem repeats or pieces of sequences that are found in a head-to-tail orientation in the genome, these are corners of our genome where this is just on steroids, where we see a tremendous amount of tandem repeats sometimes extending for ten million [10 000 000] bases. They are just hard to sequence, and they are hard to put together correctly and that was – that was the wall that the original human genome project faced."
Introduction to the item on the sequencing of what was known as 'junk DNA' in the (a) human genome

I have sketched out a kind of 'concept map' of this short extract of dialogue:

A mapping of the explicit connections in the extract of dialogue (*ignoring* connections and synonyms that a knowledgeable listener would have available for making sense of the talk)

Read about concept maps

Prerequisite knowledge

In educational settings, teachers' presentations are informed by background information about the students' current levels of knowledge. In particular, teachers need to be aware of the 'prerequisite knowledge' necessary for understanding their presentations. If you want to be understood, then it is important your listeners have available the ideas you will be relying on in your account.

A scientist speaking to the public, or a journalist with a public audience, will be disadvantaged in two ways compared to the situation of a teacher. The teacher usually knows about the class, and the class is usually not as diverse as a public audience. There might be a considerable diversity of knowledge and understanding among the members of, say, the 13-14 year old learners in one school class, or the first year undergraduates on a university course – but how much more variety is found in the readership of a popular science magazine or the audience of a television documentary or radio broadcast.

Here are some key concepts referenced in the brief extract above:

bases
cells
chromosome
DNA
sequencing
tandem repeats
the human genome

To follow the narrative, one needs to appreciate relationships among these concepts (perhaps at least that chromosomes are found in cells, and are composed of DNA, the structure of which includes a number of different components called bases, the ordering of which can be sequenced to characterise the particular 'version' of DNA that comprises a genome ¹). Not all of these ideas are made absolutely explicit in the extract.

The notion of tandem repeats requires somewhat more in-depth knowledge, and so perhaps the alternative offered – tandem repeats or pieces of sequences that are found in a head to tail orientation in the genome – is intended to introduce this concept for those who are not familiar with the topic in this depth.
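For readers who have not met the idea before, a toy illustration may help show both what a tandem repeat is and why such a region is "hard to put together correctly". This is only a minimal sketch under stated assumptions: the 'chromosome' is an invented string of a few dozen letters, the repeat unit `CATG` is made up, and the function names are hypothetical. It is not how sequencing or assembly software actually works; it only shows that a short fragment read from inside a repeat fits equally well at several places, whereas a fragment from a non-repeating stretch fits at just one.

```python
def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of head-to-tail copies of `unit` in `sequence`."""
    best = 0
    for start in range(len(sequence)):
        copies = 0
        position = start
        # Count how many copies of the unit follow one another from here.
        while sequence.startswith(unit, position):
            copies += 1
            position += len(unit)
        best = max(best, copies)
    return best


def count_placements(read: str, sequence: str) -> int:
    """Count the positions (overlaps allowed) where `read` matches `sequence`."""
    last_start = len(sequence) - len(read)
    return sum(
        1 for start in range(last_start + 1) if sequence.startswith(read, start)
    )


# An invented toy 'chromosome': two non-repeating flanks either side of
# six head-to-tail copies of a made-up repeat unit.
REPEAT_UNIT = "CATG"
toy = "GATTACAGG" + REPEAT_UNIT * 6 + "TTCGAACCT"

if __name__ == "__main__":
    print("copies in a row:", longest_tandem_run(toy, REPEAT_UNIT))
    # A fragment from the flank can only have come from one place...
    print("placements of GATTACAG:", count_placements("GATTACAG", toy))
    # ...but a fragment from inside the repeat could sit in several places.
    print("placements of CATGCATG:", count_placements("CATGCATG", toy))
```

In this toy case the flank fragment has a single possible placement, while the fragment from within the repeat has five. Scale that ambiguity up to repeats running for millions of bases and it becomes easier to see why these stretches resisted being mapped and assembled.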

The complete set?

The reference to "A single chromosome stretched out…could be 6 or 8 cm long…to fit one inside a human cell, alongside forty five others for the complete set…" seems to assume that the listener will already know, or will readily appreciate, that in humans the genetic material is organised into 46 chromosomes (i.e., 23 pairs).

Arguably, someone who did not know this might infer it from the presentation itself. Perhaps they would. The core of the story was about how previous versions of 'the' [see note ¹] human genome were not complete, and how new research offered a more complete version. The more background a listener had regarding the various concepts used in the item, the easier it would be to follow the story. The more unfamiliar ideas that have to be coordinated, the greater the load on working memory, and the more likely the point of the item would be missed.²

Getting in a tangle

A very common feature of human language is its figurative content. Much of our thinking is based on metaphor, and our language tends to be full [sic ³] of comparisons as metaphors, similes, analogies and so forth.

Read about scientific metaphors

Read about scientific similes

So, we can imagine 'a single chromosome' (something that is abstract and outside of most people's direct experience) as being like something more familiar: 'like a thread'. We can visualise, and perhaps have experience of, threads being 'wound up into extremely tight balls'. Whether DNA strands in chromosomes are 'wound up into extremely tight balls' or are just somewhat similar to thread wound up into extremely tight balls is perhaps a moot point: but this is an effective image.

And it leads to the idea of knots that might be pretty hard to untangle. We have experience of knots in thread (or laces, etc.) that are difficult to untangle, and it is suggested that in sequencing the genome such 'knots' need to be untangled in the laboratory. The listener may well be visualising the job of untangling the knotted thread of DNA – and quite possibly imagining this is a realistic representation rather than a kind of visual analogy.

Indeed, the reference to "some of the resulting knots it turns out are pretty hard to untangle in the lab, and the genetic patterns there are often hard to decode as well" might seem to suggest that this is not an analogy, but two stages of a laboratory process – where the DNA has to be physically untangled by the scientists before it can be sequenced, but that even then there is some additional challenge in reading the parts of the 'thread' that have been 'knotted'.

Reading the code

In the midst of this account of the knotted nature of the chromosome, there is a complementary metaphor. The single chromosome is "crammed with three hundred million genetic letters". The 'letters' relate to the code which is 'written' into the DNA and which needs to be decoded. An informed listener would know that the 'letters' are the bases (often indeed represented by the letters A, C, G and T), but again it seems to be assumed this does not need to be 'spelt out'. [Sorry.]

But, of course, the genetic code is not really a code at all. At least, not in the original meaning of a means of keeping a message secret. The order of bases in the chromosome can be understood as 'coding' for the amino acid sequences in different proteins but strictly the 'code' is, or at least was originally, another metaphor. ⁴

Hitting the wall

The new research had progressed beyond the earlier attempts to sequence the human genome because that project had 'faced a wall' – a metaphorical wall, of course. This was the difficulty of sequencing regions of the genome that, the listener is told, were quite special.

The presenter suggests that the difficulty of sequencing these special regions of "hard to get at bits of genome" was akin to "trying to find a coin in the bottom of your pocket that you can't quite pull out". This is presumably assumed to be a common experience shared by, or at least readily visualised by, the audience, allowing them to better appreciate just how "hard to get at" these regions of the genome are.

We might pause to reflect on whether a genome can actually have regions. The term region seems to have originally been applied to a geographical place, such as part of a state. So, the idea that a genome has regions was presumably first used metaphorically, but this seems such a 'natural' transfer, that the 'mapping' seems self-evident. If it was a live metaphor, it is a dead one now.

Similarly, the mapping of the sequences of fragments of chromosomes onto the 'map' of the genome seems such a natural use of the term that it may no longer seem to qualify as a metaphor.

Repeats on steroids

These special regions are those referred to above as having tandem repeats – so parts of a chromosome where particular base sequences repeat (sometimes a great many times). This is described as "pieces of sequences that are found in a head to tail orientation" – applying an analogy with an organism which is understood to have a body plan that has distinct anterior and posterior 'ends'.

Not only does the genome contain such repeats, but in some places there are a 'tremendous' number of these repeats occurring head to tail. These places are referred to as 'corners' of the genome (a metaphor that might seem to fit better with the place in a pocket where a coin might be hard to dislodge, or perhaps with that wall, than with a structure said to be like a knotted, wound-up ball of thread).

It is suggested that in these regions, the repeats of the same short base sequences can be so extensive that they continue for as many as ten million bases. This is expressed through the metaphor that the tandem repeating "is just on steroids" – again an allusion to what is assumed to be a familiar everyday phenomenon, that is, something familiar enough to people listening to help them appreciate the technical account.

Many people in the audience will have experience of being on steroids as steroids are prescribed for a wide variety of inflammatory conditions – both acute (due to accidents or infections) and chronic (e.g., asthma). Yet these are corticosteroids and 'dampen down' (metaphorically, of course) inflammation. The reference here is to anabolic steroid use, or rather abuse, by some people attempting to quickly build up muscle mass. Although anabolic steroids do have clinical use, abusers may take doses orders of magnitude higher than those prescribed for medical conditions.

I suspect that many people have personal experience, or experience of close family, being on corticosteroids, whereas anabolic steroid use is rarer, and is usually undertaken covertly – so the metaphor here relies on cultural knowledge of the idea of people abusing anabolic steroids leading to extreme physical and mood changes.

Making a good impression

That is not to suggest this metaphor does not work. Rather I would suggest that most listeners would have appreciated the intended message implied by 'on steroids', and moreover the speaker was likely able to call upon the metaphor implicitly – that is without stopping to think about how the metaphor might be understood.

Metaphors of this kind can be very effective in giving an audience a strong impression of the scientific ideas being presented. It is worth noting, though, that what is communicated is to some extent just that, an impression, and this kind of impressionist communication contrasts with the kind of technically precise language that would be expected in a formal scientific communication.

Language on steroids

Just considering this short extract from this one item, there seems to be a great deal going on in the communication of the science. A range of related concepts are drawn upon as (assumed) background and a narrative offered for why the earlier versions of the human genome were incomplete, and how new studies are producing more complete sequences.

Along the way, communication is aided by various means to help 'make the unfamiliar familiar' by using both established metaphors and new comparisons. Some originally figurative language (mapping, coding, regions) is now so widely used it has been adopted as literally referring to the genome. Some common non-specific metaphors are used (hitting a wall, hard to access corners), and some specific images (threads, knots and tangles, balls, head-tails) are drawn upon, and some perhaps bespoke comparisons are introduced (the coin in the pocket, being on steroids).

In this short exchange there is a real mixture of technical language with imagery, analogy, and metaphor that potentially both makes the narrative more listener-friendly and helps bridge between the science and the familiar everyday – at least when these figures of speech are interpreted as intended. This particular extract seems especially 'dense' in the range of ideas being orchestrated into the narrative – language on steroids, perhaps – but I suspect similar combinations of formal concepts and everyday comparisons could be found in many other cases of public communication of science.

An alternative concept map of the extract, suggesting how someone with some modest level of background in the topic might understand the text (filling in some implicit concepts and connections). How a text is 'read' always depends upon the interpretive resources the listener/reader brings to the text.

At least the core message was clear: Scientists have now fully sequenced the human genome.

Although, I noticed when I sought out the scientific publication that "the total number of bases covered by potential issues in the T2T-CHM13 assembly [the new research] is just 0.3% of the total assembly length compared with 8% for GRCh38 [the Human Genome Project version]" (Nurk et al., 2022), which, if one were being churlish, might be considered not entirely 'fully' sequenced. Moreover "CHM13 lacks a Y chromosome", which – although it is also true of half of the human population – might also suggest there is still a little more work to be done.

Work cited:

Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., . . . Phillippy, A. M.* (2022). The complete sequence of a human genome. Science, 376(6588), 44-53. doi:10.1126/science.abj6987

Notes

¹ We often talk of DNA as a substance, and a molecule of DNA as 'the' DNA molecule. It might be more accurate to consider DNA as a class of (many) similar substances each of which contains its own kind of DNA molecule. Similarly, there is not really 'a' human genome – but a good many of them.

² Working memory is the brain component where people consciously access and mentipulate information, and it has a very limited capacity. However, material that has been previously learnt and well consolidated becomes 'chunked' so can be accessed as 'chunks'. Where concepts have been integrated into coherent frameworks, the whole framework is accessed from memory as if a single unit of information.

Read about working memory

³ Strictly only a container can be full – so, this is a metaphor. Language is never full – as we can always be more verbose! Of course, it is such a familiar metaphor that it seems to have a literal meaning. It has become what is referred to as a 'dead' metaphor. And that is, itself, a metaphor, of course.

⁴ Language changes over time. If we accept that much of human cognition is based on constructing new ways of thinking and talking by analogy with what is already familiar (so the song is on the 'top' of the charts and it is a 'long' time to Christmas, and a 'hard' rain is going to fall…) then language will grow by the adoption of metaphors that in time cease to be seen as metaphors, and indeed may change in their usage such that the original reference (e.g., as with electrical 'charge') may become obscure.

In education, teachers may read originally metaphorical terms in terms of their new scientific meanings, whereas learners may understand the terms (electron 'spin', 'sharing' of electrons, …) in terms of the metaphorical/analogical source.

*This is an example of 'big science'. The full author list is:

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, Sergey Aganezov, Savannah J. Hoyt, Mark Diekhans, Glennis A. Logsdon, Michael Alonge, Stylianos E. Antonarakis, Matthew Borchers, Gerard G. Bouffard, Shelise Y. Brooks, Gina V. Caldas, Nae-Chyun Chen, Haoyu Cheng, Chen-Shan Chin, William Chow, Leonardo G. de Lima, Philip C. Dishuck, Richard Durbin, Tatiana Dvorkina, Ian T. Fiddes, Giulio Formenti, Robert S. Fulton, Arkarachai Fungtammasan, Erik Garrison, Patrick G. S. Grady, Tina A. Graves-Lindsay, Ira M. Hall, Nancy F. Hansen, Gabrielle A. Hartley, Marina Haukness, Kerstin Howe, Michael W. Hunkapiller, Chirag Jain, Miten Jain, Erich D. Jarvis, Peter Kerpedjiev, Melanie Kirsche, Mikhail Kolmogorov, Jonas Korlach, Milinn Kremitzki, Heng Li, Valerie V. Maduro, Tobias Marschall, Ann M. McCartney, Jennifer McDaniel, Danny E. Miller, James C. Mullikin, Eugene W. Myers, Nathan D. Olson, Benedict Paten, Paul Peluso, Pavel A. Pevzner, David Porubsky, Tamara Potapova, Evgeny I. Rogaev, Jeffrey A. Rosenfeld, Steven L. Salzberg, Valerie A. Schneider, Fritz J. Sedlazeck, Kishwar Shafin, Colin J. Shew, Alaina Shumate, Ying Sims, Arian F. A. Smit, Daniela C. Soto, Ivan Sović, Jessica M. Storer, Aaron Streets, Beth A. Sullivan, Françoise Thibaud-Nissen, James Torrance, Justin Wagner, Brian P. Walenz, Aaron Wenger, Jonathan M. D. Wood, Chunlin Xiao, Stephanie M. Yan, Alice C. Young, Samantha Zarate, Urvashi Surti, Rajiv C. McCoy, Megan Y. Dennis, Ivan A. Alexandrov, Jennifer L. Gerton, Rachel J. O'Neill, Winston Timp, Justin M. Zook, Michael C. Schatz, Evan E. Eichler, Karen H. Miga, Adam M. Phillippy

Author: Keith

Former school and college science teacher, teacher educator, research supervisor, and research methods lecturer. Emeritus Professor of Science Education at the University of Cambridge.