Category Archives: Methods to assess TEM and AFM images

One surfactant protein D image, lots of peak-number measurements

One surfactant protein D image, lots of peak-number measurements using various signal processing algorithms.
This is just the beginning of the assessment of which peak-finding programs work well with AFM images, are easy to use, and generate more insight than just the opinions of the observers.
There really isn’t much gained by using these signal processing programs, for reasons which I think might be related to the fact that “noise” is a big issue for signal processing, and symmetry and variation are not that well handled. Just a microscopist’s opinion here, not a programmer’s.

Clearly there is still a judgement call to be made on whether to use the “mode or the mean” in deciding what is the best number of peaks. Differences between the number of peaks on the right and left sides of a dodecamer (differences in the way the molecule has fallen on the mica, other processing issues) are clearly a stumbling block to a determination of symmetry in peaks, and the slope, threshold and a number of other signal processing options allow for great variability in peak numbers. I am certainly leaning toward the simplest comparison, that of the human eye, followed by a plot with modest peak processing to identify peaks and valleys.
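As a small illustration of the “mode or the mean” choice, here is a minimal Python sketch. The peak counts in it are made-up numbers for illustration only, not measurements from the image, and the function name `summarize_counts` is hypothetical.

```python
from statistics import mean, mode

# Made-up peak counts for one hexamer from several methods/settings
# (illustrative numbers only, not measurements from the image).
peak_counts = [11, 13, 15, 15, 15, 12, 19]

def summarize_counts(counts):
    """Return the mean, mode and range of a list of peak counts."""
    return {
        "mean": round(mean(counts), 1),
        "mode": mode(counts),
        "min": min(counts),
        "max": max(counts),
    }

summary = summarize_counts(peak_counts)
print(summary)
```

The mean and the mode need not agree: a few unusual counts move the mean but leave the mode where it is, so reporting both (plus the range) shows how much of a judgement call is involved.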

To view the actual image (sadly called 41 aka 45) just roam back through the SP-D posts; it appears a jillion times.

Grayscale plots of N termini peak(s) in SP-D hexamers (as dodecamers): one peak or two?

There is a section on high and low molecular weight surfactant protein D in a publication by Grith Sorensen in Frontiers in Medicine, 2018, which has the following excerpt. “High-molecular weight SP-D multimers are only partly dependent on disulfide crosslinking of the N-termini, and a proportion of SP-D subunits are non-covalently associated. This allows interconversion between HMW SP-D and LMW SP-D trimers, as demonstrated using size permeation chromatography (36) (Figure 1B). The HMW/LMW ratio depends on the concentration of the protein in solution, with low-protein concentrations favoring the decomposition of multimers into trimers. In addition, the HMW/LMW ratio increases with affinity purification of SP-D, suggesting that ligand-binding facilitates assembly of SP-D trimers into multimers (Reference to an earlier article by the same author).”

There is specific reference to the ratio of high to low molecular weight multimers of surfactant protein D in relation to protein concentration (in the laboratory setting), and to the effect of the methionine 11 to threonine 11 allelic variants on the ratio of high to low molecular weight multimers of SP-D in humans.

It seems almost legitimate to view in this light the two different peak plot patterns found in the N termini peaks, traced from actual images of SP-D dodecamers (traced as two arms, i.e. hexamers: arm 1 and arm 2). The valley seen about half the time in the center of grayscale N termini peaks (LUT plots traced in ImageJ) from AFM images (Arroyo et al, 2018) might suggest that even among dodecamers there can be both close ties between N termini (covalent links between two trimers) and loose associations, showing up as a single peak or two peaks respectively. In addition, the trace also depends on where the segmented line is drawn during the trace, and on the brightness saturation of the image.

 

I don’t like this kind of variation in peak finding

I am so frustrated with image and signal processing. I don’t care what settings (threshold, smoothing, slope?) are applied. When I see peak finding tag a tiny peak (see red arrow on the left; not to say this isn’t an important peak, because I think it is: see the orange vertical line under that peak) but ignore a huge, easily seen, not-to-be-overlooked massive peak (see red arrow on the right, and the peak with NO ORANGE line), I just don’t trust any of it. I understand that slope and amplitude can be adjusted in these programs, but that is hard to accept when upcoming and trailing values mess with “reality” (LOL).
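To show how much the count can depend on a single setting, here is a minimal sketch of a bare-bones peak finder with only an amplitude threshold. It is not the algorithm used by any of the programs mentioned in these posts; the grayscale values are invented, and the name `find_peaks_simple` is hypothetical.

```python
def find_peaks_simple(values, amp_threshold=0):
    """Indices of interior local maxima at or above amp_threshold.

    A deliberately naive detector (assumption for illustration):
    a point is a peak if it is higher than the point before it and
    not lower than the point after it.
    """
    peaks = []
    for i in range(1, len(values) - 1):
        is_local_max = values[i] > values[i - 1] and values[i] >= values[i + 1]
        if is_local_max and values[i] >= amp_threshold:
            peaks.append(i)
    return peaks

# Invented grayscale values along a traced line (not real data).
profile = [10, 40, 20, 25, 22, 90, 30, 35, 28, 60, 15]

for threshold in (0, 30, 50):
    found = find_peaks_simple(profile, threshold)
    print(threshold, len(found), found)
```

On this invented profile the same trace gives 5, 4 or 2 peaks depending only on the threshold, and the small bumps are always the first to go.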

COMMENT: this is a plot of a hexamer of surfactant protein D (CRD peaks are on each end, N termini junction is the center peak)

COMMENT: Just think… climate scientists and financial advisors are using similar algorithms to predict doom/prosperity.

11 peaks along a surfactant protein D hexamer is the absolute minimum, 15 is more likely

11 peaks (brightness/grayscale) traced along a center, 1px line in a surfactant protein D hexamer is the absolute minimum that can be easily described. Furthermore, and routinely, as many as 15 peaks can be found. More are seen with different algorithms, but may not be consistent. Much depends upon how the molecule falls, the over- or undersaturation of the contrast (in the original and in the image processing), and the quality of the image (obviously). Critical to finding peaks is the care with which the segmented line is drawn: greater attention to this is what produces two peaks (sometimes 3) in the center of the N termini junction, and a peak in the area just before the N termini junction (i.e. the area between the glycosylation site(s) and the N termini, which I have referred to in several posts as the “tiny peak”).

Thanks to Arroyo et al. (about 90 images of SP-D) and Thomas O’Haver and Aaron Miller (numerous signal processing programs and algorithms) for their contributions.

11 peaks is just the simplest, non-attentive, poor resolution, bare minimum, produced by analysing AFM images using the most basic image and signal processing tools freely available to anyone, but careful analysis shows much much more.

One of the surprises of signal processing plots from ImageJ is that a molecule with large differences in peak brightness, coupled with prominent bilateral symmetry, is at an apparent disadvantage in peak detection.
Perception of mirror symmetry has significant functions in human vision. It has been shown that symmetry is detected within randomly placed elements, and it would be an easy leap to see this visual function as evolutionarily advantageous. What’s more, subjectively speaking, symmetry is apparently pleasing (using myself as a sample), as I use it in my own artwork and see it in architecture and other artwork as well. The same goes for repetition (particularly iterations with rotation); nature does this supremely well, and it has been studied mathematically. Instances are too numerous to list and readily available to search online.

An interesting attribute of human sensitivity to mirror symmetry is “tolerance” to error, meaning that variations don’t matter a whole lot. This may relate to the extreme abundance of what biology does with duplicate molecules in arranging them into dimers, trimers, and so on ad nauseam. I am thinking that my interest in surfactant protein D, which has so much “symmetry” (not unlike other patterned, replicated, duplicated, inverted, rotated molecules found everywhere) and just enough noise, may be a visual symmetry puzzle, at best.

What is exciting is that SP-D runs the gamut of everything from monomer to multimer (multimer here being that unique molecule named the “fuzzy ball” by some surfactant protein D researchers, which at any point can have 100 or more monomers). In time, such a multimer will likely show numerous mirrored, rotated iterations which can be seen at a quick glance before any image or signal processing. Just for fun I re-uploaded a graphic mandala that I made a couple of years ago, which is a “fake” look at fuzzy ball symmetry (12 trimers, 36 monomers). Alleged glycosylation peaks are a dark blue ring around the center; CRD are the bumps at the ends of the molecules. It is an actual SP-D image, masked and colored in CorelDRAW. The pink background and the black, white and blue abstract borders are “fills” and not relevant to SP-D structure. Artistically speaking I should go back and edit the background and border for more pleasing texture tile settings and colors, and rearrange the trimers (LOL). Even in this artsie craftsie image the peaks along the arms can be seen. I am not ready yet to relegate the entire interpretation of the shape of SP-D to the best image and signal processing programs. Human input is obviously still required.

A dodecamer of pulmonary surfactant protein D (SP-D) – as I see it.

Apologetics: I am tired of trying to find a way to calculate peaks and valleys of this protein in a way that might be considered “biasless”. A great quote I confiscated from a valuable signal processing website gave me the inspiration to make a re-do of it.

“image and signal processing do not substitute for judgment, any more than a pencil substitutes for literacy” modified from Robert McNamara.

That said, I have made (with image processing in Gwyddion and Photoshop) a really nice image of an SP-D dodecamer, and so clearly there are about 5 bumps in each of the CRD domains, a funnel-shaped bright spot in the neck domains, smaller (and also thinner) peaks along the adjacent collagen-like domain, and variable lumps and sizes for the area of the collagen-like domain which is believed to be glycosylated. The lumps and bumps in this peak area appear to me to be due to the possible partial glycosylation (one, two or three sites) of each of the trimers. Then there is the tiny peak which very often shows up right in the valley between the site of glycosylation and the typically very tall N terminal junctions of the four trimers. The latter (shown in this image) has a little depression which is commonly seen in the N peaks, dividing that central area into two separate peaks, in some cases with a smaller elevation between the two. All in all, I didn’t need signal processing of the peak plots to see this, and only used the basic filter functions in CorelDRAW and Photoshop to make them stand out. Lots of effort went into the “image and signal processing”, which has taken about two years and was really not that informative to ME, but was required to satisfy the predicted onslaught of “bias” comments.

AFM of surfactant protein D

Here is the image described above (41_aka_45, mentioned and shown many times in previous posts on this blog). White arrows and circles point to details mentioned above in obvious places, but all can be found in countless other images of the trimers.

AFM of pulmonary surfactant protein D

Losing the tiny peak

It seems likely that one of the reasons the tiny peaks on either side of the N termini junction of a surfactant protein D dodecamer are lost when employing signal processing is that, with the enormity of the N termini peak, the requirements for an adjacent peak are too great. I think in some kinds of image processing this might also be a factor, maybe those which “sharpen” in particular, and it surely must happen in some comparable fashion. I really noticed it more in the signal processing algorithms than in image processing.
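One way a tiny peak can vanish next to a very tall one is plain smoothing. The sketch below is an assumption about one possible mechanism, not a description of what any particular program does: it uses an invented profile and a 3-point moving average, and `local_maxima` and `moving_average` are hypothetical helper names.

```python
def local_maxima(values):
    """Indices of interior points strictly higher than both neighbors."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

def moving_average(values, half_width=1):
    """Centered moving average; the window is truncated at the ends."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Invented profile: a tall center peak (90) with a tiny peak (12)
# on either side of it. Not real data.
profile = [5, 6, 12, 10, 40, 90, 40, 10, 12, 6, 5]

raw_peaks = local_maxima(profile)
smoothed_peaks = local_maxima(moving_average(profile))
print("raw:", raw_peaks)
print("smoothed:", smoothed_peaks)
```

In this toy case the two tiny peaks are local maxima in the raw values, but after smoothing they are absorbed into the rising and falling flanks of the tall peak and only the center peak is left.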

Also missing, when using the signal processing algorithms, is the splitting of the N termini junction peak into two (sometimes with a very small peak in the center). Maybe for similar reasons.

It is a little distressing to watch the signal processing algorithms continually pass over a peak that I have seen many times in many molecules, and then say there are 4 peaks in a short span of distance where there are literally “no peaks to be seen”.

With all due respect for Octave (Matlab)

With all due respect for Octave, it becomes clear that the output graphics of the peak finding programs (ipeak, findpeaksplot, autopeaksplot, etc., and even some Excel templates for finding peaks) have little to do with what can be used graphically to show results in a publication-ready manner. Peak symbols are big and clumsy, and peak locations are offset to a degree that they can’t be used to illustrate the parameters gathered, like peak width, height and area. You have created a very cumbersome application for those who are interested in visualizing microscopic data. I know you guys are total geniuses in writing algorithms, but not in creating presentable graphics, and it’s OK, just like I am no genius in signal processing, but more capable in graphics.

My recommendation is that you hire (or train, or associate with) someone who can walk you through design, graphics, and scientific illustration. That may sound negative but it is not. It is a legitimate recommendation and an offer to help. Just like I need help with Octave, you all need help with scientific illustration methods.

Just as one example, output to csv and import into Excel or a vector graphics program (like CorelDRAW) is totally cumbersome, and we all know already that for 30 years Excel has been unfriendly in its output for publication graphics. Octave takes this to a new and outstanding level of unnecessary lines and objects.

Octave is a free program that is largely compatible with Matlab, so if the output graphics are as interchangeable as the programs, then Matlab has the same problem.

Look up table plots of pulmonary surfactant protein D (SP-D) made in ImageJ; also signal processed in an xlsx peak finding template

In thinking about signal processing programs for analyzing plots of grayscale peaks and valleys from traces (made in ImageJ) of SP-D molecules (AFM, an image published by Arroyo et al), I was sort of surprised to see that sometimes what I would call “nonsense” peaks and valleys showed up, and others I thought should have been counted were left out. This only surprised me because I don’t understand the algorithms that are used to predict peaks; I understand that. But I am not willing to let go of what my eyes see as a peak in favor of something that doesn’t understand the greater symmetry in biology and, in particular, in the dodecameric structure of SP-D. I took one image, and plotted both hexamers in both directions.

Previously when plotting in Gwyddion it was evident that the plots were not reproducible when the image was rotated 45 degrees, so these plots were made in ImageJ, and those replicates seem to look pretty much the same (see figure below). The original tracing in ImageJ (1px line) went left to right through both hexamers. The reverse plot went from right to left, but the arms were given the same names (1 and 2).

This first set of images is processed (5px gaussian blur), plotted in ImageJ, and exported to Excel. Peaks and valleys were selected by hand. Top set: original plots; second line: those plots mirrored; third line: plots in the reverse direction. Colors: peach, N term junction; purple, tiny peak so far undescribed; light green, likely the glycosylation area; dark green, pink and white, peaks not yet defined; yellow and orange, neck and CRD in varying orientations. The hexamer is allegedly bilaterally symmetrical, so peak heights, widths and numbers should be consistent. BUT in at least two places they won’t be. One is the neck and CRD, since there is a host of different ways that those three CRD can fall and be arranged during processing; the other is the glycosylation peak area, which may reflect different heights, widths and lumpiness depending on how many sites are glycosylated. In arm 1 below, one glycosylation peak is considerably lower than the other and this is likely meaningful.

 

The second set of images (not shown yet) will be made using the same arrangement and the same color arrangements, but with an Excel template, PeakValleyDetectionxlsx (Thomas O’Haver). While I would have liked to see the valley markers at the beginning and end of the valley plot produced (the black line at the bottom shows where valleys are calculated), cropping the valley series at the bottom of the plots at the same point as the length of the plot of the peak series helps make the last peak an appropriate width. The forward and reverse plots (compare the reverse plots at the bottom with the middle set, the mirrored plots) show quite similar data. In fact, where I draw the segmented line accounts for the greatest differences in peak size and number, as that line grabs the grayscale values. Such differences in peaks/valleys occur in the plots of the N terminal junctions (center peak, peach color).

Finding peak height and width from image plots (ImageJ) and signal processed plots (Octave and Excel templates)

Finding peak height and width from image plots (ImageJ) and signal processed plots (Octave and Excel templates) can be done, but by and large, trying to figure out where the markers for the widths of the peaks fall in the signal processed plots is no better than doing it by hand, EXCEPT with one Excel program, free to anyone and developed by Thomas O’Haver, called PeakValleyDetection. This program allows a new series to be plotted which shows the valley marks on the plot. For me, this is the best program so far for determining where the valleys are “without my personal input”. Looking for valleys and peaks in all other measurements seems to be a matter of “selection” by the user.

Here is an example: 1) the original image and plot using ImageJ; 2) the plot from ImageJ smoothed using PeakValleyDetectionTemplate-xlsx; 3) comparison plots of my choice of valleys and peaks within the ImageJ plot and the valley choices made automatically, with a smooth factor of 11, in PeakValleyDetectionTemplate-xlsx. They are close. I made some different choices, and the algorithm made some as well. One issue with the xlsx template is that it tends to leave off (or not count) the last peak. I don’t think this is a good thing, but I bet the problem is that it uses a “backlooking” perspective and does not account for the fact that in biology there is so much repetition (mirror, duplication, inversion, tandem, bilateral, etc. ad nauseam), so iterations of a pattern are not taken into account like they should be. This is one of the reasons that the plots were analyzed as trimers separately.
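One possible reason a last peak gets dropped (an assumption, not a statement about how the xlsx template actually works) is that a detector which needs a lower value on both sides of a peak can never mark the final point of a trace, even when the trace ends on a bright CRD. A minimal sketch with invented values and hypothetical function names:

```python
def interior_peaks(values):
    """Local maxima that have a lower neighbor on BOTH sides."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

def peaks_with_edges(values):
    """Same test, but the trace is padded so end points can qualify."""
    padded = [float("-inf")] + list(values) + [float("-inf")]
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i - 1] < padded[i] > padded[i + 1]]

# Invented trace that ends while still rising into its last peak.
profile = [10, 30, 12, 25, 14, 20, 35]

print(interior_peaks(profile))    # end point cannot be counted
print(peaks_with_edges(profile))  # end point is allowed to count
```

With these values the strict detector reports two peaks and the padded one reports three, the extra one being the final point of the trace.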

OK, a new issue just came to mind: the segmented line used for plots always went from left to right, and therefore in the algorithms there may be some directional bias, whereas with the human eye, probably not. So that raises the question: if I plot all trimers outward from the complete N termini peak, will this change the results in those plots which were plotted as mirrors of each other?
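A minimal sketch of what “plotting outward from the N termini peak” could look like once the grayscale values are exported: split the trace at the center peak and reverse the left half so both arms read from the center outward. The values are invented, the name `split_outward` is hypothetical, and taking the brightest point as the N termini junction is an assumption.

```python
def split_outward(values, center_index):
    """Split a hexamer trace into two arms, both read center-outward.

    The center point is included in both arms.
    """
    left_arm = list(reversed(values[:center_index + 1]))
    right_arm = list(values[center_index:])
    return left_arm, right_arm

# Invented grayscale trace across one hexamer (not real data).
trace = [20, 60, 30, 45, 95, 44, 31, 58, 22]

# Assumption: the N termini junction is the brightest point in the trace.
center = trace.index(max(trace))
left_arm, right_arm = split_outward(trace, center)

# Point-by-point difference between the two arms, center outward.
differences = [a - b for a, b in zip(left_arm, right_arm)]
print(left_arm, right_arm, differences)
```

Any peak finder can then be run on both arms in the same direction, so a directional bias in the algorithm (if there is one) affects the two arms equally.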

Peak widths from image and signal processed plots of SP-D

This is the beginning of width and height counts for the summation of numerous measures of many image and signal processed plots for a single dodecamer. (See previous posts, and the image of the actual dodecamer at the end of this post, which serves as a guide to the labels and names used in the summaries.)

Starting with the width of the N terminal peak, which is a junction of the N terminal domains of four trimers (three from each), thus 12 parts. Results are in nm (calculated from the bar marker in the original image).
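The conversion from pixels to nm using the bar marker is simple arithmetic. Here is a minimal sketch; the bar length and pixel counts are invented placeholders, not the values from the original image, and the function names are hypothetical.

```python
def nm_per_pixel(bar_length_nm, bar_length_px):
    """Scale factor from the image's bar marker."""
    return bar_length_nm / bar_length_px

def width_in_nm(width_px, scale_nm_per_px):
    """Convert a peak width measured in pixels to nanometers."""
    return width_px * scale_nm_per_px

# Placeholder numbers for illustration only (not from the actual image):
# a 100 nm bar marker that spans 200 pixels, and a peak 30 pixels wide.
scale = nm_per_pixel(100, 200)
peak_width_nm = width_in_nm(30, scale)
print(scale, peak_width_nm)
```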

No variation here in the width of the N term peak regardless of how one measures it. It appears that going from one side to the other within the context of each arm shows an even and regular arrangement of the three trimeric N’s. When measured individually, the same result appears. Some of my original counts did NOT include a tiny peak on either side of the N term juncture, and this accounts for the change from about 20 nm width to the current value of about 15 nm. The appearance of that tiny peak resulted from the use of signal and image processing tools.

AFM of surfactant protein D