Category Archives: Methods to assess TEM and AFM images

With all due respect for Octave (Matlab)

With all due respect for Octave, it becomes clear that the output graphics of the peak-finding programs (ipeak, findpeaksplot, autopeaksplot, etc.), and even of some Excel templates for finding peaks, have little to do with what can be used to show results graphically in a publication-ready manner. Peak symbols are big and clumsy, and peak locations are offset to such a degree that they can't be used to illustrate the parameters gathered, such as peak width, height and area. You have created a very cumbersome application for those who are interested in visualizing microscopic data. I know you are total geniuses at writing algorithms, … but not at creating presentable graphics, and that's OK, just as I am no genius at signal processing, but more capable in graphics.

My recommendation is that you hire (or train, or associate with) someone who can walk you through design, graphics and scientific illustration. That may sound negative, but it is not. It is a legitimate recommendation and an offer to help. Just as I need help with Octave, you all need help with scientific illustration methods.

Just as one example, exporting to CSV and importing into Excel or a vector graphics program (like CorelDRAW) is totally cumbersome, and we all know that for 30 years Excel has been unfriendly in its output for publication graphics. Octave takes this to a new and outstanding level of unnecessary lines and objects.

If Octave is a free counterpart to Matlab, and the programs are largely interchangeable, then Matlab has the same problem.

Look-up table plots of pulmonary surfactant protein D (SP-D) made in ImageJ, also signal processed in an xlsx peak-finding template

In thinking about signal processing programs for analyzing plots of grayscale peaks and valleys from traces (made in ImageJ) of SP-D molecules (AFM, an image published by Arroyo et al), I was somewhat surprised to see that sometimes what I would call “nonsense” peaks and valleys showed up, and others I thought should have been counted were left out. This only surprised me because I don't understand the algorithms that are used to predict peaks… I understand that. But I am not willing to give up what my eyes see as a peak to something that doesn't understand the greater symmetry in biology, and in particular the dodecameric structure of SP-D. I took one image and plotted both hexamers in both directions.

Previously, when plotting in Gwyddion, it was evident that the plots were not reproducible when the image was rotated 45 degrees, so these plots were made in ImageJ, and those replicates look pretty much the same (see figure below). The original tracing in ImageJ (1 px line) went left to right through both hexamers. The reverse plot went from right to left, but the arms were given the same names (1 and 2).

This first set of images was processed (5 px Gaussian blur), plotted in ImageJ and exported to Excel. Peaks and valleys were selected by hand. Top row: original plots; second row: those plots mirrored; third row: plots in the reverse direction. Colors: peach, N-terminal junction; purple, a tiny peak so far undescribed; light green, the peak that is likely the glycosylation area; dark green, pink and white, peaks not yet defined; yellow and orange, neck and CRD in varying orientations. The hexamer is allegedly bilaterally symmetrical, so peak heights, widths and numbers should be consistent. BUT in at least two places they won't be: at the neck and CRD, since there are many different ways the three CRDs can fall and be arranged during processing; and at the glycosylation peak area, which may show different heights, widths and lumpiness depending on how many sites are glycosylated. In arm 1 below, one glycosylation peak is considerably lower than the other, and this is likely meaningful.


The second set of images (not shown yet) will be made using the same arrangement and the same color scheme, but with an Excel template, PeakValleyDetection xlsx (Thomas O’Haver). While I would have liked to see valley markers at the beginning and end of the valley plot it produces (the black line at the bottom shows where valleys are calculated), cropping the valley series at the bottom of the plots at the same point as the end of the peak series helps give the last peak an appropriate width. The forward and reverse plots show quite similar data (compare the reverse plots at the bottom with the middle set of mirrored plots). In fact, where I draw the segmented line accounts for the greatest differences in peak size and number, because that line determines which grayscale values are sampled. Such differences in peaks and valleys occur in the plots of the N-terminal junction (center peak, peach color).
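For readers who want to see the two steps behind such a template (smooth, then mark peaks and valleys), here is a minimal Python sketch. It assumes a centered moving average and a sign change in the first difference; I don't know the template's exact internals, so this is not O’Haver's algorithm, the function names are hypothetical, and treating the template's "smooth factor" as a window width in points is an assumption.

```python
def moving_average(values, width):
    """Centered moving average (use an odd width). The window shrinks at the
    ends of the profile, so the output has the same length as the input."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def find_peaks_and_valleys(values):
    """Return (peaks, valleys) as index lists of strict local maxima and
    minima. Flat-topped extrema and the two end points are never reported."""
    peaks, valleys = [], []
    for i in range(1, len(values) - 1):
        rise = values[i] - values[i - 1]
        fall = values[i + 1] - values[i]
        if rise > 0 and fall < 0:
            peaks.append(i)
        elif rise < 0 and fall > 0:
            valleys.append(i)
    return peaks, valleys


# Hypothetical grayscale profile (not SP-D data), e.g. copied from an ImageJ plot list.
profile = [10, 12, 15, 13, 11, 14, 18, 16, 12]
smoothed = moving_average(profile, 3)
print(find_peaks_and_valleys(profile))   # ([2, 6], [4])
print(find_peaks_and_valleys(smoothed))  # ([2, 6], [4])
```

Because the first and last samples can never be strict minima under this rule, no valley is marked at either end of the profile, which is the same kind of end-of-plot gap described above.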

Finding peak height and width from image plots (ImageJ) and signal processed plots (Octave and Excel templates)

Finding peak height and width from image plots (ImageJ) and signal-processed plots (Octave and Excel templates) can be done, but by and large, trying to work out where the markers for peak widths belong in the signal-processed plots is no better than doing it by hand, EXCEPT for one Excel program, free to anyone, developed by Thomas O’Haver and called PeakValleyDetection. This program allows a new series to be plotted which shows the valley marks on the plot. For me, this is the best program so far for determining where the valleys are “without my personal input”. Looking for valleys and peaks in all other measurements seems to be a matter of “selection” by the user.
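Once valleys are marked automatically, peak width and height follow directly from them. A minimal sketch, assuming width is the distance between the nearest valley on each side of a peak, and height is reported both as the raw gray value and relative to the higher of the two flanking valleys; the function name and these definitions are my assumptions, not the template's.

```python
def measure_peaks(values, peaks, valleys):
    """Pair each peak with its nearest valley on each side.

    Widths are in samples (convert with the image scale afterwards).
    Peaks without a valley on both sides are skipped."""
    results = []
    for p in peaks:
        left = [v for v in valleys if v < p]
        right = [v for v in valleys if v > p]
        if not left or not right:
            continue  # peak is not bounded on both sides
        lv, rv = left[-1], right[0]
        base = max(values[lv], values[rv])
        results.append({
            "peak": p,
            "width": rv - lv,
            "height": values[p],
            "relative_height": values[p] - base,
        })
    return results


profile = [10, 12, 15, 13, 11, 14, 18, 16, 12]  # hypothetical gray values
for row in measure_peaks(profile, peaks=[2, 6], valleys=[0, 4, 8]):
    print(row)
```

Using the higher flanking valley as the base is a conservative choice: a peak sitting on a slope is not credited with the extra height of the lower side.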

Here is an example: 1) original image and plot using ImageJ; 2) the ImageJ plot smoothed using PeakValleyDetectionTemplate-xlsx; 3) comparison plots of my choice of valleys and peaks within the ImageJ plot and the valley choices made automatically with a smooth factor of 11 in PeakValleyDetectionTemplate-xlsx. They are close. I made some different choices, and so did the algorithm. One issue with the xlsx template is that it tends to leave off (not count) the last peak. I don't think this is a good thing, and I bet the problem is that it uses a “backward-looking” perspective and does not account for the fact that in biology there is so much repetition (mirror, duplication, inversion, tandem, bilateral, etc., ad nauseam) that iterations of a pattern are not taken into account as they should be. This is one of the reasons the trimers were analyzed separately.
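One simple way to keep the last (and first) peak, sketched under the assumption that the trace starts and ends in background beyond the CRD: treat the two end points of the profile as valleys. This is not how the xlsx template works, only an illustration, and `add_end_valleys` is a hypothetical helper.

```python
def add_end_valleys(values, valleys):
    """Return the valley list with the first and last sample added (if
    missing), so that an edge peak is bounded on both sides."""
    if not values:
        return []
    out = sorted(set(valleys))
    last = len(values) - 1
    if not out or out[0] != 0:
        out.insert(0, 0)
    if out[-1] != last:
        out.append(last)
    return out


profile = [10, 12, 15, 13, 11, 14, 18, 16, 12]  # hypothetical gray values
print(add_end_valleys(profile, [4]))  # [0, 4, 8]
```

If the trace stops partway up the CRD, the edge peak is truncated and its width will be underestimated; cropping all series at the same point keeps that bias the same for every plot.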

OK, a new issue just came to mind: the segmented line used for the plots always went from left to right, so the algorithms may carry some directional bias, whereas the human eye probably does not. That raises the question: if I plot all trimers outward from the complete N-termini peak, will this change the results for the plots that were mirror images of each other?
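Whether a processing step is direction-dependent can be checked by running it on a profile and on the reversed profile, then mapping the reversed peak positions back. A minimal sketch: a centered (symmetric) moving average passes this test, while a trailing (one-sided) average shifts the peak in the direction of the trace. I don't know whether any program used here smooths one-sidedly; both smoothers and `direction_check` are illustrative assumptions.

```python
def centered_average(values, width):
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def trailing_average(values, width):
    # Uses only the current sample and the ones before it in trace direction.
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def peak_indices(values):
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def direction_check(values, smooth, width):
    """Peak positions from the forward trace and from the reversed trace,
    both expressed in forward coordinates."""
    n = len(values)
    forward = peak_indices(smooth(values, width))
    backward = peak_indices(smooth(values[::-1], width))
    return forward, sorted(n - 1 - i for i in backward)


profile = [0, 0, 1, 3, 6, 3, 1, 0, 0, 0]  # hypothetical single peak at index 4
print(direction_check(profile, centered_average, 3))  # ([4], [4])
print(direction_check(profile, trailing_average, 3))  # ([5], [3])
```

When the two lists disagree, the step is direction-dependent, and plotting every trimer outward from the N-termini would give different peak positions than plotting left to right.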

Peak widths from image and signal processed plots of SP-D

This is the beginning of width and height counts summarizing numerous measurements from many image- and signal-processed plots of a single dodecamer (see previous posts). An image of the actual dodecamer is at the end of this post, as a guide to the labels and names used in the summaries.

Starting with the width of the N-terminal peak, which is the junction of the N-terminal domains of the four trimers (three per trimer), thus 12 parts. Results are in nm (calculated from the scale bar in the original image).
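Converting plot distances to nm from the scale bar is a single ratio. A minimal sketch; the 50 px bar length in the example is a hypothetical value, not a measurement from this image, and the function names are my own.

```python
def nm_per_pixel(bar_nm, bar_px):
    """Scale factor from a scale bar of known length."""
    if bar_px <= 0:
        raise ValueError("scale bar length in pixels must be positive")
    return bar_nm / bar_px


def width_nm(left_px, right_px, scale):
    """Distance between two positions along the plot, converted to nm."""
    return abs(right_px - left_px) * scale


scale = nm_per_pixel(100, 50)      # hypothetical: 100 nm bar measured as 50 px
print(width_nm(10, 17.5, scale))   # 15.0
```

This assumes the plot's x axis is in pixels along the drawn line; if a spatial scale has already been set in ImageJ, the profile distances should already be in calibrated units and no conversion is needed.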

There is no variation here in the width of the N-terminal peak, regardless of how one measures it. Going from one side to the other within each arm, the three N termini of the trimer appear evenly and regularly arranged. When measured individually, the same results appear. Some of my original counts did NOT include a tiny peak on either side of the N-terminal junction, and this accounts for the change from ~20 nm width to the current value of ~15 nm. That tiny peak appeared with the use of signal and image processing tools.

AFM of surfactant protein D

Peak counts from image and signal processed plots of SP-D

Just to reiterate: these are all values from analysis of a SINGLE SP-D image (noted many times before in this blog). Across the various methods of enhancing the image for detecting peaks (filters and masks in Photoshop, CorelDRAW, Corel PHOTO-PAINT, GIMP, Gwyddion and several more raster-editing programs) and a few signal processing programs (PeakValleyDetection xlsx, PeakDetection xlsx, Octave (ipeak.m, findpeaks.m, allpeaks.m)), there is consensus. IT IS CONSENSUS BY CHOICE. It has to be recognized that the CHOICE, whether of filters and masks in image processing or of functions in signal processing, is mine (YOURS). Peak counts can be manipulated to go from 1 to 40 in both image and signal processing. It requires sensible input from the user. That said, 15 peaks per hexamer looks pretty solid.
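How strongly the peak count depends on the user's choice is easy to show with a synthetic profile. A minimal sketch, assuming a centered moving average and strict local maxima; the profile is made up (a slow wave plus a fast ripple) and says nothing about SP-D.

```python
import math


def moving_average(values, width):
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def count_peaks(values):
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])


# Synthetic profile, NOT SP-D data: period-40 wave plus period-5 ripple.
profile = [10 * math.sin(2 * math.pi * i / 40) + 3 * math.sin(2 * math.pi * i / 5)
           for i in range(200)]

counts = {w: count_peaks(moving_average(profile, w)) for w in (1, 5, 11, 21)}
print(counts)  # far fewer peaks after smoothing than in the raw profile
```

The same "data" yields dozens of peaks or a handful, depending only on one parameter, which is why the choice has to be made sensibly and reported.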

Here are the N, mean, sd and var for trimer peak counts. These numbers include the processing done so far, so they will change with further variations of the signal processing. Please note that the complete N-terminus peak is included in the count of every trimer: the measurement runs from the distal edge of the central bright peak to the CRD, so each trimer's count includes the whole N terminus of the dodecamer as ONE peak.
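For completeness, a minimal sketch of the N, mean, sd and var summary, assuming sample (n - 1) statistics, which is what Excel's STDEV.S and VAR.S compute. The counts below are placeholders, not the values from this post. Because every trimer count includes the shared central N-terminal peak, adding two trimer counts to get a hexamer count would count that peak twice; the hypothetical `hexamer_count` helper subtracts it once.

```python
import statistics


def summarize(counts):
    """N, mean, sample standard deviation and sample variance."""
    return {
        "N": len(counts),
        "mean": statistics.mean(counts),
        "sd": statistics.stdev(counts),
        "var": statistics.variance(counts),
    }


def hexamer_count(trimer_a, trimer_b):
    """Both trimer counts include the shared central N-terminal peak,
    so it is subtracted once when the two arms are combined."""
    return trimer_a + trimer_b - 1


placeholder_counts = [14, 15, 15, 16]  # placeholder values for illustration only
print(summarize(placeholder_counts))
print(hexamer_count(9, 9))  # placeholder arm counts; prints 17
```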


AlphaFold?

I happened upon this website saying AlphaFold is helping with protein structure, which it probably is, but it hasn't helped with the structure of SP-D yet. The areas of low confidence (orange and red) are completely out of line with any microscopic evidence (see AFM from Arroyo et al), which shows nice correlation with the blue areas (high probability of being correct), whereas the rest of the model has absolutely no correlation with the shape of the rest of an SP-D trimer. Images as presented in this model (proposed models, left-hand part of the figure below) are the reason I continue to try to establish an accurate count of the number of peaks, their height and width, using various imaging and signal processing programs to help define the shape along the more or less “straight” collagen-like and N-terminal domains of an SP-D trimer.

Carbohydrate recognition domain and neck domains (BLUE) are spot on; collagen-like domain and N-termini junction (orange and yellow)… really, really no good.

surfactant protein D trimer with overlay of molecular model of SP-D

Summary of 392 plots from different signal and image processing algorithms of one SP-D dodecamer image

Summary of 392 plots from different signal and image processing algorithms of one (YES, JUST ONE so far) SP-D dodecamer image. It may be a long and repetitive approach, but ultimately it may say something about which signal and imaging functions and filters get the most valuable structural data from an image of a molecule. The reason for using SP-D was the host of wonderful images available in the published literature (Arroyo et al) and the fact that the current molecular model of SP-D is unfinished. Numerous models have been made of the carbohydrate recognition domain, but little else has surfaced.

I saw a need to compare the benefits of image vs signal processing for determining such things as peak width, height and peak number in the arms of SP-D dodecamers, to find the most informative and easiest way to determine what the rest of the molecular landscape of SP-D might look like.

The image below was selected deliberately. It has features that are common to AFM images of SP-D, and it has definite bilateral organization (in this case better called radial symmetry, since the N termini form a central, very bright peak and the arms extend out to the CRD of each trimer). The labels on the image are guides to the values (N, mean, sd and var). 392 different plots were obtained from trimers in this image, which was processed with a dozen different filters and algorithms, in different programs and to many different degrees. You have seen this image many times before in this blog; I call the molecule 41 aka 45… no surprises.

Some data here can be explained by the labels in the image above: arm 1 is nearly horizontal, while arm 2 moves to a more vertical position from left to right. N termini: center black dotted ring; arrows indicate the directions of the plots. The whole N terminus is included at the beginning of each trimer plot. Trimers 1 and 2 are subdivided into 1a (left side of the micrograph) and 1b (right side), and 2a (again, the left) and 2b (the more vertical arm on the right). Scale bar = 100 nm. The diameter is shown by a blue dotted line; the criterion was that the circle had to graze three of the four CRDs, in this case the left, right and top ones, which were used to calculate the approximate diameter of the dodecamer (the N here is the number of measurements made on variously processed and filtered versions of the image above).
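The diameter criterion (a circle grazing three CRDs) amounts to finding the unique circle through three points. A minimal sketch, assuming the three points are the outermost pixel positions picked by hand on the left, right and top CRDs; the coordinates and the 0.5 nm/px scale in the example are hypothetical.

```python
import math


def circle_through(p1, p2, p3):
    """Center and radius of the circle through three (x, y) points."""
    (ax, ay), (bx, by), (cx, cy) = p1, p2, p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear; no unique circle")
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)


# Hypothetical pixel coordinates of the left, right and top CRD edges.
center, radius_px = circle_through((0, 0), (200, 0), (100, 100))
print(center, radius_px)         # (100.0, 0.0) 100.0
print(2 * radius_px * 0.5, "nm")  # diameter with a hypothetical 0.5 nm/px scale
```

The formula works whether the image y axis points up or down, so pixel coordinates can be used as read from ImageJ.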

For comparison, plots of this molecule from a single imaging program are very similar to the data from the bigger dataset above; see those values here.

More measures on this dataset are below. These are values from a single image. It is clear that some filters and algorithms are more informative than others.

Comparing peak height and peak width for a single hexamer of surfactant protein D

Comparing peak height and peak width for a single hexamer of surfactant protein D has led me to the following conclusions:

1. many methods (image and signal processing) can produce very similar results
2. many methods (image and signal processing) can produce ridiculous results
3. consensus may or may not provide the best results

I have examined this particular AFM image of SP-D (which I call 41_aka_45; it is an image of an SP-D dodecamer from a publication by Arroyo et al, and the name is given here so that this post can be related to the many previous posts on this image) for hours, literally, using more than half a dozen image processing programs and dozens of image processing filters, as well as signal processing using two Excel templates for finding plot peaks (by Tom O’Haver) and peak-finding functions in Octave. The purpose is to find the best (and easiest) method(s) for determining peak number, peak width and peak height in grayscale plots (made using ImageJ) of this and similar images. I was really pleasantly surprised when Aaron Miller added a function to one of the Excel templates that displays the valley points in the Excel plot. This provides a second plot which, when exported as a metafile, can be used to quickly define peak widths as vertical lines. While it does not use peak slope to calculate peak width in a more sophisticated way, it does allow easy comparison of peak number, width and height obtained by signal processing with those parameters obtained by image processing and plotting in ImageJ.
Below are two plots. The top one identifies peak valleys (peak width) and height on an image that had been filtered with a 5 px Gaussian blur, but with no signal processing; the lower plot was defined using signal processing on the same image, in this case the PeakAndValleyTemplate for Excel (by Tom O’Haver) with a smoothing factor of 11 or 9 (I need to check).

Abbreviations in the plot labels: crd = CorelDRAW 19; gausblur 5px = Gaussian blur, 5 px; PVDxlsx = PeakAndValleyTemplate for Excel. Compare the colors and widths in the two plots: dark orange outer peaks = CRD; yellow = possible neck domain; white, pink and darker green = as yet undefined domains, likely in the collagen-like domain; light green = the named glycosylation site (glycosylation appears to cause lumpiness, perhaps relating to glycosylation of 1 to 3 molecules in the trimer); purple = a small peak just before the N-termini junction; light peach = the N-termini junction, often divided at the center by a valley.
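As a slope-aware alternative to valley-to-valley widths, full width at half maximum (FWHM) interpolates where each flank of a peak crosses half its height. A minimal sketch, assuming linear interpolation between samples and a user-chosen baseline (for example the higher flanking valley); FWHM is a standard width measure, not what the template computes, and `fwhm` is a hypothetical helper.

```python
def fwhm(values, peak, baseline=0.0):
    """Full width at half maximum around `peak`, in samples, using linear
    interpolation. Returns None if a flank never drops to half height."""
    half = baseline + (values[peak] - baseline) / 2
    i = peak
    while i > 0 and values[i - 1] > half:  # walk down the left flank
        i -= 1
    j = peak
    while j < len(values) - 1 and values[j + 1] > half:  # and the right flank
        j += 1
    if i == 0 or j == len(values) - 1:
        return None
    # The crossings lie between i-1 and i (left) and between j and j+1 (right).
    left = (i - 1) + (half - values[i - 1]) / (values[i] - values[i - 1])
    right = j + (values[j] - half) / (values[j] - values[j + 1])
    return right - left


triangle = [0, 1, 2, 3, 4, 3, 2, 1, 0]    # hypothetical peak
print(fwhm(triangle, 4))                 # 4.0
print(fwhm(triangle, 4, baseline=1))     # 3.0
```

Unlike the valley-to-valley width, FWHM does not depend on where a neighbouring peak happens to start, which may help when two domains overlap.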

Most conservative estimate for number of brightness (LUT) peaks along an arm of a dodecamer of SP-D

Most conservative estimate for number of brightness (LUT) peaks along an arm of a dodecamer of SP-D is shown in this plot, with color coding from the CRD (orange) inward to the N-termini junction (peach). The recognized peak (glycosylation site(s)) is light green. The other peaks are not generally known but consistently divide out into the heights and widths (nm) shown here, cascading downward in height but varying in width from the center (N) to the CRD. In most plots the CRD is composed of two or even more peaks, as the CRDs fall into irregular positions during processing; in this plot, however, they show up as one. My impression, gained by using less “blur” and more edge detection, is that the consistent number of peaks is more like 15, with tiny peaks beside the central N-terminal peak. I hope this will show up in the analysis of all the different processing filters, and with more than the single molecule shown here (SP-D dodecamer 41 aka 45, named by me, from the publication of Arroyo et al, seen many times before on this blog).

Each peak width and height has been measured; hopefully a summary of all the different image and signal processing will confirm this pattern.

My hope is also that something specific about the degree of glycosylation (light green peaks) can be determined, as suggested here by the different peak heights in that area. It is also clear that the peak area shown to be glycosylated is rarely a single rounded peak, and more often multiple peaks within a general peak. This is vaguely demonstrated by the light green peak on the right-hand side of the plot. Differences in the length of each trimer of this hexamer are most likely due to how the molecule was spread during preparation. This can be partly overcome by adjusting the trimer plots to the same width.
minimum of 13 peaks along one hexameric arm of an SP-D dodecamer
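Adjusting trimer plots to the same width can be done by resampling each profile onto a common number of points. A minimal sketch using linear interpolation; it assumes that any length difference from spreading is distributed uniformly along the arm, which may not hold, and `resample` is a hypothetical helper.

```python
def resample(values, n):
    """Linearly interpolate `values` onto `n` evenly spaced points spanning
    the same start and end."""
    m = len(values)
    if n < 2 or m < 2:
        raise ValueError("need at least two input and two output points")
    out = []
    for k in range(n):
        x = k * (m - 1) / (n - 1)   # position in input-sample units
        i = min(int(x), m - 2)      # left neighbour (clamped at the end)
        t = x - i                   # fraction of the way to the right neighbour
        out.append(values[i] * (1 - t) + values[i + 1] * t)
    return out


trimer_a = [0, 2, 4, 6, 8]   # hypothetical profile, 5 samples
trimer_b = [0, 4, 8]         # hypothetical shorter profile, 3 samples
print(resample(trimer_b, len(trimer_a)))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

After resampling, peak positions can be compared index by index across trimers, keeping in mind that non-uniform stretching would still shift individual peaks.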

Peak drag

I was just looking at some grayscale peaks in a plot of a surfactant protein D dodecamer and noticed that one hexamer of this molecule (AFM, 41 aka 45; sorry for the ID I have given this dodecamer), plotted in ImageJ after processing in Corel PHOTO-PAINT X5 with a 50 percent, 10 px minimum filter, shows a drag on the peaks which I don't think I have noticed before. The drag occurs on one trimer of the plotted hexamer (left side: minimal on the first peak with the red arrow, then more prominent on the next four peaks; see the actual dodecamer in the bottom image), but not on the second trimer (right side of the plot).

It looks like a “nice” demonstration of a drag artifact: a displacement from left to right seen on the LEFT half of this plot (horizontal trimer), but not on the right side of the plot (vertical trimer).

The original image used to make this plot is below. It is interesting that the drag did not appear on the more vertically oriented trimer, i.e. the right-hand trimer (closer inspection of the image also shows a smearing of that trimer, to which until now I had not paid any attention, but it certainly translated into a change in the plot). Scale bar = 100 nm.
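One way to put a number on the drag, sketched under my own assumptions: compare the left and right half-widths of each peak at half its height above a baseline. A value near zero means a symmetric peak; a positive value means the right flank is stretched, i.e. the peak trails in the plot direction. The index and helper names are hypothetical, and I don't know what the minimum filter does internally.

```python
def half_widths(values, peak, baseline=0.0):
    """(left, right) distances from the peak to its half-height crossings,
    using linear interpolation; None if a flank never reaches half height."""
    half = baseline + (values[peak] - baseline) / 2
    i = peak
    while i > 0 and values[i - 1] > half:
        i -= 1
    j = peak
    while j < len(values) - 1 and values[j + 1] > half:
        j += 1
    if i == 0 or j == len(values) - 1:
        return None
    left = (i - 1) + (half - values[i - 1]) / (values[i] - values[i - 1])
    right = j + (values[j] - half) / (values[j] - values[j + 1])
    return peak - left, right - peak


def asymmetry(values, peak, baseline=0.0):
    """(right - left) / (right + left): 0 for a symmetric peak,
    positive when the peak is dragged toward higher indices."""
    hw = half_widths(values, peak, baseline)
    if hw is None:
        return None
    left, right = hw
    return (right - left) / (right + left)


symmetric = [0, 1, 2, 3, 4, 3, 2, 1, 0]   # hypothetical undragged peak
dragged = [0, 2, 4, 3, 2, 1, 0, 0]        # hypothetical peak with a right-hand tail
print(asymmetry(symmetric, 4))            # 0.0
print(asymmetry(dragged, 2))              # about 0.33
```

Computing this index for the same peaks on the horizontal and vertical trimers would show whether the drag depends on arm orientation.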