Category Archives: Methods to assess TEM and AFM images

Peak counts of a single surfactant protein D molecule (an AFM image): Peaks counted in trimers

Peak counts of a single surfactant protein D molecule (an AFM image of a dodecamer): peaks counted in trimers. All processing methods are summarized here. The mean number of peaks per trimer can be an odd integer, depending upon whether the N termini peak is actually one peak or two; in the latter case the number of peaks per trimer must be even. Both variations occur, which raises the question of how the four trimers are linked in the middle (i.e. the 12 monomers at the N termini junction). Some say it varies, and I think I would have to agree.

Such variation would partly explain differences in the center of the N term peak group, namely the small peaks sometimes seen at the very high point of the N term peak.

That is a separate topic; this post is just about the total number of peaks per trimer. I used the same molecules, color scheme, and dataset used in previous posts concerning peaks per surfactant protein D molecule (trimers, hexamers and dodecamers). Programs used for image processing are listed on the left. Signal processing can be found in previous posts, and I also included the citizen scientist peak counts.


Differences in the positioning and stretching of molecules (right and left, or top and bottom) change how many peaks are found. This is generally obvious to the observer: differences in trimer length are easily seen (as stretching, folding, or compression) and can be documented. In this particular case the left half of the dodecamer image (arms 1a and 2a) has fewer peaks by all peak counting methods, and the right hand side of the dodecamer has more (please look up this link to see the image); when I calculate the length in nm it will be obvious there too. There are a few comments on why this happens:

1) the extra spread shows the lumpy nature of the glycosylation peak and the adjacent, slightly smaller peak;

2) the CRD and neck peaks tend to be exaggerated because of the many ways that part of the molecule can fall onto the mica;

3) the N termini junction of the dodecamer often shows a peak right at the high point of the N term peak; and

4) the tiny peak in the valley on either side of the N termini peak is often overlooked because of the overwhelming width and height of the N termini peak. Aside from being concealed by the brightness (high grayscale value) of the N term peak, tiny peaks flanked by large peaks are detected very inefficiently by signal processing. So machine learning may be the only way that this tiny peak can be verified.

The peak count for the right hand arms of the dodecamer is almost 9, while for the left hand arms it is just over 7.

Peak counts of a single surfactant protein D molecule (an AFM image): Citizen scientist counts

This quick look at what the perception of peaks might be in unbiased individuals was an interesting side-trip. Two instances of KNOWN bias on the plots accounted for the smallest peak count and the largest peak count. Haha… I did not expect that.  I represent a reasonable biased count.

I have counted peaks on this same image hundreds of times (actually 274 times), once or twice for each mode of signal and image processing (not every single time, but obviously many, many times). That means I counted everything from the original images (pixelated and rough) to the most processed ones (gaussian blur and limitrange), including output from some programs that added ridiculous numbers of variations (one example would be “roughen-inside” in Inkscape) and others that eliminated some detail. My peak counts from the exact two plots given to the citizen scientists differ considerably from theirs. I counted the gaussian blur 10px, limitrange 100-255 image separately, both before and at the same time as that image was being used by another image or signal processing program for peak counts. The mean and standard deviation of my counts reflect variation in my own observations, which have changed over two years.

My variously obtained peak counts: 1) from images only, 2) from counting plots from image and signal processing programs, and 3) from graphing out the peak widths from plots obtained from processing.

Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: scipy.signal

I tried a lot of settings with this scipy app (scipy.signal find_peaks), made for me by Daniel Miller from online resources and libraries, which allowed settings of “prominence”, “distance”, “width”, “threshold” and “height”. See below. The image of a surfactant protein D dodecamer has been in this blog dozens of times. Image processing was applied before signal processing: either a 5 px or 10 px gaussian blur, and on half the images an additional filter, limitrange (100-255), using Gwyddion. The image processing programs are listed below. The whole dataset comprised 10 dodecamers (thus 20 hexameric arms).

 

sci/py-P0.2D30W10T0H0
sci/py-P0.2D30W5T0H0
sci/py-P0.3D15W7T0H0
sci/py-P0.5D15W10T0H0
sci/py-P0.5D15W5T0H0
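For anyone wanting to try something similar, here is a minimal sketch of how scipy.signal.find_peaks can be called with those five settings. It is not Daniel Miller's app. The profile is synthetic, the rescaling of gray values to 0..1 is my assumption, and so is the reading of a label such as P0.2D30W10T0H0 as prominence 0.2, distance 30, width 10, threshold 0, height 0.

```python
import numpy as np
from scipy.signal import find_peaks


def count_peaks(profile, prominence, distance, width, threshold, height):
    """Count peaks in a 1-D grayscale profile with scipy.signal.find_peaks."""
    y = np.asarray(profile, dtype=float)
    # Assumption: rescale to 0..1 so a prominence such as 0.2 means
    # "20% of the full gray value range" of this profile.
    span = y.max() - y.min()
    y = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    peaks, _ = find_peaks(
        y,
        prominence=prominence,
        distance=distance,
        width=width,
        threshold=threshold,
        height=height,
    )
    return len(peaks), peaks


# Synthetic stand-in for one trace (three smooth bumps, NOT real SP-D data).
x = np.arange(600)
profile = sum(np.exp(-0.5 * ((x - c) / 20.0) ** 2) for c in (100, 300, 500))

# Hypothetical decoding of two of the plot labels into find_peaks settings.
settings = {
    "P0.2D30W10T0H0": (0.2, 30, 10, 0, 0),
    "P0.5D15W5T0H0": (0.5, 15, 5, 0, 0),
}
results = {name: count_peaks(profile, *s)[0] for name, s in settings.items()}
print(results)
```

On a real trace, raising prominence or width should remove the small shoulder peaks first, which is the kind of tiny peak I worry about losing.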

Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: Octave – AutoFindPeaksPlot

All settings that I used in Octave AutoFindPeaksPlot gave peak counts lower than what I thought should be counted. Below are my counts of the plots generated by Octave while counting peaks, and the results of counting peaks using various settings in AutoFindPeaksPlot.m (see table, at bottom). You can see from my counts of the peaks present on the output plots that accompanied the other values (peak height, width) that the autofindpeaks values (at least the default values, and a few that I typed in) are much less lenient. I found Ipeaks.m was closer to what I liked, and AutoFindPeaksPlot.m likely won’t be used on the other 100+ images.

octave-AFPP-xy0.000108,32.71,32,32,3
octave-AFPP-xy0.000108,8.43,32,32,3
octave-AFPP-xy0.000129,28.99,29,29,3
octave-AFPP-xy0.000129,8.5,29,29,3
octave-AFPP-xy0000066,24.8,41,41,3
octave-AFPP-xy0000085-24.5867-36-36-3
octave-AFPP(x,y)

 

Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: PeakDetectionTemplate xlsx

This Excel peak finding template (Thomas O’Haver) was easy to use to find peaks, though I personally found that peaks I deemed to be background were typically found, while one tiny peak that I find in many images was constantly overlooked.

My counts from both the image and the individual plots in ImageJ were about 15 peaks per hexamer, whereas here it was 19.25 +/- 0.82. The N is small, but there was not much variation when I used different settings. Only one image was processed, using 0.6 as the amplitude threshold and a slope threshold of either 0 or 1 (thus four hexamers, i.e. arms measured CRD to CRD with the N peak in the center, were measured). The PeakValleyDetectionTemplate, using this same image processing combination, that is, gaussian blur and limitrange 100-255 in Gwyddion, also produced a large number of peaks (see previous post).

The more appropriate setting for the PeakValleyDetectionTemplate was used on a single image of 41_aka_45: AmplitudeThreshold 0.6, SlopeThreshold 2.5.

I have more peak finding results using other SP-D dodecamers, but just this one for the dodecamer I have called 41_aka_45.
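To make the amplitude threshold and slope threshold settings a little less abstract, here is a rough sketch of peak detection by downward zero-crossings of the first derivative, which is the general approach Thomas O’Haver describes for his peak finders. This is a simplified illustration, not the spreadsheet's actual formulas. The function name, the boxcar smoothing and the synthetic data are all assumptions, and the threshold numbers only make sense for this made-up trace, not for my plots.

```python
import numpy as np


def zero_crossing_peaks(y, amp_threshold=0.0, slope_threshold=0.0, smooth_width=1):
    """Find peaks where the first derivative crosses zero going downward.

    Hypothetical helper: a crossing counts as a peak only if the derivative
    drops by more than slope_threshold across it and the peak height
    exceeds amp_threshold.
    """
    y = np.asarray(y, dtype=float)
    d = np.gradient(y)  # first derivative (central differences)
    if smooth_width > 1:
        # Optional boxcar smoothing of the derivative (an assumption).
        d = np.convolve(d, np.ones(smooth_width) / smooth_width, mode="same")
    peaks = []
    for i in range(1, len(d)):
        if d[i - 1] > 0 and d[i] <= 0:  # slope changes from rising to falling
            steep_enough = (d[i - 1] - d[i]) > slope_threshold
            j = i - 1 if y[i - 1] > y[i] else i  # taller of the two samples
            if steep_enough and y[j] > amp_threshold:
                peaks.append(j)
    return peaks


# Synthetic trace: one tall bump at x=50 and one small bump at x=140.
x = np.arange(200)
y = np.exp(-0.5 * ((x - 50) / 8.0) ** 2) + 0.3 * np.exp(-0.5 * ((x - 140) / 8.0) ** 2)

print(zero_crossing_peaks(y, amp_threshold=0.1))                        # both bumps
print(zero_crossing_peaks(y, amp_threshold=0.6))                        # tall bump only
print(zero_crossing_peaks(y, amp_threshold=0.1, slope_threshold=0.01))  # tall bump only
```

The last two calls show why a tiny peak next to a big one is so easily lost: either threshold alone is enough to drop it.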

Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: PeakValleyDetectionTemplate-xlsx

PeakValleyDetectionTemplate.xlsx is a program by Thomas O’Haver. I found that the best results for the plots of surfactant protein D came from a smooth level of 11. In the results below, plots from 10 of the 12 image processing programs that I have used to determine the number of peaks along a segmented line drawn through the center of surfactant protein D (hexamer) arms were “peak counted” with this Excel program. The peak counts at smooth 11 were in line with counts made directly from the images, and with other signal processing programs. This peak counting program was the easiest (in my opinion) to use, and it exported a plot with valleys marked, making it very easy to come up with a graphic identifying the position (width and height) of peaks.


Sample plot above, peak count from ImageJ plot and peak count using PeakValleyDetectionTemplate.xlsx smooth 11.

Below, examples of smooth 5, 7, 9, 11 and 15. Smooth 11 variations and combinations of the image processing plots were used.

Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: Lag Threshold Influence

Lag, Threshold and Influence (LTI) peak finding was put into a “batchprocess” app by Aaron Miller, which allows selection of different parameters for LTI and processing of many Excel files at once. The parameters Lag, Threshold and Influence were adjusted, and peaks were found in the images (the same ones used for finding peaks in other signal processing programs, also listed in a previous post). This program normalizes x and y and exports those Excel files to a new folder using the L T I settings chosen. I ran peak finding on many molecules other than this particular one, for which there are only a few samples. Charts below. L, T and I change the peaks counted and, as with the Octave Ipeaks function, one can choose the input statements to create a count you like. This is not exactly what I wanted to see with signal processing, but it is what I have come to understand I get. It comes back to using common sense with the image, and then using the programs for automation.
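Lag, threshold and influence are the three parameters of the widely shared “smoothed z-score” peak detection algorithm. Assuming that is what the batchprocess app wraps (I have not confirmed this), a minimal sketch looks like the following. The function names and the test trace are made up for illustration; this is not Aaron Miller's code.

```python
import numpy as np


def lti_signals(y, lag, threshold, influence):
    """Smoothed z-score detection: +1 above, -1 below, 0 inside the band."""
    y = np.asarray(y, dtype=float)
    signals = np.zeros(len(y), dtype=int)
    filtered = y.copy()
    avg = np.zeros(len(y))
    std = np.zeros(len(y))
    avg[lag - 1] = np.mean(y[:lag])
    std[lag - 1] = np.std(y[:lag])
    for i in range(lag, len(y)):
        if abs(y[i] - avg[i - 1]) > threshold * std[i - 1]:
            signals[i] = 1 if y[i] > avg[i - 1] else -1
            # influence = 0 means a flagged point does not shift the baseline
            filtered[i] = influence * y[i] + (1 - influence) * filtered[i - 1]
        else:
            filtered[i] = y[i]
        window = filtered[i - lag + 1 : i + 1]  # the last `lag` filtered points
        avg[i] = np.mean(window)
        std[i] = np.std(window)
    return signals


def count_positive_runs(signals):
    """Count each unbroken run of +1 signals as one peak."""
    s = (np.asarray(signals) == 1).astype(int)
    return int(np.sum(np.diff(np.concatenate(([0], s))) == 1))


# Made-up trace: a slightly rippled baseline with two bright plateaus.
y = np.where(np.arange(60) % 2 == 0, 1.0, 1.1)
y[20:25] = 3.0
y[40:45] = 3.0

signals = lti_signals(y, lag=10, threshold=3.5, influence=0.0)
print(count_positive_runs(signals))
```

A longer lag or a higher threshold makes the detector stricter, and a higher influence lets a big peak raise the baseline so that smaller neighbors get missed, which fits my experience that the settings decide the count.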

Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: Octave, Ipeak

A mean of over 14 peaks, and a mode of 15 peaks, was found for peak counts of a single surfactant protein D molecule (an AFM image) using five programs and, within each, a variety of peak finding settings. Each of these programs defines how it detects peaks in a different way, which makes it a little difficult (for a non-programmer like me) to compare them. Nevertheless, the number of peaks seems to be very close to what is found by application of conventional image processing filters and masks.

A list of the image processing programs used before signal processing is provided on the left, and the types of image processing are given above that list. Two programs were not used: Inkscape, because it did not have the conventional filter names and applications of other programs, and Octave, because it was very cumbersome compared to dedicated image processing programs. Again, the mean number of peaks is very close to that found with image processing alone.

Counts of peaks with signal processing algorithms do NOT include counts from the image directly, or the original plots created in ImageJ, unlike the values found in the mean number of peaks using image processing. The list of full names of the image and signal processing programs has been in previous posts.

Two image processing programs (cpp19 and Gwyddion) were used with Octave as a signal processing program to detect peaks. Images were subjected to either a 5 px or 10 px blur, and then, using Gwyddion, some were subjected to a limitrange filter (100-255). Various settings were used in Octave Ipeaks. Except for the limitrange images (rightmost set of numbers), more peaks were detected by Octave than by hand counts from images or plots. The number of peaks found was easily shown to increase when the suggested peak number was increased.

Ipeaks input statements and results were grouped as above (and individually, below).

Do image processing blurs and filters have a significant impact on peak counts of SP-D hexamers?

5 and 10 px gaussian blur filters have little impact on peak counts of SP-D hexamers, as it seems from a summary of peak counts (see image below).
Much of the peak count data collected for a single dodecamer of surfactant protein D (as a grayscale plot with peaks along a line drawn through the center of each of the two hexamers, CRD to CRD; my molecule number 41_aka_45) came from images that had been subjected to a 5px or 10px gaussian blur. The blur functions in the programs listed in previous posts did not specify the px radius of the blur, except one (a filter in Photoshop 2021) that called this blur a 10px radius. This plot was included with all other gaussian blur filters. Only one image was processed with what I presume to be a gaussian blur in Octave (Octave blur 101-10).
Of the total 159 sets of plots, 7 images received NO processing, 62 received a 5px gaussian blur, and 50 received a 10px gaussian blur. Almost always, the lowest pixel blur that would barely smooth the image was employed. Gaussian blurs were used before other image and signal processing to eliminate low-res pixellation in the original images (saved from pdf files). High levels of blur are not in the best interest of preserving detail, and the amount of blur was always dependent on the quality of the original (access to the original digital files, presumably higher resolution images, was not possible).
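For readers who want to reproduce the blur step outside of a graphics program, here is a minimal sketch using scipy.ndimage.gaussian_filter, followed by a 100-255 range limit. Two assumptions are built in: that a program's “5 px” blur corresponds to a Gaussian sigma of 5 pixels (programs differ on what the number means), and that limiting the range simply clamps gray values outside 100-255 to those limits. The random test image is only a stand-in for a pixellated AFM image.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def blur_and_limit(image, sigma_px=5.0, low=100, high=255):
    """Gaussian blur a grayscale image, then clamp gray values to [low, high]."""
    img = np.asarray(image, dtype=float)
    blurred = gaussian_filter(img, sigma=sigma_px)  # sigma in pixels (assumed)
    return np.clip(blurred, low, high)              # assumed limit range behavior


# Stand-in for a rough, pixellated grayscale image (NOT real AFM data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64)).astype(float)

processed = blur_and_limit(image, sigma_px=5.0)

# Pixel-to-pixel roughness along rows, before and after processing.
rough_before = np.abs(np.diff(image, axis=1)).mean()
rough_after = np.abs(np.diff(processed, axis=1)).mean()
print(rough_before, rough_after)
```

The drop in pixel-to-pixel roughness is the whole point of the blur: fewer spurious one-pixel bumps reach the peak counting step.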

Below is a summary of the impact of gaussian blur on peak counts: gaussian blur (either 5px or 10px) alone, or with some additional image processing, or the whole set together. The mean number of peaks counted in each of the hexamers was 15.0 +/- 1.24, nothing really different from what the entire set of plots predicted (see previous post).

It would appear that the removal of pixellation using minimal processing (in this case just a modest gaussian blur, or a median filter) does reduce the number of grayscale peaks in each hexamer. The highest number of peaks per hexamer is in the “no image processing” group. The effects of processing were easy to see directly from the plots, but required a more unbiased verification. Please don’t confuse the titles of the summary data, e.g. “mean filter”, “median filter”, “maximum”, “minimum”, “box blur” (WHICH APPLY TO THE “FILTERS APPLIED”), with the vertical data, which calls the calculated data by similar names: “mean number of peaks”, Median, Mode, Max (as in the maximum number of peaks counted in a dataset of a plot), Min, Sum, and Var (variation). Totally different things… same names.
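To keep the two meanings apart, the calculated values (Mean, Median, Mode, Max, Min, Sum, Var) are plain summary statistics of a list of peak counts and have nothing to do with image filters. A small sketch with Python's statistics module follows. The counts are invented for illustration, and treating “Var” as the sample variance is my assumption.

```python
import statistics


def summarize_counts(counts):
    """Summary statistics for a list of peak counts (one count per plot)."""
    return {
        "Mean": statistics.mean(counts),
        "Median": statistics.median(counts),
        "Mode": statistics.mode(counts),
        "Max": max(counts),
        "Min": min(counts),
        "Sum": sum(counts),
        "Var": statistics.variance(counts),  # sample variance (assumption)
    }


# Invented peak counts for seven plots of one hexamer (NOT my real data).
counts = [13, 15, 15, 16, 14, 15, 17]
summary = summarize_counts(counts)
print(summary)
```

So “Max” here is the largest number of peaks counted in any plot of the set, while a “maximum” filter is something applied to the image before counting.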


Limit range filter (Gwyddion) was the filter that I liked best, especially when used with a gaussian blur. There is only one image on the graph below, but there are dozens using this filter under the signal processing group. There is a pretty obvious increase in peaks with this filter.

Maximum, minimum, mean and box filters were applied to this image, sometimes with gaussian blur as well. Perhaps the minimum and box filters increased the number of peaks found, but I would not personally use these filters to enhance peak detection. It was reasonably evident from the image after application of the minimum, mean and box filters that the result was not what I was looking for.

Lowpass, unsharp mask, and  smart blur. (All counts from image processing)

Just using the bitmap filters and masks of CorelDRAW and CorelPhotoPaint, Photoshop, Gwyddion, ImageJ, Paint.net, Inkscape, Octave (just for image processing, no signal processing here), and GIMP shows the following summaries. (All peak counts are from each of the image processing programs, each analyzed separately to see whether there was variation in the algorithms used.)



and the value I see as putting the image processing into a category of “nice”, not too specific. There is so little variation between programs that “opinion”, “ease of use” and type of “output” would seem to be the best criteria for choosing which to use in microscopy. I have a preference for the proprietary programs, just for ease of use, with the exceptions of ImageJ (which is really a great program) and Gwyddion, though the only use I found for Gwyddion was image processing, and I also found its plotting function produced lots of errors (in my hands). But Gwyddion does have a great function for limiting range, and I used that often. It seems that with image processing, 15 peaks per hexamer is going to be the very best result, consistent and easy to verify. Abbreviations are listed in a different blog (here).

Summary of peak counts for ONE (1) surfactant protein D molecule, after the application of 18 image and signal processing apps, with variations in settings

1 (ONE) molecule (AFM image of surfactant protein D) which I call 41_aka_45 (published by Arroyo et al, 2018)

SUMMARY: 159 different grayscale (LUT) plots of one surfactant protein D dodecamer — as 2 hexamers (arm 1, arm 2)  and as 4 trimers (arm 1a, 1b and arm 2a and 2b) show that personal judgement is still critical for determining the number of brightness peaks along this molecule.

METHODS: 12 image processing programs (listed below) were used to filter, mask, limit range, change contrast, HSL, etc, to enhance the appearance of peaks in this image.
2 programs (listed below) were used for plotting grayscale data: ImageJ (used on almost all images) and Octave/Matlab (occasionally).

5 signal processing programs were used to count the number of peaks in grayscale plots made along a 1px segmented line using ImageJ, with dozens of variations in the input statements for those signal processing programs.

12 peak counts were obtained from volunteers for a single set of plots of this dodecamer (volunteer citizen scientists, ages 8 – 74, giving their impressions of the number of peaks).

DATA: All data were saved in an Excel file with all image and signal processing parameters, to allow assessment of which combinations of processing programs and types produce the most convincing peak counts, widths, and heights.

PURPOSE: To identify a method (or methods) for assessing the number of peaks and the relative peak widths and heights in AFM images of surfactant protein D (a method that could be applied to countless other AFM applications and other molecules).
RESULTS: Nothing produced the results I had hoped for, but there is a clear trend in peak number, width and height. (See next post sometime in the future).
CONCLUSION: Variations in the number of peaks detected in each hexamer produce both even and odd numbers of peaks. For hexamers the mean and median are similar (mean=15 peaks, median=15 peaks), while the mode is something closer to 13 peaks. The likely peak number for any given surfactant protein D hexamer is still open, since this is an analysis of methods, not molecules. The use of this molecule was based on observations of about 100 other images, and it represents a reasonable “good choice” for selecting a methodology. There are clear options for the most efficient peak detection in image and signal processing, and there are just as clear deficiencies.

A summary of the number of programs, plots and peaks applied to this one molecule is shown below. It represents the sum total of the data from image processing, signal processing, and quick peak counts from citizen scientists, as well as the 100 or so counts of my own, from each image.

Abbreviations:
Image processing programs: psd=Photoshop (proprietary, Photoshop 6 and Photoshop 2021); cpp=CorelPhotoPaint (proprietary raster graphics program, CorelPhotoPaint x5 and 2019); cdr=CorelDRAW (proprietary vector graphics program, CorelDRAW x5 and 2019, where the image adjustment menu was used); gw=Gwyddion (a multiplatform modular free software for visualization and analysis of data from scanning probe microscopy techniques, used here ONLY for image processing); paint=Paint.net (free, open source raster graphics program); gimp=GIMP, GNU Image Manipulation Program (free, open source); inkscape=Inkscape.org (free and open source vector graphics program); Octave/Matlab (Octave is free and open source), used briefly for image processing (separate from signal processing, limited to 3 plots total, as this was a super cumbersome way to process images); ImageJ, used for both image processing and Excel plots (ImageJ is a free, Java-based image processing program developed at the National Institutes of Health and the Laboratory for Optical and Computational Instrumentation). I added a column of counts of my own, made from each processed image as well (my peak counts).
Signal processing apps:
batchprocess (Aaron Miller’s app for batch processing Excel files using Lag, Threshold, Influence; open source library); Octave/Matlab (various settings for FindPeaksPlot, AutoFindPeaksPlot, Ipeaks; check out Thomas O’Haver’s website); scipy (Daniel Miller’s app for peak finding using Prominence, Distance, Width, Threshold, Height; Sci/Python open source library); two Excel peak finding templates, 1) PeakDetectionTemplate.xlsx and 2) PeakValleyDetectionTemplate.xlsx (Thomas O’Haver). Many variations (of amplitude threshold, slope threshold, lag, distance, width, smoothing and many others) were used in signal processing.
Citizen science:  Peak counts in a single set of plots of this dodecamer were obtained from a group of friends and family, ages 8 – 78. I did not include my counts in this category as they numbered in the hundreds, not just one set of plots as the former.

Below is just a summary of the number of trimers plotted, in each of the above image and signal processing programs.


Summary of all counts of 2 hexamers, one dodecamer

My counts from the image, as processed dozens of times with dozens of filters in the 12 vector and raster imaging programs, produced the most consistent results, but they were very similar to my own manual counts of the Excel plots made in ImageJ from a trace through the center of each hexamer (2 trimers). Both image processing filters and signal processing algorithms have a huge impact on the variation (var) and on the min and max of the number of peaks counted (judgement is required).
It is worth noting that my counts of peaks from the images are the lowest, meaning to me that processing might be a good backup for confirming what is seen by eye. In fact, the reason for this study was that I saw a pattern in the peaks (mine more detailed and specific than the pattern reported in the literature), and I wondered if it was provable.

I doubt adding more peak counts to this data is going to change much (LOL). So now, the approach is to separate out the filters, masks, and algorithms which best fit the mean and median.