Monthly Archives: April 2022

Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: Octave, Ipeak

A mean of just over 14 peaks, and a mode of 15 peaks, was found for peak counts of a single surfactant protein D molecule (an AFM image) using five programs and, within each, a variety of peak finding settings. Each of these programs detects peaks in a different way, which makes it a little difficult (for a non programmer like me) to compare them. Nevertheless, the number of peaks seems to be very close to what is found by application of conventional image processing filters and masks.

A list of the image processing programs used before signal processing is provided on the left; the types of image processing are given above that list. (Two programs were not used, Inkscape and Octave: the former because it did not have the conventional filter names and applications found in other programs, and Octave because it was very cumbersome compared to dedicated image processing programs.) Again, the mean number of peaks is very close to that found with image processing alone.

Counts of peaks with signal processing algorithms do NOT include counts from the image directly, or from the original plots created in ImageJ, unlike the values found in the mean number of peaks using image processing. The list of full names of the image and signal processing programs has been given in previous posts.

Two image processing programs (cpp19 and gwyddion) were used with Octave as a signal processing program to detect peaks. Images were subjected to either a 5 px or 10 px blur, and then, using gwyddion, some were subjected to a limitrange filter (100-255). Various settings were used in Octave Ipeaks. Except for the limitrange group (rightmost set of numbers), more peaks were detected by Octave than by peak counts made by hand from images or plots. The number of peaks found was easily shown to increase when the suggested peak number was increased.

Ipeaks input statements and results were grouped as above (and individually, below).

Verge of a Dream: Gray

Whatever color it is
A hovering gray
Of a sky, of a day
There is feeling of
a careful distance
From the several
window panes
To stay.
With the glass
as an empty
sunroom will.
Whatever color it is.
A tear stain gray
in years so still the
drawn in cell walls
become by
discouraged dreams
of home.
Whatever color it is.
An age made gray
why ask
when it became
white… perhaps made
In the small agony
of surprises
each time
in the
image the
mirror provides.
Though not in lying,
It will not make
plain the next
time is not the
last time.
The finger snap
of one time
already passed.

RLB 04/28/2022

Verge of a Dream: glass shade

I saw you in the semi-darkness
stepping on the work bench
foot rest
one dark heel below the other
and the glass shade held
by the hand that created it.
Bound and belonging on
a foundaried base.
The art in reverse
On the glass, maybe cosmos
Or thick dahlia….
The light was less so,
the Jerusalem lily unbrightened,
As though through
settled fogginess
silently asking it be
brought to me
brought to me
and no other.

RLB 04/27/2022

Do image processing blurs and filters have a significant impact on peak counts of SP-D hexamers?

5 and 10 px gaussian blur filters have little impact on peak counts of SP-D hexamers, as it seems from a summary of peak counts (see image below).
Much of the peak count data collected for a single dodecamer of surfactant protein D (as a grayscale plot with peaks along a line drawn through the center of each of the two hexamers, CRD to CRD) (my molecule number 41_aka_45) came from images that had been subjected to a 5px or 10px gaussian blur. The blur filters in the programs listed in previous posts did not specify the px radius of the blur, except one (a filter in Photoshop 2021) that called this blur 10px radius. This plot was included with all other gaussian blur filters. Only one image was processed with what I presume to be a gaussian blur (Octave blur 101-10).
Of the total 159 sets of plots, 7 images received NO processing, 62 received a 5px gaussian blur, and 50 a 10px gaussian blur. Almost always, the lowest pixel blur that would barely smooth the image was employed. Gaussian blurs were used before other image and signal processing to eliminate low res pixellation in the original images (saved from pdf files). High levels of blur are not in the best interest of preserving detail, and the amount of blur was always dependent on the quality of the original (access to the original digital files, presumably higher resolution images, was not possible).
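Why a modest blur removes pixellation peaks can be sketched on a grayscale profile. This is only a 1-D analogy of the 2-D image blur, in Python, with a made-up profile; the function names and the sigma value are assumptions for illustration, not the settings used in any of the programs.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(profile, sigma):
    """Convolve a profile with a Gaussian, repeating the edge values at the ends."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(profile) - 1
    out = []
    for i in range(len(profile)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index at the edges
            acc += w * profile[j]
        out.append(acc)
    return out

def count_local_maxima(profile):
    """Count points strictly higher than both neighbors."""
    return sum(1 for i in range(1, len(profile) - 1)
               if profile[i - 1] < profile[i] > profile[i + 1])

# Made-up profile: one broad bump plus pixel-to-pixel jitter (the "pixellation").
raw = [100 + 50 * math.sin(math.pi * i / 40) + (6 if i % 2 else -6)
       for i in range(41)]
blurred = smooth(raw, sigma=2.0)

print("maxima before blur:", count_local_maxima(raw))
print("maxima after blur: ", count_local_maxima(blurred))
```

The jitter creates a local maximum at every other sample; after smoothing only the broad bump is left. A stronger blur would start to merge real neighboring peaks as well, which is the reason for keeping the blur minimal.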

Below is a summary of the impact of gaussian blur on peak counts: gaussian blur (either 5px or 10px) alone, with some additional image processing, or the whole set together. The mean number of peaks counted in each of the hexamers was 15.0 +/- 1.24, nothing really different from what the entire set of plots predicted (see previous post).

It would appear that the removal of pixellation using minimal processing (in this case just a modest gaussian blur or a median filter) does reduce the number of grayscale peaks in each hexamer. The highest number of peaks per hexamer is in the “no image processing” group. The effects of processing were easy to see directly from the plots, but required a more unbiased verification. Please don’t confuse the titles of the summary data, e.g. “mean filter”, “median filter”, “maximum”, “minimum”, “box blur” (WHICH APPLY TO THE “FILTERS APPLIED”), with the vertical data, which calls the calculated values by similar names: “mean number of peaks”, Median, Mode, Max (as in the maximum number of peaks counted in a dataset of a plot), Min, Sum, and Var (variation). Totally different things, same names.
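To keep the two meanings apart: the calculated values (Mean, Median, Mode, Max, Min, Sum, Var) are plain statistics on a list of peak counts and have nothing to do with the filters of the same names. A minimal sketch in Python, with made-up counts; it assumes Var is the sample variance, which is not stated in the summaries.

```python
import statistics

def summarize(peak_counts):
    """Summary statistics for a list of peak counts (one count per plot)."""
    return {
        "Mean": statistics.mean(peak_counts),
        "Median": statistics.median(peak_counts),
        "Mode": statistics.mode(peak_counts),
        "Max": max(peak_counts),
        "Min": min(peak_counts),
        "Sum": sum(peak_counts),
        # Assumption: sample variance (n - 1 in the denominator).
        "Var": statistics.variance(peak_counts),
    }

# Made-up peak counts per hexamer, for illustration only.
counts = [15, 14, 16, 15, 13, 15, 17, 15]
for name, value in summarize(counts).items():
    print(name, value)
```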

Limit range filter (Gwyddion) was the filter that I liked best, especially when used with a gaussian blur. There is only one image on the graph below, but there are dozens using this filter under the signal processing group. There is a pretty obvious increase in peaks with this filter.
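What a limit range of 100-255 does to the grayscale values can be sketched as a simple clamp. This assumes the filter sets values outside the range to the nearest limit, which is my understanding of the Gwyddion function and not a copy of its code; the function name is hypothetical.

```python
def limit_range(pixels, low=100, high=255):
    """Clamp every grayscale value into [low, high].

    pixels is a list of rows, each row a list of 0-255 values.
    Assumption: out-of-range values are set to the nearest limit.
    """
    return [[min(max(value, low), high) for value in row] for row in pixels]

# Made-up row of pixels: everything darker than 100 becomes a flat background.
row = [[0, 99, 100, 180, 255]]
print(limit_range(row))
```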

Maximum, minimum, mean and box filters were applied to this image, sometimes with gaussian blur as well. Perhaps the minimum and box filters increased the number of peaks found, but I would not personally use these filters to enhance peak detection. It was reasonably evident from the image after application of the minimum, mean and box filters that the result was not what I was looking for.

Lowpass, unsharp mask, and  smart blur. (All counts from image processing)

Just using the bitmap filters and masks of CorelDRAW and CorelPhotoPaint, Photoshop, Gwyddion, ImageJ, Inkscape, Octave (just for image processing, no signal processing here), and GIMP shows the following summaries. (All peak counts are from each of the image processing programs, each analyzed separately to see whether there was variation in the algorithms used.)

The value I see puts image processing into a category of “nice” but not too specific. There is so little variation between programs that “opinion”, “ease of use” and type of “output” would seem to be the best criteria for which to use in microscopy. I have a preference for the proprietary programs, just for ease of use (except ImageJ, which is really a great program), and for Gwyddion, though the only use I found for it was image processing, and I also found the plotting function produced lots of errors (in my hands). But Gwyddion does have a great function for limiting range and I used that often. It seems that with image processing, 15 peaks per hexamer is going to be the very best result, consistent and easy to verify. Abbreviations are listed in a different blog (here).

Summary of peak counts for ONE (1) surfactant protein D molecule, after the application of 18 image and signal processing apps, with variations in settings

1 (ONE) molecule (AFM image of surfactant protein D) which I call 41_aka_45 (published by Arroyo et al, 2018)

SUMMARY: 159 different grayscale (LUT) plots of one surfactant protein D dodecamer — as 2 hexamers (arm 1, arm 2)  and as 4 trimers (arm 1a, 1b and arm 2a and 2b) show that personal judgement is still critical for determining the number of brightness peaks along this molecule.

METHODS: 12 image processing programs (listed below) were used to filter, mask, limit range, change contrast, HSL, etc, to enhance the appearance of peaks in this image.
2 programs (listed below) were used for plotting grayscale data: ImageJ (used on almost all images) and Octave/Matlab (occasionally).

5 signal processing programs were used to count the number of peaks in grayscale plots made by a 1px segmented line using ImageJ, with dozens of variations in the input statements for those signal processing programs.
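The idea of a grayscale plot along a 1px segmented line can be sketched as follows. This is not how ImageJ does it (its profile tool interpolates between pixels); the sketch simply reads the nearest pixel at roughly 1 px steps along each segment, and the function name and the tiny image are made up.

```python
import math

def profile_along_polyline(image, vertices):
    """Read grayscale values at about 1 px steps along a segmented line.

    image    : list of rows, each a list of grayscale values
    vertices : list of (x, y) points defining the segments
    Simplification: nearest-pixel lookup, no interpolation.
    """
    values = []
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        steps = max(1, int(round(length)))
        for s in range(steps):  # the end point belongs to the next segment
            t = s / steps
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            values.append(image[y][x])
    x_end, y_end = vertices[-1]
    values.append(image[int(round(y_end))][int(round(x_end))])
    return values

# Made-up 3 x 4 image and an L-shaped line from top left to bottom right.
image = [[10, 20, 30, 40],
         [50, 60, 70, 80],
         [90, 100, 110, 120]]
print(profile_along_polyline(image, [(0, 0), (3, 0), (3, 2)]))
```

The list that comes out is what the signal processing programs receive: one brightness value per step along the line, with no knowledge of the image it came from.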

12 peak counts were obtained from volunteers for a single set of plots of this dodecamer (volunteer citizen scientists, ages 8 – 74, gave their impressions of the number of peaks).

DATA: All data were saved in an Excel file with all image and signal processing parameters, to allow assessment of which combinations of processing programs and types produce the most convincing peak counts, widths, and heights.

PURPOSE: To identify a method(s) for assessing the number of peaks, relative peak widths and heights for AFM images of surfactant protein D. (A method that could be applied to countless other AFM applications and other molecules.)
RESULTS: Nothing produced the results I had hoped for, but there is a clear trend in peak number, width and height. (See next post sometime in the future).
CONCLUSION: Variations in the number of peaks detected in each hexamer produce both even and odd numbers of peaks. The mean and median are similar for hexamers (mean=15 peaks, median=15 peaks), while the mode is something closer to 13 peaks. The likely peak number for any given surfactant protein D hexamer is still open, since this is an analysis of methods, not molecules. The use of this molecule was based on observations of about 100 other images, and it represents a reasonable “good choice” for selecting a methodology. There are clear options for the most efficient peak detection in image and signal processing, and there are just as clear deficiencies.

A summary of the number of programs, plots and peaks applied to this one molecule is shown below. It represents the sum total of the data from image processing, signal processing, and quick peak counts from citizen scientists, as well as the 100 or so counts of my own, from each image.

Image processing programs: psd=Photoshop (proprietary; Photoshop 6 and Photoshop 2021); cpp=CorelPhotoPaint (proprietary raster graphics program; CorelPhotoPaint x5 and 2019); cdr=CorelDRAW (proprietary vector graphics program; CorelDRAW x5 and 2019, where the image adjustment menu was used); gw=Gwyddion, a multiplatform modular free software for visualization and analysis of data from scanning probe microscopy techniques, used here ONLY for image processing; gimp=GIMP, GNU Image Manipulation Program (free, open source raster graphics program); inkscape=Inkscape (free and open source vector graphics program); Octave/Matlab (Octave is free and open source), used briefly for image processing (separate from signal processing, limited to 3 plots total, as this was a super cumbersome way to process images); ImageJ, used for both image processing and excel plots (ImageJ is a free, Java-based image processing program developed at the National Institutes of Health and the Laboratory for Optical and Computational Instrumentation). I added a column of counts of my own, made from each processed image as well (my peak counts).
Signal processing apps:
batchprocess (Aaron Miller’s app for batch processing excel files using Lag, Threshold, Influence; open source library); Octave/Matlab (various settings for FindPeaksPlot, AutoFindPeaksPlot, Ipeaks; check out Thomas O’Haver’s website); scipy (Daniel Miller’s app for peak finding using Prominence, Distance, Width, Threshold, Height; SciPy, a Python open source library); two excel peak finding templates, 1) PeakDetectionTemplate.xlsx and 2) PeakValleyDetectionTemplate.xlsx (Thomas O’Haver). Many variations of amplitude threshold, slope threshold, lag, distance, width, smoothing and many other settings were used in signal processing.
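The scipy settings named above (prominence, distance, width, threshold, height) match the keyword arguments of `scipy.signal.find_peaks`. The sketch below is not Daniel Miller’s app, only a direct call to that library function on a made-up profile, to show how one setting changes the count.

```python
from scipy.signal import find_peaks

# Made-up grayscale profile with two tall peaks and two small bumps.
profile = [10, 12, 30, 12, 11, 13, 11, 50, 20, 22, 20, 10]

# No settings: every local maximum is reported.
all_peaks, _ = find_peaks(profile)

# Prominence of at least 5: a peak must stand 5 gray levels above
# the higher of the two valleys that separate it from taller peaks.
strong_peaks, properties = find_peaks(profile, prominence=5)

print("all local maxima at index:", list(all_peaks))
print("prominent peaks at index: ", list(strong_peaks))
print("their prominences:        ", list(properties["prominences"]))
```

The two small bumps have a prominence of only 2 gray levels, so they disappear from the count as soon as the prominence setting is raised above that. The same profile therefore gives 4 peaks or 2 peaks depending on one number chosen by the investigator.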
Citizen science:  Peak counts in a single set of plots of this dodecamer were obtained from a group of friends and family, ages 8 – 78. I did not include my counts in this category as they numbered in the hundreds, not just one set of plots as the former.

Below is just a summary of the number of trimers plotted, in each of the above image and signal processing programs.

Summary of all counts of 2 hexamers, one dodecamer

My counts made directly from the image, processed dozens of times with dozens of filters in the 12 vector and raster imaging programs, produced the most consistent results. They were very similar to my own manual counts of peaks in the excel plots, made in ImageJ by a trace through the center of the hexamers (2 trimers). Both image processing filters and signal processing algorithms have a huge impact on the variation (var) and the min and max of the number of peaks counted (judgement is required).
It is worth noting that my counts of peaks from the images are the lowest, meaning to me that processing might be a good backup for confirming what is seen by eye. In fact, the reason for this study was that I saw a pattern in the peaks (mine more detailed and specific than the pattern reported in the literature), and I wondered if it was provable.

I doubt adding more peak counts to this data is going to change much (LOL). So now, the approach is to separate out the filters, masks, and algorithms which best fit the mean and median.


It is not sufficient just to count the number of peaks in an AFM image of surfactant protein D

It is not sufficient just to count the number of peaks in an AFM image of surfactant protein D. That sounds like a ridiculous comment, but the signal processing programs do just that: provide a number of peaks. This just doesn’t help when there are subtle peaks, very prominent peaks, differences in peak width and height (some relative to the height of other peaks), and gross variations in the way the trimer lies (three loosely or tightly bound monomers, depending upon which domain you are looking at), and even the CRD, with areas which bend over the neck domain in the images, which changes the plot lines dramatically.

Similarly, the N termini junction has been noted as having two different ways for the trimers to assemble into multimers, and this too is shown in plots: sometimes two (or three) peaks at the top of the N term grayscale peak, sometimes just 1 peak. This will likely sort out according to whether the line plotting grayscale goes through a side of the N term junction or through the “center”, the latter likely not always “up” in the image.

Sorting the peaks into at least four different categories is necessary when counting up the number of peaks. In this study, the N termini peak is counted from its valley most distant from the CRD through the entire N term peak, UNLESS there are TWO distinct peaks (counted by the signal processing algorithms), in which case each half of the whole N term is counted separately.

That way an entire N term length is counted for each trimer, with a grayscale plot which is always recorded from N to CRD (regardless of the direction that the protein lies in the image). The actual line is drawn CRD to CRD, but the widths and peak heights are recorded in the database so that all calculations can be made on trimers, and all calculations begin with N and end with the CRD.
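The bookkeeping described here, one CRD to CRD trace turned into two trimers that each run from N to CRD and each include the whole N term peak, can be sketched as below. The function name and the profile are made up, the two valley positions beside the N peak are assumed to be already known, and the exception for two distinct N peaks is not handled.

```python
def split_hexamer(profile, n_left_valley, n_right_valley):
    """Split a CRD-to-CRD profile into two trimers, each recorded N to CRD.

    profile        : grayscale values along the CRD-to-CRD line
    n_left_valley  : index of the valley just left of the N term peak
    n_right_valley : index of the valley just right of the N term peak
    Each trimer starts at the N valley most distant from its own CRD,
    so the entire N term peak is included in both.
    """
    # Left arm: CRD ... N peak, reversed so that it begins at N.
    left_trimer = profile[:n_right_valley + 1][::-1]
    # Right arm already runs from N toward its CRD.
    right_trimer = profile[n_left_valley:]
    return left_trimer, right_trimer

# Made-up profile: the N term peak (20) sits at index 5, valleys at 4 and 6.
hexamer = [5, 9, 4, 6, 3, 20, 3, 7, 4, 10, 5]
left, right = split_hexamer(hexamer, n_left_valley=4, n_right_valley=6)
print(left)
print(right)
```

Because the N peak is counted once in each trimer, the two trimer counts added together will be one higher than the count for the hexamer as a whole.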

Point anomalies, pattern anomalies ??

There are programmers who have interests in biology, I am finding, who understand the need for signal processing of signals that contain repeating and symmetrical patterns. It is an important issue, as there are times when I look at the peaks defined by algorithms (such as Lag, Threshold, Influence) which just don’t do what I would like them to do, and there are examples of peak finding (and peak ignoring) which just don’t make visual sense to me. See the plot below (an end to end tracing (grayscale plot) of a surfactant protein D hexamer) that has peaks detected using the LTI values in an app made expressly for me by Aaron Miller. It detects lots of the peaks that are obvious, but I am pointing out in the middle and lower images those peaks which, because of the previous values, are just ignored, while other peaks, which are just tiny bumps in a larger peak, are tagged as separate peaks.

Top image: blue line is the grayscale plot; boxes are the peak widths (marked as valleys on either side of the algorithm’s detection of peaks (purple lines)); grayscale axis (y) is normalized to 100. This set of peaks is from one particular plot, representing one CRD-CRD segmented line drawn through the center of a single dodecamer of surfactant protein D (image is from Arroyo et al). Overall the plot is not very different in terms of peak number from that ascribed to the plot by “citizen scientists” (friends and family), “my counts” (about 500 of them) and various signal processing programs: Octave/Matlab peak detection functions; Scipy app (from Daniel Miller); excel peak and valley detection templates (Thomas O’Haver); and an LTI app (from Aaron Miller).

Here are two instances where I don’t like the peaks that are flagged. This sample is from the LTI app. (A lag of 5 will use the last 5 observations to smooth the data. A threshold of 1 will signal if a datapoint is 1 standard deviation away from the moving mean. And an influence of 0.5 gives signals half of the influence that normal datapoints have.) I have put into the link the LTI values for this particular plot. Two specific instances where I disagree are shown in the plots below, each an excerpt from the complete plots above. The plot excerpt on the left shows one peak NOT detected (fat red line above the undetected peak), and the one on the right shows a nonsense (in my opinion) peak (tiny thin red bar above the peak).
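The Lag, Threshold, Influence description above can be written out as a short sketch. This is not Aaron Miller’s app, only my reading of the three settings as described; the function name is made up, and using the population standard deviation of the window is an assumption.

```python
import statistics

def lti_signals(y, lag=5, threshold=1.0, influence=0.5):
    """Flag datapoints that sit far from a moving mean.

    Returns one value per datapoint: +1 (above), -1 (below), 0 (no signal).
    The first `lag` points are never flagged; they only seed the history.
    """
    signals = [0] * len(y)
    history = list(y[:lag])  # values used for the moving mean and deviation
    for i in range(lag, len(y)):
        window = history[i - lag:i]
        mean = statistics.mean(window)
        std = statistics.pstdev(window)  # assumption: population std dev
        if abs(y[i] - mean) > threshold * std:
            signals[i] = 1 if y[i] > mean else -1
            # A flagged point enters the history at reduced weight.
            history.append(influence * y[i] + (1 - influence) * history[i - 1])
        else:
            history.append(y[i])
    return signals

# Made-up profiles: the same bump of 13 in a quiet stretch, and after a tall peak.
quiet = [10, 10, 11, 10, 10, 13]
after_tall_peak = [10, 10, 11, 10, 10, 10, 30, 10, 13, 10]
print(lti_signals(quiet))
print(lti_signals(after_tall_peak))
```

In the quiet stretch the bump of 13 is flagged. Two points after the tall peak, the identical bump is ignored, because the tall peak is still inside the 5 point window and has inflated both the moving mean and the standard deviation. That is the “because of the previous values” behavior: whether a small peak is found depends on what came just before it.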

RED BARS: the bar on the left is over the peak I would LIKE to have detected, and the bar on the right is over a peak that seems like it should not have been detected. The challenge is to find a model plot and compare the “real plots” back to the model, thus allowing the extraordinary discrepancies in peak height and width to be tagged, and not removed in moving averages. The same issues exist in image processing, where one ends up using judgement, but then, with judgement comes bias.

It is these irregularities that are causing me to go into signal processing for biology with much disappointment.

Woefully lacking in subtle peak detection.

This has been a long and frustrating journey — and I thank the two kids who have helped me and a retired professor, and other friends and neighbors who have listened to me complain.  The bottom line….

peak finding programs are great at getting rid of noise, but really poor at detecting subtle bilateral peak symmetry

It’s almost like I find them unable to get out of their “ruts”: slopes, thresholds, sliding averages for this, and two peaks before this, and what about rounded peaks. Just doesn’t work.

Case in point is the enormous number of times I have plotted the same molecule of SP-D and failed to pick up some really tiny peaks beside the N termini junction of a dodecamer, but managed to find 4 peaks on the downslope of the presumed glycosylation peak. I don’t doubt for an instant that adding individual molecules to a site in one, two or three strands of a trimer can result in a bumpy elevation, but if the peak finding algorithms find peaks there, then why not the tiny peak buried right beside the very tall, very wide peak for the N termini? It is like the curse of position. I could and should at some point determine whether the direction (before or after the N term peak) decides if the tiny peak at the bottom of that valley is ignored or found. But then it is at the valley between the two largest grayscale peaks in the molecule, right between the glycosylation peak and the N termini peak, so it is doomed. It is also not picked up by novice peak finders. I know this peak is meaningful, but how to get it into the peak detection programs is another story.

Symmetry and subtle peaks are just lost in the numbers.

One image: All plots to date

Surfactant protein D is listed in many websites, and even Wikipedia, as just the carbohydrate recognition domain and neck domain; little else of the molecule (which includes the N and collagen-like domains) has been modeled. This blog has many posts dedicated to understanding why the other two domains have not been modeled.

Wonderful images found in Arroyo, et al, offer a great opportunity to look critically at the structure, and it is clear that there is much information available from a deeper look at those AFM images. That was the initial purpose of this blog; however, it became clear that for just plotting the grayscale along a line drawn through the images of the dodecamer arms (hexamers) of SP-D, some serious processing of the images was required. So I set out to find the “best” filters, those that best enhanced the images without changing their data. It also became clear that an unbiased count of the grayscale peaks along the plots (of hexamers and trimers) was required. Then numerous signal processing programs were used to find the “best” algorithms for counting peaks. This, along with image processing, IS still subject to the bias of the investigator.

The parameters for both image and signal processing are driven by the opinion of the investigator, and then I thought perhaps some citizen scientists (friends and family) could be asked to count peaks in the grayscale plots, to compare with the counts from image and signal processing. The bar graph shows over 500 individual attempts to find the number of peaks in a trimer from a single image of SP-D. Images below that show the actual image, and an example of one such plot and graph summarized in that bar graph.

Mean peaks per trimer = 8.14 +/- 2.48 , but the mode is 7, shown below. The mode is 7 likely because when counting the peaks the N term gets counted for each trimer, but is usually seen as one central bright(est) high(est) peak.

White arrow on the image of SP-D shows that the entire N term plus the trimer arm is plotted toward the CRD domain. Known peaks are N, gly, and CRD, the other four peaks are consistent and will be meaningful at some point. The peak at the neck is sometimes seen, often depending on whether one of the CRD of the trimer is lying overtop.

Just from my own observations, there will emerge an additional, very low and narrow peak just at the bottom valley of the N termini peak.  Not shown here, but barely detectable on the line plot (but not marked with a color in the lower bar graph) between N and gly.

One surfactant protein D image, lots of peak number – measurements

One surfactant protein D image, lots of peak number – measurements using various signal processing algorithms.
Just the beginning of the assessment of which peak-finding programs work well with AFM images, are easy to use, and generate more insight than just the opinions of the observers.
There really isn’t much new gained by using these signal processing programs, for reasons which I think might be related to the fact that “noise” is a big issue for signal processing, and symmetry and variation are not that well handled. Just a microscopist’s opinion here, not one of a programmer.

Clearly there is still a judgement call to be made on whether to use the “mode or the mean” in deciding what is the best number of peaks.  Differences between the number of peaks on the right and left sides of a dodecamer  (differences in the way the molecule has fallen on the mica, other processing issues) are clearly a stumbling block to a determination of symmetry in peaks, and the slope, threshold and a number of other signal processing options allow for great variability in peak numbers. I am certainly leaning toward the simplest comparison, that of the human eye, and then a plot with modest peak processing to identify peaks and valleys.
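One way to put a number on left and right symmetry, rather than only comparing peak counts, is to fold the hexamer plot at the N term peak and correlate the two arms. This is only a sketch of the idea, not something any of the programs above do; the function names and profiles are made up, and it assumes both arms are sampled at the same spacing (real arms of different lengths would need resampling first).

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def mirror_symmetry(profile, center):
    """Correlate the left arm (read outward from `center`) with the right arm.

    Returns a value near 1.0 when the two arms are mirror images.
    """
    left = profile[:center][::-1]   # reversed, so both arms start next to N
    right = profile[center + 1:]
    n = min(len(left), len(right))  # compare only the overlapping length
    return pearson(left[:n], right[:n])

# Made-up profiles with the N term peak (9) at index 3.
symmetric = [1, 4, 2, 9, 2, 4, 1]
lopsided = [1, 4, 2, 9, 4, 2, 1]
print(mirror_symmetry(symmetric, center=3))
print(mirror_symmetry(lopsided, center=3))
```

Both made-up profiles have the same number of peaks on each side, yet the second scores far lower, which is the kind of difference a plain peak count cannot show.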

To view the actual image (sadly called 41 aka 45) just roam back through the SP-D posts, it appears a jillion times.