Total peaks and sub-peaks per peak from AFM images of surfactant protein D

BOTTOM LINE (lol, here at the top): Do signal processing apps work for detecting variations in symmetrical and mirrored images? Do they carry as much bias (from the math in the algorithms: lag, threshold, smoothing, distance, height, bla bla) as the human brain, albeit a different bias? And is a combination of image, signal, and NO processing best, or useless?

Summary of peaks found per hexamer (12 dodecamers, 376 hexamers, 752 trimers). Each was plotted for grayscale (LUTs) with ImageJ, and peaks were either counted by hand from the image, counted by hand from the plot produced by ImageJ, or counted from that plot using peak finding functions: Octave (Findpeaksplot,xy; ipeakM80), Scipy, the Lag (smoothing), Threshold (z-score), Influence (moving mean and SD) algorithm (stackoverflow), and PeakValleyDetectionTemplate.xlsx (smooth 11), available from Tom O’Haver. Images were mostly those from a published article by Arroyo et al.
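For readers who have not met the lag/threshold/influence idea, here is a minimal sketch of that kind of z-score detector, written from the general description of the algorithm and not copied from the code actually used for these counts. The function names, the made-up grayscale profile, and the parameter values (lag=10, threshold=3.0, influence=0.0) are all assumptions chosen only for illustration.

```python
import numpy as np

def zscore_signals(y, lag, threshold, influence):
    """Flag samples that sit more than `threshold` moving SDs from the moving mean.

    lag       : number of previous samples in the moving mean and SD
    threshold : z-score a sample must exceed to be flagged
    influence : weight (0..1) a flagged sample gets in later means and SDs
    """
    y = np.asarray(y, dtype=float)
    signals = np.zeros(len(y), dtype=int)
    filtered = y.copy()
    avg = np.zeros(len(y))
    std = np.zeros(len(y))
    avg[lag - 1] = filtered[:lag].mean()
    std[lag - 1] = filtered[:lag].std()
    for i in range(lag, len(y)):
        if abs(y[i] - avg[i - 1]) > threshold * std[i - 1]:
            signals[i] = 1 if y[i] > avg[i - 1] else -1
            # A flagged sample only partly updates the running statistics.
            filtered[i] = influence * y[i] + (1 - influence) * filtered[i - 1]
        else:
            filtered[i] = y[i]
        window = filtered[i - lag + 1 : i + 1]
        avg[i] = window.mean()
        std[i] = window.std()
    return signals

def peak_starts(signals):
    """Index where each run of +1 signals begins (one run = one counted peak)."""
    positive = (signals == 1).astype(int)
    return np.flatnonzero(np.diff(np.concatenate(([0], positive))) == 1)

# Made-up profile: a wobbling baseline with two bright bumps.
baseline = [1.0, 1.2]
profile = baseline * 6 + [5, 6, 5] + baseline * 6 + [5, 7, 5] + baseline * 3

signals = zscore_signals(profile, lag=10, threshold=3.0, influence=0.0)
starts = peak_starts(signals)
print(len(starts), "peaks starting at samples", starts.tolist())
# 2 peaks starting at samples [12, 27]
```

Note that every decision depends on the samples that came before it, which is one place where the "bias from the math" enters: change the lag, threshold, or influence and the count can change.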

Peaks were found previously to number about 15 per hexamer. This was used as a base, as it was also the most commonly found “visual” ID of trimers and hexamers of SP-D when looking at images taken with the atomic force microscope (AFM).  Grayscale plots were made in ImageJ, and peak numbers were assessed with many signal processing functions and also by hand, using the image and the plot produced by ImageJ.

This process has been used in at least 4 iterations (three shown below) to determine the number of peaks in any hexamer of SP-D.  Hundreds of plots subjected to image filters and/or signal processing functions resulted in the table below: summaries from a total of 6 dodecamers, 8 dodecamers (previous 6 plus 2 new molecules), and 12 dodecamers (8 previous plus data added from 4 new dodecamers).

The sorted peaks found with the signal processing functions were gathered together to fit the mean value of 15 peaks per hexamer found previously.

Three peaks (N term, glycosylation peak, and the CRD peak) are present 100% of the time. There are two additional peaks, present 99% and 95% of the time, which clearly need to be added to considerations of molecular shape. The neck peak (yellow), which is very often covered by the CRD peak (three lumps in the trimer which can lie in many different positions on the mica before images are taken), shows up about half the time. This is when the CRD domains fall away from the center line traced with the segmented line in ImageJ.

Two other peaks appear less frequently. One is peak 5, which is seen about 74% of the time and is a narrow, low, unimpressive, yet definable peak.  The peak called “tiny peak” (purple color as ID) occurs less than half the time. There are two reasons for this; the big one is that signal finding functions base their peak detection on characteristics of the previous peaks. The tiny peak is on the shoulder of the tallest, broadest peak in the hexamer, and thus influences the counting of peaks before and after.
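To make the shoulder problem concrete, here is a small sketch using Scipy's find_peaks on an invented profile: one tall, broad Gaussian with a small bump on its right shoulder. The shapes, positions, and the two prominence settings are assumptions picked for illustration, not measurements from the SP-D plots. Prominence is how far a peak stands above the lowest point separating it from higher ground, so a shoulder bump can be a real local maximum and still score very low.

```python
import numpy as np
from scipy.signal import find_peaks

# Invented trace: distance along the segmented line in arbitrary units.
x = np.linspace(0, 100, 1001)
tall = 10.0 * np.exp(-((x - 50.0) ** 2) / (2 * 10.0 ** 2))  # tallest, broadest peak
tiny = 1.5 * np.exp(-((x - 70.0) ** 2) / (2 * 1.5 ** 2))    # small bump on its shoulder
profile = tall + tiny

# Same data, two prominence settings.
strict, _ = find_peaks(profile, prominence=1.0)
loose, _ = find_peaks(profile, prominence=0.1)

print("prominence >= 1.0:", x[strict].round(1))  # only the tall peak survives
print("prominence >= 0.1:", x[loose].round(1))   # the shoulder bump is counted too
```

The bump itself is 1.5 units tall, but because it rides on the slope of the big peak it only rises a fraction of a unit above the dip beside it, so a setting that looks reasonable for the other peaks quietly drops it.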

Ways to combat this are:

1: “count them myself”,

2: plot the segmented line in both directions and compare the outcome,

3: fine tune the signal processing functions to detect even the smallest peak (this, however, translates into dozens and dozens of detected peaks which are not relevant and primarily represent pixels).
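Option 2 can be sketched in a few lines. The detector below is a hypothetical stand-in, not any of the functions named in this post: it only accepts a local maximum if it has risen at least min_rise above the lowest value seen since the last accepted peak, so what it finds depends on what came before. The profile values are invented, with a small bump at sample 11 sitting on the far shoulder of the tall peak at sample 5.

```python
def causal_rise_peaks(profile, min_rise):
    """Hypothetical history-dependent detector (illustration only)."""
    peaks = []
    running_min = profile[0]
    for i in range(1, len(profile) - 1):
        running_min = min(running_min, profile[i])
        is_local_max = profile[i - 1] < profile[i] >= profile[i + 1]
        if is_local_max and profile[i] - running_min >= min_rise:
            peaks.append(i)
            running_min = profile[i]  # start measuring the next rise from here
    return peaks

def compare_directions(profile, detect):
    """Run `detect` on the trace and on its reverse; report agreement."""
    n = len(profile)
    forward = set(detect(profile))
    # Map indices from the reversed trace back to the original numbering.
    backward = {n - 1 - i for i in detect(profile[::-1])}
    return sorted(forward & backward), sorted(forward ^ backward)

# Invented grayscale values along the segmented line.
profile = [0, 1, 3, 6, 9, 10, 9, 6, 4, 3, 2.5, 2.9, 2, 1, 0.5, 0]

agreed, disputed = compare_directions(
    profile, lambda p: causal_rise_peaks(p, min_rise=1.0)
)
print("found in both directions:", agreed)       # [5]
print("found in one direction only:", disputed)  # [11]
```

Read left to right, the bump only rises 0.4 above the dip after the tall peak and is skipped; read right to left, it is the first thing the detector meets and is counted. Peaks that show up in only one direction are the ones worth checking by eye.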

Dividing up the results into the peaks that I counted from the image and the ImageJ plots, and those which were subjected to signal peak finding apps, there is a difference, but it really involves the detection of peaks that do not fit into the “15” per hexamer that was found with all filters and functions previously.  That doesn’t mean that the number “15 peaks per hexamer” is never to change…  so that means more molecules to check.