rlb 8-20-2023

# Plotting peaks in a hexamer of a “fuzzy ball” multimer of SP-D

Plotting peaks in a hexamer of a “fuzzy ball” multimer of SP-D is problematic at best. The plots that most closely match the hexamer plots from dodecamers are “V” shaped tracings, which make the boundaries of the N term domain peaks hard to define. A decision needs to be made: whether going back to measure hexamers “within the fuzzy balls” is important to completing a writeup on SP-D trimer (hexamer) peak number, size, and shape.

# N term peak in a plot of a SP-D trimer

The first plot of a trimer, made to compare the N term peak height and width in “trimers” vs “dodecamers”, shows the trimer N term domain peak to be about 11 nm wide, whereas the peak width for an N term in a hexamer is about twice that. When hexamers are plotted as part of a dodecamer, all four trimer N term domains are present, typically as a single very large peak but sometimes with a tiny depression at the top of the peak.

To be sure there is no consistent difference in width between the plots of the two hexamers of a dodecamer, hexamers alone need to be plotted.

The N term peak width of hexamers (plotted here as part of a dodecamer) is close to double that of the trimer N term peak, and the trimer peak height is about half the height of the N term peak plotted in a hexamer or dodecamer. That looks very much like a 4x difference in peak area (so convenient).
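As a rough check on the “4x” impression, here is a minimal sketch that treats each N term peak as a triangle. The triangle shape is an assumption made only for illustration, and the height is a hypothetical placeholder in arbitrary grayscale units; only the approximate 11 nm trimer width comes from the plot described above.

```python
def triangle_area(width, height):
    """Area of a peak approximated as a triangle (an illustrative assumption)."""
    return 0.5 * width * height


# Trimer N term peak: about 11 nm wide (from the plot described above).
# The height is a hypothetical placeholder in arbitrary grayscale units.
trimer_width_nm = 11.0
trimer_height = 1.0

# Hexamer/dodecamer N term peak: about twice as wide and twice as tall.
multimer_width_nm = 2 * trimer_width_nm
multimer_height = 2 * trimer_height

area_ratio = triangle_area(multimer_width_nm, multimer_height) / triangle_area(
    trimer_width_nm, trimer_height
)
print(area_ratio)  # 4.0
```

The factor of 4 does not depend on the triangle assumption: any peak shape stretched 2x in width and 2x in height has 4x the area.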

One other thing to examine on trimers is the presence of a more pronounced “tiny peak”, which may be more visible in a single trimer than among the four trimer peaks in a dodecamer.
It will take a lot of plotting to confirm this observation (which for now is just an N=1), but it was certainly evident on this image. The N term is on the left, where the plot starts, and the CRD is on the right (the typical two peaks, where different floppy CRDs might fall during preparation); the glycosylation peak is close to the left. It is easy to count at least 8 peaks along this unprocessed image. The bar marker is from the original (Arroyo et al, 2020) at 80 nm.

# NATURE: inexcusable abundance and waste

What is the reason for the inexcusable abundance of nature? Take for example the number of seeds in a pod, and the number of pods on a redbud tree, or the number of keys from a maple tree, or seeds in a sunflower, or eggs thrown off by a frog in the water. What is the reason that the countless brilliant thoughts and creative and wonderful ideas of humans are just lost to the “forever” star dust as each artist, craftsman, builder, scientist, educator and poet dies?

# The original and the processed: SP-D

The figure below shows an original plot of surfactant protein D (Arroyo et al, 2020), presumably made by the program that comes with the atomic force microscope used to visualize the protein preparation. Brightness is on the y axis, distance on the x.

I used this plot as a “published” documentation of peak count of a trimer of SP-D. While looking at the images in this and other articles by Arroyo et al, I became convinced there were patterns in the brightness (peaks), symmetrical and mirrored, in the hexamers, dodecamers and multimers (though determining patterns of brightness peaks in the latter is more problematic than in the hexamers and dodecamers).

My own counts of peak number, and peak width were consistent enough that I sought out ways to verify the number and properties of these peaks in an “unbiased” way.  (Of course there is no way to be totally unbiased in any research, but I did try to select image filters and signal processing functions that seemed appropriate, easy to use, and produced consistent data which I then applied in the same manner to 12 dodecamers of SP-D).

After more than 1000 plots of SP-D hexamers, this figure (above) shows that not only are the three peaks defined by Arroyo et al correct, but the tiniest variations in their plot, which would not have been considered significant or relevant, are verified as actual peaks (see the summary plot with which their plot is compared). This seems to me to reveal several things.

The reason that the N term peak is so large in the dodecamer compared to what Arroyo et al measured in their trimer is that four N terminals form the common intersection, in a single peak (maybe not always single, perhaps sometimes side by side). All other measured peaks are domains of a single trimer.

1. There is more to be learned about molecular structure from AFM images than is generally perceived, but this requires different types of image and signal processing.

2. There are many programs useful for defining or confirming peak number, width, height, and valley in plots generated by ImageJ that are free and easy to use and include signal as well as image processing apps (and both).

3. ImageJ has a great free plotting app for plotting grayscale values; the plots are easily exported to Excel and can be saved as metafiles and manipulated in draw programs such as CorelDRAW.

4. The resulting data here provide details which will assist those who are working to “finish” constructing the molecular model of human surfactant protein D.

# Overzealous Debunking

Love it, totally true of the anti- and advocate- personality types. It is the world in “two words”; the subject matter matters not. It is the dichotomy in political “religiosity”. It is the hubris of “I am on the right side, you are not”.

# Bias: the good, the bad, the learned – peak finding functions and image filtering

Purpose: To contribute to predictions about the current structural model of surfactant protein D, in particular, the collagen-like domain.

Aim: To suggest there are recognizable patterns in the number and shape of peaks in grayscale plots of SP-D obtained as tracings from CRD to CRD of hexamers. These grayscale plots were made with ImageJ using published AFM (atomic force microscope) images (Arroyo et al, 2018) of dodecamers of recombinant human surfactant protein D (SP-D), both as unfiltered images and as images subjected to a variety of image processing filters and/or signal processing peak finding functions.

Abstract: Surfactant protein D has four domains: an N terminal domain, a collagen-like domain, a coiled coil neck domain, and a carbohydrate recognition domain (CRD). Monomers of SP-D assemble into coiled homotrimers, which readily form multimers joined at a communal peak of N terminal domains.
Common multimers are hexamers and dodecamers, but multimers with 30+ arms can be found (Arroyo et al, 2018).
RCSB ( ) (as of this writing) has many molecular models for the CRD and neck domains, but the full trimer (all four domains), hexamers, and dodecamers are not currently modeled ( ). Clearly, from AFM images, the collagen-like domain is reasonably straight, but this information has not yet contributed to a full molecular model.

A total peak count for an entire hexamer was published as 5 peaks (Arroyo et al, 2020): peaks labeled at either end as CRD, a peak in the middle as the N terminal attachment junction, and two peaks, one on either side of, and close to, the N termini peak.

However, a manual count of bright peaks from the image, a count of peaks obtained from the grayscale plots, and a peak count obtained using signal processing peak counting functions all point to a higher number.

Closer examination of the raw images, of images subjected to image filtering, and of the output of peak finding functions suggests the peak count for a hexamer is  15+/  .  The peak number found by careful examination of the grayscale plots, without the use of image filters and without peak finding functions, is not significantly different from the number of peaks found after processing.

Twelve image processing programs and 4 signal processing programs were applied to a representative dodecamer. Each has numerous settings for filtering (e.g. sharpen, median, mode, mean, blur, limit range, noise reduction, etc) and peak detection (smooth, lag, influence, distance, height, threshold, etc). They were chosen by an unbiased (naive) selection process (availability, ease of use, cost, filtering options, output format) and deemed to be consistent with the visual data from the original images. The resulting number of peaks detected (15.xx+/xx) was used as a benchmark.

A selection from those applications resulted in a set of 7 imaging programs (with numerous filters) and 4 peak finding programs, which were used to assess peaks in 13 additional dodecamers, analyzed individually and together. These demonstrated 15 peaks per hexamer, of which 9 peaks (5 peaks per trimer) were present 95-100% of the time, two peaks per hexamer (1 peak per trimer) were present 71% of the time, and the neck peak was detected 51% of the time. One tiny peak was often visible, lying near the valley on the down-slope on either side of the N term junction peak. It was consistent enough in appearance (42% of the time), position, size and frequency to merit a classification, though it was both shallow and narrow.

The linear aspect of the collagen-like domain, and the presence of 5 easily detected peaks therein, along with relative peak heights, widths and valleys, should provide useful information for predicting the molecular structure of the collagen-like domain of SP-D. In addition, the data show that visual identification of peak number per hexamer of SP-D in images subjected to filters was not significantly different from the peak number found using pre-screened signal processing functions, and also not different from manual counts from the original images before processing by any app.

The red arrow (figure 1) shows the trajectory of the trimer, beginning at the bottom (CRD) and moving upward to the N terminal domain, which is linked to three other trimers at the center of the dodecamer.

Methods:

Peak count per hexamer:  Peak count was obtained initially from hundreds of plots, using both manual peak counting and counts found with an inclusive set of programs for image filtering (12 image programs) and signal processing (7 programs). Results from all initial peak finding functions in all software initially tested, applied to one dodecamer image (my number 41_aka_45), gave a total of 633 plots.  These 633 plots were used to define which programs, which filters, and which functions produced peak counts most comparable to the peak plots created in ImageJ.  This included the lowest counts from a variety of (unbiased?) citizen-scientists, as well as counts from poor resolution (highly pixelated) images and from images that were filtered and subjected to peak finding functions. This set of plots determined the number of peaks per hexamer to be 15, a number which was also verified stepwise with 4, 6, 8, and 12 dodecamers and with the final dataset (14 dodecamers and selected functions).

15 peaks per hexamer was used as a baseline for assessing the peaks found in plots analyzed with several different signal processing programs and settings. Counting peaks by hand and counting by function both certainly carry some bias. In many cases it becomes a matter of choosing how to apply parameters, even when the inclusion or exclusion of some peaks by signal processing functions appears visually illogical (see post “to peak or not to peak”). Nevertheless, all data (no selection bias) were used to determine that, in all likelihood, there are 15 bright areas in each hexamer (8.1 +/- 2.4 peaks per trimer, where the N termini peak is measured as a whole).

Image processing programs and filters:  Programs used for initial and final image filtering, listed below, included free software as well as two prominent paid programs. Typically the freeware provided fewer options for subtle filtering than the paid programs. ImageJ was used exclusively for plotting grayscale values (peaks). The only other program tried for plots was Gwyddion, which showed some discrepancies in tracing grayscale when plots were made in vertical as opposed to horizontal directions; Gwyddion was, however, used for image filtering.  The final choice of software for image filtering is listed on the left.

Initial signal and image filtering apps are seen in the top portion of this figure, and the final choices for all 14 dodecamers lie below.

Further analysis and fine tuning of images with gaussian, median, mean, sharpening, and range limiting filters, as well as optimizing peak finding options in signal processing (such as smoothing, distance, height, lag, threshold, width, influence, etc), show the peak number to be more than 15 peaks per hexamer.
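To make the lag, threshold and influence options concrete, here is a minimal Python sketch of a moving z-score detector in the spirit of the Stack Overflow answer listed in the links at the end of this post. It is not the PHP code used for the plots. The defaults are the Zscore settings reported later in this post (Lag 5, Threshold 1, Influence 0.05); the use of the population standard deviation and the sample values are assumptions of this sketch.

```python
import statistics


def zscore_signals(y, lag=5, threshold=1.0, influence=0.05):
    """Flag samples that deviate from a moving baseline.

    Returns a list with +1 (above baseline), -1 (below) or 0 per sample.
    """
    if len(y) <= lag:
        raise ValueError("need more samples than the lag window")
    signals = [0] * len(y)
    filtered = list(y[:lag])  # influence-damped copy of the signal
    avg = statistics.mean(filtered)
    std = statistics.pstdev(filtered)  # population SD: an assumption of this sketch
    for i in range(lag, len(y)):
        if abs(y[i] - avg) > threshold * std:
            signals[i] = 1 if y[i] > avg else -1
            # A flagged sample only nudges the baseline by `influence`.
            filtered.append(influence * y[i] + (1 - influence) * filtered[-1])
        else:
            filtered.append(y[i])
        window = filtered[-lag:]
        avg = statistics.mean(window)
        std = statistics.pstdev(window)
    return signals


# Hypothetical grayscale values: a flat-ish baseline with one bright spot.
profile = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.0]
signals = zscore_signals(profile)
print(signals)
```

A threshold as low as 1 standard deviation flags fairly small wiggles as well as the obvious bright spot, which is exactly where the choice of settings starts to shape the peak count.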

Line tracings of SP-D to produce grayscale plots:  End to end through the center of the hexamer, segmented line, plotted as grayscale in ImageJ.  Some details here.
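For readers who want to reproduce this kind of tracing outside ImageJ, here is a minimal sketch of sampling grayscale values along a segmented line. It uses nearest-pixel sampling with NumPy, which is an assumption of this sketch and not how ImageJ computes its profile, so values would not be expected to match ImageJ exactly. The function name and the tiny test image are hypothetical.

```python
import numpy as np


def segmented_line_profile(image, points, step=1.0):
    """Sample grayscale values along a polyline of (row, col) vertices.

    Nearest-pixel sampling, roughly one sample per `step` pixels.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = [], []
    for (r0, c0), (r1, c1) in zip(points[:-1], points[1:]):
        length = np.hypot(r1 - r0, c1 - c0)
        n = max(int(round(length / step)), 1)
        # Exclude each segment's end point so shared vertices are not duplicated.
        t = np.linspace(0.0, 1.0, n, endpoint=False)
        rows.append(r0 + t * (r1 - r0))
        cols.append(c0 + t * (c1 - c0))
    # Add the final vertex of the polyline.
    rows.append(np.array([points[-1][0]], dtype=float))
    cols.append(np.array([points[-1][1]], dtype=float))
    rr = np.clip(np.rint(np.concatenate(rows)).astype(int), 0, image.shape[0] - 1)
    cc = np.clip(np.rint(np.concatenate(cols)).astype(int), 0, image.shape[1] - 1)
    return image[rr, cc]


# Hypothetical 5x5 image with one bright pixel standing in for the N termini junction.
image = np.zeros((5, 5))
image[2, 2] = 255
# Two segments: left edge -> center -> right edge (as if CRD -> N term -> CRD).
profile = segmented_line_profile(image, [(2, 0), (2, 2), (2, 4)])
print(profile)
```

The returned array plays the role of the y values of the grayscale plot, with the sample index as distance along the tracing in pixels.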

Three peaks per SP-D trimer (5 peaks per hexamer,  9 peaks per dodecamer) have been identified. The tallest peak is central in each hexamer/dodecamer and comprises the N terminal junction: 1 N-terminal domain for each of the trimers in the dodecamer. A grayscale plot through the center of a hexamer will have N=4 N terminal domains if it is plotted in a dodecamer. The glycosylation peak(s) (when the SP-D is glycosylated) lie on either side of the N-term peak, as do the carbohydrate recognition domain (CRD) peak(s), and each of these was recognized by Arroyo et al (2018). However, in their plot (Figure 2, C) 15 peaks can easily be counted, not just the 5 peaks per hexamer that were listed in their legend (a very conservative count). After extensive analysis, 15 peaks is a reasonable number and, in addition, corroborates the peaks on their initial plot.

Peaks were first counted visually from unfiltered images before plots were made (group 1: a visual count of bright peaks; arm a in figure 1). Grayscale plots were then made (using ImageJ) to determine peak number, height, width and valley of those same unfiltered images by drawing a segmented line lengthwise through individual hexamers (2 per dodecamer), beginning at one CRD, passing through the center width of the N termini junction peak, and continuing to the second CRD (figure 1, yellow plot line, arm b). This gave a count of peaks from the ImageJ plot (group 2). Each of the plots was then subjected to signal processing functions (group 3) to compare (confirm?) visual assessments (eliminate bias?).

LEGEND: Surfactant protein D dodecamer. Two hexamers, with each hexamer of the dodecamer labeled as arm a or arm b. The plot begins at the CRD at bottom center (the bright spot labeled START) and moves in the direction of the white arrow to the bright spot at the top of the image (the CRD at the other end of the hexamer). The red arrow shows the extent of a trimer plot, from the CRD (at bottom, labeled START) through the entirety of the brightest peak (N termini junction).

The total number of plots examined for the 14 SP-D dodecamers (figure 2) came to over 1500 trimers (that is, 385 dodecamers). The 14 dodecamers were thus plotted about 100 times each (see the number for each different dodecamer below). The largest numbers were for those several dodecamers that were used to establish the mean number of peaks per hexamer. The dodecamer labelled 41_aka_45 was used to determine which image filtering programs, and which settings for signal processing filters, would be used for the other dodecamers. The list also shows the image of each of the 14 dodecamers (labeled in white on the AFM images).

Molecules numbered 127 and 4A are the same but derived from different figures in the same publication. Bar markers in the images varied among the figures (20, 30, up to 200 nm). Each image was manipulated “along with” its bar marker to ensure that dimensions were consistent.
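Keeping the bar marker with the image matters because it is the only link between pixels and nanometers. Here is a minimal sketch of that conversion; the function names and the pixel values are hypothetical, chosen only to illustrate the arithmetic.

```python
def nm_per_pixel(bar_length_nm, bar_length_px):
    """Scale factor from a bar marker of known length measured in pixels."""
    if bar_length_px <= 0:
        raise ValueError("bar length in pixels must be positive")
    return bar_length_nm / bar_length_px


def pixels_to_nm(distance_px, bar_length_nm, bar_length_px):
    """Convert a distance measured along a plot from pixels to nanometers."""
    return distance_px * nm_per_pixel(bar_length_nm, bar_length_px)


# Hypothetical example: a 200 nm bar marker that spans 400 pixels,
# and a peak that is 22 pixels wide on the same (resized) image.
peak_width_nm = pixels_to_nm(22, bar_length_nm=200, bar_length_px=400)
print(peak_width_nm)  # 11.0
```

If an image is resized without its bar marker, `bar_length_px` no longer matches the image and every width derived from it is off by the resize factor.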

A list of image filters and signal processing functions for 41_aka_45: 292 plots using different image filters plus signal processing functions (plotted as trimers, so n=73 dodecamers); 332 plots using all different image filters in x different image processing programs (n=83 dodecamers).

Image processing programs and filters

Training: the dictionary defines training as  “the action of teaching a person or animal a particular skill or type of behavior”. That definition now includes computers and each comes both with great potential, and great limitations.

Learning: the dictionary defines learning as “modification of a behavioral tendency by experience”, and in the case of artificial intelligence, to learn without explicit programming.

Bias: the definition relevant to research is “systematic error introduced into sampling or testing by selecting or encouraging one outcome or answer over others” or “a disproportionate weight in favor of or against an idea or thing”. Zvereva and Kozlov (Sci Rep 11, 226 (2021), https://doi.org/10.1038/s41598-020-80677-4) take a rather negative view of bias in research, but suggest two important approaches to limit bias: 1) understand the measures available to avoid bias and 2) report measures used to avoid bias. They also state “Cognitive biases are unconscious, which means that simply being aware of the existence and importance of biases is not sufficient to avoid them”.

Machine learning bias: “Machine learning bias, also known as algorithm bias or AI bias, is a phenomenon that occurs when an algorithm produces results that are systemically prejudiced due to erroneous assumptions in the machine learning (supervised and reinforced machine learning) process.”

(It seems like unsupervised, supervised and reinforced machine learning should be great backup for limiting bias in interpretation, aka mistakes or selection bias, in a relatively simple assessment of peaks in a given plot.)

Bias present:

1) non-response bias (missing value): “As a rule of thumb, the lower the response rate, the greater the likelihood of nonresponse bias. Nonresponse bias becomes an issue when the response rate falls below 70%.” (says who)
2) automation bias: “Automation bias is an over-reliance on automated aids and decision support systems”. (method bias)?
3) in-group bias: “the tendency for us to give preferential emphasis to one group, while ignoring outgroups”.
4) implicit (unconscious) bias: “automatic and unintentional, yet impacting outcome (judgement)”.
5) reporting bias: “the decision about what to report depends on the direction or magnitude of the findings”. (that’s what peer review is for)
6) false impression bias: “also known as the frequency illusion or recency illusion”
7) sampling bias: “a type of selection bias” –( e.g. test molecules being systematically more likely to be selected in a sample than others).
8) selection bias: “selecting an item (or various items), not using randomization of those items. therefore the data is not representative of the given population”.
9) confirmation bias: “the tendency to search for, interpret, favor, and recall information in a way that confirms or supports one’s prior views”
10) measurement (data collection) bias (errors): “refers to the tendency of algorithms to reflect human biases (supervised and reinforced machine learning)”. As a personal communication put it, “you choose the settings”, which is true for the python-scipy peak finder (prominence 0.2, distance 30, width 5, threshold 0, height 0); for PHP Zscore (Lag 5, Threshold 1, Influence 0.05); for Octave’s AutoFindPeaksPlot.m (xy) and ipeaks.m (M80); and also for PeakValleyDetection.xlsx (smooth 11).
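To show where “choosing the settings” enters, here is a minimal sketch that passes the scipy settings listed in item 10 to `scipy.signal.find_peaks`. The synthetic three-bump profile, and the assumption that grayscale values were scaled to roughly 0 to 1 (which a prominence of 0.2 suggests), are mine; no SP-D data are used.

```python
import numpy as np
from scipy.signal import find_peaks


def count_peaks(profile):
    """Find peaks with the scipy settings quoted in this post."""
    profile = np.asarray(profile, dtype=float)
    peaks, properties = find_peaks(
        profile,
        prominence=0.2,  # minimum rise above the surrounding baseline
        distance=30,     # minimum spacing between peaks, in samples
        width=5,         # minimum peak width, in samples
        threshold=0,     # minimum step to the neighboring samples
        height=0,        # minimum absolute height
    )
    return peaks, properties


# Synthetic stand-in for a grayscale plot (assumed scaled to about 0..1):
# a tall central bump flanked by two smaller ones.
x = np.arange(300)
profile = (
    0.5 * np.exp(-((x - 50) ** 2) / (2 * 10.0**2))
    + 1.0 * np.exp(-((x - 150) ** 2) / (2 * 10.0**2))
    + 0.5 * np.exp(-((x - 250) ** 2) / (2 * 10.0**2))
)
peaks, properties = count_peaks(profile)
print(peaks)
```

Raising `prominence` above 0.5 in this sketch would drop the two flanking bumps and leave only the central one, so the same plot can yield 3 peaks or 1 depending on a single setting.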

Bias relevant to outcome:

Selection Bias (yes, just dodecamers, from one researcher)
Spectrum Bias
Cognitive Bias
Data-Snooping Bias
Omitted-Variable Bias (missing data)
Exclusion Bias (out of focus molecules)
Analytical Bias
Reporting Bias (this would appear to be an ethical issue)

The definition of all of the above words has changed: in society, in science, in philosophy.
In the context of this post, the To create an “unbiased” count of the number of peaks

“people should assume right now that the models only perform to about 95% of human accuracy.” (https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained).

Results and Discussion:
Peaks, subpeaks. The figure below shows the analysis at four different tiers in analysing the number of peaks and subpeaks in dodecamers: as N = 6, 8, 12 and 14 individual molecules, each processed in many ways, and each included in subsequent analyses.
Peak number per trimer is shown in the graphic below. An analysis of 14 trimers shows there is no statistically significant difference between the number of peaks in either of the hexamer’s arms (a and b, i.e. left and right sides of a hexamer, respectively) and therefore of any trimer in a dodecamer. None was expected either, since SP-D should appear as a bilaterally symmetrical molecule.

1. Arroyo et al, 2018, https://doi.org/10.1016/j.jmb.2018.03.027

https://imagej.nih.gov/ij/
https://terpconnect.umd.edu/~toh/spectrum/PeakFindingandMeasurement.htm#ipeak
https://terpconnect.umd.edu/~toh/spectrum/PeakFindingandMeasurement.htm#findpeaksx
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html
https://stackoverflow.com/questions/22583391/peak-signal-detection-in-realtime-timeseries-data/22640362#22640362 (version: 2020-11-08)

# Verge of a Dream: Not revealed

I could take all
The unfairness of the
World. I would not mind.
The surrounding cold, the
Coming darkness.
I wonder if it is
Something you never
Wanted, until now.
It could be quiet as
The cloister.
Party’s noise.
A hard day’s work, or
The stings remaining
From the inevitableness
Of life.
The curtains will open
Next morning.
If light moves in waves
It happens in secret like
Love not unreturned
only not revealed by
The covering star light.

RLB 06/23/2023

# Toxi-city:

Toxi-city: Dream of creating cartoons for environmental health —  decades ago…. haha.