Category Archives: surfactant proteins A and D

N termini junction peak width in nm for one surfactant protein D dodecamer, tiny peaks at valley beside it

Lots of measurements; maybe this will allow me to pick and choose which processing provides what I think is the best overall measurement of the N termini junction peak for SP-D. This includes image processing (dozens of programs, filters and masks, each done separately), signal processing (several algorithms from libraries used by Octave, SciPy, some Excel templates and others), and a set of citizen scientists who pointed out peak number and position.

The tiny peaks on either side of the N termini junction are elusive, and don't show up all the time. Fact is that I see them frequently but only record their widths and heights if the signal processing programs detect them. It is likely that I pick them out more often than signal processing does, which detects them only occasionally; the citizen scientists (1/52 plots) essentially did not. Therefore the image processing data and the signal processing data will look quite different when compared. 165/632 trimer tracings (38%) showed a tiny peak. The image processing was best at detecting those small peaks on either side of the tall N termini junction peak (detected 133 times out of the 332 plots made with images processed in various manners). Citizen scientists just did not seem to see it. The mean width of that peak (actually those peaks: tiny, and in the valley on either side of the N termini junction peak), measured valley to valley, is 3.55 nm.


N termini peak of a single dodecamer of surfactant protein D

There are variations on the grayscale peak within this molecule, which have not yet been described. I have found three variations.
1) simple smooth peak
2) two peaks
3) three peaks (this is infrequent enough to be ignored, perhaps, but I am giving you relative incidence, peak width and height anyway).
The values below are for all peaks counted in three datasets: image processing, signal processing, and citizen scientist counts.
The dataset has the plots from one image removed (Inkscape, roughen inside, which produced peak numbers for the entire dodecamer that were well over two standard deviations from the other processing programs). The other change is switching two sets of data back to the normal order; they had been traced in the reverse direction. Given what turns out to be a significant difference in arm length overall, these trimers are now in the correct order.
Only 13 of the 634 total plots of trimers (recall that four traces were removed: Inkscape, roughen inside) showed a tiny peak within the N termini peak.

Peak width is based on the mean nm length of a hexamer from molecule 41_aka_45. The mean, SD, median, mode, etc. are below (142.87 nm from CRD to CRD), with distance recorded from the plot through the center of the hexamer using ImageJ and calculated back to the bar marker that accompanies the micrograph. Each plot varies slightly, and the nm was recorded for each image processing filter and mask. No hexamer length was recorded for signal processing plots, as these values are all based on the existing image data and would constitute a repetition.
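The "calculated back to the bar marker" step is a simple proportion, sketched here in Python. The function names, the bar length, and the pixel distances are all hypothetical placeholders chosen for the example; they are not measurements from the micrograph.

```python
def nm_per_pixel(bar_length_nm, bar_length_px):
    """Scale factor from the micrograph's bar marker."""
    return bar_length_nm / bar_length_px

def px_to_nm(distance_px, scale_nm_per_px):
    """Convert a distance measured along a plot (in pixels) to nm."""
    return distance_px * scale_nm_per_px

# Hypothetical values: a 100 nm bar marker that spans 250 px in the image.
scale = nm_per_pixel(100.0, 250.0)

# Hypothetical CRD-to-CRD trace length of 357 px through the hexamer.
hexamer_nm = px_to_nm(357.0, scale)

# A peak width measured valley to valley, e.g. from px 170 to px 179.
peak_width_nm = px_to_nm(179 - 170, scale)

print(round(scale, 3), round(hexamer_nm, 2), round(peak_width_nm, 2))
```

Because each processed image yields a slightly different trace length in pixels, the same scale factor gives a slightly different nm length per plot, which is why the hexamer length is reported as a mean.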

All the incidences of tiny peaks within the N term peak (not to be confused with the tiny peaks in the valleys on either side of the N term peak) occurred on the right half of this particular dodecamer, likely because of some incident as the dodecamer was delivered to the mica. Whether this is something rare has yet to be determined. Of the 8 occurrences, 2 were in arm 1 trimer a and 6 were in the plots of arm 2 trimer a (2=arm 1a, 6=arm 2a). Peak width, counting so few, is less than 4 nm. See chart below.


If there are 15 peaks in a surfactant protein D hexamer, how should they be sorted?

I mentioned in the last post that there are some structural characteristics of surfactant protein D that are accepted as reasonable (which seem pretty likely to be the case). This includes the character of the carbohydrate-recognition domain, which has a structure that is globular. The yellow arrows show what kind of plot peaks will occur in a trace through the center of a hexamer (i.e. from the CRD+neck end of a hexamer to the opposite end, where the neck+CRD of the second trimer are).

In the grayscale plots of SP-D it has become clear that there is an assortment of peaks that lie over the CRD and neck region that are accounted for by the position of the CRD as they hang from the neck domain, sometimes lying atop each other, or side by side, or over the neck region completely. One can anticipate that two peaks at the end of a plot of a trimer might represent one CRD domain lump and a peak of lesser height as the neck domain. Other times, there will be two peaks of similar height. In these cases when I observe the plot superimposed on the image it is clear which peak is aligned with which domain. I have given the peaks separate colors, as I have sorted them out during this project.

You can check out many previous posts where I have sorted peaks by color/domain, both known peaks and those identified but not yet named (the colors have been pretty consistently assigned during this process). The image on the right is what RCSB shows for the CRD+neck end of a trimer of SP-D, and even in this ball and stick model (colored by amino acids), the density of the transparent image shows that a grayscale plot could show brightness peaks (and does show them) where two CRDs overlap.

The image below is a dodecamer which has been image processed with a gaussian blur and a limitrange of 100-255. I have superimposed the transparent CRD+neck ball and stick models over the end of an image of a dodecamer (Arroyo et al.) (41 aka 45 by my number). Yellow arrows show where several peaks in one CRD region will be found.

While you are looking at this dodecamer notice the dark center where the N termini junction is, and that this molecule also has varying brightnesses of the glycosylation peaks (in particular compare the lower right hand trimer with the upper right and upper left hand trimers).


Summary of number of grayscale peaks in the trimers of a single dodecamer of surfactant protein D

The mean is very close to 8 peaks per trimer. This includes the entire width of the N termini junction, with each trimer counted as having one peak there (sometimes there are two peaks here, but it is only counted once). The total number of trimers plotted for each processing type (image processing, signal processing, and citizen science opinions) is included so that it is understood that the heaviest investment in peak counting occurred with image processing (various programs and various filters and masks). The next most common processing was signal processing, which included initial image processing and automated peak counting after signal processing by various algorithms. Lastly, a small number of random individuals were asked to count what they thought were peaks along two plots (hexamers) of the very same image that was used for all other peak counts. At this point, the image and signal processing programs which come closest to producing the 8 peak count will be used for other dodecamers (about 100 of them). This translates into either two peaks at N, with a total peak count of 16 for a hexamer, or one peak at N, with an odd total of 15 peaks for each hexamer.
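The arithmetic behind the 16-versus-15 totals can be written out as a tiny Python sketch. The function name and argument names are my own, for illustration; the only inputs taken from the text are the 8 peaks per trimer and the one-or-two-peak reading at N.

```python
def hexamer_peak_count(peaks_per_trimer, n_peak_split):
    """Total peaks along a hexamer (two trimers traced CRD to CRD).

    peaks_per_trimer counts the N termini junction once for each trimer.
    If the junction reads as two peaks (one per trimer), nothing is shared.
    If it reads as a single peak, both trimers share it, so it is
    subtracted once from the doubled count.
    """
    total = 2 * peaks_per_trimer
    if not n_peak_split:
        total -= 1
    return total

print(hexamer_peak_count(8, n_peak_split=True))   # two peaks at N
print(hexamer_peak_count(8, n_peak_split=False))  # one shared peak at N
```

Any whole number of peaks per trimer gives an even hexamer total when N is split and an odd total when it is shared, which is the parity rule used above.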

There are three (at least) places along the hexamer that can account for a two-peak reading or a one-peak reading, in my opinion and from what I have observed: 1) the CRD on either end, which can be folded and bent to expose part of one or two molecules of the trimer, 2) the N termini junction, where there is indication that variations in binding might leave a “valley” between the 4 trimers, 3) the glycosylation sites, where (also my opinion) one, two or three molecules might be attached in a lumpy manner, causing a broad and lumpy plot.

At that point, I think it will be pretty easy to “teach” a signal processing program what to look for in a symmetrical array of widely varying peaks, peak heights and widths. At least that is the plan.

One thing for sure, LOL, I will likely NOT use people to pick peaks, and will omit a couple of image processing programs (Inkscape among them) which have “cute” filters and masks but do not provide a clearer picture of the micrographs. The highest number of peaks came from one program where the filter was “roughen edges”, which indeed it did, and caused 23 peaks to appear along the hexamer.

In terms of image processing, Gwyddion, Photoshop, CorelDRAW, ImageJ and maybe GIMP come out as being the very easiest to use and provide the best enhancement of the images. The filters include the most common ones (gaussian blur, median blur, limitrange, contrast enhancement, resampling). Likely only Gwyddion, Photoshop, and ImageJ will be used for the 100 other dodecamers.

In terms of signal processing, my favorite so far is an Excel template (PeakValleyDetectionTemplate, offered by Thomas O’Haver), which is utterly simple to use and has an interface (unlike Octave) with which most are somewhat familiar. I found that for my purposes smooth 11 was best, but that would be entirely dependent upon each person’s choice. I will also use one setting each for batch process (lag 5, threshold 1 and influence .0 5; app provided by Aaron Miller), scipy (sci/py-P0.5D15W10T0H0; app provided by Daniel Miller), and Octave (ipeak x,y,100), which makes 4 signal processing programs.

Syncing the x and y axes was convenient in two ways: with batch process (provided by Aaron Miller), and by just plain old assigning the x and y axes a graphic standard in a vector program (CorelDRAW), where aligning, superimposing, and assigning peaks to one of the four possible domains of the surfactant protein D trimer was easiest for me. There are so many examples of that vector program in this blog that it is not necessary to say any more about it.

Excel has been used for assembling the metadata, and is used with online calculators (though the same could be done with a formula in Excel). Means for peak widths and heights will be found... peak area?


Image processing, signal processing, citizen scientist counts – all significantly different peak counts for SP-D

I am trying to find consensus here… LOL, all three methods produced results in which there is no consensus.

One arm of one molecule of surfactant protein D, measured (literally hundreds of times) to find the best method for detecting peaks, shows that the researcher needs to choose the method that best fits the images. A really bad take on a truism about how we see ourselves might go like this: “the peaks that will be, are the peaks I see”. Column colors denote the image processing programs used for the analysis. In the center set, the image processing programs are given on the right, and the signal processing programs (with so many variations on the settings that they are not listed there) are the green and yellow colors. Statistics are in the third column in each set. N= for image, signal and citizen scientist peak counts is the number of “trimers”, shown with the mean peak count from the N term to the CRD. The image that shows the trimer arms marked is HERE.


Peak counts of a single surfactant protein D molecule (an AFM image): Peaks counted in trimers

Peak counts of a single surfactant protein D molecule (an AFM image of a dodecamer), counted in trimers: all processing methods are summarized here. The mean peaks per trimer can be an odd integer depending upon whether the N termini peak is actually one peak, or two, in which case the number of peaks per trimer must be an even number. Both variations occur, and it raises the question of how the four trimers are linked in the middle (i.e. the 12 monomers at the N termini junction)… some say it varies, and I think I would have to agree.

Variation would be a partial explanation for differences in the center area of the N term peak group in terms of small peaks at the very high point of the N term.

That is a separate topic; this is just the total number of peaks per trimer. I used the same molecules, color scheme, and dataset that have been used in previous posts concerning peaks per surfactant protein D molecule (trimers, hexamers and dodecamers). Programs used for image processing are listed on the left. Signal processing can be found in previous posts, and I also included the citizen scientist peak counts.

The difference in positioning and stretching of molecules (right and left, or top and bottom) makes a difference in how many peaks are found. This is generally apparent to the observer: the differences in trimer length can be easily seen (as stretching, folding, or compression) and documented. In this particular case the left half of the dodecamer image (arms 1a and 2a) has fewer peaks counted by all peak counting methods, and the right hand side of the dodecamer has more (please look up this link to see the image), and when I calculate the nm length it will be obvious there too. There are a few comments on why this happens:

1) the extra spread shows the lumpy nature of the glycosylation peak and the adjacent slightly smaller peak,

2) the CRD and neck peaks tend to be exaggerated because of the many ways that part of the molecule can fall onto the mica,

3) the N termini junction of the dodecamer often then shows a peak right at the high point of that N term peak, and

4) the tiny peak in the valley on either side of the N termini peak is often overlooked because of the overwhelming width and height of the N termini peak. Aside from being concealed by the brightness (high grayscale value) of the N term peak, signal processing is very inefficient at detecting tiny peaks when there are large peaks on either side. So machine learning may be the only way that this tiny peak can be verified. The peak count for the right hand arms of the dodecamer is almost 9, while for the left hand arms it is just over 7.

Peak counts of a single surfactant protein D molecule (an AFM image): Citizen scientist counts

This quick look at what the perception of peaks might be in unbiased individuals was an interesting side-trip. Two instances of KNOWN bias on the plots accounted for the smallest peak count and the largest peak count. Haha… I did not expect that. I represent a reasonably biased count.

I have counted peaks on this same image hundreds of times (actually 274 times), one or two times with each mode of signal and image processing (not every single time, but obviously many, many times). That means I counted everything from the original images (pixelated and rough) to the most processed (gaussian blur and limitrange), including counts with some programs which added ridiculous numbers of variations (one example would be “roughen-inside” using Inkscape) and others that eliminated some detail. My peak counts from the exact two plots given to the citizen scientists differ considerably from theirs. I counted the gaussian blur 10px and limitrange 100-255 image as separate peak counts both before, and at the same time as, the image was being used by another imaging or signal processing program for peak counts. The mean and standard deviation of my counts reflect variations in my own observations, which have changed over two years.

My variously obtained peak counts: 1) from images only, 2) from counting plots from image and signal processing programs 3) from graphing out the peak widths from plots obtained from processing.

Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: scipy.signal

I used a lot of settings with this scipy app (scipy signal find peaks, made for me by Daniel Miller from online resources and libraries), which allowed settings of “prominence”, “distance”, “width”, “threshold” and “height”. See below. The image of a surfactant protein D dodecamer has been in this blog dozens of times. Image processing was applied before signal processing: either a 5 or 10 px gaussian blur, and on half the images I used an additional filter, limitrange (100-255), using Gwyddion. The image processing programs are listed below. The whole dataset comprised 10 dodecamers (thus 20 hexameric arms).
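For readers who have not used it, here is a minimal sketch of how those five settings are passed to `scipy.signal.find_peaks`. It is not the app itself. The trace is synthetic (three broad Gaussian peaks plus one narrow bump, standing in for a grayscale plot), and reading the label "P0.5D15W10T0H0" as prominence 0.5, distance 15, width 10, threshold 0, height 0 is my assumption.

```python
import numpy as np
from scipy.signal import find_peaks

def gaussian(x, center, height, sigma):
    """Simple Gaussian bump used to build a synthetic trace."""
    return height * np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

# Synthetic grayscale trace (NOT real SP-D data): three broad peaks
# and one narrow, low bump at sample 150.
x = np.arange(400)
trace = (
    gaussian(x, 80, 100.0, 15.0)
    + gaussian(x, 200, 180.0, 15.0)
    + gaussian(x, 320, 100.0, 15.0)
    + gaussian(x, 150, 5.0, 3.0)
)

# Assumed reading of "P0.5 D15 W10 T0 H0".
strict, _ = find_peaks(
    trace, prominence=0.5, distance=15, width=10, threshold=0, height=0
)

# Same call without the minimum-width requirement.
loose, _ = find_peaks(trace, prominence=0.5, distance=15, threshold=0, height=0)

print("with width=10:", strict.tolist())
print("without width:", loose.tolist())
```

On this synthetic trace the narrow bump at sample 150 is reported only when the width requirement is dropped. That is one way a tiny peak next to a tall one can disappear from an automated count while still being visible by eye.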



Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: Octave – AutoFindPeaksPlot

All the settings that I used in Octave with AutoFindPeaksPlot gave peak counts lower than what I thought should be counted. Below, see my own counts of the plots generated by Octave while counting peaks, and the results of counting peaks using various settings in AutoFindPeaksPlot.m (see table, at bottom). You can see from my counts of peaks present on the output plots that accompanied the other values (peak height, width) that the autofindpeaks values (at least the default values, and a few that I typed in) are much less lenient. I found Ipeaks.m was closer to what I liked, and Autofindpeaksplot.m likely won't be used on the other 100+ images.



Peak counts of a single surfactant protein D molecule (an AFM image): signal processing programs: PeakDetectionTemplate xlsx

This Excel peak finding template (Thomas O’Haver) was easy to use to find peaks, though I personally found that peaks that I deemed to be background were typically found, while one tiny peak that I find in many images was consistently overlooked.

My counts from both the image and the individual plots in ImageJ were about 15 peaks per hexamer, whereas here it was 19.25 +/- 0.82. The N is small, but there was not much variation when I used different settings. Only one image was processed, using 0.6 as the amplitude threshold and a slope threshold of either 0 or 1 (thus four hexamers, that is, arms from CRD to CRD with the N peak in the center, were measured). The PeakValleyDetectionTemplate, using this same image processing combination, that is, gaussian blur and limitrange 100-255 in Gwyddion, also produced a large number of peaks (see previous post).

The more appropriate setting for the PeakValleyDetectionTemplate was used on a single image of 41_aka_45, and that was AmplitudeThreshold 0.6, SlopeThreshold 2.5.
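To give a feel for what an amplitude threshold and a slope threshold do, here is a simplified Python sketch of derivative-based peak finding: a peak is a point where the first difference crosses from positive to non-positive, and it is kept only if the crossing is steep enough and the point is tall enough. This is not the template's actual implementation (it omits smoothing, among other things). The function name and the toy data are hypothetical, and the threshold numbers are reused from the text purely for illustration.

```python
def find_peaks_by_derivative(y, amp_threshold, slope_threshold):
    """Return indices of local maxima that pass both thresholds.

    d[j] is the forward difference y[j+1] - y[j]; a downward
    zero-crossing between d[j] and d[j+1] marks a peak at index j+1.
    """
    d = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    peaks = []
    for j in range(len(d) - 1):
        crosses_down = d[j] > 0 and d[j + 1] <= 0
        steep_enough = (d[j] - d[j + 1]) > slope_threshold
        tall_enough = y[j + 1] > amp_threshold
        if crosses_down and steep_enough and tall_enough:
            peaks.append(j + 1)
    return peaks

# Toy trace: two tall peaks (index 2 and 8) and one tiny peak (index 5).
toy = [0, 2, 5, 2, 0, 1, 0, 3, 8, 3, 0]

print(find_peaks_by_derivative(toy, amp_threshold=0.6, slope_threshold=0))
print(find_peaks_by_derivative(toy, amp_threshold=0.6, slope_threshold=2.5))
```

With the lower slope threshold all three maxima are returned; raising it drops the tiny peak at index 5, since the change in slope across it is small. The same trade-off shows up in the counts above: a low slope threshold picks up background bumps, and a high one loses the tiny peaks.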

I have more peakfinding results using other SP-D dodecamers, but just this one for the dodecamer I have called 41_aka_45.