Category Archives: surfactant proteins A and D

Four SP-D molecules: glycosylation site peak width

Below are the peak-width numbers for the glycosylation peak. I don't find good mention in the literature of the character of this peak, that is, whether it is a single simple peak or a peak with multiple elements.

It is worth noting that there are presumably three sites (one per strand of the trimer) that are potentially glycosylated, so the bright area in an image representing the glycosylation peak could have a bumpiness to it. This might reflect structural features of the three single strands, glycosylated or not, wound into a trimer: the winding would displace the glycosylation sites relative to one another, perhaps showing up as peaks within the glycosylation peak (envision three beads on three separate cords, wound tightly, each displaced slightly, creating a lumpy look). This is completely different from the images of the large CRD and neck elements, which show large bright areas quite distinctly separated at either end of the hexamer. See the CRD peak appearance in AFM images, left, and a possible completely glycosylated trimer, right (NOT TO SCALE).
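To make the "three beads on three cords" idea a little more concrete, here is a toy Python sketch (numpy and scipy). It is purely hypothetical: it assumes each glycosylated strand contributes one Gaussian bump of equal height and width `sigma`, displaced by `offset` from its neighbor, and counts how many local maxima the summed profile shows. When the displacement is small relative to the bump width, the three merge into one smooth peak; when it is larger, sub-peaks appear.

```python
import numpy as np
from scipy.signal import find_peaks


def glycan_profile(x, offset, sigma=1.0):
    """Toy brightness profile: three equal Gaussian "beads" centred at
    -offset, 0 and +offset (hypothetical model, arbitrary units)."""
    centres = (-offset, 0.0, offset)
    return sum(np.exp(-((x - c) ** 2) / (2 * sigma**2)) for c in centres)


def count_subpeaks(offset, sigma=1.0):
    """Number of local maxima in the summed three-bead profile."""
    # Fine grid so the count reflects the model, not the sampling
    x = np.linspace(-10, 10, 2001)
    peaks, _ = find_peaks(glycan_profile(x, offset, sigma))
    return len(peaks)


for offset in (0.5, 3.0):
    print(f"offset {offset} x sigma -> {count_subpeaks(offset)} peak(s)")
```

With these settings an offset of 0.5 sigma gives a single merged peak and an offset of 3 sigma gives three separate sub-peaks; real glycan spacing and apparent AFM width are unknown here.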

In most micrographs there is a dominant "bright" region that has been identified as the glycosylation peak, as a single peak, in numerous publications (including the one by Arroyo et al., which I rely upon heavily). Close visual examination reveals two things: 1) it is a peak of varying width, and 2) it is sometimes possible to see small sub-peaks within the whole. When signal processing and peak counting are applied to images before and/or after image filtering, it is common to see bright spots within what might be considered a single larger bright region (brightness in this context means a higher gray-scale value on the y axis). Defining a "modest but convincing" number of peaks along a trimer has been the focus of this research. Neither total relative peak heights nor valley-to-valley widths (not taking into account slope or mid-peak areas) have yet been determined for a hexamer of SP-D.

Using just half a dozen analysis protocols on FOUR (4) SP-D dodecamers (16 trimers, where the entire N term peak is included in each trimer), the widths of the first three peaks (beginning at the whole N termini junction peak) look like this: N = 19.86 ± 1.46 nm, tiny peak = 2.53 ± 0.74 nm, glycosylation peak = 8.4 ± 0.69 nm.

My tendency was/is to see the AFM images of this molecule as having 15 peaks: two prominent CRD peaks at either end of the hexamer, one peak in the middle (possibly with its own central valley, which would move the odd number of peaks to an even 16), and two recognized peaks on either side of the N term. In many of these blog posts the peaks are numbered as follows:

1. N term.
2. Two tiny peaks, one on either side of peak 1 (mirrored).
3. The two glycosylation peaks, lateral to the tiny peaks on either side, sometimes appearing as clustered peaks.
4. Moderate size, fairly wide peaks. This peak and those that follow are what I am now suggesting are "real events along the collagen-like domain".
5. Very narrow, low peaks.
6. Broad, low peaks.
7. A sometimes-visible peak that appears to be the coiled-coil neck domain, sometimes covered by the CRD, sometimes not.
8. At the end, the CRD peaks, which are very prominent and sometimes exhibit clustered peaks.

I have given each of these a specific color throughout the peak counting process, as my own assessment of the categories in which the signal processing peak programs miss the mark (sometimes they do, sometimes they don't).

Not all trimers show all peaks, hence the attempt to find a mean width, peak height and number among the dozens of AFM images I have collected.

Here is a link to a description of the ridge plot (Joy plot) shown below.

Four surfactant protein D dodecamers: N termini peak widths

Summary of peak widths (valley to valley) using image processing as well as five signal processing programs to calculate that dimension. In one sense the results are really good; in another sense they are so similar to those done earlier by "my mind's eye and a good measuring stick" that the two years have provided no new information. 20 nm is a very stable number for the valley-to-valley width of the N termini junction peak of surfactant protein D as seen in AFM images. This value holds for all trimers with the full N term peak, for each of the individual molecules measured (my labels for them are arbitrary: 41_aka_45; 42a_aka_44; 43; and 97-1), and for the mean of the four dodecamers (N=4).

More trimers were measured in one molecule than in the other three, so this data set was restricted primarily to plots with similar processing, that is: 1) images with no processing; 2) images with a gaussian blur (either 5 or 10 px), and images with a gaussian blur and a limitrange filter; 3) each image from each of those processing groups was then subjected to signal processing by five different algorithms (as part of several programs: Octave (Autofindpeaks.m and iPeak.m), Scipy, LagThresholdInfluence (batch processing), and an Excel template, PeakandValleyDetectionTemplate.xlsx). Therefore the balance of bias was similar among the four images of SP-D.
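A valley-to-valley width like the ones summarized above can be sketched in Python (numpy): start at a peak, walk downhill on each side until the profile stops falling, and convert the pixel distance to nm. This is an illustrative sketch, not the procedure used by any of the programs named above; `nm_per_px` is a hypothetical calibration that would come from the image scale bar.

```python
import numpy as np


def valley_to_valley_width(profile, peak_idx, nm_per_px):
    """Width of one peak measured between the nearest valleys on each side,
    ignoring slope and mid-peak shape. nm_per_px is a hypothetical
    calibration taken from the image scale bar."""
    y = np.asarray(profile, dtype=float)
    left = peak_idx
    # Walk left while the signal keeps falling (or stays flat)
    while left > 0 and y[left - 1] <= y[left]:
        left -= 1
    right = peak_idx
    # Walk right while the signal keeps falling (or stays flat)
    while right < len(y) - 1 and y[right + 1] <= y[right]:
        right += 1
    return (right - left) * nm_per_px


# Toy gray-value profile with two peaks (indices 3 and 8), 0.5 nm per px
p = [0, 1, 3, 5, 3, 1, 0, 2, 6, 2, 0]
print(valley_to_valley_width(p, 3, 0.5), valley_to_valley_width(p, 8, 0.5))
```

Because flat stretches are walked over, a flat valley floor (common with 8-bit gray values) is measured to its far edge; stopping at the first flat point instead would give a slightly narrower, more conservative width.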

The tiny peaks near the valley on either side of the N term peak are located by image processing alone, or by a combination of image and signal processing algorithms, only about 31% of the time. This is a little disappointing, but hand counting provides a more robust count of those tiny but very consistently found peaks.

The number of peaks WITHIN the N term peak typically varies from 1 to 2. About 50% of the N term peaks consist of two identifiable sub-peaks. The widths of the sub-peaks are not always equal and depend upon how the segmented line is drawn through the image.

These peaks are about 3 nm wide, yet they can still be identified in the micrographs very often. In this set there is one large value that skews the data; below, I recalculated that single image without it.

Here is the information (peak width in nm of the tiny peak) from months and months ago, which literally has not changed with the addition of the signal processing assessments. I don't know whether to be happy that this didn't change, or miffed at the amount of time it took to confirm that the original assessment was pretty good. Tiny peaks appear about 30% of the time (or rather, they are measured as "peaks" by the five signal processing plots and one image processing plot about 30% of the time). There are many times that tiny peaks are visible but simply are not counted.

Four surfactant protein D dodecamers: trimer length in nm

The length of a trimer was determined from the center of the N termini junction peak to the edge of the carbohydrate recognition domain (CRD). Since this is a lumpy molecule, the most central route through the CRD was used for the line (plot). Many examples of this type of line are given in previous posts.
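Measuring a trimer length along a segmented line amounts to adding up the straight segments between the line's vertices and converting to nm. A minimal Python (numpy) sketch follows; the vertex list and the `nm_per_px` calibration are hypothetical inputs you would read off the tracing and the scale bar.

```python
import numpy as np


def polyline_length_nm(points_px, nm_per_px):
    """Length of a segmented (polyline) trace, e.g. from the centre of the
    N termini junction peak to the edge of the CRD.

    points_px: (x, y) vertices in pixels. nm_per_px: hypothetical calibration
    from the image scale bar.
    """
    pts = np.asarray(points_px, dtype=float)
    dx, dy = np.diff(pts, axis=0).T  # per-segment displacements
    return float(np.hypot(dx, dy).sum() * nm_per_px)


# Hypothetical three-vertex trace: segments of 5 px and 6 px
print(polyline_length_nm([(0, 0), (3, 4), (3, 10)], nm_per_px=2.0))
```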

Applications were AS BEFORE: no processing, gaussian blur (5 or 10 px), and gaussian blur with limitrange (100 (or above)-255). Each is then subjected to all of the following signal processing applications for peak detection: Lag 5, Threshold 1, Influence 0.05 (batch process); Scipy (Prominence 2, Distance 30, Width 5, Threshold -, Height -); Octave, autofindpeaksplot(x,y) and iPeak M80; and the Excel template PVDT.xlsx, smooth 11. This means there are two hexamers (four trimers) per dodecamer, each with no processing or some processing, and each of the latter subjected to 5 different signal processing routines. Some other select processing can be included.
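The processing chain above can be approximated in Python with scipy, as a rough sketch. Several things here are assumptions: `scipy.ndimage.gaussian_filter` takes a standard deviation `sigma`, which is not the same as the "radius" setting in Photoshop or Gwyddion; limit range is modeled as clamping gray values to 100-255, which may not match Gwyddion exactly; and a single image row stands in for an ImageJ segmented-line plot. The `find_peaks` keywords (`prominence`, `distance`, `width`) are real scipy parameters, set to the values listed here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import find_peaks


def limit_range(img, low=100, high=255):
    """Assumed model of a 'limit range' filter: clamp gray values to [low, high]."""
    return np.clip(img, low, high)


def process(img, sigma=None, use_limit_range=False):
    """One image-processing variant: none, blur, or blur + limit range."""
    out = np.asarray(img, dtype=float)
    if sigma is not None:
        out = gaussian_filter(out, sigma=sigma)  # sigma is a standard deviation in px
    if use_limit_range:
        out = limit_range(out)
    return out


def detect_peaks(profile):
    """scipy peak detection with the settings listed above (threshold/height unset)."""
    peaks, _ = find_peaks(profile, prominence=2, distance=30, width=5)
    return peaks


# Synthetic test image: two bright bumps along every row (not real SP-D data)
x = np.arange(200)
row = 90 + 80 * np.exp(-((x - 60) ** 2) / (2 * 6**2)) + 80 * np.exp(-((x - 140) ** 2) / (2 * 6**2))
img = np.tile(row, (21, 1))

for sigma in (None, 5):
    for use_lr in (False, True):
        profile = process(img, sigma, use_lr)[10]  # middle row as the "traced line"
        print(sigma, use_lr, detect_peaks(profile))
```

On this clean synthetic image every variant finds the same two peaks; on real AFM plots the variants differ mainly in peak height, which is what the ridge plots elsewhere in these posts show.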

Trimer lengths in nm: all values lumped, individual values, and a single group of four.

The number of peaks per trimer is calculated both from all independent measures and as per-molecule measures (that is, each trimer of the four individual SP-D images). The N term peak is included in full in each trimer whenever it is not divided into two peaks. So when adding up the peak number, keep in mind that N is counted twice if it is a single large peak.
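The double counting of an undivided N peak can be made explicit with a few lines of Python. In this sketch `n_centre` is a hypothetical input: the position of the single N peak, or of the valley between its two sub-peaks, and the peak positions are invented for illustration.

```python
def peaks_per_trimer(peak_positions, n_centre):
    """Split one hexamer's peak positions into (left, right) trimer counts.

    n_centre is the position of a single, undivided N termini junction peak,
    or of the valley between its two sub-peaks. A peak exactly at n_centre
    is counted in BOTH trimers, so a single large N peak is counted twice.
    """
    left = sum(1 for p in peak_positions if p <= n_centre)
    right = sum(1 for p in peak_positions if p >= n_centre)
    return left, right


# Hypothetical positions (px along the traced line), not measured data
single_n = [10, 30, 50, 70, 90]     # N is one peak at 50
split_n = [10, 30, 48, 52, 70, 90]  # N split into two sub-peaks around 50
print(peaks_per_trimer(single_n, 50), peaks_per_trimer(split_n, 50))
```

Both hexamers give 3 + 3 trimer peaks, even though the first has only 5 peaks in total, so trimer counts should not simply be doubled to get a hexamer count.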

Previously, using dozens of image and signal processing programs for literally hundreds of plots of a single SP-D image (41_aka_45), the number of peaks per trimer was 8. Check out the post here. It is very encouraging to have a selected set of image and signal processing programs provide almost identical results to that original single-molecule dataset. To me, that means the gaussian blur and limit range imaging filters can be used somewhat confidently to make counting peaks along an arm of a hexamer (or trimer) of SP-D easier.

Analyzing just the left-hand side of each of the four images, the trimer length in nm is different (most likely related to preparation artifacts, such as stretching or folding of the arms).

Just the right-hand side of each image (the second trimer traced in each hexamer) is as follows.

There are so many ways to sum these arms up that I just decided to treat them separately, as 16 trimers comprising 4 dodecamers. It makes little difference that I can detect, and it gives a nice conservative number of 145 nm as the usual hexamer length and 73 nm for the usual trimer length (not exact, yes). But counting more molecules might be more efficacious than deliberating on just four.

Two conclusions. 1) Image processing is helpful; signal processing to count peaks, not so much. The plots smoothed with the gaussian blur and enhanced with the limit range function are easy enough to count manually, and manually counted peaks are guided by one's knowledge, which the current algorithms are not (for example, they do not recognize symmetry, and they do not permit small peaks to follow big ones, etc.). When a few more molecules are counted and added to the list, perhaps I will find someone who knows how to "train" peak counting algorithms. LOL.

2) The number of peaks per hexamer is likely to be between 13 and 15. This is much greater than the number proposed by Arroyo et al., who found 5.
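Conclusion 1 notes that the current algorithms do not recognize symmetry. One simple way to bring the hexamer's mirror symmetry into a peak list, sketched here in Python (numpy), is to reflect every detected peak about the N termini junction and check whether a partner was found on the other arm. This is an illustrative idea, not something any of the programs above is known to do; the tolerance and peak positions are hypothetical.

```python
import numpy as np


def mirrored_pairs(peaks_nm, centre_nm, tol_nm=2.0):
    """Split peaks into those with / without a mirror partner about the
    N termini junction (centre_nm). The centre peak pairs with itself.

    tol_nm is a hypothetical matching tolerance, not a measured value.
    """
    peaks = np.asarray(peaks_nm, dtype=float)
    mirrored = 2 * centre_nm - peaks  # reflect each position about the centre
    has_partner = np.array([np.any(np.abs(peaks - m) <= tol_nm) for m in mirrored])
    return peaks[has_partner], peaks[~has_partner]


paired, unpaired = mirrored_pairs([5, 20, 50, 79, 96, 110], centre_nm=50)
print(paired, unpaired)
```

Unpaired peaks are not necessarily wrong (a CRD may lie differently on each arm), but they are the ones worth a second look by eye.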


Four surfactant protein D dodecamers: comparisons: peaks per hexamer

Four dodecamers of SP-D, plotted in ImageJ with and without gaussian blur and limit range image processing, then processed with signal processing algorithms to determine peaks per hexamer.
Applications were: no processing, gaussian blur (5 or 10 px), or gaussian blur with limitrange (100 (or above)-255). Each is then subjected to all of the following signal processing applications for peak detection: Lag 5, Threshold 1, Influence 0.05 (batch process); Scipy (Prominence 2, Distance 30, Width 5, Threshold -, Height -); Octave, autofindpeaksplot(x,y) and iPeak M80; and the Excel template PVDT.xlsx, smooth 11.
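The "Lag / Threshold / Influence" parameters appear to come from the smoothed z-score peak-signal algorithm popularized in a widely cited Stack Overflow answer by Jean-Paul van Brakel ("Peak signal detection in realtime timeseries data"). Below is a minimal numpy re-implementation of that idea, written for illustration rather than copied from the batch-processing tool used here, so details may differ. A point is flagged (+1 or -1) when it departs from the moving mean of the previous `lag` points by more than `threshold` moving standard deviations, and flagged points are only allowed to `influence` the running statistics a little. Runs of +1 flags mark candidate peaks.

```python
import numpy as np


def smoothed_zscore(y, lag=5, threshold=1.0, influence=0.05):
    """Flag points that deviate from a moving mean by more than
    `threshold` moving standard deviations (smoothed z-score idea).

    Returns an int array: +1 above, -1 below, 0 otherwise. The first `lag`
    points are never flagged because they seed the statistics.
    """
    y = np.asarray(y, dtype=float)
    signals = np.zeros(len(y), dtype=int)
    filtered = y.copy()
    avg = np.zeros(len(y))
    std = np.zeros(len(y))
    avg[lag - 1] = np.mean(y[:lag])
    std[lag - 1] = np.std(y[:lag])
    for i in range(lag, len(y)):
        if abs(y[i] - avg[i - 1]) > threshold * std[i - 1]:
            signals[i] = 1 if y[i] > avg[i - 1] else -1
            # Damp the outlier's effect on the running statistics
            filtered[i] = influence * y[i] + (1 - influence) * filtered[i - 1]
        else:
            filtered[i] = y[i]
        window = filtered[i - lag + 1 : i + 1]
        avg[i] = np.mean(window)
        std[i] = np.std(window)
    return signals


print(smoothed_zscore([1, 1.2, 0.8, 1, 1, 1, 5, 1]))
```

With a threshold of 1 standard deviation the method is quite permissive, which may help explain why it picks up many small bumps along a grayscale plot.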
Four dodecamers were included in the set of numbers below, analyzed both as a total input and as an N of four; there is not a big difference. This number is not too far off the extensive search for peaks in just ONE dodecamer (chosen for its microscopic appearance), where the peak number per hexamer was about 15.
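Analyzing "as a total input" versus "as an N of four" is the difference between pooling every plot and averaging the per-molecule means first. A small Python (numpy) sketch with made-up counts (only the molecule labels come from these posts) shows why the two can differ: the molecule with the most plots carries more weight in the pooled mean.

```python
import numpy as np


def pooled_and_per_molecule(counts_by_molecule):
    """counts_by_molecule maps a molecule ID to a list of peak counts
    (one per plot). Returns (pooled mean over every plot,
    mean of the per-molecule means, i.e. N = number of molecules)."""
    all_counts = [c for counts in counts_by_molecule.values() for c in counts]
    pooled = float(np.mean(all_counts))
    per_molecule = float(np.mean([np.mean(c) for c in counts_by_molecule.values()]))
    return pooled, per_molecule


# Made-up counts for illustration only; IDs reuse the labels from these posts
example = {
    "41_aka_45": [14, 15, 15, 16, 14, 15],
    "42a_aka_44": [13, 14],
    "43": [15, 15],
    "97-1": [14, 16],
}
print(pooled_and_per_molecule(example))
```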

BIAS vs LEARNING: am I rejecting valid “learned” input

No surprise here, except that it was a surprise, a little bit anyway: as I add more and more peak finding algorithms to the bank of data on surfactant protein D, and understand that the input values for those algorithms are "human" intuitions (knowledge), it is no surprise that when I find peaks just by visually scanning a grayscale plot of SP-D, I can hear my thoughts: 1) what is the relationship between the peak I am examining and the peaks along the entire molecule; 2) what is the relationship between the width of the peak and the entire plot; 3) what height of peak I consider noise. I have never considered myself to have any knowledge of algorithms; I have no interest in math or equations or programming, but that's actually what I do when I examine a plot and pick my own "peaks".

For me it was an interesting revelation. I value my input more now than I did previously, since the whole search for signal processing programs to analyze SP-D grayscale peaks began because I felt that my peak choices were not "scientific" (and of course, if I submit a manuscript, you would feel that my peak choices were not "scientific" either as you review the submission). In fact, however, the "AI" in my mind is superior to any of the peak finding programs for this very narrow and specific peak finding task. I would not suggest that I could even begin to find peaks in the noisy data from some other applications, but specifically in this data, where there is a reasonable number of peaks (say around 10-20), my input is considerably more sensible than the algorithms I have found optimal in Octave, the Excel template (PVDT), Scipy, and LTI. Just saying... LOL, why should I reject my own observations and accept ignorant input?

Yes, I introduce BIAS in my peak finding; I call it LEARNING here. Ultimately I will compare the data from my peak counts to those from the algorithms.

I see the comments that say "you have to 'fine tune' these algorithms"... that's what the cortex does: fine tune. Signal processing provides as much opportunity for introducing learned BIAS as image processing does; the whole thing comes down to research integrity, sample number and common sense.

I am NOT talking about really noisy data, where casual inspection would be nearly useless.

Ridge plot, and sample plot using PeakValleyDetectionTemplate-xlsx: surfactant protein D

The choice of algorithms for analyzing 100+ surfactant protein D images (AFM) will amount to a selected set of image and signal processing functions. 6-10 plots will be made for each hexamer of a single dodecamer. The consensus plot, to which all these plots will be compared, will come from a consensus of peak number, peak widths and peak heights from these data. A composite ridge plot created from a single molecule (41_aka_45) (below) indicates that very predictable patterns of peaks and valleys do occur. That said, it is pretty certain that an ideal tracing of a trimer has 7 peaks, and a hexamer something around 15 (depending upon whether portions of two CRDs are in the tracing and whether two peaks are found at N, which does happen).
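One way to make "a consensus of peak number, peak widths and peak heights" repeatable is sketched below in Python (numpy), limited to peak positions. It is a hypothetical approach, not the one used in these posts: each of the 6-10 plots is assumed to have its peak positions normalized to 0..1 along the hexamer, and a peak counts toward the consensus only if enough plots agree on its position. The `tol` and `min_fraction` settings are invented for illustration.

```python
import numpy as np


def consensus_peaks(plots, tol=0.02, min_fraction=0.5):
    """Consensus peak positions across several plots of one hexamer.

    plots: list of peak-position lists, each normalised to 0..1 along the
    hexamer. A position is kept if at least min_fraction of the plots have a
    peak within tol of it; nearby kept positions are merged and averaged.
    tol and min_fraction are hypothetical settings.
    """
    arrays = [np.asarray(p, dtype=float) for p in plots]
    pooled = np.sort(np.concatenate(arrays))
    groups = []
    for pos in pooled:
        support = sum(bool(np.any(np.abs(a - pos) <= tol)) for a in arrays)
        if support >= min_fraction * len(arrays):
            if groups and pos - groups[-1][-1] <= tol:
                groups[-1].append(pos)  # same peak seen in another plot
            else:
                groups.append([pos])  # start a new consensus peak
    return [float(np.mean(g)) for g in groups]


plots = [[0.10, 0.50, 0.90], [0.11, 0.50, 0.70, 0.90], [0.10, 0.49, 0.90]]
print(consensus_peaks(plots))  # the lone peak at 0.70 is dropped
```

Widths and heights could be averaged over the same groups once positions agree; the tolerance would need tuning against plots like the ridge plot below.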

1. Maybe two peaks at N.
2. Two tiny peaks, one on either side of N.
3. The glycosylation peak may NOT be just one peak but a set of rolling peaks, as the number of carbohydrates goes from 1, to 2, to 3.
4. A consistently appearing peak, similar in size to the glycosylation peak, located lateral to the glycosylation peaks, and two additional, smaller, as yet unnamed peaks lateral to that peak and before the neck domain.
5. Occasional appearance of a peak related to the neck domain, depending upon which direction the globular carbohydrate recognition domains lie during processing.
6. One or two large peaks of similar width and height at either end of the hexamer (depending upon the positions the carbohydrate recognition domains fall into as they settle during processing).

Image processing:
ImageJ for tracing all plots
one image unprocessed
Photoshop 6 for sizing, dpi, contrast and gaussian blur (5px or 10px filters)
Gwyddion for gaussian blur (5px or 10px) and limit range (@100-255)
CorelDRAW X5 for graphics, normalizing plots, and dividing plots into trimers

Signal processing:
one image unprocessed (other images image processed as above)
PeakValleyDetectionTemplate-xlsx-smooth 11
Lag 5, Threshold 1, Influence 0.05 (? Stack Overflow; I don't know whom to credit)
Octave, Ipeak M80
Scipy (Prominence 0.2-Distance 30-Width 5-Threshold 0-Height 0)
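PeakValleyDetectionTemplate (smooth 11) appears to be one of the Excel templates from Tom O'Haver's collection. Below is a rough Python (numpy) analogue of the general idea, smoothing first and then looking for sign changes in the slope. It is an assumption-laden sketch, not a reproduction of the spreadsheet, whose smoothing type and slope/amplitude thresholds may differ; `width=11` mirrors the "smooth 11" setting only loosely.

```python
import numpy as np


def peaks_and_valleys(y, width=11):
    """Smooth with a sliding average of `width` points, then report where the
    slope of the smoothed curve turns from rising to falling (peaks) and from
    falling to rising (valleys). Indices refer to the original signal."""
    y = np.asarray(y, dtype=float)
    half = width // 2
    # 'valid' avoids edge artefacts; s[k] is centred on y[k + half] (odd width)
    s = np.convolve(y, np.ones(width) / width, mode="valid")
    d = np.diff(s)
    k = np.arange(1, len(d))
    peaks = k[(d[:-1] > 0) & (d[1:] <= 0)] + half
    valleys = k[(d[:-1] < 0) & (d[1:] >= 0)] + half
    return peaks, valleys


# Synthetic example: two equal Gaussian peaks at 50 and 120 (not SP-D data)
x = np.arange(171)
y = np.exp(-((x - 50) ** 2) / 50) + np.exp(-((x - 120) ** 2) / 50)
peaks, valleys = peaks_and_valleys(y)
print(peaks, valleys)
```

On real 8-bit profiles, flat runs and noise create extra slope sign changes, which is why the spreadsheet-style approach needs amplitude or slope thresholds on top of smoothing.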

Data collection:
Excel and

Peaks will be traced as hexamers using a 1 px segmented line, always left to right, and always with an ID for the arm: hexamers are arm 1 and arm 2, and trimers are arm 1a (always left), 1b (always right), etc. The original image will not be rotated to randomize the traces with respect to possible warping that depends on the direction of the line as it follows the molecule (this was a serious issue in Gwyddion, but I have not detected it so far in ImageJ). Plots exported from ImageJ will be saved and plotted in Excel. Screen prints of the molecule, tracing, and plot will be saved as "white paper".
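For anyone wanting to script the tracing step, here is a hedged Python sketch (numpy/scipy) of sampling gray values along a 1 px segmented line, left to right. It is not ImageJ's implementation: bilinear interpolation (`order=1` in `scipy.ndimage.map_coordinates`) and a 1 px sampling step are assumptions, and ImageJ may interpolate and space its samples slightly differently.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def segmented_line_profile(img, vertices_xy, step=1.0):
    """Sample gray values every `step` px along a segmented line (1 px wide).

    vertices_xy: (x, y) pixel coordinates of the line's vertices, traced
    left to right. Bilinear interpolation is an assumption.
    """
    pts = np.asarray(vertices_xy, dtype=float)
    xs, ys = [], []
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        n = max(int(np.hypot(x1 - x0, y1 - y0) / step), 1)
        t = np.arange(n) / n  # segment end excluded; the next segment starts there
        xs.append(x0 + t * (x1 - x0))
        ys.append(y0 + t * (y1 - y0))
    xs.append([pts[-1, 0]])  # include the final vertex once
    ys.append([pts[-1, 1]])
    xs, ys = np.concatenate(xs), np.concatenate(ys)
    # map_coordinates expects (row, col) = (y, x)
    return map_coordinates(np.asarray(img, dtype=float), [ys, xs], order=1)


# Toy image whose gray value equals the column index
img = np.tile(np.arange(10, dtype=float), (5, 1))
print(segmented_line_profile(img, [(0, 2), (4, 2), (8, 2)]))
```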

Ridge plot and individual plots: comparison

This is really a visual document of what I have found with a single surfactant protein D molecule. The image is from Arroyo et al.; plots were made in ImageJ; image processing used industry standards (CorelDRAW, PhotoPaint, Photoshop and others), and signal processing used industry standards as well (Scipy, Octave, and Excel templates from Thomas O'Haver and others). It demonstrates to me that the known peaks in SP-D are not the only peaks that will influence a model of the structure of that "trimer", "hexamer", "dodecamer" or other "multimer". Each trimer shows what I believe to be at least 6 peaks on either side of the N termini junction peak (the tallest peak in the center of each of the diagrams below). The plots here are of hexamers, that is, two trimers with the C term at either end of the plot (i.e., mirrored) and the N termini junction in the center.

The set of plots in the ridge plot (plots stacked and staggered, black background) was obtained from various image and signal processing plots. The sample plots are of arm 2 of one of the two hexamers of this particular protein (which I call 41-aka-45). I have colored the peaks as follows: light orange in the center = N term juncture; darker orange at either end = carbohydrate recognition domains; lighter green = glycosylation peaks; purple = unknown tiny peak in the valley of the N term juncture; darker green = unknown peak beside the glycosylation peak; pink = consistent narrow, not tall peak; yellow = neck region, an often-seen peak that reflects differences in how much the CRD peaks obscure the alpha-helical coiled-coil neck domain, as they may or may not lie over it.

Lower image = actual plots (some from each arm of the hexamer), to demonstrate how the different colors in the ridge plot were determined.

Ridge plot: a single surfactant protein D dodecamer – many image processing programs

This idea came from searching for ways to compare grayscale plots of AFM images of various molecules: the RIDGE PLOT. The image below is a ridge plot of just a few of the grayscale (LUT plot) tracings made of a single SP-D dodecamer (image 41_aka_45). The variations in the plots arise from the two different hexamers (the top plots are one hexamer, the bottom half the other hexamer), subjected to various image processing filters (in numerous programs). The second source of variation is where in the center of each hexamer the segmented line for the plot is drawn (all drawn using ImageJ). The differences in image processing filters cause noticeable differences in peak height but change the number of peaks per plot very little. The basic shape of the plot of each hexamer is very consistent, but the plots vary more between the two hexamers.

While I don't know yet whether there is a way to compare ridge plots using signal processing (I continue to look for one), in the meantime this purely graphic representation (unedited) shows clear evidence of consistently appearing peaks between the N termini junction peak and the peaks associated with the CRD (no new news there). But the visual information is quite accurate.

Where plots are almost identical, the variations in filters did very little to change the grayscale plots; where there is a greater difference, the image processing enhanced or reduced the heights of the peaks. More variability is evident in the plots of one of the two hexamers (top tracings).

Image with plots normalized on the x axis and the center peak centered.

I will use CorelDRAW to normalize the width of the plots, and center the N termini junction.
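The same normalization could also be scripted. The Python (numpy) sketch below is an assumption, not the CorelDRAW procedure: it takes the tallest point as the N termini junction (as in these plots), then linearly resamples each trimer separately so both halves span the same number of points and the junction lands exactly in the middle.

```python
import numpy as np


def normalise_and_centre(profile, n_points=201):
    """Resample a hexamer profile so each trimer spans the same number of
    points and the N termini junction (assumed to be the tallest point)
    sits exactly at index n_points // 2."""
    y = np.asarray(profile, dtype=float)
    n_idx = int(np.argmax(y))  # assumed N termini junction
    half = n_points // 2
    x_old = np.arange(len(y), dtype=float)
    # Left trimer [0, n_idx] -> half + 1 points; right trimer [n_idx, end] -> the rest
    left = np.interp(np.linspace(0, n_idx, half + 1), x_old, y)
    right = np.interp(np.linspace(n_idx, len(y) - 1, n_points - half), x_old, y)
    return np.concatenate([left, right[1:]])  # drop the duplicated centre point


profile = [0, 1, 2, 5, 2, 1, 0, 0, 0]  # toy hexamer: junction off-centre at index 3
out = normalise_and_centre(profile)
print(len(out), int(np.argmax(out)))
```

Stretching each trimer independently hides left/right length differences; scaling the whole hexamer uniformly and shifting it would preserve them, but then the ends of the plots would no longer line up.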

Here is a ridge plot, each tracing staggered slightly to the right, with transparent graded color marking KNOWN and reported peaks: center = N termini junction, light orange; on each side of the N term, glycosylation peaks = green; and at the right and left ends, carbohydrate recognition domain peak(s), sometimes one, sometimes two = orange.

N termini junction peak width in nm for one surfactant protein D dodecamer, tiny peaks at valley beside it

Lots of measurements; maybe this will allow me to choose which processing provides what I think is the best overall measurement of the N termini junction peak for SP-D. This includes image processing (dozens of programs, filters and masks, each done separately), signal processing (several algorithms from libraries used by Octave, Scipy, some Excel templates and others), and a group of citizen scientists who pointed out peak number and position.

The tiny peaks on either side of the N termini junction are elusive and don't show up all the time. In fact, I see them frequently but only record their widths and peaks if the signal processing programs detect them. It is likely that I pick them out when signal processing does so only occasionally; the citizen scientists do not (1/52 plots). Therefore the image processing data and the signal processing data will be quite different. 165/632 trimer tracings (38%) showed a tiny peak. The image processing was best at detecting those small peaks on either side of the tall N termini junction peak (detected 133 times out of the 332 plots made with images processed in various manners). Citizen scientists just didn't seem to see it. The mean width of that peak (actually those peaks, tiny and at the valley on either side of the N termini junction peak), measured valley to valley, is 3.55 nm.
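The detection rates quoted above are simple proportions, and recomputing them in Python is a quick consistency check. Taken at face value, 165/632 works out to about 26% rather than 38%, while 133/332 is about 40%, so one of the figures in that sentence may be worth rechecking against the raw counts.

```python
def detection_rate(detected, total):
    """Percentage of plots in which a peak was detected."""
    return 100.0 * detected / total


# Counts quoted in the paragraph above
for label, detected, total in [
    ("trimer tracings with a tiny peak", 165, 632),
    ("image-processed plots", 133, 332),
    ("citizen scientist plots", 1, 52),
]:
    print(f"{label}: {detected}/{total} = {detection_rate(detected, total):.1f}%")
```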