Daily Archives: May 12, 2022

If there are 15 peaks in a surfactant protein D hexamer, how should they be sorted?

I mentioned in the last post that some structural characteristics of surfactant protein D are accepted as reasonable (and seem pretty likely to be the case). This includes the character of the carbohydrate-recognition domain (CRD), which is globular. The yellow arrows (figure below the molecular model) show what kind of peaks (on an ImageJ plot) will occur in a trace through the center width of a hexamer (i.e. a line from the CRD+neck end of one trimer to the opposite end, where the neck+CRD of the second trimer are).
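As a rough illustration of what such a trace is, here is a minimal sketch that samples grayscale values along a straight line through an image. It is not ImageJ's own implementation; the function name `line_profile`, the sample count, and the test image are all made up for the example, and a real trace along a bent molecule would need a segmented line rather than a straight one.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def line_profile(image, start, end, n_samples=200):
    """Sample grayscale values along a straight line.

    start and end are (row, col) pixel coordinates. Values between pixel
    centers are linearly interpolated (order=1).
    """
    rows = np.linspace(start[0], end[0], n_samples)
    cols = np.linspace(start[1], end[1], n_samples)
    return map_coordinates(image.astype(float), [rows, cols], order=1)

# Hypothetical test image: brightness increases left to right.
ramp = np.tile(np.arange(50.0), (50, 1))
profile = line_profile(ramp, start=(10, 0), end=(10, 49), n_samples=50)
```

Plotting `profile` against sample index gives the same kind of brightness-versus-distance curve that the peak counts in this post are based on.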

In the grayscale plots of SP-D it has become clear that an assortment of peaks occurs at the CRD and neck region. These are accounted for by the positions of the CRDs as they dangle from the coiled neck domain, sometimes lying atop each other, or side by side, or over the neck region completely. One can anticipate that two peaks at the end of a plot of a trimer might represent one CRD lump and a peak of lesser height for the neck domain. Other times, there will be two peaks of similar height. In these cases, when I observe the plot superimposed on the image, it is clear which peak is aligned with which domain. I have given the peaks separate colors as I have sorted them out during this project.

You can check out many previous posts where I have sorted peaks by domain (aka color), both known peaks and those identified but not named (the colors have been assigned pretty consistently during this process). The image on the right is what RCSB shows for the CRD+neck end of a trimer of SP-D. Even in this ball and stick model (colored by amino acids), the density of the transparent image shows that a grayscale plot could show brightness peaks (and it does show them) where two CRDs overlap.

The image below is a dodecamer which has been image processed with a Gaussian blur and a grayscale limitrange of 100-255. I have superimposed the transparent CRD+neck ball and stick models over the end of an image of a dodecamer (Arroyo et al.) (41 aka 45 by my number). Yellow arrows show where several peaks in one CRD region will be found.
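For readers who want to try the same two steps outside a GUI program, here is a minimal sketch. It assumes that "limitrange" means clamping gray values to the stated range (100 to 255), and the blur radius (`sigma=2.0`) is a placeholder since the actual value used is not given here. The synthetic array stands in for a micrograph.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_and_limit_range(image, sigma=2.0, low=100, high=255):
    """Gaussian-blur a grayscale image, then clamp values to [low, high].

    sigma is an assumed blur radius; low/high follow the 100-255 range
    mentioned in the text.
    """
    blurred = gaussian_filter(image.astype(float), sigma=sigma)
    return np.clip(blurred, low, high)

# Synthetic stand-in for a micrograph: dark background with one bright blob.
demo = np.full((64, 64), 40.0)
demo[28:36, 28:36] = 250.0
processed = blur_and_limit_range(demo)
```

Everything darker than 100 ends up at exactly 100, which flattens the background so that only the brighter parts of the molecule contribute peaks to a trace.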

Looking at the middle and brightest area (the N termini junction), there is a slightly darker center in that peak, which is often detected in peak counting. This molecule also has glycosylation peaks of varying brightness (in particular, compare the lower right hand trimer with the upper right and upper left hand trimers).

 

Summary of number of grayscale peaks in the trimers of a single dodecamer of surfactant protein D

The mean is very close to 8 peaks per trimer. This includes the entire width of the N termini junction, which is counted as one peak for each trimer (sometimes there are two peaks here, but they are only counted once). The total number of trimers plotted for each processing type (image processing, signal processing, and citizen science opinions) is included so that it is understood that the heaviest investment in peak counting occurred with image processing (various programs and various filters and masks). The next most common was signal processing, which included initial image processing and then automated peak counting by various algorithms. Lastly, a small number of random individuals were asked to count what they thought were peaks along two plots (hexamers) of the very same image that was used for all other peak counts. At this point, the image and signal processing programs which come closest to producing the 8 peak count will be used for other dodecamers (about 100 of them). For a hexamer, this translates into either two peaks at N, with a total peak count of 16, or one peak at N, with an odd total of 15 peaks.
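The arithmetic behind the 15 versus 16 count can be written out as a small sketch. It assumes that each trimer's count of 8 includes the N termini junction as one peak, so a hexamer trace has 7 non-junction peaks per arm plus whatever is counted at N. The function name is made up for the example.

```python
def hexamer_peak_count(peaks_per_trimer=8, peaks_at_junction=1):
    """Peaks along a hexamer trace made of two trimers sharing the N junction.

    Each trimer count is assumed to include the junction as one peak, so the
    junction is removed from both arms and then added back once (or twice,
    if a valley splits it into two peaks).
    """
    arm_peaks = 2 * (peaks_per_trimer - 1)
    return arm_peaks + peaks_at_junction

print(hexamer_peak_count(8, 1))  # 15
print(hexamer_peak_count(8, 2))  # 16
```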

There are (at least) three places along the hexamer that can account for a two-peak reading or a one-peak reading, in my opinion and from what I have observed: 1) the CRD on either end, which can be folded and bent to expose part of one or two molecules of the trimer; 2) the N termini junction, where there is indication that variations in binding might leave a “valley” between the 4 trimers; 3) the glycosylation sites, where (also my opinion) one, two or three molecules might be attached in a lumpy manner, causing a broad and lumpy plot.

At that point, I think it will be pretty easy to “teach” a signal processing program what to look for in a symmetrical array of widely varying peaks, peak heights and widths. At least that is the plan.

One thing for sure, LOL, I will likely NOT use people to pick peaks, and will omit a couple of image processing programs (Inkscape and paint.net) which have “cute” filters and masks that do not provide a clearer picture of the micrographs. The highest number of peaks came from one program where the filter was “roughen edges”, which indeed it did, and caused 23 peaks to appear along the hexamer.

In terms of image processing, Gwyddion, Photoshop, CorelDRAW, ImageJ and maybe GIMP come out as being the very easiest to use and provide the best enhancement of the images. The filters used include the most common ones (Gaussian blur, median blur, limitrange, contrast enhancement, resampling). Likely only Gwyddion, Photoshop, and ImageJ will be used for the 100 other dodecamers.

In terms of signal processing, my favorite so far is an Excel template (PeakValleyDetectionTemplate, offered by Thomas O’Haver), which is utterly simple to use and has an interface (unlike Octave) with which most are somewhat familiar. I found for my purposes that smooth 11 was best, but that would be entirely dependent upon each person’s choice. I will also use batch process (lag 5, threshold 1 and influence 0.05; one setting) (app provided by Aaron Miller), scipy (sci/py-P0.5D15W10T0H0 or W5) (app provided by Daniel Miller), and one setting for Octave (ipeak x,y,100), for a total of 4 signal processing programs.
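To give a feel for what the scipy setting does, here is a minimal sketch using `scipy.signal.find_peaks`. It assumes the label P0.5D15W10T0H0 stands for prominence 0.5, distance 15, width 10, threshold 0 and height 0; that decoding is a guess from the letters, not something confirmed by the app itself. The trace is synthetic (three smooth bumps), not data from a micrograph.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic stand-in for a grayscale trace: three smooth bumps on a flat baseline.
x = np.arange(300)
trace = sum(h * np.exp(-((x - c) ** 2) / (2 * 8.0 ** 2))
            for c, h in [(50, 3.0), (150, 5.0), (250, 3.0)])

# Assumed decoding of "P0.5 D15 W10 T0 H0" into find_peaks keywords.
peaks, props = find_peaks(
    trace,
    prominence=0.5,  # peak must stand at least 0.5 above its surroundings
    distance=15,     # peaks must be at least 15 samples apart
    width=10,        # peak must be at least 10 samples wide at half prominence
    threshold=0,     # peak must not be lower than its direct neighbors
    height=0,        # peak must be at least 0 high
)
print(peaks)                 # sample positions of the detected peaks
print(props["prominences"])  # how far each peak stands above its base
print(props["widths"])       # width of each peak at half prominence
```

Changing `width=10` to `width=5` (the "or W5" variant) lets narrower bumps through, which is the kind of knob that moves a count between 15 and 16 peaks on a real trace.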

Syncing the x and y axes was convenient in two ways: with batch process (provided by Aaron Miller), and by just plain old assigning the x and y axes a graphic standard in CorelDRAW, where aligning, superimposing, and assigning peaks to one of the four possible domains of the surfactant protein D trimer was easiest for me in a vector program. There are so many examples of that vector program in this blog that it is not necessary to say anything further.

Excel has been used for assembling the metadata, and is used along with online calculators (though the same could be done with a formula in Excel). Means for peak widths and heights will be found… peak area?
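If the means end up being computed in code rather than in Excel, a sketch along these lines might do. It is only one possible approach: the function name `summarize_peaks` is made up, the prominence cutoff is a placeholder, and "area" is defined here (as an assumption) as the sum of samples between the half-prominence crossings of each peak, with unit sample spacing.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

def summarize_peaks(y, prominence=0.5):
    """Return mean height, mean width and mean area of the peaks in a trace."""
    y = np.asarray(y, dtype=float)
    peaks, _ = find_peaks(y, prominence=prominence)
    # Widths measured at half of each peak's prominence.
    widths, _, left_ips, right_ips = peak_widths(y, peaks, rel_height=0.5)
    areas = []
    for left, right in zip(left_ips, right_ips):
        lo, hi = int(np.floor(left)), int(np.ceil(right))
        areas.append(float(np.sum(y[lo:hi + 1])))  # assumed area definition
    return {
        "n_peaks": int(len(peaks)),
        "mean_height": float(np.mean(y[peaks])),
        "mean_width": float(np.mean(widths)),
        "mean_area": float(np.mean(areas)),
    }

# Synthetic trace with two bumps of height 2 and 4 (not real data).
x = np.arange(160)
demo_trace = (2.0 * np.exp(-((x - 40) ** 2) / (2 * 6.0 ** 2))
              + 4.0 * np.exp(-((x - 120) ** 2) / (2 * 6.0 ** 2)))
summary = summarize_peaks(demo_trace)
print(summary)
```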