Category Archives: surfactant proteins A and D

SP-D poster images

The image below is the result of a test of whether the image filters available in various free and paid programs make much of a difference in the detection of brightness peaks (incidence, height, valley, width). The answer, it seemed to me, was that a rational application of many filters did very little to change the raw image, and even radical filtering such as “posterizing” (red and yellow images below) conveyed the same SP-D structure.

Programs used to score image filtering ranged from “paid” Photoshop 2021 and CorelDRAW 19 (including the latter’s built-in raster editing program), to older purchased versions, Photoshop 6 and CorelDRAW X5 (also with a raster editing program), to “free” programs with image filters such as ImageJ, Gwyddion, Inkscape, GIMP, and Paint, as well as several image filtering options in “free” Octave. Below are samples from all of the above, checked for uniformity in how each applies its filtering algorithms, using a single dodecamer as a test photo.

That photo was derived from a screen print from Arroyo et al. Easily identified are: the N-terminal junction of the four trimers (bright center*); just lateral to that, the glycosylation site (each of the four trimers shows some degree of brightness*); at least three bright peaks lateral to the glycosylation peak (as of now not named and with no known function, but these highly repeatable peaks are found in literally hundreds of plots of dodecamers and, separately, of trimers); and, at the ends of the trimers, the carbohydrate recognition domains (CRDs)*, which typically have several peaks combined (consistent with that domain being modeled on RCSB as three flexible, floppy globular formations). Just before the CRD peaks is the neck domain, which may or may not be visible as a “peak” depending on how the molecule is arranged during processing. (NB: * denotes known peaks.)

One image filter (Gwyddion’s image presentation filter, center image, bottom row) probably does the best job of maximizing the appearance of bright spots (peaks).

The three posterized yellow images were used to test whether various programs, using the same settings, would produce identical results, and they did appear to. The point of this test was to compare the old Adobe Photoshop 6, well outdated but free and easy to use, with the same filters in the paid Photoshop 2021. Similarly, the old CorelDRAW X5 applied image filters no differently, at the same settings, than CorelDRAW 19. This means reliable image filtering can be had from existing, familiar, free programs with easy-to-use interfaces.
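For anyone who wants to check “identical results” numerically rather than by eye, a pixel-by-pixel comparison of two exported images is one option. The sketch below assumes each image has already been loaded as a list of rows of 0-255 gray values (loading is not shown). Its posterize rule is one common formulation, not the documented algorithm of Photoshop or CorelDRAW, and `posterize_value`, `posterize`, and `fraction_different` are hypothetical helper names.

```python
def posterize_value(v, levels):
    """Map a 0-255 gray value onto `levels` evenly spaced output tones.

    Assumed formulation: split 0-255 into `levels` equal bins, then spread
    the bins back out over 0-255. Real programs may round differently.
    """
    bucket = min(v * levels // 256, levels - 1)  # which bin v falls in
    return round(bucket * 255 / (levels - 1))


def posterize(img, levels):
    """Posterize every pixel of a grayscale image (list of rows)."""
    return [[posterize_value(v, levels) for v in row] for row in img]


def fraction_different(img_a, img_b):
    """Share of pixels that differ between two same-sized grayscale images."""
    pairs = [(a, b) for row_a, row_b in zip(img_a, img_b)
             for a, b in zip(row_a, row_b)]
    return sum(a != b for a, b in pairs) / len(pairs)
```

A `fraction_different` of 0.0 between, say, the Photoshop 6 and Photoshop 2021 exports would mean the outputs are pixel-identical. Small nonzero values may just reflect rounding differences between programs.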

Image filters and programs (out of the sample of 100 in the image below) that will continue to be used for peak finding are:

1: no processing (as a control)
2: Gaussian blur (2 px or 5 px; 10 px in one extremely pixelated image) (CorelDRAW, Photoshop)
3: Limit range 100-255 (Gwyddion)
4: Gaussian blur plus 250 high pass (Photoshop)
5: Gaussian blur plus 50-50-50 unsharp mask (Photoshop)
6: Median filter 10 px (Photoshop)

That makes 6 imaging filters (including the unprocessed control) and 6 signal processing functions to be applied to peak finding.
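Three of the filters in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not how Photoshop, CorelDRAW, or Gwyddion implement them. It assumes the image is a list of rows of gray values, treats the blur size as the Gaussian sigma in pixels (Photoshop’s “radius” is not necessarily the same quantity), and assumes Gwyddion’s limit range simply clamps values into the chosen interval.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights out to 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _convolve_row(row, kernel):
    """1-D convolution; edge pixels are replicated so borders do not darken."""
    r = len(kernel) // 2
    n = len(row)
    return [sum(w * row[min(max(i + k - r, 0), n - 1)]
                for k, w in enumerate(kernel))
            for i in range(n)]


def gaussian_blur(img, sigma):
    """Separable Gaussian blur: filter every row, then every column."""
    kernel = gaussian_kernel(sigma)
    rows = [_convolve_row(row, kernel) for row in img]
    cols = [_convolve_row(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def median_filter(img, size):
    """Replace each pixel by the median of its size x size neighbourhood.

    The window is clipped at the image border; for even-sized clipped
    windows the upper of the two middle values is used.
    """
    h, w, r = len(img), len(img[0]), size // 2
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            window = sorted(img[yy][xx]
                            for yy in range(max(0, y - r), min(h, y + r + 1))
                            for xx in range(max(0, x - r), min(w, x + r + 1)))
            out_row.append(window[len(window) // 2])
        out.append(out_row)
    return out


def limit_range(img, lo=100, hi=255):
    """Clamp gray values into [lo, hi] (assumed behaviour of 'limit range')."""
    return [[min(max(v, lo), hi) for v in row] for row in img]
```

Under these assumptions a 10 px median would correspond to something like `median_filter(img, 21)`. How each program maps its radius setting onto a window size is itself an assumption here.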

Peak number comparison for SP-D trimers: 17+2 trimers, 6 peak counting apps, 2 image filters

Peak number comparison for SP-D trimers: 17+2 trimers, 6 peak counting apps (listed below), 2 image filters (no processing, Gaussian blur). A two-tailed t test found no significant difference between the two datasets (no processing and Gaussian blur).
No processing, all image and signal processing apps together

Gaussian blur, all image and signal processing apps together
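The post does not say which form of the t test was used. Because the same 17+2 trimers appear in both datasets, a paired test is one reasonable choice. In Excel, `T.TEST(range1, range2, 2, 1)` is the two-tailed paired version, and types 2 and 3 are the unpaired versions. The sketch below computes a paired two-tailed t test with only the standard library, integrating the Student t density numerically (Simpson’s rule) for the p-value. The counts in the example are hypothetical, not the data from the figures.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Student t probability density with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def paired_t_test(a, b):
    """Paired two-tailed t test; returns (t, degrees of freedom, p).

    Fails with ZeroDivisionError if every paired difference is identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1, two_tailed_p(t, n - 1)


# Hypothetical peak counts for six trimers, same trimers in both lists.
no_processing = [9, 8, 10, 7, 9, 8]
gaussian_blur = [8, 9, 9, 8, 8, 8]
t, df, p = paired_t_test(no_processing, gaussian_blur)
```

With SciPy installed, `scipy.stats.ttest_rel(no_processing, gaussian_blur)` should give the same t and two-tailed p.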

 

 

Previous list of signal processing programs used with constant settings

SP-D trimer peak count along segmented tracing from N to CRD

Bright peaks (grayscale 0-255) counted along a segmented line drawn linearly (see image for one such actual trace) through the middle of the width of AFM images of SP-D molecules show that the number of peaks will likely match a similar assessment of peaks along a hexamer: 8 peaks, which exceeds what has been published so far by 5 additional peaks. The data below are two sets of peak counts for 17+2 trimers (the latter 2 are duplicates from a different image), the first set without processing and the second with Gaussian blur. Blurs were typically executed at the minimum level needed to reduce pixelation in the images. The most common blur was 2 px, the next most common 5 px, and in one case a 10 px blur was applied.

Peaks under the signal processing column come from 5 frequently mentioned functions, run with the identical settings described before. The peak counts they returned were strictly adhered to, even though the peaks those functions identified were sometimes difficult to comprehend. Some peaks were overlooked and some over-reported (LOL), but those data were not changed.
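Most of the peak finders mentioned here rest on the same two steps: find every local maximum along the ImageJ line profile, then keep only those that stand far enough above the surrounding valleys (the prominence). The sketch below is a simplified stand-in written for illustration. The `min_prominence` threshold and the toy profile are hypothetical. In SciPy, `scipy.signal.find_peaks(profile, prominence=10)` applies a comparable prominence filter.

```python
def local_maxima(y):
    """Indices of local maxima; a flat-topped peak is reported at its middle."""
    peaks, i, n = [], 1, len(y)
    while i < n - 1:
        if y[i] > y[i - 1]:
            j = i
            while j + 1 < n and y[j + 1] == y[i]:  # walk across a plateau
                j += 1
            if j + 1 < n and y[j + 1] < y[i]:      # plateau ends in a fall
                peaks.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return peaks


def prominence(y, p):
    """Height of peak p above the higher of its two bounding valleys.

    Each side is searched until a strictly higher point or the profile end.
    """
    h = y[p]
    left_min = right_min = h
    i = p - 1
    while i >= 0 and y[i] <= h:
        left_min = min(left_min, y[i])
        i -= 1
    i = p + 1
    while i < len(y) and y[i] <= h:
        right_min = min(right_min, y[i])
        i += 1
    return h - max(left_min, right_min)


def count_peaks(profile, min_prominence):
    """Peaks whose prominence reaches the (fixed) threshold."""
    return [p for p in local_maxima(profile)
            if prominence(profile, p) >= min_prominence]


# Toy gray-value profile, N term on the left (hypothetical numbers).
profile = [10, 50, 20, 80, 30, 31, 30, 90, 5]
kept = count_peaks(profile, min_prominence=10)
```

With the toy profile, the small bump at index 5 (prominence 1) is dropped at a threshold of 10, which is exactly the kind of fixed-setting decision that makes different apps disagree.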

My new favorite quote (since learning about signal processing) was given one post before…

“all models are wrong, but some are useful”


The quote is attributed to George Box, and I love it. It is certainly relevant for all the plots of surfactant protein D trimers and dodecamers I have made: there is not really one model that I feel is really good (out of six models, chosen from different contributors: GitHub, SciPy, Octave (ipeak-M80.m and autofindpeaks, xy), an Excel spreadsheet function by Tom O’Haver called PeakValleyDetectionTemplate.xlsx, and just my own observations). None really do what I think they should, and, more importantly, I let them do it without changing the basic functions to get what I think should be the number of peaks per trimer. This is an attempt to understand them and to be unbiased.
The usefulness of all the plots I have made can only be determined by the reliability of the data and the value it might have in determining the molecular structure of trimers (and hexamers) of surfactant protein D. Some may find it pedestrian; I find it very informative, since the general outcome is that my eyes were just as good as these apps!

Using functions (Octave (iPeak, autofindpeaks), excel templates, Python/scipy, and Github/Z-score)

Using functions (Octave (iPeak, autofindpeaks), Excel templates, Python/SciPy, and GitHub/Z-score) sometimes just finds more peaks, or misses peaks that any human would detect. Choosing a single function setting for any of these programs as a standard doesn’t give very pleasing results, but on the other hand, adjusting them for every single plot is bias. So what is the answer? Training? How is training AI better than training a real, live, sentient viewer? The options are to accept the vastly disparate peak numbers from fixed functions, to find something sensible, or just to use one’s well-trained eye.
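The “GitHub/Z-score” approach is assumed here to be the widely shared smoothed z-score algorithm. It flags a sample when that sample departs from a moving mean by more than `threshold` moving standard deviations, with `lag` and `influence` controlling how quickly the baseline adapts. Whether this is the exact variant used for these plots is an assumption, and the parameter values below are illustrative only. Note that it marks runs of bright samples rather than single apexes, so a peak count has to be derived from the runs.

```python
from statistics import mean, pstdev


def zscore_signals(y, lag, threshold, influence):
    """Smoothed z-score: +1/-1 where y leaves the moving baseline, else 0."""
    signals = [0] * len(y)
    filtered = list(y[:lag])          # baseline history, damped during signals
    avg, sd = mean(filtered), pstdev(filtered)
    for i in range(lag, len(y)):
        if abs(y[i] - avg) > threshold * sd:
            signals[i] = 1 if y[i] > avg else -1
            # signals only partly enter the baseline (weight = influence)
            filtered.append(influence * y[i] + (1 - influence) * filtered[-1])
        else:
            filtered.append(y[i])
        window = filtered[-lag:]
        avg, sd = mean(window), pstdev(window)
    return signals


def count_positive_runs(signals):
    """Number of separate stretches flagged +1 (one stretch = one peak)."""
    return sum(1 for i, s in enumerate(signals)
               if s == 1 and (i == 0 or signals[i - 1] != 1))


# Hypothetical trace: flat, slightly noisy baseline with one bright feature.
trace = [1, 1, 1.1, 1, 0.9, 1, 1, 1.1, 1, 0.9, 1, 5, 5, 1, 1]
flags = zscore_signals(trace, lag=5, threshold=3.5, influence=0.5)
```

Changing `lag`, `threshold`, or `influence` can merge, split, or miss flagged stretches, which is one more way a single fixed setting can give counts that disagree with the eye.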

One easy observation is that Gaussian blur reduces the number of peaks plotted, whereas pixelation in “no processing” images increases it. It is clear that the best images are high resolution and require no image processing filters, but the reality is that not all images are great.
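The effect of blur on peak number can be seen on a toy one-dimensional profile, where an alternating 0/10 pattern stands in for a pixelated trace (hypothetical data, not from an SP-D image). Smoothing it with a Gaussian of sigma 1 pixel merges the three jagged maxima into a single broad one. This sketch blurs the profile itself rather than the image, which is a simplification.

```python
import math


def gaussian_smooth(y, sigma):
    """Gaussian-smooth a 1-D profile (edge values replicated)."""
    radius = max(1, math.ceil(3 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    total = sum(kernel)
    n = len(y)
    return [sum(w / total * y[min(max(i + k - radius, 0), n - 1)]
                for k, w in enumerate(kernel))
            for i in range(n)]


def count_local_maxima(y):
    """Strict local maxima only (flat tops are not counted)."""
    return sum(1 for i in range(1, len(y) - 1) if y[i - 1] < y[i] > y[i + 1])


pixelated = [0, 10, 0, 10, 0, 10, 0]       # hypothetical jagged trace
smoothed = gaussian_smooth(pixelated, 1.0)
```

Here `count_local_maxima(pixelated)` is 3 and `count_local_maxima(smoothed)` is 1. A blur that is too wide can merge real peaks the same way, which is why the minimum useful blur is preferred.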

10 pixel Gaussian blur AFM image of SP-D trimer

The image above is easily read as 7 peaks (minimum), at least to me, but the range of peak counts from the programs and functions used throughout the “peak finding for SP-D” posts on this blog has far too big an SD (again, in my opinion): 7, 11, 14, 6, 8, 15 for the 10 px Gaussian blur, and 9, 17, 18, 10, 12, 13 for the latter plus no processing (hence a pixelated image). The data together are in the right-hand column; the Gaussian blur data are in the left-hand column.
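The spread can be put in numbers. Taking the first list (7, 11, 14, 6, 8, 15, the 10 px Gaussian blur counts) at face value, the sketch below computes its mean, sample standard deviation, and coefficient of variation with Python’s `statistics` module.

```python
from statistics import mean, stdev

blur_10px_counts = [7, 11, 14, 6, 8, 15]   # counts listed in the paragraph above

m = mean(blur_10px_counts)
sd = stdev(blur_10px_counts)   # sample SD (n - 1), like Excel's STDEV.S
cv = sd / m                    # coefficient of variation
print(f"mean {m:.2f}, SD {sd:.2f}, CV {cv:.0%}")
```

It prints `mean 10.17, SD 3.76, CV 37%`, so the SD is more than a third of the mean.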

Two SP-D molecules, two different published images, two different image processing programs, 6 different signal processing functions (continued)

Two SP-D molecules, two different published images, two different image processing programs, 6 different signal processing functions (continued). Using only the peak finding functions (from the various programs listed in previous blog posts), one- or two-tailed t tests show no significant difference in the number of peaks found between the “no processing” set and the “Gaussian blur” set of plots. The column on the left is no processing; the column on the right is Gaussian blur.


Specifics of the plots used in the analysis above are given below. The trimers are the same ones pictured in the previous post. This set of data has NO counts made by me from the image, only counts made from the plots made in ImageJ and then subjected to various peak finding programs. The molecules represent a pair that appeared in two different images, at two different resolutions. No difference in the process was found between these two sets.

The total number of peaks is a little shy of what I think it should be (that is, N = 8 peaks), but the comparison here is meant to show what impact the original image has on the peak counting outcome.

Two SP-D molecules, two different published images, two different image processing programs, 6 different signal processing functions

Two SP-D molecules, two different published images, two different image processing programs, 6 different signal processing functions: an attempt to see whether apps and image resolution cause large differences in brightness peak determination using tracings through the middle of an AFM image, traced from N to CRD (in that direction, left to right, with a 1 px line in ImageJ).

The summary below does not distinguish between signal processing apps: the 6 different peak finding apps are summed into one value with the image processing filters. There is an N of 4: the two molecules (trimers 6, 7, 15, 16) with no processing, and the same two molecules filtered with a Gaussian blur (trimers 6 and 7 at 5 px; trimers 15 and 16 at 2 px). The no processing image was lower resolution, and the second set was an image at higher resolution. Both sets were plotted for brightness peaks.

The two trimers are seen below. There are a total of 48 peak finding plots: 6 plots for each trimer with no processing and 6 plots for each trimer with Gaussian blur filtering. Trimers 6 and 15 are the same molecule in different images, as are trimers 7 and 16.

In the summary below, data for the same molecule were combined (6 with its higher-res duplicate 15; 7 with its higher-res duplicate 16). Two categories of image processing (no processing and Gaussian blur) were applied. Both molecules traced with “no processing” showed higher peak numbers (likely due to greater pixelation of the images). Gaussian blur, even when sparingly applied, decreased peak number.

However, the mean of the group (7.98) supports the idea that there are 8 peaks per trimer. This is what has been seen consistently using all 6 signal processing functions (all at the same settings) for peak finding.
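The pooling described above (trimer 6 with its duplicate 15, trimer 7 with 16, each under two filter categories and six apps) can be sketched as a group-by. The counts below are hypothetical placeholders, not the data behind the 7.98 mean, and the molecule labels “A” and “B” are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical labels: trimers imaged twice are mapped to one molecule.
SAME_MOLECULE = {6: "A", 15: "A", 7: "B", 16: "B"}

# Hypothetical counts: (trimer, filter) -> peak counts from the 6 apps.
COUNTS = {
    (6, "none"): [9, 8, 10, 9, 8, 9],
    (15, "none"): [8, 9, 9, 8, 8, 9],
    (7, "none"): [10, 9, 9, 8, 9, 10],
    (16, "none"): [9, 8, 9, 9, 8, 8],
    (6, "blur"): [7, 8, 8, 7, 8, 7],
    (15, "blur"): [8, 7, 8, 8, 7, 8],
    (7, "blur"): [8, 7, 7, 8, 8, 7],
    (16, "blur"): [7, 8, 7, 7, 8, 8],
}


def pooled_means(counts, same_molecule):
    """Mean peak count per (molecule, filter), pooling duplicate images."""
    groups = defaultdict(list)
    for (trimer, filt), per_app in counts.items():
        groups[(same_molecule[trimer], filt)].extend(per_app)
    return {key: mean(vals) for key, vals in groups.items()}


group_means = pooled_means(COUNTS, SAME_MOLECULE)
grand_mean = mean(c for per_app in COUNTS.values() for c in per_app)  # all 48 plots
```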

Parallel plots of a trimer of SP-D, AFM and shadowed TEM

Parallel plots of a trimer of SP-D, AFM and shadowed TEM: it took me about three seconds to pick two arms of SP-D, one from an AFM image and one from a shadowed TEM image. The similarities in peaks were immediately apparent and pretty striking. Even though the shadowed images have that typical lumpy background, and it is a little difficult to ‘swallow’ the possibility of structural information in both images that confirms the number of peaks in a trimer of SP-D, I think the evidence might be substantiated with more samples. It is very clear that there are many more than the reported 3 peaks along both the AFM and the shadowed images. The three known peaks are the N termini junction, the glycosylation peak(s), and the CRD peaks.

Images were adjusted so that these cropped-out trimers have the N terminus peak on the left and the CRD peak(s) on the right, and a grayscale plot was made to compare peak heights, widths, valleys, and of course peak number. Top image AFM, bottom shadowed TEM. In the two trimers below, the N terminus peak is very bright since I cropped the trimers from dodecamers.

The difference between the top AFM image and the bottom one that is immediately noticeable (and confirmed in other plots) is that the trimer in the AFM image is glycosylated, so next to the N peak on the left, the bright peak is the glycosylation peak. The SP-D used for cropping out a trimer for the bottom (shades of gray) shadowed image has not been shown to be glycosylated, so a glycosylation peak is not expected; therefore the bright spot that denotes glycosylation in the AFM image is no more than a “low peak” in the shadowed image. Other peaks along the respective plots are very similar.

The biggest differences are: 1) the tiny peak in the valley by the N term peak is prominent; 2) there is likely no (or minimal) glycosylation in the SP-D batch that was shadowed (I have an AFM image of that and will compare); 3) see the red circle for the glycosylation peak on the AFM image, and the not-so-bright peak where glycosylation would occur on the shadowed image (dotted line circle). There is a definite “foot” bend in the N term portion of the shadowed image, also somewhat seen in the AFM image. The plots are very comparable. 4) The low brightness between the N and collagen-like domains is really prominent and, in my opinion, means something important.

A new plot and set of images (3 this time) are below.

The biggest difference between these arms, besides the imaging methods, is that the peak that is supposed to indicate glycosylation differs in all three. The top image below is definitely glycosylated (Arroyo et al., 2018); the AFM image at the bottom of the figure below, from the same authors, is “possibly” partly glycosylated. There is only ONE image of their deglycosylated SP-D dodecamers that I could find, from which the bottom SP-D trimer was cropped for comparison. The other three arms of that particular dodecamer labeled as deglycosylated SP-D had varying peak heights at the alleged glycosylation site. Whether one, two, or three strands of a trimer are glycosylated seems to be an open question, and all the AFM images seem to indicate this since there is consistent peak brightness (height).

The particular trimeric arm of the deglycosylated SP-D molecule (so not a separate trimer, hence the high N term peak brightness) was selected not because of the absence of a glycosylation peak, but because it had similar angles and would fit nicely one above the other. So orientation was the primary selection bias, not glycosylation peak height.

An assembled graphic (below) was prepared with molecules of approximately the same shape, size, and curvature to show three things: 1) the depth of the valley between the N term and collagen-like domain peaks; 2) the difference in the shadowed image (for which dodecamer the glycosylation state has NOT been determined, though it appears NOT to be glycosylated); and 3) the similarities of the peaks (excepting the variable glycosylation peak(s)) of the three trimer sections of dodecamers with the two methods and two separate preparations of SP-D. The grayscale plots on the right-hand side of the image below show the exact tracelines and the resulting plots for each trimer (obtained in ImageJ, exported to CSV, plotted in Excel). The exciting thing is that the peaks in the two prep methods (AFM and shadowed) are just wonderfully similar.

I had been skeptical of cropping out any SP-D arms to trace for grayscale levels in the shadowed image and just gave it a try. Lest anyone accuse me of “doctoring” data: no data have been changed, just the outlines. AND one really nice result is that the tiny peak I have mentioned hundreds of times, which lies in the valley between the N terminus peak and the glycosylation peak, is prominent in the shadowed image. If you look at the middle plot on the left-hand side, you see the first tall N term peak and, in the shadowed image, a smaller peak on the downslope shoulder.
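Putting an AFM trace and a shadowed TEM trace on the same axes needs a little bookkeeping, since the two lines have different lengths and brightness ranges. A sketch, assuming the ImageJ profile was saved as a CSV with a header row and the gray value in the second column (column names vary, so check the file), and assuming that min-max normalization plus linear resampling is an acceptable way to line the two traces up. The helper names are hypothetical.

```python
import csv
import io


def read_profile(text_stream, value_column=1):
    """Gray values from a profile CSV (header row skipped)."""
    reader = csv.reader(text_stream)
    next(reader)  # skip the header row
    return [float(row[value_column]) for row in reader if row]


def orient(profile, n_term_on_left=True):
    """Return the trace running N term -> CRD, reversing it if needed."""
    return list(profile) if n_term_on_left else list(reversed(profile))


def normalize(profile):
    """Rescale gray values to 0..1 so different images can be overlaid."""
    lo, hi = min(profile), max(profile)
    return [(v - lo) / (hi - lo) for v in profile]


def resample(profile, n):
    """Linearly interpolate the trace onto n evenly spaced points."""
    last = len(profile) - 1
    out = []
    for k in range(n):
        x = k * last / (n - 1)
        i = int(x)
        if i >= last:
            out.append(profile[-1])
        else:
            out.append(profile[i] + (x - i) * (profile[i + 1] - profile[i]))
    return out


# Tiny stand-in for an exported file (hypothetical values).
sample = io.StringIO("Distance,Gray\n0,50\n1,100\n2,150\n")
afm = read_profile(sample)
```

For example, `resample(normalize(orient(afm)), 200)` and the same for the TEM trace give two 200-point curves, N term on the left, that can be plotted one above the other.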

Little “feet” at the N term side of SP-D trimers

I have looked at these images for years, and only now am I thinking that the bend sometimes seen on the N terminus side of the SP-D trimer affects how those multiple N termini junctions might assemble to make multimers with different attributes at that junction: whether end to end, one on top of the other, or bending to form a ring, as is seen more often in multimers with many trimers. The flexibility of the CRD regions of a trimer is pretty obvious in AFM images of SP-D. I have also seen, on hundreds of plots, a low spot, a reasonably “thin” part of the molecule, seemingly just between the N and collagen-like domains; that it has some “flexibility” is certainly something to think about (per image below).

Two trimers from the same AFM image are shown below. Rings are drawn around the “bend”, and arrows point to a very low point in the grayscale scan, which is a very common feature of SP-D near the N term bright peak.

AFM images of SP-D trimer

another image with an N term “foot” like bend

another trimer with an N term “foot” like bend

…and another one

I looked up all images in Arroyo 2018 and, out of a total of 50 trimers, only 5 had that little crooked hook at the junction of the N term domain and the collagen-like domain.
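The very low point near the N term bright peak can be located automatically along the same kind of grayscale trace. This sketch assumes, as in the cropped arms here, that the N term junction is the brightest point of the profile and lies on the left. It searches a hypothetical fixed window of pixels to the right of that peak for the minimum.

```python
def n_term_valley(profile, window):
    """Locate the dip just after the N term peak.

    Assumes the brightest sample is the N term junction and that the trace
    runs N term -> CRD. Returns (peak index, valley index, depth). The N term
    peak must not be the last sample, otherwise the search window is empty.
    """
    n_peak = max(range(len(profile)), key=profile.__getitem__)
    search = range(n_peak + 1, min(len(profile), n_peak + 1 + window))
    valley = min(search, key=profile.__getitem__)
    return n_peak, valley, profile[n_peak] - profile[valley]


# Hypothetical trace: bright N term peak, deep dip, then further peaks.
trace = [20, 200, 120, 30, 90, 150, 60]
peak_i, valley_i, depth = n_term_valley(trace, window=4)
```

Recording the depth for each traced arm would be one way to compare how deep that dip is across trimers, with and without the “foot” bend.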