Daily Archives: April 19, 2022

Point anomalies, pattern anomalies ??

I am finding that there are programmers with an interest in biology who understand the need for signal processing of signals that contain repeating and symmetrical patterns. It is an important issue, as there are times when I look at the peaks defined by algorithms (such as Lag, Threshold, Influence, or LTI) and they just don't do what I would like them to do, and there are also examples of peak finding (and peak ignoring) which just don't make visual sense to me. See the plot below (an end-to-end tracing, shown as a grayscale plot, of a surfactant protein D hexamer), which has peaks detected using the LTI values in an app made expressly for me by Aaron Miller. It detects lots of the obvious peaks, but in the middle and lower images I am pointing out peaks which, because of the previous values, are just ignored, while other peaks, which are just tiny bumps in a larger peak, are tagged as separate peaks.

Top image: the blue line is the grayscale plot; boxes are the peak widths (marked by the valleys on either side of the algorithm's detected peaks (purple lines)); the grayscale axis (y) is normalized to 100. This set of peaks in this particular plot represents one CRD-CRD segmented line drawn through the center of a single dodecamer of surfactant protein D (image is from Arroyo et al). Overall, the plot is not very different in terms of peak number from that ascribed to the plot by “citizen scientists” (friends and family), “my counts” (about 500 of them), and various signal processing programs: Octave/Matlab peak detection functions; a Scipy app (from Daniel Miller); Excel peak and valley detection templates (Thomas O’Haver); and an LTI app (from Aaron Miller).

Here are two instances where I don't like the peaks that are flagged. This sample is from the LTI app. (A lag of 5 will use the last 5 observations to smooth the data. A threshold of 1 will signal if a datapoint is 1 standard deviation away from the moving mean. And an influence of 0.5 gives signals half of the influence that normal datapoints have.) I have put into the link the LTI values for this particular plot. Two specific instances where I disagree are shown in the plots below, each an excerpt from the complete plot above. The excerpt on the left shows one peak NOT detected (fat red line above the undetected peak), and the one on the right shows a nonsense (in my opinion) peak (tiny thin red bar above the peak).
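For readers who want to see what lag, threshold and influence actually do, here is a minimal sketch of that rule in plain Python. It is written only from the description above and is not the code of Aaron Miller's app; the function name `lti_signals` and the toy trace are my own assumptions for illustration.

```python
from statistics import mean, pstdev

def lti_signals(y, lag=5, threshold=1.0, influence=0.5):
    """Return +1 / -1 / 0 per point (hypothetical helper, not the app's code)."""
    if len(y) <= lag:
        raise ValueError("need more data points than the lag")
    signals = [0] * len(y)
    # 'filtered' is the trace that feeds the moving mean and standard deviation.
    filtered = list(y[:lag])
    for i in range(lag, len(y)):
        window = filtered[i - lag:i]      # the last `lag` filtered observations
        avg = mean(window)
        std = pstdev(window)
        if abs(y[i] - avg) > threshold * std:
            signals[i] = 1 if y[i] > avg else -1
            # A flagged point only partly enters the moving statistics.
            filtered.append(influence * y[i] + (1 - influence) * filtered[i - 1])
        else:
            filtered.append(y[i])
    return signals

# Toy trace (made up): flat baseline with one obvious spike at index 6.
trace = [1, 1, 1, 1, 1, 1, 10, 1, 1, 1]
print(lti_signals(trace))
```

Because every decision depends only on the few points just before it, a tall peak can raise the moving mean and standard deviation enough that the next peak is not flagged, and a tiny bump on a quiet stretch can be flagged. That appears to be the behavior I am objecting to in the two excerpts.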

RED BARS: the one on the left is over the peak I would LIKE to have detected, and the one on the right is over a peak that seems like it should not have been detected. The challenge is to find a model plot and compare the “real plots” back to the model, thus allowing the extraordinary discrepancies in peak height and width to be tagged, and not removed by moving averages. The same issues exist in image processing: one ends up using judgment, but with judgment comes bias.
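As a rough illustration of the model-comparison idea, the sketch below tags every position where a real trace departs from a model trace by more than a chosen tolerance, instead of smoothing the difference away. Everything in it is an assumption for illustration: the function name `flag_deviations`, the tolerance of 10 grayscale units, and the made-up traces. It also assumes both traces are already the same length and normalized to 100.

```python
def flag_deviations(trace, model, tolerance=10.0):
    """Return indices where the real trace differs from the model by more
    than `tolerance` (hypothetical helper, values on a 0-100 scale)."""
    if len(trace) != len(model):
        raise ValueError("trace and model must have the same length")
    return [
        i
        for i, (real, expected) in enumerate(zip(trace, model))
        if abs(real - expected) > tolerance
    ]

# Made-up example: the model has two equal peaks, the real trace has a
# normal first peak and a much lower second peak.
model = [0, 50, 100, 50, 0, 50, 100, 50, 0]
trace = [0, 48, 95, 52, 0, 20, 45, 22, 0]
print(flag_deviations(trace, model))
```

The tolerance is still a judgment call, so this does not remove the bias problem; it only makes the judgment explicit and repeatable.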

It is these irregularities that are causing me to go into signal processing for biology with much disappointment.