<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>AlgoOne Data Quality Assurance on Sulprobil</title>
    <link>https://www.sulprobil.de/tags/algoone-data-quality-assurance/</link>
    <description>Recent content in AlgoOne Data Quality Assurance on Sulprobil</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sun, 10 May 2026 15:44:00 +0100</lastBuildDate>
    <atom:link href="https://www.sulprobil.de/tags/algoone-data-quality-assurance/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Compare Correlation Matrices (Perl)</title>
      <link>https://www.sulprobil.de/compare_correlation_matrices_en/</link>
      <pubDate>Sun, 10 May 2026 15:44:00 +0100</pubDate>
      <guid>https://www.sulprobil.de/compare_correlation_matrices_en/</guid>
      <description>&lt;p&gt;&lt;strong&gt;&amp;ldquo;Remember, my friend, that knowledge is stronger than memory, and we should not trust the weaker.&amp;rdquo; [Bram Stoker]&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;h2 id=&#34;abstract&#34;&gt;Abstract&lt;/h2&gt;&#xA;&lt;p&gt;Some years ago I developed a Perl program for an Algorithmics client.&#xA;Over time I enhanced this program and made it read the &lt;em&gt;RMLinks.cfg&lt;/em&gt; file&#xA;so that new risk factors would be included automatically.&lt;/p&gt;&#xA;&lt;h2 id=&#34;implementation-approach&#34;&gt;Implementation Approach&lt;/h2&gt;&#xA;&lt;p&gt;My implementation approach was:&lt;/p&gt;&#xA;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;1. Read first matrix&#xA;    Checks:&#xA;    Matrix square?&#xA;    Risk factor order left-&amp;gt;right (top row) == top-&amp;gt;bottom (leftmost column)?&#xA;    Diagonal elements == 1 (warning if not)?&#xA;    No NC category (warning if there is)?&#xA;    Matrix symmetric: M(i,j) == M(j,i) for all i,j?&#xA;    [Not for DC files because not given there.]&#xA;&#xA;2. Read second matrix&#xA;    Checks identical to above&#xA;&#xA;3. Risk factors in both matrices identical?&#xA;    Warn about risk factors which are in the first matrix but not in the second, and vice versa&#xA;    Highlight outliers per category&#xA;    Highlight outliers per currency&#xA;&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;parameters&#34;&gt;Parameters&lt;/h2&gt;&#xA;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;b - breaches: report breaches beyond tolerances instead of differences between the two input matrices.&#xA;d - debug [level]: give debugging information at the given detail level&#xA;    level 1: -&#xA;    level 2: -&#xA;    level 3: Print all elements of matrices 1 and 2&#xA;f - read deviation file [-f needs to be followed by a valid filename]&#xA;    Reads min and max values for all slices for differences which should&#xA;    be ignored during comparison. 
See option -w to get format example&#xA;h - help: list parameters and their explanation&#xA;i - ignore risk factors in a given file [-i needs to be followed by a valid filename]&#xA;m - set max rank index [default is 6 (= return highest 3&#xA;    and lowest 3 of each slice); m needs to be even and &amp;gt;= 4]&#xA;n - tolerate risk factor category NC&#xA;r - set Algo risk factor category file [default is ./RMLinks.cfg]&#xA;s - summarize findings, no detailed warnings or error messages&#xA;t - read file with tolerated changes for each matrix element and apply tolerance check&#xA;v - print version&#xA;w - write deviation file with min and max values of all slices.&#xA;    This file is comma-separated to be easily readable via Excel.&#xA;    It can be amended and used with option -f later&#xA;    [-w needs to be followed by a valid filename, preferably ending with .csv]&#xA;x - read translation table [-x needs to be followed by a valid filename].&#xA;    Risk factor names of matrix 1 will be replaced by the second name in each comma-separated row&#xA;&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;program-call-example&#34;&gt;Program Call Example&lt;/h2&gt;&#xA;&lt;p&gt;A typical call of this program from a shell script could look like:&lt;/p&gt;</description>
    </item>
    <item>
      <title>sbDataStats (VBA)</title>
      <link>https://www.sulprobil.de/sbdatastats_en/</link>
      <pubDate>Sun, 10 May 2026 15:44:00 +0100</pubDate>
      <guid>https://www.sulprobil.de/sbdatastats_en/</guid>
      <description>&lt;p&gt;&lt;strong&gt;&amp;ldquo;Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.&amp;rdquo; [Aaron Levenstein]&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;h2 id=&#34;abstract&#34;&gt;Abstract&lt;/h2&gt;&#xA;&lt;p&gt;Of course you could write a data-checking program for any specified input.&lt;/p&gt;&#xA;&lt;p&gt;But what if you would like to throw any arbitrary data (given in a CSV file!)&#xA;into a general data analyzer?&lt;/p&gt;&#xA;&lt;p&gt;For numerical data a general analysis could easily produce minimum, average,&#xA;and maximum information and also warn if any extreme value differs from the&#xA;average by more than 2.5 standard deviations, for example. For text data an&#xA;analysis program could print text frequency and character frequency information.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
