<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="http://winsh.me/feed.xml" rel="self" type="application/atom+xml" /><link href="http://winsh.me/" rel="alternate" type="text/html" /><updated>2025-11-30T18:59:16+00:00</updated><id>http://winsh.me/feed.xml</id><title type="html">Kjell’s Page</title><subtitle>Personal page</subtitle><entry><title type="html">What Is a Suckless Program and How Can It Improve Your Health?</title><link href="http://winsh.me/suckless/software/open-source/philosophy/health/%22software/design%22/2022/11/08/what-is-suckless-software-and-why-i-like-it.html" rel="alternate" type="text/html" title="What Is a Suckless Program and How Can It Improve Your Health?" /><published>2022-11-08T00:00:00+00:00</published><updated>2022-11-08T00:00:00+00:00</updated><id>http://winsh.me/suckless/software/open-source/philosophy/health/%22software/design%22/2022/11/08/what-is-suckless-software-and-why-i-like-it</id><content type="html" xml:base="http://winsh.me/suckless/software/open-source/philosophy/health/%22software/design%22/2022/11/08/what-is-suckless-software-and-why-i-like-it.html"><![CDATA[<p>In this blog post you will learn where the term suckless program comes from, and what properties suckless software has.
I will also discuss why I like to use suckless programs.
Last but not least, I will discuss why I think suckless programs can improve your mental and physical health.</p>

<p>UPDATE: I have made <a href="https://www.youtube.com/watch?v=WGXue_lu4e0&amp;t=84s">a video version of this article</a>.</p>

<h2 id="background">Background</h2>

<p>The term suckless software comes from the suckless organization. The suckless organization is a network of people centered around <a href="https://suckless.org/">suckless.org</a>, its mailing list, and the founder of the suckless organization <a href="https://garbe.ca/">Anselm R. Garbe</a>. Anselm has explained why he started the suckless organization in <a href="https://twit.tv/shows/floss-weekly/episodes/355?autostart=false">an episode of the FLOSS Weekly podcast</a>. I recommend this podcast to anyone who wants to get insight into the suckless organization from Anselm’s point of view. Anselm is also the author of a popular tiling window manager called <a href="https://dwm.suckless.org/">dwm</a>. The suckless organization publishes and maintains several software programs and has a <a href="https://suckless.org/philosophy/">philosophy document</a> explaining its core beliefs related to software.</p>

<h2 id="properties-of-suckless-software">Properties of Suckless Software</h2>

<p>First, it makes sense to call the software published and promoted by the suckless organization suckless software. It is a very subjective matter what a suckless computer program is. However, based on the texts and programs published by the suckless organization, one can distill a few properties that a program should have to be called suckless. I would summarize the necessary properties of a suckless program like this:</p>

<ul>
  <li>The program follows the UNIX philosophy for software design. The program should focus on doing one task and doing that task well.</li>
  <li>The number of features should be limited, and features that can easily be accomplished in some other way (e.g., by piping the output of a program to some UNIX utility) should be excluded.</li>
  <li>The program’s source code should be as easy to understand as possible. Few lines of code usually mean that it is easy to understand what the program is doing.</li>
  <li>The program should also keep things simple and easy to understand regarding the build and configuration process. Typically suckless programs are configured by editing a source code file (commonly named <code class="language-plaintext highlighter-rouge">config.h</code> if it is a program written in C).</li>
  <li>It is okay to depend on other software and libraries if this makes things simpler and easier to understand and maintain. However, it is positive if the program has few dependencies, making it less likely to break due to a change in some dependency.</li>
  <li>A suckless program’s user interface should primarily focus on technically skilled users such as programmers. The creator should take great care to make the user interface efficient for experienced users, but user-friendliness for users who don’t have a software engineering background is less important.</li>
  <li>Last but not least, the source code of a suckless program should be freely available, and modifications should be permitted.</li>
</ul>

<p>Programs that don’t follow the properties outlined above might suck (be bad) because they:</p>

<ul>
  <li>Are difficult to use, extend, and maintain due to the complexity of the code and the build system.</li>
  <li>Are hard to combine with other programs to extend functionality.</li>
  <li>Have many bugs due to unnecessary complexity.</li>
  <li>Break easily if external dependencies change or have bugs.</li>
</ul>

<h2 id="example-of-a-suckless-program">Example of a Suckless Program</h2>

<p>An example of a suckless program is a slide show presentation program called <a href="https://tools.suckless.org/sent/">Sent</a>. Sent takes a plain text file as input and displays a window with slides derived from the text file. An empty line signals that a new slide begins. A slide consists entirely of unformatted text or an image. Slides are automatically scaled so the content fits the window. To write a text slide, one writes text, and to create an image slide, one writes a line with the format <code class="language-plaintext highlighter-rouge">@path_to_image_file.png</code>. Sent’s source code is approximately 1000 lines of C code, including whitespace and comments. One can contrast this with LibreOffice’s roughly 10 million lines of code. LibreOffice is an office suite containing a presentation tool, so it can do more than just presentations. The comparison is thus not fair, but since Sent’s code size is only about 0.01% of LibreOffice’s code size, it still shows a big difference in complexity. This difference means a lot since a programmer could easily fully understand the code of Sent in less than a day, while it would probably take a year or more to understand LibreOffice’s source code to a similar depth.</p>

<p>Sent can be this simple because it leverages the fact that many experienced computer users are confident and efficient at editing text in at least one text editor and one image editor. With such external tools, Sent can provide functionality similar to LibreOffice’s presentation tool.
Additionally, Sent is more user-friendly and flexible than LibreOffice’s presentation tool in several ways. Users can edit the Sent presentation in Emacs, Vim, GEdit, or whatever text editor they like. One can create images for image slides in GIMP, Krita, KolourPaint, Inkscape, or whatever fits the particular picture or user. Last but not least, plain text files are straightforward to manipulate programmatically, which is useful when, for example, a user wants to insert a series of slides for an animation. Sent and LibreOffice are both open source, but due to the complexity of LibreOffice’s source code and build process, a user is far more likely to be able to improve or fix a bug in Sent than in LibreOffice. Sent has the suckless properties listed above, and LibreOffice’s presentation tool doesn’t.</p>
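
<p>To make the point about programmatic manipulation concrete, here is a minimal sketch of an escript that writes a Sent presentation with one image slide per animation frame. It is written in Erlang only because that is what I use daily; any scripting language would do. The slide title and the file names <code class="language-plaintext highlighter-rouge">frame_001.png</code> and <code class="language-plaintext highlighter-rouge">animation.sent</code> are made up for illustration.</p>

<figure class="highlight"><pre><code class="language-erlang" data-lang="erlang">#!/usr/bin/env escript
%% Hypothetical helper: the title and the file names are made up.
main(_Args) -&gt;
    Title = "Bouncing ball",
    %% One image slide per frame: @frame_001.png, @frame_002.png, ...
    Frames = [io_lib:format("@frame_~3..0B.png", [N]) || N &lt;- lists:seq(1, 5)],
    %% Sent starts a new slide at every empty line.
    Slides = lists:join("\n\n", [Title | Frames]),
    ok = file:write_file("animation.sent", [Slides, "\n"]).</code></pre></figure>

<p>Running the script and then <code class="language-plaintext highlighter-rouge">sent animation.sent</code> should show the title slide followed by the frames, assuming the image files exist next to the text file.</p>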

<h2 id="why-do-i-like-suckless-software">Why Do I Like Suckless Software</h2>

<p>I like suckless programs because they give me a sense of control. Furthermore, they also give me the power to combine tools to create new tools that I also have a sense of control over. Since I create these new tools, I can also make them fit precisely how I prefer to do things. Let me explain why suckless software gives me control and power. Because suckless programs are small and easy to change, I can fully understand them, fix bugs, or add extra functionality by myself. Therefore, I can be in control of suckless programs without depending on an external entity. As suckless programs are designed to be easily combined with other software, it is easy to combine them to create new efficient tools. For example, one can easily combine <a href="https://dwm.suckless.org/"><code class="language-plaintext highlighter-rouge">dwm</code></a>, <a href="https://tools.suckless.org/dmenu/"><code class="language-plaintext highlighter-rouge">dmenu</code></a>, <a href="https://github.com/jordansissel/xdotool"><code class="language-plaintext highlighter-rouge">xdotool</code></a>, and a text editor to create a text template system that works across all applications.</p>

<p>Still, not all applications can easily have all of the suckless properties. Take advanced 3D modeling software like Blender, for example. This type of software requires algorithms and data structures that are advanced enough that the program is not easy to understand or modify. Even so, after spending hundreds of hours with the program, a professional 3D designer can feel in control of Blender and know its most important functionality. Many other tasks, like keeping bookmarks synchronized across computers, can be solved in a much simpler and easier-to-understand way than most bookmarking applications offer by combining several suckless programs, as <a href="https://www.youtube.com/watch?v=d_11QaTlf1I">Luke Smith explains in one of his videos</a>. So even though suckless programs might not be the best for every computing task, one can embrace the suckless way to increase the sense of control and power.</p>

<h2 id="suckless-software-might-improve-your-physical-and-mental-health">Suckless Software Might Improve Your Physical and Mental Health</h2>

<p>Several <a href="https://pubmed.ncbi.nlm.nih.gov/33989673/">academic studies</a> suggest that the sense of control is correlated with several positive mental and physical health measurements. I have argued in this article that suckless programs give you a better sense of control. If this is true, it is reasonable to believe that using suckless programs might improve your mental and physical health. If one takes the research on the sense of control seriously, it is worth contemplating whether computer programs that are buggy and overly complex are harmful to your health. Based on my own experience and the people I have seen around me, I think the answer to this is yes.</p>]]></content><author><name></name></author><category term="suckless" /><category term="software" /><category term="open-source" /><category term="philosophy" /><category term="health" /><category term="&quot;software" /><category term="design&quot;" /><summary type="html"><![CDATA[In this blog post you will learn where the term suckless program comes from, and what properties suckless software has. I will also discuss why I like to use suckless programs. Last but not least, I will discuss why I think suckless programs can improve your mental and physical health.]]></summary></entry><entry><title type="html">New Blog</title><link href="http://winsh.me/writing/blog/2022/06/20/new-blog.html" rel="alternate" type="text/html" title="New Blog" /><published>2022-06-20T00:00:00+00:00</published><updated>2022-06-20T00:00:00+00:00</updated><id>http://winsh.me/writing/blog/2022/06/20/new-blog</id><content type="html" xml:base="http://winsh.me/writing/blog/2022/06/20/new-blog.html"><![CDATA[<p>This is the first post on this blog, so I think it is appropriate to give you
some information about what this blog will be about. First of all, this blog
was created mainly for selfish reasons. I want to use it to develop my
abilities to express my thoughts and improve my writing skills. It might also
be helpful as a place to store things that I might otherwise forget in the
future. I see it as a bonus if something on this blog will also be helpful for
others.</p>

<p>My current idea for this blog is to primarily write about the following topics:</p>

<ul>
  <li>Programming - This is my primary source of income, and I spend a substantial
amount of my free time trying to become a better programmer.</li>
  <li>Software - I also plan to write about the software programs I use and like. I
mainly use open-source software. I have also started to use more software
that follows the <a href="https://suckless.org/">suckless</a> spirit in the last couple
of years. Suckless programs generally try to do one thing and do that thing
well and make it easy to combine computer programs with other programs to
perform more advanced tasks. I believe that although suckless programs might
often have a higher barrier to entry than other programs, they usually give
you greater flexibility and a sense of control once you know them well.</li>
  <li>Computer science - I think this is an interesting field with many fascinating
ideas and insights. I hold a Ph.D. degree, and the main subject of my
<a href="http://uu.diva-portal.org/smash/record.jsf?pid=diva2%3A1220366&amp;dswid=2903">dissertation</a>
was concurrent data structures, so I might write some posts that are related
to that subject.</li>
  <li>Physical exercise - This is something that I think a lot of people in modern
society do far too little of (programmers are not an exception).
<a href="https://www.youtube.com/watch?v=fL_AKb6m0wM">This video</a>, which my brother
created, explains why physical exercise is generally a good thing. Physical
exercise and sports have been a part of my life since I was very young. I
competed in cross-country skiing at a pretty high level in Sweden until I was
20 years old. After that, there have been many periods when I have done too little
exercise, but I have always started to do more again when I have realized that I
feel much better physically and mentally if I exercise a couple of days per
week.</li>
  <li>Life balance - Getting a healthy balance between family, work, physical
exercise, friends, and hobbies is far from trivial in modern developed
societies. I believe this has become increasingly difficult
with an increased rate of technological and social change. I have struggled
to get a good balance and am still trying to improve.</li>
</ul>

<p>I think the list above should give you a good idea of what this blog will be
about. I hope that the blog will be helpful for you even though this is not its
primary purpose, as I have described above.</p>]]></content><author><name></name></author><category term="writing" /><category term="blog" /><summary type="html"><![CDATA[This is the first post on this blog, so I think it is appropriate to give you some information about what this blog will be about. First of all, this blog was created mainly for selfish reasons. I want to use it to develop my abilities to express my thoughts and improve my writing skills. It might also be helpful as a place to store things that I might otherwise forget in the future. I see it as a bonus if something on this blog will also be helpful for others.]]></summary></entry><entry><title type="html">The New Scalable ETS ordered_set</title><link href="http://winsh.me/ets/ordered_set/scalability/2020/02/16/the-new-scalable-ets-ordered_set.html" rel="alternate" type="text/html" title="The New Scalable ETS ordered_set" /><published>2020-02-16T17:31:48+00:00</published><updated>2020-02-16T17:31:48+00:00</updated><id>http://winsh.me/ets/ordered_set/scalability/2020/02/16/the-new-scalable-ets-ordered_set</id><content type="html" xml:base="http://winsh.me/ets/ordered_set/scalability/2020/02/16/the-new-scalable-ets-ordered_set.html"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>The scalability of ETS tables of type <code class="language-plaintext highlighter-rouge">ordered_set</code> with the
<code class="language-plaintext highlighter-rouge">write_concurrency</code> option is substantially better in Erlang/OTP 22
than in earlier releases. In some extreme cases, you can expect
more than 100 times better throughput in Erlang/OTP 22 compared to
Erlang/OTP 21. The cause of this improvement is a new data structure
called <a href="https://doi.org/10.1016/j.jpdc.2017.11.007">the contention adapting search tree</a> (CA tree
for short). This blog post will give you insights into how the CA tree
works and show you benchmark results comparing the performance of ETS
<code class="language-plaintext highlighter-rouge">ordered_set</code> tables in OTP 21 and OTP 22.</p>

<h2 id="try-it-out">Try it Out!</h2>

<p><a href="/code/insert_disjoint_ranges.erl">This escript</a> makes it convenient for you
to try the new <code class="language-plaintext highlighter-rouge">ordered_set</code> implementation on your own machine with
Erlang/OTP 22+ installed.</p>

<!-- 
<figure class="highlight"><pre><code class="language-erlang" data-lang="erlang"> <span class="o">--&gt;</span>

<span class="o">&lt;!--</span> <span class="o">-</span><span class="nf">module</span><span class="p">(</span><span class="n">insert_disjoint_ranges</span><span class="p">).</span>
<span class="p">-</span><span class="ni">mode</span><span class="p">(</span><span class="n">compile</span><span class="p">).</span>
<span class="p">-</span><span class="ni">export</span><span class="p">([</span><span class="n">bench</span><span class="o">/</span><span class="mi">3</span><span class="p">,</span> <span class="n">main</span><span class="o">/</span><span class="mi">1</span><span class="p">]).</span>

<span class="nf">inserter</span><span class="p">(_</span><span class="nv">Table</span><span class="p">,</span> <span class="nv">RangeEnd</span><span class="p">,</span> <span class="nv">RangeEnd</span><span class="p">,</span> <span class="nv">P</span><span class="p">)</span> <span class="o">-&gt;</span>
    <span class="nv">P</span> <span class="o">!</span> <span class="n">done</span><span class="p">;</span>
<span class="nf">inserter</span><span class="p">(</span><span class="nv">Table</span><span class="p">,</span> <span class="nv">RangeStart</span><span class="p">,</span> <span class="nv">RangeEnd</span><span class="p">,</span> <span class="nv">P</span><span class="p">)</span> <span class="o">-&gt;</span>
    <span class="nn">ets</span><span class="p">:</span><span class="nf">insert</span><span class="p">(</span><span class="nv">Table</span><span class="p">,</span> <span class="p">{</span><span class="nv">RangeStart</span><span class="p">}),</span>
    <span class="nf">inserter</span><span class="p">(</span><span class="nv">Table</span><span class="p">,</span> <span class="nv">RangeStart</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="nv">RangeEnd</span><span class="p">,</span> <span class="nv">P</span><span class="p">).</span>

<span class="nf">bench</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="nv">NrOfProcs</span><span class="p">,</span> <span class="nv">NrOfItems</span><span class="p">)</span> <span class="o">-&gt;</span>
    <span class="nv">N</span> <span class="o">=</span> <span class="nv">NrOfItems</span> <span class="ow">div</span> <span class="nv">NrOfProcs</span><span class="p">,</span>
    <span class="nv">Parent</span> <span class="o">=</span> <span class="nf">self</span><span class="p">(),</span>
    <span class="p">{</span><span class="nv">Time</span><span class="p">,</span> <span class="p">_}</span> <span class="o">=</span>
        <span class="nn">timer</span><span class="p">:</span><span class="nf">tc</span><span class="p">(</span>
          <span class="k">fun</span><span class="p">()</span> <span class="o">-&gt;</span>
                  <span class="c">% Spawn inserters
</span>                  <span class="p">[</span> <span class="nb">spawn</span><span class="p">(</span>
                      <span class="k">fun</span><span class="p">()</span><span class="o">-&gt;</span>
                              <span class="nv">End</span> <span class="o">=</span> <span class="k">case</span> <span class="p">(</span><span class="nv">P</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="o">=:=</span> <span class="nv">NrOfProcs</span> <span class="k">of</span>
                                        <span class="n">true</span> <span class="o">-&gt;</span> <span class="nv">NrOfItems</span><span class="p">;</span>
                                        <span class="n">false</span> <span class="o">-&gt;</span> <span class="p">(</span><span class="nv">P</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">*</span><span class="nv">N</span>
                                    <span class="k">end</span><span class="p">,</span>
                              <span class="nf">inserter</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="nv">P</span><span class="o">*</span><span class="nv">N</span><span class="p">,</span> <span class="nv">End</span><span class="p">,</span> <span class="nv">Parent</span><span class="p">)</span>
                      <span class="k">end</span><span class="p">)</span>
                    <span class="p">||</span> <span class="nv">P</span> <span class="o">&lt;-</span> <span class="nn">lists</span><span class="p">:</span><span class="nf">seq</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nv">NrOfProcs</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">],</span>
                  <span class="c">% Wait for inserters to finish'
</span>                  <span class="p">[</span> <span class="k">receive</span> <span class="n">done</span> <span class="o">-&gt;</span> <span class="n">ok</span> <span class="k">end</span> <span class="p">||</span> <span class="p">_</span> <span class="o">&lt;-</span> <span class="nn">lists</span><span class="p">:</span><span class="nf">seq</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nv">NrOfProcs</span><span class="p">)]</span>
          <span class="k">end</span><span class="p">),</span>
    <span class="nn">io</span><span class="p">:</span><span class="nf">format</span><span class="p">(</span><span class="s">"Time: </span><span class="si">~p</span><span class="s"> seconds </span><span class="si">~p~n</span><span class="s">"</span><span class="p">,</span> <span class="p">[</span><span class="nv">Time</span> <span class="o">/</span> <span class="mi">1000000</span><span class="p">,</span> <span class="nn">ets</span><span class="p">:</span><span class="nf">info</span><span class="p">(</span><span class="nv">T</span><span class="p">,</span> <span class="nb">size</span><span class="p">)]).</span>

<span class="nf">main</span><span class="p">([</span><span class="nv">Type</span><span class="p">,</span> <span class="nv">NrOfProcs</span><span class="p">,</span> <span class="nv">Size</span><span class="p">])</span>
  <span class="k">when</span> <span class="nb">is_integer</span><span class="p">(</span><span class="nv">NrOfProcs</span><span class="p">)</span> <span class="ow">andalso</span> <span class="nb">is_integer</span><span class="p">(</span><span class="nv">Size</span><span class="p">)</span> <span class="ow">andalso</span> <span class="nv">Size</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="ow">andalso</span> <span class="nv">NrOfProcs</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">-&gt;</span>
    <span class="nv">Settings</span> <span class="o">=</span>
        <span class="k">case</span> <span class="nv">Type</span> <span class="k">of</span>
            <span class="s">"old"</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="n">public</span><span class="p">,</span> <span class="n">ordered_set</span><span class="p">];</span>
            <span class="s">"new"</span> <span class="o">-&gt;</span> <span class="p">[</span><span class="n">public</span><span class="p">,</span> <span class="n">ordered_set</span><span class="p">,</span> <span class="p">{</span><span class="n">write_concurrency</span><span class="p">,</span> <span class="n">true</span><span class="p">}]</span>
        <span class="k">end</span><span class="p">,</span>
    <span class="nf">bench</span><span class="p">(</span><span class="nn">ets</span><span class="p">:</span><span class="nf">new</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="nv">Settings</span><span class="p">),</span> <span class="nv">NrOfProcs</span><span class="p">,</span> <span class="nv">Size</span><span class="p">);</span>
<span class="nf">main</span><span class="p">([</span><span class="nv">V</span><span class="p">,</span> <span class="nv">NrOfProcsStr</span><span class="p">,</span> <span class="nv">SizeStr</span><span class="p">])</span> <span class="k">when</span> <span class="ow">not</span> <span class="nb">is_integer</span><span class="p">(</span><span class="nv">SizeStr</span><span class="p">)</span> <span class="o">-&gt;</span>
    <span class="nf">main</span><span class="p">([</span><span class="nv">V</span><span class="p">,</span> <span class="k">catch</span> <span class="nb">list_to_integer</span><span class="p">(</span><span class="nv">NrOfProcsStr</span><span class="p">),</span> <span class="k">catch</span> <span class="nb">list_to_integer</span><span class="p">(</span><span class="nv">SizeStr</span><span class="p">)]);</span>
<span class="nf">main</span><span class="p">(_)</span> <span class="o">-&gt;</span>
    <span class="nn">io</span><span class="p">:</span><span class="nf">format</span><span class="p">(</span><span class="s">"usage: escript </span><span class="si">~s</span><span class="s"> (new|old) NrOfProcesses Size</span><span class="si">~n</span><span class="s">"</span><span class="p">,</span> <span class="p">[</span><span class="nn">escript</span><span class="p">:</span><span class="nf">script_name</span><span class="p">()]).</span>

 <span class="o">--&gt;</span>

<span class="o">&lt;!--</span> </code></pre></figure>
 -->

<p>The escript measures the time it takes for <code class="language-plaintext highlighter-rouge">P</code> Erlang processes to
insert <code class="language-plaintext highlighter-rouge">N</code> integers into an <code class="language-plaintext highlighter-rouge">ordered_set</code> ETS table, where <code class="language-plaintext highlighter-rouge">P</code> and <code class="language-plaintext highlighter-rouge">N</code>
are parameters to the escript. The CA tree is only utilized when the
ETS table options <code class="language-plaintext highlighter-rouge">ordered_set</code> and <code class="language-plaintext highlighter-rouge">{write_concurrency, true}</code> are
active. One can, therefore, easily compare the new data structure’s
performance with the old one (an <a href="https://en.wikipedia.org/wiki/AVL_tree">AVL tree</a> protected by a
single readers-writer lock). The <code class="language-plaintext highlighter-rouge">write_concurrency</code> option had no
effect on <code class="language-plaintext highlighter-rouge">ordered_set</code> tables before the release of Erlang/OTP 22.</p>
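
<p>If you only want to see the two configurations side by side, the following Erlang shell session is a minimal sketch. The table names <code class="language-plaintext highlighter-rouge">old_style</code> and <code class="language-plaintext highlighter-rouge">new_style</code> are made up for illustration; the options are the ones discussed above.</p>

<figure class="highlight"><pre><code class="language-erlang" data-lang="erlang">%% Paste into an Erlang shell (erl) running Erlang/OTP 22 or later.
%% The table names old_style and new_style are hypothetical.
Old = ets:new(old_style, [public, ordered_set]),
New = ets:new(new_style, [public, ordered_set, {write_concurrency, true}]),
%% Both tables are used through exactly the same API.
true = ets:insert(Old, {42}),
true = ets:insert(New, {42}),
[{42}] = ets:lookup(New, 42),
%% Ask each table whether the write_concurrency option is active.
{ets:info(Old, write_concurrency), ets:info(New, write_concurrency)}.</code></pre></figure>

<p>Nothing else in the calling code has to change: the choice of data structure is made entirely by the options given to <code class="language-plaintext highlighter-rouge">ets:new/2</code>.</p>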

<p>I got the following results when I ran the escript on my laptop with
two cores (Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz):</p>

<figure class="highlight"><pre><code class="language-console" data-lang="console"><span class="gp">$</span><span class="w"> </span>escript insert_disjoint_ranges.erl old 1 10000000
<span class="go">Time: 3.352332 seconds
</span><span class="gp">$</span><span class="w"> </span>escript insert_disjoint_ranges.erl old 2 10000000
<span class="go">Time: 3.961732 seconds
</span><span class="gp">$</span><span class="w"> </span>escript insert_disjoint_ranges.erl old 4 10000000
<span class="go">Time: 6.382199 seconds
</span><span class="gp">$</span><span class="w"> </span>escript insert_disjoint_ranges.erl new 1 10000000
<span class="go">Time: 3.832119 seconds
</span><span class="gp">$</span><span class="w"> </span>escript insert_disjoint_ranges.erl new 2 10000000
<span class="go">Time: 2.109476 seconds
</span><span class="gp">$</span><span class="w"> </span>escript insert_disjoint_ranges.erl new 4 10000000
<span class="go">Time: 1.66509 seconds</span></code></pre></figure>

<p>We see that in this particular benchmark, the CA tree has superior
scalability to the old data structure. The benchmark ran about twice
as fast with the new data structure and four processes as with the old
data structure and one process (remember that the machine only has two
cores). We will look at the performance and scalability of the new CA
tree-based implementation in greater detail later after describing how
the CA tree works.</p>

<h2 id="the-contention-adapting-search-tree-in-a-nutshell">The Contention Adapting Search Tree in a Nutshell</h2>

<p>The key feature that distinguishes the CA tree from other concurrent
data structures is that the CA tree dynamically changes its
synchronization granularity based on how much contention is detected
inside the data structure. This way, the CA tree can avoid the
performance and memory overheads that come from using many unnecessary
locks without sacrificing performance when many operations happen in
parallel. For example, let us imagine a scenario where the CA tree is
initially populated from many threads in parallel, and then it is only
used from a single thread. In this scenario, the CA tree will adapt to
use fine-grained synchronization in the population phase (when
fine-grained synchronization reduces contention). The CA tree will then change
to use coarse-grained synchronization in the single-threaded phase
(when coarse-grained synchronization reduces the locking and memory
overheads).</p>

<p>The structure of a CA tree is illustrated in the following
picture:</p>

<p><img src="/img/ca_tree/ca_tree_9.png" alt="alt text" title="Contention Adapting Search Tree Structure" /></p>

<p>The actual items stored in the CA tree are located in
sequential data structures in the bottom layer. These
sequential data structures are protected by the locks in the base
nodes in the middle layer. The base node locks have counters
associated with them. The counter of a base node lock is increased when
contention is detected in the base node lock and decreased when no
such contention is detected. The value of this base node lock counter
decides if a split or a join should happen after an operation has been
performed in a base node. The routing nodes at the top of the picture
above form a binary search tree that directs the search for a
particular item. A routing node also contains a lock and a flag. These
are used when joining base nodes. The details of how splitting and
joining work will not be described in this article, but
the interested reader can find a detailed description in this <a href="https://doi.org/10.1016/j.jpdc.2017.11.007">CA tree
paper</a> (<a href="http://winsh.me/papers/catree_jpdc_paper.pdf">preprint PDF</a>). We will now
illustrate how the CA tree changes its synchronization granularity by
going through an example:</p>
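
<p>Before the pictures, here is a small sequential model of the counter rule described above. It is a sketch under loud assumptions, not the ETS implementation: there are no real locks (a boolean argument stands in for “the lock was contended”), joins are only reported and never performed, and the constants <code class="language-plaintext highlighter-rouge">250</code>, <code class="language-plaintext highlighter-rouge">-1</code>, <code class="language-plaintext highlighter-rouge">1000</code>, and <code class="language-plaintext highlighter-rouge">-1000</code> are illustrative values chosen for this example. The module and function names are made up.</p>

<figure class="highlight"><pre><code class="language-erlang" data-lang="erlang">-module(ca_tree_sketch).
-export([new/0, insert/3, decision/1, to_list/1]).

%% Illustrative constants (assumptions made for this sketch).
-define(CONTENDED, 250).      % counter change when the lock was contended
-define(UNCONTENDED, (-1)).   % counter change when it was not
-define(SPLIT_LIMIT, 1000).   % counter above this value: split
-define(JOIN_LIMIT, (-1000)). % counter below this value: join candidate

%% A tree is {base, Counter, SortedItems} or {route, Key, Left, Right}.
new() -&gt; {base, 0, []}.

%% Contended stands in for "acquiring the base node lock was contended".
insert(Item, {route, Key, L, R}, Contended) when Item &lt; Key -&gt;
    {route, Key, insert(Item, L, Contended), R};
insert(Item, {route, Key, L, R}, Contended) -&gt;
    {route, Key, L, insert(Item, R, Contended)};
insert(Item, {base, Counter, Items}, Contended) -&gt;
    adapt({base, Counter + delta(Contended), ordsets:add_element(Item, Items)}).

delta(true) -&gt; ?CONTENDED;
delta(false) -&gt; ?UNCONTENDED.

decision(Counter) when Counter &gt; ?SPLIT_LIMIT -&gt; split;
decision(Counter) when Counter &lt; ?JOIN_LIMIT -&gt; join;
decision(_Counter) -&gt; keep.

%% Split: divide the items between two new base nodes and put a
%% routing node in place of the original base node.
adapt({base, Counter, Items} = Base) -&gt;
    case decision(Counter) of
        split when length(Items) &gt;= 2 -&gt;
            {Left, [Key | _] = Right} = lists:split(length(Items) div 2, Items),
            {route, Key, {base, 0, Left}, {base, 0, Right}};
        _ -&gt;
            Base % joins are not modeled in this sketch
    end.

to_list({route, _Key, L, R}) -&gt; to_list(L) ++ to_list(R);
to_list({base, _Counter, Items}) -&gt; Items.</code></pre></figure>

<p>With these constants, five contended inserts in a row (for example <code class="language-plaintext highlighter-rouge">lists:foldl(fun(I, T) -&gt; ca_tree_sketch:insert(I, T, true) end, ca_tree_sketch:new(), lists:seq(1, 5))</code>) push the counter past the split limit, and the single base node is replaced by a routing node with two base nodes under it. The walkthrough that follows shows the same effect on the real tree’s shape:</p>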

<ol>
  <li>
    <p>Initially, a CA tree only consists of a single base node with a
sequential data structure as is depicted in the picture below:</p>

    <p><img src="/img/ca_tree/ca_tree_1.png" alt="alt text" title="Initial Contention Adapting Search Tree" /></p>
  </li>
  <li>
    <p>If parallel threads access the CA tree, the value of a base node’s
counter may eventually reach the threshold that indicates that the
base node should be split. A base node split divides the items in a
base node between two new base nodes and replaces the original base
node with a routing node where the two new base nodes are
rooted. The following picture shows the CA tree after the base node
pointed to by the tree’s root has been split:</p>

    <p><img src="/img/ca_tree/ca_tree_2.png" alt="alt text" title="First Split Contention Adapting Search Tree" /></p>
  </li>
  <li>
    <p>The process of base node splitting will continue as long as there
is enough contention in base node locks or until the max depth of the
routing layer is reached. The following picture shows what the CA
tree looks like after another split:</p>

    <p><img src="/img/ca_tree/ca_tree_3.png" alt="alt text" title="Second Split Contention Adapting Search Tree" /></p>
  </li>
  <li>
    <p>The synchronization granularity may differ in different parts of a
CA tree if, for example, a particular part of a CA tree is accessed
more frequently in parallel than the rest. The following picture
shows the CA tree after yet another split:</p>

    <p><img src="/img/ca_tree/ca_tree_4.png" alt="alt text" title="Third Split Contention Adapting Search Tree" /></p>
  </li>
  <li>
    <p>The following picture shows the CA tree after the fourth split:</p>

    <p><img src="/img/ca_tree/ca_tree_5.png" alt="alt text" title="Fourth Split Contention Adapting Search Tree" /></p>
  </li>
  <li>
    <p>The following picture shows the CA tree after the fifth split:</p>

    <p><img src="/img/ca_tree/ca_tree_6.png" alt="alt text" title="Fifth Split Contention Adapting Search Tree" /></p>
  </li>
  <li>
    <p>Two base nodes holding adjacent ranges of items can be joined. Such
a join will be triggered after an operation sees that a base
node counter’s value is below a certain threshold. Remember that a
base node’s counter is decreased if a thread does not experience
contention when acquiring the base node’s lock.</p>

    <p><img src="/img/ca_tree/ca_tree_7.png" alt="alt text" title="Join of two base nodes in a Contention Adapting Search Tree" /></p>
  </li>
  <li>
    <p>As you might have noticed from the illustrations above, splitting
and joining cause old base nodes and
routing nodes to be spliced out of the tree. The memory that these
nodes occupy needs to be reclaimed, but this cannot happen immediately
after they have been spliced out, as some threads might still be
reading them. The Erlang run-time system has a mechanism called
<a href="https://github.com/erlang/otp/blob/d6285b0a347b9489ce939511ee9a979acd868f71/erts/emulator/internal_doc/DelayedDealloc.md">delayed
dealloc</a>,
which the ETS CA tree implementation uses to reclaim these nodes
safely.</p>
  </li>
</ol>
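
<p>The adaptation steps above can be sketched in a few lines of Python. This is only an illustrative model and not the ETS implementation: the constants, the names <code class="language-plaintext highlighter-rouge">BaseNode</code>, <code class="language-plaintext highlighter-rouge">RoutingNode</code> and <code class="language-plaintext highlighter-rouge">adaptation_needed</code>, and the use of a failed non-blocking lock acquire as the contention signal are all assumptions made for the sketch. The lock and flag in routing nodes and the delayed reclamation of spliced-out nodes are left out.</p>

```python
import threading

# Illustrative constants; the values used by ETS are not given in this post.
CONTENDED_INCREMENT = 250
UNCONTENDED_DECREMENT = 1
SPLIT_THRESHOLD = 1000
JOIN_THRESHOLD = -1000


class BaseNode:
    """A sorted list guarded by a lock that keeps a contention counter."""

    def __init__(self, items=()):
        self.items = sorted(items)  # stand-in for the sequential data structure
        self.lock = threading.Lock()
        self.counter = 0

    def acquire(self):
        # Assumption: a failed non-blocking acquire counts as contention.
        if self.lock.acquire(blocking=False):
            self.counter -= UNCONTENDED_DECREMENT
        else:
            self.lock.acquire()
            self.counter += CONTENDED_INCREMENT

    def release(self):
        self.lock.release()


class RoutingNode:
    """Directs searches: keys below `key` go left, all other keys go right."""

    def __init__(self, key, left, right):
        self.key = key
        self.left = left
        self.right = right


def adaptation_needed(node):
    """Decide what to do after an operation; call with the node's lock held."""
    if node.counter > SPLIT_THRESHOLD and len(node.items) >= 2:
        return "split"
    if JOIN_THRESHOLD > node.counter:
        return "join"
    return "none"


def split(node):
    """Divide the items between two new base nodes under a new routing node."""
    middle = len(node.items) // 2
    left = BaseNode(node.items[:middle])
    right = BaseNode(node.items[middle:])
    return RoutingNode(node.items[middle], left, right)


def join(left, right):
    """Merge two base nodes that hold adjacent key ranges into one."""
    return BaseNode(left.items + right.items)
```

<p>For instance, splitting a base node that holds the items 1, 2 and 3 in this sketch yields a routing node with key 2, a left base node holding 1, and a right base node holding 2 and 3. Both new base nodes start with a counter of zero, so further splits or joins depend on the contention observed from then on.</p>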

<h2 id="benchmark">Benchmark</h2>

<p>The performance of the new CA tree-based ETS <code class="language-plaintext highlighter-rouge">ordered_set</code>
implementation has been evaluated in a benchmark that measures the
throughput (operations per second) in many scenarios. The
benchmark lets a configurable number of Erlang processes perform a
configurable distribution of operations on a single ETS table. The
curious reader can find the source code of the benchmark in the <a href="https://github.com/erlang/otp/blob/ba2c374d3d6fcba479bb542eb6ecd5d8216ce84b/lib/stdlib/test/ets_SUITE.erl#L7623">test
suite for
ETS</a>.</p>
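
<p>To give a feel for what such a throughput benchmark does, here is a minimal sketch in Python. It is not the code from the ETS test suite: the function <code class="language-plaintext highlighter-rouge">run_benchmark</code>, its parameters, and the use of a dictionary behind one global lock as the shared table are assumptions made for illustration. Python threads also do not run bytecode in parallel in the standard interpreter, so the sketch shows the structure of the measurement rather than realistic scalability numbers.</p>

```python
import random
import threading
import time


def run_benchmark(table, lock, workers, op_weights, key_range, seconds):
    """Return the total throughput in operations per second.

    op_weights maps an operation name ("insert", "delete" or "lookup")
    to its relative weight in the generated workload.
    """
    names = list(op_weights)
    weights = [op_weights[name] for name in names]
    counts = [0] * workers
    stop = threading.Event()

    def worker(index):
        rng = random.Random(index)  # per-worker generator with a fixed seed
        done = 0
        while not stop.is_set():
            op = rng.choices(names, weights)[0]
            key = rng.randrange(key_range)
            with lock:  # every operation goes through the same lock
                if op == "insert":
                    table[key] = key
                elif op == "delete":
                    table.pop(key, None)
                else:
                    table.get(key)
            done += 1
        counts[index] = done

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    time.sleep(seconds)
    stop.set()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - start
    return sum(counts) / elapsed
```

<p>A call such as <code class="language-plaintext highlighter-rouge">run_benchmark({}, threading.Lock(), 4, {"insert": 1, "delete": 1, "lookup": 8}, 1000, 1.0)</code> would run four workers for about one second with a workload of roughly 80% lookups and return the measured operations per second.</p>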

<p>The following figures show results from this benchmark on a machine
with two Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (32 cores in total
with hyper-threading). The average set size in all scenarios was
about 500K. More details about the benchmark machine and configuration
can be found on <a href="http://blog.erlang.org/bench/ets_ord_set_21_vs_22/21_vs_22.html">this
page</a>.</p>

<p><img src="/bench/ets_ord_set_21_vs_22/plot_1.png" alt="alt text" title="benchmark results" /></p>

<p><img src="/bench/ets_ord_set_21_vs_22/plot_2.png" alt="alt text" title="benchmark results" /></p>

<p><img src="/bench/ets_ord_set_21_vs_22/plot_3.png" alt="alt text" title="benchmark results" /></p>

<p><img src="/bench/ets_ord_set_21_vs_22/plot_4.png" alt="alt text" title="benchmark results" /></p>

<p><img src="/bench/ets_ord_set_21_vs_22/plot_5.png" alt="alt text" title="benchmark results" /></p>

<p><img src="/bench/ets_ord_set_21_vs_22/plot_6.png" alt="alt text" title="benchmark results" /></p>

<p><img src="/bench/ets_ord_set_21_vs_22/plot_7.png" alt="alt text" title="benchmark results" /></p>

<p><img src="/bench/ets_ord_set_21_vs_22/plot_8.png" alt="alt text" title="benchmark results" /></p>

<p>We see that the throughput of the CA tree-based <code class="language-plaintext highlighter-rouge">ordered_set</code> improves
when we add cores all the way up to 64 cores, while the old
implementation’s throughput often gets worse when more processes are
added. The old implementation’s write operations are serialized as the
data structure is protected by a single readers-writer lock. The
slowdown of the old version when adding more cores is caused by
increased communication overhead when more cores try to acquire the
same lock and by the fact that the competing cores frequently
invalidate each other’s cache lines.</p>

<h2 id="further-reading">Further Reading</h2>

<p>The following paper describes the CA tree and some optimizations in much more detail than this blog post. The paper also includes an experimental comparison with related data structures.</p>

<ul>
  <li><em><a href="https://doi.org/10.1016/j.jpdc.2017.11.007">A Contention Adapting Approach to Concurrent Ordered Sets</a> (<a href="http://winsh.me/papers/catree_jpdc_paper.pdf">preprint</a>). Journal of Parallel and Distributed Computing, 2018. Konstantinos Sagonas and Kjell Winblad</em></li>
</ul>

<p>There is also a lock-free variant of the CA tree that is described in the following paper. The lock-free CA tree uses immutable data structures in its base nodes to substantially reduce the amount of time during which range queries and similar operations can conflict with other operations.</p>

<ul>
  <li><em><a href="https://doi.org/10.1145/3210377.3210413">Lock-free Contention Adapting Search Trees</a> (<a href="http://winsh.me/papers/spaa2018lfcatree.pdf">preprint</a>). In the proceedings of the 30th Symposium on Parallelism in Algorithms and Architectures (SPAA 2018). Kjell Winblad, Konstantinos Sagonas, and Bengt Jonsson.</em></li>
</ul>

<p>The following paper, which discusses and evaluates a prototypical CA tree implementation for ETS, was the first CA tree-related paper.</p>

<ul>
  <li><em><a href="http://dl.acm.org/citation.cfm?id=2633455">More Scalable Ordered Set for ETS Using Adaptation</a> (<a href="http://uu.diva-portal.org/smash/record.jsf?pid=diva2%3A1220366&amp;dswid=6575">preprint</a>). In Thirteenth ACM SIGPLAN workshop on Erlang (2014). Konstantinos Sagonas and Kjell Winblad</em></li>
</ul>

<p>It might also be interesting to look at the <a href="http://uu.diva-portal.org/smash/record.jsf?pid=diva2%3A1220366&amp;dswid=6575">author’s Ph.D. thesis</a> if you want to get more links to related work or want to know more about the motivation for concurrent data structures that adapt to contention.</p>

<h2 id="conclusion">Conclusion</h2>

<p>The Erlang/OTP 22 release introduced a new ETS <code class="language-plaintext highlighter-rouge">ordered_set</code>
implementation that is active when the <code class="language-plaintext highlighter-rouge">write_concurrency</code> option is
turned on. This data structure (a contention adapting search tree) has
superior scalability to the old data structure in many different
scenarios and a design that gives it excellent performance in a variety
of scenarios that benefit from different synchronization
granularities.</p>]]></content><author><name></name></author><category term="ETS" /><category term="ordered_set" /><category term="scalability" /><summary type="html"><![CDATA[Introduction]]></summary></entry></feed>