{"id":842,"date":"2021-04-07T07:53:26","date_gmt":"2021-04-07T02:23:26","guid":{"rendered":"https:\/\/cenacle.io\/?p=630"},"modified":"2021-04-07T07:53:26","modified_gmt":"2021-04-07T02:23:26","slug":"thread-ring-benchmark-c-implementation-with-piped-threads","status":"publish","type":"post","link":"https:\/\/gk.palem.in\/articles\/thread-ring-benchmark-c-implementation-with-piped-threads\/","title":{"rendered":"Thread-Ring benchmark C++ implementation with Piped threads"},"content":{"rendered":"\n<p>The thread-ring is a standard benchmark challenge to validate the performance of <em>sequential<\/em> communication among parallel threads.<\/p>\n\n\n\n<p>The challenge mandates that, each program should<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>create 503 linked pre-emptive threads (named 1 to 503)<\/li><li>thread 503 should be linked to thread 1, forming an unbroken ring<\/li><li>pass a token to thread 1<\/li><li>pass the token from thread to thread N times<\/li><li>print the name of the last thread (1 to 503) to take the token<\/li><\/ul>\n\n\n\n<p>Some of the benchmark results can be seen here: <a href=\"http:\/\/benchmarksgame.alioth.debian.org\/u32\/performance.php?test=threadring\" target=\"_blank\" rel=\"noreferrer noopener\">Thread-ring benchmark results<\/a>.<\/p>\n\n\n\n<p>While there are many elegant solutions with C++ on achieving the best results for this benchmark, in this post we will try to demonstrate how to solve this in a basic way using <em>native pipes<\/em> as a communication channel among threads. Note that, many faster implementations that you see on the benchmark results page do not really create 503 real OS threads (including the Haskel and C++ Boost Asio implementation) &#8211; rather what they do is, create a handful of worker threads (e.g. thread pool of 4 or 8 OS threads) and keep reusing them, which works fine in this case, because only one thread is active at any one point of time.<\/p>\n\n\n\n<p>In our code below, we rather stick to the basic guideline of the challenge and create 503 real OS threads and make them communicate with pipes. Each thread holds two pipe handles (one for reading data from and one to write data to). The pipes are connected in such a way that the <span class=\"has-inline-color has-vivid-red-color\">thread[i]<\/span>&#8216;s written data goes to the <span class=\"has-inline-color has-vivid-red-color\">thread[i+1]<\/span>&#8216;s read end (looped over at the end to close the cycle to the beginning).<\/p>\n\n\n\n<p>When all the pipe connections are made, we inject the token on the last thread&#8217;s write pipe handle (which is the same as the first thread&#8217;s read pipe end) and start the process. Here is the code, run it on your system, and see the timings. Post your results here if you like what you see.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\n\/\/\/ (c) 2014. Cenacle Research\n\/\/\/ Basic version of ThreadRing benchmark implementation with Pipes between threads\n\/\/\/\n\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\n\n#define _GNU_SOURCE\n#include &lt;pthread.h&gt;\n#include &lt;sched.h&gt;\n#include &lt;stdio.h&gt;\n#include &lt;stdlib.h&gt;\n#include &lt;string.h&gt;\n#include &lt;unistd.h&gt;\n#include &lt;inttypes.h&gt;\n#include &lt;time.h&gt;\n#include &lt;fcntl.h&gt;\n\n#if defined(__MINGW_H)\n#define pipe(fds) _pipe(fds, 32, _O_BINARY)\n#endif\n\n#define NUMBER_OF_THREADS 503\n\npthread_mutex_t cv_mutex = PTHREAD_MUTEX_INITIALIZER;\npthread_cond_t cv_main = PTHREAD_COND_INITIALIZER;\npthread_cond_t *cvs = NULL;\nuint32_t token = 0;\nuint32_t token_count = 1000;\nuint32_t threads_started = 0;\nuint32_t number_of_cpus = 0;\n\n\nstruct _tio\n{\n        int readFd;\n        int writeFd;\n        short thread_id;\n};\n\n\/* POSIX Thread requires this signature*\/\nvoid* threadEntryProc(void* pArg)\n{\n        _tio* tio = (_tio*)pArg;\n        int tokenMax = token_count + 1;\n        int token;\n        do\n        {\n                if (read(tio-&gt;readFd, &amp;token, sizeof(token)) != sizeof(token))\n                        perror(\"read failure inside thread\");\n\n                if (write(tio-&gt;writeFd, &amp;++token, sizeof(token)) != sizeof(token))\n                        perror(\"write failure inside thread\");\n\n                if (token == tokenMax)\n                {\n                        printf(\"%d\\n\", tio-&gt;thread_id+1); break;\n                }\n\n                if (token &gt; tokenMax) break;\n\n        } while (1);\n\n        pthread_exit(NULL);\n}\n\nvoid ThreadRing()\n{\n        _tio tio&#91;NUMBER_OF_THREADS];\n        int fd&#91;NUMBER_OF_THREADS]&#91;2];\n\n        int nLastThread = NUMBER_OF_THREADS - 1;\n        int nTokenStart = 0;\n\n        \/* create up the pipes *\/\n        {\n                for (int i = 0; i &lt; NUMBER_OF_THREADS; ++i)\n                {\n                        if (pipe(fd&#91;i]))\n                                exit(fprintf(stderr, \"\\nUnable to create Pipe for %d\\n\", i));\n                }\n        }\n        \/* setup the pipe ends *\/\n        {\n                for (int i = 0; i &lt; NUMBER_OF_THREADS-1; ++i)\n                {\n                        tio&#91;i].readFd = fd&#91;i]&#91;0];\n                        tio&#91;i].writeFd = fd&#91;i + 1]&#91;1];\n                        tio&#91;i].thread_id = i;\n                }\n                {\n                        tio&#91;nLastThread].readFd = fd&#91;nLastThread]&#91;0];\n                        tio&#91;nLastThread].writeFd = fd&#91;0]&#91;1];\n                        tio&#91;nLastThread].thread_id = nLastThread;\n                }\n        }\n\n        pthread_t threadObj&#91;NUMBER_OF_THREADS];\n        \/* Create the threads *\/\n        {\n                for (int i = 0; i &lt; NUMBER_OF_THREADS; ++i)\n                if (pthread_create(&amp;threadObj&#91;i], NULL, threadEntryProc, &amp;tio&#91;i]))\n                        exit(fprintf(stderr, \"\\nUnable to create thread %d\\n\", i));\n        }\n\n        \/* we send on last thread's writing fd, because thats where the first thread is listening*\/\n        if (write(tio&#91;nLastThread].writeFd, &amp;nTokenStart, sizeof(nTokenStart)) != sizeof(nTokenStart))\n                perror(\"token send initiation failure\");\n\n        \/* wait for all threads to finish*\/\n        {\n                for (int i = 0; i &lt; NUMBER_OF_THREADS; ++i)\n                if (pthread_join(threadObj&#91;i], NULL))\n                        perror(\"error while waiting for threads to finish\");\n        }\n\n        return;\n}\n\nint freeRun()\n{\n        int token = 0;\n        for (int i = 0; i &lt; token_count; ++i)\n                ++token;\n        return token;\n}\n\nint main(int argc, char **argv)\n{\n        clock_t start, end;\n\n        if (argc &gt; 1)\n                token_count = strtol(argv&#91;1], NULL, 0);\n\n        start = clock();\n        argc = freeRun(); \/* just storing it some var so that gcc wont optimize it out *\/\n        end = clock();\n        printf(\"\\nFreeRun: time start=%ld, end=%ld, spent: %f\\n\", start, end, (double)(end - start) \/ CLOCKS_PER_SEC);\n\n        start = clock();\n        ThreadRing();\n        end = clock();\n        printf(\"\\nThreadRing: time start=%ld, end=%ld, spent: %f\\n\", start, end, (double)(end - start) \/ CLOCKS_PER_SEC);\n}\n\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The thread-ring is a standard benchmark challenge to validate the performance of sequential communication among parallel threads.<\/p>\n<p>Here is the code, run it on your system, and see the timings. Post your results here if you like what you see.<\/p>\n","protected":false},"author":1,"featured_media":847,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_cloudinary_featured_overwrite":false,"fifu_image_url":"https:\/\/res.cloudinary.com\/cenacle-cdn\/image\/upload\/v1468892114\/pandas-datascience_tqkjxq.png","fifu_image_alt":"Thread-Ring benchmark C++ implementation with Piped threads","footnotes":""},"categories":[69],"tags":[79,80],"class_list":["post-842","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-benchmark","tag-c"],"jetpack_featured_media_url":"https:\/\/res.cloudinary.com\/cenacle-cdn\/images\/v1715779378\/gk.palem.in\/wp_assets\/pandas-datascience_tqkjxq\/pandas-datascience_tqkjxq.png?_i=AA","jetpack-related-posts":[],"jetpack_shortlink":"https:\/\/wp.me\/pfLaRd-dA","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/posts\/842","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/comments?post=842"}],"version-history":[{"count":1,"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/posts\/842\/revisions"}],"predecessor-version":[{"id":889,"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/posts\/842\/revisions\/889"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/media\/847"}],"wp:attachment":[{"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/media?parent=842"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/categories?post=842"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gk.palem.in\/articles\/wp-json\/wp\/v2\/tags?post=842"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}