CppCMS Blog http://blog.cppcms.com/ A blog on CppCMS - C++ Web Development Framework
Web Server Upgrade http://blog.cppcms.com/post/128
<div style="direction:ltr">
<p><a href="http://cppcms.com">http://cppcms.com</a> and <a href="http://blog.cppcms.com">http://blog.cppcms.com</a> were migrated to a new EC2 Ubuntu 20.04 server. If you notice any issues, please report them.</p>
<p>Known Issues: there are problems with the wikipp table of contents. I'll handle it later. <strong>Fixed</strong></p>
</div>
CppCMS 2.0.0.beta1 Released http://blog.cppcms.com/post/127
<div style="direction:ltr">
<p>CppCMS moves to C++11 and cleans up code that worked around missing standard C++ library features.</p>
<p>The full list of changes can be read here: <a href="http://cppcms.com/wikipp/en/page/cppcms_2_0_whats_new">http://cppcms.com/wikipp/en/page/cppcms_2_0_whats_new</a></p>
<p>This is a beta release, so more backward-incompatible modifications may be introduced.</p>
</div>
Next CppCMS Major Feature Poll http://blog.cppcms.com/post/126
<div style="direction:ltr">
<p>I'm looking into implementing the next major feature that I can complete with reasonable effort. I have the following list of topics:</p>
<ol>
<li>HTTP/1.1 support - it would allow keep-alive support and better overall performance for multiple requests.
It would also allow a future implementation of web sockets.</li>
<li>Multiple-Event-Loop support - today there is only 1 event loop for the service, and it can be a bottleneck for small requests on systems with a high core count.</li>
<li>SSL support - for embedded systems that need a proper web server.</li>
<li>C++11 cleanup - replace all Booster primitives that exist in C++11 with their standard equivalents (<code>shared_ptr</code>, <code>thread</code>, <code>mutex</code>, etc.), provide move constructors for many objects, replace <code>auto_ptr</code> with <code>unique_ptr</code>.</li>
</ol>
<p>Feel free to suggest any other ideas that you think would be beneficial.</p>
<p>Comment below or send an e-mail to the cppcms-users mailing list.</p>
</div>
Modern AI and Deep Learning on ZX Spectrum http://blog.cppcms.com/post/125
<div style="direction:ltr">
<p><a href="https://en.wikipedia.org/wiki/ZX_Spectrum">ZX Spectrum</a> was my first computer. My brother and I got one when we were kids. I learned programming on it. I learned to write both BASIC and machine code on it. To this day, I know <a href="https://en.wikipedia.org/wiki/Zilog_Z80">Z80</a>'s assembly and machine code better than that of any other processor. I learned to do some system programming, working with interrupts and some graphics, using this simple but genius machine.</p>
<p>Back then I studied at a school with a strong emphasis on math and physics. I used to write some simple and not so simple simulations using this amazing machine. Even my brother wrote some computational tasks on it during his physics degree at the university he attended.</p>
<p>Today I use much more powerful hardware, and I do lots of work in the field of <a href="https://en.wikipedia.org/wiki/Deep_learning">Deep Learning</a> professionally.
I run computations on powerful GPUs that consume huge amounts of fast memory and whose computational power is measured in teraFLOPS.</p>
<p>Recently I stumbled upon an interesting YouTube channel, <a href="https://www.youtube.com/channel/UC8uT9cgJorJPWu7ITLGo9Ww">the 8-bit guy</a>, that talks a lot about "retro" hardware. It reminded me of my first "computing love", that small 8-bit machine I used to study and play with a lot. So I installed the <a href="http://fuse-emulator.sourceforge.net/">emulator</a> and started playing with it.</p>
<p>Then a crazy thought came to my mind: can some of the state-of-the-art AI techniques that require enormous computational power be done on this simple 8-bit machine with 48KB of memory and a 3.5MHz CPU? What was the simplest project I could start with?</p>
<p>There is a "Hello World" for AI: it is the <a href="https://en.wikipedia.org/wiki/MNIST_database">MNIST</a> challenge - handwritten digit recognition.</p>
<h2>Challenge and Network</h2>
<p>The MNIST training set consists of 60,000 images of handwritten digits, each 28x28 pixels in size with 8-bit depth.
It was clear that it wouldn't be possible to do the task as-is.</p>
<p>First of all I made the images smaller - this helps with both data memory and network size:</p>
<ul>
<li>Remove the 2-pixel padding that MNIST adds, giving 24x24</li>
<li>Rescale by 1:3 - so we get an 8x8 image for each digit</li>
<li>Keep only 1 bit per pixel (it will help us later with computations)</li>
<li>Use our 6KB of video memory as the training data buffer.</li>
</ul>
<p>This way the total training data set consists of 640 samples. The test data set is loaded from tape after training is completed and contains the same amount of data.</p>
<p>And this is what the training data set looks like:</p>
<p><img src="https://user-images.githubusercontent.com/14816918/71544586-11423a00-298a-11ea-8b78-aa74138957cf.png" alt="mnist" /></p>
<p>Next I designed the simplest network that could fit the bill.</p>
<p><strong>Getting Machine Learning Technical Alert {</strong></p>
<p>The simplest network to implement was a <a href="https://en.wikipedia.org/wiki/Multilayer_perceptron">Multilayer perceptron</a>. However, in order to get reasonable accuracy I needed enough hidden layers that the network became quite big for the RAM.</p>
<p>So I finally decided to implement a more complex convolutional network like this:</p>
<ul>
<li>Convolution layer with N kernels of 3x3 with bias, input 1x8x8, output Nx6x6</li>
<li>MAX Pooling 2x2: input Nx6x6, output Nx3x3</li>
<li>ReLU</li>
<li>Fully connected layer with bias giving M classes output</li>
<li>Euclidean Loss - since it is simpler and faster to implement than the standard SoftMax and logistic loss.</li>
</ul>
<p>So for 10 classes I used N=12 kernels with 1210 trainable parameters.</p>
<p><strong>} End of Machine Learning Technical</strong></p>
<h2>Language</h2>
<p>I decided to check first whether BASIC was feasible for the task. I tested a simple matrix (72 by 10) by vector (72) multiplication. It took around 20 seconds.
It clearly wasn't the way to go.</p>
<p>So I looked elsewhere and discovered <a href="http://www.z88dk.org">z88dk</a>, a C compiler for Z80 and ZX Spectrum that comes with a decent set of tools. To my surprise the project was up-to-date and provided a very good C compiler. The same test took around 1 second using the C compiler, so I started with it.</p>
<p>I implemented the first version and got around 2 hours of training per epoch for the 10-digit samples, with an accuracy of around 77%. That is not high, but considering the small sample set and low resolution it was OK.</p>
<p>But did we need full floating point computations to do all the calculations? It is known that deep learning can be done using half float (16 bit) operations, and that inference (the prediction-only part) works with 8 bit integer operations as well.</p>
<p>So I rewrote the code to use fixed point computations, with 1 bit for the sign, 3 bits for the integer part and 12 bits for the fractional part.</p>
<p>For example, 1.5 is represented as <code>1.5*4096 = 6144</code>, and 1.25 is represented as <code>1.25*4096 = 5120</code>. In order to compute <code>1.5*1.25</code> we calculate in integer numbers <code>(6144 * 5120) / 4096 = 7680</code>, the equivalent of 1.875 in the fixed point representation. Since the <code>/ 4096</code> operation is basically a shift operation on integers, it is a highly efficient method of handling real numbers in an integer representation.</p>
<p>However, there is a catch: on 8-bit systems the typical integer size is 16 bit. So when you calculate <code>6144 * 5120</code> you get <code>31457280</code>, which does not fit in 16 bits. So if you write: <code>int xy = x*y&gt;&gt;12;</code> you will not get the result you are expecting.
So the correct solution is to cast the numbers to a 32 bit variable: <code>int xy=(long)x*y&gt;&gt;12;</code> However, then the already costly multiplication becomes a much heavier 32 bit operation.</p>
<p>So I found a <a href="http://map.grauw.nl/sources/external/z80bits.html">sample assembly code pattern</a> for multiplying two 16 bit registers and getting a 32 bit result. I adapted it for the signed case and added the shift and rounding. It boosted the performance even more, and I finally managed to perform the training using fixed point computing.</p>
<p>The additional memory freed by using 16 bit computations later allowed me to increase the number of kernels to 20 and the number of parameters to 2010.</p>
<p><img src="https://user-images.githubusercontent.com/14816918/71548763-9db92080-29bb-11ea-82a9-34cd1a510c26.png" alt="mnist2" /></p>
<h2>Back to BASICs</h2>
<p>However, back then I didn't have a good C compiler that could produce optimized code with floating point support. So was BASIC really that unfeasible?</p>
<p>I decided to simplify the problem: train on 2 digits, 0 and 1, instead of all 10, reduce the number of kernels to 4, and see what could be done in terms of performance. Another benefit was much higher accuracy: since it is much simpler to distinguish between "0" and "1", I almost always got 99% accuracy.</p>
<p>Since I already had debugged C code, it was quite easy to rewrite it in BASIC. I found a great program called <a href="https://github.com/andybalaam/bas2tap.git">bas2tap</a> that significantly simplified typing my code: it creates tape files directly from text source files outside the ZX Spectrum, saving me lots of trouble typing the code without an original keyboard. Sinclair BASIC had a unique feature: a single stroke on a key entered an entire keyword; for example, pressing P inserted the PRINT keyword into the code.
This method both increased typing speed and reduced the code's memory footprint, but on the other hand typing the code without the original keyboard with all its keyword marks was quite hard.</p>
<p>So I rewrote the code in BASIC and ran the 2-class training. It took around 10 hours for a single epoch. I added some simple profiling and managed to cut the time in half, to 5 hours per epoch. It was painful to train even with the emulator, which can increase the emulation speed by 100 times.</p>
<p>So I felt that BASIC was rather unfeasible, and since back then I didn't have such a good C compiler, the entire concept of the project felt a little bit over-optimistic.</p>
<p>Then I discovered this BASIC compiler: <a href="https://en.wikipedia.org/wiki/ToBoS-FP">ToBoS-FP</a>. One of its key advantages was a highly efficient implementation of floating point routines.</p>
<p>So I compiled the code with it and it worked very fast! However, it lacked documentation, and I only managed to find some Russian sources regarding compilation of big programs. Another problem was the lack of the "LOAD" function that I relied on to access the test data. But I found a workaround by simply calling the proper ROM routines, and the problem was solved.</p>
<h2>Performance</h2>
<p>So how was the performance?</p>
<p>Two digits training. Note: train time is per epoch, total is for 2 training epochs and 1 test run.</p>
<pre><code>        BASIC    BASIC    C/float  C/fixed
        -        ToBoSFP  z88dk    z88dk+asm
Train:  5h20m    12m      3.7m     1.5m
Test:   2h19m    6m       1.2m     0.5m
Total:  12h59m   30m      8.6m     3.5m
</code></pre>
<p>10 digits training for 5 epochs:</p>
<pre><code>        BASIC    C/float  C/fixed
        ToBoSFP  z88dk    z88dk+asm
Train:  3h15m    2h16m    41.4m
Test:   1h21m    41m      13.6m
Total:  17h36m   12h01m   3h40m
</code></pre>
<p>So I was really surprised that the BASIC compiler gave such good results and that the training times were quite feasible.</p>
<h2>Summary</h2>
<p>I had a lot of fun doing this project. It reminded me how well the ZX Spectrum was designed.
It was an excellent educational tool.</p>
<p>Now it is probably time to find some real hardware and try it.</p>
<p>The full code is posted on GitHub:</p>
<p><a href="https://github.com/artyom-beilis/zx_spectrum_deep_learning/">https://github.com/artyom-beilis/zx_spectrum_deep_learning/</a></p>
</div>
CppCMS 1.2.1 - security update was released today http://blog.cppcms.com/post/123
<div style="direction:ltr">
<p>Security Bug Fixes:</p>
<ul>
<li>Fixed a security bug in the JSON parser module that could lead to DoS</li>
</ul>
<p>Bugs Fixed:</p>
<ul>
<li>Fixed issue #36 - building with GZIP disabled</li>
<li>Fixed issue #150 - incorrect parsing of multipart form</li>
</ul>
<p>Changes:</p>
<ul>
<li>By default CppCMS now uses OpenSSL instead of GNU-TLS if both are available (you can change the behavior back by adding <code>-DDISABLE_OPENSSL=ON</code> to cmake)</li>
</ul>
<p><strong>Special Thanks to Khaled Yakdan from code-intelligence.de for reporting this security issue</strong></p>
</div>
Stable Version CppCMS 1.2.0 was released http://blog.cppcms.com/post/122
<div style="direction:ltr">
<p>CppCMS 1.2.0 was released.</p>
<p>It is the same as 1.1.1 except that the license was changed from LGPL to MIT.</p>
</div>
CppCMS moved to MIT License http://blog.cppcms.com/post/121
<div style="direction:ltr">
<p>Today I decided that CppCMS 1.2 will be licensed under the MIT license instead of LGPLv3.</p>
<p>These are the goals of this move:</p>
<ol>
<li>Increase the CppCMS market share</li>
<li>Bring more developers to the CppCMS project itself</li>
</ol>
<p>Regards to all CppCMS users.</p>
<p>Artyom</p>
</div>
CppCMS 1.1.1 Release Candidate 1 is Available http://blog.cppcms.com/post/120
<div style="direction:ltr">
<p>The new version includes the following changes:</p>
<ul>
<li><p>Nightly build system updated to modern OSes/compilers:</p>
<ol>
<li>Windows XP -> 
Windows 7</li>
<li>MSVC 2008 x86 -> MSVC 2017 x86/x64</li>
<li>MinGW GCC 4.5 x86 -> 7.1 x86/x64</li>
<li>OpenSolaris 2009 -> Solaris 11</li>
<li>FreeBSD 10 -> FreeBSD 11.1</li>
<li>Added travis.yml for Mac OS X builds</li>
</ol>
</li>
<li><p>Improved HTTP timeout handling on non-Linux/Windows OSes.</p></li>
<li>Fixed incorrect asynchronous IO handling in <code>*cgi</code> API.</li>
<li>Added support of <code>SOL_SNDBUF/SOL_RCVBUF</code> to service configuration</li>
<li>Fixed HTTP timeout handling on Solaris</li>
<li>Fixed #24 failure to send large blocks asynchronously over FastCGI</li>
<li>Fixed issue #21 Program produces 100% CPU load on one core - due to incorrect EOF handling</li>
<li>Fixed icu backend test for ICU >= 60.1</li>
<li>Fixed missing <code>getenv(std::string const &amp;)</code> issue #16</li>
<li>Fixed issues with codecvt generation on FreeBSD/clang</li>
<li>Use Windows Vista/7 API by default since XP reached EOL.</li>
<li>Fixed incorrect async connect error handling</li>
<li>Aligned with Boost.Locale 1.65</li>
<li>Updated session interface for external languages and unit tests</li>
</ul>
</div>
CppCMS 1.1.0 Beta was released http://blog.cppcms.com/post/119
<div style="direction:ltr">
<p>Now that the goals for 1.2 are complete, I am announcing the official CppCMS 1.1.0 beta (the stable release will be 1.2.0).</p>
<p>It is available in the usual place:</p>
<p><a href="https://sourceforge.net/projects/cppcms/files/cppcms/1.1.0-beta/cppcms-1.1.0.tar.bz2">https://sourceforge.net/projects/cppcms/files/cppcms/1.1.0-beta/cppcms-1.1.0.tar.bz2</a></p>
<p>It includes many new and important features:</p>
<p><a href="http://cppcms.com/wikipp/en/page/cppcms_1_2_whats_new">http://cppcms.com/wikipp/en/page/cppcms_1_2_whats_new</a></p>
<p>Now I ask the community to fully participate in beta testing so that 1.2 can be released ASAP.</p>
<h3>Goals for beta testing</h3>
<h4>Framework Unit Test:</h4>
<ol>
<li>Download the beta version, build, run tests</li>
<li>Report which platform you tested on: OS, compiler version, standard library (libstdc++/libc++)</li>
<li>If any tests failed, please attach Testing/Temporary/LastTest.log and CMakeCache.txt from your build directory</li>
</ol>
<p>I especially need tests on various versions of Mac OS X, various ARM platforms like Raspberry Pi, and Windows with different compilers.</p>
<h4>Compatibility Test:</h4>
<ol>
<li>Try to build your existing applications with the latest version and report any problems</li>
<li>If you have been using CppCMS 1.0.5 until now, please try to build CppCMS 1.1.0 and run your existing programs with the new shared objects/dll WITHOUT rebuilding your applications - it must work as is!</li>
</ol>
<h4>Feature Test:</h4>
<p>Go to: <a href="http://cppcms.com/wikipp/en/page/cppcms_1_2_whats_new">http://cppcms.com/wikipp/en/page/cppcms_1_2_whats_new</a></p>
<p>Try some of the new features and report any issues with them or any problems with the API design.</p>
<p>If all goes smoothly I'll release 1.2.0 - the official stable version.</p>
</div>
CppCMS code migrated to GitHub http://blog.cppcms.com/post/118
<div style="direction:ltr">
<p>After multiple requests, and my final decision, the CppCMS web framework code has migrated to GitHub:</p>
<p><a href="https://github.com/artyom-beilis/cppcms">https://github.com/artyom-beilis/cppcms</a></p>
<p>Please note:</p>
<ol>
<li>Only the CppCMS framework has migrated; other subprojects like CppDB or Wikipp are still pending conversion</li>
<li>The main bug tracker is still on SourceForge - however, I'll respond to issues opened on GitHub</li>
</ol>
</div>