OpenSSL better than Intel IPP

Recently I needed to generate a lot of RSA keys. As a true scientist in the making, I decided to do my research and investigate the prior work. The simple solution was to use OpenSSL and its built-in RSA_generate_key / RSA_generate_key_ex functions. However, I needed the generation of RSA keys to be as fast as possible.

As my use case imposed no dependencies between the generated keys, I classified my problem as embarrassingly parallel, and I even considered GPUs as a possible alternative. The Fair Comparison between CPU and GPU, at the time of writing, suggests that a hexa-core CPU could match the performance of a GTX580. Since I have at my disposal an Intel(R) Xeon(R) CPU E3-1285L v3 @ 3.10GHz and an NVIDIA GT640, I decided to go with a CPU solution. The same article (also at the time of writing) suggests that the Intel IPP implementation would beat everything else by a huge margin. Therefore I started the first implementation using the Intel IPP Cryptography Libraries.

Intel IPP is quite modular, and generating an RSA private and public key pair is not as simple as with OpenSSL, requiring a deep dive into the documentation provided by Intel. After finding my way through initializing a pseudo-random number generator and a prime number generator, and calculating the bit sizes of the P and Q factors, I finally got to the initial implementation.

While in OpenSSL, RSA keys can be generated using:

RSA_generate_key(bitsRSA, publicExponent, NULL, NULL);

.. the corresponding Intel IPP version using ippsRSA_GenerateKeys goes like this:

Note that Intel IPP does not provide conversion of the private and public keys into PEM format. Anyone who wants that must rely on external software, such as an ASN.1 compiler, to produce the DER encoding and from it the PEM file. Therefore, the version of RSA key generation above is incomplete at best.
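To illustrate the last step of that conversion, here is a minimal Python sketch of how DER bytes are wrapped into PEM text: base64-encode them, break the result into 64-character lines, and add the BEGIN/END markers. The function name der_to_pem and the default label are my own choices for illustration. The sketch assumes the DER encoding of the key already exists; producing that ASN.1 structure from the IPP key material is the hard part and is not shown.

```python
import base64


def der_to_pem(der_bytes: bytes, label: str = "RSA PRIVATE KEY") -> str:
    """Wrap already DER-encoded bytes into PEM text (hypothetical helper)."""
    b64 = base64.b64encode(der_bytes).decode("ascii")
    # PEM bodies are conventionally broken into lines of 64 characters.
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    body = "\n".join(lines)
    return f"-----BEGIN {label}-----\n{body}\n-----END {label}-----\n"


if __name__ == "__main__":
    # Placeholder bytes, NOT a real key: only the wrapping is demonstrated.
    print(der_to_pem(bytes(range(100))))
```

The output has the same shape as the files OpenSSL writes, but it is only a valid key file if the input bytes are a valid DER-encoded key.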

Once we have a working version of RSA generation, it is time for the reckoning. I was interested to see how much faster Intel IPP would be than OpenSSL (assuming that IPP wins). To be fair, I decided to measure the performance in cycles in a very fine-grained fashion, timing only ippsRSA_GenerateKeys and only RSA_generate_key, and skipping any form of initialization.

A short description of the methodology goes as follows:

  • Compile the code using icpc composer_xe_2015.0.077 on OS X Yosemite 10.10.5, running Intel(R) Core(TM) i7-3720Q CPU @ 2.60 GHz (Ivy Bridge). OpenSSL version is 1.0.2d (9 Jul 2015).
  • Compile the code using icpc composer_xe_2015.0.090 on Debian 8.3, running Intel(R) Xeon(R) CPU E3-1285L v3 @ 3.10GHz. OpenSSL version is 1.1.0-pre4 (beta) 16 Mar 2016
  • Compile the code using -O3 -xHost -no-multibyte-chars, and consider the single-core implementation only.
  • Use RDTSC as a timing infrastructure
  • Run each of the RSA generation functions 10 times, and average the runtimes.
  • Repeat each 10 runs for 10 times, and take the median runtime as the final measure.
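The aggregation in the last two steps (average 10 runs, repeat that 10 times, take the median of the averages) can be sketched in Python as follows. This is only an illustration of the statistics, not the code used for the measurements: the function name median_of_means is my own, and time.perf_counter_ns stands in for RDTSC, so it reports nanoseconds rather than cycles.

```python
import statistics
import time


def median_of_means(fn, inner=10, outer=10, clock=time.perf_counter_ns):
    """Time fn() inner times and average; repeat outer times; return the median."""
    means = []
    for _ in range(outer):
        samples = []
        for _ in range(inner):
            start = clock()
            fn()
            samples.append(clock() - start)
        # Average of one batch of runs.
        means.append(statistics.mean(samples))
    # The median over the batches is less sensitive to outliers than a mean.
    return statistics.median(means)


if __name__ == "__main__":
    # A stand-in workload; the real experiment timed the key generation calls.
    print(median_of_means(lambda: sum(range(10_000))))
```

Taking the median over the batches keeps a single disturbed batch (an interrupt, a context switch) from skewing the final number.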

The obtained results (less is better) are given as follows (y-axis is logarithmic):

RSA Key Generation on Haswell

RSA Keys Generation on IvyBridge

It seems that Intel IPP is better only when generating keys shorter than 512 bits, while OpenSSL outperforms Intel IPP in almost all cases above 512 bits, generating the keys in fewer CPU cycles. Considering that RSA keys of less than 1024 bits are considered insecure, these results, to my surprise, render the Intel IPP implementation of RSA key pair generation obsolete.

To be fair, while I use the latest version of OpenSSL, this comparison does not reflect the latest version of Intel IPP. Note also that I did not investigate the internal differences between the RSA key generation functions in OpenSSL and Intel IPP. My observation is made strictly from the point of view of an Intel IPP and OpenSSL user, where the only requirement is key pair generation.

For full transparency of this short experiment, the code for timing the two functions is available here, and the raw data (uses R to generate the plots) is available here.



Abstracting Vector Architectures in Library Generators: Case Study Convolution Filters

This post is long overdue. The slides below are part of my talk given at the ARRAY 2014 workshop, co-located with PLDI 2014 in Edinburgh, UK.


The full paper for this project can be found here, and the abstract is attached below:

We present FGen, a program generator for high performance convolution operations (finite-impulse-response filters). The generator uses an internal mathematical DSL to enable structural optimization at a high level of abstraction. We use FGen as a testbed to demonstrate how to provide modular and extensible support for modern SIMD vector architectures in a DSL-based generator. Specifically, we show how to combine staging and generic programming with type classes to abstract over both the data type (real or complex) and the target architecture (e.g., SSE or AVX) when mapping DSL expressions to C code with explicit vector intrinsics. Benchmarks show that the generated code is highly competitive with commercial libraries.

An Epic Ride: Lausanne – Zurich

In Switzerland it’s almost impossible to stay away from sports. The good news is that I love biking, which makes it a double win. And I just broke my suffer score record on Strava. Additional extras: the prize for winning the bet was a crate of Erdinger. Thanks to Darko Makreshanski for being awesome on the bike all the way to Zurich.


The trip consisted of heavy rain from Lausanne to Neuchatel, and burning sun for the rest of the trip. Although I cursed in several languages because I was completely soaked, I soon understood that the rain was like a God-given natural air-conditioning system that cools off your body. After 200km and almost 12h on the bike, none of that matters anymore, except for the agonising pain caused by the bike seat.


Nevertheless, there are no words to explain the joy of touching Zurich canton and knowing that you have already made it.


And finally, thanks to all Italians from Sicily, for inventing the pasta. Eating it, while riding, feels like adding nitro in the car in Fast & Furious. Well maybe I was not fast, but I was furious for sure.

Ajenti + Nginx + Phalcon

Recently I had to deploy Phalcon code on an Ajenti-based VPS. It seems that Google does not know of any guide for this type of configuration, so I decided to write a few lines on setting it up with Ajenti. This short tutorial was tested on:

  • Ubuntu 14.04.1 LTS
  • Nginx 1.4.6
  • Ajenti
  • Phalcon 1.3.0

Step 1: Make sure dependencies are met.

Use Ajenti to open Terminal, and install Phalcon from the repository:

sudo apt-add-repository ppa:phalcon/stable
sudo apt-get update 
sudo apt-get install php5-phalcon

If not using Ubuntu, follow the tutorial here to compile and install Phalcon.

Step 2: Make sure Phalcon is present in the php.ini files. Running

cat /etc/php5/mods-available/phalcon.ini

should output the following:

extension =

Step 3: Setup Nginx to use Phalcon

Make sure that you point to the public folder of the Phalcon app as the root of the website. Once the website is created, add the following in Advanced -> Custom configuration :

try_files $uri $uri/ @rewrite;
location @rewrite {
    rewrite ^/(.*)$ /index.php?_url=/$1;
}

And this should do it.

Intel PCM in userspace Linux

Recently I have been using the Intel® Performance Counter Monitor tool quite heavily to measure code performance. The tool has been designed to work with the Nehalem, Westmere, Sandy Bridge and Ivy Bridge micro-architectures as well as with Atom(R), and I have to admit, it does its job flawlessly. However, one annoying drawback of using this tool on Linux is that it must be run as root to get performance results. I decided to fix this problem and make it possible to obtain all the results in user space.

The problem

In order for Intel PCM to obtain results directly from the CPU, it must be able to issue ioctl requests to the processor. This is done through the Linux kernel, by reading / writing to


The Linux kernel comes with an MSR kernel module, responsible for querying the model-specific registers of the CPU, which are control registers in the x86 instruction set used for debugging, program execution tracing, computer performance monitoring, and toggling certain CPU features. Ideally, executing:

chmod go+rw /dev/cpu/*/msr

should allow non-root users to read from and write to the MSR devices. However, this is not the case, and after executing the command above the Intel PCM tool still reports the same error:

Try to execute 'modprobe msr' as root user and then
you also must have read and write permissions for /dev/cpu/*/msr devices (/dev/msr* for Android). The 'chown' command can help.
Access to Intel(r) Performance Counter Monitor has denied (no MSR or PCI CFG space access).

I inspected the code of the msr.c kernel module and figured out that the problem is in the static int msr_open(struct inode *inode, struct file *file) function:

static int msr_open(struct inode *inode, struct file *file)
{
	unsigned int cpu = iminor(file_inode(file));
	struct cpuinfo_x86 *c;

	/* Rejects every caller without CAP_SYS_RAWIO, whatever the file mode is */
	if (!capable(CAP_SYS_RAWIO))
		return -EPERM;

	if (cpu >= nr_cpu_ids || !cpu_online(cpu))
		return -ENXIO;	/* No such CPU */

	c = &cpu_data(cpu);
	if (!cpu_has(c, X86_FEATURE_MSR))
		return -EIO;	/* MSR not supported */

	return 0;
}

The call to capable(CAP_SYS_RAWIO) makes sure that the calling process holds the CAP_SYS_RAWIO capability, which in practice means root. Therefore, no matter what permissions are set on /dev/cpu/*/msr, this function will return -EPERM if the user is anyone other than root.
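The effect is easy to observe from user space. The following Python sketch (the helper name read_msr is mine, not part of Intel PCM) tries to open the MSR device of one CPU and read the 8-byte register at offset 0x10, which is the time-stamp counter MSR. With the stock module, a non-root user should get an error code back from open() even after the chmod above; on machines without the msr module loaded, the device node simply does not exist.

```python
import os
import struct


def read_msr(cpu: int, reg: int):
    """Return (value, 0) on success or (None, errno) on failure."""
    path = f"/dev/cpu/{cpu}/msr"
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # EPERM here is the -EPERM returned by msr_open in the kernel.
        return None, exc.errno
    try:
        # The register number is used as the file offset; each MSR is 8 bytes.
        raw = os.pread(fd, 8, reg)
        return struct.unpack("<Q", raw)[0], 0
    except OSError as exc:
        return None, exc.errno
    finally:
        os.close(fd)


if __name__ == "__main__":
    value, err = read_msr(0, 0x10)
    if err:
        print("cannot read MSR:", os.strerror(err))
    else:
        print("IA32_TIME_STAMP_COUNTER =", value)
```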

The Solution

The simplest solution is to comment out the capable check and avoid the check for the root user. This solution is not safe. Giving users other than root access to the MSRs can expose the system to a serious vulnerability. However, I am the only user with access to my machine, and I prefer productivity over safety.

In order to make the change, I need to recompile the kernel module and replace the old msr.ko module. Additionally, I can use udev to set the proper permissions on the MSR device nodes, so I don’t have to redo them every time the machine boots up. The good thing is that I can put everything into one simple Makefile:

obj-$(CONFIG_X86_MSR) += msr.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

install: all
	rm -rf /lib/modules/$(shell uname -r)/kernel/arch/x86/kernel/msr.ko
	cp msr.ko /lib/modules/$(shell uname -r)/kernel/arch/x86/kernel/msr.ko
	echo "KERNEL==\"msr[0-9]*\", NAME=\"cpu/%n/msr\", MODE=\"0666\"" > /etc/udev/rules.d/20-msr.rules

The module can be built and installed using (must be logged in as root, not sudo):

make install

And after executing modprobe msr, everything else should magically work in user-space.

OpenVPN to route all / selective traffic to a client

This post is inspired by my urge to watch Macedonian TV (for free). Macedonian television broadcasts are actually available on the Internet via the Web, but they are limited to Macedonian IP addresses only. Since this scenario, where TV broadcasts or other services are limited to the IP block of a particular country, applies to many other people living abroad, I decided to make my nightly research/experiment public.

The standard way to use services that operate in one or several countries only is to use proxy servers. However, in some cases there are no proxies available in the home country, the restrictions of the services one is planning to use cannot be bypassed with a simple proxy server, the proxy services are too slow to transfer broadcasts, or one just does not want to pay for a proxy service.

Stream limited to Macedonian IP addresses, accessed in Switzerland

As the picture above shows, I have already solved the problem by making my home computer (the one in Macedonia) an Internet gateway. This is an almost trivial task when the home computer has a public IP address, but in most cases it does not (as with my Macedonian connection). Assuming that one can get hold of a computer with a public IP address, a cheap (almost free) solution is routing via OpenVPN. What you need is the following:

  1. A computer with a public IP address (the server), located anywhere in the world. 
  2. A computer in the home country (the gateway), with a decent internet connection.
  3. One or several clients (the client), with an Internet connection

As I currently live in Switzerland, I am more than happy to use the services of the upc cablecom ISP, especially since they are so nice as to provide me with a public IP address. Therefore, I can simply set up an OpenVPN server in Switzerland, and use it as a tunnel to redirect traffic to the home computer. My current setup looks like this:

OpenVPN Setup

From the drawing above you can infer that the server is ch-server, the gateway is mk-gateway and finally the client is astojanov-mac. The gray lines and black labels represent the physical connections, and the red lines and red labels specify the OpenVPN network. 

  • is the public address in Switzerland, and the router behind it has port-forwarding to ch-server, port 1194 to the ch-server having local IP address
  • ch-server and astojanov-mac are behind the same router and are part of the network. Note that the client astojanov-mac can access the OpenVPN server from any network node on the Internet. Thus the route to access the ch-server goes through the Internet cloud.
  • mk-gateway is part of the local network in Macedonia and has no public IP address attached on the router.
  • The OpenVPN overlaid network is represented with The server has a static ip address:, as well as the gateway The client astojanov-mac as every other OpenVPN client are assigned dynamic ip address.

The first step is installing and setting up OpenVPN. A very modest computer can fulfill the needs of a VPN server if fewer than 10 VPN connections are anticipated. Personally I am using an Intel(R) Pentium(R) M processor at 1200MHz, with 1.6 GB of RAM. OpenVPN is cross-platform and has no OS requirements. ch-server runs Debian GNU/Linux 6.0.6 (squeeze) and I base my instructions on this distribution. Note that there are many alternatives with lower power consumption for running an OpenVPN server / client, for example the Raspberry Pi, which has a 700 MHz ARM CPU and costs about $25, or your router if you can set up DD-WRT on it.

Installing OpenVPN server and setting up server / client keys & certificates

Installing and setting up OpenVPN involves almost the same steps on every platform, and there are many tutorials available out there. My setup is based on the following tutorial, but you can also find additional tutorials on Linux, Windows and Mac OS X. First install OpenVPN and OpenSSL:

apt-get install openvpn openssl

Now we need to create server certificates. Before that we edit the variables of the certificates we are about to create.

cd /etc/openvpn
cp -r /usr/share/doc/openvpn/examples/easy-rsa/2.0 ./easy-rsa
vim easy-rsa/vars

Edit accordingly:

export KEY_COUNTRY="CH" 
export KEY_CITY="Zurich" 
export KEY_ORG="AlenBlog" 
export KEY_EMAIL="me@myhost.mydomain"

Save and quit. Load the variables, clean, and build the server key and certificate:

source ./easy-rsa/vars
./easy-rsa/clean-all
./easy-rsa/build-ca
./easy-rsa/build-key-server ch-server

As soon as the server ca.crt and ch-server.key are set, we can proceed to creating the client keys. Note that we will have one special client, namely the mk-gateway client. This step must be repeated for every client you want to allow on your VPN. Note that the clients are identified by the ‘Common Name’.

./easy-rsa/build-key mk-gateway
./easy-rsa/build-key astojanov-mac

Now let’s create the Diffie-Hellman parameters:

./easy-rsa/build-dh

After this is done, we are ready to setup the OpenVPN configuration file.

OpenVPN server configuration

Since the main goal is to watch Macedonian TV, the router will be configured such that I do not have to turn the VPN connection on / off whenever I want to watch the TV stream. However, it is also useful to be able to send the whole traffic through the mk-gateway. Therefore the client will decide which routes are redirected to the mk-gateway. In other words, OpenVPN will route complete or selective traffic to a client. The server configuration file is as simple as possible.

dev tun
proto udp
port 1194
ca /etc/openvpn/ca.crt
cert /etc/openvpn/ch-server.crt
key /etc/openvpn/ch-server.key
dh /etc/openvpn/dh1024.pem
user nobody
group nogroup
status /var/log/openvpn-status.log
ifconfig-pool-persist /etc/openvpn/ipp.txt
client-config-dir /etc/openvpn/ccd
verb 3

The server uses UDP as its transport protocol (although less reliable than TCP, it is considerably faster), running on port 1194. For in-depth knowledge of the rest of the directives, I strongly advise reading the OpenVPN manpage.

Note the client-config-dir directive. It provides the flexibility to add specific configurations to the clients. We configure the mk-gateway here.

mkdir -p /etc/openvpn/ccd
vim /etc/openvpn/ccd/mk-gateway

In order to make mk-gateway route any specific traffic, we use the iroute directive. Ideally we would like to route 0/1 to the client and set something like:


However, THIS DOES NOT WORK in OpenVPN. The good news is that instead of using one general route, we can set routes from to using netmask (thanks to fuzzie’s post for the insights), which will do exactly the same. We would also like a static IP for the mk-gateway:

. . . . . . 


By now, the configuration for the OpenVPN server is complete. The server can be started with:

/etc/init.d/openvpn start

Setting up the remote gateway (mk-gateway). Optional DD-WRT setup.

Initially I configured the gateway to run on a Windows 7 machine. Since this machine will be forwarding packets, the OS must be configured to enable forwarding. According to Bebop’s post, the following tweaks should do the job:

  • Start -> Right-click My Computer -> Manage -> Services. Right-click Routing and Remote Access -> Properties -> Automatic. Right-click Routing and Remote Access -> Start
  • Control Panel -> Network and Sharing Center -> Local Area Connection -> Properties -> Sharing. Tick the box “Allow other network users to connect through this computer’s Internet connection”. From the drop-down list select “Local Area Connection 2”, or whatever is the connection name of your TUN connection.
  • Run regedit. Navigate to HKEY_LOCAL_MACHINE and then to SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. Change value of IPEnableRouter to 1.

OpenVPN GUI for Windows is a decent OpenVPN client for Windows that, as its title says, includes a GUI. To set it up, download and install it, copy the files /etc/openvpn/ca.crt, /etc/openvpn/mk-gateway.crt and /etc/openvpn/mk-gateway.key into C:\Program Files\OpenVPN\config\ and finally create the config file config.ovpn:

dev tun
proto udp
remote ch-server 1194
resolv-retry infinite
ca ca.crt
cert mk-gateway.crt
key mk-gateway.key
verb 3
route-method exe
route-delay 2

The GUI client will enable / disable the tun device and set up the routes in the system. Therefore in Windows 7 / Vista it needs administrator permissions. Make sure that you right-click on the GUI executable in C:\Program Files\OpenVPN and check the ‘Run as administrator’ checkbox.

OpenVPN client running on DD-WRT 

Having a computer running 24/7 just for routing is not really desirable. With a decent router having OpenVPN support, one can bypass the need for an extra computer. I personally have WRT54GL in Macedonia. It is possible to install DD-WRT on the router and ‘unleash’ the OpenVPN client support in the router itself. Setting up DD-WRT in some cases is router specific, and its installation has been well documented on the DD-WRT wiki.

In order to set up DD-WRT on a router, one needs to flash over the current firmware and replace it with DD-WRT, which might be risky business. Note that the first step is flashing the router with the ‘mini firmware’ available at the DD-WRT website, and the next step is installing the firmware with OpenVPN support. When the final firmware is installed, setting up the OpenVPN client can be done via the Web interface in Services -> VPN. The files /etc/openvpn/ca.crt, /etc/openvpn/mk-gateway.crt and /etc/openvpn/mk-gateway.key can simply be copy-pasted into the corresponding fields:

DD-WRT Setup

Depending on the router CPU, you can enable or disable LZO compression. I have actually disabled it in my final configuration since the WRT54GL has a 200MHz CPU.

The above creates a connection to the OpenVPN server ch-server as soon as the router is rebooted. Before rebooting, packet forwarding within the router must be enabled. This can be done in Administration -> Commands, by setting the following commands for the Firewall.

iptables -I FORWARD 1 --source -j ACCEPT
iptables -I FORWARD -i br0 -o tun0 -j ACCEPT
iptables -I FORWARD -i tun0 -o br0 -j ACCEPT

After clicking Save Firewall, the commands will allow packets to flow between the ch-server and the mk-gateway.

Setting up the client such that the whole traffic is redirected to the remote gateway

The client astojanov-mac runs Mac OS X. I am using Tunnelblick as an OpenVPN client GUI. To set it up, the generated key and certificates (/etc/openvpn/ca.crt, /etc/openvpn/astojanov-mac.crt and /etc/openvpn/astojanov-mac.key) must be copied to the client, and a config.ovpn file must be created:

dev tun
proto udp
remote ch-server 1194
resolv-retry infinite
ca ca.crt
cert astojanov-mac.crt
key astojanov-mac.key
verb 3
redirect-gateway def1
dhcp-option DNS

Note the redirect-gateway def1 directive. This directive forces the client to change its default gateway and redirect it to the OpenVPN server. Since the mk-gateway takes all the routes from to, the whole traffic will be redirected to mk-gateway.

Setting up the client to route selective traffic via a remote gateway

For this scenario, I use most of the previous settings for redirecting the whole traffic and Tunnelblick, with a modified config.ovpn file. In order to perform selective routing, instead of redirecting the gateway, we need to write routing rules for the specific traffic that we are planning to redirect. I personally wanted a scenario where all web sites hosted in Macedonia are reached through the mk-gateway. To identify all Macedonian hosts, I used the NirSoft country based IP blocks. The blocks are given by their IP ranges and the number of hosts in each block. With some simple math, it is easy to derive the netmask for each block from its number of hosts.
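The simple math is that a block of 2^k addresses corresponds to a prefix length of 32 - k, so 256 hosts give /24, i.e. netmask 255.255.255.0. A small Python sketch of that conversion, using the standard ipaddress module, could look like this. The function name block_to_routes is my own, and the addresses in the example come from the reserved documentation ranges, not from the actual Macedonian blocks.

```python
import ipaddress


def block_to_routes(first_ip: str, host_count: int) -> list:
    """Turn a (start address, number of hosts) block into OpenVPN route lines."""
    first = ipaddress.IPv4Address(first_ip)
    last = first + (host_count - 1)
    # summarize_address_range splits the range into aligned CIDR networks,
    # which also covers blocks whose size is not a power of two.
    return [
        f"route {net.network_address} {net.netmask}"
        for net in ipaddress.summarize_address_range(first, last)
    ]


if __name__ == "__main__":
    # Hypothetical blocks for illustration only.
    for line in block_to_routes("192.0.2.0", 256):
        print(line)
    for line in block_to_routes("198.51.100.0", 128):
        print(line)
```

Each resulting line has the form route <network> <netmask>, which is what the client config file expects.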

Finally the config file looks like the following:

dev tun
proto udp
remote ch-server 1194
resolv-retry infinite
ca ca.crt
cert astojanov-mac.crt
key astojanov-mac.key
verb 3
route # Cabletel DOOEL Skopje
route # Cabletel DOOEL Skopje
route # Cabletel DOOEL Skopje
route # GIV Ivan LTD Gostivar
route # NETCETERA DOOEL Skopje
route # Inel Dooel Kavadarci
route # KDS-Kabel Net DOOEL
route # Macedonia On-Line
route # Makedonski Telekom
route # Makedonski Telekom
route # Makedonski Telekom
route # Makedonski Telekom
route # Makedonski Telekom
route # Makedonski Telekom
route # MEGANET
route # Miksnet
route # Miksnet
route # NEOTEL Skopje
route # NEOTEL Skopje
route # NEOTEL Skopje
route # NEOTEL Skopje
route # ONE Telecom
route # ONE Telecom
route # ONE Telecom
route # ONE Telecom
route # ONE Telecom
route # PET NET DOO Gevgelija
route # T-Mobile Makedonija
route # T-Mobile Makedonija
route # TRD " Net Kabel"
route # UltraNet d.o.o.
route # "Sv. Kiril i Metodij"

Linux clients

For Linux clients, setting up OpenVPN in client mode is straightforward. We use the same keys and certificates as explained above. The content of the config file remains the same and it is renamed to client.conf. All the files should be placed into /etc/openvpn and the client is started with:

/etc/init.d/openvpn start client

Automating Return Oriented Attacks on x86 Architecture

After a long break, I decided to continue enriching the content of my blog. To conclude my investigation in the automation of Return Oriented Programming Attacks, I am publishing a short PDF presentation for my master thesis work.

Download link: Automating Return Oriented Programming Attacks Presentation

Return oriented programming (ROP) is an exploit technique which avoids code injection by reusing existing code to induce arbitrary behavior in a program. ROP attacks are conducted by chaining available instruction sequences (gadgets) ending in a “return” instruction. While the construction of ROP attacks has been automated, these approaches rely on searching gadgets using predefined sequences which operate on a fixed set of registers, on the grounds that large and widely distributed chunks of binary code are likely to contain them. As a result, libraries and operating system kernels have been targeted as gadget providers.

We propose an automatic gadget construction, targeting stand-alone executables, without relying on libraries or the system kernel. Due to the possible limit of available gadgets, stand-alone executables are likely to be restricted to instructions operating on distinct registers. Subsequently, chaining instructions so that the result of one instruction is used in the consecutive instructions can be achieved only by moving data across registers. For that purpose, we build a graph representing register manipulation instruction sequences (mov, xchg, add, sub, etc). Each register represents a node, and each data movement across registers represents an edge. The strongly connected components in the graph provide the available registers, and the shortest paths among those registers describe instruction chaining with minimal data movements. Customizing the gadget search to the available registers increases the flexibility when automatically constructing attacks, allowing the attacks to be applied on stand-alone executables, and minimal data movements help optimize the generated attacks.
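To give a feeling for the register graph, here is a toy Python sketch of the shortest-path part only. It is not the implementation from the thesis: the edge list is invented, the function name shortest_move_chain is mine, and the strongly connected component step is left out. An edge (a, b) means that some gadget can move data from register a to register b, and a breadth-first search finds a chain with the fewest such moves.

```python
from collections import deque


def shortest_move_chain(edges, src, dst):
    """Return the shortest list of registers leading from src to dst, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # dst cannot be reached from src with the available gadgets


if __name__ == "__main__":
    # Hypothetical data movements found in some binary.
    moves = [("eax", "ebx"), ("ebx", "ecx"), ("ecx", "eax"), ("eax", "esi")]
    print(shortest_move_chain(moves, "eax", "ecx"))
    print(shortest_move_chain(moves, "eax", "edi"))
```

Because breadth-first search explores chains in order of length, the first chain that reaches the destination uses the minimal number of data movements.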

Full text of master thesis: Automating Return Oriented Attacks on x86 Architecture