Jan 092015
 

i was recently trying to install the event module for php, and ran into a weird error:

$ pecl install event
...
configure: error: bufferevent_openssl_get_ssl not found in event_openssl library, or the library is not installed

after scratching my head for a while, i found out this was because i compiled libevent2 *before* installing openssl-devel. the error went away after a make clean && make && make install for libevent2.

Aug 122013
 

in an attempt to try out pthreads, i have gone on an adventure – compiling php by hand! i wanted to share my experience with the internet, as i had some trouble finding a good guide.

i opted to use the latest stable release of php 5.5 – but you can use most any version you want. it’s worth it to note that i am using centos 6.4

first thing’s first: let’s clean up any packages that would interfere, and install epel. we will need epel for a few extensions.

#remove old php packages
yum erase php*
#install epel - note, this link may be outdated! 
# see here for the latest: http://goo.gl/SAN1Qc
rpm -Uvh http://mirrors.servercentral.net/fedora/epel/6/i386/epel-release-6-8.noarch.rpm

now, we need to get the php source – i like to unpack it into /usr/src in case i need it in the future.

#download the php source - i used php 5.5.1
cd /usr/src
wget -Ophp-5.5.1.tar.bz2 "http://us3.php.net/get/php-5.5.1.tar.bz2/from/this/mirror"
#unpack the source
tar jxf php-5.5.1.tar.bz2
#move in to the source directory
cd php-5.5.1

now, this is the tricky part – usually, compiling is pretty simple. just ./configure, then make && make install. however, with php, there are many configure options you need to know about – and many of them are either undocumented, or hard to tack down. for me, i was mostly interested options required for pthreads, and some basic functionality i needed for my phpircd project. these are:

–prefix=/usr: use /usr as the prefix for installation – this will place the php executable in /usr/bin
–enable-maintainer-zts: enable zend thread safe – required for pthreads
–with-pear: build in pear (so we can install pthreads, and any other pear/pecl extension we need)
–with-config-file-path=/etc: this will place php.ini in /etc
–with-readline: this will enable readline, making the php interactive console much better
–with-mcrypt: builds in mcrypt hashing functions
–with-zlib: builds in zlib compression libraries
–enable-mbstring: builds in multi-byte string support
–with-curl: builds in curl functions
–with-bz2: builds in bzip2 support
–enable-zip: builds in zip support
–enable-sockets: builds in the socket extension
–enable-sysvsem: enables system v messages support
–enable-sysvshm: enables system v shared memory support
–with-mhash: builds in mhash hashing
–with-pcre-regex: builds in pcre regex support
–with-gettext: builds in gettext support
–enable-bcmath: enables the bcmath extension
–enable-libxml: enables in the xml extension
–enable-json: enables in the json extension
–with-openssl: builds in ssl support
–enable-pcntl: enables the pcntl extension

you will need some base build tools, and devel packages to use the above configure options. let’s install those:

yum install gcc make install zlib-devel readline-devel libxml12-devel \
openssl-devel libmcrypt-devel curl-devel bzip2-devel xml2-config 

now, configure and build php:

#configure
./configure --prefix=/usr --enable-maintainer-zts --with-pear \
--with-config-file-path=/etc --with-readline --with-mcrypt \
--with-zlib --enable-mbstring --with-curl --with-bz2 \
--enable-zip --enable-sockets --enable-sysvsem --enable-sysvshm \
--with-mhash --with-pcre-regex --with-gettext --enable-bcmath \
--enable-libxml --enable-json --with-openssl --enable-pcntl

#build!
make

#install!
make install

now, lets install pthreads using pecl:

pecl install channel://pecl.php.net/pthreads-0.0.44

after that, you’re good to go! use pthreads and your fancy new compiled php 5.5 :)

# php -a
Interactive shell

php > class MyThread extends Thread {
php {   function run(){
php {   }
php { }
php > $thread = new MyThread;
php > $thread->start();
php >
Dec 062011
 

compression algorithms and methods have been on my mind lately. i’m still not the best at c++, so please excuse my (horrible) php code- i know my way around php better than c++ at the moment.

i started out with another project i’m working on- a c++-based video streaming server, for netflix-like movie watching of whatever movies are on your hard drive. i was trying to figure out a good way to save bandwidth on the audio/video transfer. i decided to mess around with some base64-based compression. starting out with a directory structure like this:

[slowbro@alta cpp-stream-server]$ ll
total 4
-rwxrwxr-x 1 slowbro slowbro  3016 Dec  6 14:34 compress.php
-rwxrwxr-x 1 slowbro slowbro  9248 Dec  4 00:58 server

my goal is to compress the ‘server’ executable file, which is about 9kb. ‘compress.php’ is my program. first, i made a zip for comparison purposes, and a base-64 encoded version of ‘server':

[slowbro@alta cpp-stream-server]$ cat server | base64 -w0 > server.b64
[slowbro@alta cpp-stream-server]$ zip server.zip server
  adding: server (deflated 65%)
[slowbro@alta cpp-stream-server]$ ll
total 44
-rwxrwxr-x 1 slowbro slowbro  3016 Dec  6 14:34 compress.php
-rwxrwxr-x 1 slowbro slowbro  9248 Dec  4 00:58 server
-rw-rw-r-- 1 slowbro slowbro 12334 Dec  5 13:42 server.b64
-rw-rw-r-- 1 slowbro slowbro  3358 Dec  6 14:45 server.zip

so 65% is the number to beat- or match. approximately 3kb.

starting with the base64 file, i used the condense-repeating method to slim it down. this method replaces repeating characters with a 3-5 character representation. for example, AAAAAAAABBBBBB3FFF5kjdAAAA3v is replaced with A-8B-6|3FFF5kjdAAAA3v this is based on the following rules:

  • the repetition spans more than three characters
  • if the immediate next character after a replacement is a number, append a pipe (|) symbol to the end
  • the replacement must span less space than the original. for example ‘AAAA3′ was not replaced with ‘A-3|3′ because it uses the same amount of space

using these basic rules, i started out with the pass 1 code:

<?php
$rh = fopen("server.b64", 'r');
$wh1 = fopen("server.compress1", 'w+');
$wh2 = fopen("server.compress2", 'w+');
$curpos = 0;
$orig = filesize("server");
$end = filesize("server.b64");

#placeholder

echo "pass 1: condense repeating chars\n";
while($curpos < $end){
    $curpos += 1024;
    $data = trim(fread($rh, 1024));
    $dend = end(str_split($data));
    while($dend == ($d = trim(fread($rh, 1)))){
        $data .= $d;
        $curpos++;
    }
    fseek($rh, $curpos);
    echo "\tread ".strlen($data)." bytes ({$curpos}b read)\n";
    //condense repeating
    if(preg_match_all('{(.)\1{3,}}', $data, &$matches, PREG_OFFSET_CAPTURE)){
        $data = str_split($data);
        $osshift = 0;
        foreach($matches['0'] as $m){
            $l = strlen($m['0']);
            $r = $m['0'][0].'-'.$l.(is_numeric(@$data[$m['1']+$l+$osshift])?'|':'');
            if($r > $l)
                continue;
            array_splice($data, $m['1']-$osshift, $l, str_split($r));
            $osshift += ($l-strlen($r));
        }
        $data = implode("", $data);
    }
    fwrite($wh1, $data);
}

$end = $fsz = filesize("server.compress1");
echo "pass 1 completed: ".round((1-($fsz/$orig))*100,1)."% deflated ({$orig}b -> {$fsz}b)\n\n";

the output was okay- it achieved about 14% compression (based on the original, 9kb file- not the 12kb b64 file):

[slowbro@alta cpp-stream-server]$ ./compress.php
pass 1: condense repeating chars
        read 1036 bytes (1036b read)
        read 1024 bytes (2060b read)
        read 1025 bytes (3085b read)
        read 1024 bytes (4109b read)
        read 1097 bytes (5206b read)
        read 1024 bytes (6230b read)
        read 1025 bytes (7255b read)
        read 1024 bytes (8279b read)
        read 1040 bytes (9319b read)
        read 1024 bytes (10343b read)
        read 1024 bytes (11367b read)
        read 966 bytes (12391b read)
pass 1 completed: 14% deflated (9248b -> 7950b)

the next method i wanted to try was finding similar patterns. for simplicity in testing, i added this argv code where #placeholder is above:

//ARGV
$KEYWORD_LEN = 8;
$MAX_NUM_KW = 50;
$MIN_KW_NUM = 2;
array_shift($argv);
while(count($argv) != 0){
    switch(($p = array_shift($argv))){
        case '-l':
            $KEYWORD_LEN = array_shift($argv);
        break;
        case '-n':
            $MAX_NUM_KW = array_shift($argv);
        break;
        case '-k':
            $MIN_KW_NUM = array_shift($argv);
        break;
        default:
            echo "Unknown Parameter: $p\n";
            exit;
    }
}

i then added the pass 2 code. this code is a little.. messy. too many foreaches, but it’s not the final product and was just for testing. what it does is read from pass 1’s compressed b64, $KEYWORD_LEN bytes at a time, storing the number of times each of them occur in an associative array. it then iterates through that array, removing any that have less than $MIN_KW_NUM occurrences. it iterates through the array once more, adding up to $MAX_NUM_KW keywords to the translation table. finally, it writes the final file, $KEYWORD_LEN bytes at a time, replacing patterns with translations as it goes.

the code:

$curpos = 0;
rewind($wh1);

echo "pass 2: similar patterns\n";
$occ = array();
while($curpos < $end){
    $data = trim(fread($wh1, $KEYWORD_LEN));
    $curpos += strlen($data);
    if(isset($occ[$data]))
        $occ[$data] += 1;
    else
        $occ[$data] = 1;
}
$oc = count($occ);
echo "\t$oc keywords found\n";
arsort($occ);
foreach($occ as $k=>$v){
    if($v <= $MIN_KW_NUM)
        unset($occ[$k]);
}
echo "\t".($oc-count($occ))." keywords dropped\n";
echo "\tbuild translation table..";
$i = 256;
$stop = $i+$MAX_NUM_KW;
foreach($occ as $k=>&$v){
    if($i == $stop){
        unset($occ[$k]);
        continue;
    }
    fwrite($wh2, "$k:".chr($i).",");
    $v = $i;
    $i++;
}
fwrite($wh2, "\n");
echo " done\n";
echo "\ttranslating ".count($occ)." keywords..";
rewind($wh1);
$curpos = 0;
while($curpos < $end){
    $data = trim(fread($wh1, $KEYWORD_LEN));
    $curpos += strlen($data);
    if(isset($occ[$data]))
        fwrite($wh2, chr($occ[$data]));
    else
        fwrite($wh2, $data);
}
echo " done\n";
$fsz = filesize("server.compress2");
echo "pass 2 completed: ".round((1-($fsz/$orig))*100,1)."% deflated ({$orig}b -> {$fsz}b)\n\n";
fclose($wh1);
fclose($wh2);
fclose($rh);
?>

this produced pretty crappy results with the default parameters:

[slowbro@alta cpp-stream-server]$ ./compress.php
pass 1: condense repeating chars
        read 1036 bytes (1036b read)
        read 1024 bytes (2060b read)
        read 1025 bytes (3085b read)
        read 1024 bytes (4109b read)
        read 1097 bytes (5206b read)
        read 1024 bytes (6230b read)
        read 1025 bytes (7255b read)
        read 1024 bytes (8279b read)
        read 1040 bytes (9319b read)
        read 1024 bytes (10343b read)
        read 1024 bytes (11367b read)
        read 966 bytes (12391b read)
pass 1 completed: 14% deflated (9248b -> 7950b)

pass 2: similar patterns
        920 keywords found
        910 keywords dropped
        build translation table.. done
        translating 10 keywords.. done
pass 2 completed: 16.2% deflated (9248b -> 7753b)

…so it was testing time. i exited the script to only output the percentage times 10 (so 16.7 percent became 167), since bash doesn’t play nice with decimals. i then ran the following for bash code:

best=0;
for l in `seq 1 12`;do
  for k in `seq 2 5`; do
    for n in `seq 30 300`; do
      echo -en "$l\t$k\t$n\t";
      p=`./compress.php -l $l -k $k -n $n`;
      echo $p/10 | bc -l;
      if [[ "$p" -gt "$best" ]];then
        besttext="best is l:$l, k:$k, n:$n, p:$p";
        best=$p;
      fi;
    done;
  done;
done;
echo $besttext

after letting that chug for a while, i got:

best is l:2, k:2, n:137, p:311

meaning the best way to run it, at least for this file, was ./compress.php -l 2 -k 2 -n 137, resulting in a 31.1% compression. not bad for just the first two passes, i think. for time comparison, it takes about 0.076s to compress it:

[slowbro@alta cpp-stream-server]$ time ./compress.php -l 2 -k 2 -n 137
311
real    0m0.076s
user    0m0.061s
sys     0m0.015s

zip is a bit more optimized, lol:

[slowbro@alta cpp-stream-server]$ time zip server.zip server
  adding: server (deflated 65%)

real    0m0.002s
user    0m0.001s
sys     0m0.001s

that’s it for now. i’ll continue thinking of ways to compress it, and make a decompression tool to make sure it’s actually recoverable :)