Machine Learning in PHP, PHPCon Poland
![Page 1: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/1.jpg)
Machine Learning in PHP
Poland, Warsaw, October 2016
"Learn, someday this pain will be useful to you"
![Page 2: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/2.jpg)
Agenda
• How to teach tricks to your PHP
• Application : searching for code in comments
• Complex learning
![Page 3: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/3.jpg)
Speaker
• Damien Seguy
• Exakat CTO
• Static analysis of PHP code
![Page 4: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/4.jpg)
Machine Learning
• Teaching the machine
• Supervised learning : learning, then applying
• The application builds its own model : training phase
• It applies its model to real cases : applying phase
![Page 5: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/5.jpg)
Applications
• Play go, chess, tic-tac-toe and beat everyone else
• Fraud detection and risk analysis
• Automated translation or automated transcription
• OCR and face recognition
• Medical diagnostics
• Walk, welcome guests at hotels, play football
• Finding good PHP code
![Page 6: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/6.jpg)
PHP Applications
• Recommendation systems
• Predicting user behavior
• Spam detection
• Converting users to customers
• Estimated time of arrival (ETA)
• Detect code in comments
![Page 7: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/7.jpg)
Real Use Case
• Identify code in comments
• Classic problem
• Good problem for machine learning
• Complex, no simple solution
• A lot of data and expertise are available
![Page 8: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/8.jpg)
Supervised Training
History data → Training → Model
Real data → Model → Results
![Page 10: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/10.jpg)
The Fann Extension
• ext/fann (https://pecl.php.net/package/fann)
• Fast Artificial Neural Network
• http://leenissen.dk/fann/wp/
• Neural networks in PHP
• Works on PHP 7, thanks to the hard work of Jakub Zelenka
• https://github.com/bukka/php-fann
![Page 11: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/11.jpg)
Neural Networks
• Imitation of nature
• Input layer
• Output layer
• Intermediate layers
![Page 13: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/13.jpg)
Initialisation
<?php
// fann_create_standard() counts every layer, input and output included
$num_layers = 3;
$num_input = 5;
$num_neurons_hidden = 3;
$num_output = 1;
$ann = fann_create_standard($num_layers, $num_input, $num_neurons_hidden, $num_output);

// Activation function
fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);
![Page 14: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/14.jpg)
Preparing Data
Raw data → Extract → Filter → Human review → FANN ready
• Extract data from raw source
• Remove any useless data from extract
• Apply some human review to filtered data
• Format data for FANN
![Page 15: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/15.jpg)
Expert At Work
// Test if the if is in a compressed format
// nie mowie po polsku
// There is a parser specified in `Parser::$KEYWORD_PARSERS`
// $result should exist, regardless of $_message
// TODO : fix this; var_dump($var);
// $a && $b and multidimensional
// numGlyphs + 1
//$annots .= ' /StructParent ';
// $cfg['Servers'][$i]['controlpass'] = 'pmapass';
// if(ob_get_clean()){
![Page 16: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/16.jpg)
Input Vector
• 'length' : size of the comment
• 'countDollar' : number of $
• 'countEqual' : number of =
• 'countObjectOperator' : number of -> operators ($o->p)
• 'countSemicolon' : number of semicolons ;
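The five characteristics above can be computed with plain string functions. The sketch below is one possible `makeVector()`; the slides call a function with that name later but never show its body, so this implementation is an assumption. It measures length in bytes with `strlen()` and counts raw characters, so `==` counts as two equal signs.

```php
<?php
/**
 * Hypothetical makeVector(): turns one comment into the five counts
 * listed above, in the same order.
 */
function makeVector(string $comment): array
{
    return [
        strlen($comment),             // 'length' (in bytes)
        substr_count($comment, '$'),  // 'countDollar'
        substr_count($comment, '='),  // 'countEqual'
        substr_count($comment, '->'), // 'countObjectOperator'
        substr_count($comment, ';'),  // 'countSemicolon'
    ];
}

// Prints "37 2 0 0 0"
print implode(' ', makeVector('// $x[3] or $x[] and multidimensional')) . "\n";
```

Counting characters this way is cheap, and it keeps the input vector at a fixed size whatever the comment length.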
![Page 17: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/17.jpg)
Input Data
47 5 1
825 0 0 0 1
0
37 2 0 0 0
0
55 2 2 0 1
1
61 2 1 3 1
1
...
First line : number of cases (47), number of incoming data per case (5), number of outgoing data per case (1)
* (at your option) any later version. * * Exakat is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Affero General Public License for more details. * * You should have received a copy of the GNU Affero General Public License * along with Exakat. If not, see <http://www.gnu.org/licenses/>. * * The latest code can be found at <http://exakat.io/>. * */
// $x[3] or $x[] and multidimensional
//if ($round == 3) { die('Round '.$round);}
//$this->errors[] = $this->language->get('error_permission');
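The training file layout shown on this slide (one header line, then one line of inputs and one line of expected outputs per case) can be generated from reviewed data. The sketch below is only an illustration: `formatTrainingData()` and the `['input' => ..., 'output' => ...]` structure are hypothetical names chosen here, not part of ext/fann.

```php
<?php
/**
 * Builds the text of a FANN training file.
 * $cases is a list of ['input' => int[], 'output' => int[]] pairs
 * (a hypothetical structure chosen for this sketch).
 */
function formatTrainingData(array $cases): string
{
    if ($cases === []) {
        throw new InvalidArgumentException('At least one training case is required');
    }

    $numInput  = count($cases[0]['input']);
    $numOutput = count($cases[0]['output']);

    // Header: number of cases, inputs per case, outputs per case
    $lines = [count($cases) . ' ' . $numInput . ' ' . $numOutput];

    foreach ($cases as $case) {
        $lines[] = implode(' ', $case['input']);  // one line of inputs
        $lines[] = implode(' ', $case['output']); // one line of expected outputs
    }

    return implode("\n", $lines) . "\n";
}

$cases = [
    ['input' => [37, 2, 0, 0, 0], 'output' => [0]], // reviewed as a real comment
    ['input' => [61, 2, 1, 3, 1], 'output' => [1]], // reviewed as commented-out code
];

print formatTrainingData($cases);
```

The returned string could then be written to the file given to the training step, for example with `file_put_contents('incoming.data', formatTrainingData($cases))`.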
![Page 18: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/18.jpg)
Black Magic
// $x[3] or $x[] and multidimensional
1 5 1
37 2 0 0 0
0
→ ext/fann → It's a comment
![Page 19: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/19.jpg)
Training
<?php
$max_epochs = 500000;
$epochs_between_reports = 1000; // assumed value : not defined on the slide
$desired_error = 0.001;

// the actual training
if (fann_train_on_file($ann, 'incoming.data', $max_epochs, $epochs_between_reports, $desired_error)) {
    fann_save($ann, 'model.out');
}
fann_destroy($ann);
?>
![Page 20: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/20.jpg)
![Page 21: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/21.jpg)
![Page 22: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/22.jpg)
![Page 23: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/23.jpg)
Training
• 47 cases
• 5 characteristics
• 3 hidden neurons
• + 5 input + 1 output
• Duration : 5.711 s
![Page 24: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/24.jpg)
Application
History data → Training → Model
Real data → Model → Results
![Page 25: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/25.jpg)
Application
<?php
$ann = fann_create_from_file('model.out');

$comment = '//$gvars = $this->getGraphicVars();';

// makeVector() turns the comment into the 5-value input vector
$input = makeVector($comment);
$results = fann_run($ann, $input);

if ($results[0] > 0.8) {
    print "\"$comment\" -> $results[0] \n";
}
?>
![Page 26: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/26.jpg)
Results > 0.8
• Answer between 0 and 1
• Values range from -14 to 0.999
• The closer to 1, the safer it is code. The closer to 0, the safer it is a comment.
• Is this a percentage? Is this a carrot count?
• It's a mix of counts…
![Page 27: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/27.jpg)
Scores Distribution
[Chart : distribution of scores. Axis ticks : -16 to 0, and 60.000000 to 100.000000]
![Page 28: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/28.jpg)
Real Cases
• Tested on 14093 comments
• Duration : 68.01 ms
• Found 1960 issues (14%)
![Page 29: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/29.jpg)
0.99999893 // $cfg['Servers'][$i]['controlhost'] = '';
0.99999928 //$_SESSION['Import_message'] = $message->getDisplay();
0.99999928
/*
if (defined('SESSIONUPLOAD')) {
    // write sessionupload back into the loaded PMA session
    $sessionupload = unserialize(SESSIONUPLOAD);
    foreach ($sessionupload as $key => $value) {
        $_SESSION[$key] = $value;
    }

    // remove session upload data that are not set anymore
    foreach ($_SESSION as $key => $value) {
        if (mb_substr($key, 0, mb_strlen(UPLOAD_PREFIX)) == UPLOAD_PREFIX
            && ! isset($sessionupload[$key])
        ) {
![Page 30: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/30.jpg)
0.98780382 //LEAD_OFFSET = (0xD800 - (0x10000 >> 10)) = 55232
0.99361396 // We have server(s) => apply default configuration
0.98383027 // Duration = as configured
0.99999928 // original -> translation mapping
0.97590065 // = ( 59 x 84 ) mm = ( 2.32 x 3.31 ) in
![Page 31: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/31.jpg)
Found by FANN (machine learning) vs. target (expert work)
TRUE POSITIVE | FALSE POSITIVE
TRUE NEGATIVE | FALSE NEGATIVE
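The four cells can be counted by comparing each score from the network with the label given by the expert. The sketch below is a minimal illustration: `confusionMatrix()` is a hypothetical helper, the 0.8 cut-off is the one used on the scores earlier in the talk, and the sample scores and labels are made up.

```php
<?php
/**
 * Counts the four outcomes by comparing the network's score with the
 * expert's label. $threshold is the cut-off applied to the scores.
 */
function confusionMatrix(array $scores, array $expected, float $threshold = 0.8): array
{
    $matrix = ['TP' => 0, 'FP' => 0, 'TN' => 0, 'FN' => 0];

    foreach ($scores as $i => $score) {
        $found  = $score > $threshold;  // what the network says
        $isCode = (bool) $expected[$i]; // what the expert says

        if ($found && $isCode) {
            $matrix['TP']++;
        } elseif ($found && !$isCode) {
            $matrix['FP']++;
        } elseif (!$found && !$isCode) {
            $matrix['TN']++;
        } else {
            $matrix['FN']++;
        }
    }

    return $matrix;
}

// Made-up scores and labels, for illustration only
$scores   = [0.95, 0.90, 0.10, 0.40];
$expected = [true, false, false, true];

print_r(confusionMatrix($scores, $expected));
```

Each sample lands in a different cell here, so every count is 1.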
![Page 32: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/32.jpg)
Found by FANN vs. target
TRUE POSITIVE | FALSE POSITIVE
TRUE NEGATIVE | FALSE NEGATIVE
0.99999923 // $cfg['Servers'][$i]['table_coords'] = 'pma__table_coords';
0.73295981 //(isset($attribs['height'])?$attribs['height']: 1);
0.99999851 // if ($key != null) did not work for index "0"
0.2104115 // the PASSWORD() function
![Page 33: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/33.jpg)
Results
• 1960 issues
• More than 50% false positives
• After an easy cleanup, 822 issues reported
• 14k comments, analyzed in 68 ms (367 ms in PHP 5)
• Total coding time : 27 min
// = ( 59 x 84 ) mm = ( 2.32 x 3.31 ) in
/* vim: set expandtab sw=4 ts=4 sts=4: */
![Page 34: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/34.jpg)
Learn Better, Not Harder
• Better training data
• Improve characteristics
• Configure the neural network
• Change algorithm
• Automate learning
• Update constantly
History data → Training → Model
Real data → Model → Results
Feedback : results go back into the history data
![Page 35: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/35.jpg)
Better Training Data
• More data, more data, more data
• Varied situations, real case situations
• Include specific cases
• Experience is essential
• https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
![Page 36: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/36.jpg)
Improve Characteristics
• Add new characteristics
• Remove the ones that are less useful
• Find the right set of characteristics
![Page 37: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/37.jpg)
Network Configuration
• Input vector
• Intermediate neurons
• Activation function
• Output vector
[Chart : time of training (ms), from 0 to 20000, for x from 1 to 10. Series : 1 layer, 2 layers, 3 layers, 4 layers]
![Page 38: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/38.jpg)
Change Algorithm
• First add more data before changing algorithm
• Try cascade2 algorithm from FANN
• 0.6 => 0 found
• 0.5 => 2 found
• Not found by the first algorithm
• Ant colony, genetic algorithms, gravitational search, artificial immune systems, nie mowie po polsku, simulated annealing, harmony search, interior point search, tabu search
![Page 39: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/39.jpg)
Finding The Best
• Test with 2-4 layers, 10 neurons
• Measure results
[Chart : results from 0 to 9000, for x from 1 to 13. Series : 1 layer, 2 layers, 3 layers, 4 layers]
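Testing several network shapes and measuring each one can be automated with a small loop. The sketch below is an assumption about how such a search might look, not the speaker's code: `findBestConfiguration()` and the `$evaluate` callback are hypothetical, and a stand-in scoring function replaces the real training so that the example runs without ext/fann.

```php
<?php
/**
 * Tries every combination of layer count and neurons per layer and keeps
 * the best-scoring one. $evaluate is a hypothetical callback that trains a
 * network with the given shape and returns a score (higher is better).
 */
function findBestConfiguration(array $layerCounts, array $neuronCounts, callable $evaluate): ?array
{
    $best = null;

    foreach ($layerCounts as $layers) {
        foreach ($neuronCounts as $neurons) {
            $score = $evaluate($layers, $neurons);

            if ($best === null || $score > $best['score']) {
                $best = ['layers' => $layers, 'neurons' => $neurons, 'score' => $score];
            }
        }
    }

    return $best;
}

// Stand-in scoring function so the sketch runs without ext/fann.
// A real one would train on the history data and count correct answers.
$fakeEvaluate = function (int $layers, int $neurons): int {
    return 100 - abs($layers - 3) * 10 - abs($neurons - 7);
};

$best = findBestConfiguration(range(2, 4), range(1, 10), $fakeEvaluate);
print "{$best['layers']} layers, {$best['neurons']} neurons\n";
```

Passing the evaluation as a callback keeps the search loop independent of how a network is trained and scored. The neuron range of 1 to 10 is an assumption made for this sketch.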
![Page 40: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/40.jpg)
Deep Learning
• Chaining the neural networks
• Translators, scorers, auto-encoders
• Unsupervised Learning
![Page 41: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/41.jpg)
Other Tools
• PHP ext/fann
• R language
• https://github.com/kachkaev/php-r
• Scikit-learn
• https://github.com/scikit-learn/scikit-learn
• Mahout
• https://mahout.apache.org/
![Page 42: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/42.jpg)
Conclusion
• Machine learning is about data, not code
• There are tools to use it with PHP
• Fast to try, easy results or fast fail
• Use it for complex problems that tolerate errors
![Page 43: Machine learning in php php con poland](https://reader035.vdocuments.net/reader035/viewer/2022062306/588a1bc31a28abb21f8b467b/html5/thumbnails/43.jpg)
http://www.exakat.io
@exakat
http://www.slideshare.net/dseguy/
PHP 7.1 preparation workshop
Dzięki Czemu