Machine Learning in Ruby
An Introduction
kNN Algorithm
Who am I?
Brad Arner
Head of Product
website: http://www.bradfordarner.com
email: [email protected]
twitter: @bradfordarner
github: github.com/arnerjohn
The Goal
Make Machine Learning Less Scary
The Caveat
Machine Learning is a Huge Topic
I will talk about a general, workhorse algorithm
k Nearest Neighbor (kNN)
Fill in missing information
kNN Algorithm
What is the value of a home?
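[Figure: homes labeled with value and size; one value is unknown]
Values: $250K, $275K, $280K, $240K, $235K, $215K, $240K, $195K, $300K, $210K, $250K, $????
Sizes: 1650 sq ft, 2100 sq ft, 1950 sq ft, 1700 sq ft, 2100 sq ft, 1800 sq ft, 2100 sq ft, 1700 sq ft, 2100 sq ft, 1650 sq ft, 1800 sq ft, 1950 sq ft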
kNN Algorithm
What is in a name?
k => Number of Elements
N => Nearest / Distance
N => Neighbors / More Exist
k = 3
[Plot axes: Square Feet, Value, Rooms]
k = 5
[Plot axes: Square Feet, Value, Rooms]
Value: $200K
Value: $220K
Value: $210K
Value: $195K
Value: $215K
Estimated Value: $208K
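The estimate above is the average of the five neighbor values. A minimal sketch of that step, assuming a hypothetical Neighbor struct and made-up distances (only the five values come from the slide):

```ruby
# Hypothetical struct for illustration; not part of the talk's code.
Neighbor = Struct.new(:value, :distance)

# Estimate a value as the mean of the k nearest neighbors' values.
def estimate_value(neighbors, k)
  nearest = neighbors.min_by(k) { |n| n.distance }
  nearest.sum(&:value) / nearest.size.to_f
end

# Values are from the slide; the distances are invented for the example.
NEIGHBORS = [
  Neighbor.new(200_000, 0.10),
  Neighbor.new(220_000, 0.12),
  Neighbor.new(210_000, 0.15),
  Neighbor.new(195_000, 0.20),
  Neighbor.new(215_000, 0.22),
  Neighbor.new(300_000, 0.90) # farthest away, so ignored when k = 5
].freeze

puts estimate_value(NEIGHBORS, 5) # 208000.0
```

With k = 5 the sixth home never enters the average, which is the whole point of "nearest".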
The Code
github.com/arnerjohn/machine_learning_in_ruby
eventually @ http://www.machinelearninginruby.com
The Code
General Organization
class Property
  attr_accessor :rooms, :area, :type

  def initialize(options = {})
    options = {
      rooms: 1,
      area: 500,
      type: false
    }.merge(options)

    @rooms = options[:rooms]
    @area = options[:area]
    @type = options[:type]

    @neighbors = []
    @distance = nil
    @guess = nil
  end
  . . .
end

class Orchestrator
  def initialize(k)
    @property_list = []
    @k = k
    @rooms = { max: 0, min: 10000, range: 0 }
    @area = { max: 0, min: 10000, range: 0 }
  end
  . . .
end
The Code
Program Execution
property_list = Orchestrator.new(3)
property_list.load_training_data
property_list.scale_features
property_list.add( Property.new({ rooms: 2, area: 1550, type: false }) )
property_list.add( Property.new({ rooms: 4, area: 1800, type: false }) )
property_list.determine_unknowns
The Code
Load Data
require "csv"

def load_training_data
  # headers: true lets each row be read by column name
  file = CSV.read("data.csv", headers: true)

  file.each do |line|
    property = Property.new({ rooms: line["rooms"].to_i, area: line["area"].to_i, type: line["type"] })
    add(property)
  end
end
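The loader implies a data.csv with rooms, area and type columns. A small sketch of that layout, parsed from an inline string instead of a file. The column names come from the code above; the rows and the type labels are made up for illustration:

```ruby
require "csv"

# Hypothetical sample rows; only the column names come from load_training_data.
SAMPLE_CSV = <<~DATA
  rooms,area,type
  1,350,apartment
  3,1200,house
DATA

# Same conversions as load_training_data: rooms and area become integers,
# type stays a string label.
ROWS = CSV.parse(SAMPLE_CSV, headers: true).map do |line|
  { rooms: line["rooms"].to_i, area: line["area"].to_i, type: line["type"] }
end

ROWS.each { |row| puts row.inspect }
```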
The Code
Scale Features
def scale_features
  rooms_array = self.filter_knowns.map do |property|
    property.rooms
  end

  area_array = self.filter_knowns.map do |property|
    property.area
  end

  @rooms[:min] = rooms_array.min
  @rooms[:max] = rooms_array.max
  @rooms[:range] = rooms_array.max - rooms_array.min

  @area[:min] = area_array.min
  @area[:max] = area_array.max
  @area[:range] = area_array.max - area_array.min
end
The Code
Find Neighbors
def filter_unknowns
  # a property with type == false has not been classified yet
  @property_list.select do |property|
    property.type == false
  end
end

def determine_unknowns
  self.filter_unknowns.each do |property|
    property.neighbors = self.filter_knowns
    property.calculate_neighbor_distances(@rooms[:range], @area[:range])
    property.guess_type(@k)
  end
end
The Code
Calculate Distances
def calculate_neighbor_distances(room_range, area_range)
  @neighbors.each do |neighbor|
    rooms_delta = neighbor.rooms - self.rooms
    area_delta = neighbor.area - self.area

    rooms_delta = rooms_delta / room_range.to_f
    area_delta = area_delta / area_range.to_f

    neighbor.distance = Math.sqrt(rooms_delta*rooms_delta + area_delta*area_delta)
  end
end
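Why divide by the range? Without it, area (hundreds of square feet) swamps rooms (single digits). A standalone sketch of the same scaled distance, assuming a hypothetical scaled_distance helper and made-up ranges:

```ruby
# Hypothetical helper for illustration; mirrors calculate_neighbor_distances
# but works on plain hashes.
def scaled_distance(a, b, ranges)
  deltas = a.keys.map { |key| (a[key] - b[key]) / ranges[key].to_f }
  Math.sqrt(deltas.sum { |d| d * d })
end

# Assumed max - min for each feature across the training data.
RANGES = { rooms: 4, area: 1000 }.freeze

UNKNOWN    = { rooms: 2, area: 1550 }.freeze
SAME_ROOMS = { rooms: 2, area: 1650 }.freeze # 100 sq ft larger
SAME_AREA  = { rooms: 4, area: 1550 }.freeze # 2 more rooms

puts scaled_distance(UNKNOWN, SAME_ROOMS, RANGES) # roughly 0.1
puts scaled_distance(UNKNOWN, SAME_AREA, RANGES)  # 0.5
```

On raw numbers the 100 sq ft gap would look 50 times larger than the 2-room gap. After scaling, the 2-room gap is the bigger one.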
The Code
Guess Type
def guess_type(k)
  guess_hash = gen_guess_hash(self.sort_neigbors_by_distance.take(k))

  @guess = assign_guess(guess_hash)

  msg = %Q{
    Property attrs => rooms: #{ @rooms }, area: #{ @area }
    The property type is guessed to be: #{ @guess }
  }

  puts msg

  return @guess
end
The Code
Guess Type
def gen_guess_hash(properties)
  guess_hash = Hash.new(0)

  properties.each do |property|
    guess_hash[property.type] += 1
  end

  return guess_hash
end

def assign_guess(guess_hash)
  highest = 0
  guess = ""

  guess_hash.each do |key, value|
    if value > highest
      highest = value
      guess = key
    end
  end

  return guess
end
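gen_guess_hash and assign_guess together are a majority vote over the k nearest types. A shorter sketch of the same idea, assuming Ruby 2.7+ for Enumerable#tally. The majority_type name and the type labels are hypothetical:

```ruby
# Hypothetical helper for illustration: return the most common type label
# among the k nearest neighbors, or nil when there are none.
def majority_type(types)
  return nil if types.empty?

  counts = types.tally # e.g. "apartment" => 2, "house" => 1
  counts.max_by { |_type, count| count }.first
end

# Made-up labels for the three nearest neighbors (k = 3).
NEAREST_TYPES = ["apartment", "house", "apartment"].freeze

puts majority_type(NEAREST_TYPES) # apartment
```

Ties are not handled explicitly here. An odd k avoids them when there are only two types.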
Questions?
Thank You!