What lies beneath the beautiful code?
DESCRIPTION
Talk @ RubyConfIndia 2012. Ruby is a pure object-oriented language and a beautiful one to learn and practice. But most of us do not bother to know or care about what happens behind the scenes when we write some Ruby code, say when creating a simple Array, Hash, class, module or any other object. How does this map internally to C code? The Ruby interpreter is implemented in C, and I will talk about the interpreter API that we as Ruby developers should be aware of. The main purpose of the presentation is to understand the effort and complexity behind the simplicity offered. I would also like to touch upon the differences in the implementation of some core data structures across Ruby versions. Having seen part of the C implementation behind Ruby, I would also like to throw some light on when and why we would need to write Ruby extensions in C.
TRANSCRIPT
Ruby Conf India 2012
What lies beneath the beautiful code?
self.inspect
{
  :name => "Niranjan Sarade",
  :role => "ruby developer @ TCS",
  :blog => "http://niranjansarade.blogspot.com",
  :tweet => "twitter.com/nirusuma",
  :github => "github.com/NiranjanSarade"
}
Beautiful
Pure object oriented
Interpreted
Ruby
Matz’s Ruby Interpreter (MRI)
Koichi’s Ruby Interpreter (KRI)
Why should we know?
Let’s dive in!
Why C?
TPI Ruby 1.8
Ruby source code -> Tokenize (yylex) -> series of tokens -> Parse (yacc) -> AST -> Interpret
TPCI Ruby 1.9
Ruby source code -> Tokenize (yylex) -> series of tokens -> Parse (Bison) -> AST -> Compile (compile.c) -> bytecode -> Interpret (YARV)
Tokenized -> Parsed (AST) -> bytecode
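The three stages can be observed from Ruby itself. The following is a minimal sketch, assuming MRI 1.9 or later, where the `ripper` standard library and `RubyVM::InstructionSequence` are available; the one-line `source` string is only an illustration and is not taken from the talk.

```ruby
require 'ripper'

source = "puts 2 + 2"

# 1. Tokenize: each Ripper.lex entry holds the position, the token type
#    and the token text; index 2 is the text.
tokens = Ripper.lex(source).map { |entry| entry[2] }

# 2. Parse: Ripper.sexp returns the syntax tree as nested arrays.
ast = Ripper.sexp(source)

# 3. Compile: the YARV instruction sequence for the same source (MRI only).
iseq = RubyVM::InstructionSequence.compile(source)

p tokens
p ast.first        # root node type, :program
puts iseq.disasm   # human-readable bytecode listing
```

The exact disassembly text differs between Ruby versions, so it is best read as an illustration of the bytecode stage rather than as a stable format.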
Ruby Source Overview
# README.EXT
ruby language core
  class.c : classes and modules
  error.c : exception classes and exception mechanism
  gc.c : memory management
  load.c : library loading
  object.c : objects
  variable.c : variables and constants
ruby syntax parser
  parse.y
    -> parse.c : automatically generated
  keywords : reserved keywords
    -> lex.c : automatically generated
ruby evaluator (a.k.a. YARV)
  compile.c
  eval.c
  eval_error.c
  eval_jump.c
  eval_safe.c
  insns.def : definition of VM instructions
  iseq.c : implementation of VM::ISeq
  thread.c : thread management and context switching
  thread_win32.c : thread implementation
  thread_pthread.c : ditto
  vm.c
  vm_dump.c
  vm_eval.c
  vm_exec.c
  vm_insnhelper.c
  vm_method.c
regular expression engine (oniguruma)
  regex.c
  regcomp.c
  regenc.c
  regerror.c
  regexec.c
  regparse.c
  regsyntax.c
utility functions
  debug.c : debug symbols for C debugger
  dln.c : dynamic loading
  st.c : general purpose hash table
  strftime.c : formatting times
  util.c : misc utilities
ruby interpreter implementation
  dmyext.c
  dmydln.c
  dmyencoding.c
  id.c
  inits.c
  main.c
  ruby.c
  version.c
multilingualization
  encoding.c : Encoding
  transcode.c : Encoding::Converter
  enc/*.c : encoding classes
  enc/trans/* : codepoint mapping tables
class library
  array.c : Array
  bignum.c : Bignum
  compar.c : Comparable
  complex.c : Complex
  cont.c : Fiber, Continuation
  dir.c : Dir
  enum.c : Enumerable
  enumerator.c : Enumerator
  file.c : File
  hash.c : Hash
  io.c : IO
  marshal.c : Marshal
  math.c : Math
  numeric.c : Numeric, Integer, Fixnum, Float
  pack.c : Array#pack, String#unpack
  proc.c : Binding, Proc
  process.c : Process
  random.c : random number
  range.c : Range
  rational.c : Rational
  re.c : Regexp, MatchData
  signal.c : Signal
  sprintf.c :
  string.c : String
  struct.c : Struct
  time.c : Time
ruby.h
  struct RBasic
  struct RObject
  struct RClass
  struct RFloat
  struct RString
  struct RArray
  struct RRegexp
  struct RHash
  struct RFile
  struct RBignum
RObject, RBasic and RClass
struct RObject {
    struct RBasic basic;
    union {
        struct {
            long numiv;
            VALUE *ivptr;
            struct st_table *iv_index_tbl;
        } heap;
    } as;
};
struct RBasic {
    VALUE flags;
    VALUE klass;
};
struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};
Instance specific behavior
my_obj = Object.new
def my_obj.hello
  p "hello"
end
my_obj.hello
#=> "hello"
Object.new.hello
# NoMethodError:
# undefined method `hello' for #<Object:0x5418467>
Conceptual sketch
Before: my_obj (klass) -> Object (*m_tbl)
After: my_obj (klass) -> 'my_obj (*m_tbl: hello, *super) -> Object (*m_tbl)
# class.c
VALUE
make_singleton_class(VALUE obj)
{
    VALUE orig_class = RBASIC(obj)->klass;
    VALUE klass = rb_class_boot(orig_class);

    FL_SET(klass, FL_SINGLETON);
    RBASIC(obj)->klass = klass;
    return klass;
}
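The effect of make_singleton_class can be seen from the Ruby side. This is a small sketch using only core methods (`singleton_class`, `instance_methods`, `superclass`); it shows the hidden class that sits between the object and its original class, and it does not inspect the C structures themselves.

```ruby
my_obj = Object.new

def my_obj.hello
  "hello"
end

singleton = my_obj.singleton_class

p my_obj.class                        # => Object (the singleton class is skipped)
p singleton.instance_methods(false)   # => [:hello] (lives in the singleton's method table)
p singleton.superclass                # => Object (the original class)
p Object.new.respond_to?(:hello)      # => false (other instances are unaffected)
```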
Am I an immediate object or a pointer?
VALUE
C type for referring to arbitrary Ruby objects
Stores immediate values of: Fixnum, Symbol, true, false, nil, undef
Bit test:
  If the LSB = 1, it is a Fixnum.
  If the VALUE is equal to 0, 2, 4, or 6, it is a special constant: false, true, nil, or undef, respectively.
  If the lower 8 bits are equal to 0x0e, it is a Symbol.
  Otherwise, it is an object reference.
typedef unsigned long VALUE;
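The bit test above can be modelled in a few lines of Ruby. This is only a sketch of the Ruby 1.9-era rules listed on the slide: `value` is a plain Integer standing in for the C-level VALUE, and the names `classify_value` and `fixnum_to_value` are hypothetical helpers, not part of any Ruby API. Later Ruby versions use different constants for some special values.

```ruby
# Special constants as given above (Ruby 1.9-era layout).
SPECIAL_CONSTANTS = { 0 => :false, 2 => :true, 4 => :nil, 6 => :undef }.freeze
SYMBOL_FLAG = 0x0e

# A Fixnum n is stored as 2n + 1, which is why the lowest bit is always 1.
def fixnum_to_value(n)
  (n << 1) | 1
end

# Apply the tests in the same order as the list above.
def classify_value(value)
  return :fixnum if (value & 1) == 1
  return SPECIAL_CONSTANTS[value] if SPECIAL_CONSTANTS.key?(value)
  return :symbol if (value & 0xff) == SYMBOL_FLAG
  :object_reference
end

p classify_value(fixnum_to_value(21))   # => :fixnum
p classify_value(4)                     # => :nil
p classify_value(0x10e)                 # => :symbol
p classify_value(0x7f8a10)              # => :object_reference
```

Heap objects are aligned in memory, so a real pointer never has its lowest bit set and cannot be mistaken for a Fixnum.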
RString
# 1.8.7
struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};
# 1.9.3
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[RSTRING_EMBED_LEN_MAX + 1];
    } as;
};
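The RSTRING_EMBED_LEN_MAX macro is plain arithmetic, worked through here as a sketch. The helper name `rstring_embed_len_max` is hypothetical; it assumes sizeof(char) is 1 and that sizeof(VALUE) equals the pointer size of the build (8 bytes on a 64-bit build, 4 on a 32-bit one).

```ruby
# Mirrors ((sizeof(VALUE) * 3) / sizeof(char)) - 1 from the 1.9.3 definition.
def rstring_embed_len_max(sizeof_value, sizeof_char = 1)
  (sizeof_value * 3) / sizeof_char - 1
end

p rstring_embed_len_max(8)   # => 23 on a 64-bit build
p rstring_embed_len_max(4)   # => 11 on a 32-bit build
```

The three VALUE-sized slots are the ones the heap variant uses for len, ptr and aux; the embedded variant reuses that space for the characters plus a terminating byte.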
Images created using wordle.net
Heap Strings
[Diagram: str refers to an RString (char *ptr, long len = 46) whose ptr points to the heap string "This is a very very very very very long string"]
Shared Strings
str = "This is a very very very very very long string"
str2 = String.new(str)
# str2 = str.dup
[Diagram: str and str2 each have their own RString (char *ptr, long len = 46), one of which also holds VALUE shared; both ptr fields point to the same heap string "This is a very very very very very long string"]
Copy on Write
str = "This is a very very very very very long string"
str2 = str.dup
str2.upcase!
[Diagram: str and str2 each have an RString (char *ptr, long len = 46); after upcase! the two ptr fields point to separate heap strings, "This is a very very very very very long string" and "THIS IS A VERY VERY VERY VERY VERY LONG STRING"]
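From the Ruby side, copy on write is invisible apart from its result: the duplicate behaves as an independent string as soon as it is modified. The sketch below only demonstrates that observable behaviour; whether the two strings share heap data before the write is an interpreter detail that plain Ruby code cannot see.

```ruby
str  = "This is a very very very very very long string"
str2 = str.dup

p str2 == str        # => true  (same contents)
p str2.equal?(str)   # => false (two distinct String objects)

str2.upcase!         # the write: str2 needs its own copy of the data from here on

p str    # => "This is a very very very very very long string"
p str2   # => "THIS IS A VERY VERY VERY VERY VERY LONG STRING"
```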
Embedded Strings
str = "This is a very very very very very long string"
str2 = str[0..3]
# str2 = "This"
[Diagram: str has an RString (char *ptr, long len = 46) pointing to the heap string "This is a very very very very very long string"; str2 has an RString with long len = 4 and char ary[] = "This" stored inside the structure itself]
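Shared Strings with slice
str = "This is a very very very very very long string"
str2 = str[1..-1]
# str2 = str[22..-1]
# 0 <= start_offset < 46-23
[Diagram: two RString structures, one with char *ptr, long len = 46 and VALUE shared, the other with char *ptr and long len = 45; both ptr fields point into the same heap data "T h i . . i n g", with str2 starting one character after str]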
String.new("learning")
Creating a string of 23 characters or fewer is fastest (the embedded limit on a 64-bit build).
Creating a substring that runs to the end of the target string is also fast.
When strings share the same string data, both memory and execution time are saved.
Creating any other long substring or string, of 24 or more bytes, is slower.
RHash
# 1.8.7
struct RHash {
    struct RBasic basic;
    struct st_table *tbl;
    int iter_lev;
    VALUE ifnone;
};
struct st_table {
    struct st_hash_type *type;
    int num_bins;
    int num_entries;
    struct st_table_entry **bins;
};
struct st_table_entry {
    st_data_t key;
    st_data_t record;
    st_table_entry *next;
};
# 1.9.3
struct RHash {
    struct RBasic basic;
    struct st_table *ntbl;
    int iter_lev;
    VALUE ifnone;
};
struct st_table {
    const struct st_hash_type *type;
    st_index_t num_bins;
    . . .
    struct st_table_entry **bins;
    struct st_table_entry *head, *tail;
};
struct st_table_entry {
    st_data_t key;
    st_data_t record;
    st_table_entry *next;
    st_table_entry *fore, *back;
};
1.8.7 :002 > {1 => "a", "f" => "b", 2 => "c"}
 => {1=>"a", 2=>"c", "f"=>"b"}
1.9.3p0 :001 > {1 => "a", "f" => "b", 2 => "c"}
 => {1=>"a", "f"=>"b", 2=>"c"}
RHash 1.8.7
[Diagram: an st_table with num_entries = 4, num_bins = 5 and **bins pointing to the hash buckets (slots); each occupied bucket heads a chain of st_table_entries (key1 to key4, each with its value); x marks a null pointer]
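RHash 1.9.3
[Diagram: an st_table with num_entries = 4, num_bins = 5, **bins, *head and *tail; the hash buckets (slots) head chains of st_table_entries (key1 to key4, each with its value), and the entries are additionally linked to one another through their fore and back pointers, from *head to *tail]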
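The 1.9.3 layout can be approximated in plain Ruby to show why iteration follows insertion order. This is a simplified model, not the st.c implementation: `Entry` and `OrderedTable` are hypothetical names, the table never resizes, and deletion is left out. Each entry is chained inside its bin through `nxt` (the `next` field in C) and linked to the other entries through `fore` and `back`.

```ruby
# One table entry: nxt chains entries that fall into the same bin,
# fore/back link all entries in insertion order.
Entry = Struct.new(:key, :record, :nxt, :fore, :back)

class OrderedTable
  def initialize(num_bins = 5)
    @bins = Array.new(num_bins)   # hash buckets (slots)
    @head = @tail = nil           # ends of the insertion-order list
  end

  def insert(key, record)
    entry = find(key)
    if entry
      entry.record = record       # existing key: update in place, order unchanged
      return
    end
    index = key.hash % @bins.size
    entry = Entry.new(key, record, @bins[index], nil, @tail)
    @bins[index] = entry          # new entry becomes the head of its bin chain
    if @tail
      @tail.fore = entry          # append to the insertion-order list
    else
      @head = entry
    end
    @tail = entry
  end

  def lookup(key)
    entry = find(key)
    entry && entry.record
  end

  # Iteration ignores the bins and walks head -> fore -> ... -> tail.
  def keys
    result = []
    entry = @head
    while entry
      result << entry.key
      entry = entry.fore
    end
    result
  end

  private

  # Walk the chain of the bin the key hashes to.
  def find(key)
    entry = @bins[key.hash % @bins.size]
    entry = entry.nxt while entry && !entry.key.eql?(key)
    entry
  end
end

table = OrderedTable.new
table.insert(1, "a")
table.insert("f", "b")
table.insert(2, "c")
p table.keys          # => [1, "f", 2]
p table.lookup("f")   # => "b"
```

In the 1.8.7 layout there is no such list, so iteration has to walk the bins one by one, which is why the order of the 1.8.7 output depends on the hash values rather than on insertion.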
C Extensions – why and when?
Performance
Using C libraries from Ruby applications
Using Ruby gems with native C extensions
e.g. mysql, nokogiri, eventmachine, RedCloth, RMagick, libxml-ruby, etc.
Since the Ruby interpreter is implemented in C, its API can be used
My fellow Rubyist
Patrick Shaughnessy
Image Credits
http://pguims-random-science.blogspot.in/2011/08/ten-benefits-of-scuba-diving.html
http://www.istockphoto.com/stock-illustration-7620122-tree-roots.php
http://horror.about.com/od/horrortoppicklists/tp/familyfriendlyhorror.01.htm
http://www.creditwritedowns.com/2011/07/european-monetary-union-titanic.html
http://irvine-orthodontist.com/wordpress/for-new-patients/faqs
Thank you all for being patient and hearing me out!
Hope this helps you!