Full Text Search in Django with Postgres
DESCRIPTION
A number of tools provide full text search, from embedded search libraries to dedicated search servers (Solr, Sphinx, Elasticsearch, etc.), but setting up and configuring them is time-consuming and requires considerable knowledge of the tools. What if we could get comparable search results using the full text search capabilities of Postgres? Developers already have a working knowledge of the database, so this should come naturally. In addition, it is one less tool to manage.
Code: https://github.com/Syerram/postgres_search
TRANSCRIPT
Full Text Search
Django + Postgres
Search is everywhere
Search expectations
● FAST
● Full Text search
● Linguistic support (“craziness | crazy”)
● Ranking
● Fuzzy Searching
● More like this
Django
● SLOW
● `icontains` is a dumbed-down version of search
● Searching across tables is a pain
● No relevancy, ranking or similar words unless done manually
● No easy way to do fuzzy searching
Other Alternatives
● Solr
● ElasticSearch
● AWS CloudSearch
● Sphinx
● etc.
If you’re using any of the above, use Haystack
Postgres Search
● FAST
● Simple to implement
● Supports search features like Full Text, Ranking, Boosting, Fuzzy, etc.
Django
Live Example
● Search Students by name or by course
● Use a South migration to create the tsvector column
● Store the title in the Search table
● Update the Search table via Celery on save of Student data
https://github.com/Syerram/postgres_search
GIN, GIST
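● GiST is a lossy, hash-based signature index; GIN is an inverted index whose keys are stored in a B-tree
● GIN lookups are about 3x faster than GiST
● GIN indexes take about 3x longer to build than GiST and are moderately slower to update
● GIN indexes are about 2-3x larger than GiST
A GIN index:
CREATE INDEX student_index ON students USING gin(to_tsvector('english', name));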
Source http://www.postgresql.org/docs/9.2/static/textsearch-indexes.html
Full Text Search
● All text should be preprocessed using tsvector and queried using tsquery
● Both reduce the text to lexemes
SELECT to_tsvector('How much wood would a woodchuck chuck If a woodchuck could chuck wood?');
-- "'chuck':7,12 'could':11 'much':2 'wood':3,13 'woodchuck':6,10 'would':4"
● Both are required for searching to work on normal text
SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could chucks woods?') @@ 'chucks' -- False
SELECT to_tsvector('How much wood would a woodchucks chucks If a woodchucks could chucks woods?') @@ to_tsquery('chucks') -- True
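Why the bare string fails while to_tsquery succeeds can be sketched in plain Python. This is only a toy model: the stop-word set and the "strip a trailing s" stemmer are assumptions made for illustration, not Postgres's real English dictionary or Snowball stemmer, and the names `toy_tsvector` and `matches` are hypothetical.

```python
import re

# Tiny illustrative stop-word subset (assumption, not Postgres's full list).
STOP_WORDS = {"a", "how", "if"}


def to_lexeme(word):
    """Toy normalisation: lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def toy_tsvector(text):
    """Map each lexeme to the 1-based word positions where it occurs."""
    vector = {}
    for position, word in enumerate(re.findall(r"[A-Za-z]+", text), start=1):
        if word.lower() in STOP_WORDS:
            continue  # stop words are dropped but still consume a position
        vector.setdefault(to_lexeme(word), []).append(position)
    return vector


def matches(vector, term, normalize=True):
    """Look the term up in the vector, optionally normalising it first."""
    key = to_lexeme(term) if normalize else term
    return key in vector


doc = toy_tsvector(
    "How much wood would a woodchucks chucks If a woodchucks could chucks woods?"
)
print(matches(doc, "chucks", normalize=False))  # False: raw term, like @@ 'chucks'
print(matches(doc, "chucks"))                   # True: normalised, like to_tsquery
```

The document stores only the lexeme "chuck", so a query term has to go through the same normalisation before it can match.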
Full Text Search (Contd.)
● Technically you don’t need an index, but on large tables queries will be slow
SELECT * FROM students WHERE to_tsvector('english', name) @@ to_tsquery('english', 'Kirk');
● GIN or GiST index on a tsvector column
CREATE INDEX <index_name> ON <table_name> USING gin(<col_name>);
● Expression-based index (the config argument is required so that the expression is immutable)
CREATE INDEX <index_name> ON <table_name> USING gin(to_tsvector('english', COALESCE(<col_a>, '') || ' ' || COALESCE(<col_b>, '')));
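When the expression index spans several columns, the statement can be generated rather than hand-written, for example from a migration. The helper below is a minimal sketch under assumptions: `fts_index_sql` and the `<table>_fts_idx` naming scheme are hypothetical, and the table and column names in the usage line are invented for illustration.

```python
import re

# Accept only plain identifiers so nothing unexpected ends up in the SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def fts_index_sql(table, columns, config="english", index_name=None):
    """Build a CREATE INDEX statement for a GIN expression index."""
    if not columns:
        raise ValueError("at least one column is required")
    index_name = index_name or f"{table}_fts_idx"  # hypothetical naming scheme
    for name in [table, config, index_name, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # COALESCE keeps a NULL column from turning the whole document into NULL;
    # the ' ' separator keeps adjacent columns from fusing into one word.
    document = " || ' ' || ".join(f"COALESCE({col}, '')" for col in columns)
    return (
        f"CREATE INDEX {index_name} ON {table} "
        f"USING gin(to_tsvector('{config}', {document}));"
    )


print(fts_index_sql("students", ["name", "bio"]))
```

Queries must use the same expression, including the same config, for the planner to pick the index up.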
Boosting
● Boost certain results over others
● Still matching
● Use ts_rank to boost results
e.g. … ORDER BY ts_rank(document, to_tsquery('python')) DESC
Ranking
● Importance of the search term within the document
e.g. search term found in title > description > tag
● Use setweight to assign importance to each field when preparing the document
e.g.
setweight(to_tsvector('english', post.title), 'A') ||
setweight(to_tsvector('english', post.description), 'B') ||
setweight(to_tsvector('english', post.tags), 'C')
...
-- In the search query, use ts_rank to order by ranking
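The effect of field weights on ordering can be illustrated with a toy scorer in Python. It is a sketch, not the formula ts_rank uses: it simply sums weight times hit count per field. The weights mirror the ts_rank defaults for labels A, B and C (1.0, 0.4, 0.2); the `toy_rank` name and the sample posts are made up for this example.

```python
# Field weights mirroring ts_rank's default values for labels A, B and C.
FIELD_WEIGHTS = {"title": 1.0, "description": 0.4, "tags": 0.2}


def toy_rank(document, term):
    """Sum weight * occurrences of the term over the weighted fields."""
    term = term.lower()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        score += weight * words.count(term)
    return score


# Invented sample data for illustration only.
posts = [
    {"title": "Gardening tips", "description": "Nothing about snakes", "tags": "python"},
    {"title": "Python search", "description": "Full text search with python", "tags": "django"},
]

ranked = sorted(posts, key=lambda post: toy_rank(post, "python"), reverse=True)
for post in ranked:
    print(post["title"], round(toy_rank(post, "python"), 2))
```

Both posts match "python", but the one with the term in its title sorts first, which is the behaviour setweight plus ts_rank is meant to give.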
Trigram
● Group of 3 consecutive characters from a string
● Similarity between strings is measured by the number of trigrams they share
e.g. with pg_trgm, which pads each word with two leading spaces and one trailing space:
"hello": "  h", " he", "hel", "ell", "llo", "lo "
"hallo": "  h", " ha", "hal", "all", "llo", "lo "
Number of matches: 3
● Use similarity to find related terms. It returns a value between 0 and 1, where 0 means no match and 1 means an exact match
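A small Python sketch of trigram similarity for single words, assuming the padding pg_trgm documents (two leading spaces, one trailing space) and scoring as shared trigrams divided by distinct trigrams in both words. It ignores multi-word strings and non-alphanumeric handling, so treat it as an approximation of what the extension does.

```python
def trigrams(word):
    """Trigram set of a single word, padded as pg_trgm documents it."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by the number of distinct trigrams overall."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("hello") & trigrams("hallo")))  # ['  h', 'llo', 'lo ']
print(round(similarity("hello", "hallo"), 3))          # 0.333
```

With this padding, "hello" and "hallo" share 3 of 9 distinct trigrams.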
Soundex/Metaphone
● Oldest and only good for English names
● Converts to a string of length 4, e.g. "Anthony" and "Anthoney" both become "A535"
● Create an expression index on the Soundex or Metaphone value (a plain B-tree index is enough for equality lookups)
e.g. CREATE INDEX idx_name ON tb_name (soundex(col_name));
SELECT ... FROM tb_name WHERE soundex(col_name) = soundex('...');
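How a name collapses to a four-character code can be sketched with the classic American Soundex rules in Python. This is an illustrative implementation, not the code behind Postgres's soundex() function (from the fuzzystrmatch extension), which may differ on edge cases.

```python
# Letter-to-digit table for classic American Soundex.
CODES = {}
for digit, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code, or '' if there are no letters."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        if ch not in "HW":  # H and W do not separate repeated codes
            previous = code
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


print(soundex("Anthony"), soundex("Anthoney"))  # A535 A535
```

Because both spellings map to the same code, an equality comparison on the indexed soundex value finds either one.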
Pro & Con
Pros
● Quick implementation
● A lot easier to change the document format and refresh the index
● Speed comparable to other search engines
● Cost effective
Cons
● Not as flexible as pure search engines like Solr
● Not as fast as Solr, though pretty fast for humans
● Tied to Postgres
● Indexes can get pretty large, but so can search engine indexes
Django ORM
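● Implements full text search
from django.db import models
from djorm_pgfulltext.fields import VectorField
from djorm_pgfulltext.models import SearchManager

class StudentCourse(models.Model):
    ...
    # tsvector column that stores the search document
    search_index = VectorField()

    objects = SearchManager(
        fields=('student__user__name', 'course__name'),
        config='pg_catalog.english',  # this is default
        search_field='search_index',  # this is default
        auto_update_search_field=True
    )
● StudentCourse.objects.search("David")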
https://github.com/djangonauts/djorm-ext-pgfulltext
Next Steps
● Add Ranking, Boosting, Fuzzy Search to djorm pgfulltext
e.g.
StudentCourse.objects.search("David & Python").rank("Python")
StudentCourse.objects.fuzzy_search("Jython").rank("Python")
StudentCourse.objects.soundex("Davad").rank("Java")
& more
● Continue to add examples to postgres_search
Tips
● Use a separate DB if necessary, or use materialized views
● Don’t index everything. Limit your searchable data
● Analyze using `EXPLAIN` and ts_stat
● Create indexes on the fly using CREATE INDEX CONCURRENTLY
● Don’t pull foreign key objects in search
Code
• https://github.com/Syerram/postgres_search
• Stack: AngularJS, Django, Celery, Postgres
• Feel free to Fork, Pull Request
@agileseeker, github/syerram, syerram.silvrback.com/
Sai