Website Search

Searching in Frappe is managed by the Search module. It is a wrapper for Whoosh a full text search library written in Python.

You can extend the FullTextSearch class to create a search class for a specific requirement. For example the WebsiteSearch is a wrapper for indexing public facing web pages and exposing a search.

The FullTextSearch

class

Each FullTextSearch (FTS) instance holds a Schema defined by the class itself. That means, a specific FTS implementation will have it's specific schema. You can create a new implementation if you wish to index with a different schema. Along with this the FTS class has other controllers to facilitate creating, updating and querying the index.

Extending the FTS class

When initializing a FTS based class, you need to provide an index name. On instantiation, the following params are initialized - index_name : name of the index provided. - index_path : path of the index in the sites folder - schema : return by the get_schema function - id : id used to recognize the document in the index

Once instantiated you can run the build function. It gets all the documents from get_items_to_index , the documents are a list of frappe._dict (frappe dicts) conforming to the defined schema. These documents are then added to the index and written to the file.

You can search the index using the search method of the FTS class. These functions are documented in the API reference here .

An example implementation for blog will look like the following:

class BlogWrapper(FullTextSearch):
 # Default Schema
 # def get_schema(self):
 # return Schema(name=ID(stored=True), content=TEXT(stored=True))

 # def get_id(self):
 # return "name"

 def get_items_to_index(self):
 docs = []
 for blog_name in get_all_blogs():
 docs.append(get_document_to_index(blog_name))
 return docs

 def get_document_to_index(self, name):
 blog = frappe.get_doc("Blog Post", name)
 return frappe._dict(name=name, content=blog.content)

 def parse_result(self, result):
 return result["name"]

get_items_to_index : Get all routes to be indexed, this includes the static pages in www/ and routes from published documents
get_document_to_index : Render a page and parse it using BeautifulSoup
parse_result : all the search results are parsed using this function