This package allows indexing of Django models in Elasticsearch. It is built as a thin wrapper around elasticsearch-dsl-py, so you can use all the features developed by the elasticsearch-dsl-py team.
- Based on elasticsearch-dsl-py so you can make queries with the Search class.
- Django signal receivers on save and delete for keeping Elasticsearch in sync.
- Management commands for creating, deleting, rebuilding and populating indices.
- Elasticsearch auto mapping from Django model fields.
- Complex field type support (ObjectField, NestedField).
Requirements
- Django >= 1.8
- Python 2.7, 3.4, 3.5, 3.6
- Elasticsearch >= 2.0 < 6.0
Install Django Elasticsearch DSL:
pip install django-elasticsearch-dsl

# Elasticsearch 5.x
pip install 'elasticsearch-dsl>=5.0,<6.0'

# Elasticsearch 2.x
pip install 'elasticsearch-dsl>=2.0,<3.0'
Then add django_elasticsearch_dsl to INSTALLED_APPS.
You must define ELASTICSEARCH_DSL in your Django settings.
For example:
ELASTICSEARCH_DSL = {
    'default': {
        'hosts': 'localhost:9200'
    },
}
ELASTICSEARCH_DSL is then passed to the connections.configure function of elasticsearch-dsl-py.
Then for a model:
# models.py
from django.db import models


class Car(models.Model):
    name = models.CharField()
    color = models.CharField()
    description = models.TextField()
    type = models.IntegerField(choices=[
        (1, "Sedan"),
        (2, "Truck"),
        (4, "SUV"),
    ])
To make this model work with Elasticsearch, create a subclass of django_elasticsearch_dsl.DocType and create a django_elasticsearch_dsl.Index to define your Elasticsearch indices, names, and settings. These classes must be defined in a documents.py file.
# documents.py
from django_elasticsearch_dsl import DocType, Index
from .models import Car

# Name of the Elasticsearch index
car = Index('cars')
# See Elasticsearch Indices API reference for available settings
car.settings(
    number_of_shards=1,
    number_of_replicas=0
)


@car.doc_type
class CarDocument(DocType):
    class Meta:
        model = Car  # The model associated with this DocType

        # The fields of the model you want to be indexed in Elasticsearch
        fields = [
            'name',
            'color',
            'description',
            'type',
        ]

        # Ignore auto updating of Elasticsearch when a model is saved
        # or deleted:
        # ignore_signals = True
        # Don't perform an index refresh after every update (overrides global setting):
        # auto_refresh = False
        # Paginate the django queryset used to populate the index with the specified size
        # (by default there is no pagination)
        # queryset_pagination = 5000
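The queryset_pagination option above limits how many rows are handled per batch while the index is populated. The batching idea can be sketched without the library. This is only an illustration: paginate is a hypothetical helper, not part of django_elasticsearch_dsl, and a plain list stands in for a Django queryset.

```python
def paginate(items, page_size):
    """Yield successive slices of `items`, each at most `page_size` long.

    Hypothetical helper for illustration only; a Django queryset accepts
    the same slicing syntax as the list used here.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


rows = list(range(12))             # stand-in for 12 model instances
batches = list(paginate(rows, 5))  # three batches of sizes 5, 5 and 2
```

Smaller batches keep memory use bounded at the cost of more round trips to the database.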
To create and populate the Elasticsearch index and mapping use the search_index command:
$ ./manage.py search_index --rebuild
Now, when you do something like:
car = Car(
    name="Car one",
    color="red",
    type=1,
    description="A beautiful car"
)
car.save()
The object will be saved in Elasticsearch too (using a signal handler). To get an elasticsearch-dsl-py Search instance, use:
s = CarDocument.search().filter("term", color="red")
# or
s = CarDocument.search().query("match", description="beautiful")
for hit in s:
    print(
        "Car name : {}, description {}".format(hit.name, hit.description)
    )
The previous example returns a result specific to elasticsearch_dsl. It is also possible to convert the Elasticsearch result into a real Django queryset; be aware that this costs an extra SQL request to retrieve the model instances with the ids returned by the Elasticsearch query.
s = CarDocument.search().filter("term", color="blue")[:30]
qs = s.to_queryset()
# qs is just a django queryset and it is called with order_by to keep
# the same order as the elasticsearch result.
for car in qs:
    print(car.name)
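The ordering remark in the comment above can be illustrated without a database. The sketch below reorders plain dictionaries by the position of their id in the Elasticsearch hit list, which is the effect the order_by call is described as achieving. The names hit_ids, rows and reorder_like_hits are hypothetical and do not come from the library.

```python
def reorder_like_hits(hit_ids, rows):
    """Return `rows` sorted to match the order of `hit_ids`.

    `rows` stands in for model instances fetched in arbitrary order by a
    single SQL query; rows whose id is not in `hit_ids` are dropped.
    """
    position = {pk: index for index, pk in enumerate(hit_ids)}
    matching = [row for row in rows if row["id"] in position]
    return sorted(matching, key=lambda row: position[row["id"]])


hit_ids = [42, 7, 19]  # ids in the order Elasticsearch returned them
rows = [
    {"id": 7, "name": "Car B"},
    {"id": 19, "name": "Car C"},
    {"id": 42, "name": "Car A"},
    {"id": 99, "name": "Not a hit"},
]
ordered = reorder_like_hits(hit_ids, rows)
```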
Once again, the django_elasticsearch_dsl.fields are subclasses of elasticsearch-dsl-py fields. They just add support for retrieving data from Django models.
Let's say you don't want to store the type of the car as an integer, but as the corresponding string instead. You need some way to convert the type field on the model to a string, so we'll just add a method for it:
# models.py
class Car(models.Model):
    # ... #
    def type_to_string(self):
        """Convert the type field to its string representation
        (the boneheaded way).
        """
        if self.type == 1:
            return "Sedan"
        elif self.type == 2:
            return "Truck"
        else:
            return "SUV"
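A less repetitive variant of the same conversion is a lookup in the choices themselves. The sketch below is library-independent; TYPE_CHOICES and type_label are hypothetical names. Unlike the method above, it returns None for an unknown value instead of falling back to "SUV". Django also generates a get_type_display() method for fields declared with choices, which serves the same purpose on a model instance.

```python
TYPE_CHOICES = [
    (1, "Sedan"),
    (2, "Truck"),
    (4, "SUV"),
]


def type_label(type_value, choices=TYPE_CHOICES):
    """Return the label for `type_value`, or None if it is not a choice."""
    return dict(choices).get(type_value)


labels = [type_label(value) for value in (1, 2, 4, 3)]
```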
Now we need to tell our DocType subclass to use that method instead of just accessing the type field on the model directly. Change the CarDocument to look like this:
# documents.py
from django_elasticsearch_dsl import DocType, fields

# ... #

@car.doc_type
class CarDocument(DocType):
    # add a string field to the Elasticsearch mapping called type, the
    # value of which is derived from the model's type_to_string attribute
    type = fields.StringField(attr="type_to_string")

    class Meta:
        model = Car
        # we removed the type field from here
        fields = [
            'name',
            'color',
            'description',
        ]
After a change like this we need to rebuild the index with:
$ ./manage.py search_index --rebuild
Sometimes, you need to do some extra prepping before a field should be saved to Elasticsearch. You can add a prepare_foo(self, instance) method to a DocType (where foo is the name of the field), and that will be called when the field needs to be saved.
# documents.py
from django_elasticsearch_dsl import DocType, fields

# ... #

class CarDocument(DocType):
    # ... #
    foo = fields.StringField()

    def prepare_foo(self, instance):
        return " ".join(instance.foos)
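To make the naming convention concrete, here is a library-independent sketch of how a prepare_<field> hook could be looked up and preferred over plain attribute access. FakeDocument, FakeCar and the prepare method are stand-ins written for this illustration; they are not the library's implementation.

```python
class FakeDocument:
    """Stand-in for a DocType subclass; not the library implementation."""

    field_names = ["name", "foo"]

    def prepare_foo(self, instance):
        # Custom preparation: join a list of strings into one string.
        return " ".join(instance.foos)

    def prepare(self, instance):
        """Build the dict to index, preferring prepare_<field> hooks."""
        data = {}
        for field_name in self.field_names:
            hook = getattr(self, "prepare_" + field_name, None)
            if hook is not None:
                data[field_name] = hook(instance)
            else:
                data[field_name] = getattr(instance, field_name)
        return data


class FakeCar:
    """Plain object standing in for a model instance."""

    def __init__(self, name, foos):
        self.name = name
        self.foos = foos


document = FakeDocument().prepare(FakeCar("Car one", ["fast", "red"]))
```

Here name is read straight from the instance, while foo goes through prepare_foo because a hook with that name exists.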
For example, consider models with ForeignKey relationships:
# models.py
class Car(models.Model):
    name = models.CharField()
    color = models.CharField()
    manufacturer = models.ForeignKey('Manufacturer')


class Manufacturer(models.Model):
    name = models.CharField()
    country_code = models.CharField(max_length=2)
    created = models.DateField()


class Ad(models.Model):
    title = models.CharField()
    description = models.TextField()
    created = models.DateField(auto_now_add=True)
    modified = models.DateField(auto_now=True)
    url = models.URLField()
    car = models.ForeignKey('Car', related_name='ads')
You can use an ObjectField or a NestedField.
# documents.py
from django_elasticsearch_dsl import DocType, Index, fields
from .models import Car, Manufacturer

car = Index('cars')
car.settings(
    number_of_shards=1,
    number_of_replicas=0
)


@car.doc_type
class CarDocument(DocType):
    manufacturer = fields.ObjectField(properties={
        'name': fields.StringField(),
        'country_code': fields.StringField(),
    })
    ads = fields.NestedField(properties={
        # html_strip is a custom analyzer that must be defined beforehand
        'description': fields.StringField(analyzer=html_strip),
        'title': fields.StringField(),
        'pk': fields.IntegerField(),
    })

    class Meta:
        model = Car
        fields = [
            'name',
            'color',
        ]
        # Optional: to ensure the Car will be re-saved when Manufacturer is updated
        related_models = [Manufacturer]

    def get_queryset(self):
        """Not mandatory but to improve performance we can select related in one sql request"""
        return super(CarDocument, self).get_queryset().select_related(
            'manufacturer'
        )

    def get_instances_from_related(self, related_instance):
        """If related_models is set, define how to retrieve the Car instances from the related model."""
        return related_instance.car_set.all()
Most Elasticsearch field types are supported. The attr argument is a dotted "attribute path" which will be looked up on the model using Django template semantics (dict lookup, attribute lookup, list index lookup). By default the attr argument is set to the field name.
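The lookup order described above can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the library's code: resolve_attr_path and the two small classes are hypothetical, and calling callables with no arguments mirrors what Django templates do.

```python
def resolve_attr_path(obj, path):
    """Follow a dotted path using dict, attribute, then list-index lookup."""
    current = obj
    for part in path.split("."):
        try:
            current = current[part]               # dict lookup
        except (TypeError, KeyError, IndexError, AttributeError):
            try:
                current = getattr(current, part)  # attribute lookup
            except AttributeError:
                current = current[int(part)]      # list index lookup
        if callable(current):
            current = current()                   # call methods without arguments
    return current


class Manufacturer:
    def __init__(self):
        self.name = "Acme"
        self.meta = {"country": "FR"}
        self.plants = ["Lyon", "Nantes"]

    def display_name(self):
        return self.name.upper()


class Car:
    def __init__(self):
        self.manufacturer = Manufacturer()


car = Car()
by_attribute = resolve_attr_path(car, "manufacturer.name")
by_dict_key = resolve_attr_path(car, "manufacturer.meta.country")
by_list_index = resolve_attr_path(car, "manufacturer.plants.1")
by_method = resolve_attr_path(car, "manufacturer.display_name")
```

With such a lookup, a field declared with attr="manufacturer.name" would read the name of the related object rather than an attribute on the car itself.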
For the rest, the field properties are the same as elasticsearch-dsl fields.
So for example you can use a custom analyzer:
# documents.py
from elasticsearch_dsl import analyzer

# ... #

html_strip = analyzer(
    'html_strip',
    tokenizer="standard",
    filter=["standard", "lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)


@car.doc_type
class CarDocument(DocType):
    description = fields.StringField(
        analyzer=html_strip,
        fields={'raw': fields.StringField(index='not_analyzed')}
    )

    class Meta:
        model = Car
        fields = [
            'name',
            'color',
        ]
Simple Fields
- BooleanField(attr=None, **elasticsearch_properties)
- ByteField(attr=None, **elasticsearch_properties)
- CompletionField(attr=None, **elasticsearch_properties)
- DateField(attr=None, **elasticsearch_properties)
- DoubleField(attr=None, **elasticsearch_properties)
- FileField(attr=None, **elasticsearch_properties)
- FloatField(attr=None, **elasticsearch_properties)
- IntegerField(attr=None, **elasticsearch_properties)
- IpField(attr=None, **elasticsearch_properties)
- GeoPointField(attr=None, **elasticsearch_properties)
- GeoShapeField(attr=None, **elasticsearch_properties)
- ShortField(attr=None, **elasticsearch_properties)
- StringField(attr=None, **elasticsearch_properties)
Complex Fields
- ObjectField(properties, attr=None, **elasticsearch_properties)
- NestedField(properties, attr=None, **elasticsearch_properties)
Elasticsearch 5 Fields
- TextField(attr=None, **elasticsearch_properties)
- KeywordField(attr=None, **elasticsearch_properties)
properties is a dict where the key is a field name, and the value is a field instance.
To define an Elasticsearch index you must instantiate a django_elasticsearch_dsl.Index class and set the name and settings of the index. This class inherits from elasticsearch-dsl-py Index. After you instantiate your class, you need to associate it with the DocType you want to put in this Elasticsearch index.
# documents.py
from django_elasticsearch_dsl import DocType, Index
from .models import Car, Manufacturer

# The name of your index
car = Index('cars')
# See Elasticsearch Indices API reference for available settings
car.settings(
    number_of_shards=1,
    number_of_replicas=0
)


@car.doc_type
class CarDocument(DocType):
    class Meta:
        model = Car
        fields = [
            'name',
            'color',
        ]


@car.doc_type
class ManufacturerDocument(DocType):
    class Meta:
        model = Manufacturer
        fields = [
            'name',  # If a field has the same name in multiple DocTypes of
                     # the same Index, the field type must be identical
                     # (here fields.StringField)
            'country_code',
        ]
When you execute the command:
$ ./manage.py search_index --rebuild
This will create an index named cars in Elasticsearch with two mappings: manufacturer_document and car_document.
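The two mapping names above look like snake_case versions of the DocType class names. Assuming that is the convention (an inference from this example, not a documented guarantee), the conversion could be sketched as follows; default_mapping_name is a hypothetical helper.

```python
import re


def default_mapping_name(class_name):
    """Convert a CamelCase class name to snake_case."""
    # Insert an underscore before each capital letter that follows
    # another character, then lowercase the whole string.
    return re.sub(r"(.)([A-Z])", r"\1_\2", class_name).lower()


names = [
    default_mapping_name(class_name)
    for class_name in ("CarDocument", "ManufacturerDocument")
]
```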
Delete all indices in Elasticsearch or only the indices associated with a model (--models):
$ search_index --delete [-f] [--models [app[.model] app[.model] ...]]
Create the indices and their mapping in Elasticsearch:
$ search_index --create [--models [app[.model] app[.model] ...]]
Populate the Elasticsearch mappings with the Django model data (the index must already exist):
$ search_index --populate [--models [app[.model] app[.model] ...]]
Recreate and repopulate the indices:
$ search_index --rebuild [-f] [--models [app[.model] app[.model] ...]]
ELASTICSEARCH_DSL_AUTOSYNC
Default: True
Set to False to globally disable auto-syncing.
ELASTICSEARCH_DSL_INDEX_SETTINGS
Default: {}
Additional options passed to the elasticsearch-dsl Index settings (like number_of_replicas or number_of_shards).
ELASTICSEARCH_DSL_AUTO_REFRESH
Default: True
Set to False to not force an [index refresh](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html) with every save.
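As a sketch, the three options described so far could be overridden together in a Django settings module. The setting names ELASTICSEARCH_DSL_AUTOSYNC, ELASTICSEARCH_DSL_INDEX_SETTINGS and ELASTICSEARCH_DSL_AUTO_REFRESH are assumed here to be the ones these descriptions belong to, and the values are arbitrary examples rather than recommendations.

```python
# settings.py (illustrative values only)

# Do not update Elasticsearch automatically when models are saved or deleted.
ELASTICSEARCH_DSL_AUTOSYNC = False

# Extra options applied to the index settings.
ELASTICSEARCH_DSL_INDEX_SETTINGS = {
    "number_of_shards": 1,
    "number_of_replicas": 0,
}

# Do not force an index refresh after every save.
ELASTICSEARCH_DSL_AUTO_REFRESH = False
```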
ELASTICSEARCH_DSL_SIGNAL_PROCESSOR
This (optional) setting controls which SignalProcessor class is used to handle Django's signals and keep the search index up to date.
An example:
ELASTICSEARCH_DSL_SIGNAL_PROCESSOR = 'django_elasticsearch_dsl.signals.RealTimeSignalProcessor'
Defaults to django_elasticsearch_dsl.signals.RealTimeSignalProcessor.
You could, for instance, make a CelerySignalProcessor which would add update jobs to the queue for delayed processing.
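The delayed-processing idea can be sketched without Django or Celery. QueuedSignalProcessor below is a hypothetical stand-in that collects jobs in memory. A real processor would subclass the library's signal processor base class and hand jobs to a task queue, and the handle_save and handle_delete method names are assumptions made for this illustration.

```python
from collections import deque


class QueuedSignalProcessor:
    """Collects update jobs instead of indexing immediately (sketch only)."""

    def __init__(self):
        self.jobs = deque()

    def handle_save(self, sender, instance, **kwargs):
        # Record the job; nothing is sent to Elasticsearch yet.
        self.jobs.append(("update", sender, instance))

    def handle_delete(self, sender, instance, **kwargs):
        self.jobs.append(("delete", sender, instance))

    def flush(self, worker):
        """Run every queued job through `worker` in arrival order."""
        processed = 0
        while self.jobs:
            action, sender, instance = self.jobs.popleft()
            worker(action, sender, instance)
            processed += 1
        return processed


processor = QueuedSignalProcessor()
processor.handle_save(sender="Car", instance="car-1")
processor.handle_delete(sender="Car", instance="car-2")

log = []
count = processor.flush(
    lambda action, sender, instance: log.append((action, instance))
)
```

Deferring the work this way keeps request handling fast, at the cost of the search index lagging slightly behind the database.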
You can run the tests by creating a Python virtual environment and installing the requirements from requirements_test.txt (pip install -r requirements_test.txt):
$ python runtests.py
Or:
$ make test
$ make test-all # for tox testing
For integration testing with a running Elasticsearch server:
$ python runtests.py --elasticsearch [localhost:9200]
- Add support for --using (use another Elasticsearch cluster) in management commands.
- Add management commands for mapping level operations (like update_mapping....).
- Dedicated documentation.
- Generate ObjectField/NestedField properties from a DocType class.
- More examples.
- Better ESTestCase and documentation for testing.