We had a requirement to stream data from MongoDB to Elasticsearch, so I decided to research the existing tools.
We could use elasticsearch-river-mongodb, implement a streaming process in Node, or go with one of the paid solutions. Then I found monstache, a streaming implementation written in golang.
The good parts
- Monstache is a single binary with no dependency on runtimes like Ruby, Python or PHP, so a scratch image should do the job
- It accepts transformations
  - working sample here
- Mappings
  - working sample here
- Versioning?
  - working sample here
- Fetch references? Check embedding documents
- A lot of MongoDB/Elasticsearch configuration options
  - e.g. replay, resume, resume-strategy
- The Docker image is less than 15 MB (compressed size)
Ok, how can I make it faster?
From the docs:
It is HIGHLY recommended to use a golang plugin in production over a javascript plugin due to performance differences. Currently, golang plugins are orders of magnitude faster than javascript plugins. This is due to concurrency and the need to perform locking on the javascript environment. Javascript plugins are very useful for quickly prototyping a solution, however at some point it is recommended to convert them to golang plugins
For a working golang sample, check my repo
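To give an idea of what such a plugin looks like, here is a minimal sketch of a mapper that upper-cases every top-level string field. As far as I recall from the monstache docs, a real plugin exports a `Map` function that takes a `*monstachemap.MapperPluginInput` and returns a `*monstachemap.MapperPluginOutput`. To keep the snippet self-contained, the two structs below are local stand-ins that mirror only the `Document` field, so treat them as an assumption and import the real types from the `monstachemap` package in an actual plugin.

```go
package main

import "strings"

// MapperPluginInput is a local stand-in for monstachemap.MapperPluginInput.
// Assumption: only the Document field is mirrored here; a real plugin
// imports this type from the monstachemap package instead of declaring it.
type MapperPluginInput struct {
	Document map[string]interface{}
}

// MapperPluginOutput is a local stand-in for monstachemap.MapperPluginOutput,
// again reduced to the Document field for illustration.
type MapperPluginOutput struct {
	Document map[string]interface{}
}

// Map upper-cases every top-level string value of the incoming document
// and returns the modified document for indexing. Non-string values are
// left untouched.
func Map(input *MapperPluginInput) (*MapperPluginOutput, error) {
	doc := input.Document
	for k, v := range doc {
		// Overwriting the value of an existing key while ranging is safe in Go.
		if s, ok := v.(string); ok {
			doc[k] = strings.ToUpper(s)
		}
	}
	return &MapperPluginOutput{Document: doc}, nil
}
```

A real plugin would be compiled with `go build -buildmode=plugin` and, if I read the docs correctly, loaded through the `mapper-plugin-path` option. Please double-check both against the monstache version you run.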
The bad parts
It uses change streams, which seem to be less performant than tailing the oplog (ref)
Conclusion
I would give it a try and run additional performance tests before going to prod.
It is nice that it supports multiple workers, high availability, etc.
It has been almost 9 years since the first commit, and it has 1.3k stars on GitHub and videos on YouTube.