Mention is a real-time monitoring application used to track and analyze trends and e-reputation in a very complete and intuitive way. Co-founded in 2010, it now counts 400,000 users across the world. Growth like this takes not only great marketing talent but also serious technical achievement. Arnaud le Blanc, Mention's co-founder and CTO, tells us how he got Mention there.

You gotta love technical challenges

“When we developed Mention, our main competitor was Google Alerts, which at first felt a little like David versus Goliath. But that became one of the reasons I like my job: it is very challenging. Bringing a product to this stage takes a true entrepreneurial mindset. Before becoming Mention's CTO, I was a developer and then the lead developer working on the media monitoring feature at Pressking. eFounders offered me the chance to become Mention's co-founder. I might not have thought of myself as an entrepreneur before, but today I really feel like Mention is my product: I've built it together with my co-founders.

Building an application like Mention is full of technical challenges. We handle a large amount of data, crawl thousands of web pages per second, have built an open API for our own apps, and use lots of third-party APIs.”

FOCUS: a challenge overcome when developing Mention's web crawler
Since Mention crawls a large number of web pages in parallel, the crawler eventually reaches a concurrency level high enough to stress the OS: that is where we hit the limits of our system. In our case, those limits exposed a race condition in libc's DNS resolver: libc was sending DNS queries over random, unrelated file descriptors, which led to weird behavior. The bug was reported and fixed a few months later, but in the meantime we had to switch to the plain-Go DNS resolver, which turned out to work really well.

“At Mention, several million mentions are inserted into user feeds every day (see the infographic above). Data storage is therefore a central issue for our product. We chose MySQL as our main storage system, since it is well understood and scalable. We also use Redis as a cache and Kafka for messaging.”

FOCUS: Mention’s main storage runs on Percona MySQL servers:
MySQL servers have a small layer of SSD caches (using flashcache)
Incremental backups using Percona Xtrabackup
Backups automatically checked every day (loaded into a MySQL server, with queries run against it)
Sharding for horizontal scalability
Sharding is done at the application level
Replication for high availability
Applications never read on slave servers — eventual consistency makes code eventually complex
Occasionally using slaves for large queries, dumps, etc.
One of the shard groups stores 4.5B rows, 5TB of compressed data
Redis for small caches or data structures
40-node Cassandra cluster for the Crawler's internal state
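The list above describes application-level sharding, replication for availability, and a rule that application code reads only from the master, with slaves reserved for large queries and dumps. A minimal Go sketch of a router following those rules is below. It assumes shards are chosen by hashing a user ID; the article does not say what Mention's shard key or mapping actually is, and `shardRouter`, `Shard`, `Purpose`, and the host names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Shard is one replicated MySQL shard: a primary (master) that serves all
// application reads and writes, plus replicas (slaves) for offline work.
type Shard struct {
	Primary  string
	Replicas []string
}

// Purpose tells the router what a connection will be used for.
type Purpose int

const (
	OLTP  Purpose = iota // regular application reads and writes
	Batch                // large queries, dumps, etc.
)

type shardRouter struct {
	shards []Shard
}

// shardIndex maps a user ID to a shard with a stable hash (FNV-1a).
// Hash-modulo moves most keys when shards are added; a lookup table avoids
// that, but this sketch does not model one.
func (r *shardRouter) shardIndex(userID string) int {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return int(h.Sum32() % uint32(len(r.shards)))
}

// Host returns the server to connect to for userID and the given purpose.
// Application traffic always goes to the primary, so it never observes
// replication lag; batch jobs use a replica when one exists.
func (r *shardRouter) Host(userID string, p Purpose) string {
	s := r.shards[r.shardIndex(userID)]
	if p == Batch && len(s.Replicas) > 0 {
		return s.Replicas[0]
	}
	return s.Primary
}

func main() {
	router := &shardRouter{shards: []Shard{
		{Primary: "db-shard0-primary", Replicas: []string{"db-shard0-replica"}},
		{Primary: "db-shard1-primary", Replicas: []string{"db-shard1-replica"}},
	}}
	for _, user := range []string{"alice", "bob", "carol"} {
		fmt.Println(user, router.Host(user, OLTP), router.Host(user, Batch))
	}
}
```

Keeping application reads on the primary gives up some read capacity in exchange for simpler code, which matches the stated rationale: the application never has to reason about stale reads.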

“Building a product that works in the long term is, in my view, key to success. That is why we focused on it from the beginning, especially in our choice of technologies and architecture. We chose proven technologies for the most critical parts of our infrastructure. For instance, we chose MySQL for our main storage: it is proven to work well for I/O-bound workloads (which matters for the volume of data our tool can handle) and has well-understood performance characteristics.

On the other hand, we sometimes needed more bleeding-edge technologies such as Golang. We like to use the right technology for the right task, and Golang is outstanding for writing highly concurrent code that does lots of I/O. Choosing bleeding-edge technologies is risky because you never know how they will evolve, or whether they will still be alive in two years. As an anecdote: when we built Mention, AngularJS and React, today's references, virtually didn't exist. That is why we use Backbone.js for our frontend: among all the libraries that existed at the time, it is basically the only one still alive.”

Debian GNU/Linux distribution.
RAID everywhere: RAID1 for regular servers and RAID6 for backups.
MySQL, Kafka, Redis
PHP, Golang, Python, Node.js
Monitoring with StatsD, Graphite, and Zabbix.

This article is part of the publication Unexpected Token powered by eFounders — startup studio. All eFounders’ startups are built by extraordinary CTOs. If you feel like it could be you, apply here.