Interesting Data Gigs #2: Data Engineer at Grammarly

Why you need to follow Daliana Liu (AWS) and Zach Wilson (Airbnb)...

Welcome to the Natural Language Processing edition of Interesting Data Gigs

A good friend of mine asked me about this very interesting field called Natural Language Processing, and after our conversation, I knew I had to make it the focus of this second edition of IDG.

It's a challenging but rewarding field, with tons of opportunities out there if you want to work on it full-time.

Let’s start with the featured job of this edition: Data Engineer, Data Platform at Grammarly.

This organization relies heavily on NLP algorithms and systems, which makes it the perfect company for this topic.

Why NLP?

Enterprises are increasing their investments in natural language processing (NLP), the subfield of linguistics, computer science, and AI concerned with how algorithms analyze large amounts of language data. According to a new survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third (33%) said that their spending climbed by more than 30%.

It's a field that is growing fast, with no sign of slowing down.

Let’s talk about the featured job of today then.

My favorite part of the job description?

Data engineers are responsible for making disparate data centrally available to Grammarly’s team members while ensuring query efficiency and efficacy.

The technical requirements of this role include a deep understanding of Spark's query engine, read/write performance optimization of Delta tables, data modeling, data transformation at scale, coding skills in Python or Scala, and advanced SQL.
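Read/write optimization of Delta tables often comes down to the small-file problem. Frequent writes leave many small Parquet files behind, and every reader pays to open each one. Delta Lake's `OPTIMIZE` command addresses this by compacting small files into larger ones.

The pure-Python sketch below shows the planning idea behind that kind of compaction, using a simplified first-fit-decreasing bin-packing over file sizes. A few things are assumptions for illustration only:
- the function name `plan_compaction`;
- the file sizes;
- the 1024 MB target.

This is not Delta Lake's actual implementation.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 1024) -> List[List[int]]:
    """Plan which small files to rewrite together so each output stays <= target_mb.

    Simplified first-fit decreasing: take files largest first and put each one
    into the first bin that still has room, opening a new bin when none fits.
    """
    bins: List[List[int]] = []
    totals: List[int] = []
    for size in sorted(file_sizes_mb, reverse=True):
        if size >= target_mb:
            continue  # already big enough; not part of the compaction plan
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            bins.append([size])
            totals.append(size)
    # A bin with a single file would just rewrite that file, so drop it.
    return [b for b in bins if len(b) > 1]


if __name__ == "__main__":
    print(plan_compaction([300, 300, 500, 200, 100, 1200, 50]))
    # → [[500, 300, 200], [300, 100, 50]]
```

In this example, seven files become two rewrite groups. The 1200 MB file is skipped because it is already above the target.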

The backbone of a Data Engineer's success at Grammarly will be communication, stakeholder management, and a passion for data.

Time to do some research on the company.

1. Research the company you are applying for

First, let me say this right away: the early Grammarly team (Alex Shevchenko, Max Lytvyn, and Dmytro Lider) spent 8 years building a profitable business, and only then, in 2017, took $110M in external investment.

Then they raised $90M and finally $200M, for a total of $400M in funding and a valuation of $13 billion.

But the keyword here is: PROFITABLE.

According to GrowJo, the company makes $170.4M in annual revenue and has more than 850 employees. I can't confirm whether these figures are accurate, so they could be good questions to ask if you land an interview.

This is a true unicorn: a company valued above $1 Billion, profitable, and growing like crazy year after year.

Right now, more than 30 million people and 30k teams use Grammarly to improve their writing (myself included), and the opportunity here is huge.

2. Grammarly’s Data Tech Stack

Grammarly has a very interesting tech stack, especially on the Data platform front:

  • Data Stream: AWS Kinesis / Kafka

  • Compute: AWS EMR / YARN / Spark (microbatching)

  • Storage: AWS S3 (Parquet) / Cassandra

  • Data Access and Analytics Engine: React / Spark SQL

  • Data Lake: Databricks / Delta

  • Language: Scala, Akka

  • Frontend: React

  • Backend/SSR: React Server, Scala, Akka

  • Compute: AWS ECS (Fargate)

  • RDBMS: AWS Aurora

  • NoSQL and Experiments: AWS DynamoDB

  • Messaging: RabbitMQ
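The stack pairs data streams (Kinesis/Kafka) with Spark micro-batching. Instead of handling each event on its own, the engine collects whatever arrived during a short interval and processes it as one small batch.

Below is a minimal stdlib-only sketch of that idea. It assumes each event carries an arrival timestamp and a hypothetical string payload such as "edit" or "accept". Real Spark Structured Streaming (triggers, offsets, checkpointing, fault tolerance) is far more involved and is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    ts: float      # arrival time in seconds (hypothetical field)
    payload: str   # hypothetical event type, e.g. "edit" or "accept"


def micro_batches(events: Iterable[Event], interval_s: float) -> List[List[Event]]:
    """Group events into micro-batches, one per interval_s-second window of arrival time."""
    windows: Dict[int, List[Event]] = {}
    for ev in events:
        # Window index: 0 for [0, interval_s), 1 for [interval_s, 2*interval_s), ...
        windows.setdefault(int(ev.ts // interval_s), []).append(ev)
    # Emit batches in time order; windows with no events produce no batch.
    return [windows[k] for k in sorted(windows)]


def summarize(batch: List[Event]) -> Counter:
    """Per-batch processing step: count events by payload type."""
    return Counter(ev.payload for ev in batch)


if __name__ == "__main__":
    stream = [Event(0.1, "edit"), Event(0.4, "edit"), Event(1.2, "accept"),
              Event(2.5, "edit"), Event(2.9, "reject")]
    for batch in micro_batches(stream, interval_s=1.0):
        print(dict(summarize(batch)))
```

With a 1-second interval, the five events become three batches: {'edit': 2}, {'accept': 1} and {'edit': 1, 'reject': 1}. Batching trades a little latency for much better throughput per processing step.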

One idea for improvement: analyze where the company could use the new AWS instance types based on AWS Graviton3.

This could potentially save the company a lot of money.

Watch James Hamilton (VP and Distinguished Engineer at Amazon) talking about the new Amazon EC2 C7g instances powered by AWS Graviton3:

Grammarly's team has written several times about machine learning and NLP, but from my perspective the most interesting post is this one:

One of my favorite parts of the article is this one:

Solving these problems would mean identifying brand-new applications and research directions—not just improving existing models (although we care about that, too).

Fortunately, we can prioritize zero-to-one projects and go from concept to launch quickly because we’ve spent over a decade developing an ecosystem of linguistic and machine learning tools.

We have well-established practices for accessing and curating data, mitigating bias in our features, and protecting privacy and security. We have built and continue to invest in ML infrastructure to leverage the latest developments across techniques like Transformer-based sequence-to-sequence models, neural machine translation, and massive pre-trained language models.

They are truly on the cutting edge of the Natural Language Processing field, so you will have a unique opportunity to make history here.

Next thing? Networking, networking, networking.

3. Some people on Grammarly's engineering team who could be your close colleagues

Other Interesting Data Gigs of the week

People you must follow now: Daliana Liu (Predibase) and Zach Wilson (Data@Airbnb)

Daliana Liu has created some of the best podcast (Apple, Spotify) and YouTube content out there on Data Science, how to become a Data Scientist, and more.

For example, in her latest interview, she talked with Laura Gabrysiak, Sr. Manager of Data Products and Solutions at Visa, who shared so much valuable insight that I passed it along to my entire team at Riot Games.

You can watch the entire video here:

She provides a ton of value in every post she writes on LinkedIn, so make sure to follow her.

Some of the other content shared by Daliana:

Another of my favorite videos on her channel is her chat with Nick Handel (a former Data Scientist at Airbnb and now co-founder and CEO of Transform, the first centralized 'metrics store' that empowers data analysts to deliver insights):

As for Zach: if you are a Data Engineer and don't live under a rock, you have heard of Zach Wilson.

He is always sharing great advice on LinkedIn, and now on YouTube as well, so don't miss the opportunity to learn from an incredible, experienced practitioner in the Data Engineering field.

Some of my favorite content from Zach:

Some videos to expand precisely on that:

Interesting resources to read, watch, and listen to for improving your Natural Language Processing and Data Analytics knowledge in general

Final words

If you liked what you read here, consider sharing the newsletter with anyone you know, especially if they are a Data Engineer.

Thanks a lot for that.
