Interesting Data Gigs by Marcos Ortiz
Interesting Data Gigs # 2: Data Engineer at Grammarly

Why you need to follow Daliana Liu (AWS) and Zach Wilson (Airbnb)...

Welcome to the Natural Language Processing edition of Interesting Data Gigs

A good friend of mine asked me about this very interesting field called Natural Language Processing, and after our conversation, I knew I had to write about it.

That’s why this second edition of IDG is all about it.

It’s a challenging but rewarding field, with tons of opportunities out there if you want to work on it full-time.

Let’s start with the featured job of this edition: Data Engineer, Data Platform at Grammarly.

This organization relies very heavily on NLP algorithms and systems, so this is the perfect company to talk about it.

Why NLP?

Enterprises are increasing their investments in natural language processing (NLP), the subfield of linguistics, computer science, and AI concerned with how algorithms analyze large amounts of language data. According to a new survey from John Snow Labs and Gradient Flow, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third — 33% — said that their spending climbed by more than 30%.

It’s a field that is growing fast, with no sign of slowing down.

Let’s talk about today’s featured job, then.

Featured Job: Data Engineer, Data Platform at Grammarly

My favorite part of the job description:

Data engineers are responsible for making disparate data centrally available to Grammarly’s team members while ensuring query efficiency and efficacy.

The technical requirements of this role include a deep understanding of the querying engine of Spark, the performance (read/write) optimization of Delta tables, data modeling, data transformation at scale, Python or Scala coding abilities, and advanced SQL skills.

The backbone of a Data Engineer at Grammarly’s success will be communication, stakeholder management, and a passion for data.

Time to do some research on the company.

1. Research the company you are applying for

First, let me say this right away: the early Grammarly team (Alex Shevchenko, Max Lytvyn, and Dmytro Lider) built a profitable machine for 8 years, and then in 2017 took external investment of $110M.

Then they raised $90M and finally $200M, for total funding of $400M and a valuation of $13 Billion.

But the keyword here is: PROFITABLE.

According to GrowJo, the company is making $170.4M in annual revenue, and it has more than 850 employees. I don’t know if these figures are true or not, so these could be good questions to ask if you secure an interview with them.

This is a true unicorn: a company valued above $1 Billion, profitable, and growing like crazy year after year.

Right now, more than 30 Million people and 30k teams use Grammarly to improve their writing (myself included), and the opportunity here is huge.

2. Grammarly’s Data Tech Stack

Grammarly has a very interesting tech stack, especially on the Data platform front:

Data Stream: AWS Kinesis / Kafka
Compute: AWS EMR / YARN / Spark (microbatching)
Storage: AWS S3 (Parquet) / Cassandra
Data Access and Analytics Engine: React / Spark SQL
Data Lake: Databricks / Delta
Language: Scala, Akka
Frontend: React
Backend/SSR: React Server, Scala, Akka
Compute: AWS ECS (Fargate)
RDBMS: AWS Aurora
NoSQL and Experiments: AWS DynamoDB
Messaging: RabbitMQ

Some ideas for improvements here? Analyze where the company could be using the new AWS instance classes based on AWS Graviton3.

This could potentially save a lot of money for the company.

Watch James Hamilton (VP and Distinguished Engineer at Amazon) talking about the new Amazon EC2 C7g instances powered by AWS Graviton3:

They have written several times about machine learning and NLP, but the most interesting post from my perspective is this one:

One of my favorite parts of the article is this one:

Solving these problems would mean identifying brand-new applications and research directions—not just improving existing models (although we care about that, too).

Fortunately, we can prioritize zero-to-one projects and go from concept to launch quickly because we’ve spent over a decade developing an ecosystem of linguistic and machine learning tools.

We have well-established practices for accessing and curating data, mitigating bias in our features, and protecting privacy and security. We have built and continue to invest in ML infrastructure to leverage the latest developments across techniques like Transformer-based sequence-to-sequence models, neural machine translation, and massive pre-trained language models.

They are truly on the cutting edge of the Natural Language Processing field, so you will have a unique opportunity to make history here.

Next thing? Networking, networking, networking.

3. Some Engineering people at Grammarly that could be your close colleagues

Joe Xavier (VP of Engineering)
Artem Kolomeetc (Software Engineer, Data Platform)
Michael Keba (Senior Data Engineer)
Anton Terekhov (Data Engineer)
Christian Acuña (Senior Data Infrastructure Engineer)

Other Interesting Data Gigs of the week

Staff Software Engineer/Tech Lead - Matching at People.AI. Chat with Andrey Akselrod (CTO at People.AI). Perhaps my good friends Andy or Jacob could give you a warm intro here.
Senior Manager, Analytics Engineering at 1Password. Chat with Andrew Beyer (Senior Engineering Manager)
Senior Staff Data Engineer - Host Central Data at Airbnb. Perhaps you could discuss this with Zach (more later)
Engineering Manager, ML at Truebill

People you must follow now: Daliana Liu (Predibase) and Zach Wilson (Data@Airbnb)

Daliana Liu has created some of the best podcasts (Apple, Spotify)/YouTube channels out there when we talk about Data Science, how to be a Data Scientist, and more.

For example, in her latest interview, she talked with Laura Gabrysiak, a Sr. Manager of Data Products and Solutions at Visa, who shared so much value that I shared it with my entire team at Riot Games.

You can watch the entire video here:

She always provides a ton of value in every post she writes on LinkedIn, so you have to follow her.

Some of the other content shared by Daliana:

Another of my favorite videos on her channel is her chat with Nick Handel (a former Data Scientist at Airbnb and now co-founder and CEO of Transform, the first centralized 'metrics store' that empowers data analysts to deliver insights):

As for Zach: if you are a Data Engineer and don’t live under a rock, you have heard the name Zach Wilson.

He is always sharing a lot of great advice on LinkedIn, and now on YouTube as well, so don’t waste the opportunity to learn from an incredible and experienced practitioner in the Data Engineering field.

Some of my favorite pieces of content produced by Zach:

Should Data Engineers learn Scala? Hint: A big f… YES.
How to learn Data Engineering in 2022?
How is data engineering different between Airbnb, Facebook, and Netflix? This video will blow your mind, especially after learning how Airbnb does Data Engineering

Some videos to expand precisely on that:

Reviewing Airbnb's Data Engineering Strategy - Why Airbnb Hired More Data Engineers, by Benjamin Rogojan (aka Seattle Data Guy)
Data Engineering Career Tips By Airbnb Data Engineer | Part I, a conversation between codebasics and Zach
Data Engineering Career Tips By Airbnb Data Engineer | Part II

Interesting resources to read, watch, or listen to so you keep improving your Natural Language Processing and Data Analytics knowledge in general

[ARTICLE] Andrew Ng wrote this very insightful article about foundational Machine Learning algorithms
[VIDEO] Introduction to Transformers and BERT on Amazon SageMaker, Suman Debnath, Principal Developer Advocate at AWS
[ARTICLE] Amazon EMR Serverless Now Generally Available – Run Big Data Applications without Managing Servers, by Channy Yun, Principal Developer Advocate at AWS based in South Korea
[BOOK] Natural Language Processing with Transformers: Building Language Applications with Hugging Face, by Lewis Tunstall (Machine Learning Engineer at Hugging Face), Leandro Von Werra (Machine Learning Engineer at Hugging Face), and Thomas Wolf (Chief Science Officer at Hugging Face). Source of the book on Github
[BOOK] Data Science on AWS, by Chris Fregly and Antje Barth from Amazon Web Services
[WEBSITE] Deep Learning Monitor, a very cool website to keep up with the latest papers focused on these fields. Created by Raphael Shu
[VIDEO] Apache Spark NLP Extending Spark ML to Deliver Fast, Scalable, and Unified Natural Language Process, by David Talby, CTO at John Snow Labs, and Alexander Thomas, Principal Data Scientist at Wisecube
[PROJECT] YouTube Clickbait Data Analysis, by Alex Zavalny, a Computer Science student from Drexel University
[PROJECT] Flair, a state-of-the-art framework for NLP

Final words

If you liked what you read here, consider sharing the newsletter with anyone you know, especially if they are a Data Engineer.

Thanks a lot for that.
