Interesting Data Gigs by Marcos Ortiz

Interesting Data Gigs # 4: Data Engineer at Monte Carlo Data

Why you must follow Barr Moses (CEO and co-founder, Monte Carlo Data) and Cassie Kozyrkov (Chief Decision Scientist at Google)

If you are not a subscriber of Interesting Data Gigs, join 1,150 other Data geeks who receive it directly in their inbox each week. It’s free.

This edition is sponsored by Braintrust.

If you are looking for a job, Braintrust can provide you with access to incredible positions with the world’s leading enterprises like Nike, NASA, TaskRabbit, Nestle, Porsche, Wayfair, Experian, and many more.

If you are a company looking to hire top talent globally, Braintrust can help with your talent search as well.

Hi Data Geeks.

This is one of my favorite moments of the week: having the privilege to write to all of you and helping you with ideas on how to find your dream Data-Driven gig.

Every day we see more layoffs and, to be honest, I’m sad and angry at the same time because, as I said here:

Companies experiencing capital-efficient growth don’t do layoffs. Simple as that.

Some companies are doing layoffs, while others are growing like crazy in both monthly revenue and headcount, but at a sane and healthy pace for the company and the business.

One of my favorite companies doing that is Monte Carlo Data, and the role that we will be talking about today is this one:

Data Engineer at Monte Carlo Data

So, what is Monte Carlo Data?

Monte Carlo Data describes itself in a very interesting and distinctive way:

As businesses increasingly rely on data to power digital products and drive better decision-making, it’s mission-critical that this data is accurate and reliable. Monte Carlo, the data reliability company, is the creator of the industry’s first end-to-end Data Observability platform.

Named an Enterprise Tech 30 company, a 2021 IDC Innovator, an Inc. Best Workplace for 2021, and a “New Relic for data” by Forbes, we’ve raised $236M from Accel, ICONIQ Growth, GGV Capital, Redpoint Ventures, and Salesforce Ventures. Monte Carlo works with such data-driven companies as Fox, Affirm, Vimeo, ThredUp, PagerDuty, and other leading enterprises to help them achieve trust in data.

Monte Carlo Data was co-founded by Barr Moses (CEO, BM_datadowntime on Twitter) and Lior Gavish (CTO), and I have to thank them personally because they gave me an incredible level of access to the actual Data Engineers inside the company.

What is it like to be a Data Engineer at Monte Carlo Data?

I wanted to try something different for this edition. The best people to explain what it’s like to work as a Data Engineer in an organization are not me, but the actual folks doing the same job you will be applying for.

So, when I shared this post on LinkedIn, Lior replied and offered to put me in contact with some folks from the Data Engineering crew at Monte Carlo Data.

So, a big thank you for that to Lior and to all the members of the team who provided incredible insights for this publication.

I quickly prepared some questions, shared them in a Google Doc, and they worked as a team to answer them (very fast, BTW, and in the middle of the weekend).

Here are some of those questions, with the answers from Prateek Chawla (Founding Engineer at Monte Carlo Data) and Elya Pardes (Data Engineer at Monte Carlo Data):

What is it like to be a Data Engineer at Monte Carlo Data?

Monte Carlo data engineers are responsible for building our end-to-end data observability platform, as well as helping us build the ML infrastructure to power our anomaly detection models.

Our platform is built on Snowflake, Databricks, dbt, Looker, and Airflow, so data engineers have an opportunity to work with some of the most popular tools in the modern data stack.

Given our speed of growth and expanding use cases, we get the opportunity to work with all sorts of interesting open-source frameworks, languages, and tools. Moreover, the end-to-end nature of our platform makes our work challenging and integration rich - but endlessly rewarding.

We partner closely with the product management and data science team to develop new features and functionalities that help customers reduce time to detection and resolution for data incidents at each stage of the pipeline.

A recent example of a thorny engineering problem we solved was our tool’s field-level lineage, which we wrote about in InfoQ.

As an added bonus of being a data engineer at Monte Carlo - we’re also our ideal customer, so we get to dogfood our product and leverage it in our own data pipelines which is awesome!

What can you take away from this?

They shared the “Modern Data Stack” they use to build Monte Carlo Data’s products:

Snowflake for Data Warehousing
Databricks for everything related to Data Engineering. I assume that includes PySpark, so you need intermediate to advanced knowledge of Apache Spark
dbt for transformations and data modeling
Looker for data visualization and dashboards
Apache Airflow for orchestration
and, according to the job description, they run on Amazon Web Services, so a quick win here could be to test some parts of the data infrastructure hosted on Databricks on the new AWS Graviton-based clusters, which could provide a potential 40% decrease in price
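
To put a rough number on that Graviton idea, here is a back-of-the-envelope sketch in Python. The monthly spend and the share of workloads migrated are hypothetical inputs that I made up for illustration, and the 40% figure is only the potential decrease mentioned above, so treat the result as an optimistic estimate to validate with a real Proof of Concept.

```python
def estimate_monthly_savings(monthly_spend, migrated_share, price_decrease=0.40):
    """Rough monthly savings from moving part of a workload to cheaper clusters.

    monthly_spend:  current monthly cluster cost in dollars (hypothetical input)
    migrated_share: fraction of that spend moved to Graviton-based clusters (0 to 1)
    price_decrease: assumed price reduction on the migrated part (0.40 means 40%)
    """
    if not 0 <= migrated_share <= 1:
        raise ValueError("migrated_share must be between 0 and 1")
    if not 0 <= price_decrease <= 1:
        raise ValueError("price_decrease must be between 0 and 1")
    # Only the migrated part of the bill gets cheaper.
    return monthly_spend * migrated_share * price_decrease


# Hypothetical numbers, for illustration only.
savings = estimate_monthly_savings(monthly_spend=50_000, migrated_share=0.5)
print(f"Estimated savings: ${savings:,.0f} per month")
```

Keeping the three inputs separate makes it easy to rerun the estimate with the real bill and the price difference you actually measure in your tests.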

And another important point: they actively use their own product every single day, which to me is the biggest selling point for joining this team.

What is your favorite thing about being part of this team?

One of the best parts about working at Monte Carlo is the collaborative nature of our work; Monte Carlo’s engineering team is split into three distinct engineering groups, but we partner closely given the nature of our product.

For instance, the team responsible for scaling our lineage features will still get a chance to dive deep into other integrations and elements of the product.

However, my favorite part about my job is the people. You can’t ask for a more intelligent, humble, and passionate group of engineers, product managers, and data scientists to work with.

As a data engineer at Monte Carlo, you’re pushed to do your best work, and sign on every day excited to deliver impact for the customer.

They are a very tight group of folks working on a very interesting problem, and it’s amazing to see how they are operating on a daily basis.

Monte Carlo has raised a lot of money these days, but for many of us, the main motivation is not money. What’s yours? What drives you every single day to work at Monte Carlo?

Aside from getting to work with our awesome team, I’m driven by two things: building a product that’s creating a category - data observability - and delivering immediate value to other data teams.

It’s exciting and powerful to know that your engineering work is driving significant change in the modern data ecosystem by making data analytics more reliable and tackling one of the most underserved areas of the stack: data quality.

As a Data Engineer myself, I’ve had to deal with this problem so many times that I truly understand the value of the product they are building here.

Doing my own research about the company and the big opportunity they have ahead

First, the layoff probability test.

Layoffs look very unlikely here: the company is well funded ($236 million, to be exact), the team is still small (120 people), it has a strong set of clients (JetBlue, Affirm, MasterClass, PagerDuty, Auth0), and it has a very good line of partners that fuel its growth (Snowflake, dbt Labs, Databricks).

I don’t know if the company is profitable at this stage, but it could be, especially after reading some insights from Alex Wilhelm (TechCrunch):

Recall that when Monte Carlo raised a $60 million Series C in August 2021, we reported that it had “doubled its ARR in each of the last four quarters.” Since that round, the company said in a release that it has “more than doubled revenue every single quarter, with an 800 percent increase in revenue year-over-year.” If you grow that quickly, yes, you can raise capital for your software business like it’s still 2021. (How many unicorns meet that bar? Data aren’t clear, but we’re not wildly optimistic that it’s a majority.)

What will Monte Carlo do with its new capital? Lior Gavish, the company’s co-founder and CTO, told TechCrunch in an interview that his startup spent aggressively since its Series C, but that it still had cash in the bank when its latest round came together. The new cash, per Gavish, will be invested “across the board,” with the founding exec citing upcoming investments in engineering, data, product, and go-to-market work in the near future.

And of course, reading the press release on the website provided some unique insights as well:

Over the past 20 months, Monte Carlo has grown from 20 to 120 people and raised four rounds of funding, signifying the exponential growth of the data observability market and the company at large. With their Series D, Monte Carlo has achieved a $1.6B valuation, a testament to the market enthusiasm for the category and the company’s commitment to making data more reliable for their customers.

Since their Series C announcement in August 2021, Monte Carlo more than doubled revenue every single quarter and achieved 100 percent customer retention in 2021.

Over the past six months, Monte Carlo has brought on new customers, including JetBlue, Affirm, CNN, MasterClass, Auth0, and SoFi, with hundreds of customers paying for and driving value from the platform.

One particular thing I noticed here is the participation of Accel as an investor.

Why? Because this firm follows a very particular set of rules when it invests and favors capital-efficient companies, as it did before with 1Password and Atlassian.

So, this is a very good sign that Monte Carlo Data is doing remarkable things.

Let’s discuss some ideas on how to approach this job application (THE REAL MEAT)

First: you must read the Data Quality Fundamentals Book Preview. This is not a should-read, this is a must-read. Believe me: you will love it.
Second: as I shared above, you could create a Proof of Concept with AWS Graviton-based Databricks clusters and show how this could help the company save money every month.
Third: if you are working in a big organization as a Data Engineer, why not present Monte Carlo Data’s product to your peers and your boss? This could have a double benefit for you. If you are applying for this job at Monte Carlo Data, it could open the path to a potential client for the company. What an amazing way to start in a new org: bringing a client with you. Or, if you want to stay in your company, it could help the organization tackle the data quality mess inside it. A win-win in every scenario.
Fourth: Monte Carlo Data is using Apache Airflow. So, another quick win could be to take the amazing insights that Marc Lamberti (Head of Customer Education at Astronomer) shares every day on LinkedIn, and evaluate how they could help to build more efficient DAGs at Monte Carlo Data.

So, take these ideas, expand with your own, and make sure to send the job application to join this crew.

Monte Carlo Data is a category-building company, so I know for sure you will be in good hands here.

BTW, the hiring manager for this position is Uri Shahar (Head of Data at Monte Carlo).

🚨 Join the Interesting Data Gigs Talent Network 🚨

It’s the perfect time to be part of The Interesting Data Gigs Talent Network, where you will find amazing Data Analytics jobs from companies like Netflix, Apple, Consensys, and many more.

Let’s change the game together: instead of people applying to companies, companies will pitch to you, so don’t wait another moment and join today.

People to follow: Barr Moses and Cassie Kozyrkov

In the case of Barr, she is always sharing a lot of insightful posts related to data quality, data downtime, and more.

So, if you are in the Data Analytics space, you must follow her on LinkedIn and Twitter.

And if you are working in the field of Artificial Intelligence or Data Science, you must follow Cassie as well.

She is always sharing amazing articles and research about these topics, so you will really enjoy her content on LinkedIn.

Interesting resources

Even more pi in the sky: Calculating 100 trillion digits of pi on Google Cloud, by Emma Haruka Iwao (Developer Advocate at Google)
How McAfee Leverages Databricks on AWS at Scale, by Hashem Raslan (Manager, Platform & Software Engineering at McAfee) and Kanishk Mahajan (Principal, Solutions Architect at AWS)
Data Observability: Monte Carlo: Use Cases, by John Steinmetz (VP of Data and Analytics at shiftkey)
Gokul Rajaram on designing your product development process, when and how to hire your first PM, a playbook for hiring leaders, getting ahead in your career, how to get started angel investing, and more, by Lenny Rachitsky
BigQuery’s HyperLogLog++ as a Snowflake Java UDTF, by Felipe Hoffa (Data Cloud Advocate at Snowflake)

I just wanted to thank the whole Monte Carlo Data team for their time and incredible input to this publication.

From the bottom of my heart, thank you Barr, Molly, Lior, Biswaroop, Uri, Elya, Prateek, and everyone involved in this.
