By George Sanders

What are the sorts of problems a B2B marketing agency can help you fix? In the first of a mini-series, George Sanders of Earnest goes back to basics.

When you’re thinking about employing the services of a B2B marketing agency, one of your first questions should be, “Do I actually need one?” Frustratingly, there’s often no one to ask, so here’s a guide on what to consider for B2B brand marketers left feeling stranded.

The right agency can do incredible things for your brand by providing experience and expertise that supplements, augments and amplifies your own, and solving the problems that you can’t internally. But you need to know what those problems are first; you don’t want to approach an agency with a vague, “how can you help me?” If you say to a car salesperson, “I think I need a car,” they’ll probably reply, “here’s our fastest, most expensive and reddest one.”

By first understanding and defining your goals, requirements, priorities and budgets, you can ensure you find a suitable agency partner to meet your needs efficiently, effectively and enjoyably – and avoid succumbing to greasy, glad-handing hustlers.

Defining your objectives

What are your needs and objectives? Think about both business objectives and marketing objectives. The former should already exist as a commercial strategy, and the latter should be captured in a marketing plan that sets out the opportunities, challenges and requirements that will shape the agency brief.

If you don’t have a marketing plan, create one yourself or employ the services of a freelance marketing consultant. To progress without one risks directionless activity, no clear line to commercial objectives, and wasted time and budget.

Business objectives

Business objectives are too often overlooked, forgotten or unknown by marketing teams. You should find them in company strategy presentations or board/investor reports, and they might focus on international expansion, improving profitability or growing product lines.

By clearly connecting them to marketing strategy, marketers can ensure their work is more visibly and meaningfully contributing to actual business success, and that the metrics provide value outside of the marketing department.

Marketing objectives

Marketing objectives should be part of a marketing strategy and plan that helps make the business objectives happen. If your business objective is to grow revenue in the US market by 50%, your marketing objectives should work to make that happen (for example, by increasing brand awareness in the US by X%, provisioning for demand generation and providing local sales teams with tools and assets).

By defining these objectives, you can then determine what’s possible internally, and where you need external support. It may be that what you’re looking to achieve isn’t something that an agency can help with (sometimes businesses need to ‘get their house in order’ before they go outside).

For brands without a marketing strategy, marketing consultants can be invaluable. The right consultant will develop a picture of what you have, what you need, how to achieve it, and where (if at all) agency support would be most valuable. A common objection is, “Christ, they charge how much per day?” But the upfront cost of a marketing strategy could save the cost of going with the wrong agency. This is the architect before bringing the builders onsite.

Define your agency needs

Next, determine what kind of agency is required. This depends on objectives, your teams’ expertise and your work capacity.

B2B agency services range from strategic big-thinking (brand story and identity, campaign strategies, creative ideas, full-funnel development and production – this is where we at Earnest mostly sit), to more tactical and production-only (tactical or business-as-usual content and asset production, digital marketing, event production).

There are agencies that provide both ends of the spectrum; others specialize in one or the other. Understanding what you need and where the agency provides the most value will help you decide which one is right for you. Another clumsy house-building analogy: a handyman can do a bit of everything, but for a swimming pool you’ll probably want a specialist. The task dictates the service, not the other way around.

It’s important to realize what agencies cannot help you with, such as selling the need for external agency support to stakeholders; promoting the role of marketing internally; securing budget; and developing strategies and objectives. They can offer guidance and support, but these typically wouldn’t be in their remit.

Defining KPIs and the brief

Next, how will you measure the success of your work together?

Project KPIs should align with marketing strategy, determined by the task. That could be ‘increase visitors to the website by X%’ or ‘deliver a new brand identity by the end of December.’ This will help you find the right agency, help agencies determine if they’re right for you and inform the brief.

The importance of the written brief to agencies can’t be overstated. The quality of work from an agency is directly related to the quality of the brief; incorrect information, missing details and changing direction will lead to work that’s inaccurate, irrelevant and eventually costly.

Once the above are in place, determine what the engagement process looks like. Is it a request for proposal or a full-on strategic/creative pitch?

Engaging with an agency can supercharge your marketing with robust insight, informed strategy and inspiring creative. But don’t run into the unknown without foundational elements in place. The reality of the working world (aggressive deadlines, demanding stakeholders and tight budgets) makes it difficult to have these in place when you need them, but taking a step back and communicating the importance of planning and objective-setting can pay off disproportionately.

Feature Image Credit: Matthew Waring via Unsplash

Sourced from The Drum

By Rebecca Brooks

Most shoppers today aren’t loyal; they’re always looking for the next best thing that will fit their exact needs. This evolution has been occurring for years, driven by powerful forces: technology, category disruption, socioeconomics and more. Shoppers have exponentially more choices and information at their fingertips, so they are making more informed decisions. World events, such as the global recession of 2008, ongoing political upheaval, the arrival of Covid-19 and now looming inflation, are accelerating a fundamental change in the way people shop. They are more willing than ever before to try new brands, products and categories.

We’ve coined the term “shopper promiscuity” for this evolution, to indicate not a moral judgment but a distinct movement away from traditional brand loyalty. In our new book, Influencing Shopper Decisions, my co-author and I outline four key forces that are driving shopper promiscuity: innovation, unlimited access to information, the need for personal expression and the reprioritization by shoppers of what’s important to them. Inflation can now be added to today’s list of factors that are driving down brand loyalty and creating a new kind of shopper.

First, let’s talk a bit about the initial four forces that we identify in the book so we can understand how new inflationary trends are playing into the picture.

Innovation: Advancements in technology are driving innovation at a rapid pace, causing consumers to expect constant change and improvement as they seek the latest, greatest thing. Traditions and nostalgia no longer play into the picture for many shoppers, as they believe they deserve something new, shiny and leading-edge every single time they buy.

Unlimited access: From DTC products to the ubiquitous Amazon, new brands and shopping channels are constantly emerging. Shoppers are becoming increasingly accustomed to the new and unorthodox, and they don’t hesitate to adopt shopping behaviour that takes advantage of these new channels and technologies.

Need for expression: Consumers understand that what they buy says something about themselves, and they use purchases to tell their “story.” Consumers are looking for more than form, function, price and performance—they want to feel a connection to what they are buying and show their peers/social networks how their purchases contribute to their identity.

Reprioritization: People are redefining what “value” means to them, and it consists of more than just price. Over the past few months—years even—we’ve seen a major shift in how people think about their safety in the world, social justice, corporate responsibility and “needs” vs. “wants.” And it makes them reprioritize their resources.

Inflation is now also influencing the reprioritization of resources, driving shopper promiscuity in a very tangible way: at the pocketbook. In the United States alone, inflation is at its highest rate in 40 years. It stands to reason that this will have a huge impact on the way people spend money. In addition to looking for purchases that are innovative, reflect their personality or are a product of reprioritization, shoppers are now looking for products that will fit a new spending budget forced on them by rising prices.

Consumers are willing to try new things that are less expensive or more readily available to them as supply chain issues persist and impact inflation overall. Worry about financial instability can make people more cautious and less likely to spend more on a premium product. Higher prices on the things they need leave them with less disposable income as dollars become earmarked for more “survival” type purchases. It’s a trade-off that is boosting shopper promiscuity once again.

While this inflationary environment does present an opportunity for discount brands and competition based on pricing, that isn’t the only thing brands should be focusing on to be successful. Value adds, such as retailer loyalty rewards, can appeal to shoppers, as can educational materials, pricing transparency and information from brands on ways to make dollars go further. From a business standpoint, brands should focus on building an understanding of a changing customer in order to meet shopper needs head-on. It’s important to also remember that every category is being affected differently by inflation. By conducting the right market research, brands can gain insights into where their marketing or product development spend will have the most impact with their audiences, even as their behaviour shifts.

Feature Image Credit: getty

By Rebecca Brooks

Founder and CEO of market research consultancy, Alter Agents; believer that powerful insights can change businesses. Read Rebecca Brooks’ full executive profile here.

Sourced from Forbes

By Zachary McAuliffe

Not every iPhone app needs access to your photos and contacts. Here’s how to stop them.

When you use an iPhone app for the first time, you might be asked to give the app access to other features on your phone like your camera. If you’re like me and just want a new app to work, you’ve probably tapped “Allow” without a second thought. However, you might not realize that tapping “Allow” gives the app access to other information about you and those closest to you.

Those apps could be sharing your data with digital marketing and ad tech companies without your knowledge. Companies like Apple and Facebook have faced lawsuits and fines for allegedly misusing customer data.

If you’ve granted a third-party iPhone app certain permissions, you can revoke them at any time. Here’s how to stop third-party apps from accessing your data.

How to change third-party app permissions

Here’s how to change permissions in iOS 15 and later:

1. Tap Settings on your iPhone.

2. Tap Privacy.

In the Privacy menu you can select functions like Contacts, Photos and Camera to see which third-party apps have requested permission to access this information. Tapping Contacts might show that a note-taking app has access to the names and numbers of people in your contacts list. You can tap the slider next to these apps to halt access.

More in the Privacy menu

In addition to revoking app permissions in the Privacy menu, you can also customize which apps can access your location data. If you tap Location Services near the top of the menu, you can turn these services on or off for all or some apps on your phone. You can also tap the Share My Location menu to enable or disable Find My iPhone, and to choose which contacts you share your location with.

There’s also an option in the Privacy menu called Apple Advertising. Tap this to view Apple’s ad targeting information, and turn these personalized ads on or off. Apple said turning personalized ads off will make ads you see in the App Store, Apple News and Stocks less relevant to you, but it might not reduce the number of ads you see in those apps.

For more, check out how to stop iPhone apps from tracking you, how to use Sign In With Apple to improve your privacy and the best iPhone VPNs.

Feature Image Credit: Patrick Holland/CNET

Sourced from CNET

By Taruka Srivastava 

Pinterest’s latest campaign is urging people to believe in themselves by advising them to ‘Don’t Don’t Yourself’

The campaign aims to highlight how people can be their own worst enemies, as we have the power to silence negative feelings of fear and self-censorship.

Five spots – ‘Fear of Failure,’ ‘Judgment,’ ‘Doomscrolling,’ ‘Procrastination’ and ‘Inner Critic’ – have been launched. Every spot features protagonists alongside their negative twin, who is always criticizing, being negative and condescending – but the ads show how Pinterest helps all protagonists to overcome those fears by just doing things and believing in themselves. The single-take cinematic films are shot by acclaimed director Kim Gehrig.

In the ‘Inner Critic’ spot, the negative twin of the protagonist is trying to demotivate the protagonist by telling her how her art during childhood wasn’t up to the mark and her junior ballet performance wasn’t “on point,” and then mocks her personal style. The protagonist, however, shuts the twin back in the cupboard and confidently says she can pull off what she is wearing.

The campaign will run in the US, UK and Germany across TV, cinema, video on demand (VOD), out-of-home (OOH), digital out-of-home (DOOH), social and media partnerships. Developed in partnership with award-winning UK creative studio Uncommon, the campaign introduces Pinterest as the inspiring ‘anti-don’t.’ The complementary media strategy was developed in partnership with global media agency Mediahub.

Andréa Mallard, Pinterest chief marketing officer, said: “Our latest campaign highlights how Pinterest is a different side of the internet, where you can focus more on doing and less on viewing, where you can find what you love and forget about likes and where you can plan your life and try something new, free of judgment.”

Credits

Project Name: Don’t Don’t Yourself

Creative Studio: Uncommon

Client: Pinterest

Production company: Somesuch

Director: Kim Gehrig

Producer: Lucy Gossage

Executive producer: Chris Watling

DOP: Kasper Tuxen

Production designer: KK Barrett

Costume designer: April Napier

Casting: Jody Sonnenberg

Service company: The Lift

Editor: Fouad Gaber @ Trim

Post production: Time Based Arts

Colorist: Simone Grattarola

VFX: Stephen Grasso

Post producer: Sian Jenkins

Soundtrack composer: Soundtree Music

Composer: Benjamin Jones

Audio post-production: Soundtree Music

Media agency: Mediahub

Sourced from The Drum

By Douglas A. McIntyre

Brand value and loyalty studies have become a major part of the American marketing industry. Several of these studies involve brand value. Interbrand is the best known of these. It considers 100 brands, based on its own mix of criteria. Not surprisingly, tech brands such as Apple and Microsoft are in the lead with brands worth well into the hundreds of billions of dollars.

Another take on brands is based on reputation. Among the most well known of these is the Axios/Harris study, which ranks 100 companies on reputation. Grocery, retail and food brands tend to be near the top, with Trader Joe’s in first place. The Trump Organization is at the bottom.

Still another cut at brand research is the Brand Keys Loyalty Leaders, another list of 100. Its yardsticks are the companies that have the “best-practice guidelines for creating and nurturing customer loyalty.” The study covered 1,624 brands in 142 categories.

One of the points of the study is that brand reputation results have moved back to “normal” after some distortion over the COVID-19 pandemic’s first two years. Robert Passikoff, Brand Keys founder and president, commented: “The significant re-distribution of loyalty identified in the 2022 loyalty rankings are leading indicators of what a return-to-normalcy marketplace will look like.”

The study looks at brand loyalty rank and how much this has changed year over year by brand. The top brands on the loyalty rank list are Apple and Amazon, which typically rank high in all the brand studies. They were followed by Domino’s and Disney+. Although still considered a brand to which people are loyal, Tesla was at the bottom of the list.

In terms of brands that gained and lost the most in the study, State Farm rose 25 places on the list. MSNBC was second, rising 19 places.

The brand that lost the most ground was Purell, which dropped by 52 places. Clorox fell by 41. The only reasonable explanation is that when the COVID-19 pandemic was at its worst, these were products people used to protect themselves from the spread of the virus.

Sourced from 24/7 Wall St

As the media landscape evolves, so do our preferences for communication. Rose Skews at Favoured delves into the changing world of online comms, audience segmentation and what it takes to engage the younger generation.

With the rise of short-form video content on platforms including TikTok and Instagram, three-minute videos are affecting our attention span. As online communications move toward Facebook Messenger, WhatsApp and social media direct messaging, the phasing out of email as a communication platform begs the question: are consumers still engaging with email marketing?

The short answer is yes. The long answer? Also yes, so long as you’re doing it well and know who you’re doing it for. More than half of generation Z and over a third of millennials still enjoy getting brand emails. With that in mind, let’s look at how you can create engaging emails that even gen Z will want to read.

Don’t be boring

Dull, plain text emails that waffle on won’t engage your impatient audience. So, what can you do?

  • Use short emails: try testing short-form emails. Get to the point of your email quickly and efficiently
  • Hook them in: create hooks for your subject lines and the headers that highlight the crux of the email. This will help with your open and click-through rate
  • Get personal: adding in a recipient’s name can be a little technical at first but it’s totally worth it. If you’re able to personalize things such as names, this makes your emails more trustworthy and engaging
  • Include a strong call to action (CTA): ‘Read more’ and ‘Discover now’ just won’t cut it. Add a little spice to CTAs, like ‘You won’t want to miss this’

Your copy and message need to be clear, concise and interesting. Try bringing your tone to a more personable level to better engage with your audience.

Flows and broadcasts

Email marketing can be a great tool if you can plan and set it up well. At Favoured, we typically split email marketing into two: flows and broadcasts.

Flows are automations where you can segment your audience, create cohorts and triggers, and (after an initial set-up) run them continuously. Broadcasts are monthly newsletters that give you an opportunity to update your audience on anything new.

You can create manual, ad hoc campaigns for an extra burst of comms (around a sale, for instance).

Flows

With email marketing and segmentation, you can capture audiences’ personas. Your main cohorts will be active users, inactive users and new users. Email flows for active users might be:

  • Repeat purchase: thank the customer for their continued support. This flow normally has a refer-a-friend scheme in later emails
  • ‘Superusers’: especially for app companies – when a customer has triggered an event within the app a certain number of times and you want to maintain their engagement

Inactive users will have email flows such as:

  • Abandoned cart: with an average conversion rate of 80%, this is one of the most important flows you can set up
  • Re-engage: this flow should focus on offering small discounts as a temptation to get customers back on track. If you have an app, try explaining a new feature as an incentive to click

New users will have email flows such as:

  • Onboarding: this is your opportunity to show the customer who you are and what you can offer them
  • ‘Web catch all’: if anyone submits a form on your website you can catch them here – a great place to convert them to onboard and/or purchase

Now that we have the flows sorted, let’s look at how you can bring engagement with monthly newsletters.

Broadcasts

The trick is to break your newsletter into sizeable chunks. Try highlights, news and testimonials. You could even connect with national marketing days to make sure you’re hitting key dates relevant to your brand.

News and updates sections give you the opportunity to chat with customers about what you’ve been up to. Adding a testimonial or two brings credibility to your brand and/or product, while adding an extra level of desire.

Who’s doing it well?

Some companies have been smashing email marketing. One is Estrid. It had a hard task ahead of it as its main target audience is gen Z and millennials – the prime suspects for a lack of attention span. It has smashed it: its emails are engaging; it has brought movement into its design with the use of gifs; and its tone is personable and fun.

Maybe you were on the fence about email marketing. We’re hoping that now you see the value it could add to your marketing strategy. Email marketing can capture and engage your audience in a different way than social media. If you ever need any advice, the expert team at Favoured is always available to help.

Feature Image Credit: Volodymyr Hryshchenko via Unsplash

By Rose Skews

Sourced from The Drum

By Gergely Orosz

Q: I’m hearing more about data engineering. As a software engineer, why is it important, what’s worth knowing about this field, and could it be worth transitioning into this area?

This is an important question as data engineering is a field that is, without doubt, on fire. In November of last year, I wrote about what seemed to be a Data Engineer shortage in the issue, More follow-up on the tech hiring market:

“Data usage is exploding, and companies need to make more use of their large datasets than ever. Talking with hiring managers, the past 18 months has been a turning point for many organizations, where they are doubling down on their ability to extract real-time insights from their large data sets. (…)

What makes hiring for data engineers challenging is the many languages, technologies and different types of data work different organizations have.”

To answer this question, I pulled in Benjamin Rogojan, who also goes by Seattle Data Guy on his popular data engineering blog and YouTube channel.

Ben has been living and breathing data engineering for more than 7 years. He worked for 3 years at Facebook as a Data Engineer and has gone independent following his work there. He now works with both large and small companies to build out data warehousing, develop and implement models, and take on just about any data pipeline challenge.

Ben also writes the SeattleDataGuy newsletter on Substack, a publication for learning about end-to-end data flows, Data Engineering, MLOps, and Data Science. Subscribe here.

In this article, Ben covers:

  1. What do data engineers do?
  2. Data engineering terms.
  3. Why data engineering is becoming more important.

In part 2 – coming next week and already out for full subscribers – we will additionally cover:

  • Data engineering tools: an overview
  • Where is data engineering headed?
  • Getting into data engineering as a software engineer

With that, over to Ben:


For nearly a decade I have worked in the data world. Like many, in 2012 I was exposed to HBR’s Data Scientist: The Sexiest Job of the 21st Century. But also like many, I found data science wasn’t the exact field for me. Instead, after working with a few data scientists for a while I quickly realized I enjoyed building data infrastructure far more than creating Jupyter Notebooks.

Initially, I didn’t really know what this role was that I had stumbled into. I called myself an automation engineer, a BI Engineer, and other titles I have long forgotten. Even when I was looking for jobs online I would just search for a mix of “SQL”, “Automation” and “Big Data,” instead of a specific job title.

Eventually, I found a role called “data engineer” and it stuck. Recently, the role itself has been gaining a little more traction, to the point where data engineering is growing more rapidly than data science roles. Also, companies like Airbnb have started initiatives to hire more data engineers to increase their data quality.

But what is a data engineer and what do data engineers do for a company? In this article, we dive into data engineering, some of its key concepts and the role it plays within companies.

Where do data engineers “sit”? They’re typically working with software engineers and data scientists, but much less with product managers. In this article, we dive deeper into the data engineering field.

1. What do data engineers do?

How do you define data engineering? Here’s how data engineer Joe Reis specifies this term in his recently released book, Fundamentals of Data Engineering:

“Data engineering is the development, implementation, and maintenance of systems and processes that take in raw data and produce high-quality, consistent information that supports downstream use cases, such as analysis and machine learning.

Data engineering is the intersection of security, data management, DataOps, data architecture, orchestration, and software engineering. A data engineer manages the data engineering lifecycle, beginning with getting data from source systems and ending with serving data for use cases, such as analysis or machine learning.”

In short, data engineers play an important role in creating core data infrastructure that allows for analysts and end-users to interact with data which is often locked up in operations systems.

For example, at Facebook there is often either a data engineer or a data engineering team which supports a feature or business domain. Teams that support features and products are focused on helping define which information should be tracked, and then they translate that data into easy-to-understand core data sets.

A core data set represents the most granular breakdown of the transactions and entities you are tracking from the application side. From there, some teams have different levels of denormalization they might want to implement. For example, they might flatten any nested columns so that analysts don’t have to do so themselves.
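To make the nested-column example concrete, here is a minimal sketch of flattening a nested record into a single level. The event shape, the field names and the `flatten` helper are all hypothetical, chosen only for illustration; real pipelines would usually do this in SQL or in the warehouse's own tooling.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with '_'."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the nested column and carry its name as a prefix.
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Hypothetical application event with a nested "customer" column.
event = {
    "order_id": 101,
    "customer": {"id": 7, "address": {"city": "Seattle", "region": "WA"}},
    "total": 42.5,
}

flat_event = flatten(event)
print(flat_event)
```

After flattening, an analyst can select `customer_address_region` directly instead of unpacking the nested structure in every query.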

Also, many teams will set standards on naming conventions in order to allow anyone who views the data set to quickly understand what data type a field is. The basic example I always use is the “is_” or “has_” prefix denoting a boolean. The purpose of these changes is to treat data as a product; one that data analysts and data scientists can then build their models and research from.
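A convention like this is easy to check automatically. The sketch below assumes a schema is available as a simple mapping of column name to type name; the schema, the type label `"boolean"` and the helper name are illustrative assumptions rather than any particular tool's format.

```python
BOOLEAN_PREFIXES = ("is_", "has_")


def misnamed_boolean_columns(schema: dict[str, str]) -> list[str]:
    """Return boolean columns whose names lack an 'is_' or 'has_' prefix."""
    return sorted(
        name
        for name, dtype in schema.items()
        if dtype == "boolean" and not name.startswith(BOOLEAN_PREFIXES)
    )


# Hypothetical schema: column name -> type name.
employee_schema = {
    "employee_id": "integer",
    "is_current": "boolean",
    "has_manager": "boolean",
    "active": "boolean",  # breaks the convention
    "title": "string",
}

violations = misnamed_boolean_columns(employee_schema)
print(violations)
```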

Our team at Facebook produced several core data sets that represented recruiting and People data. These core data sets allowed analysts and data scientists to see the key entities, relationships and actions occurring inside those business domains. We followed a core set of principles which made it clear how data should be integrated, even though it was being pulled from multiple data sources, including internally developed products and SaaS solutions.

What are the goals of a core data set? Here are the three main ones.

1. Easy to work with. Data sets should be easy to work with for analysts, data scientists and product managers. This means creating data sets that can be easily approached without a high level of technical expertise to extract value from said data. In addition, these data sets standardize data so that users don’t have to constantly implement the same logic over and over again.

2. Provide historical perspective. Many applications store data which represents the current state of entities. For example, they store where a customer lives or what title an employee has. Not all applications keep a record of how these values change over time, so data engineers must create data sets that capture that history.

The traditional way to track historical changes in data was to use what we call Slowly Changing Dimensions (SCD). There are several different types of SCD, but one of the simplest to implement is SCD type 2 which has a start and end date, as well as an “is_current” flag.

An example of an SCD is a customer changing their address when they move home. Instead of just updating the current row which stores the address for said customer or employee, you will:

  1. Insert a new row with the new information.
  2. Update the old row, so it is no longer marked current.
  3. Ensure the end date represents the last date when the information was accurate.

This way, when someone asks, “how many customers did we have per region over the last 3 years,” you can answer accurately.
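The three steps above can be sketched in a few lines. This is a minimal in-memory illustration, not a production pattern: the `AddressRow` fields, the helper names and the choice to set the end date to the day before the change are all assumptions made for the example. In practice this logic usually lives in SQL or in a warehouse's merge tooling.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class AddressRow:
    customer_id: int
    region: str
    start_date: date
    end_date: Optional[date]  # None while the row is still current
    is_current: bool


def apply_address_change(
    rows: list[AddressRow], customer_id: int, new_region: str, change_date: date
) -> list[AddressRow]:
    """Return a new history with an SCD type 2 change applied."""
    updated = []
    for row in rows:
        if row.customer_id == customer_id and row.is_current:
            # Steps 2 and 3: mark the old row as no longer current and set its
            # end date to the last day it was accurate (assumed: the day
            # before the change took effect).
            row = replace(
                row, end_date=change_date - timedelta(days=1), is_current=False
            )
        updated.append(row)
    # Step 1: insert a new row holding the new information.
    updated.append(AddressRow(customer_id, new_region, change_date, None, True))
    return updated


def region_on(rows: list[AddressRow], customer_id: int, as_of: date) -> Optional[str]:
    """Look up the region a customer was in on a given date."""
    for row in rows:
        if row.customer_id != customer_id:
            continue
        if row.start_date <= as_of and (row.end_date is None or as_of <= row.end_date):
            return row.region
    return None


history = [AddressRow(1, "West", date(2020, 1, 1), None, True)]
history = apply_address_change(history, 1, "East", date(2022, 6, 1))

print(region_on(history, 1, date(2021, 3, 15)))
print(region_on(history, 1, date(2023, 1, 1)))
```

Because the old row is kept rather than overwritten, a question about any past date can still be answered from the same table.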

3. Integrated. Data in companies come from multiple sources. Often, in order to get value from said data, analysts and data scientists need to mesh all the data together, somehow. Data engineers help by adding IDs and methods for end-users to integrate data.

At Facebook, most data had consistent IDs. This made it feel like we were being spoiled, as consistent IDs made it very easy to work with data across different sets.

When data is easy to integrate across entities and source systems, it lets analysts and data scientists easily ask questions across multiple domains, without having to create complex – and likely difficult to maintain – logic to match data sets. Maintaining custom and complex logic to match data sets is expensive in terms of time and its accuracy is often dubious. Rarely have I seen anyone create a clean match across data that’s poorly integrated.

One great example I heard recently was from Chad Sanderson of Convoy. Chad explained how a data scientist had to create a system to mesh email and outcome data together and it was both costly and relied on fuzzy logic which probably wasn’t as accurate as possible.

At Facebook, even systems like Salesforce, Workday and our custom internal tools all shared these consistent IDs. Some used Salesforce as the main provider and others used internal reporting IDs. But it was always clear which ID was acting as the unique ID to integrate across tables.
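As a rough illustration of why a shared ID matters, the sketch below joins rows from two made-up source systems on a common `employee_id`. The rows, field names and helper are hypothetical and do not reflect any real system's schema; the point is only that the match is an exact key lookup, with no fuzzy logic.

```python
from typing import Any

# Hypothetical extracts from two source systems that share "employee_id".
recruiting_rows = [
    {"employee_id": 1, "recruiter": "Ana"},
    {"employee_id": 2, "recruiter": "Raj"},
]
hr_rows = [
    {"employee_id": 1, "title": "Engineer"},
    {"employee_id": 3, "title": "Analyst"},
]


def join_on_id(
    left: list[dict[str, Any]], right: list[dict[str, Any]], key: str
) -> list[dict[str, Any]]:
    """Inner join two lists of rows on a shared, consistent ID."""
    right_by_id = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_id.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


joined_rows = join_on_id(recruiting_rows, hr_rows, key="employee_id")
print(joined_rows)
```

Without a consistent ID, the same join would need matching on names or email addresses, which is where the costly and less accurate fuzzy logic comes in.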

But how can data engineers create core data sets which are easy to use?

Now that we have discussed the goals, let’s outline some of the terms you’ll hear data engineers use to make your data more approachable.

2. Data engineering terms

Let’s explain some commonly used data engineering terms.

Some of the more common data engineering terms

ETL/ELT/Data Pipelines

You will often hear data engineers describe most of their job as moving data from point A to point B. We do this using data pipelines.

Data pipelines are generally structured as either an Extract, Transform, Load or an Extract, Load, Transform (ETL vs ELT). Of course, there are other design patterns we may take on, such as event pipelines, streaming and change data capture (CDC). This is why many of us often just generalize and use the term ‘data pipelines.’ However, most pipelines are in the form of the steps below.

E – Extract. The extract step involves connecting to a data source such as an API, automated reporting system or file store, and pulling the data out of it.

For example, one project I worked on required me to pull data from Asana. This meant I needed to create several components to interact with Asana’s multiple API endpoints, pull out the JSON and store it in a file service.

I will say Asana’s API is not built for bulk data extracts. There were many cases where I would have to get all the new project IDs, combine them with the ones already in my current table, and then set up a second set of calls using each project ID to get all the tasks attached to that project. This was plainly far more cumbersome than just saying, “give me all the tasks in this organization.”
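The two-stage pattern described above can be sketched as follows. This is an illustrative sketch only: it does not use Asana’s real API, and every function name and the in-memory FAKE_TASKS data are hypothetical stand-ins for real API calls.

```python
def extract_tasks(list_new_project_ids, list_tasks_for_project, known_project_ids):
    """Two-stage extract: gather project IDs first, then fetch tasks per project."""
    # Stage 1: combine IDs already in our table with newly discovered ones.
    project_ids = sorted(set(known_project_ids) | set(list_new_project_ids()))
    # Stage 2: one call per project, since there is no bulk "all tasks" call.
    tasks = []
    for project_id in project_ids:
        tasks.extend(list_tasks_for_project(project_id))
    return tasks


# Hypothetical stand-ins for the real endpoints.
FAKE_TASKS = {
    "p1": [{"id": "t1", "project": "p1"}],
    "p2": [{"id": "t2", "project": "p2"}, {"id": "t3", "project": "p2"}],
}


def fake_list_new_project_ids():
    return ["p2"]  # pretend only newly created projects are returned


def fake_list_tasks(project_id):
    return FAKE_TASKS.get(project_id, [])


tasks = extract_tasks(fake_list_new_project_ids, fake_list_tasks, known_project_ids=["p1"])
```

Passing the API calls in as functions keeps the looping logic testable without network access. A real extract would also need paging, rate limiting and retries.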

T – Transform. The transform step (especially in an initial pipeline) will likely standardize the data (format dates, standardize booleans, etc.), and sometimes start to integrate data by adding standardized IDs, deduplicating data and adding more human-readable categories.
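A baseline transform of this kind might look like the sketch below. It is a minimal example under stated assumptions: the field names (customer_id, signup_date, is_active), the accepted date formats and the rule of keeping the first row per ID are all hypothetical choices for illustration.

```python
from datetime import datetime

TRUE_VALUES = {"true", "t", "yes", "y", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "0"}


def to_bool(value):
    """Map common truthy/falsy spellings to a real boolean (None if unknown)."""
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None


def to_iso_date(value, formats=("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")):
    """Try each known format and return an ISO 8601 date string (None if none fit)."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(records):
    """Standardize fields and keep the first row seen for each customer_id."""
    seen = set()
    cleaned = []
    for rec in records:
        row = {
            "customer_id": str(rec["customer_id"]).strip(),
            "signup_date": to_iso_date(rec["signup_date"]),
            "is_active": to_bool(rec["is_active"]),
        }
        if row["customer_id"] in seen:
            continue  # deduplicate on the standardized ID
        seen.add(row["customer_id"])
        cleaned.append(row)
    return cleaned


raw = [
    {"customer_id": " 42", "signup_date": "03/15/2021", "is_active": "Y"},
    {"customer_id": "42", "signup_date": "2021-03-15", "is_active": "true"},
    {"customer_id": "7", "signup_date": "01.01.2020", "is_active": 0},
]
clean = transform(raw)
```

Note that deduplication runs after the ID is standardized. Otherwise " 42" and "42" would be treated as different customers.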

Transforms can also be more complex, but the examples above are standard. For example, across all the various core data sets our team was managing, we always defined which ID was the core ID for a particular data set. Whether the core ID was an internal Facebook ID or an external SaaS ID, we would be clear which one we used as “core”.

Because we were clear on the core ID, if we ended up having multiple IDs in a table, we would remove the extra ones to avoid further confusion downstream.

From my own philosophical perspective, creating data as a product means creating a product someone with limited knowledge of your team’s data can approach and understand. By understanding, I mean they can quickly grasp what fields mean and how one table relates to another.

Understanding the data set is an important step in a baseline transform. There are more complex transforms which can occur for analytical purposes.

L – Load. This step is meant to load data into a table in the data warehouse. There isn’t anything fancy here. At Facebook, we loaded into our team’s “data warehouse,” but you could be loading into Snowflake, Databricks, Redshift, BigQuery or others.
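To make the load step concrete, here is a small sketch that uses Python’s built-in sqlite3 module as a stand-in for a real warehouse. The table and column names are hypothetical, and an actual warehouse would be loaded through its own connector or bulk-load mechanism.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id TEXT, signup_date TEXT, is_active INTEGER)"
)

rows = [
    {"customer_id": "42", "signup_date": "2021-03-15", "is_active": True},
    {"customer_id": "7", "signup_date": "2020-01-01", "is_active": False},
]

# Columns are named explicitly, so the load does not depend on field order.
conn.executemany(
    "INSERT INTO customers (customer_id, signup_date, is_active) "
    "VALUES (:customer_id, :signup_date, :is_active)",
    rows,
)
conn.commit()

loaded = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Naming the columns in the insert is a deliberate choice: if an upstream source reorders its fields, the values still land in the right columns.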

Data Modelling

Software engineers who have built their own database will be familiar with some aspects of data modelling. Generally, these are what is known as OLTP (online transaction processing) systems, which are developed to be fast and robust for single transactions.

However, these data models pose problems when the data is being used for analytics.

The data in these models requires a lot of joins to get to an obscure field which a data analyst wants to know about. Also, the data model isn’t usually developed to support aggregations and calculations over billions of rows with good performance. These models tend to be heavily normalized and in turn require heavy amounts of processing to answer even simple questions.

Because these data sets are heavily normalized, data modelling for data engineers is largely about adding missing abilities to them. By missing abilities, I mean tracking historical data, improving the performance of analytical queries and simplifying the data model into a much more straightforward set of tables. Common end-states will be referred to as snowflake, star, activity or OBT (one big table) schemas.

Data Integrity Checks

Tracking every single table and all its columns isn’t feasible for a single data engineer. When I worked at Facebook, I had well over 300 tables for which I was listed as the owner. To make matters worse, bad data can enter a column and not fail because it’s the same data type. This is especially likely when you load data by the order the columns arrive in, rather than by explicit column names.

A software engineer upstream could change a source, which in turn changes the order or removes the column altogether. This change could lead to data being loaded improperly but not failing, if the data is the same data type.

Data integrity checks are a crucial first line of defence for detecting data issues like the ones I’ve just described. There are several traditional types of data integrity checks.

  • Null Checks – These checks calculate what percentage of a column is null and raise an alert when that percentage exceeds a set threshold.
  • Anomaly Checks – Anomaly checks can be used to both check specific column values as well as metadata about a table, such as how many rows were just inserted. These checks aim to detect drastic changes in either the fields or row counts. If, for example, a table suddenly has 10x the number of rows compared to yesterday, then perhaps there is a problem.
  • Category Checks – Many fields represent enumerations or categories which should have only specific values. One example I always use is when I worked at a company that had a “State_Code” field. You’d assume this field only provided valid states as it was most likely filled via a drop-down menu. However, from time to time we found values which weren’t valid states. So, we needed a data check in place to catch these issues.
  • Uniqueness Check – Joining data across multiple granularities and data sets risks creating duplicate rows in your data warehouse, even if you removed some duplicates earlier. Duplicates can easily be re-introduced with the wrong join, so creating a check in your final core data layer is key to ensuring you have the correct level of unique rows, especially as a lot of modern data warehouses don’t provide a unique column constraint.
  • Aggregate Checks – As data gets processed through multiple layers of data transforms, there is a chance that removal or changing of data can occur. In turn, creating checks which calculate aggregates such as total sales, row counts or unique customer counts is important because they detect any major changes or removals of data that occur.
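The checks listed above can each be expressed in a few lines. The sketch below is illustrative rather than a production framework: the function names, the sample rows and the 10x threshold default are assumptions made for the example, and in practice such checks usually run as SQL against the warehouse.

```python
from collections import Counter


def null_fraction(rows, column):
    """Null check: share of rows where the column is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def row_count_anomaly(today_count, yesterday_count, max_ratio=10.0):
    """Anomaly check: flag a drastic jump in row count versus the previous run."""
    if yesterday_count == 0:
        return today_count > 0
    return today_count / yesterday_count >= max_ratio


def invalid_categories(rows, column, allowed):
    """Category check: non-null values that are not in the allowed set."""
    return sorted({
        r[column] for r in rows
        if r.get(column) is not None and r[column] not in allowed
    })


def duplicate_keys(rows, key):
    """Uniqueness check: key values that appear more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def column_total(rows, column):
    """Aggregate check helper: total to compare before and after a transform."""
    return sum(r[column] for r in rows if r.get(column) is not None)


# Hypothetical sample data containing one problem of each kind.
rows = [
    {"order_id": 1, "state_code": "WA", "amount": 10.0},
    {"order_id": 2, "state_code": "ZZ", "amount": None},
    {"order_id": 2, "state_code": "CA", "amount": 5.0},
    {"order_id": 3, "state_code": None, "amount": 7.5},
]
```

Each function returns a value rather than raising, so the caller decides which thresholds should fail a pipeline and which should only warn.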

Streaming Vs Batch Processing

A common question data engineers need to answer is this: does a pipeline need to be streamed or batch processed?

Batch processing is when you have a data pipeline which runs at a regular cadence, usually every hour, or daily.

At Facebook, most of our jobs would run at around midnight. We would use Dataswarm (which is similar to Airflow) to set a scheduler using a cron-like configuration for how often the job should run. An interesting point here is that some schedulers struggle with certain constraints, such as daylight saving time. I’ve worked on a few projects where twice a year there was a need to rerun pipelines to deal with issues caused by the pipeline running in an unexpected pattern.

Streaming involves ingesting and sometimes transforming data as soon as an event occurs.

For example, on one project we set up a Kafka topic which streamed events directly to Snowflake through its Kafka connector. This allowed the raw tables to constantly be up to date. This was important to the client as their service was providing a real-time utility that was directly connected to people’s day-to-day interactions. Their users both needed to be able to use their product in real-time, as well as glean information and insights from all the actions and machine learning models occurring across the platform.

It’s important to understand the use case before implementing a streaming data process. To fully process streamed data into a pipeline requires a lot more technical know-how, as well as contingencies if something goes wrong, and it can be far more difficult to recover. With batch data pipelines, by contrast, if there is an error, it’s pretty easy to rerun the data, and the next time the pipeline needs to run is the following day.

To know which of these two pipeline styles your team should use, you want to understand how your end-users plan to use the data, how much data is coming in, and the natural state of the system you’re pulling the data from. I was on a call recently with a client who initially said they wanted the data to be updated in real-time. Then it became every fifteen minutes, and when I asked how often their users would be looking at the data, they said once a week. That’s not a great fit for investing in real-time data, and I’d say I have a lot of conversations like this.

This isn’t to say real-time isn’t necessary, and it has become far easier to implement, much as processing large datasets has become far easier thanks to a combination of the cloud and the popularization of solutions such as Hadoop.

Big Data Processing

Big Data. Some feel the phrase is more of a marketing term and others believe it is the solution to developing reliable machine learning models. At the end of the day, I often refer to it as a big problem, for which we have developed solutions. Big Data on its own was often expensive and difficult to manage, especially on-site. It required constant migrations to larger physical servers and would often limit how many queries could truly be run on a machine.

In turn, many of the techniques centered around Big Data are meant to ease some of these problems.

  • MPP (massively parallel processing) – is a processing paradigm which, as the name suggests, takes the idea of parallel processing to the extreme. It uses hundreds or thousands of processing nodes to work on parts of a computational task in parallel. These nodes each have their own I/O and OS and don’t share memory. They achieve a common computational task by communicating with each other over a high-speed interconnect.
  • MapReduce – is another processing paradigm that can sometimes appear very similar to MPP. It also breaks down large amounts of data into smaller batches, and then processes them over multiple nodes. However, while MapReduce and MPP appear similar, there are some distinct differences. In general, MapReduce is done on commodity hardware, whereas MPP tends to be done on more expensive hardware. MPP also tends to refer to SQL-based query computations, while MapReduce is generally more of a design paradigm, most famously implemented in Java.
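The MapReduce idea can be shown with the classic word count, run here in a single Python process. This is a toy sketch of the paradigm only: a real framework would distribute the map and reduce phases across many nodes, and the function names and sample chunks are made up for illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a batch that a separate node would process.
chunks = ["big data big problem", "data pipelines move data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
```

Because map_phase only ever looks at its own chunk, the chunks can be processed independently, which is what makes the pattern easy to spread across commodity machines.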

Data Warehouses

The concept of data warehouses has been around for decades. The purpose of data warehouses is to provide analysts and end-users data which tracks historical information and is integrated with multiple data sources.

A data warehouse is a central repository of information that can be analyzed to make more informed decisions. Data flows into a data warehouse from transactional systems, relational databases and other sources, typically on a regular cadence. Business analysts, data engineers, data scientists, and decision makers access the data through business intelligence (BI) tools, SQL clients, and other analytics applications.

Data Lakes

The term ‘data lake’ was coined around 2010. Data lakes became popular because they offer a solution to the rapidly increasing size and complexity of data. As defined by TechTarget:

A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed for analytics applications.

In addition, data lakes often provide a cheaper option in terms of data storage and computation, since they are often developed on cheaper hardware. In contrast, data warehouses are highly structured and often run on expensive hardware.

Data lakes provide the ability to store data which can have nearly no structure whatsoever, relying on the end-user to provide the schema on-read. What data lakes look like has changed rapidly from the days of Hadoop, as solutions like Delta Lake are providing a very different approach to data lakes.
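Schema-on-read can be illustrated with a short sketch: raw records are stored untouched, and the reader decides which fields it wants and how to type them. The JSON lines, the SCHEMA mapping and the read_with_schema helper are all hypothetical, and this is not how any particular data lake product implements the idea.

```python
import json

# Raw events are stored as-is; the records do not share a fixed structure.
raw_lines = [
    '{"user_id": "1", "event": "click", "ts": "2022-01-01T00:00:00"}',
    '{"user_id": 2, "event": "view", "extra": {"page": "/home"}}',
]

# The reader supplies the schema: which fields it needs and how to cast them.
SCHEMA = {"user_id": int, "event": str}


def read_with_schema(lines, schema):
    """Apply a schema at read time, ignoring fields the reader did not ask for."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


events = list(read_with_schema(raw_lines, SCHEMA))
```

A different consumer could read the same raw lines with a different schema, which is the flexibility the paragraph above describes. The trade-off is that type problems surface at read time rather than at load time.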

Data Lake Houses

‘Data lake house’ has been a popular term recently, spearheaded by Databricks. The purpose of this paradigm is to combine the benefits of a data warehouse and a data lake in a single solution.

For example, data lake houses should provide ACID (Atomicity, Consistency, Isolation, and Durability) transactions like a data warehouse, while retaining the flexibility and scale of a data lake.

3. Why data engineering is becoming more important

Data has grown in size, speed and complexity. Over the past decade, the complexity of data has vastly increased. In the past, transaction systems mostly tracked baseline information. For example, they tracked information like what someone purchased at a store, or which website they visited.

Nowadays, even basic applications can track additional information or provide even deeper functionality, which is then tracked. Tracking additional information leads to much more complex events and transactions being stored. Add to this that users now interact with mobile and IoT devices most of the day, which increases the sheer volume and has led to what’s referred to as the “5 Vs” of Big Data: velocity, volume, value, variety and veracity.

These increasing dimensions require better tooling and more specialized expertise in how to handle all this data. However, it’s not just the supply of data that’s increased. It’s also the demand.

Everyone wants access to their data. With the increasing number of analysts and data scientists, as well as the proliferation of SQL-literate employees, the demand for data has grown. Add to this the fact that today pulling and managing data is not limited to large organizations which can afford all the administration staff. Now, a small company can pay for what it uses in terms of data warehousing, storage and computation.

Tools like Snowflake and BigQuery make storing large amounts of data considerably easier and cheaper – when set up correctly. Tools like Salesforce and Shopify also make it easier for end-users to pull data out of them. All of this has led to companies of all sizes pushing to create data storage systems for analytics. Overall – especially in the past decade with the mass adoption of the cloud – data has just become that much easier to manage and analyze.

This was it for part one.

By Gergely Orosz

Sourced from The Pragmatic Engineer

By Michael Burgi

As e-commerce and retail media have separately made their mark on the media buying and selling business over the last decade — most notably over the last two years — one consultancy believes it’s time to look at the two as one big industry: commerce media that encompasses all the advertisers, retail media firms, media companies and shoulder industries that serve them.

And the thinking behind the rolling up of all that has to do with the power of connecting media investment to sales data in whatever form it takes, explained Quentin George, partner, and Jon Flugstad, associate partner, at McKinsey, two leaders of the firm’s commerce media practice. McKinsey estimates it all adds up to some $1.3 trillion in enterprise value.

Broken down, that includes:

  • $820 billion for retailers who develop new, margin-rich media businesses
  • $280 billion for advertisers in the form of higher returns on ad spending (ROAS)
  • $50 billion for publishers from new ways of capturing additional ad dollars
  • $5 billion for ad agencies that deliver high-efficiency performance marketing for clients or help firms set up media planning and buying capabilities
  • $160 billion for ad-tech providers who offer martech solutions to firms that have no experience as media companies

“For the last 100 years, we’ve optimized media on impression delivery — did I reach the audience that I said I was going to reach?” said George. “The change here is, I can now connect an impression with a SKU level sale — not with a [checkout] basket, not with a credit card, but with a direct sale. And that is incredibly transformative for the industry.”

What’s more impressive to the McKinsey executives is that the growth the world of commerce media is experiencing is largely incremental: CPG advertisers seemingly can’t get enough of it.

“Our surveys [show] that somewhere around 70% of advertisers indicate that [when they buy ad time or space] on retail media, it’s somewhat or significantly better performance than what they can get elsewhere,” said Flugstad. “There’s no certain threshold or number that they’ll spend — if you’re driving performance they will keep shifting toward you. And therefore the pie that retail media can eat from is the broader digital pie.”

Clearly that power represents both incredible opportunity and a potential challenge to the agency world, as retailers and e-commerce firms take their story directly to brands. It helps to explain why some agency holding companies have taken steps to either partner with the bigger players of retail media, or have invested in their own shoulder and support businesses.

“When you can measure things toward direct sales, more dollars go there,” said Megan Pagliuca, chief activation officer for Omnicom Media Group, which announced four separate e-commerce-related partnerships during the Cannes Lions festival. She recalled that Facebook became an ad juggernaut only when it changed its focus from brand advertising to a more DTC approach that drove eyeballs directly to advertisers’ pages. “When it’s directly attributable you can’t argue with that.”

Industry analyst firm Forrester is working on its own research into the broader conjoining of performance, commerce and retail in the marketing ecosystem, and is finding that marketers are looking at it all as one as well. “The distinction at the client level between retail media, performance media, or commerce media, is not a clear distinction,” said Jay Pattisall, vp and senior agency analyst.

Another unifying factor to Pattisall is that virtually all the buying and selling activity across the landscape is data driven. “Performance media is driven by third-party data, commerce is driven by first-party platform data and retail media is driven by first-party retailer data. But the signals are similar in the sense that they seek to understand, who’s buying, what they’re responding to, and what it means for sales or conversion.”

Will this approach to media investment lead to a world of haves and have-nots, the latter being media that don’t deliver similar levels of sales results or business outcomes? “Is there going to be an expectation that all media should become better measurable? I hope so, that’s a good evolution,” said Pagliuca.

Pattisall thinks the effect will be limited from a media point of view, because media can only be optimized so much. The creative side of marketing is where better optimization can happen. “Creative optimization comes from those same data signals of who’s responding to what and those atomic elements of what’s being presented to them,” he said. “There’s a tremendous amount of work underway all across the industry to understand this … What combinations of information and content are most effective, rather than just where it’s placed?”

Feature Image Credit: Kevin Kim, directed by Ivy Liu 

By Michael Burgi

Sourced from DIGIDAY

By Justin Santamaria & Ash Lamb

From 2003 to 2013, I was an engineer at Apple, where I led the teams that built FaceTime, iMessage and CarPlay.

Getting to work closely with Steve Jobs was an opportunity I’ll never forget. He was a visionary who taught me a lot about not just how to make products that people love, but also how to be successful at anything in life.

Here are the three simple yet profound lessons I learned from Jobs that have helped me succeed in my career as a tech entrepreneur today:

1. Mastery demands iteration.

Illustration: Ash Lamb for CNBC Make It

Getting something right requires patience and hard work. But it also means knowing when to stop making changes; you’ll know when you’ve arrived at the best product when you’re beyond excited to share it.

During my first week at Apple, Jobs was prepping for an iChat demo. “I’m going to make the crowd sh** their pants,” he said.

Jobs knew he had executed something great.

2. Use your failures as stepping stones to success.

When Apple was ready to release the iPhone into the world, the foundation was already there, making it possible to keep taking new and different risks later on.

With every product, Jobs expected things to go wrong. But he also understood that messing up was often worth the reward. Perfection may not exist, but greatness could be achieved with a few software updates.

3. Remove the rock that’s blocking you from going beyond your comfort zone.

The original iPhone changed the world forever in 2007, with its multitouch screen and digital keyboard as highlights.

The decision to remove the mechanical keyboard was a clever industrial design solution. It allowed the iPhone to have more screen space for other creative features.

Feature Image Credit: Justin Sullivan | Getty Images

By Justin Santamaria & Ash Lamb

Justin Santamaria is a former Apple engineer. Currently, he is the co-founder of the fitness app Future. Prior to Future, he led the guest experiences product team at Airbnb. Follow him on Twitter.

Ash Lamb is an illustrator and designer based in Barcelona, Spain. He spends his time deconstructing and illustrating ideas for creative entrepreneurs, and teaching people how to create impactful visuals at visualgrowth.com. Follow him on Twitter and Instagram.

Sourced from CNBC Make It