Google Cloud



Back in May 2018, Twitter announced a collaboration with Google Cloud to migrate its services to the cloud. Two years on, the migration is complete, and Twitter has begun reaping the benefits of the move.

For the past 14 years, Twitter has been developing data transformation pipelines to handle the load of its massive user base. Those pipelines initially ran in Twitter’s own data centers; its Hadoop file systems, for example, hosted more than 300 PB of data across tens of thousands of servers.

Despite consistently sustaining massive scale, this system had limitations. As the user base grew, it became challenging to configure some parts of the system and extend them with new features, which led to failures.

In the next section, we take a look at how Twitter’s engineering team, in collaboration with Google Cloud, successfully migrated the platform.

How The Migration Took Place

As a first step, Twitter’s team left a few pipelines, such as the legacy Scalding data-aggregation pipelines, unchanged; these continued to run in Twitter’s own data centers. The batch layer’s output, however, was switched to two separate storage locations in Google Cloud.

The output aggregations from the Scalding pipelines were first transcoded from Hadoop sequence files to Avro on-prem, staged in four-hour batches to Cloud Storage, and then loaded into BigQuery, Google’s serverless, highly scalable data warehouse, to support ad-hoc and batch queries.
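The four-hour staging window described above can be sketched in plain Python. The record layout and the date-based batch naming here are illustrative assumptions, not Twitter’s actual schema:

```python
from datetime import datetime, timezone

FOUR_HOURS = 4 * 60 * 60  # staging window length, in seconds

def batch_key(epoch_seconds: int) -> str:
    """Map a record timestamp to its four-hour staging batch.

    Records landing in the same window share a key such as
    '2020/01/15/08', which could name a Cloud Storage prefix.
    """
    window_start = (epoch_seconds // FOUR_HOURS) * FOUR_HOURS
    return datetime.fromtimestamp(
        window_start, tz=timezone.utc
    ).strftime("%Y/%m/%d/%H")

def group_into_batches(records):
    """Group (timestamp, payload) records by staging window."""
    batches = {}
    for ts, payload in records:
        batches.setdefault(batch_key(ts), []).append(payload)
    return batches
```

Each batch would then be written as Avro files under its prefix and loaded into BigQuery on the same four-hour cadence.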

This data is then read from BigQuery by a simple pipeline deployed on Dataflow, which applies some light transformations. Finally, the results from the Dataflow pipeline are written to Cloud Bigtable, a low-latency, fully managed NoSQL database that serves as the backend for online dashboards and consumer APIs.
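A common Bigtable pattern for serving time-series dashboards is to encode the query dimensions and a reversed timestamp into the row key, so the newest aggregate sorts first under each prefix. The key layout and field names below are an illustrative guess, not Twitter’s actual schema:

```python
MAX_TS = 2**63 - 1  # sentinel used to reverse timestamps

def to_bigtable_row(record):
    """Light transformation: turn one BigQuery result row into a
    Bigtable (row_key, columns) pair.

    Subtracting the timestamp from MAX_TS makes newer windows sort
    lexicographically first, which suits 'latest value' reads from
    online dashboards and consumer APIs.
    """
    reversed_ts = MAX_TS - record["window_end_epoch"]
    row_key = f"{record['metric']}#{reversed_ts:020d}"
    columns = {"agg:value": str(record["value"])}
    return row_key, columns
```

With this layout, a dashboard fetching the latest value of a metric can do a single prefix scan and stop at the first row.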

With the first iteration successfully in place, the team began redesigning the rest of the data-analytics pipeline on Google Cloud technologies.

After evaluating all possible options, the team chose Apache Beam because of its deep integration with other Google Cloud products, such as Bigtable, BigQuery, and Pub/Sub, Google Cloud’s fully managed, real-time messaging service.

A BigQuery slot is a unit of computational capacity required to execute SQL queries.

The Twitter team re-implemented the batch layer as follows:

  • Data is first staged from on-prem HDFS to Cloud Storage
  • A batch Dataflow job then regularly loads the data from Cloud Storage and processes the aggregations, and
  • The results are then written to BigQuery for ad-hoc analysis and Bigtable for the serving system.
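The aggregation step in the middle bullet is essentially a keyed sum. A minimal pure-Python stand-in for the core logic of such a batch Dataflow job (the field names are invented for illustration) looks like:

```python
from collections import defaultdict

def aggregate(events):
    """Sum event counts per (metric, window) key, mimicking the
    GroupByKey + Combine stage a batch Dataflow job would run.

    `events` is an iterable of dicts staged from Cloud Storage;
    the output rows are what would be written to BigQuery for
    ad-hoc analysis and, suitably keyed, to Bigtable for serving.
    """
    totals = defaultdict(int)
    for e in events:
        totals[(e["metric"], e["window"])] += e["count"]
    return [
        {"metric": m, "window": w, "value": v}
        for (m, w), v in sorted(totals.items())
    ]
```

In the real pipeline this logic would be expressed as Apache Beam transforms and executed by the Dataflow runner.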

For instance, the results showed that processing 800+ queries (~1 TB of data each) took a median execution time of 30 seconds.

Migration final picture via GCP

The above picture illustrates the final architecture after the second step of migration.

For job orchestration, the Twitter team built a custom command line tool that processes the configuration files to call the Dataflow API and submit jobs.
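A job-orchestration tool of the kind described might read a declarative config file and translate it into a Dataflow job request. This sketch stops at building the request body (a dry run) rather than calling the real API, and every config field name is an assumption:

```python
import json

def build_job_request(config_path: str) -> dict:
    """Turn a JSON pipeline config into a Dataflow-style job request.

    The config file is assumed to hold the job name, the pipeline's
    template path in Cloud Storage, and its runtime parameters; a
    real tool would POST the result to the Dataflow API.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    return {
        "jobName": cfg["name"],
        "gcsPath": cfg["template"],
        "parameters": cfg.get("parameters", {}),
        "environment": {"region": cfg.get("region", "us-central1")},
    }
```

Keeping the pipelines themselves out of the tool and in version-controlled config files makes submitting or re-running a job a one-line operation.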

What Do The Numbers Say


The migration to modernize the advertising data platform started back in 2017, and today Twitter’s strategy has come to fruition, as can be seen in its annual earnings report.

The revenue for Twitter can be divided mainly into two categories:

  • Ads
  • Data licensing and other services.

In its quarterly earnings report for 2019, Twitter declared solid results and steady progress.

“We reached a new milestone in Q4 with quarterly revenue in excess of $1 billion, reflecting steady progress on revenue product and solid performance across most major geographies, with particular strength in US advertising,” said Ned Segal, Twitter’s CFO.

The 2019 revenue was $3.46 billion, which is an increase of 14% year-over-year.

  • Q4 advertising revenue totalled $885 million, an increase of 12% year-over-year
  • Total ad engagements increased by 29% year-over-year

The motivation behind Twitter’s migration to GCP also involved other factors, such as the democratization of data analysis. For Twitter’s engineering team, enabling data analysis, visualization, and machine learning in a secure way is a top priority, and this is where Google tools such as BigQuery and Data Studio came in handy.

Although Google’s tools covered the simple pipelines, Twitter still had to build its own orchestration infrastructure on Apache Airflow. In the area of data governance, BigQuery’s services for authentication, authorization, and auditing did well, but for metadata management and privacy compliance, in-house systems had to be designed.


Sourced from AIM


You may have used the Waze app to avoid traffic, but what if that data could be used to fight traffic on a larger scale? That’s what the Waze for Cities Data program aims to do. Waze is making anonymized user data available to cities for free on Google Cloud and adding the tools to help urban planners analyze it.

Waze for Cities Data launched in 2014 as the Connected Citizens Program. It started with 10 city partners and has since grown to 1,000 partners globally, according to Waze, encompassing both cities and other entities that can make use of the app’s crowdsourced traffic data. Partners will now have access to Waze data collected since April 2019 via Google Cloud, as well as analysis tools BigQuery and Data Studio, which were designed to make sense even to lay audiences, according to Waze.

What can users do with this data? Genesis Pulse, an emergency services software provider, started using Waze data to give first responders real-time crash alerts from Waze users. In 40% of cases, crashes are reported by Waze users 4.5 minutes before they are called in via 911 or an equivalent method, according to Waze. According to the Federal Communications Commission, a one minute decrease in average ambulance response time saves more than 10,000 lives in the United States annually, Waze noted.

Public agencies that apply for the program can analyze up to 1TB of data, and store up to 10GB of data, for free each month. The basic data analysis tools are free as well, but more advanced tools will require a paid account. Cities will also be able to store and analyze their own data, while maintaining complete control of it, according to Waze.

Waze’s data-sharing scheme is already proving popular. The top three contributors are the cities of Seattle, Los Angeles, and San Jose, according to Waze. The government of Miami-Dade County and the state transportation agencies of Massachusetts and Virginia are also major contributors, as are both New York City and the Port Authority of New York and New Jersey, which operates large chunks of the Big Apple’s transportation infrastructure. So the next time you open up the Waze app, know that you may be helping to fight urban traffic.

Feature Image Credit: Andy Boxall/Digital Trends


Sourced from Digital Trends


Google is doing more to help companies connect, manage, secure and analyze ever-growing amounts of data.

At its Google Cloud conference in San Francisco, the company unveiled a raft of announcements, including new open-source integrations, more AI capabilities and product-development partnerships with large consulting firms like Accenture. Enhancements will assist large companies in areas such as data migration, analytics and cross-compatibility with competitors Amazon Web Services and Microsoft Azure.

Among the offerings is a vertical-specific suite, Google Cloud for Retailers, that will help retailers tap analytics and AI to predict their future inventory needs, recommend products for their customers and assist those customers in locating items they want to buy.

Within that suite is Vision Product Search, which uses Cloud Vision technology. Someone can take a photo or screenshot of a pair of pants they fancy, for example, and the tool will return search results with similar items from the retailer’s inventory.

“We’re able to help if a user likes a specific product; it finds ones that are similar, either in function or in style,” said Andrew Moore, head of Google Cloud Artificial Intelligence. “We provide tools that make the experience on the retailer’s website more immersive and useful for the user.”

Ikea is among those using the product. “We’re working with Google Cloud to create a new mobile experience that enables customers, wherever they are, to take photos of home furnishing and household items and quickly find that product or similar in our online catalogue,” Susan Standiford, chief technology officer at IKEA Group, said in a Google blog post.

Google’s Recommendations AI powers the new Product Recommendations tool, which suggests complementary products as customers browse a retailer’s website. Meanwhile, Real Time Inventory Management and Analytics helps retailers boost the in-store experience so that customers don’t end up empty-handed, costing retailers the sale.

“We’re using sales data regionalized over years or months, depending on what we have, to make a much more accurate prediction of what stocks they should have, in which parts of the country and when, so they are more accurate and have less wastage,” Moore said.

Google has tapped its vast partner network to develop additional tools for retailers. For example, Accenture’s Hyper-Personalization product helps retailers transform data into business insights they can use to boost customer response rates and lifetime value. Google Cloud and Accenture teamed up last year to launch the Accenture Google Cloud Business Group.

Tableau can help retailers quickly collect and analyze their data, while Publicis Sapient assists retailers with addressing data silos to connect and take action on the data points along the customer journey.

That goal is in line with other Google product announcements that improve speed and simplify data migration to Google Cloud. Its BigQuery Data Transfer Service, for example, which can automatically ingest data from SaaS apps to BigQuery, the company’s cloud-scale data warehousing solution, expanded support to more than 100 enterprise apps, including Salesforce and Marketo.


Sourced from adexchanger


As Viacom continues to expand its direct-to-consumer streaming video strategy, the company is turning to Google Cloud to enhance its content discovery capabilities.

Viacom, like many of its media competitors, is creating more content every year, increasingly for a global audience that may consume it on different platforms.

The company is using Google Cloud for its “Intelligent Content Discovery Platform.” The platform uses machine learning to automatically pull short clips from content as soon as it is ingested into the platform. It can highlight new content or segments, or identify commercials to optimize the viewing experience.

The company is using Google Cloud for automated content tagging, discovery and intelligence. Encompassing some 65 petabytes of content, the deal allows Viacom’s own teams to easily locate and understand the context of the content, while improving the efficiency of distribution strategies to consumers.

Viacom is also embracing Google’s Customer Reliability Engineering (CRE) strategy, collaborating closely with Google engineers to develop and maintain applications.

The ability of Google Cloud’s solutions to scale across platforms and around the globe is seen as an asset.

While companies like Disney and WarnerMedia are betting on their own premium direct-to-consumer offerings, Viacom has taken a multi-pronged approach to digital content.

The company has created and launched a number of digital studios to produce original content for streaming services, like Netflix and Hulu, as well as social-media platforms, such as Facebook and Snapchat, and Viacom’s own digital platforms.

At the same time, the company acquired Pluto TV to have a free, ad-supported streaming offering for consumers. Pluto TV has a rotating library of free programming, including from Viacom’s own stable of channels.


Sourced from MediaPost