Generative AI is starting to change shopping. Instead of scrolling on websites or strolling through stores, people are beginning to prompt AI agents to find, compare, and even purchase products. Ask for something like a handmade gift under $100, a pair of vintage jeans from the 1970s, or a digital camera for a teenager, and watch a list of curated options appear in the chat. It’s fast and frictionless. But it’s also early days. And just as companies had to adapt to the new rules of e-commerce, they’re now faced with a new set of challenges around how they manage their reputations, how they connect with customers, and how they compete in this new paradigm.

Categories like beauty, lifestyle, and apparel are moving fastest, and early adopters are already experimenting. But if things go wrong, the consequences could be both immediate and lasting. For consumer-facing brands, there are five core risks that could break consumer trust as AI agents begin to shop on customers’ behalf:

  1. Agents misunderstand products and make the wrong choice. When product attributes aren’t structured for machines, AI agents guess. They can misinterpret sizing, miss constraints, hallucinate features, or recommend items that are not aligned with the customer’s intent.
  2. Agents act beyond what customers expected or authorized. Without clear delegation boundaries, agents can overspend, ignore constraints, or make irreversible decisions without confirmation.
  3. Sensitive conversational data becomes a liability. Agentic shopping captures more than transactions. It captures intent, emotion, and context. If that data is stored opaquely, reused unexpectedly, or exposed through a breach, customers can feel surveilled rather than served.
  4. Brands lose control of how they’re represented. In agent ecosystems, outdated prices, inaccurate information, or undisclosed sponsored placements can reach customers before marketing or legal teams ever see them.
  5. When something breaks, there’s no clear way back. In automated journeys, failures feel colder and harder to resolve. If customers can’t understand what went wrong, reach a human, or be made whole quickly, a single bad interaction can permanently sever the relationship.

Left unaddressed, these issues don’t just frustrate customers. They create real operational and financial impact: chargebacks, returns, and customer support costs; privacy violations that trigger regulatory scrutiny or lawsuits; and reputational damage that erodes loyalty and slows adoption.

Much of this comes down to trust. To drive agentic commerce adoption at scale, brands need to figure out how to earn—and keep—customers’ trust. And to do that, they need to understand what can go wrong and the steps they can take now to prevent trust from being broken.

The Trust Gap Is Measurable

According to PwC’s 2025 Future of Consumer Shopping Survey, 64% of respondents said they need at least one safeguard, like a money-back guarantee, to feel comfortable letting an AI agent purchase for them. Even Gen Z and Gen Alpha, the most digitally native demographics, express caution alongside curiosity. Fundamental questions remain unanswered: Who has access to payment information? Who can authorize purchases? How is personal data stored and shared? Whose interests does the agent represent: the consumer’s, the tech platform’s, or the advertiser’s?

The challenge for brands in retail, consumer goods, and travel is both clear and urgent: How do you prepare for agentic commerce when the rules are still being written? You can’t fully control whether consumers adopt these tools. But you do have control over how your brand shows up in agent-driven experiences, and whether customers feel protected when they delegate decisions to AI.

Building the Trust Layer

We’ve seen this pattern before. In the early days of e-commerce, consumers were wary of entering credit card information on websites. But SSL encryption, PCI standards, and fraud protection transformed scepticism into confidence and unlocked mass adoption.

Agentic commerce needs its own trust infrastructure—what we call the trust layer. While trust can feel like an abstract concept, it breaks in specific, predictable ways: when agents misunderstand products, act beyond what customers expect, mishandle sensitive data, misrepresent brands, or leave consumers stranded when something goes wrong.

Addressing those risks requires concrete changes to how product data is structured, how delegation and consent are enforced, how data is protected, how brand presence is monitored in agent ecosystems, and how relationships are preserved when automation fails.

We recommend companies take five actions now to build that trust layer.

1. Structure your content for machines, not just humans.

To trust an AI agent, customers need it to return accurate and relevant information every time. This isn’t possible unless the agent can correctly understand the product and its features.

AI agents don’t browse visually or interpret nuance the way humans do. They digest text and numbers. That means product discoverability in agent-driven shopping depends less on branding or traditional search engine optimization (SEO) and more on machine-readable product data, an approach often referred to as generative engine optimization (GEO). Pricing, sizing, availability, materials, use cases, and constraints need to be expressed in formats agents can reliably parse and compare.

Consider two descriptions of the same hoodie:

  • “This sweatshirt is perfect for cozy fall nights.”
  • Material: fleece; temperature range: < 40°F; category: loungewear; fit: relaxed

While the first is written to evoke a specific vision in a customer, the second is optimized for an AI agent. To scale agentic commerce, companies may need to speak to both humans and agents, and be sure that they’re translating terms that customers naturally use—“lightweight,” “sustainable,” or “good for travel”—into an agent-focused product catalogue that maps those terms onto specific attributes.

Brands may also need to make sure this information is accessible. While humans click from page to page and scan prose descriptions, descriptions for agents should be captured in machine-readable formats in your existing product information management systems and e-commerce platforms. They should also be formatted so agents can access them through APIs or web markup standards. Return policies, shipping info, and FAQs should similarly be modular and labelled. With information formatted and organized in the right way, agents can translate customer requests into precise matches.
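To make the hoodie example concrete, here is a minimal sketch of what a machine-readable product record might look like, in Python for illustration. The field names loosely follow schema.org’s Product vocabulary, but the specific attributes and values are hypothetical, not drawn from any real catalogue.

```python
import json

# Hypothetical machine-readable record for the hoodie described above.
# Field names loosely follow schema.org's Product vocabulary; the values
# and custom attributes are illustrative only.
hoodie = {
    "@type": "Product",
    "name": "Fleece Lounge Hoodie",
    "category": "loungewear",
    "material": "fleece",
    "additionalProperty": [
        {"name": "fit", "value": "relaxed"},
        {"name": "temperature_range_f", "value": "< 40"},
    ],
    "offers": {
        "price": 58.00,
        "priceCurrency": "USD",
        "availability": "InStock",
    },
}

# An agent can filter on explicit attributes instead of parsing prose.
print(json.dumps(hoodie, indent=2))
```

Serialized as JSON-LD and embedded in a product page, or exposed via an API, a record like this lets an agent match “cozy fall hoodie” to concrete attributes rather than guessing from marketing copy.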

2. Define clear boundaries and build in consent.

Consumers won’t delegate purchasing decisions to AI agents unless they understand, clearly and upfront, what those agents are allowed to do. This requires explicit delegation boundaries and consent that is embedded into the experience, not buried in terms and conditions. Safe delegation requires three things: clear limits, traceability, and reversibility. Every agent action should be attributable to a specific authorization, under defined conditions, with a clear way to undo or dispute the outcome.

In their own channels—the company website, app, or branded agent—brands can set spending caps, require approval for purchases over certain amounts, and build in confirmation steps before checkout. For example, a retailer could program its agent to surface return policies before a final purchase, or to pause and ask for confirmation if a recommendation falls outside a user’s budget.
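The guardrails described above (spending caps, confirmation thresholds, and an auditable trail) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class, field names, and limits are hypothetical, not part of any existing platform.

```python
from dataclasses import dataclass, field

@dataclass
class DelegationPolicy:
    """Illustrative per-session delegation policy for a shopping agent."""
    spending_cap: float    # hard limit the agent may never exceed
    confirm_above: float   # purchases above this need explicit approval
    audit_log: list = field(default_factory=list)

    def authorize(self, amount: float, confirmed: bool = False) -> str:
        if amount > self.spending_cap:
            decision = "blocked"              # outside delegated authority
        elif amount > self.confirm_above and not confirmed:
            decision = "needs_confirmation"   # pause and ask the customer
        else:
            decision = "approved"
        # Every action is attributable: log it so it can be traced or disputed.
        self.audit_log.append({"amount": amount, "decision": decision})
        return decision

policy = DelegationPolicy(spending_cap=100.0, confirm_above=50.0)
print(policy.authorize(30.0))                  # approved
print(policy.authorize(75.0))                  # needs_confirmation
print(policy.authorize(75.0, confirmed=True))  # approved
print(policy.authorize(150.0))                 # blocked
```

The point of the sketch is the shape, not the numbers: every agent action maps to a specific authorization, the limits are explicit, and the audit log gives customers a way to trace and dispute outcomes.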

When consumers use general-purpose AI platforms like ChatGPT, Claude, Google’s Gemini, or others to shop across multiple retailers, brands’ direct control is limited. While it may be technically possible to support safeguards like confirmation prompts or return-policy disclosures within these platforms, doing so requires collaboration between brands and platform providers. In the meantime, brands can still influence outcomes by ensuring their product data is accurate, structured, and complete (see action #1).

Industry efforts—such as Google’s Universal Commerce Protocol, Stripe and OpenAI’s Agentic Commerce Protocol, and Anthropic’s new constitution for Claude—point toward standardized ways to express what agents may do, when they must ask, and how consent is enforced. As agentic commerce moves from experimentation to scale, brands that treat delegation as an essential design problem will be the ones consumers trust.

3. Protect customer data and make that protection visible.

When consumers delegate tasks to AI agents, they share more than payment details. They share conversational context: preferences, constraints, intent, and often emotion. That context is what makes agentic shopping powerful, and what makes it uniquely sensitive. If customers don’t understand how that data is used, remembered, or protected, they won’t delegate in the first place.

As brands launch their own AI agents to help customers shop for products, they should embed privacy-preserving design directly into agentic interactions. For example, brands can use data minimization and anonymization techniques, so their agents retain only what is necessary to complete a task. Sensitive conversational signals can be processed transiently rather than stored indefinitely. Consent should be explicit and configurable, with clear choices about what is remembered, what is shared across sessions or platforms, and what is not.

Visibility matters as much as protection. Consumers should be able to see—and change—their privacy posture in real time. Some interactions may warrant persistence, such as remembering a preferred size or brand. Others may not. An “incognito” or one-time shopping mode, where interactions are not retained or used for future recommendations, gives customers a sense of control that mirrors how people already manage privacy in browsers and payments.
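Data minimization of the kind described above can be sketched simply: before anything is persisted, the agent keeps only the fields needed to complete the task. The allow-list and field names below are hypothetical assumptions, chosen for illustration.

```python
# Illustrative allow-list: only task-relevant fields may be retained.
ALLOWED_FIELDS = {"size", "budget", "category"}

def minimize(conversation_context: dict) -> dict:
    """Keep only allow-listed fields; transient signals are discarded."""
    return {k: v for k, v in conversation_context.items() if k in ALLOWED_FIELDS}

context = {
    "size": "M",
    "budget": 80,
    "category": "loungewear",
    "emotional_tone": "anxious",    # sensitive signal: processed, never stored
    "payment_token": "tok_abc123",  # never persisted with the conversation
}
print(minimize(context))  # {'size': 'M', 'budget': 80, 'category': 'loungewear'}
```

An “incognito” mode is then just the degenerate case where the allow-list is empty and nothing survives the session.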

4. Observe how your brand shows up in agent ecosystems.

In agentic commerce, AI platforms may become the first (and sometimes only) interface between your brand and a customer. When that happens, trust depends on what the platform’s agent says on your behalf. If an agent surfaces outdated pricing, invents product features, omits critical context, or cites unreliable sources, customers don’t see a system error. They see a brand failure.

That’s why brands need agentic observability: the ability to monitor, in real time, how AI agents describe their products, which sources they rely on, how recommendations are framed, and what actions are being taken downstream. This requires ongoing visibility into prompts, responses, citations, and decision logic across the agent ecosystems where customers are shopping.

Without observability, brands lose the ability to detect misrepresentation, correct errors, or understand why a product was or wasn’t recommended. As agents increasingly act as intermediaries, monitoring how your brand shows up is no longer optional.
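As a sketch of what such an observability check might look like in practice, the function below compares an agent’s claim about a product against the brand’s own catalogue and flags drift. The data structures and field names are illustrative assumptions, not an established API.

```python
def audit_agent_response(agent_claim: dict, catalog: dict) -> list:
    """Return discrepancies between an agent's claim and the catalogue."""
    issues = []
    truth = catalog.get(agent_claim["sku"])
    if truth is None:
        return ["unknown SKU: agent may have hallucinated the product"]
    if agent_claim["price"] != truth["price"]:
        issues.append(
            f"price drift: agent said {agent_claim['price']}, "
            f"catalog says {truth['price']}"
        )
    for feature in agent_claim.get("features", []):
        if feature not in truth["features"]:
            issues.append(f"unverified feature: {feature!r}")
    return issues

# Hypothetical catalogue entry and an agent claim that has drifted from it.
catalog = {"HD-100": {"price": 58.00, "features": ["fleece", "relaxed fit"]}}
claim = {"sku": "HD-100", "price": 49.00, "features": ["fleece", "waterproof"]}
print(audit_agent_response(claim, catalog))
```

Run continuously against sampled agent responses, checks like this give brands a way to detect outdated prices and invented features before customers do.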

5. Preserve relationships and plan for recovery.

Even when agents handle transactions, brands still own the relationship. And as shopping becomes more automated, brands should embed branded agents in third-party platforms, extend loyalty programs through agents, and design seamless escalation paths to reach a human when needed.

When things break, and they will, the response matters more than the failure. Recovery mechanisms should be built in from the start: real-time alerts, clear escalation paths, and explainability. Some brands are already simulating agentic shopping journeys with synthetic customers to stress-test before launch. Trust is built through accountability, transparency, and making customers whole when errors occur.

Trust as Strategy, Not Compliance

AI-driven shopping will scale when consumers feel secure. That requires systems that are well-governed, transparent, and aligned with human expectations. The brands that lead won’t treat trust as a compliance exercise. They’ll treat it as a core part of their commerce strategy—building the technical standards, business practices, and consumer protections that make delegation safe. Those who act now will help define the rules of this emerging ecosystem.



Ali Furman is the consumer markets industry leader at PwC and an M&A partner. She writes and speaks widely on consumer markets trends and the future of business. She has been featured in many outlets including ABC, CBS, CNBC, Forbes, Vogue Business, and Bloomberg.
Ege Gürdeniz is an AI trust leader and technology risk expert at PwC. He advises companies on how to build trust, safety, and governance into AI-driven products, platforms, and business models.
Rima Safari leads data, analytics, and AI for PwC US and serves as the firm’s strategic alliance leader with OpenAI. She writes and speaks widely on AI strategy, agentic systems, and data readiness required for scaling AI, and her perspectives have been featured across leading business and technology forums.
Remzi Ural is the AI leader for consumer markets within PwC. He has been recognized as a thought leader for AI strategy definition and adoption, particularly with retail and consumer packaged goods clients, driving business outcomes and standing up modern AI capabilities.

Sourced from Harvard Business Review

By Dr. Gleb Tsipursky

Leaders who replace blame with post-mortems and psychological safety are seeing stronger AI adoption, as teams experiment more freely and turn failed pilots into long-term business gains.

Generative AI rewards those who embrace constant iteration. Instead of fearing errors, treat them as essential data. Every strange output reveals how the system actually thinks, providing the edge you need to master the tool.

AI offers the rocket fuel that propels innovation forward and enables organizations and teams to overcome challenges and manage risks. This is especially true in a field as unpredictable and transformative as Gen AI. When we talk about innovation, we must acknowledge that failure is not the opposite of success, but a crucial part of it.

Gen AI solutions, by their nature, demand iteration, testing, and refinement. Not every experiment will hit the mark immediately, if at all.

De-Stigmatizing Failure in Gen AI Strategy

The traditional corporate landscape often views failure through a punitive lens. This leads to fear and risk-averse behaviour. Employees who experience setbacks might worry about career repercussions, public embarrassment, or losing credibility.

This mindset is a death knell for innovation, suffocating the exploratory nature of Gen AI work, where trial and error are not just common, but essential.

Research by McKinsey shows that companies cultivating a culture of innovation and embracing failure greatly outperform their peers in implementing technology, with 21% of weak innovators succeeding in digital transformations compared to 45% of strong innovators. This underscores the undeniable link between embracing failure and achieving tangible business success.

So, how do we dismantle this culture of fear? We need a seismic shift in how we perceive failure, starting at the top.

Leaders must actively cultivate an environment where calculated risk-taking is not just tolerated, but celebrated. Employees need to know that their careers won’t be derailed by experiments that don’t pan out. Instead, the focus should be on the insights gained from every experiment, regardless of the outcome. Each “failed” project is a treasure trove of data.

Consider a recent engagement where I consulted for a mid-sized regional retail chain struggling to personalize its marketing efforts. This company, with around 500 employees and $200 million in annual revenue, was eager to leverage Gen AI to improve customer engagement.

Initially, they were hesitant. The leadership team was concerned about the potential for wasted resources and the stigma of failed projects.

We began by implementing a small-scale pilot project using Gen AI to tailor email marketing campaigns. The first few attempts fell short of expectations. The personalized content didn’t resonate as anticipated, and click-through rates remained stagnant at a measly 2.5%.

However, instead of viewing this as a failure, we treated it as a learning opportunity. We conducted a thorough analysis and discovered that the initial customer segmentation model was too broad, resulting in generic messaging that didn’t appeal to specific customer interests.

We also found that the tone of the AI-generated content didn’t align with the brand’s voice, with a formality score 15 points higher than their usual communications.

The Power of Post-Mortem Analysis for Gen AI Strategy

When an experiment doesn’t go as planned, the knee-jerk reaction might be to find someone to blame. This is counterproductive and stifles learning. A constructive approach involves a detailed post-mortem analysis.

What went wrong? Why did certain methods fail? How can we adjust our approach in the future? These questions are not about assigning blame, but about extracting knowledge.

We’re not looking for scapegoats; we’re searching for understanding. Were there gaps in the data or model training? Did we misalign the Gen AI tool with the business problem we were trying to solve?

Systematically answering these questions creates a roadmap for future success. This analysis also helps build institutional knowledge, ensuring that the entire organization benefits from individual teams’ learnings.

In the case of the retail chain, the post-mortem analysis of the initial Gen AI marketing campaign revealed critical insights. We refined the customer segmentation model, focusing on more granular data points like purchase history, browsing behaviour, and demographic information, increasing the number of segments from 10 to 25.

We also fine-tuned the Gen AI model to generate content that better reflected the brand’s personality, adjusting the formality score down by 15 points to match their existing brand voice.

The subsequent campaigns, informed by these learnings, showed significant improvement. Within three months, the retailer saw a 25% increase in click-through rates, rising from 2.5% to 3.125%, and a 15% rise in conversion rates, jumping from 1% to 1.15% from their email marketing efforts. They also received a 10% increase in positive customer feedback regarding email content relevance.

This translated to a noticeable uptick in sales directly attributed to the Gen AI-driven campaigns, with an eventual 8% increase in sales from email marketing.

This experience underscored the importance of embracing failure as a learning opportunity. By openly analysing what went wrong and adjusting our approach, we were able to unlock the true potential of Gen AI for this organization.

It’s worth noting that the organization saved an estimated $50,000 in marketing costs within six months by switching from broad marketing campaigns to more targeted Gen AI-driven campaigns. And that was the first of many projects, which together improved their bottom line by over $300,000 in a year. This case study illustrates how real businesses gain real, financially relevant benefits from treating failure as a learning opportunity when implementing Gen AI.

Building a Gen AI Strategy of Shared Learning and Resilience

An open and transparent approach to failure helps facilitate shared learning. When failures are openly discussed and analysed, it allows teams to learn from one another’s mistakes, accelerating the organization’s overall learning curve.

Instead of burying failed experiments, organizations should create forums where teams can present their findings, both successful and unsuccessful, to the broader group. This practice democratizes the learning process and reduces the likelihood of repeated mistakes, while simultaneously creating trust and openness.

Leaders can also encourage peer support networks, where employees involved in different Gen AI initiatives can offer advice and share lessons learned from their own successes and failures. This creates a communal learning environment, where the responsibility for Gen AI success is shared, rather than resting solely on individual teams.

These forums also allow for cross-functional collaboration, where failures in one department can provide insights that benefit another. This cross-pollination of ideas can lead to new approaches and methods for leveraging Gen AI that would not have emerged if failures were hidden or minimized. Moreover, organizations can take a proactive approach by building controlled environments where risk-taking is encouraged and the consequences of failure are minimized.

Innovation sandboxes — safe, controlled spaces for testing new technologies and processes — allow teams to experiment with Gen AI without the fear of disrupting core business operations. Such environments encourage risk-taking because the potential downsides are contained, allowing teams to focus on learning and improving rather than avoiding mistakes.

Creating a psychologically safe environment is paramount. This means a workplace where employees feel free to take risks, voice their ideas, and engage in creative problem-solving without fear of retribution if things don’t go as planned. This sense of safety is essential for encouraging experimentation, particularly in the context of Gen AI, where uncertainty is high.

A lack of psychological safety leads to a “play-it-safe” mentality, where employees only propose ideas they are confident will succeed. This limits the organization’s capacity to push boundaries and innovate. In contrast, when employees know that failure will be met with support rather than blame, they are more likely to take bold steps.

Leaders can foster this environment by publicly acknowledging the efforts of teams who take risks, regardless of the outcome, and by consistently framing failures as opportunities for growth.

An article by Forbes highlights the importance of psychological safety in driving innovation. It emphasizes how leaders can create a culture where employees feel empowered to take risks. Additionally, a study by Google, discussed on their re:Work platform, found that psychological safety was the most important factor in team effectiveness.

Failing to Gen AI Success

Ultimately, creating a culture where failure is viewed as a natural part of innovation enables the organization to remain agile and responsive. In a field as dynamic and quickly progressing as Gen AI, staying ahead requires continuous learning, which can only happen when employees feel empowered to experiment, fail, and try again.

Organizations that embrace failure as part of the process will not only see greater innovation but will also build a more resilient and adaptive workforce, capable of navigating the complexities of AI adoption with confidence and creativity.

Failure, when approached with the right mindset, is not an ending but a beginning. It’s the secret sauce that fuels the engine of innovation, driving us toward a future where Gen AI transforms our businesses and our world.


Dr. Gleb Tsipursky, called the “Office Whisperer” by The New York Times, helps tech-forward leaders replace overpriced vendors with staff-built AI solutions. He serves as the CEO of the future-of-work consultancy Disaster Avoidance Experts. Dr. Gleb wrote seven best-selling books, and his forthcoming book with Georgetown University Press is The Psychology of Generative AI Adoption (2026). Prior to that, he wrote ChatGPT for Leaders and Content Creators (2023). His cutting-edge thought leadership was featured in over 650 articles in prominent venues such as Harvard Business Review, Fortune, and Fast Company. His expertise comes from over 20 years of consulting for Fortune 500 companies from Aflac to Xerox and over 15 years in academia as a behavioural scientist at UNC-Chapel Hill and Ohio State. A proud Ukrainian American, Dr. Gleb lives in Columbus, Ohio.

Sourced from Future of Work

By Jasmine Sheena

Billion Dollar Boy, Gut, and Mischief are focused on ensuring that work powered by the tech retains a human touch.

It’s been over a year since ChatGPT first rolled out, and while constantly hearing the phrase “generative AI” has been really a(i)nnoying, there’s no doubt the technology has transformed the world. It was one of the hottest topics at CES earlier this year, and SXSW has a dedicated track for the tech.

When something’s trendy, marketers tend to take notice, and we spoke to execs at several agencies about how they have taken ChatGPT and other generative AI tools into their own hands. They told Marketing Brew that, so far, adland has found unique ways to incorporate generative AI into workflows while working to ensure there is still a human touch, all while tech giants and the federal government alike weigh potential restrictions on the tech.

Lead by example

For independent shop Billion Dollar Boy, generative AI has been useful in influencer marketing. The agency set up Muse, an emerging tech arm to help leverage AI for influencer content creation for clients, Thomas Walters, Billion Dollar Boy’s founder and its European CEO, told us. Muse, which has worked with AI artists like Jo Ann and Elmo Mistiaen on brand campaigns, has also worked with brands including Lipton Iced Tea and Versace, Walters said.

“[It’s] really at the bleeding edge of advertising,” he said.

Internally, the agency is interrogating ways to use AI to optimize work, Walters said. BDB set up a taskforce made up of folks across its departments, from leadership to business affairs, to identify workflow problems and figure out how to solve them using AI tools. For example, after realizing the agency’s staff was spending a lot of time manually performing due diligence checks on influencers, the agency built a ChatGPT-based tool that evaluates influencers’ posts and applies a “risk rating.”



Sourced from Marketing Brew

Yeah, I’m not sure about Google’s various names for its generative AI products.

To clarify:

  • Bard is Google’s generative AI chatbot, much like ChatGPT
  • Gemini is Google’s large language model (LLM) group, like GPT
  • Imagen is Google’s AI image generation system

All clear?

Okay, then this paragraph from Google should now make more sense.

“Last December, we brought Gemini Pro into Bard in English, giving Bard more advanced understanding, reasoning, summarizing and coding abilities. Today Gemini Pro in Bard will be available in over 40 languages and more than 230 countries and territories, so more people can collaborate with this faster, more capable version of Bard.”

I’m guessing that for most people, without the preceding context, the above explanation would have been somewhat bewildering. But basically, Google’s now making its Bard chatbot more powerful, with advanced AI models powering its responses, while also adding image generation capability within Bard itself, powered by Imagen.

Google Imagen 2 in Bard

Google has taken a cautious approach with generative AI development, and has criticized others for pushing too hard, too fast, with their generative AI tools. Some have viewed this as anti-competitive bias, and Google simply protecting its turf, as more people turn to tools like ChatGPT for search queries. But Google’s view is that generative AI needs to be deployed slowly in order to mitigate misuse, which has already led to various issues in a regulatory sense.

But today, Google’s taking the next steps with several of its generative AI tools. Bard, as noted, is getting improved reasoning and image creation; Google Maps is getting new conversational queries, powered by AI, to facilitate place discovery; and Imagen 2, the next stage of Google’s visual creation system, is being rolled out within its image-generation tools.

Google Imagen 2

As explained by Google:

“Imagen 2 has been trained on higher-quality image-description pairings and generates more detailed images that are better aligned with the semantics of people’s language prompts. It’s more accurate than our previous system at processing details, and it’s more capable at capturing nuance – delivering more photorealistic images across a range of styles and use cases.”

That’ll provide more opportunity to create better visuals within Google’s systems, generated with various safeguards in place to limit “problematic outputs like violent, offensive, or sexually explicit content”.

“All images generated with Imagen 2 in our consumer products will be marked by SynthID, a tool developed by Google DeepMind, that adds a digital watermark directly into the pixels of images we generate. SynthID watermarks are imperceptible to the human eye but detectable for identification.”

Given the recent controversy surrounding AI-generated images of Taylor Swift, this is an important measure. It also speaks to a concern Google has repeatedly raised about the rapid rollout of AI tools: that we don’t yet have all the systems and processes in place to fully protect against this kind of misuse.

Sourced from SocialMediaToday

By Alon Goren

At this point, most enterprises are dabbling in generative AI or planning to leverage the technology soon.

According to an October 2023 Gartner, Inc. survey, 45% of organizations are currently piloting generative AI, while 10% have deployed it in full production. Companies are eager to move from pilot to production and start seeing some real business results.

However, enterprises getting started with generative AI often run into a common stumbling block right out of the gate: They suffer analysis paralysis before they can even begin using the technology. There are tons of generative AI tools available today, both broad and highly specialized. Moreover, these tools can be leveraged for all sorts of professions and business purposes—sales, product development, finance, etc.

With so many choices and possibilities, enterprises often get stuck in the planning phase—debating where they should deploy generative AI first. Every business unit (and all of the business’s key stakeholders) wants to own a part of the company’s generative AI initiatives.

Things can get messy. To stay on track, businesses should follow these guidelines when experimenting with generative AI.

Focus On Specific Use Cases With Measurable Goals

Enterprises need to recognize that every part of the organization can benefit from generative AI—eventually. To get there, however, they need to get off the ground with a pilot project.

How do you decide where to get started? Keep it simple and identify a small, specific problem that exists today that can be improved with generative AI. Be practical. Choose an issue that’s been challenging the business for a while, has been difficult to fix in the past and will make a visibly positive impact once resolved. Next, enterprises need to agree upon metrics and goals. The problem can’t be too nebulous or vague; the impact of AI (success or failure) has to be easily measurable.

With that in mind, the pilot project should have a contained scope. The purpose is to demonstrate the real-world value of the technology, build support for it across the organization and then broaden adoption from there.

If organizations try to leverage AI in too many different ways and solve multiple problems, it’ll cause the scope to grow out of control and make it impossible to complete the pilot within a reasonable timeframe. Ambition has to be balanced with practicality. Launching a massive pilot project that requires extensive resources and long timelines is a recipe for failure.

What’s a good timeline for the pilot? It depends on the circumstances, of course. Generally speaking, however, it should only take a few weeks or a couple of months to execute, not multiple quarters or an entire year.

Start small, get something functional quickly and then iterate on it. This iterative approach allows for continuous learning and improvement, which is essential given the nascent state of generative AI technology.

Organizations must also be sure to keep humans in the loop from the very beginning of the experimentation phase. The rise of AI doesn’t render human expertise obsolete; it amplifies it. As productivity and business benefits increase with generative AI, human employees become even more valuable as supervisors and validators of AI output. This is essential for maintaining control and building trust in AI. In addition, the pool of early participants will also help champion the technology throughout the organization once the enterprise is ready to deploy it widely.

Finally, once the project has begun, organizations have to stick with it until it’s complete. Don’t waste time starting over or shifting to other use cases prematurely. Just get going and stay the course. After that’s been completed successfully, companies can expand their use of generative AI more broadly across the organization.

Choosing The Right Technology

The other major component of the experimentation phase is selecting the right vendor. With the generative AI market booming, it can seem impossible to tell one solution from another, and noisy marketing only makes things more confusing.

The best way to cut through the noise is to identify the requirements that are most important to the organization (e.g., data security, governance, scalability, compatibility with existing infrastructure) and look for the vendor that best meets those needs.

It’s extremely important to understand where vendors stand on each of these requirements early on, to avoid discovering later that they don’t actually check those boxes. The only way to do that is by talking to the vendor (especially its sales engineering team) and seeing these capabilities demoed firsthand.

Get Ahead Of The Competition With A Strong Start

Within the next couple of years, I expect almost every enterprise will employ generative AI in production. Those wielding it effectively will get a leg up on their competition, while those struggling will be at risk of falling behind. Though the road may be uncharted, enterprises can succeed by focusing on contained, valuable projects, leveraging human expertise and selecting strategic technology partners.

Don’t wait. Embrace this unique opportunity to innovate and take that crucial first step now.

Feature Image Credit: GETTY

By Alon Goren

CEO and Cofounder of AnswerRocket.

Sourced from Forbes

By Chad S. White

Brands have two major levers they can pull to protect themselves from the negative effects of growing use of generative AI.

The Gist

  • AI disruption. Generative AI is set to disrupt SEO significantly.
  • Content shielding. Brands need strategies to protect their content from AI.
  • Direct relationships. Building strong direct relationships is key.

Do your customers trust your brand more than ChatGPT?

The answer to that question will determine which brands truly have credibility and authority in the years ahead and which do not.

Those who are more trustworthy than generative AI engines will:

  1. Be destinations for answer-seekers, generating strong direct traffic to their websites and robust app usage.
  2. Be able to build large first-party audiences via email, SMS, push and other channels.

Both of those will be critical for any brand wanting to insulate itself from the search engine optimization (SEO) traffic loss that generative AI will cause.

The Threat to SEO

Despite racking up 100 million users just two months after launching — an all-time record — ChatGPT doesn’t appear to be having a noticeable impact on the many billions of searches that happen every day yet. However, it’s not hard to imagine it and other large language models (LLMs) taking a sizable bite out of search market share as they improve and become more reliable.

And improve they will. After all, Microsoft, Google and others are investing tens of billions of dollars into generative AI engines. Long dominating the search engine market, Google in particular is keenly aware of the enormous risk to its business, which is why it declared a Code Red and marshalled all available resources into AI development.

If you accept that generative AI will improve significantly over the next few years — and probably dramatically by the end of the decade — and that consumers will therefore inevitably get more answers to their questions through zero-click engagements, which are already sizable, then the question becomes:

What should brands consider doing to maintain brand visibility and authority, as well as avoid losing value on the investments they’ve made in content?

Protective Measures From Negative Generative AI Effects

Brands have two major levers they can pull to protect themselves from the negative effects of growing use of generative AI.

1. Shielding Content From Generative AI Training

Major legal battles will be fought in the years ahead to clarify what rights copyright holders have in this new age and what still constitutes Fair Use. Content and social media platforms are likely to try to redefine the copyright landscape in their favor, amending their user agreements to give themselves more rights over the content that’s shared on their platforms.

Image: A white robot hand holds a gavel above a sound block on a wooden table. (Andrey Popov on Adobe Stock)

You can already see the split in how companies are deciding to proceed. For example, while Getty Images is suing Stability AI over copyright violations in the training of its Stable Diffusion model, Shutterstock is instead partnering with OpenAI, having decided that it has the right to sell its contributors’ content as training material to AI engines. Although Shutterstock says it doesn’t need to compensate its contributors, it has created a contributor fund to pay those whose works are used most by AI engines. It is also giving contributors the ability to opt out of having their content used as AI training material.

Since Google was permitted to scan and share copyrighted books without compensating authors, it’s entirely reasonable to assume that generative AI will also be allowed to use copyrighted works without agreements or compensation of copyright holders. So, content providers shouldn’t expect the law to protect them.

Given all of that, brands can protect themselves by:

  • Gating more of their web content, whether behind paywalls, account logins or lead generation forms. Although there have been disputes, neither search engines nor AI engines should be crawling content behind paywalls.
  • Releasing some content in password-protected PDFs. While web-hosted PDFs are crawlable, password-protected ones are not. Because consumers aren’t used to frequently encountering password-protected PDFs, some education would be necessary. Moreover, this approach would be most appropriate for your highest-value content.
  • Distributing more content via subscriber-exclusive channels, including email, push and print. Inboxes are considered private spaces, so crawling this content is already a no-no. While print publications like books have been scanned in the past by Google and others, smaller publications would likely be safe from scanning efforts.

In addition to those, brands will hopefully gain a noindex equivalent to tell companies not to train their LLMs and other AI tools on the content of their webpages.
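One likely shape for such a control is a robots.txt-style opt-out. Some crawlers already honor it: Common Crawl’s CCBot (a frequent source of LLM training data) and OpenAI’s GPTBot both state that they respect robots.txt rules. A minimal sketch of what blocking them looks like:

```
# robots.txt -- block known AI-training crawlers while leaving
# conventional search indexing untouched
User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```

Note that this is purely advisory: it only keeps out crawlers that choose to respect it, which is why the gating tactics above remain the stronger protection.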

Of course, while shielding their content from external generative AI engines, brands could also deploy generative AI within their own sites as a way to help visitors and customers find the information they’re looking for. For most brands, this would be a welcome augmentation to their site search functionality.

2. Building Stronger Direct Relationships

While shielding your content is the defensive play, building your first-party audiences is the offensive play. Put another way, now that you’ve kept your valuable content out of the hands of generative AI engines, you need to get it into the hands of your target audience.

You do that by building out your subscription-based channels like email and push. On your email signup forms, highlight the exclusive nature of the content you’ll be sharing. If you’re going to be personalizing the content that you send, highlight that, too.

Brands have the opportunity both to turn their emails into personalized homepages for their subscribers and to turn their subscribers’ inboxes into personalized search engines.

Email Marketing Reinvents Itself Again

Brands already have urgent reasons to build out their first-party audiences. One is the sunsetting of third-party cookies and the need for more customer data. Email marketing and loyalty programs, in particular, along with SMS, are great at collecting both zero-party data through preference centers and progressive profiling, as well as first-party data through channel engagement data.

Another is the increasingly evident dangers of building on the “rented land” of social media. For example, Facebook is slowly declining, Twitter has cut 80% of its staff to avoid bankruptcy as its value plunges, and TikTok faces growing bans around the world. Some are even claiming we’re witnessing the beginning of the end of the age of social media. I wouldn’t go that far, but brands certainly have lots of reasons to focus more on those channels they have much more control over, including the web, loyalty, SMS, and, of course, email.

So, the disruption of search engine optimization by generative AI is just providing another compelling reason to invest more into email programs, or to acquire them. It’s hard not to see this as just another case of email marketing reinventing itself and making itself more relevant to brands yet again.

Feature Image Credit: Andrey Popov on Adobe Stock Photo

By Chad S. White

Chad S. White is the author of four editions of Email Marketing Rules and Head of Research for Oracle Marketing Consulting, a global full-service digital marketing agency inside of Oracle.

Sourced from CMSWIRE


By Luke Hurst

The rise of generative AI has led to an increase in websites producing low-quality or fake content – and major brands’ advertising budgets may be funding them.

The Internet is awash not only with low-quality content, but with content that is misleading or outright false.

The availability of generative artificial intelligence (AI) tools such as OpenAI’s ChatGPT and Google’s Bard, meanwhile, has meant AI-generated news and information has added to this tidal wave of content over the past year.

A new analysis from NewsGuard, a company that gives trust ratings to online news outlets, has found the proliferation of this poor quality, AI-generated content is being supported financially thanks to the advertising budgets of major global brands, including tech giants and banks.

The adverts appear to be placed programmatically, so the brands aren’t necessarily choosing to advertise on the websites that NewsGuard dubs “unreliable AI-generated news and information websites (UAINs)”.

According to NewsGuard, most of the ads are placed by Google, and they fail to protect the companies’ brand safety, as many legitimate companies don’t want to be seen advertising on sites that host fake news, misinformation, or just low-quality content.

NewsGuard, which says it provides “transparent tools to counter misinformation on behalf of readers, brands, and democracies,” defines UAINs as websites that operate with little or no human oversight, and publish articles that are written largely or entirely by bots.

NewsGuard’s analysts have added 217 sites to its UAIN site tracker, many of which appear to be financed entirely by programmatic advertising.

Incentivised to publish low-quality content

Because the websites can make money from programmatic advertising, they are incentivised to publish often. One UAIN the company identified – world-today-news.com – published around 8,600 articles in the week of June 9 to June 15 this year. That’s an average of around 1,200 articles a day.

The New York Times, by comparison, publishes around 150 articles a day, with a large staff headcount.

NewsGuard hasn’t named the big brands that are advertising on these low-quality websites, as they do not expect the brands to know their ads are ending up on those sites.

They did say the brands include six major banks and financial-services firms, four luxury department stores, three leading brands in sports apparel, three appliance manufacturers, two of the world’s biggest consumer technology companies, two global e-commerce companies, two US broadband providers, three streaming services, a Silicon Valley digital platform, and a major European supermarket chain.

Many brands and advertising agencies have “exclusion lists” that stop their ads from being shown on unwelcome websites, but according to NewsGuard, these lists aren’t always kept up to date.

In its report, the company behind the Internet trust tool says it contacted Google multiple times asking for comment on its monetisation of the UAIN sites.

Google asked for more context over email but, after receiving that additional context, had not replied again as of June 25.

Google’s ad policies are supposed to prohibit sites from placing Google-served ads on pages that include “spammy automatically-generated content,” which can be AI-generated content that doesn’t produce anything original or of “sufficient value”.

A previous report from NewsGuard this year highlighted how AI chatbots were being used to publish a new wave of fake news and misinformation online.

In their latest research, conducted over May and June this year, analysts found 393 programmatic ads from 141 major brands that appeared on 55 of the 217 UAIN sites.

The analysts were browsing the sites from the US, Germany, France, and Italy.

All of the ads identified appeared on pages that had error messages generated by AI chatbots, which say things such as: “Sorry, as an AI language model, I am not able to access external links or websites on my own”.
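That tell (a chatbot refusal string published verbatim) is simple to scan for. A minimal Python sketch follows; the marker phrases are illustrative, not NewsGuard’s actual criteria:

```python
# Illustrative marker phrases; NewsGuard's real detection criteria are not public.
AI_ERROR_MARKERS = [
    "as an ai language model",
    "i am not able to access external links",
    "i cannot fulfill this request",
]

def looks_machine_written(page_text: str) -> bool:
    """Flag pages that published a chatbot's refusal message verbatim."""
    text = page_text.lower()
    return any(marker in text for marker in AI_ERROR_MARKERS)

sample = ("Sorry, as an AI language model, I am not able to access "
          "external links or websites on my own.")
print(looks_machine_written(sample))  # True
```

A scan this crude only catches the sloppiest sites, of course, but those were exactly the pages where the ads in question were found.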

More than 90 per cent of these ads were served by Google Ads, a platform that brings in billions in revenue for Google each year.

By Luke Hurst

Sourced from euronews.next


Unstructured text and data are like gold for business applications and the company bottom line, but where to start? Here are three tools worth a look.

Developers and data scientists use generative AI and large language models (LLMs) to query volumes of documents and unstructured data. Open source LLMs, including Dolly 2.0, EleutherAI’s Pythia, Meta AI’s LLaMA, StableLM, and others, are all starting points for experimenting with artificial intelligence that accepts natural language prompts and generates summarized responses.

“Text as a source of knowledge and information is fundamental, yet there aren’t any end-to-end solutions that tame the complexity in handling text,” says Brian Platz, CEO and co-founder of Fluree. “While most organizations have wrangled structured or semi-structured data into a centralized data platform, unstructured data remains forgotten and underleveraged.”

If your organization and team aren’t experimenting with natural language processing (NLP) capabilities, you’re probably lagging behind competitors in your industry. In the 2023 Expert NLP Survey Report, 77% of organizations said they planned to increase spending on NLP, and 54% said their time-to-production was a top return-on-investment (ROI) metric for successful NLP projects.

Use cases for NLP

If you have a corpus of unstructured data and text, some of the most common business needs include

  • Entity extraction by identifying names, dates, places, and products
  • Pattern recognition to discover currency and other quantities
  • Categorization into business terms, topics, and taxonomies
  • Sentiment analysis, including positivity, negation, and sarcasm
  • Summarizing the document’s key points
  • Machine translation into other languages
  • Dependency graphs that translate text into machine-readable semi-structured representations
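For the simpler pattern-recognition cases on that list, such as currency amounts and dates, plain regular expressions are often enough before reaching for an NLP library. A minimal Python sketch:

```python
import re

text = "Invoice INV-204 totals $1,250.00 and is due 2023-07-15."

# Currency amounts such as $1,250.00 (thousands separators optional)
money = re.findall(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?", text)
# ISO-8601 dates such as 2023-07-15
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

print(money)  # ['$1,250.00']
print(dates)  # ['2023-07-15']
```

The heavier items on the list (sentiment, summarization, translation, dependency parsing) are where the dedicated tools below earn their keep.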

Sometimes, having NLP capabilities bundled into a platform or application is desirable. For example, LLMs support asking questions; AI search engines enable searches and recommendations; and chatbots support interactions. Other times, it’s optimal to use NLP tools to extract information and enrich unstructured documents and text.

Let’s look at three popular open source NLP tools that developers and data scientists are using to perform discovery on unstructured documents and develop production-ready NLP processing engines.

Natural Language Toolkit

The Natural Language Toolkit (NLTK), released in 2001, is one of the older and more popular NLP Python libraries. NLTK has more than 11,800 stars on GitHub and lists over 100 trained models.

“I think the most important tool for NLP is by far Natural Language Toolkit, which is licensed under Apache 2.0,” says Steven Devoe, director of data and analytics at SPR. “In all data science projects, the processing and cleaning of the data to be used by algorithms is a huge proportion of the time and effort, which is particularly true with natural language processing. NLTK accelerates a lot of that work, such as stemming, lemmatization, tagging, removing stop words, and embedding word vectors across multiple written languages to make the text more easily interpreted by the algorithms.”

NLTK’s benefits stem from its endurance, with many examples for developers new to NLP, such as this beginner’s hands-on guide and this more comprehensive overview. Anyone learning NLP techniques may want to try this library first, as it provides simple ways to experiment with basic techniques such as tokenization, stemming, and chunking.
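A minimal sketch of the tokenization and stemming steps Devoe mentions, using NLTK's `RegexpTokenizer` so no model downloads are required (the richer `nltk.word_tokenize` needs the punkt model); the sample sentence is invented:

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# RegexpTokenizer needs no downloaded models, which keeps this sketch
# self-contained; nltk.word_tokenize is richer but requires a one-time
# punkt model download.
tokenizer = RegexpTokenizer(r"\w+")
tokens = tokenizer.tokenize(
    "The cameras were reviewed and rated by photographers."
)

# Porter stemming reduces inflected forms to a common stem,
# e.g. "reviewed" becomes "review".
stemmer = PorterStemmer()
print([stemmer.stem(t) for t in tokens])
```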

spaCy

spaCy is a newer library, with its version 1.0 released in 2016. spaCy supports over 72 languages and publishes its performance benchmarks, and it has amassed more than 25,000 stars on GitHub.

“spaCy is a free, open-source Python library providing advanced capabilities to conduct natural language processing on large volumes of text at high speed,” says Nikolay Manchev, head of data science, EMEA, at Domino Data Lab. “With spaCy, a user can build models and production applications that underpin document analysis, chatbot capabilities, and all other forms of text analysis. Today, the spaCy framework is one of Python’s most popular natural language libraries for industry use cases such as extracting keywords, entities, and knowledge from text.”

Tutorials for spaCy show similar capabilities to NLTK, including named entity recognition and part-of-speech (POS) tagging. One advantage is that spaCy returns document objects and supports word vectors, which can give developers more flexibility for performing additional post-NLP data processing and text analytics.

Spark NLP

If you already use Apache Spark and have its infrastructure configured, then Spark NLP may be one of the faster paths to begin experimenting with natural language processing. Spark NLP has several installation options, including AWS, Azure Databricks, and Docker.

“Spark NLP is a widely used open-source natural language processing library that enables businesses to extract information and answers from free-text documents with state-of-the-art accuracy,” says David Talby, CTO of John Snow Labs. “This enables everything from extracting relevant health information that only exists in clinical notes, to identifying hate speech or fake news on social media, to summarizing legal agreements and financial news.”

Spark NLP’s differentiators may be its healthcare, finance, and legal domain language models. These commercial products come with pre-trained models to identify drug names and dosages in healthcare, financial entity recognition such as stock tickers, and legal knowledge graphs of company names and officers.

Talby says Spark NLP can help organizations minimize the upfront training in developing models. “The free and open source library comes with more than 11,000 pre-trained models plus the ability to reuse, train, tune, and scale them easily,” he says.

Best practices for experimenting with NLP

Earlier in my career, I had the opportunity to oversee the development of several SaaS products built using NLP capabilities. My first NLP product was a SaaS platform for searching newspaper classified advertisements, including cars, jobs, and real estate. I then led the development of NLP capabilities for extracting information from commercial construction documents, including building specifications and blueprints.

When starting NLP in a new area, I advise the following:

  • Begin with a small but representative example of the documents or text.
  • Identify the target end-user personas and how extracted information improves their workflows.
  • Specify the required information extractions and target accuracy metrics.
  • Test several approaches and use speed and accuracy metrics to benchmark.
  • Improve accuracy iteratively, especially when increasing the scale and breadth of documents.
  • Expect to deliver data stewardship tools for addressing data quality and handling exceptions.
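The benchmarking step above can be sketched as a small harness that scores any candidate extractor on both accuracy and speed. Everything here is hypothetical: the labeled sample, the `benchmark` helper, and the regex baseline are stand-ins for your own documents and NLP approaches:

```python
import re
import time

# Hypothetical labeled sample: each document is paired with the set of
# entities a reviewer expects an extractor to find.
sample = [
    ("Invoice due $500 on May 1", {"$500"}),
    ("Refund of $20 was issued", {"$20"}),
]

def benchmark(extractor):
    """Return (accuracy, elapsed_seconds) for an extractor function."""
    start = time.perf_counter()
    correct = total = 0
    for text, expected in sample:
        found = extractor(text)
        correct += len(found & expected)
        total += len(expected)
    return correct / total, time.perf_counter() - start

def regex_extractor(text):
    # A deliberately simple baseline to compare NLP approaches against.
    return set(re.findall(r"\$\d+", text))

accuracy, seconds = benchmark(regex_extractor)
print(f"accuracy={accuracy:.0%} time={seconds:.4f}s")
```

Running the same harness over NLTK-, spaCy-, or Spark NLP-based extractors gives the side-by-side speed and accuracy metrics the checklist calls for.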

You may find that the NLP tools used to discover and experiment with new document types will aid in defining requirements. Then, expand the review of NLP technologies to include open source and commercial options, as building and supporting production-ready NLP data pipelines can get expensive. With LLMs in the news and gaining interest, underinvesting in NLP capabilities is one way to fall behind competitors. Fortunately, you can start with one of the open source tools introduced here and build your NLP data pipeline to fit your budget and requirements.

Feature Image Credit: TippaPatt/Shutterstock

Isaac Sacolick is president of StarCIO and the author of the Amazon bestseller Driving Digital: The Leader’s Guide to Business Transformation through Technology and Digital Trailblazer: Essential Lessons to Jumpstart Transformation and Accelerate Your Technology Leadership. He covers agile planning, devops, data science, product management, and other digital transformation best practices. Sacolick is a recognized top social CIO and digital transformation influencer. He has published more than 900 articles at InfoWorld.com, CIO.com, his blog Social, Agile, and Transformation, and other sites.

Sourced from InfoWorld