Here comes part two of Anna Geller's report from Berlin Buzzwords 2023:
This article follows up on the post from Day 1 and highlights Anna's personal key takeaways from the second conference day.
Summary of selected talks
Several talks ran in parallel at any given time. Each section below briefly summarizes a session I was able to attend.
Avoiding Anti-patterns in Technical Communication
Sophie Watson, a technical marketing manager at Nvidia, shared how to communicate technical knowledge effectively without falling into common anti-patterns. The session started by defining anti-patterns: solutions that initially appear to be good but ultimately lead to more negative than positive outcomes. The speaker gave examples of common anti-patterns in various aspects of the tech industry, including technical communication. She discussed why certain patterns are ineffective and how to avoid them, including:
- over- and under-communicating
- the pursuit of likeability
- metrics communicated without context; for example, a high model accuracy or a low latency might not always be a positive outcome, depending on the context
- using irrelevant analogies; for instance, football analogies might be misunderstood by people unfamiliar with football.
The talk emphasized the need to consider the audience, communicate clearly and concisely, avoid irrelevant analogies, and focus on conveying essential information at the right level of detail.
Connect GPT with your data: Retrieval-augmented Generation
Malte Pietsch, the Co-Founder & CTO at deepset.ai, discussed the challenges faced when moving large language models (LLMs) to production. He highlighted the gap between the impressive demos seen on social media and the real-world issues engineers and product managers face when applying those models in production.
The talk shared insights gained from working with over 50 enterprise customers. He discussed the need for data security and scalability, considering factors such as latency, throughput, costs, and model performance. Preventing hallucination in LLMs was the key concern, and the speaker stressed the importance of systematically assessing model behavior.
The session shared code examples using the open-source framework Haystack, along with tips for evaluating the performance of NLP applications through end-user feedback.
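For readers unfamiliar with the pattern, here is a minimal sketch of what such a retrieval-augmented pipeline can look like with Haystack's 1.x API. This is not the speaker's code; the model name, documents, and API-key variable are placeholders:

```python
# A minimal retrieval-augmented QA sketch with Haystack 1.x.
import os

from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, PromptNode
from haystack.pipelines import Pipeline

# Index a few documents into an in-memory store with BM25 enabled.
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([
    Document(content="Berlin Buzzwords is a conference on search, store, and scale."),
    Document(content="Haystack is an open-source framework for building LLM pipelines."),
])

# Retrieve relevant documents and feed them to an LLM as context.
retriever = BM25Retriever(document_store=document_store, top_k=3)
prompt_node = PromptNode(
    model_name_or_path="gpt-3.5-turbo",            # assumed model choice
    api_key=os.environ["OPENAI_API_KEY"],          # assumed env var
    default_prompt_template="question-answering",  # built-in 1.x template
)

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])

result = pipe.run(query="What is Haystack?")
print(result["results"])
```

The key idea is that the retriever grounds the model: the LLM answers from the retrieved passages rather than from its parametric memory alone, which is the main lever against hallucination discussed in the talk.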
From keyword to vector
Byron Voorbach, the Head of Sales Engineering at Weaviate, summarized the last decade of search technologies based on his experience with Elasticsearch and Weaviate. The talk discussed the challenges of keyword-based search in e-commerce, such as handling synonyms and user typos, how ML-based approaches can help, and insights on using search in practice.
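To illustrate the synonym problem the talk described, here is a small sketch (the model name and product texts are made up) showing how naive keyword matching misses a synonym that an embedding model catches:

```python
# Keyword vs. vector matching on a synonym query.
from sentence_transformers import SentenceTransformer, util

products = ["cozy wool sweater", "running shoes", "leather sofa"]
query = "warm jumper"  # a synonym a pure keyword engine would not match

# Naive keyword matching finds nothing: no query token appears in any product.
keyword_hits = [p for p in products if any(tok in p.split() for tok in query.split())]
print(keyword_hits)  # []

# Embedding-based similarity still ranks the sweater first.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
scores = util.cos_sim(model.encode(query), model.encode(products))[0]
best = max(range(len(products)), key=lambda i: float(scores[i]))
print(products[best])  # "cozy wool sweater" expected
```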
Semantic vs. keyword search as context for GPT
Tudor Golubenco, the CTO at Xata, discussed the use of search to provide context for building a chatbot on your own data. The session was based on a chatbot embedded into Xata’s documentation page.
While vector search is commonly used, keyword search has advantages too. Xata ended up using keyword search for their use case, but the comparison itself had no clear winner. You can read more on their blog.
Highly Available Search at Shopify
Khosrow Ebrahimpour, Production Engineering Manager at Shopify, shared the story of how they implemented a highly available platform that powers search for millions of users. The talk started with an overview of Shopify as a cloud-based commerce platform and the importance of search in their business.
Shopify's search platform handles around 2 petabytes of data across 114 fault-tolerant Elasticsearch clusters. They use Kafka as their main messaging service to facilitate communication between applications.
The session discussed three areas they focused on to ensure high availability: handling system failures, managing large seasonal sales events, and addressing data growth.
They use redundancy features built into Elasticsearch and Kubernetes to tackle system failures, such as machine, disk, or cloud provider failures. They also distribute their nodes across availability zones (AZs) to protect against the failure of an entire zone.
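For context, Elasticsearch ships zone awareness as a built-in allocation setting. The sketch below shows the general mechanism via the Python client; the endpoint and zone names are assumptions, not Shopify's actual configuration:

```python
# Shard allocation awareness: spread shard copies across zones.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

# Each node declares its zone via `node.attr.zone` in elasticsearch.yml, e.g.:
#   node.attr.zone: us-east-1a
# The cluster then keeps primaries and replicas in different zones:
es.cluster.put_settings(persistent={
    "cluster.routing.allocation.awareness.attributes": "zone",
    # Forced awareness: if a whole zone is down, don't pile all copies
    # of a shard into the surviving zones.
    "cluster.routing.allocation.awareness.force.zone.values": "us-east-1a,us-east-1b,us-east-1c",
})
```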
To handle high-volume commerce events (such as Black Friday), they don’t auto-scale Elasticsearch directly. Instead, they work with data scientists to predict load during sales events and scale accordingly.
Finally, they’ve made their storage scalable by using a custom Kubernetes controller to tackle rapid data growth.
Future plans include leveraging Elasticsearch for vector search and addressing the Elasticsearch sharding issues that come with continued data growth.
ChatGPT is lying, how can we fix it?
Kacper Łukawski, a Developer Advocate at Qdrant, discussed the issue of factuality with LLMs and explained how to use Retrieval Augmented Language Models along with a custom knowledge base to improve the accuracy of the results. The talk reviewed possible ways to implement this approach using vector databases and custom data.
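As a rough sketch of the approach (not the speaker's code; the collection, facts, and embedding model are made up), retrieval from a vector database can ground the model's answer like this:

```python
# Grounding an LLM prompt in facts retrieved from a vector database.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
client = QdrantClient(":memory:")  # in-process Qdrant for demo purposes

facts = [
    "Berlin Buzzwords 2023 took place in June in Berlin.",
    "Qdrant is an open-source vector database written in Rust.",
]
client.recreate_collection(
    collection_name="facts",
    vectors_config=VectorParams(
        size=encoder.get_sentence_embedding_dimension(),
        distance=Distance.COSINE,
    ),
)
client.upsert(
    collection_name="facts",
    points=[
        PointStruct(id=i, vector=encoder.encode(t).tolist(), payload={"text": t})
        for i, t in enumerate(facts)
    ],
)

# Retrieve supporting facts for a question and put them into the prompt,
# so the LLM answers from the knowledge base instead of guessing.
question = "What is Qdrant written in?"
hits = client.search(
    collection_name="facts",
    query_vector=encoder.encode(question).tolist(),
    limit=2,
)
context = "\n".join(h.payload["text"] for h in hits)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # feed this prompt to the LLM of your choice
```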
Fact-Checking Rocks: how to build a fact-checking system
Stefano Fiorucci, an ML Engineer at 01S, shared how to build a fact-checking system for rock music using open-source libraries, including Haystack, FAISS, Hugging Face Transformers, and Sentence Transformers. He demonstrated how to combine Information Retrieval tools with modern LLMs to implement a fact-checking baseline.
The talk provided insights into the development of AI applications and covered Dense Retrieval, Natural Language Inference models, and integrating LLMs in NLP applications.
The code and deployed application can be found here.
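To give a flavor of the entailment step in such a pipeline, here is a toy sketch using a sentence-transformers NLI cross-encoder; the model choice and the rock facts are my own assumptions, not the project's code:

```python
# Does the retrieved evidence support the claim? An NLI model decides.
from sentence_transformers import CrossEncoder

nli = CrossEncoder("cross-encoder/nli-deberta-v3-base")  # assumed model choice

evidence = "Nirvana's album Nevermind was released in 1991."
claim = "Nevermind came out in the nineties."

# The model scores (contradiction, entailment, neutral) for the pair.
scores = nli.predict([(evidence, claim)])
label = ["contradiction", "entailment", "neutral"][scores[0].argmax()]
print(label)  # "entailment" expected: the evidence supports the claim
```

In the full system, a retriever first finds candidate evidence passages (e.g., via FAISS), and the NLI verdict over those passages becomes the fact-checking result.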
Column-level lineage is coming to the rescue
Paweł Leszczyński and Maciej Obuchowski presented the column-level lineage support recently added to OpenLineage. The session demonstrated how column-level lineage is automatically extracted from Spark jobs and SQL queries, and how the lineage metadata gets consumed by Marquez to display data and job dependencies in a DAG view. Column-level lineage is a popular feature the community has developed recently, and both Paweł and Maciej are OpenLineage contributors actively involved in the project.
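For the curious, the sketch below shows roughly how the OpenLineage Spark listener is wired into a PySpark job so that lineage events (including column-level facets) flow to Marquez automatically; the package version, URL, and namespace are assumptions:

```python
# Emitting OpenLineage events from a PySpark job.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lineage-demo")
    # Listener jar and class shipped by the openlineage-spark integration;
    # the version below is an assumption, pick a current release.
    .config("spark.jars.packages", "io.openlineage:openlineage-spark:1.0.0")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    # Where to send events; a local Marquez instance is assumed here.
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")
    .config("spark.openlineage.namespace", "demo")
    .getOrCreate()
)

# Any job run in this session now reports dataset and column lineage,
# which Marquez renders as a DAG of data and job dependencies.
spark.range(10).withColumnRenamed("id", "user_id").write.mode("overwrite").parquet("/tmp/users")
```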
Tiny Flink — Minimizing the memory footprint of Apache Flink
Apache Flink is mostly used for large-scale real-time data processing at the scale of terabytes per second. But what if you need to process low-throughput streams? There’s a lot of overhead for distributed coordination when running a full, distributed Flink cluster.
In this talk, Robert Metzger, a Staff Engineer at decodable, discussed Flink’s MiniCluster, allowing you to run Flink in-JVM for integration tests in CI/CD, for local testing, as a microservice, or just as a small data processor deployed to Kubernetes. The session covered lessons learned from running MiniCluster in production for a service offering Flink SQL in the cloud.
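The talk centered on Flink's Java APIs, but as a rough Python analogue: executing a PyFlink job without attaching to a cluster runs it on an embedded local mini cluster inside a single JVM, as in this toy sketch:

```python
# A tiny Flink job: local execution spins up an embedded mini cluster,
# no distributed deployment required. Values are toy data.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)  # keep the footprint small for a low-throughput stream

env.from_collection([1, 2, 3, 4]) \
   .map(lambda x: x * x) \
   .print()

env.execute("tiny-flink-job")
```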
The slides are available on speakerdeck.com/rmetzger.
Extra highlight: barcamp on Sunday
Even though the conference officially started on Monday, there was a special event on Sunday called Barcamp, moderated by Nick Burch. Barcamps are informal sessions with a schedule decided on the spot, driven by the interests and expertise of the attendees. This session is particularly worth keeping in mind for:
- First-time speakers who want to present in front of a smaller crowd
- Those who were late in submitting a talk proposal
- Those whose talk wasn't accepted and who want a second chance to share their ideas, projects, or knowledge.
This year, the topics included, among others:
- What is the relevance process
- InPars: Fine-tuning neural search for GPT question answering on private data
- Possibilities for feature calculation in Vespa and how to integrate a BERT model in a single morning.
General Impression
The conference was a great opportunity to connect with and learn from experts building search and AI products. There was a lot of energy and enthusiasm for AI and building data applications among the participants. I look forward to attending future conferences and continuing to be a part of this community.