
Webhooks: Overview & Demo Integration with Amazon SageMaker

A tutorial showing a demo integration with Amazon SageMaker, presented by Sarah Moyer, head of Content & Education at HumanSignal.

Transcript

Michael
Hello and welcome to our slightly upgraded setup and long-overdue webinar tutorial. I'm joined today by my colleague Sarah—you know what, I've never actually pronounced your last name in public. Want to go ahead and say it?

Sarah
Thank you! It’s Sarah Moyer.

Michael
Moyer, of course. Scottish, right?

Sarah
What can I say.

Michael
Sarah heads up content and education at HumanSignal. If you’ve read the docs or an article written by us, Sarah’s had her hands all over it. Today, we’re going to walk through one of those articles—focused on webhooks—and Sarah’s going to show us a live demo of an integration with Amazon SageMaker. Let me run through a couple slides first.

If you have questions as we go, we’re taking them in our #webinars channel in the Label Studio Slack community. It’s free to join. You can go to labelstud.io, find the Slack link, and head to that channel. We’ll take questions throughout the webinar.

Also, we now have a section on the site for upcoming webinars, where you can RSVP. We’ll be doing more live demos, interviews, and hopefully some partner webinars as well.

If you want to follow along today, you can use the article that Sarah wrote. It’s the latest post on the blog at labelstud.io/blog.

With that said, over to you, Sarah.

Sarah
Sounds great. Let me share what I need here. Like Michael said, this is a live demo, so fingers crossed the demo gods are with me today.

I’ve already done a few things that are in the article. First, I set up an S3 bucket with bird images. The goal of this integration is to connect Label Studio to a SageMaker model training pipeline. We’re doing semantic segmentation, so we’ll use the annotation_created webhook, send it through AWS Lambda and API Gateway, and trigger the SageMaker pipeline. The trained model will be stored back in S3.

The bucket already contains the bird images, a place for annotations, and the Python scripts for preprocessing and cleanup. I’ve also set up a Label Studio project with a semantic segmentation config.

Now we’ll configure SageMaker. First step: IAM policies. I’ve already created the user policy and user role. Now I’m attaching policies to give SageMaker and S3 access.
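The policy attachment step can be sketched in Python with boto3. This is a minimal illustration, not the exact policies from the article: the role name, policy name, and the broad `sagemaker:*` grant are placeholder assumptions you should tighten for production.

```python
import json

def make_pipeline_policy(bucket_name):
    """Build an IAM inline-policy document (as a dict) granting SageMaker
    actions plus read/write on the demo bucket. Hypothetical minimal policy;
    scope it down for real use."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sagemaker:*"],  # broad for the demo; narrow in production
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            },
        ],
    }

def attach_pipeline_policy(role_name, bucket_name):
    """Attach the policy above as an inline policy on an existing role.
    Requires configured AWS credentials to actually run."""
    import boto3
    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="sagemaker-pipeline-access",
        PolicyDocument=json.dumps(make_pipeline_policy(bucket_name)),
    )
```

The same attachment can of course be done in the IAM console, which is what the demo shows.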

Once that’s done, we create the SageMaker pipeline. If you’re following the article, you’ll see the pipeline definition there—it runs a preprocessing script to split the dataset into train and validation sets, pulls images and annotations from S3, and triggers a ResNet-50 model for semantic segmentation.

I’m updating the pipeline definition with the ARN for the IAM role we just created. Now we deploy the pipeline. It’s surprisingly quick to create, and once deployed you can view it alongside any pipelines you’ve previously launched.
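The ARN substitution and deployment can be sketched like this. The `<ROLE_ARN>` placeholder token is an assumption; the article's definition file may embed the role differently, so check how yours is templated.

```python
def fill_role_arn(definition_template, role_arn, placeholder="<ROLE_ARN>"):
    """Substitute the execution role ARN into a pipeline definition template.
    The placeholder token is hypothetical, not taken from the article."""
    return definition_template.replace(placeholder, role_arn)

def deploy_pipeline(pipeline_name, definition_path, role_arn):
    """Create the SageMaker pipeline from a JSON definition file.
    Requires configured AWS credentials to actually run."""
    import boto3
    with open(definition_path) as f:
        definition = fill_role_arn(f.read(), role_arn)
    sm = boto3.client("sagemaker")
    sm.create_pipeline(
        PipelineName=pipeline_name,
        PipelineDefinition=definition,
        RoleArn=role_arn,
    )
```

After `create_pipeline` succeeds, the pipeline shows up in the SageMaker console next to any others you've launched.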

Next, we set up IAM policies for the Lambda function. Lambda is what will receive the webhook data from Label Studio (forwarded through API Gateway, which we’ll set up next) and counts annotations until 16 have been created before invoking the SageMaker pipeline.

I’ve already created the Python script for Lambda off-screen. We zip it, upload it to Lambda, and then grab the ARN so we can connect it to the API Gateway.
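The core of that Lambda script looks roughly like this. The pipeline name is a placeholder, and the in-memory counter is for illustration only: Lambda containers are ephemeral, so the real script would need durable storage (e.g. S3 or DynamoDB) to track the count across invocations.

```python
import json

ANNOTATION_THRESHOLD = 16  # from the demo: train once 16 annotations exist

# Illustration-only counter; a real Lambda needs external persistence.
_count = 0

def lambda_handler(event, context):
    """Count ANNOTATION_CREATED webhooks; start the pipeline at the threshold."""
    global _count
    # API Gateway delivers the Label Studio payload in event["body"]
    payload = json.loads(event.get("body") or "{}")
    if payload.get("action") != "ANNOTATION_CREATED":
        return {"statusCode": 200, "body": "ignored"}
    _count += 1
    if _count == ANNOTATION_THRESHOLD:
        import boto3  # requires AWS credentials to actually run
        boto3.client("sagemaker").start_pipeline_execution(
            PipelineName="bird-segmentation-pipeline"  # hypothetical name
        )
    return {"statusCode": 200, "body": json.dumps({"count": _count})}
```

Zipped and uploaded, this is the function whose ARN we hand to API Gateway next.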

API Gateway is the last piece before jumping back to Label Studio. It securely passes webhook data from Label Studio to Lambda. I’m setting that up now with the Lambda function we just uploaded, and adding the right permissions so it can invoke Lambda.

That gives us a public URL we can use as our webhook endpoint. Save that—we’ll need it shortly.
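The console clicks above map to a couple of API calls. A sketch using API Gateway's HTTP-API quick-create, which proxies every request straight to the Lambda; names and IDs are placeholders:

```python
def make_source_arn(region, account_id, api_id):
    """Build the execute-api source ARN used to scope the Lambda permission."""
    return f"arn:aws:execute-api:{region}:{account_id}:{api_id}/*"

def create_webhook_endpoint(api_name, lambda_arn, region, account_id):
    """Quick-create an HTTP API targeting the Lambda, then allow API Gateway
    to invoke it. Requires configured AWS credentials to actually run."""
    import boto3
    apigw = boto3.client("apigatewayv2", region_name=region)
    api = apigw.create_api(Name=api_name, ProtocolType="HTTP", Target=lambda_arn)
    boto3.client("lambda", region_name=region).add_permission(
        FunctionName=lambda_arn,
        StatementId="allow-apigateway-invoke",
        Action="lambda:InvokeFunction",
        Principal="apigateway.amazonaws.com",
        SourceArn=make_source_arn(region, account_id, api["ApiId"]),
    )
    return api["ApiEndpoint"]  # the public URL to use as the webhook target
```

The returned `ApiEndpoint` is the public URL mentioned above.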

Back in Label Studio, we’ll set up cloud storage. I’m using AWS S3. My bucket is sarah-showcase-bucket, and the prefix for input images is bird-images. These are JPEGs, and the bucket is in us-east-2.

Now I’ll enter my AWS credentials. These are required to generate pre-signed URLs. Adding the storage... and syncing. We’ve got 87 images pulled in. Great.

Next, I’ll configure target storage—that’s where annotations go after you label. Same bucket, but the prefix is annotations.

Now we add the webhook. Use the URL from API Gateway. We want this to trigger only when an annotation is created—no headers needed, and we do want to send the full payload. The documentation has details on what’s included in the payload.
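The demo adds the webhook through the UI, but the same settings can be posted to the Label Studio webhook API. The field names below follow that API; double-check them against the current docs, and treat the wrapper function as a hypothetical convenience:

```python
def build_webhook_config(project_id, endpoint_url):
    """Webhook settings matching the demo: fire only on annotation creation,
    no extra headers, full payload included."""
    return {
        "project": project_id,
        "url": endpoint_url,
        "send_payload": True,
        "send_for_all_actions": False,
        "actions": ["ANNOTATION_CREATED"],
        "headers": {},
    }

def register_webhook(label_studio_url, api_token, project_id, endpoint_url):
    """POST the webhook to a Label Studio instance (hypothetical wrapper)."""
    import json
    import urllib.request
    req = urllib.request.Request(
        f"{label_studio_url}/api/webhooks",
        data=json.dumps(build_webhook_config(project_id, endpoint_url)).encode(),
        headers={
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)
```

Scripting this is handy if you recreate the project often; for a one-off demo the UI is quicker.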

Webhook added. Time to start annotating.

(At this point, Sarah encounters a bug due to special characters in AWS credentials, which prevents Label Studio from accessing the bucket.)

Sarah
Of course, we’ve hit the classic AWS bug where secret keys with special characters cause a 403 SignatureDoesNotMatch error. This has nothing to do with Label Studio—it’s an issue on the Amazon side that’s been documented for years. The workaround is to regenerate your credentials and make sure the new secret key contains no special characters. It’s annoying, but it works.
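Before regenerating keys, a quick check like this can tell you whether your secret key contains the characters most often implicated. The suspect set is a heuristic, not an official AWS list:

```python
def find_problem_chars(secret_key, suspects="+/%="):
    """Return the characters in a secret key commonly implicated in
    SignatureDoesNotMatch errors (heuristic, not an official list)."""
    return [c for c in secret_key if c in suspects]
```

If the list is non-empty, regenerating the key until it comes back free of those characters is the workaround described above.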

Let me update the credentials again and wait a minute—Amazon sometimes takes a bit to register changes. Fingers crossed...

[Pauses]

Success! We’ve got bird images. Let’s label some.

I’m going to do a few by hand and then duplicate them to hit the 16-annotation threshold quickly. We’re using public domain bird images from the U.S. Fish & Wildlife Service—lots of fun options in here.

You’ll notice after each annotation, the webhook fires and sends data through the Lambda function. That’s what eventually triggers the SageMaker pipeline.

I’ll sync the annotations to target storage. And now we wait for the SageMaker pipeline to execute.

Checking S3... we’ve got our temp directory! That means the preprocessing script ran and split the data into training and validation sets. Once the model finishes training, we should see an output directory with the .tar.gz model file.

Let’s refresh... and there it is. The model is trained, the cleanup script ran, and the temp files are gone. End-to-end workflow is complete.

Michael
Perfect timing. Thank you, Sarah. This ended up being a great example of how messy real integrations can be—but also how doable it is with the right tooling. Really appreciate you walking us through it.

If anyone has questions, Sarah is active in our Slack community, so feel free to join and reach out. Thanks again, and see you next time!
