Tutorials | April 28, 2022

Natural Language Annotation & Cloud Storage Integration

This beginner-friendly webinar, led by Heartex's Senior Frontend Engineer Nick Skriabin and moderated by Heartex's Head of Open Source Community Michael Ludden, teaches viewers how to annotate text and configure Label Studio version 1.0.1 for cloud storage integration. It also answers some common questions and walks viewers through key features of the product.

Transcript

Michael: Welcome everyone. I’m Michael Ludden, and I head up Open Source Community at HumanSignal, focusing on the Label Studio community. This is the second in our webinar series, which also serves as beginner-friendly tutorials. Today, I’m joined by Nick, our senior front-end engineer and one of the founding members of the team.

We’ll be covering two main topics today: cloud storage integration and natural language annotation in Label Studio.

Before we dive in, a quick note: we’re doing these webinars bi-weekly. The next one will feature another partner, and we’ll announce details soon. If you want to submit questions during the session, please post them in the #webinars channel of the Label Studio Slack. You can join our Slack community via the Bitly link shown on screen.

We’ll start with cloud storage, take questions from Slack, then move on to natural language annotation and take more questions.

Nick: Thanks, Michael. Let’s start with cloud storage. I’ll walk through setting up a project and integrating AWS S3.

In Label Studio, go to Project Settings → Cloud Storage. You’ll see two types: Source Storage (where data comes from) and Target Storage (where annotated results are exported). We support cloud options like AWS S3, Azure, and GCS—as well as Redis and local files.

For this example, I’ll show AWS S3. First, create an AWS account and a bucket. Then, go to IAM, create a user group with AmazonS3FullAccess, and add a user with programmatic access. You’ll receive an Access Key ID and Secret Access Key—keep them safe.

In Label Studio, input these credentials, your bucket name, and (if needed) a prefix to locate your files. Toggle “Use each file as a separate task.” Set the region (like us-east-1) and click Add Storage.

Note: you must enable CORS on your bucket to allow Label Studio to access the files via pre-signed URLs.
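As a rough illustration of the CORS step, the sketch below builds a read-only CORS rule set and prints it as JSON that could be pasted into the bucket's CORS settings. The helper name build_cors_rules and the origin http://localhost:8080 are assumptions made for this example, not values from the webinar; tighten the allowed origin to wherever your Label Studio instance is actually served.

```python
import json


def build_cors_rules(allowed_origin):
    """Return S3 CORS rules that let a browser at allowed_origin read objects.

    Hypothetical helper for illustration; only GET is allowed because
    Label Studio only needs to read the files behind pre-signed URLs.
    """
    return [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET"],
            "AllowedOrigins": [allowed_origin],
            "ExposeHeaders": [
                "x-amz-server-side-encryption",
                "x-amz-request-id",
                "x-amz-id-2",
            ],
            "MaxAgeSeconds": 3000,
        }
    ]


# Assumed origin of a locally running Label Studio instance.
print(json.dumps(build_cors_rules("http://localhost:8080"), indent=2))
```

If you prefer to apply the rules from code rather than the AWS console, the same list can be passed to boto3's put_bucket_cors as CORSConfiguration={"CORSRules": rules}.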

After syncing, your data will appear as tasks, each referencing the file URL. These are not file contents—just links. If your bucket is private, you’ll need pre-signed URLs or proper permissions.

Michael: A quick question from the Slack channel: What flavor of regex should be used in file filters?

Nick: Use Python-compatible regex. Label Studio's backend is written in Python, so file filters follow Python's regular expression syntax.
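One way to check a file filter before saving it is to try the pattern against a few object keys locally with Python's re module. The keys and the pattern below are made up for illustration, and the sketch assumes the filter is matched from the start of the key; the example pattern is written so that it behaves the same whether or not that assumption holds.

```python
import re


def filter_keys(keys, pattern):
    """Return the keys that the pattern matches, anchored at the key's start."""
    regex = re.compile(pattern)
    return [key for key in keys if regex.match(key)]


# Hypothetical bucket contents.
keys = [
    "news/article-001.txt",
    "news/article-002.txt",
    "news/cover.png",
    "archive/old.txt",
]

# Keep only .txt files under the news/ prefix.
print(filter_keys(keys, r"news/.*\.txt$"))
# → ['news/article-001.txt', 'news/article-002.txt']
```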

Michael: Another question: Can I auto-sync cloud storage at regular intervals?

Nick: Label Studio doesn’t have built-in scheduling. But you can use the API with a cron job or similar to call the sync endpoint periodically.
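A minimal sketch of what such a scheduled call might look like, using only the Python standard library. It assumes an S3 source storage reachable at /api/storages/s3/{id}/sync and token authentication via an "Authorization: Token ..." header; check the API reference for your Label Studio version, since the path and auth scheme may differ. The base URL, storage ID, and token shown are placeholders.

```python
import urllib.request


def build_sync_request(base_url, storage_id, api_token):
    """Build (but do not send) a POST request to the assumed sync endpoint."""
    url = f"{base_url.rstrip('/')}/api/storages/s3/{storage_id}/sync"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the storage ID is carried in the URL
        method="POST",
        headers={"Authorization": f"Token {api_token}"},
    )


def sync_storage(base_url, storage_id, api_token, timeout=60):
    """Send the sync request and return the HTTP status code."""
    request = build_sync_request(base_url, storage_id, api_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values; nothing is sent until sync_storage() is called.
request = build_sync_request("http://localhost:8080", 1, "YOUR_API_TOKEN")
print(request.get_method(), request.full_url)
```

Saved as a script that calls sync_storage(), this could be run from a cron entry at whatever interval suits the project.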

Michael: Let’s move to natural language annotation.

Nick: Sure. First, go back to Project Settings → Labeling Interface. For text classification, we use the Choices tag.

Add choices like “Fake” or “Real.” Since each task holds a URL to the text rather than the text itself, set valueType="url" on the Text tag so Label Studio loads the file contents from that URL.
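For reference, a labeling config along these lines might look like the XML held in the string below. It is kept in Python here only so it can be checked for well-formedness before pasting it into the Labeling Interface editor. The tag names (Text, Choices, Choice) and the valueType attribute are Label Studio's; the name values "text" and "veracity" and the $text data key are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET

# Assumed config: the task's "text" field holds a URL, so valueType="url".
LABEL_CONFIG = """
<View>
  <Text name="text" value="$text" valueType="url"/>
  <Choices name="veracity" toName="text" choice="single">
    <Choice value="Fake"/>
    <Choice value="Real"/>
  </Choices>
</View>
"""


def choice_values(config):
    """Parse the config and return the value of every Choice tag."""
    root = ET.fromstring(config.strip())
    return [choice.get("value") for choice in root.iter("Choice")]


print(choice_values(LABEL_CONFIG))
# → ['Fake', 'Real']
```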

Once tasks are loaded, click on one to label it. You’ll see the text and choose your label. Submit when done.

Now, to export: Go back to Project Settings → Cloud Storage → Add Target Storage. Enter the bucket, prefix (e.g., “train_data”), and new access keys if needed. Once added, any submitted tasks will sync to that folder.

Each labeled task is saved as a JSON file—one per task—with metadata, annotation, labels, and more.
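To give a sense of how those per-task files can be consumed, here is a sketch that reads one exported annotation and pulls out the chosen labels. The sample JSON is hand-written and heavily simplified, so treat the exact fields as an assumption; real exports carry more metadata, and the layout can vary between Label Studio versions.

```python
import json

# Simplified, hand-written stand-in for one exported file.
SAMPLE_EXPORT = """
{
  "id": 12,
  "task": {"id": 3, "data": {"text": "s3://my-bucket/news/article-001.txt"}},
  "result": [
    {"from_name": "veracity", "to_name": "text", "type": "choices",
     "value": {"choices": ["Fake"]}}
  ]
}
"""


def extract_choices(annotation):
    """Collect every selected choice from an annotation's result list."""
    labels = []
    for item in annotation.get("result", []):
        if item.get("type") == "choices":
            labels.extend(item["value"]["choices"])
    return labels


annotation = json.loads(SAMPLE_EXPORT)
print(annotation["task"]["data"]["text"], extract_choices(annotation))
```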

Michael: Can we showcase a more complex NLP task like Named Entity Recognition (NER)?

Nick: Yes. First, remove existing annotations—Label Studio doesn’t allow changing the labeling config once data is labeled.

Switch to the NER template. With NER, you can highlight entities (like names or organizations) and optionally define relationships between them using the relation tool.

You can also create nested entities by dragging one label into another. Add metadata to each annotation by clicking the “+” icon.
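The sketch below shows roughly how entity spans and a relation between them could be read back from an annotation's result list. The sample data is invented for illustration, and the field names (type "labels" with start/end offsets, type "relation" with from_id/to_id) reflect the commonly seen Label Studio result layout rather than anything shown in the webinar, so verify them against your own export.

```python
TEXT = "Nick Skriabin works at Heartex."

# Hand-written sample: two labeled spans and one relation linking them.
SAMPLE_RESULT = [
    {"id": "a1", "from_name": "label", "to_name": "text", "type": "labels",
     "value": {"start": 0, "end": 13, "text": "Nick Skriabin", "labels": ["PER"]}},
    {"id": "b2", "from_name": "label", "to_name": "text", "type": "labels",
     "value": {"start": 23, "end": 30, "text": "Heartex", "labels": ["ORG"]}},
    {"type": "relation", "from_id": "a1", "to_id": "b2", "direction": "right"},
]


def summarize_ner(result):
    """Return (spans keyed by region id, relations as pairs of span texts)."""
    spans = {}
    links = []
    for item in result:
        if item.get("type") == "labels":
            value = item["value"]
            spans[item["id"]] = (
                value["text"], value["labels"][0], value["start"], value["end"]
            )
        elif item.get("type") == "relation":
            links.append((item["from_id"], item["to_id"]))
    # Resolve region ids to span texts once all spans have been collected.
    relations = [(spans[src][0], spans[dst][0]) for src, dst in links]
    return spans, relations


spans, relations = summarize_ner(SAMPLE_RESULT)
print(relations)
# → [('Nick Skriabin', 'Heartex')]
```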

Michael: Can I preload labels for NER?

Nick: Yes. There are two options:

Paste multiple labels into the interface config.

Preload data with annotations in JSON format—Label Studio will extract the labels automatically.
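As an illustration of the second option, this sketch builds an import-ready task that already carries entity spans. The helper make_preannotated_task, the "label" and "text" tag names, and the "annotations" key are assumptions based on the standard NER template and import format; adjust them to match your own labeling config and Label Studio version.

```python
import json


def make_preannotated_task(text, entities):
    """Build a task dict with one pre-filled annotation.

    entities is a list of (phrase, label) pairs; each phrase is located
    at its first occurrence in text, and phrases not found are skipped.
    """
    result = []
    for phrase, label in entities:
        start = text.find(phrase)
        if start == -1:
            continue  # phrase does not occur in the text
        result.append({
            "from_name": "label",  # assumed name of the Labels tag
            "to_name": "text",     # assumed name of the Text tag
            "type": "labels",
            "value": {
                "start": start,
                "end": start + len(phrase),
                "text": phrase,
                "labels": [label],
            },
        })
    return {"data": {"text": text}, "annotations": [{"result": result}]}


task = make_preannotated_task(
    "Michael Ludden moderates for Label Studio.",
    [("Michael Ludden", "PER"), ("Label Studio", "ORG")],
)
print(json.dumps([task], indent=2))
```

The printed JSON list is the kind of file that could then be imported into a project whose config defines the PER and ORG labels.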

Michael: What types of NLP tasks does Label Studio support?

Nick: Several, including:

Text Classification

Named Entity Recognition

Relation Extraction

Question Answering

Text Summarization

Taxonomy tagging

Machine Translation

You can also combine them into multi-task labeling configs.

Michael: We had a question about whether CSV exports can be re-imported to another instance.

Nick: CSV isn’t recommended. Use JSON exports instead. They retain all metadata and annotations and can be imported to another Label Studio instance.

Michael: That’s a wrap for today. Thanks Nick, and thanks to everyone who joined. Please fill out the feedback form and let us know what topics you'd like covered in future webinars. See you in two weeks!
