Post on 01-Apr-2020

Techniques for Manipulating Text

(and why that’s useful)

presented by Evan Schiff

What do these files have in common?

ALE, STL Subtitles, SubCap, Avid Bin Export, FCP XML

They are all plain text.

You Can Do A Lot With Plain Text

Reformat one type of file into another

Add, remove, or fix data generated by other applications

Quickly hunt down a specific piece or pattern of information within a large document

Make changes en masse to avoid time-consuming manual adjustments

Parse it using a Terminal command or script

Some Real World Examples

Convert an EDL with Locators into a SubCap file for importing back into Avid

Convert Avid Locators into a DVDSP or Compressor Chapter Markers file

Create an EDL out of data in your FileMaker codebook

Batch rename files with advanced substitution patterns

Process a list of missing ProTools media to automatically hunt it down

What Tools Do You Need?

A good text editor.

TextMate ($64), Atom (Free), Sublime Text ($70)

(Not TextEdit or Notepad.)

Starting Out

A) What type of data do you have?

B) What type of data do you need?

How do you turn A into B?

Starting Out

Get familiar with everything a text editor can do.

Learn to navigate using the keyboard.

Experiment.

After Learning the Basics

Learn Regular Expressions (RegEx)

Learn a programming language such as Python, JavaScript, or Bash.

Combine scripting with RegEx to accomplish complex tasks

Constantly reassess your workflow to find faster and easier methods

If you come up with something cool, share it!

Text Manipulation Without Regular Expressions

Multiple Cursors Demo

Most of the time when we want to manipulate text, we want to change a lot of it all at once

One way to do that is with Multiple Cursors

Let’s look at what that is, and how it can be useful

Multiple Cursors Demo

• Cmd-F: Find
• Opt-Enter: Find All, add a cursor at every occurrence
• Cmd-Click Text: Add a cursor manually
• Cmd-Shift-L: Add a cursor to every line of selected text

OS X Keyboard Navigation
• Cmd ←/→: Go to start/end of line
• Cmd ↑/↓: Go to top/bottom of document
• Opt ←/→: Go to previous/next word
• Add Shift to select or unselect text

Windows Keyboard Navigation
• Home/End: Go to start/end of line
• Ctrl Home/End: Go to top/bottom of document
• Ctrl ←/→: Go to previous/next word
• Add Shift to select or unselect text

Text Manipulation Using Regular Expressions

What are Regular Expressions?

Regular Expressions (RegEx) are a way to define patterns of text

They enable you to find and select text that matches those patterns

And with that matching text selected you can then change it to suit your needs

What Does It Look Like?

(\d{3}[^\n]*([0-9:]{11})\s([0-9:]{11})\s?\n[\s\S]*?(?=^\d{3}|^>|^$))

Wait, what the #\.$*^# is that?!

Hold it, Hold it, What the hell is that shit?!

RegEx is made up of codes

Each code represents a set of characters such as letters, numbers, and $#!*

When written in a specific sequence, they define a pattern of text.

Let’s take a closer look.

What does RegEx look like?

Simple:

Timecode: \d{2}:\d{2}:\d{2}:\d{2}
Reference: 01:02:03:04

E-Mail Address: [\w.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}
Reference: evan@evanschiff.com
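To try the two simple patterns outside a text editor, here is a minimal Python sketch. The sample sentence is made up for illustration, and the e-mail pattern spells out lowercase letters in its last bracket so that it matches "com" without a case-insensitive flag.

```python
import re

# The two "simple" patterns from the slide.
TIMECODE = re.compile(r"\d{2}:\d{2}:\d{2}:\d{2}")
# The last bracket lists lowercase letters too, so ".com" matches as written.
EMAIL = re.compile(r"[\w.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")

# Made-up sample text containing one timecode and one e-mail address.
text = "Clip starts at 01:02:03:04, questions to evan@evanschiff.com"

timecodes = TIMECODE.findall(text)
emails = EMAIL.findall(text)

print(timecodes)
print(emails)
```

findall returns every non-overlapping match as a list, which is roughly what "Find All" does in a text editor.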

Complex:

EDL Event: (\d{3}[^\n]*([0-9:]{11})\s([0-9:]{11})\s?\n[\s\S]*?(?=^\d{3}|^>|^$))
Reference: 003 08P013V V C 13:31:20:14 13:31:28:16 04:00:42:22 04:00:51:00

Locator in an EDL: \* LOC.*[\d:]{11}\s+([\w]+) +\b([^\r\n]*?)\r?\n
Reference: * LOC: 04:00:47:20 BLUE CS0020
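Here is a sketch of how the two complex patterns might be used together in Python. The first event and its locator are the reference lines from the slide; the second event (004) is invented so there is something for the lookahead to stop at. The names parse_events, record_in and record_out are assumptions for illustration, based on the usual EDL column order of source in, source out, record in, record out.

```python
import re

# EDL event pattern from the slide. MULTILINE lets ^ and $ match at line
# boundaries, which the lookahead at the end of the pattern relies on.
EDL_EVENT = re.compile(
    r"(\d{3}[^\n]*([0-9:]{11})\s([0-9:]{11})\s?\n[\s\S]*?(?=^\d{3}|^>|^$))",
    re.MULTILINE,
)
# Locator pattern from the slide: captures the color and the comment text.
LOCATOR = re.compile(r"\* LOC.*[\d:]{11}\s+([\w]+) +\b([^\r\n]*?)\r?\n")

SAMPLE_EDL = (
    "003 08P013V V C 13:31:20:14 13:31:28:16 04:00:42:22 04:00:51:00\n"
    "* LOC: 04:00:47:20 BLUE CS0020\n"
    # Hypothetical second event, added only for this example.
    "004 08P014V V C 13:40:00:00 13:40:02:00 04:00:51:00 04:00:53:00\n"
)


def parse_events(edl_text):
    """Return one dict per EDL event with its last two timecodes and locator."""
    events = []
    for event in EDL_EVENT.finditer(edl_text):
        locator = LOCATOR.search(event.group(1))
        events.append(
            {
                "record_in": event.group(2),
                "record_out": event.group(3),
                # (color, comment) if the event carries a locator, else None
                "locator": locator.groups() if locator else None,
            }
        )
    return events


events = parse_events(SAMPLE_EDL)
for item in events:
    print(item)
```

Once each event is a small dictionary like this, writing it back out in another file format is mostly a matter of string formatting.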

What are some other patterns you can think of?

Let’s test them with Rubular.com

RegEx Structure

Regular Expressions consist of character codes and quantifiers

Or in other words, what is the character you are looking for and how many times do you expect to see it?

Search Criteria          Code       Example
3 Digits                 \d{3}      315
A word of any length     \w+        Avid
Zero or one ‘u’          colou?r    color or colour
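The three examples above can be checked in a few lines of Python. This is only a sketch; the variable names are made up.

```python
import re

# Each pattern pairs "which character" with "how many times".
three_digits = re.fullmatch(r"\d{3}", "315")              # exactly three digits
word = re.fullmatch(r"\w+", "Avid")                       # one or more word characters
spellings = re.findall(r"colou?r", "color or colour")     # the 'u' is optional

print(three_digits is not None, word is not None, spellings)
```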

RegEx Symbols

Symbol      What It Represents                                  Such As
.           Any character except line break                     A-Z 0-9 Special Characters
\d          Any digit                                           0-9
\s          Whitespace (spaces, tabs, line breaks)
\t          Tab
\n and \r   Line Break: \n is Mac/Linux, \r\n is Windows
\w          Any character that could be part of a word          A-Z 0-9 _
[ ]         Match the letters or symbols inside the brackets.   [DEF]
            The example to the right matches only the
            letters D, E, or F
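To see what each symbol picks up, this sketch runs them one at a time against a single made-up line of text (the line and the variable names are assumptions for illustration).

```python
import re

# Made-up sample: a word, a space, a digit, a tab, a file name, a line break.
line = "Reel 7\tDEF_01.mov\n"

any_char = re.findall(r".", line)        # everything except the line break
digits = re.findall(r"\d", line)         # single digits
whitespace = re.findall(r"\s", line)     # space, tab, and line break
tabs = re.findall(r"\t", line)           # only the tab
words = re.findall(r"\w+", line)         # runs of letters, digits, underscore
def_letters = re.findall(r"[DEF]", line) # only the capital letters D, E, F

print(len(any_char), digits, whitespace, tabs, words, def_letters)
```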

RegEx Quantifiers

Symbol      What It Represents
?           Zero or one occurrence
+           One or more occurrences
*           Zero or more occurrences
{2}         Exactly 2 occurrences
{2,5}       Between 2 and 5 occurrences
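A quick way to get a feel for the quantifiers is to test each one against strings that are too short, just right, and too long. The helper name matches is hypothetical; it only wraps Python's re.fullmatch.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` fits `pattern`."""
    return re.fullmatch(pattern, text) is not None


# ?  zero or one 'u'
optional_u = [matches(r"colou?r", s) for s in ("color", "colour", "colouur")]
# +  one or more digits
one_or_more = [matches(r"\d+", s) for s in ("", "7", "2020")]
# *  zero or more digits (an empty string is fine)
zero_or_more = [matches(r"\d*", s) for s in ("", "7", "2020")]
# {2}  exactly two digits
exactly_two = [matches(r"\d{2}", s) for s in ("7", "07", "007")]
# {2,5}  between two and five digits
two_to_five = [matches(r"\d{2,5}", s) for s in ("7", "1234", "123456")]

print(optional_u, one_or_more, zero_or_more, exactly_two, two_to_five)
```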

Some Real World Demos

Convert an EDL with Locators into a SubCap file for importing back into Avid

Convert Avid Locators into an FCP or Compressor Chapter Markers file

Batch rename files with advanced substitution patterns
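As one possible take on the batch rename demo, this sketch builds a rename plan with a substitution pattern. The file naming scheme (shot_7_final.mov becoming shot_007.mov) and the function names are invented for illustration; nothing is renamed on disk until you act on the plan yourself.

```python
import re

# Hypothetical naming scheme: shot_<number>_final.mov
SHOT_NAME = re.compile(r"^shot_(\d+)_final\.mov$")


def new_name(old_name: str) -> str:
    """Zero-pad the shot number to three digits and drop the '_final' suffix."""
    return SHOT_NAME.sub(lambda m: f"shot_{int(m.group(1)):03d}.mov", old_name)


def plan_renames(names):
    """Return (old, new) pairs for every name the pattern would change."""
    return [(old, new_name(old)) for old in names if new_name(old) != old]


plan = plan_renames(["shot_7_final.mov", "shot_12_final.mov", "notes.txt"])
for old, new in plan:
    print(f"{old} -> {new}")
```

Printing the plan first makes it easy to sanity-check the pattern; once it looks right, each pair could be passed to os.rename.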

When Is RegEx Not Useful?

When you don’t know or can’t clearly define a pattern

If it’s faster to make the changes by hand than figure out what the pattern is

When there’s no variation in what you’re searching for

Sometimes a normal Find & Replace is all you need

When there’s too much variation, don’t try the all-in-one approach. Maybe you can break it down into multiple smaller patterns
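Breaking a job into smaller patterns might look like the following sketch, where three short substitutions run one after another instead of one large expression. The cleanup steps and the sample text are assumptions chosen for illustration.

```python
import re

# Three small, easy-to-read steps applied in order.
STEPS = [
    (r"\r\n", "\n"),    # 1. Windows line breaks become Mac/Linux line breaks
    (r"[ \t]+", " "),   # 2. runs of spaces and tabs collapse to one space
    (r" +$", ""),       # 3. trailing spaces at the end of each line are dropped
]


def clean(text: str) -> str:
    """Apply each (pattern, replacement) pair in turn."""
    for pattern, replacement in STEPS:
        # MULTILINE makes $ match at the end of every line, not just the last.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


messy = "* LOC:  04:00:47:20\tBLUE   CS0020  \r\nnext line\r\n"
cleaned = clean(messy)
print(repr(cleaned))
```

Each step can be tested on its own, which is usually easier than debugging a single pattern that tries to handle every case.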

So What’s the Next Step?

Shell Scripting

What is Shell Scripting?

Shell scripting uses a programming language to execute a series of commands in order to accomplish a more complex task

ProTools Example

In 40 lines of code, and using regular expressions, this script takes a text file of media that ProTools can’t find, locates it, and copies it to a directory on your desktop.
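The script from the talk is not reproduced in this transcript, so the following is only a rough Python sketch of the same idea, not the original. It assumes the text file lists one audio file name per line, that the media are .wav or .aif/.aiff files, and that the destination folder sits outside the folder being searched; the names AUDIO_FILE and find_and_copy are hypothetical.

```python
import os
import re
import shutil
from pathlib import Path

# Assumed report format: one file name per line, e.g. "Kick_01.wav".
AUDIO_FILE = re.compile(r"^\s*(.+\.(?:wav|aiff?))\s*$", re.IGNORECASE)


def find_and_copy(report_lines, search_root, dest_dir):
    """Copy every listed file found under search_root into dest_dir.

    Returns the sorted names of the files that were copied.
    """
    # Collect the wanted file names, lowercased for case-insensitive lookup.
    wanted = set()
    for line in report_lines:
        found = AUDIO_FILE.match(line)
        if found:
            wanted.add(found.group(1).lower())

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    # Walk every folder below search_root looking for matching names.
    for folder, _subfolders, files in os.walk(search_root):
        for name in files:
            if name.lower() in wanted:
                shutil.copy2(Path(folder) / name, dest / name)
                copied.append(name)
    return sorted(copied)
```

A call might look like find_and_copy(open("missing.txt"), "/Volumes/Media", Path.home() / "Desktop" / "Found Media"), where the file name and both paths are placeholders.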

How Do I Learn?

Google it!

Check out sites like Codecademy, Code Avengers, Khan Academy, etc.

Pick a language to learn, and of the many languages out there, I would probably start with Python

Thanks!
