wildcards

Language Services TranslateMedia London | New York | Paris | Munich | Hong Kong Accurate. Punctual. Confidential. Professional Language Services www.translatemedia.com

Upload: translatemedia

Post on 11-May-2015

DESCRIPTION

This is the TranslateMedia guide for the use of regexes in Word.

TRANSCRIPT

Page 1: Wildcards

Language ServicesTranslateMedia

London | New York | Paris | Munich | Hong Kong

Accurate. Punctual. Confidential.

Professional Language Services

www.translatemedia.com

Page 2: Wildcards

WILDCARDS & REGULAR EXPRESSIONS

Page 3: Wildcards

This is a guide to using regexes in Word. Wildcard syntax differs depending on the program you use it in (Google, Memoq, …).

Memoq has its own regex feature (the Auto-translatables window), but Word is the better choice: it is easier to use and lets you double-check the matches live.

Page 4: Wildcards

Why?

Processing a fair word count

Preparing files for translation

Page 5: Wildcards

Why?

Non-translatables =
• Numbers
• References in a catalogue
• Names
• Company registration names
• etc.

Page 6: Wildcards

What?

Wildcard = a keyboard character that you can use to represent one or many characters.

example: * in *.doc

Regular expression = a combination of literal and wildcard characters that you use to match patterns of text.

example: media[0-9]{3} matches media309, media110, etc.
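
The media[0-9]{3} example can be tried outside Word as well. The sketch below uses Python's re module purely for illustration (Python is an assumption here, not part of the original workflow); this particular pattern happens to be written the same way in both syntaxes. The sample strings are made up.

```python
import re

# Same pattern as on the slide: the literal "media" followed by exactly three digits.
pattern = re.compile(r"media[0-9]{3}")

# Hypothetical sample strings, chosen to show one hit and two misses.
samples = ["media309", "media110", "media12", "medium309"]

# fullmatch() requires the whole string to fit the pattern.
matches = [s for s in samples if pattern.fullmatch(s)]
print(matches)
```

Only the strings made of "media" plus three digits are kept; "media12" has too few digits and "medium309" has the wrong literal part.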

Page 7: Wildcards

Common wildcards

? = a single character

* = any number of characters

[!x] = any single character except x (the ! goes inside square brackets, e.g. [!0-9] = any character that is not a digit)
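
For readers who know other regex flavours, here is a rough mapping of these three wildcards to Python's re syntax. It is a sketch for comparison only: the Python equivalents are approximations chosen for this illustration, and the sample text is made up.

```python
import re

text = "cat cot cut"

# Word: c?t    ->  Python: c.t     (one arbitrary character)
single = re.findall(r"c.t", text)

# Word: c[!o]t ->  Python: c[^o]t  (any single character except o)
not_o = re.findall(r"c[^o]t", text)

# Word: *.doc  ->  Python: .*\.doc (any run of characters, then ".doc")
is_doc = re.fullmatch(r".*\.doc", "report.doc") is not None

print(single, not_o, is_doc)
```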

Page 8: Wildcards

Markers

< = beginning of a word
> = end of a word
^13 = ¶ (paragraph mark)
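
The markers can be approximated in Python as follows. This is a sketch under assumptions: \b (word boundary) stands in for both < and >, and a newline stands in for the paragraph mark ^13, which is how paragraphs usually appear once text is exported from Word to plain text. The sample text is made up.

```python
import re

text = "media multimedia mediate\nnext paragraph"

# Word: <media  ->  words that begin with "media"
starts = re.findall(r"\bmedia\w*", text)

# Word: media>  ->  words that end with "media"
ends = re.findall(r"\w*media\b", text)

# Word: ^13     ->  split on the paragraph break
paragraphs = text.split("\n")

print(starts, ends, len(paragraphs))
```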

Page 9: Wildcards

Ranges

-> [ ]
[0-9] = any single digit
[3-6] = any single digit between 3 and 6 inclusive
[a-z] = any lower case letter
[A-Z] = any upper case letter
[aAiI] = a or A or i or I
etc.
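
Ranges behave the same way in most regex flavours, so they can be checked quickly in Python. A small sketch with made-up sample strings (Python is used for illustration only):

```python
import re

# [3-6] matches one digit at a time, and only 3, 4, 5 or 6.
digits = re.findall(r"[3-6]", "1234567")

# [aAiI] matches any one of the four listed letters.
letters = re.findall(r"[aAiI]", "Alibi in Asia")

# [A-Z] matches upper case letters only.
capitals = re.findall(r"[A-Z]", "New York")

print(digits, letters, capitals)
```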

Page 10: Wildcards

Repetitions

-> { }
t{2} = tt
5{6,7} = 555555 or 5555555
[A-Z]{4} = any sequence of four capital letters
[0-9]{3} = any sequence of three digits

@ = one or more occurrences of the previous character or expression
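
The repetition counts can be verified with a short Python sketch. The { } syntax is shared between Word and Python; for @, the closest Python operator is +, which is an approximation chosen for this illustration. Sample strings are made up.

```python
import re

# t{2} matches exactly two t's in a row.
double_t = re.fullmatch(r"t{2}", "tt") is not None

# 5{6,7} matches six or seven 5's; \b keeps shorter or longer runs out.
fives = re.findall(r"\b5{6,7}\b", "555555 5555555 55555")

# [A-Z]{4} matches four capital letters in a row.
four_caps = re.findall(r"[A-Z]{4}", "ABCD efgh WXYZ")

# Word: lo@l  ->  Python: lo+l  (one or more o's)
one_or_more = re.findall(r"lo+l", "lol lool ll")

print(double_t, fives, four_caps, one_or_more)
```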

Page 11: Wildcards

Note

If you want Word to find the actual characters usually used as wildcards, you have to type \ before these characters.

\? \< \@ etc
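
The same backslash rule applies in Python, which makes it easy to see the difference between a wildcard and its literal form. A minimal sketch with a made-up sample; re.escape is a standard-library helper that adds the backslashes for you:

```python
import re

text = "Why? What?"

# Escaped: \? finds the actual question marks.
literal = re.findall(r"\?", text)

# re.escape() builds the escaped form of a string automatically.
escaped = re.escape("a?b")

print(literal, escaped)
```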

Page 12: Wildcards

Example

Page 13: Wildcards

Copy and paste the text into Word

Note: the search function in Notepad (TXT files) does not support wildcards.

Page 14: Wildcards

Find and Replace window

Ctrl + H

Click the More >> button

Page 15: Wildcards

Find and Replace window

Tick the Use wildcards box

Enter a space in the Replace with field

Page 16: Wildcards

Deleting all numbers?

-> [0-9]

BUT: references made of numbers + capital letters will only be partly removed (the letters are left behind).

-> [A-Z]? NO! Some titles are written in upper case.

PLUS:
• Translators could argue about dates, values, etc.
• Numbers in titles
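
The pitfalls above are easy to reproduce. The sketch below mimics the replace-with-a-space step in Python on a made-up sentence (both the sentence and the use of Python are assumptions for illustration):

```python
import re

# Hypothetical sentence containing a reference, a quantity and a year.
text = "Order ref UX12AB, 3 items, due 2015"

# Replace every digit with a space, as in the Find and Replace step.
no_digits = re.sub(r"[0-9]", " ", text)

# Replacing every capital letter as well also damages ordinary words.
no_caps = re.sub(r"[A-Z]", " ", text)

print(repr(no_digits))
print(repr(no_caps))
```

The digits disappear, but the letters of the reference stay in the text and would still be counted, while the quantity and the year, which a translator might want counted, are gone. Deleting capitals is worse: it eats the first letter of "Order".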

Page 17: Wildcards

Word count in Memoq

TRADOS-LIKE WORD COUNT DOES NOT COUNT NUMBERS (ISOLATED SEQUENCES OF NUMBERS)

Page 18: Wildcards

Though…

Page 19: Wildcards

Regular expressions

• UX[0-9]

• [A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9]
  ([A-Z]{4})([0-9]{2,3})([A-Z]{2})[0-9]

-> You can add round brackets to make the expression clearer. They do not change what the search matches. But you cannot add spaces, because spaces are searched for as characters. If you want to search for actual brackets, put them between square brackets.
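
A quick way to convince yourself that the round brackets do not change the result is to run both versions over the same text. The sketch below does this in Python, where both expressions happen to be valid unchanged; the sample sentence and reference numbers are made up.

```python
import re

plain = r"[A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9]"
grouped = r"([A-Z]{4})([0-9]{2,3})([A-Z]{2})[0-9]"

# Hypothetical text with two catalogue-style references.
text = "See item ABCD123EF4 and WXYZ12GH5."

plain_hits = [m.group(0) for m in re.finditer(plain, text)]
grouped_hits = [m.group(0) for m in re.finditer(grouped, text)]

# A space inside the expression is a literal character, so this finds nothing.
with_space = re.search(r"[A-Z]{4} [0-9]{2,3}", text)

# A bracket placed between square brackets is searched for literally.
brackets = re.findall(r"[(]", "(a)")

print(plain_hits == grouped_hits, with_space, brackets)
```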

Page 20: Wildcards

[A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9]*^13

·x·
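
The expression on this slide reads as "a reference, then everything up to the end of the paragraph". A rough Python rendering is sketched below, assuming that Word's * corresponds to .* and that ^13 corresponds to a newline in exported text; the two-line sample is made up.

```python
import re

# Word: [A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9]*^13
# Python approximation: reference, rest of the line, then the line break.
pattern = re.compile(r"[A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9].*\n")

# Hypothetical text: one catalogue line and one sentence to translate.
text = "ABCD123EF4 blue widget, large\nPlain sentence to translate.\n"

# Removing the matches leaves only the translatable paragraph.
remaining = pattern.sub("", text)
print(repr(remaining))
```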

Page 21: Wildcards

Conclusion

Look through the whole document for different non-translatable patterns

When creating a regex, make sure it will not delete anything you need to count

The result is still a rough count, unless you spend time going into the details (which are then counted as repetitions)

Page 22: Wildcards

Links

http://office.microsoft.com/en-us/help/ha010873051033.aspx

http://office.microsoft.com/en-us/help/HA010873041033.aspx

http://word.mvps.org/FAQs/General/UsingWildcards.htm

Page 23: Wildcards

In Memoq (Auto-translatables)

Page 24: Wildcards

In Memoq (Auto-translatables)

PATTERN:
(\d) = any single digit
(\d+) = one or more digits

REPLACEMENT RULE:
$1 $2 $3 … -> each refers to a bracketed group, numbered by its position in the sequence
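
The idea of numbered groups in a replacement rule can be illustrated in Python. Note the assumptions: this is Python's re module, not Memoq, and Python writes the group references as \1, \2 where the slide uses $1, $2. The date-like sample is made up.

```python
import re

# Two bracketed groups of digits separated by a slash.
pattern = re.compile(r"(\d+)/(\d+)")

# Hypothetical source text.
text = "Delivered on 11/05"

# Swap the two groups: \2 is the second group, \1 the first.
swapped = pattern.sub(r"\2/\1", text)

# The groups can also be read individually from a match.
match = pattern.search(text)
groups = match.groups()

print(swapped, groups)
```

Each group is numbered by the position of its opening bracket, counting from the left.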