TranslateMedia
Language Services
London | New York | Paris | Munich | Hong Kong
Accurate. Punctual. Confidential.
Professional Language Services
www.translatemedia.com
WILDCARDS & REGULAR EXPRESSIONS
This is a guide to using regular expressions in Word. Wildcards work differently depending on the program you use them in (Google, memoQ,…).
memoQ has its own regex search feature (the Auto-translatables window), but Word is the better choice: it is easier to use and lets you double-check matches live.
Why?
Processing a fair word count
Preparing files for translation
Why?
Non-translatables = numbers, references in a catalogue, names, company registration names, etc.
What?
Wildcard = a keyboard character that you can use to represent one or many characters.
Example: * in *.doc
Regular expression = a combination of literal and wildcard characters that you use to match patterns of text.
Example: media[0-9]{3} matches media309, media110, etc.
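Both ideas can be tried outside Word. The sketch below uses Python's standard library as a stand-in; this is an assumption for illustration, since Python's `fnmatch` and `re` modules are not Word's search engine, although these two patterns behave the same way. The file names and the sample sentence are made up.

```python
import fnmatch
import re

# Wildcard: * stands for any run of characters in a file name.
files = ["report.doc", "notes.txt", "report.docx"]
doc_files = [name for name in files if fnmatch.fnmatchcase(name, "*.doc")]
print(doc_files)  # ['report.doc']

# Regular expression: literal "media" followed by exactly three digits.
media_pattern = re.compile(r"media[0-9]{3}")
sample = "See media309 and media110, but not media12."
media_refs = media_pattern.findall(sample)
print(media_refs)  # ['media309', 'media110']
```

Note that "report.docx" is not matched: the wildcard pattern has to cover the whole file name.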
Common wildcards
? = any single character
* = any string of characters
[!] = any character except the ones that follow the ! inside the brackets (e.g. [!a] = any character but a)
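For readers who also use standard regex tools, here is a rough Python translation of these three wildcards. It is an assumption for illustration (Python's `re` module is not Word's engine): `?` becomes `.`, `*` becomes the non-greedy `.*?`, and `[!x]` becomes `[^x]`. The sample words are made up.

```python
import re

text = "cat cut cot coat"

# Word c?t  ->  Python c.t  (exactly one character between c and t)
one_char = re.findall(r"c.t", text)
print(one_char)  # ['cat', 'cut', 'cot']

# Word c*t  ->  Python c.*?t  (any run of characters, stopping at the first t)
any_chars = re.findall(r"c.*?t", text)
print(any_chars)  # ['cat', 'cut', 'cot', 'coat']

# Word c[!u]t  ->  Python c[^u]t  (one character that is not u)
not_u = re.findall(r"c[^u]t", text)
print(not_u)  # ['cat', 'cot']
```

The non-greedy `.*?` is a deliberate choice: a greedy `.*` would run on to the last t in the line and return one long match.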
Markers
< = beginning of a word
> = end of a word
^13 = ¶ (paragraph mark)
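A rough Python equivalent of the markers, as a sketch under stated assumptions: `\b` (word boundary) stands in for both `<` and `>`, and `\r` (character 13, the carriage return Word uses for paragraph marks) stands in for `^13`. The sample text is made up.

```python
import re

text = "cat concatenate catalogue scat"

# Word <cat  ->  Python \bcat  (\w* added only to show the whole word)
word_starts = re.findall(r"\bcat\w*", text)
print(word_starts)  # ['cat', 'catalogue']

# Word cat>  ->  Python cat\b  (\w* added only to show the whole word)
word_ends = re.findall(r"\w*cat\b", text)
print(word_ends)  # ['cat', 'scat']

# Word ^13  ->  Python \r  (paragraph mark, character 13)
paragraphs = "Title\rFirst paragraph\r"
last_words = re.findall(r"\w+\r", paragraphs)
print(last_words)  # ['Title\r', 'paragraph\r']
```

"concatenate" is never matched: the letters "cat" appear inside it, but neither at the beginning nor at the end of the word.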
Ranges
-> [ ]
[0-9] = any digit
[3-6] = any digit between 3 and 6 inclusive
[a-z] = any lowercase letter
[A-Z] = any uppercase letter
[aAiI] = a or A or i or I, etc.
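The ranges are written the same way in Python's `re` module, so they can be tested there directly. This is a sketch on a made-up sample; each call returns every single character the range matches.

```python
import re

text = "Gate 3, aisle 5, Item 8"

digits = re.findall(r"[0-9]", text)        # any digit
three_to_six = re.findall(r"[3-6]", text)  # digits 3 to 6 only
capitals = re.findall(r"[A-Z]", text)      # upper-case letters
a_or_i = re.findall(r"[aAiI]", text)       # a, A, i or I

print(digits)        # ['3', '5', '8']
print(three_to_six)  # ['3', '5']
print(capitals)      # ['G', 'I']
print(a_or_i)        # ['a', 'a', 'i', 'I']
```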
Repetitions
-> { }
t{2} = tt
5{6,7} = 555555 or 5555555
[A-Z]{4} = any sequence of four capital letters
[0-9]{3} = any sequence of three digits
@ = one or more occurrences of the previous character
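In Python's `re` module the braces work the same way, while Word's `@` corresponds to `+`. The sketch below is an approximation on made-up strings; it uses `re.fullmatch` for the `5{6,7}` case so that each test string has to match as a whole.

```python
import re

# Word t{2}  ->  Python t{2}: exactly two t's in a row
double_t = re.findall(r"t{2}", "letter bitter tat")
print(double_t)  # ['tt', 'tt']

# Word 5{6,7}  ->  Python 5{6,7}: six or seven 5s, checked against whole strings
samples = ["55555", "555555", "5555555", "55555555"]
fives = [bool(re.fullmatch(r"5{6,7}", s)) for s in samples]
print(fives)  # [False, True, True, False]

# Word lo@l  ->  Python lo+l: one or more o's between two l's
one_or_more = re.findall(r"lo+l", "ll lol lool loool")
print(one_or_more)  # ['lol', 'lool', 'loool']
```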
Note
To make Word find a character that is normally used as a wildcard, type a backslash (\) before it:
\? \< \@ etc.
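Escaping works the same way in Python's `re` module, with one caveat: Python's list of special characters is not Word's (for instance, `@` and `<` are ordinary characters in Python). In this sketch on made-up text, `re.escape` adds the backslashes automatically when the search text is fixed.

```python
import re

# A backslash turns ? into a literal question mark instead of "any character".
questions = re.findall(r"\w+\?", "Why? Because. Really?")
print(questions)  # ['Why?', 'Really?']

# re.escape adds the backslashes for every character Python treats as special.
literal = re.escape("5*3?")
print(literal)  # 5\*3\?
found = re.findall(literal, "Is 5*3? No, 5*3 is 15.")
print(found)  # ['5*3?']
```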
Example
Copy and paste the text into Word.
Note: Notepad's search (TXT files) does not support wildcards.
Find and Replace window
Ctrl + H
Click the More >> button
Find and Replace window
Tick the Use wildcards box
Enter a space in the Replace with field
Deleting all numbers?
-> [0-9]
But: references made of numbers and capital letters only lose their digits; the letters are left behind.
-> Delete [A-Z] as well? No! Some titles are written in upper case, and those letters would be deleted too.
Plus:
• Translators could argue about dates, values, etc.
• Numbers in titles
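The two problems above can be reproduced with Python's `re.sub` as a stand-in for Word's Replace All (an approximation; the sample line, including its reference and date, is made up). Each match is replaced with a space, as in the Find and Replace steps.

```python
import re

# Made-up line: a catalogue reference, a date and ordinary capitalised words.
line = "Order ABCD12EF3 by 12 May 2024"

# Replace every digit with a space, as with [0-9] in Word.
no_digits = re.sub(r"[0-9]", " ", line)
print(no_digits.split())  # ['Order', 'ABCD', 'EF', 'by', 'May']

# Also replacing [A-Z] strips capitals from ordinary words as well.
no_capitals = re.sub(r"[A-Z]", " ", line)
print(no_capitals.split())  # ['rder', '12', '3', 'by', '12', 'ay', '2024']
```

The letters of the reference ("ABCD", "EF") survive the first pass, and the date loses its digits, which is exactly the kind of change translators might dispute.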
Word count in memoQ
The Trados-like word count does not count numbers (isolated sequences of digits).
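To illustrate the idea, here is a very simplified counter that skips tokens made only of digits. The function name `rough_word_count` is hypothetical, and the rule is an assumption, not memoQ's or Trados's actual algorithm: real counters also handle punctuation, decimal separators and tags in their own ways (this sketch would still count "12.5", for example).

```python
def rough_word_count(text: str) -> int:
    """Count whitespace-separated tokens, skipping tokens made only of digits.

    Hypothetical, simplified rule for illustration only.
    """
    return sum(1 for token in text.split() if not token.isdigit())


sample = "Ref ABCD12EF3 costs 450 euros"
print(rough_word_count(sample))  # 4
```

"450" is skipped, but the reference "ABCD12EF3" is still counted, which is why such references have to be removed separately.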
Though…
Regular expressions
• UX[0-9]
• [A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9]
• ([A-Z]{4})([0-9]{2,3})([A-Z]{2})[0-9]
-> You can add round brackets to make the expression easier to read; they do not change what the search finds. You cannot add spaces, though, because spaces are searched for as characters. To search for brackets themselves, put them between square brackets.
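The same points can be checked in Python's `re` module, used here as an approximation of Word's engine on a made-up sample. The round brackets only group the pattern, so both versions find the same references. In Python the groups can also be read back one by one.

```python
import re

# Made-up sample with two catalogue references and some look-alikes.
text = "Items ABCD123EF4 and WXYZ56GH7, parts UX1 and UX7, not AB12CD3 (12)."

ux_refs = re.findall(r"UX[0-9]", text)
print(ux_refs)  # ['UX1', 'UX7']

plain = re.compile(r"[A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9]")
grouped = re.compile(r"([A-Z]{4})([0-9]{2,3})([A-Z]{2})[0-9]")

# The round brackets only group; both patterns match the same text.
plain_hits = [m.group(0) for m in plain.finditer(text)]
grouped_hits = [m.group(0) for m in grouped.finditer(text)]
print(plain_hits == grouped_hits, plain_hits)  # True ['ABCD123EF4', 'WXYZ56GH7']

# Each group can still be read separately.
groups = grouped.search(text).groups()
print(groups)  # ('ABCD', '123', 'EF')

# Literal brackets are put between square brackets.
literal_brackets = re.findall(r"[(][0-9]+[)]", text)
print(literal_brackets)  # ['(12)']
```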
[A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9]*^13
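A Python sketch of this pattern, assuming the paragraphs are separated by character 13 (`\r`) as in Word. Here Word's `*^13` is written as the non-greedy `.*?\r` so that each match stops at the first paragraph mark after the reference. The product lines are made up.

```python
import re

# Made-up paragraphs separated by character 13 (Word's paragraph mark).
text = "ABCD123EF4 Blue shirt, size M\rPlain text line\rWXYZ56GH7 Red hat\r"

# Non-greedy: each match ends at the first paragraph mark after the reference.
pattern = re.compile(r"[A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9].*?\r")
matches = pattern.findall(text)
print(matches)  # ['ABCD123EF4 Blue shirt, size M\r', 'WXYZ56GH7 Red hat\r']

# Greedy version for comparison: in Python, . also matches \r,
# so a single match runs on to the last paragraph mark.
greedy = re.findall(r"[A-Z]{4}[0-9]{2,3}[A-Z]{2}[0-9].*\r", text)
print(len(greedy))  # 1
```

The greedy comparison shows why the shortest match matters: otherwise the "Plain text line" paragraph, which has to be counted, would be swallowed too.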
Conclusion
Look through the whole document for the different non-translatable patterns.
When creating a regex, make sure it will not delete anything you need to count.
The result is still a rough count, unless you spend time going into the details (these are then counted as repetitions).
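Word highlights matches live. Outside Word, the same double-check can be done by listing every match in context before replacing anything. The helper `preview_matches` and its `context` parameter are hypothetical names for this sketch, which uses Python's `re` module on plain text.

```python
import re


def preview_matches(pattern: str, text: str, context: int = 10) -> list[str]:
    """Return each match in [brackets] with a few characters on either side."""
    snippets = []
    for m in re.finditer(pattern, text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        snippets.append(text[start:m.start()] + "[" + m.group(0) + "]" + text[m.end():end])
    return snippets


for snippet in preview_matches(r"[0-9]+", "Model 2024 costs 99", context=3):
    print(snippet)
# el [2024] co
# ts [99]
```

Reading the snippets before running Replace All makes it easier to spot a pattern that would also remove something you need to count.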
Links
http://office.microsoft.com/en-us/help/ha010873051033.aspx
http://office.microsoft.com/en-us/help/HA010873041033.aspx
http://word.mvps.org/FAQs/General/UsingWildcards.htm
In memoQ (Auto-translatables)
In memoQ (Auto-translatables)
Pattern: (\d) = any single digit; (\d+) = any sequence of one or more digits
Replacement rule: $1, $2, $3 … = the captured groups, numbered according to their position in the pattern
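The same capture-and-reorder idea can be sketched with Python's `re.sub`, which writes the groups as `\1`, `\2`, `\3` instead of memoQ's `$1`, `$2`, `$3`. The conversion shown (English-style 1,234.56 to continental 1.234,56) is an assumption chosen for illustration, and the pattern only handles a single thousands separator.

```python
import re

# (\d) captures one digit; (\d+) captures a whole run of digits.
singles = re.findall(r"(\d)", "a1b22")
runs = re.findall(r"(\d+)", "a1b22")
print(singles, runs)  # ['1', '2', '2'] ['1', '22']

# Groups are numbered left to right: \1 = thousands, \2 = hundreds, \3 = decimals.
pattern = r"(\d+),(\d{3})\.(\d+)"
converted = re.sub(pattern, r"\1.\2,\3", "Total: 1,234.56 EUR")
print(converted)  # Total: 1.234,56 EUR
```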
Link
http://en.wikibooks.org/wiki/CAT-Tools/MemoQ/Tips_and_Tricks#Using_auto-translatables_for_number_format_conversion