You can create advanced custom rules to support your organization's preferred writing style using the ProWritingAid Style Guide. This article covers advanced techniques for creating those rules. If you haven't started creating Style Guide rules yet, or are just looking for basic support, we recommend reading our Style Guide 101 article first.
- How Computers Read Language
- Creating Rules That Find More Than One Form of Your Word or Phrase
- Why CT(verb) is Your Style Guide's Best Friend
- How Do I Replace One Word, But Only When It Appears Within a Certain Phrase?
- How Do I Give Multiple Suggestions for a Word Using @?
- How Do I Remove One Word From a Phrase?
- What If We Want to Replace a Whole Phrase With a Different Word or Phrase?
- How Do I Remove or Add Punctuation to a Certain Word or Phrase?
- How Can a Tilde ( ~ ) Help Me Flag Something "With or Without" An Element?
- How Do I Flag a Specific Structure in Sentences and Replace Another?
- How do I replace multiple versions of a phrase with multiple versions of a different phrase?
- Use the ProWritingAid Style Guide to Create Advanced Custom Rules for Your Organization
How Computers Read Language
Computers can do incredible things with language. They just need to be told or shown how to act. So, asking a computer to change the tense of a verb depending on its use in a sentence is no problem. Likewise, asking it to treat the same word differently depending on whether it’s the verb or noun form is no sweat.
There are four key elements that computers use to understand language:
#1: Parts of Speech
To begin with, let’s look at POS (that’s Parts of Speech to you and me). A part of speech is the broad category of use for a specific word. The most common parts of speech are verbs, nouns, adjectives, adverbs, determiners, and prepositions.
#2: Tokens: breaking a phrase down into its essential parts
Tokenization splits a word, phrase or sentence into its component parts. Each component is a token and has a unique identifying number. Token numbers begin at 0 and are always preceded by a $: $0, $1, $2, $3, etc.
Tokenization includes both words and punctuation as you'll see below:
Tokens allow the software to understand which part of the phrase you are referring to when you instruct it to find or replace something in your text.
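To make the numbering concrete, here is a minimal sketch in Python. It is not the Style Guide's own tokenizer: the `tokenize` and `number_tokens` helpers are hypothetical, and the simple regular expression is an assumption that will split some text (contractions, for example) differently from the real tool.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens (simplified)."""
    return re.findall(r"\w+|[^\w\s]", text)

def number_tokens(text):
    """Pair each token with its $-prefixed position, starting at $0."""
    return [(f"${i}", token) for i, token in enumerate(tokenize(text))]

for label, token in number_tokens("Well done Jade!"):
    print(label, token)
```

Note that the exclamation mark gets its own number, just like the words do.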
#3: Tags: Name Every Part of Your Phrase
For matching within the style guide tool, we further subdivide POS. These subdivisions have unique codes and are known as tags. We use the Penn Treebank POS tags:
#4: Lemmas: The Basic Form of a Verb
The lemma is the base form (root) of your verbs (e.g. run, fly, sneeze). (This is the same as the infinitive but without the 'to'.)
Creating Rules That Find More Than One Form of Your Word or Phrase
If you were to write a rule for every single variation of an incorrect spelling, phrasing or style point, you'd be at it a long time. So now we're going to look at how we can use the style guide tool to find variations of the same problem by using the correct tags and basic code. Just a little warning: it can get quite addictive when you get to grips with creating rules that solve your problems.
Why CT(verb) is Your Style Guide's Best Friend
One of the most useful tools is the code CT, which means 'context'. We use CT followed by a verb in brackets where we want a rule to match all variants of a verb. (Note that the verb in brackets needs to be the lemma, or base form, as explained above, i.e. use, eat, play.)
Here's an example: let’s say you want to ban the term "endeavor," a classic example of an over-complicated word. It's one of those words that can cause your reader to break their flow. Using 'try' is simpler and much more reader-friendly:
By using the code ‘CT(endeavor)’ in the match, the computer knows to find any form of the verb: endeavor, endeavors, endeavored and endeavoring.
Now we need to tell the computer to use the replacement verb, 'try', but in the same tense as the version of ‘endeavor’ that it just found.
If we enter the replacement as follows: "try-$0", we’re telling the computer to replace any form of the word "endeavor" with the corresponding form of the word "try". So if it finds "endeavoring", it will suggest "trying". If it finds "endeavored", it will suggest "tried". And so on.
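If it helps to picture what "the corresponding form" means, here is a rough Python sketch of the idea. It is an illustration only, not how the Style Guide works internally: the `VERB_FORMS` table and the `suggest_same_form` function are hypothetical, and the forms are typed in by hand rather than looked up in a real dictionary.

```python
# Hand-written form tables (an assumption for this sketch).
VERB_FORMS = {
    "endeavor": {"base": "endeavor", "third_person": "endeavors",
                 "past": "endeavored", "gerund": "endeavoring"},
    "try": {"base": "try", "third_person": "tries",
            "past": "tried", "gerund": "trying"},
}

def suggest_same_form(found_word, banned_lemma, preferred_lemma):
    """Return the preferred verb in the same form as the word that was found."""
    for form_name, surface in VERB_FORMS[banned_lemma].items():
        if surface == found_word.lower():
            return VERB_FORMS[preferred_lemma][form_name]
    return None  # the word is not a known form of the banned verb

for word in ["endeavor", "endeavors", "endeavored", "endeavoring"]:
    print(word, "->", suggest_same_form(word, "endeavor", "try"))
```

The lookup first works out which form was found, then picks the same form of the preferred verb, which is the behaviour the "try-$0" replacement describes.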
Here's what it looks like in the style guide tool:
And then all versions of that sentence will be flagged:
How Do I Replace One Word, But Only When It Appears Within a Certain Phrase?
Here, a client wants to find the phrase "website sales" or "website sale" and replace it with "online sales" or "online sale."
We can’t search only on "website" as it would find lots of instances of the word used in the correct way.
We could create two different rules, one flagging "website sale" and another flagging "website sales".
It's simpler to include the options "sale" and "sales" in brackets next to the term we are searching for - this tells the computer that either 'sale' or 'sales' must be present in the match. Then, we use token '$1' to tell the computer to use the same word in the replacement. Using a token guarantees an exact match so the correct singular or plural form will be used.
(Why $1 in this replacement? Because $0 is "website" and $1 is "sale" or "sales" and it's $1 - the second word - that we want to use in the replacement.)
Now "website" will only be flagged in instances where it is followed by "sale" or "sales".
Note: the space on the inside of each bracket is important as it tells the computer to read what's contained within the bracket and not include the bracket in its search.
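The "website sale" rule behaves a lot like a find-and-replace with a captured group. The Python sketch below approximates it with a regular expression; this is an analogy rather than Style Guide syntax, and the `suggest_online` name is made up for the example.

```python
import re

# "website" must be followed by "sale" or "sales"; the group plays the role of $1.
WEBSITE_SALE = re.compile(r"\bwebsite (sale|sales)\b")

def suggest_online(text):
    """Swap 'website' for 'online' only when it is followed by 'sale' or 'sales'."""
    return WEBSITE_SALE.sub(lambda match: f"online {match.group(1)}", text)

print(suggest_online("Our website sales rose, but the website is slow."))
```

The second "website" is left alone because it isn't followed by either option, and the captured word is reused exactly as it was written.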
How Do I Give Multiple Suggestions for a Word Using @?
So, you know how frustrated you are each time you read a team member's client report that mentions "actionable items"? Now you can create a rule that flags it and suggests a variety of more specific adjectives instead:
Here's the replacement string containing the @ in action:
Once again, we're using token $1 in the replacement to ensure the correct form of 'item' or 'items' in the replacement. The @ symbol means that each will appear as an alternative option in the editing pop-up.
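Here is one way to picture what the @ does, sketched in Python. The replacement string and its adjectives are invented for illustration (they are not taken from the rule in this article), and `expand_replacement` is a hypothetical helper, not part of the Style Guide.

```python
def expand_replacement(replacement, tokens):
    """Split a replacement on @ and fill in $n tokens from the matched phrase."""
    suggestions = []
    for option in replacement.split("@"):
        words = []
        for word in option.split():
            if word.startswith("$") and word[1:].isdigit():
                words.append(tokens[int(word[1:])])  # reuse the matched token
            else:
                words.append(word)
        suggestions.append(" ".join(words))
    return suggestions

# Matched phrase "actionable items": $0 is "actionable", $1 is "items".
print(expand_replacement("practical $1@specific $1@achievable $1",
                         ["actionable", "items"]))
```

Each piece between the @ symbols becomes its own suggestion, and $1 keeps the singular or plural form that was found.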
How Do I Remove One Word From a Phrase?
In the next example we use the token to tell the computer to replace the matched phrase with only the second word (token $1), leaving out the first word entirely.
The computer knows to treat tokens in the correct format depending on their position in the replacement sentence. So this rule to always remove ‘Best’ from ‘Best regards’ will replace it with ‘Regards’ or ‘regards’ depending on where it’s used in the sentence.
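As a rough illustration of that position-aware behaviour, the Python sketch below removes 'Best' and capitalizes 'regards' only at the start of a sentence. The sentence-start check is our own simplification for the example; we are not claiming this is how the tool decides.

```python
import re

def remove_best(text):
    """Replace 'Best regards' with 'Regards' or 'regards' depending on position."""
    def keep_regards(match):
        before = text[:match.start()].rstrip()
        # Simplified assumption: a sentence starts at the beginning of the
        # text or right after ., ! or ?
        starts_sentence = before == "" or before[-1] in ".!?"
        return "Regards" if starts_sentence else "regards"
    return re.sub(r"\b[Bb]est regards\b", keep_regards, text)

print(remove_best("Best regards, Sam"))
print(remove_best("Please send my best regards to Jo."))
```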
What If We Want to Replace a Whole Phrase With a Different Word or Phrase?
1: Bang for your buck
So, let’s say we want to set a rule that identifies the business jargon ‘bang for your buck’ and suggests that it be replaced with ‘value’.
We know the offending sentence could use one of several possessive pronouns:
You could write four different rules, one for each version. But you can save time by using the POS tag for possessive pronoun, which is ‘PRP$’. The computer will then match any version of the phrase containing a possessive pronoun.
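To see why one tag can stand in for several rules, here is a small Python sketch. A hand-written list of possessive pronouns stands in for the PRP$ tag (a real tagger works from context, not a fixed list), and `suggest_value` is a hypothetical name for the example.

```python
import re

# Stand-in for the PRP$ tag: a fixed list of possessive pronouns (an assumption).
POSSESSIVE_PRONOUNS = ["my", "your", "his", "her", "its", "our", "their"]

BANG_FOR_BUCK = re.compile(
    r"\bbang for (?:" + "|".join(POSSESSIVE_PRONOUNS) + r") buck\b",
    re.IGNORECASE,
)

def suggest_value(text):
    """Replace 'bang for <possessive pronoun> buck' with 'value'."""
    return BANG_FOR_BUCK.sub("value", text)

print(suggest_value("We want more bang for our buck."))
```

One pattern covers every pronoun in the list, so there is no need for a separate rule per version of the phrase.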
Not sure which parts of speech make up your sentence? No problem: click "Show Sentence Analysis" and it will give you the tags:
In the tool, this rule will appear like this:
2: Boiling the ocean
This is another example of corporate speak that would be better expressed in simple language.
Here we use the CT (context) command to identify the verb ‘boil’ in any context followed by ‘the ocean’. The use of the $0 in the replacement tells the computer to use the verb ‘waste’ in the same way that ‘boil’ was used. So ‘boiled’ would be replaced by ‘wasted’, ‘boiling’ by ‘wasting’ and so on.
And ‘the ocean’ is deleted and replaced with ‘time’.
How Do I Remove or Add Punctuation to a Certain Word or Phrase?
Example 1: Preference for limited punctuation in abbreviations:
Here, we’re using an example from a client that prefers an open punctuation style, including using no periods in ie and eg.
If you want to create a rule for specific punctuation, you need to separate that punctuation in the ‘match’ box. If it has a space on either side, the computer will recognise it as a token and treat it as part of the matched string.
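If you're unsure how to space out a phrase for the match box, this Python sketch shows the idea: split the phrase into words and punctuation marks, then join them with single spaces. The `to_match_string` helper is hypothetical and uses a simplified tokenizer, so treat the result as a starting point rather than guaranteed Style Guide input.

```python
import re

def to_match_string(phrase):
    """Put a space around every punctuation mark so each becomes its own token."""
    return " ".join(re.findall(r"\w+|[^\w\s]", phrase))

print(to_match_string("i.e."))
print(to_match_string("e.g."))
```

For "i.e." this gives four separate tokens: the two letters and the two periods.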
Example 2: Common missed punctuation errors
A large college wanted to tackle the high volume of student report cards that featured versions of the phrase ‘Well done Jade’ and correct them to ‘Well done, Jade’.
We only want this rule to apply to instances where the term ‘Well done’ is followed by a proper noun.
So we create the match: Well done NNP (NNP is the POS tag for proper noun)
And then the replacement: Well done, $2
The computer reads this as an instruction to take token $2 (in other words, the proper noun that it found in the match) and reuse it in the replacement.
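The same rule can be approximated in Python to show what happens to the name. We don't have a POS tagger here, so a capitalized word stands in for the NNP tag; that is a crude assumption (it would also match any other capitalized word), and `add_comma` is a name invented for the sketch.

```python
import re

# A capitalized word is used as a rough stand-in for the NNP (proper noun) tag.
WELL_DONE = re.compile(r"\bWell done ([A-Z][a-z]+)\b")

def add_comma(text):
    """Insert a comma between 'Well done' and the name that follows it."""
    return WELL_DONE.sub(r"Well done, \1", text)

print(add_comma("Well done Jade, great work!"))
```

Text that already has the comma, or where 'Well done' is followed by an ordinary lowercase word, is left unchanged.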
How Can a Tilde ( ~ ) Help Me Flag Something "With or Without" An Element?
Imagine you're a comms leader and you're fed up of seeing sales proposals going out of the door that talk about 'low-hanging fruit'.
Sales people love to use metaphors but your business looks much more professional if you say what you mean and describe the 'easy opportunities' in less fruity ways. How do we code this into a style rule?
We want the computer to find 'low-hanging fruit'. But some people may well miss the hyphen and just type 'low hanging fruit'.
Again, we could create two separate rules, but if you use the ( ~ ), you can tell the computer to search for examples that contain the hyphen as well as those that don't:
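low ( - ~ ) hanging fruit
With 'easy opportunities' as the replacement, every time the term 'low-hanging fruit' or 'low hanging fruit' is used, the computer will suggest 'easy opportunities' instead.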
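For the curious, here is a minimal Python sketch of how an optional element can be matched token by token. It is our own illustration, not the Style Guide's matching engine: each slot in `PATTERN` is a set of allowed tokens, and we borrow the tilde to mark a slot that may be skipped.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

# One slot per element of the rule; "~" in a slot marks it as optional.
PATTERN = [{"low"}, {"-", "~"}, {"hanging"}, {"fruit"}]

def matches_at(tokens, start, pattern):
    """Return True if the pattern matches the tokens beginning at index start."""
    if not pattern:
        return True
    slot, rest = pattern[0], pattern[1:]
    if start < len(tokens) and tokens[start] in slot - {"~"}:
        if matches_at(tokens, start + 1, rest):
            return True
    # An optional slot may also be skipped without consuming a token.
    return "~" in slot and matches_at(tokens, start, rest)

def is_flagged(text):
    tokens = tokenize(text)
    return any(matches_at(tokens, i, PATTERN) for i in range(len(tokens)))

print(is_flagged("Focus on the low-hanging fruit."))
print(is_flagged("Focus on the low hanging fruit."))
```

Both spellings are flagged by a single pattern, which is exactly what the tilde saves you from writing twice.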
How Do I Flag a Specific Structure in Sentences and Replace Another?
An international publisher wants to tackle the issue of their journalists using nouns as verbs. It’s increasingly common and while some people don’t mind it, for others it goes against their style guidelines.
Here, we’re looking at the noun ‘author’ being used as the verb, ‘authored’.
In the example, we use the CT command to identify any form of ‘author’. We then use the POS tag DT in case ‘author’ is followed by a determiner (DT is the POS tag for determiner). They could have authored several, many or just a few books.
We use the tilde (~) to show the computer that the determiner is optional - it might feature but equally it might not. Finally, and as before, we’ve used spaces inside the brackets to ensure the computer isn't searching for the bracket as part of the match.
In the replacement, we use the tokens to tell the computer to replace ‘author’ with the verb ‘write’ but in the matching tense. And we go on to tell it to reuse tokens $1 and $2 to ensure the sentence otherwise remains the same.
How do I replace multiple versions of a phrase with multiple versions of a different phrase?
This example brings together several elements.
Ever heard the expression '... it isn't brain surgery'?
To most native English speakers it's an expression meaning that something is not particularly difficult. But in a multi-cultural, multi-language setting, it could confuse your reader. So let's just say what we mean: it's not complicated.
Turning this into a rule is not too complicated, either. We just need to account for different ways of spelling out the 'is not': is the writer using a contraction or not?
So, we use an optional 'n' and an optional apostrophe, denoted by ( n ~ ) and ( ' ~ ). Remember they are individual parts of the string (tokens) - even the apostrophe - so we need to tell the computer to search for each element as a separate item.
We include the search for 't' or 'not' as a required (non-optional) element. Why? Because if we also treated these as optional, the computer would suggest a replacement even where you're using the term brain surgery in the correct context: for example, 'she has had extensive brain surgery' would be flagged.
We use $0, $1, $2, $3 and $4 to carry over the first five tokens in whatever form they appeared, followed by 'complicated'.
The result? "It isn't brain surgery" becomes "It isn't complicated."
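Here is a loose Python approximation of the finished rule, to show the contraction and the full form being handled together. It assumes the phrase begins with "it is" and uses a straight apostrophe; both are assumptions for the sketch, and the regular expression is not the Style Guide's own syntax.

```python
import re

# The negation ("n't" or " not") is required, so plain "brain surgery" is ignored.
BRAIN_SURGERY = re.compile(r"\b(it is(?:n't| not)) brain surgery\b", re.IGNORECASE)

def simplify(text):
    """Keep the matched opening words as written and swap in 'complicated'."""
    return BRAIN_SURGERY.sub(r"\1 complicated", text)

print(simplify("It isn't brain surgery."))
print(simplify("It is not brain surgery."))
print(simplify("She has had extensive brain surgery."))
```

The last sentence comes back untouched because it contains no negation, which mirrors why 't' or 'not' is kept as a required part of the match.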
Use the ProWritingAid Style Guide to Create Advanced Custom Rules for Your Organization
ProWritingAid's Style Guide gives you the opportunity to create advanced custom style rules for your organization. By using this feature, you can find and replace pesky words and phrases to shore up your content and make sure you and your team are communicating about your business the same way.
Feel free to share any feedback, questions, or comments on the Style Guide with us at firstname.lastname@example.org.