The enhanced searching features advised in an earlier announcement are now operational for SIRCA’s Australian Company Announcements (ACA) data set. ACA’s search syntax now lets you construct complex queries. The examples below show some of the things you can do.
Entering one or more words in the search field makes ACA search for documents that contain any of the words. For example, searching for
earnings quarterly
will return documents that contain either or both of the words “quarterly” and “earnings”.
You can specify that a particular word must or must not appear in search results by prefixing it with a “+” or “-” respectively. For example, searching for
+mining exploration -gold
will only return documents that contain the word “mining” and do not contain the word “gold”. Results may or may not contain the word “exploration”, but those that do will be considered more relevant and will appear first in the results list.
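The matching rule for “+” and “-” can be illustrated with a short Python sketch. This is only a simplified model of the behaviour described above, not how ACA or Lucene is implemented: the function name matches is hypothetical, documents are split on whitespace only, and relevance ranking is ignored.

```python
def matches(query: str, text: str) -> bool:
    """Return True if text satisfies a query that uses +word / -word prefixes.

    Simplified model (an assumption for illustration): words are split on
    whitespace and compared case-insensitively; ranking is not modelled.
    """
    words = set(text.lower().split())
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)
    # Any excluded word rules the document out.
    if any(w in words for w in excluded):
        return False
    # If there are required words, all of them must be present.
    if required:
        return all(w in words for w in required)
    # With only plain words, any one of them is enough.
    return any(w in words for w in optional)


print(matches("+mining exploration -gold", "mining lease granted near kalgoorlie"))  # True
print(matches("+mining exploration -gold", "gold mining exploration update"))        # False
```

In this model the optional word “exploration” does not affect whether a document matches; in ACA it affects only how highly the document is ranked.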
Searches can also combine words using the AND and OR keywords (NB: These must be entered in caps). In addition, parentheses can be used to group search terms. For example, searching for
kalgoorlie AND (gold OR silver)
will return documents that contain the word “kalgoorlie” and either or both of “gold” and “silver”.
It is possible to search for a phrase by enclosing it in quotes. For example, searching for
"chinese government"
will only return documents that contain that exact phrase.
You can use the wildcard characters “*” (to match any number of letters) or “?” (to match a single letter). For example, searching for
chin*
will match “china” and “chinese”, as well as any other word starting with “chin”. Wildcard characters can be used at the end of a search term or in the middle, but not at the start.
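To get a feel for how “*” and “?” behave, the following Python sketch uses the standard library's fnmatch module, whose “*” and “?” have the same basic meaning. It is an illustration only: the word list is made up, the helper name wildcard_matches is hypothetical, and fnmatch also supports bracket patterns that are not part of the syntax described here.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that match a pattern containing * or ? wildcards."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]


# Made-up sample words for illustration.
words = ["china", "chinese", "chin", "chain", "machine"]

print(wildcard_matches("chin*", words))  # ['china', 'chinese', 'chin']
print(wildcard_matches("chin?", words))  # ['china']
```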
You can even do fuzzy and proximity searches.
A fuzzy search finds words that are similar to the term you enter, rather than only exact matches. For instance, a fuzzy search for “gold” might find “sold”, “bold”, “told” or “sole”, depending on the degree of fuzziness applied. The degree of fuzziness is a parameter you can specify.
gold~
is how you request a fuzzy search for “gold” and
gold~0.8
defines a degree of fuzziness. Values closer to one are more precise and those nearer to zero are less precise. The default value, when no degree is specified, is 0.5.
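One way to picture the degree of fuzziness is as a similarity score based on edit distance (the number of single-letter changes needed to turn one word into another). The Python sketch below makes that assumption: the similarity formula, the function names and the use of “at least” as the cut-off are illustrative choices, and the scoring ACA actually applies may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-letter edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a letter
                            curr[j - 1] + 1,      # insert a letter
                            prev[j - 1] + cost))  # substitute a letter
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed score: 1.0 for identical words, lower as more edits are needed."""
    return 1.0 - edit_distance(a, b) / min(len(a), len(b))


def fuzzy_matches(term, candidates, min_similarity=0.5):
    """Keep candidates whose assumed similarity reaches the threshold."""
    return [c for c in candidates if similarity(term, c) >= min_similarity]


candidates = ["sold", "bold", "told", "sole", "silver"]
print(fuzzy_matches("gold", candidates, 0.5))  # ['sold', 'bold', 'told', 'sole']
print(fuzzy_matches("gold", candidates, 0.7))  # ['sold', 'bold', 'told']
```

Under this model “sold” scores 0.75 against “gold” and “sole” scores 0.5, so raising the threshold removes the more distant matches first.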
A proximity search finds matches where words are within a specified number of words of each other. For example, you may wish to find announcements with “cash” and “share” within six words of each other. Then you would specify
"cash share"~6
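The idea behind a proximity search can be sketched in Python as a check on word positions. This is a rough model only: the function name within_n_words and the sample sentence are made up for illustration, the text is split on whitespace, and Lucene's own distance calculation differs in its details.

```python
def within_n_words(text: str, first: str, second: str, n: int) -> bool:
    """Return True if first and second occur within n word positions of each other."""
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every occurrence of one word with every occurrence of the other.
    return any(abs(i - j) <= n for i in pos_first for j in pos_second)


sentence = "the company returned cash of ten cents per share to holders"
print(within_n_words(sentence, "cash", "share", 6))  # True
print(within_n_words(sentence, "cash", "share", 3))  # False
```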
ACA’s search functions are now implemented using Apache Lucene. See the Lucene Documentation for a complete description of the search syntax.
In addition to the new search capabilities, ACA now delivers all available text conversions as part of the Download Results link. Previously that link only delivered PDF files when these were available. Now, the text conversions of those files are also provided.
Researchers should note that these conversions are not always accurate. Text conversions result from a process that does its best to find usable text in PDF documents. The worst conversions are not used, but some degree of error must be accepted in order to make text searching possible, so do not expect these text conversions to be error-free. The original PDFs are provided so you can correct errors in the conversions of announcements that are particularly important to you.
Despite these limitations, we are sure that access to our text conversions means you will now be better able to routinely process Australian company announcements for the statements most relevant to your research. The new search tools should help you find those announcements more quickly.