Safety Data Sheet (SDS) suppliers are usually required to provide an emergency telephone number in Section 1 of the SDS. Some regulations even recommend a 24/7 local emergency telephone number. OSHA clearly states that the telephone number should belong to a person or entity who is knowledgeable about the hazardous material being shipped and who has comprehensive emergency response and incident mitigation information for that material. However, companies struggle to find knowledgeable staff who can be available 24/7 to carry out such tasks. In many instances companies rely on third-party service providers, whereas EU member states use official advisory bodies. Such advisory bodies do not exist in the US or in other countries.
To meet regulatory requirements and ensure proper hazard communication within their organizations, companies can make use of conversational BOTs built with Natural Language Processing (NLP) techniques. These BOTs can respond to questions, but their behaviour is not a real conversation. Technically, the goal is often achieved with tricks such as storing a large collection of Safety Data Sheets and, when the user searches for a product name, displaying the corresponding SDS as a PDF document. Some programs also crawl the PDF documents to retrieve information. What matters for GHS or OSHA regulatory requirements, however, is determining the truth and context of what is asked, computing entailments, taking appropriate action in light of the conversation, and translating into other languages if needed.
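To make the limitation of the product-name "trick" concrete, here is a minimal sketch of such a lookup. The index, file paths and the function `find_sds` are hypothetical; the point is that the BOT only matches a product name and hands back the whole document, regardless of what was actually asked.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical index of SDS files keyed by lower-case product name.
SDS_INDEX = {
    "xylene": Path("sds/xylene.pdf"),
    "toluene": Path("sds/toluene.pdf"),
    "acetic acid": Path("sds/acetic_acid.pdf"),
}

def find_sds(query: str) -> Optional[Path]:
    """Return the SDS whose product name occurs as whole words in the query."""
    text = " ".join(query.lower().split())  # normalize case and whitespace
    for product, pdf in SDS_INDEX.items():
        if re.search(rf"\b{re.escape(product)}\b", text):
            return pdf
    return None  # product not in the index

# Two very different questions get the same answer: the entire xylene SDS.
print(find_sds("Show me the SDS for Xylene"))
print(find_sds("Is Xylene safe to wash hands with?"))
```

Nothing in this lookup determines truth, context or entailment; the user still has to read the PDF to find the answer.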
In this blog I have tried to give an overview of Natural Language Processing techniques using tools such as spaCy, demonstrating how language is interpreted by existing methods and what more is needed for commercial or regulatory-compliant conversational BOTs. To keep the blog readable for the general public, I have avoided pasting the conversation code here.
In brief, a Natural Language Understanding tool works as follows. If the input modality is speech, a speech recognition system, usually implemented with neural networks, first converts it to text. The text is then annotated in a pre-processing stage using tokenization, POS tagging or NER, followed by an interpreter that derives machine-readable semantics through various APIs. Finally, the output is delivered through Text to Speech (TTS) or as plain text. Various tools are available, such as Siri, Google Talk or QnA Maker. Their main fall-back mechanism is to search the web for the text that was typed or spoken and show a list of websites containing it. When I made a simple Siri search for “Spill support for Xylene”, I was shown a list of websites that contained the term xylene. Similarly, Google Talk embedded in a UBL speaker returned the same kind of search output.
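The pipeline and its fall-back can be sketched in plain Python. This is a minimal sketch under several assumptions: input and output are plain text (speech recognition and TTS would wrap `respond`), a toy keyword interpreter stands in for a trained NLU model, and the intent names and product list are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Intent:
    name: str      # what the user wants, e.g. "spill_response"
    product: str   # which chemical the question is about

# Hypothetical intents and products; a real NLU tool would learn these from annotated data.
INTENT_PATTERNS = {
    "spill_response": re.compile(r"\bspill(s|age)?\b", re.IGNORECASE),
    "ppe": re.compile(r"\b(ppe|gloves|goggles|protective)\b", re.IGNORECASE),
}
KNOWN_PRODUCTS = {"xylene", "toluene"}

def tokenize(text: str) -> List[str]:
    """Pre-processing step: split text into lower-case word tokens."""
    return re.findall(r"\w+", text.lower())

def interpret(text: str) -> Optional[Intent]:
    """Map text to an intent, or return None if it is not understood."""
    product = next((t for t in tokenize(text) if t in KNOWN_PRODUCTS), None)
    if product is None:
        return None
    for name, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            return Intent(name, product)
    return None

def respond(text: str) -> str:
    """Text in, text out."""
    intent = interpret(text)
    if intent is None:
        # Fall-back mechanism: hand the raw text to a web search (placeholder string here).
        return f"Searching the web for: {text}"
    return f"Answering {intent.name} for {intent.product}"

print(respond("Spill support for Xylene"))    # Answering spill_response for xylene
print(respond("What is the weather today?"))  # Searching the web for: What is the weather today?
```

The web-search branch is exactly the behaviour observed with Siri and Google Talk: whenever the interpreter cannot map the text to something it understands, the raw text is passed on unchanged.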
One highlight of the conversation pipeline is POS (Part-of-Speech) tagging, a process in which the text is read and each word is assigned a part of speech such as NOUN, ADJ or VERB. POS tagging is important: when I ran the xylene safety statement for personal protective equipment through it, I could easily identify the output terms, as shown below.
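A minimal spaCy sketch of this step follows. It assumes the small English model `en_core_web_sm` is installed; if it is not, the code falls back to a blank tokenizer-only pipeline in which POS tags are empty. The statement used is the GHS precautionary statement P280, chosen here as a stand-in because the exact xylene text is not reproduced in this post.

```python
import spacy

def load_pipeline():
    """Load the small English model, or a blank English pipeline if it is missing."""
    try:
        return spacy.load("en_core_web_sm")  # assumption: installed via `python -m spacy download en_core_web_sm`
    except OSError:
        return spacy.blank("en")  # tokenizer only: pos_ will be an empty string

# Illustrative PPE statement (GHS P280), used as a stand-in for the xylene SDS text.
STATEMENT = "Wear protective gloves/protective clothing/eye protection/face protection."

def tag_terms(nlp, text):
    """Return (token, coarse POS tag) pairs, skipping punctuation such as '/'."""
    return [(tok.text, tok.pos_) for tok in nlp(text) if not tok.is_punct]

def nouns(nlp, text):
    """Keep only the nouns, which here are the PPE items themselves."""
    return [tok.text for tok in nlp(text) if tok.pos_ in ("NOUN", "PROPN")]

if __name__ == "__main__":
    nlp = load_pipeline()
    for text, pos in tag_terms(nlp, STATEMENT):
        # spacy.explain turns a tag such as "ADJ" into a readable description.
        print(f"{text:<12} {pos:<6} {spacy.explain(pos) or ''}")
    print("Nouns:", nouns(nlp, STATEMENT))
```

Filtering on the NOUN tag is one simple way to pull candidate equipment terms out of a statement; adjectives such as "protective" remain available through their ADJ tag if needed.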
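I then printed the same text with the following attributes: Lemma (the root form of the word being processed), POS (the coarse part of speech), Tag (the fine-grained part-of-speech tag), Dep (the syntactic dependency), Shape (the pattern of capitals, lower-case letters and digits) and Stop (whether the word is a stop word).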
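An attribute table of this kind can be produced with spaCy's `Token` attributes `lemma_`, `pos_`, `tag_`, `dep_`, `shape_` and `is_stop`. This sketch assumes `en_core_web_sm` is installed and otherwise falls back to a blank English pipeline, in which only text, shape and the stop-word flag are filled in; the PPE statement is an illustrative stand-in for the xylene text.

```python
import spacy

def load_pipeline():
    """Load the small English model, or a blank English pipeline if it is missing."""
    try:
        return spacy.load("en_core_web_sm")
    except OSError:
        return spacy.blank("en")  # no tagger, parser or lemmatizer: those columns stay empty

# Illustrative PPE statement (GHS P280), not necessarily the exact xylene SDS wording.
STATEMENT = "Wear protective gloves/protective clothing/eye protection/face protection."

def token_attributes(nlp, text):
    """One row per token with the attributes discussed above."""
    return [
        {
            "text": tok.text,
            "lemma": tok.lemma_,  # root form (empty if the pipeline has no lemmatizer)
            "pos": tok.pos_,      # coarse part of speech, e.g. NOUN, ADJ, VERB
            "tag": tok.tag_,      # fine-grained tag, e.g. NNS for a plural noun
            "dep": tok.dep_,      # syntactic dependency label
            "shape": tok.shape_,  # capitalization pattern, e.g. "Xxxx" for "Wear"
            "stop": tok.is_stop,  # True for stop words such as "and"
        }
        for tok in nlp(text)
    ]

if __name__ == "__main__":
    for row in token_attributes(load_pipeline(), STATEMENT):
        print("{text:<12}{lemma:<12}{pos:<7}{tag:<6}{dep:<10}{shape:<7}{stop}".format(**row))
```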
Lemmatization, which reduces each word to its base form according to its intended meaning, returned the exact root for each search word. However, when I tried Named Entity Recognition (NER), it ran into problems identifying entities and places. This clearly demonstrates that I needed a larger knowledge base to train the model.
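Retraining the statistical NER model needs annotated data. A lighter, rule-based supplement that spaCy offers is the `entity_ruler` component, sketched below. This is a swapped-in technique rather than the training the text calls for, and the labels `CHEMICAL` and `PPE` and their patterns are hypothetical examples. The sketch uses `en_core_web_sm` if installed, otherwise a blank English pipeline.

```python
import spacy

def build_pipeline():
    """Add rule-based chemical and PPE entities in front of the statistical NER."""
    try:
        nlp = spacy.load("en_core_web_sm")
    except OSError:
        nlp = spacy.blank("en")  # no statistical NER: only the rules below apply
    # Placing the ruler before "ner" lets the model keep the rule matches and predict around them.
    position = {"before": "ner"} if "ner" in nlp.pipe_names else {}
    ruler = nlp.add_pipe("entity_ruler", **position)
    ruler.add_patterns([
        # Hypothetical labels; token patterns on LOWER make the match case-insensitive.
        {"label": "CHEMICAL", "pattern": [{"LOWER": "xylene"}]},
        {"label": "CHEMICAL", "pattern": [{"LOWER": "toluene"}]},
        {"label": "PPE", "pattern": [{"LOWER": "nitrile"}, {"LOWER": "gloves"}]},
    ])
    return nlp

if __name__ == "__main__":
    nlp = build_pipeline()
    doc = nlp("Wear nitrile gloves when handling Xylene.")
    print([(ent.text, ent.label_) for ent in doc.ents])
```

Rules like these cover known product names and equipment quickly, but they do not generalize to unseen wording, which is why a larger annotated knowledge base is still needed.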
I carried out other analyses as well, but for the purposes of this blog the key finding is that Safety Data Sheets contain similar phrases that differ depending on the applicable legislation. Such small differences require careful planning of the training content and annotations. Once implemented, however, such conversational BOTs can be very successful in hazard communication programs, for internal plant-site guidance even if not yet for regulatory support. When these conversational BOTs are integrated with speech-to-text systems, they can give plant personnel valuable insight into the day-to-day management of chemicals.
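One way to handle near-identical phrases is to map each variant to a canonical statement code. The sketch below uses fuzzy matching from Python's standard `difflib` module, a swapped-in technique rather than anything spaCy or the analyses above prescribe; the three-entry catalogue is sample data, and the cutoff of 0.9 is an assumption.

```python
import difflib
from typing import Optional

# Sample reference data: a few canonical GHS hazard statements (EU spelling for H226).
CANONICAL = {
    "H226": "Flammable liquid and vapour.",
    "H315": "Causes skin irritation.",
    "H319": "Causes serious eye irritation.",
}

def normalize(text: str) -> str:
    """Lower-case, collapse whitespace and drop a trailing full stop."""
    return " ".join(text.lower().split()).rstrip(".")

# Precomputed mapping: normalized canonical text -> statement code.
_LOOKUP = {normalize(text): code for code, text in CANONICAL.items()}

def match_statement(phrase: str, cutoff: float = 0.9) -> Optional[str]:
    """Return the code of the closest canonical statement, or None if nothing is close enough."""
    hits = difflib.get_close_matches(normalize(phrase), list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[hits[0]] if hits else None

print(match_statement("Flammable liquid and vapor."))  # H226 (US spelling still matches)
print(match_statement("Causes eye irritation."))       # None: GHS H320 is a different statement
```

The threshold is the careful-planning part: "Causes eye irritation." is the distinct GHS statement H320, and lowering the cutoff to 0.8 would wrongly map it onto H319. Spelling variants should merge while legally different statements stay apart.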