Customized Solutions to Your Problems

ALBORZ TECH Company has a professional, integrated group of researchers with extensive expertise and experience. In addition to its products, the company carries out a range of projects, including research and development services in the field of artificial intelligence. ALBORZ TECH is ready to carry out scientific and research projects related to computer science, artificial intelligence, and signal processing. Some of the projects that the company has carried out so far are listed in the following sections.

For more information, please contact the company.

 

Voice command recognition

The purpose of this system is to enable spoken communication between humans and machines. With voice commands, users can perform routine tasks by speaking to the machine instead of using keys and buttons. Applications of this system include:

  • Setting up and controlling computer programs via speech
    This capability lets users perform computer tasks or control software by speech. For example, the user can say “Connect to Internet” or simply “Internet”, and the internet browser opens and connects. Similarly, saying “enlarge the font size of the text” or “larger” increases the font size of the text in an editor. Users can define their own voice commands for each installed program and then control the program by uttering them. The system makes working with computers and software easier and faster, especially for users who are unfamiliar with computers and for people with physical disabilities.
  • Home and industrial automation using speech recognition
    The purpose of this system is to provide remote speech recognition for controlling devices and instruments. It can be used in cars, homes, or factories to carry out commands such as turning a device on or off or controlling robots. It can also be used over telephone lines for remote control in smart buildings.
  • Usable in educational software and games
    Voice commands can add new capabilities to applications such as games and educational software, making them considerably more attractive. Examples include applications built around questions and answers, such as language learning, Quran recitation learning, and software with multiple-choice tests.
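
The user-defined commands described above can be pictured as a lookup from recognized phrases to actions. The following is a minimal sketch, assuming the recognizer has already returned the spoken phrase as text; the names `CommandRegistry`, `register`, and `dispatch` are hypothetical and do not describe the ALBORZ TECH product.

```python
from typing import Callable, Dict, Iterable, Optional


class CommandRegistry:
    """Maps recognized phrases to actions (hypothetical helper for illustration)."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], str]] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse whitespace so small variations still match.
        return " ".join(text.lower().split())

    def register(self, phrases: Iterable[str], action: Callable[[], str]) -> None:
        # Several phrases (long and short forms) may trigger the same action.
        for phrase in phrases:
            self._actions[self._normalize(phrase)] = action

    def dispatch(self, recognized_text: str) -> Optional[str]:
        # Run the matching action, or return None for an unknown command.
        action = self._actions.get(self._normalize(recognized_text))
        return action() if action else None


registry = CommandRegistry()
registry.register(["connect to internet", "internet"], lambda: "open browser")
registry.register(
    ["enlarge the font size of the text", "larger"],
    lambda: "increase font size",
)
```

Registering both a long and a short phrase for one action mirrors the “Connect to Internet” / “Internet” example in the text.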

 

Speech recognition on small computers (cell phones, DSP, etc.)

There is growing interest in using small devices and processors such as cell phones, PDAs, and DSPs in many applications. Because processing power and memory are limited, developing software for them is difficult. ALBORZ TECH has developed a version of its speech recognition system for resource-limited processors such as DSPs (for embedded use as part of other systems) and mobile phones, with high efficiency and optimized processing speed.

Some applications of these systems are as follows:

  • Providing speech applications on mobile handsets
  • Speech dialing and voice-based SMS on mobile handsets
  • Speech-to-speech voice translation

 

 

Natural language processing (NLP)

One of the essential requirements of applied artificial intelligence systems such as speech recognition, text-to-speech conversion, machine translation, optical character recognition, and typing error correction is the incorporation of language models and language information. ALBORZ TECH uses the latest methods in natural language processing to extract language information and apply it to various systems. As a result, a large volume of language information has been used for the first time for Persian. Our speech recognition engine draws on extensive language resources, such as Persian statistical language models, a Persian grammar model, and a set of computational vocabularies for the Persian language. These resources can be used in different applications and research activities.
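
To make the idea of a statistical language model concrete, here is a minimal sketch of a bigram model with add-one smoothing. It is a generic textbook illustration, not the company's Persian models; the function names `train_bigram` and `log_prob` and the tiny English training set are assumptions chosen for readability.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count word histories and word pairs over whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous word, next word)
    return histories, bigrams, vocab


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    histories, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing keeps unseen word pairs from getting zero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab)))
    return total


model = train_bigram(["turn on the light", "turn off the light"])
```

A recognizer can use such scores to prefer word sequences that are plausible in the language: with this toy model, `log_prob("turn on the light", model)` is higher than `log_prob("light the on turn", model)`.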

 

Pronunciation ranking of words and phrases

Assessing the pronunciation accuracy of words and phrases in pronunciation-correction software (such as Quran recitation and spoken language learning) is a smart and helpful feature that not only contributes to high-quality teaching but also makes the software more attractive. This feature may be used as a module or SDK in different applications. Based on pattern recognition techniques and statistical modeling, it converts the similarity between the word or phrase pronounced by the user and the reference word or phrase into a score. The module can operate in speaker-dependent or speaker-independent mode, and in language-dependent or language-independent mode.

One application of this system so far is scoring Quran recitation in software known as “The first smart Quranic software in the Muslim world”. This software assigns a score to users’ pronunciation to help them learn correct Quran recitation.
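
The text does not say which method the module uses to turn similarity into a score. As a stand-in, the sketch below uses dynamic time warping (DTW), a classic pattern recognition technique for comparing two utterances of different lengths, and maps the resulting distance to a 0 to 100 score. The functions `dtw_distance` and `pronunciation_score`, the `scale` parameter, and the exponential mapping are all assumptions made for illustration; inputs are assumed to be non-empty sequences of equal-dimension feature vectors.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against b
                cost[i][j - 1],      # frame of b repeated against a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m] / (n + m)


def pronunciation_score(user, reference, scale=1.0):
    """Map the DTW distance to a score in (0, 100]; 100 means a perfect match."""
    return 100.0 * math.exp(-dtw_distance(user, reference) / scale)
```

Because DTW allows frames to be stretched in time, a slower but otherwise identical utterance still receives the top score, while an utterance with different features scores lower.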

 

Speech quality enhancement

Making sound intelligible and enhancing the quality of speech is a longstanding need. This is done by removing noise that is added to or convolved with the clean speech signal when it is recorded, for example during a lecture or a music performance. Using the latest techniques in this field, ALBORZ TECH has embarked on research and on producing a product for this purpose. This product can be used as an independent application or as a separate unit in other applications. For instance, using this unit in speech recognition systems in noisy environments such as a moving car or an exhibition improves the efficiency and accuracy of the ASR system. The engine can be customized and optimized based on the requirements of any specific application.

 

 

Continuous speech recognition

Continuous Speech Recognition (CSR) is the recognition of human speech by computers and its conversion into text, where the input speech is uttered as a continuous series of words and sentences. ALBORZ TECH is currently using the newest available techniques to develop a speaker-independent continuous speech recognition engine with a large lexicon. Using this engine, Persian speech dictation software is produced in various versions. The engine also makes it possible to design and develop speech dictation software for other languages (including English, Arabic, Kurdish, etc.). Research to improve recognition accuracy and extend the capabilities of this engine is ongoing.

The engine uses hidden Markov models (HMMs) and, for core feature extraction, Mel-frequency cepstral coefficients (MFCCs) with some modifications. In addition, the engine is equipped with well-known robustness techniques such as:

  • robust features: CMS, PCA, RASTA-PLP, RCC, Liftering
  • speech enhancement: Spectral Subtraction, Microphone array and beam-forming
  • model adaptation: MLLR and MAP
  • model prediction: PMC
  • speaker normalization: VTLN
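
As an illustration of one item in the list above, cepstral mean subtraction (CMS) removes the per-coefficient average of the cepstral features over an utterance. A fixed channel (for example a microphone or telephone line) is convolved with the speech in the time domain, which becomes an approximately constant additive offset in the cepstral domain, so subtracting the mean cancels it. The sketch below is a generic version of the technique, not the engine's code; the function name and the plain list-of-lists feature format are assumptions.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    `frames` is a non-empty list of equal-length cepstral vectors
    (for example MFCCs), one vector per frame.
    """
    num_frames = len(frames)
    dims = len(frames[0])
    # Mean of each cepstral coefficient across the whole utterance.
    means = [sum(frame[k] for frame in frames) / num_frames for k in range(dims)]
    return [[frame[k] - means[k] for k in range(dims)] for frame in frames]


clean = [[1.0, 2.0], [3.0, 6.0]]
# The same utterance seen through a channel that adds a constant cepstral offset.
channel_offset = [0.5, -1.0]
shifted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]
```

After normalization, `clean` and `shifted` yield the same features, which is why CMS helps when training and test recordings come from different channels.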

 

 

Telephony speech recognition

In parallel with the continuous speech recognition project, in which speech is usually given to the computer through a microphone, a telephony speech recognition project has also been carried out. Compared with microphone speech recognition, telephone speech recognition is much more difficult. One reason is the quality of telephone speech, whose bandwidth is limited to 4 kHz. Furthermore, telephone speech is usually conversational, and there is a wide variety of speakers and communication channels. For these reasons, telephone speech recognition requires techniques different from those used for microphone speech. Telephone speech recognition is used for recognizing numbers and spoken commands in speech-based IVR systems.

 

Text to speech (TTS)

The aim of this project is to read electronic texts aloud. The project has two parts, or sub-projects: the first converts text into a sequence of phonetic units (such as phonemes, syllables, etc.), and the second converts the sequence of phonetic units into speech (speech synthesis). The first part is language dependent and must be done separately for each language, whereas the second part can be done independently of the language.

  • For the first part, ALBORZ TECH has developed a text-to-phonetic-unit (TTP) conversion engine for the Persian language.
  • For the second part, a high-quality speech synthesis engine using new synthesis methods has been designed and developed, and it can be deployed for many languages. The main issue in converting text to speech is the quality of the voice output, which should be as close as possible to human speech and sound less machine-like.

Efforts to improve the output speech quality of the text-to-speech system are continuing at ALBORZ TECH.

 

Speaker recognition

The human voice has unique biometric characteristics. The aim of speaker recognition is to extract from the speech signal the information that carries the unique characteristics of the speaker. Speaker recognition comprises two areas: identification and authentication. In the former, the person is identified from his/her manner of speaking; in the latter, the person claims an identity, and the claim is confirmed or rejected based on the voice. Speaker recognition can be used in various security and access control applications, alone or together with other security methods. ALBORZ TECH has developed a speaker identification system with open range that can run online or offline. It can process speech from telephone lines and satellite links.
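
The difference between identification and authentication can be sketched with fixed-length "voiceprint" vectors compared by cosine similarity. This is a simplified illustration under stated assumptions: the vectors, the 0.8 threshold, and the function names are hypothetical, real systems derive voiceprints from statistical speaker models, and reading "open range" as open-set identification (an unknown speaker may be rejected) is an interpretation, not something the text confirms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def verify(claimed_id, voiceprint, enrolled, threshold=0.8):
    """Authentication: accept or reject a claimed identity."""
    return cosine(voiceprint, enrolled[claimed_id]) >= threshold


def identify(voiceprint, enrolled, threshold=0.8):
    """Open-set identification: best enrolled speaker, or None if nobody is close enough."""
    best_id = max(enrolled, key=lambda sid: cosine(voiceprint, enrolled[sid]))
    if cosine(voiceprint, enrolled[best_id]) >= threshold:
        return best_id
    return None


# Hypothetical enrolled speakers with toy two-dimensional voiceprints.
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
```

Verification compares the voice against one claimed speaker only, while identification searches all enrolled speakers and, in the open-set case, may return no match at all.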

 

Keyword spotting

Keyword spotting, that is, finding specific words in an audio stream, is another research field at ALBORZ TECH. The first version of this software is now available in Persian and English, and research is under way to make the system more robust to acoustic variations.