The eighth week: Reflection and goodbyes

Welcome back to my last project update. I still can’t believe the fellowship is over, and honestly it feels insane to think back to when we started this project and see how much we’ve achieved by the end of it. I first heard about DTSF in my freshman year from my international mentor, and I never thought I was capable of carrying out an independent project with so little experience with the technologies involved. But now that I have completed my project, I feel proud and grateful for how far I’ve come. From dancing with a VR robot during lunch breaks to debugging the same code for an entire week, I can safely say it has been a tough but fun ride.

The project kicked off with brainstorming ideas. Firstly, I knew I wanted to work with drones. I’ve always wanted to fly one and thought it would be great if I could program it and make it more useful. I came across an article that mentioned mind-controlled drones. I thought the idea of thinking about moving a drone in a certain direction and having it do exactly that was pretty incredible. However, this was too complex and time-consuming for me to execute, so I settled on creating a voice-controlled drone instead. Now, let’s talk about how I turned this project idea into reality!


Firstly, I used the DJI Tello EDU drone for my project because it was easily programmable in Python. It had pre-built APIs that were pretty flexible to use, and it was small enough to be tested easily indoors. Then I wrote a program to convert speech commands to text commands so that the drone could receive those text commands and execute them. To do so, I used Google’s speech-to-text API. While this was a decent API, it had some disadvantages too. Sometimes it would give wildly inaccurate predictions and the drone would receive the wrong command. Below is an example where I passed the voice command ‘Take-off’ and it thought I said “steve-o”, “Eva”, or “takeoff”.

To solve this issue, we used an external microphone. I also added a noise cancellation function so that the voice is clear and the API can predict more accurately.
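Since the API came back with several guesses for one utterance, a software-side complement to better audio would be to keep only a guess that matches a known command. The sketch below illustrates that idea; it is not part of the original project. The command list `KNOWN_COMMANDS` and the functions `normalize` and `pick_command` are hypothetical names chosen for this example.

```python
import re

# Assumed command vocabulary, for illustration only.
KNOWN_COMMANDS = {"takeoff", "land", "forward", "back", "left", "right", "up", "down"}


def normalize(text):
    """Lowercase the guess and drop everything except letters,
    so that 'Take-off' and 'takeoff' compare equal."""
    return re.sub(r"[^a-z]", "", text.lower())


def pick_command(alternatives):
    """Return the first guess that is a known command, or None if none match."""
    for guess in alternatives:
        cleaned = normalize(guess)
        if cleaned in KNOWN_COMMANDS:
            return cleaned
    return None


print(pick_command(["steve-o", "Eva", "takeoff"]))
```

When no guess matches, the function returns None, so the program can ask the user to repeat the command instead of sending something wrong to the drone.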

Next step: Operating the drone via laptop. I used the official DJI Tello API to control the drone using my laptop. This part didn’t take a lot of time and I thought it was the easiest one. As you keep reading, you will realize how wrong I was. At this point, I realized that I needed my laptop to connect to the internet for my speech-to-text API to work, but I also needed my laptop’s Wi-Fi to connect to the drone. Both could not be done at the same time. So, I used a USB Wi-Fi adapter to give my laptop a second Wi-Fi interface, and this issue was resolved.

Finally, it was the moment of truth: feeding voice commands to operate the drone. I tried passing a few commands, but sadly the drone refused to take any command other than “takeoff”. It was as if the drone was confused about what to do after the first command had finished executing. My voice command took a while to be processed into a text command, so the drone couldn’t stay stable until the next command was ready. I had to rewrite my code and switch from the official API to a different one to solve this issue. This took quite a bit of time.

At this point, my drone could follow the voice commands I sent, but I had to wait until the first command finished executing before I could pass another one. However, I wanted my drone to be able to take continuous voice input, build a queue of commands, and execute them one by one. I did so using a programming concept called threading. Using threading, you can have two or more code blocks running at the same time. This took a huge chunk of the project time because I had limited experience with threading, especially in Python. After I was successful in completing this part, my drone had a background program that would continuously take voice input while the main program executed the commands.
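A simplified sketch of this pattern follows. It is an illustration under stated assumptions, not the project's actual code: it assumes the standard-library `queue.Queue` as the hand-off between the two threads, a plain list of strings stands in for the speech-to-text loop, and the names `listen`, `run`, and `demo` are hypothetical.

```python
import queue
import threading


def listen(source, commands):
    """Background thread: push each recognized command onto the queue."""
    for text in source:      # stand-in for the speech-to-text loop
        commands.put(text)
    commands.put(None)       # sentinel meaning "no more commands"


def run(commands, execute):
    """Main thread: take commands in arrival order and execute them one by one."""
    while True:
        text = commands.get()    # blocks until a command is available
        if text is None:
            break
        execute(text)


def demo(spoken):
    """Wire the listener and the executor together; return what was executed."""
    commands = queue.Queue()
    executed = []
    listener = threading.Thread(target=listen, args=(spoken, commands), daemon=True)
    listener.start()
    run(commands, executed.append)   # a real program would call the drone here
    listener.join()
    return executed


print(demo(["takeoff", "forward 50", "land"]))
```

The queue takes care of the locking between the two threads, so the listener can keep adding commands while an earlier one is still being executed.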

I was almost done with my project, but I wanted to add more functionality to make the product more user-friendly and interactive. I decided to add text-to-speech functionality as well. This allows the user to ask questions such as “What’s the battery life?” or “What’s the internal temperature?” and have the drone speak back through the external speaker. In addition to that, the user can provide their own value for commands, such as “go forward by 60” or “go up by 100”, instead of the drone executing the commands with the default value of 50. Putting everything together, the drone was now able to process the commands fed by the user and even talk back to the user in certain situations.
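The fallback to a default value can be sketched as below. This is a minimal illustration, not the project's actual parser: the direction list `DIRECTIONS` and the function name `parse_move` are hypothetical, and only the default of 50 comes from the description above.

```python
import re

DEFAULT_DISTANCE = 50  # default value mentioned above

# Assumed set of direction words, for illustration only.
DIRECTIONS = ("forward", "back", "left", "right", "up", "down")


def parse_move(text, default=DEFAULT_DISTANCE):
    """Return (direction, distance) for a spoken move command.

    Returns None when the text contains no known direction word.
    """
    words = text.lower().split()
    direction = next((w for w in words if w in DIRECTIONS), None)
    if direction is None:
        return None
    number = re.search(r"\d+", text)
    distance = int(number.group()) if number else default
    return direction, distance


print(parse_move("go forward by 60"))
print(parse_move("go left"))
```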

This was the end of my original project idea. However, I had about a week and a half left to add more functionality and make the project even better. I wanted my drone to scan QR codes and carry out the commands encoded in them. First, I created two programs, one to encode QR codes and one to decode them. However, a problem arose when I tried taking a video stream from the drone. The thread that I used to take the video stream did not communicate well with the existing threads, and the whole program crashed. I spent a week trying to figure this out, but I couldn’t, and I was running out of time. So, I started preparing for my final presentation instead.

The final “take-off”

The presentation was great and we all got really good responses from the audience. As for the future, I would love to keep working on this idea and take it further by completing the QR detection part, adding object recognition functionality, and in the end creating a prototype of a device that would help physically challenged people to move different objects around in their apartment.

Lastly, I would like to thank all my friends in the fellowship, along with Josh and Eric, for being an amazing support throughout the project. I learned so much this summer and hope to continue using the creative lab to expand my knowledge of different technologies.

7th week:

Welcome to almost the end of the project! This has been a bitter-sweet journey with some failures and some successes but I am proud to say that I definitely have learned a lot. Especially, patience.

I spent this entire week working on making my drone follow the commands encoded in a QR code. To do so, I started by writing a program that could create a QR code from text. I had tons of installation issues and errors in the terminal. I searched the internet, and after following about 100 Stack Overflow posts, I was finally able to solve them by installing the Visual C++ Redistributable package. Then, Josh helped me create some QR code stickers using a vinyl cutter. After this, I moved on to writing a program to decode the QR code. It took me a while to figure out how to get a video stream from the drone because I was using easyTello as my API instead of the official one, so there wasn’t a lot of documentation that I could follow. I tried copying functions from the official API, but that didn’t work. Then, I tried different ways of getting a video stream and finally figured out how to do it. Even after this, there was a lag in the video and it wasn’t very clear. So, I first decided to try passing the QR encoding function and see how it went. The program collapsed and did not respond well to the other threads that were running.

Debugging the code to make the QR code scanning work took a lot of time so I decided to let go of this for a while and start focusing on preparing for the presentation. I took a lot of videos and edited them. We also had some practice sessions to improve our presentation.

Week 6: Progress taking off...

What a busy week! It took me a couple of days to solve the issue with Python concurrency. I was finally able to do it by having code running in the background that continuously takes voice commands while the main program runs the commands given. After solving the issue, I was able to give voice commands even while the drone was still executing prior commands. I thought my work was done after solving this; however, Google’s speech-to-text API has not been very nice to me these days. The following pictures are screenshots of Google’s API trying to predict my ‘take-off’ command. Some of these predictions don’t even sound similar, so you can see how inaccurate it can be sometimes.

I might have been able to improve it by training it if it were a model I had created on my own, but since it is a pre-built API, I cannot make it perform any better than this. So, I decided to work with what I had and try using it in a quiet area with a better microphone for better predictions. Furthermore, I added text-to-speech functionality to the drone to improve the user experience. The drone talks back to the user when their command is being processed and when they ask about the remaining battery, flight time, etc.

Moving on, I wanted to add gesture control functionality to the drone so I started looking for datasets online to create a model and train it to recognize different gestures. After working on it for a little bit, I realized that the drone’s API already had a gesture recognition feature so I had to change my plan. Then, I moved on to the idea of creating a prototype for a drone assistant that could help physically disabled people to move things around in their apartment. First, I would use QR codes for the drone to scan and go to the destination room. After that, I plan to use object recognition to have the drone recognize objects. I started this by modifying my program so that it can take video streams from the drone. We also had a presentation this week with more people. It helped us a lot to gain more confidence and also to get feedback from the audience.

Week 5: More complications

My next step in the project was to further advance my program. At that moment, my drone could only accept a new command after the previous one had finished executing. However, we wanted to be able to give voice commands continuously while the drone was still executing, and have it always run the first command in the queue. The only way I could think of to do this was threading, so I went back to trying to use threading to solve this issue. I constantly got errors and did not know what was going on. First of all, I had an issue with storing the result from one thread and passing it to the other. When that didn’t work, I tried creating a global variable that could be shared by the two threads in my program. However, variable sharing requires synchronization of the threads so that the variable is not modified by both threads at the same time. I used locks to do this. I could execute the program at this point, but only the voice recognition function ran and the program did not perform well. I tried modifying and debugging the code multiple times, but it seemed like nothing worked.
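For readers unfamiliar with locks, here is a minimal sketch of a variable shared between two threads and guarded by `threading.Lock`. It only illustrates the concept; the class `SharedCommand` and the function `demo` are hypothetical names, and the code is not taken from the project.

```python
import threading


class SharedCommand:
    """Holds the most recent command; a lock guards every read and write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def set(self, value):
        """Store a new command (called from the voice-recognition thread)."""
        with self._lock:
            self._value = value

    def take(self):
        """Return the pending command and clear it as one atomic step."""
        with self._lock:
            value, self._value = self._value, None
            return value


def demo():
    shared = SharedCommand()
    writer = threading.Thread(target=shared.set, args=("land",))
    writer.start()
    writer.join()
    # The first take returns the command, the second finds nothing pending.
    return shared.take(), shared.take()


print(demo())
```

One limitation of a single shared slot is that a second command overwrites the first if the reader has not picked it up yet, so any command that must not be lost needs some form of queue.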

I had been stuck on the same issue for a week and I was getting a bit frustrated because of that. So, I decided to keep looking at the problem while also doing some research on object recognition, because I was already behind where I needed to be at this point in the project. I did some digging on the internet and read some articles on how Python handles real-time object recognition. I plan to start implementing it next week.

Fourth Week:

I started this week trying to figure out the issue with keeping the drone stable after taking off. We thought of using multi-threading to solve the issue. However, since I had never worked with multi-threading before, especially in Python, it took me a while to learn and implement it. I wasn’t able to solve the issue using this technique, so instead, I stepped back and looked for ways I could solve it without using any complicated concepts. Then, I found out that there were multiple libraries for the Tello drone apart from the official one that I had used earlier. I found easyTello to be the most efficient and flexible library to use. Although it didn’t have a lot of documentation or tutorials on its functionality, it was pretty straightforward. I used it and could finally pass other commands such as go left, land, etc.

My next task was to add more advanced commands. I was able to add some commands, but the commands were processed quite slowly. I thought the problem might be the transfer speed of the adapter I was using. To verify this, I tried using an Ethernet cable to get internet access while connecting my laptop to the drone’s Wi-Fi. However, I wasn’t able to get that working, so we went back to using the adapter for the drone connection. I added some more commands, did some testing and debugging by referring to the API documentation, and everything seemed to work fine.

Then, I started working on making my program more flexible in terms of how commands are passed. At that time, I had to wait until a command was executed before I could pass another command. But I wanted to be able to pass a command sequence and have the drone follow it one command at a time. I did so by breaking the command sequence into smaller command segments using keywords such as go or move. Then, I extracted the integers from the command sequence and passed them to the drone so that it could execute commands with any distance or speed the user chooses. My plan for next week is to polish the speech recognition interface and move on to the object recognition part of the project.
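The splitting step can be sketched roughly as follows. This is an illustration under assumptions rather than the project's actual code: it assumes every segment starts at one of the keywords go or move, and the names `KEYWORDS`, `split_sequence`, and `with_numbers` are hypothetical.

```python
import re

KEYWORDS = ("go", "move")  # keywords named above


def split_sequence(text):
    """Split a spoken sequence into segments that each start at a keyword."""
    pattern = r"\b(?:%s)\b" % "|".join(KEYWORDS)
    starts = [m.start() for m in re.finditer(pattern, text.lower())]
    if not starts:
        return []
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def with_numbers(segments):
    """Pair each segment with the first integer in it, or None if it has none."""
    pairs = []
    for segment in segments:
        match = re.search(r"\d+", segment)
        pairs.append((segment, int(match.group()) if match else None))
    return pairs


segments = split_sequence("go forward 50 move left 30 go up")
print(with_numbers(segments))
```

Each pair can then be handed to the drone in order, with a missing number left for the caller to replace with whatever default it prefers.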

Third week: Errors and debugging

I started this week by trying to connect to my drone using a USB wireless network adapter so that I could use my laptop’s internet connection to get Google’s speech-to-text API working. The process was easier than I expected, and we got it working pretty soon. After that, my task was to combine the speech recognition code with the code that connects to the drone and have the drone follow my voice commands. I was able to have the drone take off using my voice command. After that, I created a loop that continuously took voice input from the user until the drone landed. Then, I tried running the code to pass different commands after taking off. It didn’t quite work. The drone would take off, wait for a response for a couple of seconds, and make an emergency landing. I thought it might be due to the time lag between the completion of the take-off command and the execution of the next command. So, I increased the response timeout so that it wouldn’t throw an error even if it took a long time for my voice to be converted into text commands. It didn’t work. After that, to figure out where the problem was, I ran a program that controls the drone using the keyboard. It worked well, and I was able to control my drone using keyboard controls without any time lag. So, I thought the problem might be somewhere in the process of converting my voice into text commands while a drone command is executing. I did some digging on the internet to see if other people had run into similar errors, but I could find nothing.

Then, after learning more about drone programming and watching a bunch of YouTube videos of people working on drone projects, I realized that the drone performed the emergency landing because it didn’t know what to do after the first command was executed. It takes a while for the program to take my voice input and turn it into text commands, so by the time my second command is processed, the drone is confused because it doesn’t have any command to execute. So, I tried modifying the Tello library to keep the drone stable in the air if no response is given. It didn’t work, and I got new errors after that. I am still investigating how to solve this issue.

Second week

This week I ran into a problem where I needed my laptop to connect to my drone’s Wi-Fi but I also needed an internet connection for my speech-to-text API to convert my speech commands into text commands. Due to this issue, I tried looking for offline APIs. For the first try, I used PocketSphinx, which is part of the CMU Sphinx Open Source Toolkit For Speech Recognition. I ran into some errors along the way, and when it finally started working it gave wildly inaccurate results. My second try was the Vosk speech recognition toolkit, which worked better than the one I used before but was tough to work with. Then finally, I used a wireless USB LAN adapter to connect my laptop to the drone and used my computer’s Wi-Fi to connect to the internet so I could use the APIs that required an internet connection. I went back to using Google’s speech-to-text API.

Alongside the speech-to-text API issue, I was also working on connecting to the drone from my Python code. I tried using different Python commands to control the drone. I experimented with commands that could flip the drone, make it travel in a curve, etc. Then, I also wanted to look at how accurately the drone followed the commands. For example, when I sent the command to move forward by 20 cm, it moved in the right direction but the distance traveled fluctuated by about 15%. I repeated this several times to see how far off the drone was. During the trials, I also realized that the drone sometimes gets a little unstable during flight, which messed up the measurements. I plan to work on improving the stabilization later in the project.


Hi everyone! My name is Pratikshya Prasai. My DTSF project this year is to create a voice-controlled and object-recognizing drone. Object recognition is a computer technique for finding and identifying objects: you feed in data about an object and the machine learns how to identify it. For example, after my project is completed, I should be able to command my drone to find someone and click a picture of them. It would search its environment to identify the person and then fulfill the command.

For my first week of the project, I focused on finding resources about object recognition, drone programming, types of drones, etc. I finalized the materials I would need for now and planned out my project over 8 weeks. I also had to figure out which drone I would use and which programming language was suitable for my project. I watched videos by people who have done similar projects in the past and learned how to program a drone and make it follow my commands. I was still unsure about how I would implement my project idea, but I decided to take one small step at a time. So, I began by writing code to convert my voice commands into text commands that the computer would be able to recognize. I looked for the best and simplest API to use for my project and finished writing the code. The processing speed of my program is pretty slow right now, so I am also trying a different approach to speech recognition, using a machine learning model instead of the Python package that I am currently using, to see if that works any better.

Apart from working on my project, I have also been studying for the FAA exam to receive my drone license and be able to fly outside. The agenda for the next week is to have the drone follow basic commands and also keep studying for the exam. Additionally, I also plan to learn how to fly the drone properly and avoid crashing it everywhere.