Techie-Talk: The Inner Workings of Data Technology

Eleanor Barlow 

Interview with Dean Wronowski: Freelance Full-Stack Software Developer

Can you explain to us what you get up to as a software developer at Pansensic? 

Dean Wronowski: “As a software developer my job is basically to program. At Pansensic, I have developed a system that allows colleagues within the company to launch a scrape, which means harvesting or extracting comment data.

“There are millions of comments on thousands of websites on the internet. Say, for instance, you have been tasked with scraping URLs relating to a certain product for a client. You can put all those URLs into the scraper UI and then specify whether they need to be translated, using either Google Translate or Microsoft Translator. Once you have adapted the settings to what you want, the scraper will then go and grab all the available data. This could include the product title, descriptions, images and comments.

“After the data has been scraped, the system will then run a number of algorithms: natural language processing (NLP) to process the text, as well as machine learning (ML) algorithms such as classification and image recognition.

“These comments then get stored in a database. At the same time, you can start multiple scrapes to go through other sites and do the same thing to download information. This information gets put into a separate database. You can have several databases full of products and comments, all running at the same time.”
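The pattern Dean describes, several scrapes running side by side with each one writing to its own database, can be sketched roughly as follows. This is an illustration in Python rather than Pansensic's PHP system. The function names (`fetch_comments`, `run_scrape`, `run_all`), the table layout and the example URLs are all assumptions made up for the sketch, and the fetch step is a stand-in that downloads nothing.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_comments(url):
    """Hypothetical stand-in for the real scraper: returns fake comments."""
    return [f"comment {i} from {url}" for i in range(3)]


def run_scrape(project, urls, db_path=":memory:"):
    """Scrape every URL for one project into that project's own database."""
    # ":memory:" gives each connection a private database, so projects
    # never share storage. A file path per project would work the same way.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS comments (url TEXT, body TEXT)")
        for url in urls:
            rows = [(url, body) for body in fetch_comments(url)]
            conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)
        conn.commit()
        (count,) = conn.execute("SELECT COUNT(*) FROM comments").fetchone()
        return project, count
    finally:
        conn.close()


def run_all(jobs):
    """Run all projects at the same time and report comments stored per project."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_scrape, name, urls) for name, urls in jobs.items()]
        return dict(future.result() for future in futures)
```

Calling `run_all({"shoes": ["https://example.com/a"], "hats": ["https://example.com/b"]})` starts both scrapes together and returns the number of comments stored for each project.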

What happens next?

Dean: “Once all the scraping is done for each project, we use a database mapper that I have developed. It is written in the same programming language (PHP) and allows Pansensic to transfer all the comments that have been scraped, and for these comments to be auto-mapped into the correct Pansensic database. Pansensic can then get down to the analysing part, which I am not at liberty to divulge.

“Webpages, websites and the tags within each webpage are, however, constantly changing. When we start off a scrape, we are trying to extract certain tags in order to get certain data. But because these tags are always changing, we are constantly having to change the scrape and look for different tags. As sites change, we have to change our scraper to match new technology. So I have to keep adapting to these changes frequently.”
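One common way to cope with tags that keep changing is to give the scraper an ordered list of candidates and let it use the first one that matches. The sketch below shows that idea using only Python's standard library. It is not Pansensic's scraper: the class names `comment` and `review-text`, and the helper `extract_comments`, are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collects the text inside every element that carries a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested element inside a match
        elif self.class_name in classes:
            self.depth = 1              # a new match starts here
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def extract_comments(html, candidate_classes):
    """Try each candidate class in order; return the first that yields text."""
    for class_name in candidate_classes:
        parser = ClassTextExtractor(class_name)
        parser.feed(html)
        comments = [m.strip() for m in parser.matches if m.strip()]
        if comments:
            return class_name, comments
    return None, []
```

When a site renames its comment markup, only the candidate list needs a new entry. A result of `(None, [])` is a useful signal that the page layout has changed and the list needs updating.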

What else are you up to?

Dean: “I have also started developing a new system to make managing the projects, alongside the scrapers, easier. It is called Pan Organiser. It uses React (which is built by Facebook), Redux, Node.js, Sass and webpack. All these technologies are quite new. We also use a WebSocket. So as soon as Pansensic colleagues go into Pan Organiser, they can add in a project, or the date of a project. Then, when any clients log in to the external system, they can see any updates in real time.

“As the scrapers go around the internet downloading information, they automatically upload the stats (the number of comments and products). These are transferred automatically to Pan Organiser so that external clients can see if any comments have been downloaded for each project.”

How do you see the system you are building progressing in the future?

Dean: “Each time we update our work it is faster and easier for both us and our clients. It is always evolving. With the new system, for example, we went down the route of using version control, with the code hosted on a service called Bitbucket. We did this so that every time changes are made to Pan Organiser, they are recorded and tracked. I can select any files that have changed since the last update and commit them with a message describing the change. I can then push this commit to Bitbucket and, because of version control, you will be able to see the changes made. You will also be able to see where I have made these changes, at what time, and how long it has taken. Others (clients/directors) can then go into a chosen file and see what has been changed or removed, and so on.”

How do you keep everything in one place? 

Dean: “We are also using a tool called Vagrant. This basically means that when another developer clones the repository from Bitbucket, which is where the files are kept, they get all the project files and set-up files in one place. So, if another developer were to launch the project, all the frameworks would automatically install, as would all the libraries and code. Once all of this is installed, it would then initialise and set up the database, ready for the system. The system would then start up, so a developer could log in directly and start developing without having to learn how to transfer all the files. They would be up and running, ready to start developing, alongside the team at Pansensic.

“That’s why we are going down the route of Bitbucket, repositories and version control. It makes it easier to expand, and means we can all work as a team.”

You briefly touched upon it, but do you use ML and/or AI, as a software developer, in your technologies at Pansensic?

Dean: “We are starting to. Nowadays, when you go to sites such as Instagram, it is all about imagery. In the past it was all about the comments. Of course, comments are still abundant. But images are rapidly increasing, especially on social media sites. People tend not to leave lots of comments or lengthy texts that we can analyse. As a result, we are starting to go down the route of actually downloading the images, then using image recognition to work out what is in a picture.

“Say someone uploads a picture of a green Nike shoe, for example. Say a review is left saying ‘I really love this product’. In the past we would have analysed the text but that relates to nothing without the image alongside it. Whereas if we upload the picture as well, and combine it with machine learning to decipher the picture, the comment becomes useful. 

“The machine will be able to identify that in the picture there is a shoe, it is green and has a brand logo. So it can work out that Nike is in the image. This is very useful.”
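Dean's green shoe example comes down to attaching what the image shows to a comment that says nothing about its subject. A minimal Python sketch of that pairing follows. The `Review` structure, the `annotate` helper and the label strings are assumptions for illustration, and the labels stand in for the output of an image recognition step that is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    text: str
    # Hypothetical output of an image recognition model for the attached picture.
    image_labels: list[str] = field(default_factory=list)


def annotate(review):
    """Attach the image labels to a comment that names no subject itself."""
    if not review.image_labels:
        return f"{review.text} [subject unknown]"
    return f"{review.text} [image shows: {', '.join(review.image_labels)}]"
```

With labels such as `["shoe", "green", "Nike logo"]`, the comment ‘I really love this product’ gains the subject it was missing. With no labels it stays ambiguous.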

Thank you, Dean, software developer extraordinaire.

We look forward to our next interview, as we take a closer look into just how amazingly terrifying Artificial Intelligence and Machine Learning can be.

 

(This interview has been lightly edited, for the purpose of clarity.)
