Background
During a final year module at university, I was tasked to produce an application that biologists could use to search amyloid protein sequences for a specific pattern that is known to cause neurodegenerative diseases such as Alzheimer’s disease.
The design decided upon was a website based application, which in my opinion is much better suited to the intended audience. The front-end website would deal with all the user input options whilst keeping a clean and minimalist design to ensure accessibility for all levels of technical literacy. The backend of the service consisted of a Python script that would perform some operations on the users inputted value, and then feed the results back to the HTML document for the user to view.
Due to limitations on the school webserver, modern Python libraries and frameworks were not used. Instead, to run python from the web, CGI (Common Gateway Interface) was used. The CGI script written in Python would be invoked by the HTTP server and would then process data fed through the HTML forms on the web front-end.
Implementation
The following code highlights how the linear motifs were found in the given sequence number. The basis of the function was a text mining task, utilising a regex pattern which was converted from the given prosite pattern.
The website facilitates several input options, inlucing a single protein search and a batch protein search. This was achieved using multiple functions with differing techniques of splitting up inputted queries. Batch queries were incrementally run through the Uniprot API and then output was provided to the user once all queries were complete.
Work Produced
The final product is highlighted in the images below: