Python and its ecosystem

 

The different breeds

This is a free safari into the part of the jungle where many species of python dwell. The list is long, and the nuances of working with each species differ. In this dispatch we will roll up our sleeves, bring in one or, if possible, several of them, and see for ourselves how different they are.
Don’t worry, we will not bring in a real python, which might eat you alive! We will instead bring in pythons that eat problems alive, using novel mathematical and statistical techniques. Starting with this dispatch, we will configure one such ecosystem. In later dispatches we will define a problem that machine learning can solve and then, using the configured distribution, attempt to build a model that addresses it. We do not intend to conclude this train of dispatches; we will keep sifting through the advancements in technology and the problems that appear in this jungle.

Flora & Fauna

This dispatch is for developers who have spent a great deal of time programming line-of-business applications, configuring a product in a customer environment, or developing a cool UI for a customer.
Developers who started straight in machine learning might find this trivial. For that audience, I recommend returning to this domain after a few weeks. By then I intend to have reached a dispatch which might kindle your appetite.

You can look upon the different distributions of Python as different flavours of… say, the JDK. Java, through its specification process, has defined a standard for Java Virtual Machines, and there are a few such distributions in the market, e.g. the one from Oracle, and others from Red Hat, IBM, etc.
Python as a language evolves in a very similar manner. There is the PEP (Python Enhancement Proposal) process, which involves community members and industry leaders who direct the standards that different Python distributions should comply with. New enhancements are implemented first in the standard CPython so that other implementations have a common base (at least that sounds logical).
You could dive deep, but our focus is on the variety of distributions and their application. With this as given, let us zoom across to the variety.

There are broadly 3 categories in which Python is distributed:
1. Cross-compiling distribution – Implementations which allow Python code to target a virtual machine other than the native one that CPython targets. Examples include IronPython (targets .NET), Jython (targets the JVM), CLPython (targets Common Lisp), etc.
2. Repackaged / Commercial distribution – Implementations which are derivatives of CPython but are backed by organizations offering support, libraries, licenses, or a combination of these. Examples include ActivePython, Anaconda, WinPython, Python(x,y), etc.
3. Browser hosted distribution – Implementations of this kind let Python be hosted in the cloud, and developers access it from a browser. Examples include PythonAnywhere.
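
Whichever category you end up in, it helps to know which animal is actually running your code. The sketch below is only an illustration using the standard library; the helper name describe_interpreter is made up for this dispatch. It reports the implementation (CPython, Jython, IronPython and so on) and the build banner, which in repackaged builds often, though not always, mentions the packager.

```python
import platform
import sys


def describe_interpreter():
    """Return a small summary of the Python implementation running this code."""
    return {
        # e.g. "CPython", "Jython", "IronPython" or "PyPy"
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        # The banner frequently carries the packager's name in repackaged
        # builds; treat that as a convention, not a guarantee.
        "build_banner": sys.version,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Run it under two different distributions and compare the output; the differences are a quick first look at how the species vary.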

We pick the Repackaged / Commercial distribution first, and specifically the Anaconda distribution. Do not get misled by the word commercial in the category. The word emphasizes that many such repackagings offer the commercial support typically required by enterprises and businesses. Each of them differs slightly in the way it lets developers work with the ecosystem. This dispatch is to bring such aspects to the surface.

Anaconda distribution

Anaconda is managed, licensed and distributed by Anaconda Inc. It came into existence in 2012. It is known for bundling the Python environment effectively for different roles: data scientists, information technology professionals, and executives.

Roles matter when judging the effectiveness of this distribution, because the software should cater to the requirements of those roles. Having said that, let us take a look at the important aspects which a data science implementation in an enterprise should address –
1. Development done with ease
2. Collaboration which is frictionless
3. Governance which is restrained
4. Deployment for omnipresence

These aspects clearly draw the line between the roles of data scientist and IT professional. Stereotypically, aspects 1 & 2 point directly to a developer, while aspects 3 & 4 are of interest to an IT professional. An executive, however, will have expectations placed on aspects 2-4.
Funnelling down, we park aspects 2-4 for a later dispatch. Development done with ease is tricky and difficult to achieve in practice, particularly when one doesn’t have a framework and practices in place. Anaconda serves that purpose with a tool called Conda.

Conda is a package and environment manager which comes with the Anaconda distribution, which anyone can download. There are three important concepts in Conda which a user should comprehend well.

Package

A package is the bundling specification which contributes to development with ease and collaboration without friction. You need not send your raw .py files to other developers or consume raw .py files from them. A package is a .tar.bz2 archive that Conda installs into an environment, after which your Python code can import what it provides; this lets you re-use a cool package developed by others or distribute your own cool package to a wider audience. For developers from the Java or .NET world, a package can be drawn as a vague parallel to a .jar or .dll file.
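
To make the archive idea concrete, here is a minimal sketch that builds a mock .tar.bz2 in memory and reads its metadata back. It assumes the usual Conda layout in which metadata lives in info/index.json; the package name coolpkg, its fields and both helper functions are hypothetical, and the result is not a package Conda could install.

```python
import io
import json
import tarfile


def build_mock_package(metadata):
    """Build an in-memory .tar.bz2 holding only info/index.json (mock only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        payload = json.dumps(metadata).encode("utf-8")
        member = tarfile.TarInfo(name="info/index.json")
        member.size = len(payload)
        archive.addfile(member, io.BytesIO(payload))
    return buffer.getvalue()


def read_package_metadata(package_bytes):
    """Open the archive and return the parsed info/index.json."""
    with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:bz2") as archive:
        handle = archive.extractfile("info/index.json")
        return json.loads(handle.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical metadata, made up for illustration.
    package = build_mock_package(
        {"name": "coolpkg", "version": "1.0", "build": "py_0", "depends": ["python >=3.8"]}
    )
    print(read_package_metadata(package))
```

The same read_package_metadata helper should also work on the bytes of a real .tar.bz2 package you have downloaded, provided it follows that layout.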

Channel

A channel is the medium through which these packages are discovered. The channel is backed by a repository, either private or public. Private repositories can be seen in enterprises where proprietary packages are shared between teams. Public ones are easy to spot, and in fact Anaconda lists its public channels here: https://repo.anaconda.com/pkgs/. Developers from the mobile world can draw a parallel to an app store, whereas developers from the .NET or Java world can draw a parallel to a NuGet or Maven repository.
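
Discovery through a channel can be sketched roughly as follows. The sketch assumes the common layout in which each platform sub-directory of a channel publishes a repodata.json index whose "packages" entry maps file names to records. The index below is mock data, and the helper names and the coolpkg records are invented for illustration; nothing is fetched from the network.

```python
def repodata_url(channel_url, subdir):
    """Build the index URL for one platform sub-directory of a channel."""
    return f"{channel_url.rstrip('/')}/{subdir}/repodata.json"


def find_versions(repodata, package_name):
    """Return the versions of package_name listed in a repodata-style dict."""
    versions = {
        record["version"]
        for record in repodata.get("packages", {}).values()
        if record.get("name") == package_name
    }
    # Plain string sort: good enough for a sketch, not a real version ordering.
    return sorted(versions)


# Mock index with hypothetical packages, shaped like a channel's repodata.json.
MOCK_REPODATA = {
    "packages": {
        "coolpkg-1.0-py_0.tar.bz2": {"name": "coolpkg", "version": "1.0"},
        "coolpkg-1.1-py_0.tar.bz2": {"name": "coolpkg", "version": "1.1"},
        "otherpkg-0.3-0.tar.bz2": {"name": "otherpkg", "version": "0.3"},
    }
}

if __name__ == "__main__":
    print(repodata_url("https://repo.anaconda.com/pkgs/main/", "linux-64"))
    print(find_versions(MOCK_REPODATA, "coolpkg"))
```

A private enterprise channel would differ only in the base URL you pass in, which is what makes the app store parallel a fair one.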

Environment

This feature is a gem, but until you have spent hours fixing the dependencies of a Python project you might not appreciate its value. You can search the web to discover the typical problems developers face. One account that I could mention is here: https://www.nylas.com/blog/packaging-deploying-python/.
Environments solve the problem of developers who need to install multiple versions of a package and its associated dependencies. They allow developers to switch between projects without having to uninstall or install modules and run the risk of polluting the module space. In the simplest terms, an environment is a directory which contains a collection of packages. The more environments you create, the more directories you will notice (and the more storage will be used).
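
Since an environment is just a directory, you can ask the running interpreter which one it lives in. This is a small sketch, assuming that an activated Conda environment sets the CONDA_DEFAULT_ENV variable and that an environment directory contains a conda-meta folder; the helper name describe_environment is made up here.

```python
import os
import sys
from pathlib import Path


def describe_environment():
    """Report where the running interpreter lives and whether it looks like a Conda environment."""
    prefix = Path(sys.prefix)
    return {
        # Root directory of the environment the interpreter belongs to.
        "prefix": str(prefix),
        # Set by "conda activate"; None when no Conda environment is active.
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),
        # Assumption: Conda keeps its bookkeeping in <prefix>/conda-meta.
        "looks_like_conda_env": (prefix / "conda-meta").is_dir(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it from two different environments should print two different prefixes, which is the directory-per-environment idea in action.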
If you have just installed Anaconda you could fiddle with environments. Anaconda itself publishes a cheat sheet (https://bit.ly/33kOR1S) which you could use to fiddle, as well as later during development.
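
As a starting point for that fiddling, the sketch below only prints a handful of standard conda commands for creating, listing, inspecting and removing an environment; it does not run them. The environment name demo and the Python version 3.9 are placeholders, and the helper environment_commands is hypothetical.

```python
import shlex
import shutil


def environment_commands(env_name, python_version):
    """Return conda command lines for a create / list / inspect / remove round trip."""
    return [
        ["conda", "create", "--name", env_name, f"python={python_version}"],
        ["conda", "env", "list"],
        ["conda", "list", "--name", env_name],
        ["conda", "env", "remove", "--name", env_name],
    ]


if __name__ == "__main__":
    if shutil.which("conda") is None:
        print("# conda was not found on PATH; commands are shown for reference only")
    # "demo" and "3.9" are placeholder values.
    for command in environment_commands("demo", "3.9"):
        print(shlex.join(command))
```

Switching into the new environment is done with conda activate demo, which is left out of the list because it changes the current shell session and is best typed by hand.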

These three concepts are pivotal before we begin actual development to solve the problem statement of a business. A developer with a clear grasp of these concepts will be able to wield the power of the Anaconda distribution.
Please always take time to fiddle with the environment and learn the hows of a tool (in this case Python). In the next dispatch we will take a detour to another active distribution, ActivePython.