Understanding Python for Machine Learning – Part 1 (Data Types in Python)

Machine learning has become one of the most sought-after fields of recent years. With advances in both software and hardware, building complex, computationally heavy mathematical models has become easier and quicker. The introduction of TPUs (Tensor Processing Units) has further sped up the training of machine learning models. While hardware advances enabled much of this progress, another notable change has been the growing use of Python for machine learning algorithms.

Marking a shift from R, the traditionally used language (and one still preferred by many statisticians), Python has gained popularity in recent times among developers and consultants. Many new third-party machine learning tools now provide Python support out of the box. In fact, many companies have started calling themselves Python development companies.

Hence, through this series of articles, we will explore Python for machine learning and how Python consultants can use it to build robust machine learning models. In this first article of the series, we will look at the different data types in Python and their syntax.

Let us begin!

Python's built-in data types fall into 7 main categories:

Text: String (str)
Numeric: Int, Float, Complex
Sequence: List, Tuple, Range
Mapping: Dictionary
Set: Set, Frozenset
Boolean: Bool (True/False)
Binary: bytes, bytearray, memoryview

We will now look at the most commonly used of these types in detail.

  1. Text

Strings in Python are immutable sequences of characters. They are written between single or double quotes. Strings support slicing, that is, we can extract a substring by specifying a range of indexes. One important thing to note is that, unlike in R, indexes in Python start from 0.

Syntax and Output:

We can also use the + operator to concatenate strings.
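As a small illustration of the points above, the sketch below shows indexing, slicing and concatenation. The variable names and sample values are made up for this example.

```python
# Strings can be written with single or double quotes.
greeting = 'Hello'
name = "World"

# Indexing starts at 0, so index 0 is the first character.
first_char = greeting[0]       # 'H'

# Slicing: characters from index 1 up to, but not including, index 4.
middle = greeting[1:4]         # 'ell'

# Negative indexes count from the end of the string.
last_char = greeting[-1]       # 'o'

# The + operator concatenates strings into a new string.
message = greeting + ", " + name + "!"
print(message)                 # Hello, World!
```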

  2. Numeric

This category covers integers (int, signed), floating-point numbers (float, for real values) and complex numbers (complex). It is important to note that Python 3 has a single integer type, int, with arbitrary precision. Hence, there is no separate long type in Python 3.

Syntax and Output:
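A minimal sketch of the three numeric types follows. The values are arbitrary and chosen only for illustration.

```python
count = 42          # int
price = 19.99       # float
z = 3 + 4j          # complex (real part 3, imaginary part 4)

print(type(count))  # <class 'int'>
print(type(price))  # <class 'float'>
print(type(z))      # <class 'complex'>

# Integers in Python 3 have arbitrary precision, so very large
# values are still plain int objects.
big = 2 ** 100
print(big)          # 1267650600228229401496703205376
print(type(big))    # <class 'int'>

# abs() of a complex number returns its magnitude as a float.
magnitude = abs(z)
print(magnitude)    # 5.0
```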

  3. Sequence

Sequences are probably the most frequently used data types in Python, and they appear in many different places. A sequence can hold values of the basic types described above, such as integers, floats and strings. To understand the concept better, let us look at what a list, a tuple and a range are.

     1. List

As mentioned above, a sequence can contain values of other data types, so it is also called a compound data type. Lists are very versatile. A list stores an ordered sequence of values, written between square brackets. Its elements can be accessed using ‘[ ]’, and lists can be combined and repeated using the ‘+’ and ‘*’ operators.

Syntax and Output:
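The following sketch illustrates list creation, access with square brackets, and the + and * operators. The list contents are invented for the example.

```python
# A list can mix values of different types.
scores = [90, 85.5, "absent"]

# Elements are accessed with square brackets; indexes start at 0.
first = scores[0]              # 90
first_two = scores[0:2]        # [90, 85.5]

# + concatenates two lists, * repeats a list.
combined = scores + [70, 65]   # [90, 85.5, 'absent', 70, 65]
repeated = [0, 1] * 3          # [0, 1, 0, 1, 0, 1]

# Lists are mutable: elements can be replaced and new ones appended.
scores[2] = 78
scores.append(88)
print(scores)                  # [90, 85.5, 78, 88]
```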

     2. Tuple

Tuples are similar to lists, except that the values in a tuple cannot be changed or updated. The size of a tuple also cannot be changed once it is created. In other words, tuples can be thought of as “read-only lists”. The syntax differs as well: tuples are written using parentheses ‘( )’, whereas lists use square brackets ‘[ ]’.

Syntax and Output:
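Here is a short sketch of the read-only behaviour described above. The names point, single and assignment_failed are chosen just for this example.

```python
# Tuples are written with parentheses.
point = (3, 4)

# Elements are read with square brackets, just like lists.
x = point[0]                   # 3

# Trying to change an element raises a TypeError.
assignment_failed = False
try:
    point[0] = 10
except TypeError as error:
    assignment_failed = True
    print("Cannot modify a tuple:", error)

# A one-element tuple needs a trailing comma.
single = (5,)
print(type(single))            # <class 'tuple'>
```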

     3. Range

A range is often used in loops to define the set of numbers over which the counter will iterate. It creates a sequence of integers, taking start (included) and end (not included) as its parameters.

Syntax and Output:
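The sketch below shows how the start and end parameters behave, using arbitrary example values. It also uses the optional third argument of range(), the step, which the text above does not cover.

```python
# range(start, end) includes start but stops before end.
numbers = list(range(1, 6))
print(numbers)                 # [1, 2, 3, 4, 5]

# The optional third argument is the step between values.
evens = list(range(0, 10, 2))
print(evens)                   # [0, 2, 4, 6, 8]

# A typical use: driving the counter of a for loop.
total = 0
for i in range(1, 6):
    total += i
print(total)                   # 15
```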

  4. Mapping (Dictionaries)

Dictionaries store key-value pairs in Python. As in a hash table, each key corresponds to a particular value. When we want to access a value, we look it up using its key.

The key is generally an integer or a string, though it can be any hashable (in practice, immutable) Python object, such as a tuple. The value can be any Python object.

Syntax and Output:
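A minimal sketch of dictionary creation and lookup follows. The dictionary names, keys and values (model_params, learning_rate and so on) are hypothetical and used only for illustration.

```python
# A dictionary maps keys to values.
model_params = {"learning_rate": 0.01, "epochs": 10}

# Look up a value by its key.
lr = model_params["learning_rate"]         # 0.01

# Add a new key-value pair, or overwrite an existing one.
model_params["batch_size"] = 32
model_params["epochs"] = 20

# get() returns a default instead of raising KeyError for a missing key.
dropout = model_params.get("dropout", 0.0)  # 0.0

# Any hashable object can be a key, for example a tuple.
grid = {(0, 0): "origin", (1, 0): "east"}
label = grid[(0, 0)]                        # 'origin'

# Iterate over the key-value pairs.
for key, value in model_params.items():
    print(key, value)
```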

With this, we come to the end of this article. We learnt how Python developers and Python consultants can use different data types in different scenarios. A Python development company that is planning to expand into machine learning and AI should understand these basic data types well. This foundation can then be used to build great machine learning models.

In the next post of this series, we will cover “Linear Regression in Python”.

Until then, bye bye!
