Python is simple yet powerful. In recent years it has gained popularity thanks to its vast range of libraries, which cover almost every domain of software. By the traditional definition, Python is a high-level, interpreted, general-purpose programming language, and its simple syntax makes it very popular among programmers. While Python is used to develop business logic, MongoDB is used to store data. One of MongoDB's key features is that it can handle unstructured data as well as structured data. Natural language processing (NLP) applications often work with unstructured data.
The problem with human language is that it cannot easily be represented in a tabular format. This is a major drawback of tabular database software such as MySQL and MS SQL Server. Coming back to Python, Pymongo is a library or, in simple terms, a bridge that connects Python and MongoDB. In this blog, I shall try to demonstrate the same functionality in both Pymongo and MongoDB.
Important things to remember about Pymongo:
There are a few differences in structure between MongoDB and SQL-like databases. Firstly, a table in SQL corresponds to a collection in MongoDB. Secondly, rows in SQL correspond to the documents of a collection. Thirdly, columns in SQL correspond to fields in MongoDB.
What is a Cursor?
It is a temporary memory location that stores the result of a query or function call. In Pymongo, find() returns a cursor that you iterate over to read the matching documents.
How to connect MongoDB with Python?
To connect Python with MongoDB, make sure MongoDB is installed and the MongoDB server is running.
Step 1. Import the MongoClient class from pymongo.
Step 2. Pass the IP address of the server (in our case, the host machine) and the port number to MongoClient.
Step 3. Specify the database name. If the database does not exist, MongoDB creates one with that name once data is first written to it.
Step 4. Specify the collection name inside userdb. If the collection exists, the returned object points to it. If it does not, a new collection with the given name is created once data is first written to it.
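The four steps above can be sketched roughly as follows. This is only a minimal sketch: it assumes MongoDB is listening on localhost at the default port 27017, and the collection name stock_data is a hypothetical placeholder (userdb is the database name used in this post).

```python
def get_collection(host="localhost", port=27017,
                   db_name="userdb", collection_name="stock_data"):
    """Return a handle to a collection (collection_name is a placeholder)."""
    # Step 1: import MongoClient. The import sits inside the function only so
    # that this sketch can be loaded on a machine without pymongo installed.
    from pymongo import MongoClient

    # Step 2: pass the server address and port number to MongoClient.
    client = MongoClient(host, port)

    # Step 3: select the database by name.
    db = client[db_name]

    # Step 4: select the collection inside that database.
    return db[collection_name]
```

Calling collection = get_collection() would give a collection object that can be queried with find(), as in the rest of this post.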
How to read data from a Mongo collection and store it in a DataFrame?
find() without any condition queries all the data present in the collection. The queried data is converted to a list, and DataFrame() converts that list into a pandas DataFrame.
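A rough sketch of that read path is shown here. The helper names fetch_all and to_dataframe are my own, not part of Pymongo or pandas, and collection is assumed to be a Pymongo collection object obtained after connecting.

```python
def fetch_all(collection):
    """Run find() with no condition and turn the cursor into a list of dicts."""
    return list(collection.find())


def to_dataframe(documents):
    """Convert a list of documents into a pandas DataFrame."""
    # Imported here so fetch_all() stays usable without pandas installed.
    import pandas as pd
    return pd.DataFrame(documents)
```

With a live collection this would be used as df = to_dataframe(fetch_all(collection)). Each document carries MongoDB's _id field, so it shows up as a column in the DataFrame.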
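Count the number of documents in a Pymongo collection:
count() returns the number of documents present in the collection. Note that count() was deprecated in PyMongo 3.7 and removed in PyMongo 4, where count_documents({}) is used instead. The above code is the Pymongo example. The code below demonstrates how the same operation works in the Mongo GUI.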
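For recent PyMongo versions, counting could look something like this. The helper name count_all is hypothetical, and collection is assumed to be a Pymongo collection object.

```python
def count_all(collection):
    """Count every document in the collection."""
    # An empty filter {} matches all documents.
    return collection.count_documents({})
```

Passing a non-empty filter to count_documents() instead of {} counts only the documents that match it.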
Greater than / less than query in Pymongo:
There are situations where we need to find data that meets certain conditions. In this example, we have a field called ‘Date’. Let’s say we want to find all the documents whose date is greater than ‘2016-07-20’. Greater than is denoted by $gt; likewise, less than is denoted by $lt. Because $gt and $lt are MongoDB operators, they must be written inside quotation marks (as strings) when used from Python. The pic below shows this operation in Pymongo.
The pic below shows the ‘greater than’ operation in the Mongo GUI. Notice that $gt is written without quotation marks when executed in the MongoDB GUI.
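In text form, the Pymongo side of this query might look like the sketch below. It assumes the ‘Date’ field is stored as a 'YYYY-MM-DD' string, so that string comparison matches chronological order; if the dates were stored as datetime values, a datetime object would be passed instead. The helper names are hypothetical.

```python
def date_after(cutoff):
    """Build a filter for documents whose 'Date' is greater than cutoff."""
    # "$gt" is a plain Python string key here, hence the quotation marks.
    return {"Date": {"$gt": cutoff}}


def date_before(cutoff):
    """Build a filter for documents whose 'Date' is less than cutoff."""
    return {"Date": {"$lt": cutoff}}


def find_after(collection, cutoff):
    """Return all documents dated after cutoff as a list."""
    return list(collection.find(date_after(cutoff)))
```

For the example in this section the call would be find_after(collection, "2016-07-20").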
Greater than X, less than Y
Query to get all the documents whose date is greater than ‘2015-07-10’ and less than ‘2016-05-13’.
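A range query like this can be expressed by putting both operators on the same field. The sketch below again assumes ‘Date’ holds 'YYYY-MM-DD' strings, and date_between is a hypothetical helper name.

```python
def date_between(start, end):
    """Build a filter for start < 'Date' < end (both bounds exclusive)."""
    # Two operators on one field are combined: both must hold.
    return {"Date": {"$gt": start, "$lt": end}}


def find_between(collection, start, end):
    """Return all documents dated strictly between start and end."""
    return list(collection.find(date_between(start, end)))
```

Here the call would be find_between(collection, "2015-07-10", "2016-05-13").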
How to sort documents with respect to a field:
Both MongoDB and Pymongo provide a sort() method, which sorts documents with respect to a field. In Pymongo, sort() takes two arguments: the first is the name of the field and the second is either 1 for ascending order or -1 for descending order. (In the MongoDB shell, the same information is passed as a single document such as {Volume: -1}.) The pic below shows how the sort() function works in Pymongo. In the figure, sort() is applied to the Volume field, and the argument -1 denotes a descending sort.
The pic below shows how to execute sort() in MongoDB.
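As a text version of the Pymongo call, a sketch could look like this. The helper name sorted_by_volume is hypothetical, and collection is assumed to be a Pymongo collection whose documents have a ‘Volume’ field.

```python
ASCENDING = 1    # same value as pymongo.ASCENDING
DESCENDING = -1  # same value as pymongo.DESCENDING


def sorted_by_volume(collection, direction=DESCENDING):
    """Return all documents sorted on the 'Volume' field."""
    # find() returns a cursor; sort() is chained onto it before reading.
    cursor = collection.find().sort("Volume", direction)
    return list(cursor)
```

Passing ASCENDING instead of the default would return the documents with the smallest volume first.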
Query documents with AND condition:
As in most programming languages, an AND condition holds only if both conditions are true. The example below shows how it works in Pymongo. The query returns all the documents whose ‘Volume’ is greater than 18986725 and whose ‘Open’ is less than 280.64.
The image below shows how a query with an AND condition returns documents.
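In text form, such a filter might be built in either of the two ways sketched here. The function names are hypothetical; the field names and threshold values are the ones used in this section.

```python
def volume_and_open_filter(min_volume=18986725, max_open=280.64):
    """Implicit AND: conditions on different fields in one filter must all hold."""
    return {"Volume": {"$gt": min_volume}, "Open": {"$lt": max_open}}


def volume_and_open_filter_explicit(min_volume=18986725, max_open=280.64):
    """The same condition written with the explicit $and operator."""
    return {"$and": [{"Volume": {"$gt": min_volume}},
                     {"Open": {"$lt": max_open}}]}
```

Either dictionary can be passed to find(), for example list(collection.find(volume_and_open_filter())). The explicit $and form is mainly useful when the same field or operator has to appear more than once.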