GitHub repository
Please visit my GitHub profile for all repositories related to data science.
Visit:
https://gist.github.com/thetongs
Data analysis initial steps for every dataset
Every data analysis problem calls for a procedural approach to the dataset. Here I present the initial analysis steps that apply to any dataset, with an explanation of the insight each step provides.
Visit:
https://github.com/thetongs/Data-analysis-initial-steps-version-1/blob/main/analysis_steps.py
Data preprocessing initial steps
To solve any data science problem, we need to convert the provided dataset into the training and testing format that algorithms accept.
Data science engineers spend most of their time on data preprocessing, so it is essential to understand the important preprocessing steps.
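As a rough illustration of the training and testing split mentioned above, here is a minimal sketch using only the Python standard library. The function name split_rows, the 20% test ratio and the fixed seed are assumptions made for this example; they are not taken from the linked repository, which may well use a library such as scikit-learn instead.

```python
import random


def split_rows(rows, test_ratio=0.2, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration only; the ratio and the seed
    are assumed defaults, not values from the original project.
    """
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)                      # copy, so the input stays untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_rows(range(10))
    print(len(train), "training rows,", len(test), "testing rows")
```

Shuffling before splitting avoids a test set made only of the last rows of the file, which matters when the dataset is sorted by date or by label.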
Visit:
Create an audiobook from a PDF file
A simple Python-based project that takes a PDF file as input and generates an audio file from it as an audiobook.
Visit:
https://github.com/thetongs/Audio-Book-Creator-Using-PDF-and-Python
Image converter for multiple formats
Import any image and convert it into any of the following formats: jpg, jpeg, png, tif, tiff, bmp, gif and eps.
Visit:
Identify the language of a document using advanced NLP
An effective way to identify the language of a document with an advanced NLP approach and Python.
In an NLP-based project it is sometimes very important to identify the language of a file before applying any analysis or functions to it. So the big question is: how do we identify the language of different types of files?
Answer:
Almost every language has a set of common words that people use when communicating, whether in speech or in writing. So how about using those sets of common words to identify the language of different files?
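The common-word idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the COMMON_WORDS table is a small, hand-picked sample made up for this example (a real project would use a fuller stop-word list per language), and the function name identify_language is hypothetical, not taken from the original project.

```python
import re
from collections import Counter

# Assumption: tiny hand-picked samples of very common words per language.
COMMON_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it", "for", "with"},
    "french": {"le", "la", "les", "et", "est", "de", "un", "une", "que", "dans"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu", "mit"},
    "spanish": {"el", "la", "los", "y", "es", "de", "un", "una", "que", "en"},
}


def identify_language(text):
    """Return the language whose common words occur most often in text.

    Returns None when no common word from any language is found.
    """
    # Split into lowercase alphabetic tokens (accented letters included).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    # Score each language by how many tokens are among its common words.
    scores = {
        language: sum(counts[word] for word in words)
        for language, words in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(identify_language("The cat is in the garden and it is happy with the sun"))
```

Short texts, or languages that share many common words, can give ambiguous scores, so longer samples and larger word lists make the result more reliable.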