Data Science

From Sinfronteras
Revision as of 15:49, 22 February 2023 by Adelo Vieira




Projects portfolio



Data Analytics courses

Data Science courses


  • Posts


  • Python for Data Science and Machine Learning Bootcamp - Basic level
https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/
  • Machine Learning, Data Science and Deep Learning with Python - Basic level - Similar to the previous one
https://www.udemy.com/course/data-science-and-machine-learning-with-python-hands-on/
  • Data Science: Supervised Machine Learning in Python - Higher level
https://www.udemy.com/course/data-science-supervised-machine-learning-in-python/
  • Mathematical Foundation For Machine Learning and AI
https://www.udemy.com/course/mathematical-foundation-for-machine-learning-and-ai/
  • The Data Science Course 2019: Complete Data Science Bootcamp
https://www.udemy.com/course/the-data-science-course-complete-data-science-bootcamp/


  • Coursera - By Stanford University



  • Columbia University - Course fees: USD 1,400



Possible sources of data


Irish Government Data Portal https://data.gov.ie/
UK Government Data Portal https://data.gov.uk/
UK National Health Service Data https://digital.nhs.uk/data-and-information
EU Open Data Portal http://data.europa.eu/euodp/en/data/
US Government Data Portal https://www.data.gov/
Canadian Government Data Portal https://open.canada.ca/en/open-data
Indian Government Open Data https://data.gov.in/
World Bank https://data.worldbank.org/
International Monetary Fund https://www.imf.org/en/Data
World Health Organisation http://www.who.int/gho/en/
UNICEF https://data.unicef.org/
US Food and Drug Administration https://www.fda.gov/Drugs/InformationOnDrugs/ucm079750.htm
Google Public Data Explorer https://www.google.com/publicdata/directory
Human Rights Data Analysis Group https://hrdag.org/
Armed Conflict Data http://www.pcr.uu.se/research/UCDP/
Amazon Web Services Open Data Registry https://registry.opendata.aws/
Pew Research Datasets http://www.pewinternet.org/datasets/
CERN Open Data http://opendata.cern.ch/
Kaggle https://www.kaggle.com/
UCI Machine Learning Repository https://archive.ics.uci.edu/ml/index.php
Open Data Network https://www.opendatanetwork.com/
Linked Open Data - University of Münster https://www.uni-muenster.de/LODUM/
US National Climate Data https://www.ncdc.noaa.gov/data-access/quick-links#loc-clim
US Medicare Hospital Quality Data https://data.medicare.gov/data/hospital-compare
Yelp Data https://www.yelp.com/dataset/challenge
US Census Data https://www.census.gov/data.html
Broad Institute Cancer Program Data http://portals.broadinstitute.org/cgi-bin/cancer/datasets.cgi
National Centers for Environmental Information https://www.ncdc.noaa.gov/data-access
Centers for Disease Control and Prevention https://www.cdc.gov/datastatistics/
Open Data Monitor https://opendatamonitor.eu/
Plenario http://plenar.io/
British Film Institute http://www.bfi.org.uk/education-research/film-industry-statistics-research
Edinburgh University Datasets http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html
DataHub http://datahub.io



What is data

It is difficult to define such a broad concept, but the definition I like is that data is a collection (or any set) of characters or files, such as numbers, symbols, words, text files, images, audio files, etc., that represent measurements, observations, or just descriptions, and that are gathered and stored for some purpose. https://www.mathsisfun.com/data/data.html https://www.computerhope.com/jargon/d/data.htm



Qualitative vs quantitative data

https://learn.g2.com/qualitative-vs-quantitative-data



Qualitative data:
  • Descriptive and conceptual information (it describes something).
  • Subjective, interpretive, and exploratory.
  • Non-statistical.
  • Typically unstructured or semi-structured.
  • Examples: see the unstructured data examples below.

Quantitative data:
  • Numerical information (numbers).
  • Objective, to-the-point, and conclusive.
  • Statistical.
  • Typically structured.
  • Examples: see the structured data examples below.
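The following Python sketch illustrates the distinction above. It sorts the fields of one hypothetical record into quantitative (numeric) and qualitative (descriptive) values. As a simplification, it treats a value as quantitative when it is an int or a float. Real datasets need more judgment, because some numbers, such as ID codes or phone numbers, are labels rather than measurements. The record and the split_fields function are hypothetical.

```python
from typing import Any


def split_fields(record: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a record into (quantitative, qualitative) fields.

    Simplifying assumption: a value is quantitative if it is an int or a float
    (bool is excluded because it is a subclass of int in Python); everything
    else, such as free text, is treated as qualitative.
    """
    quantitative: dict[str, Any] = {}
    qualitative: dict[str, Any] = {}
    for name, value in record.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            quantitative[name] = value
        else:
            qualitative[name] = value
    return quantitative, qualitative


# Hypothetical customer review record mixing both kinds of data.
review = {
    "quantity": 2,
    "price_paid": 19.99,
    "comment": "Arrived late, but the product works well.",
    "mood": "satisfied",
}
numbers, descriptions = split_fields(review)
print(numbers)       # {'quantity': 2, 'price_paid': 19.99}
print(descriptions)  # {'comment': 'Arrived late, but the product works well.', 'mood': 'satisfied'}
```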



Discrete and continuous data

https://www.youtube.com/watch?v=cz4nPSA9rlc


Quantitative data can be discrete or continuous.

  • Continuous data can take any value within an interval.
  • We usually say that continuous data is measured.
  • Examples:
    • Measurements of temperature (in ºF): temperature can be any value within an interval, and it is measured (not counted).


  • Discrete data can only take specific values.
  • We usually say that discrete data is counted.
  • Discrete data is usually (but not always) whole numbers.
  • Examples:
    • The possible values of a die roll: a standard six-sided die can only show 1, 2, 3, 4, 5, or 6.
    • Shoe sizes: they are not always whole numbers (there are half sizes), but they cannot be just any number.
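The contrast between counting and measuring can be sketched in Python. A die roll (discrete) always lands on one of a few specific values, while a simulated thermometer reading (continuous) can be any value in an interval. The 50 to 90 ºF range, the function names, and the use of random simulation are assumptions made for illustration only. Note also that a computer stores a temperature as a finite-precision float, so "any value" is only approximated.

```python
import random

# Discrete: a standard six-sided die can only show these specific values.
DIE_FACES = {1, 2, 3, 4, 5, 6}

# Continuous: a hypothetical temperature range in ºF (assumption for this sketch).
TEMP_LOW_F, TEMP_HIGH_F = 50.0, 90.0


def roll_die(rng: random.Random) -> int:
    """Simulate one die roll; randint includes both endpoints, so 1..6."""
    return rng.randint(1, 6)


def read_thermometer(rng: random.Random) -> float:
    """Simulate a temperature reading: any float in [TEMP_LOW_F, TEMP_HIGH_F]."""
    return rng.uniform(TEMP_LOW_F, TEMP_HIGH_F)


def is_possible_die_value(x: float) -> bool:
    """Discrete check: the value must be one of a fixed set of values."""
    return x in DIE_FACES


def is_possible_temperature(x: float) -> bool:
    """Continuous check: the value only has to lie inside the interval."""
    return TEMP_LOW_F <= x <= TEMP_HIGH_F


rng = random.Random(0)  # fixed seed so the simulation is reproducible
rolls = [roll_die(rng) for _ in range(1000)]
readings = [read_thermometer(rng) for _ in range(1000)]

print(sorted(set(rolls)))              # [1, 2, 3, 4, 5, 6]: only six distinct values
print(is_possible_die_value(3.5))      # False: 3.5 is not a face of the die
print(is_possible_temperature(72.35))  # True: any value in the interval is allowed
```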




Structured vs Unstructured data

https://learn.g2.com/structured-vs-unstructured-data

http://troindia.in/journal/ijcesr/vol3iss3/36-40.pdf


  • Structured data is organized within fixed fields or columns, usually in relational databases (or spreadsheets), so it can be easily queried with SQL.
https://learn.g2.com/structured-vs-unstructured-data
https://www.talend.com/resources/structured-vs-unstructured-data
  • Unstructured data is data that doesn't fit easily into a spreadsheet or a relational database.
  • Semi-structured data: the line between semi-structured and unstructured data has always been unclear. Semi-structured data usually refers to information that is not structured as in a traditional database but contains some organizational properties that make it easier to process.

  • Examples of structured data include:
    • Quantitative data:
      • Weather forecast data: measurements of temperature, precipitation (in millimeters, mm), atmospheric pressure, wind speed, and cloud coverage.
      • Seismic data: measurements of ground movement caused by seismic activity.
      • Housing data: gathered housing data made up of, for example, the price, area of the house, number of rooms, house age, area population, and average income of the city's residents.
      • Numeric financial information and market reports.
    • Another good example of structured data is a company's database, where the company stores the data usually associated with its ERP (enterprise resource planning: a suite of integrated applications that an organization can use to collect, store, manage, and interpret data from many business activities), such as:
      • Human resource data: for example, an «Employees» table with the columns id, fname, lname, dob, email, phone_number, address.
      • Customer data (customer relationship management, CRM): a «Client» table.
      • Projects data.
      • Accounting data.
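Here is a small sketch of why structured data "can be easily queried with SQL". It uses Python's built-in sqlite3 module and an in-memory database. The column names follow the «Employees» example above; the rows and the query are hypothetical. Storing dob as ISO-8601 text (YYYY-MM-DD) is an assumption that lets plain string comparison order the dates correctly.

```python
import sqlite3

# In-memory database; the «Employees» columns follow the example above,
# the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employees (
        id INTEGER PRIMARY KEY,
        fname TEXT NOT NULL,
        lname TEXT NOT NULL,
        dob TEXT,            -- date of birth as ISO-8601 text (YYYY-MM-DD)
        email TEXT,
        phone_number TEXT,
        address TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO Employees (fname, lname, dob, email, phone_number, address) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Ana", "Lopez", "1990-04-12", "ana@example.com", "555-0101", "1 Main St"),
        ("Ben", "Murphy", "1985-11-30", "ben@example.com", "555-0102", "2 High St"),
        ("Chloe", "Kelly", "1995-07-08", "chloe@example.com", "555-0103", "3 Bridge St"),
    ],
)

# Because every row has the same fixed fields, one SQL query can filter and
# sort them: employees born before 1992, oldest first.
rows = conn.execute(
    "SELECT fname, lname FROM Employees WHERE dob < '1992-01-01' ORDER BY dob"
).fetchall()
print(rows)  # [('Ben', 'Murphy'), ('Ana', 'Lopez')]
conn.close()
```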
  • Examples of unstructured data include:
    • Text files: Word documents, PowerPoint presentations, emails, chat logs, text messages, customer reviews, news articles, etc.
    • Email: there is some internal metadata structure, so email is sometimes called semi-structured, but the message body is unstructured and difficult to analyze with traditional tools.
    • Media files (images, audio, and video files): satellite images, surveillance images and videos, call recordings (call logs), music audio and video files, locations, etc.
  • Some sources of unstructured data are:
    • Social media data: data from social networking sites such as Facebook, Twitter, and LinkedIn.
    • Mobile data: text messages.
    • Call center data.
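The email example above can be made concrete with Python's standard email package. The header fields behave like structured metadata, while the body is free text that needs extra processing. The message below is made up. The keyword search is only a crude stand-in for real text analysis (NLP).

```python
from email import message_from_string

# A hypothetical raw email. The header lines are the "internal metadata
# structure"; the body after the blank line is free text.
RAW_EMAIL = """\
From: customer@example.com
To: support@example.com
Subject: Order 1234 arrived damaged
Date: Mon, 06 Feb 2023 10:15:00 +0000

Hi, the box was crushed and one of the cups is broken.
Can you send a replacement? Thanks.
"""

msg = message_from_string(RAW_EMAIL)

# Headers behave like named fields, so they can be read directly.
metadata = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
print(metadata["Subject"])  # Order 1234 arrived damaged

# The body is just a string with no fields, so answering "what went wrong?"
# needs text processing (a naive keyword search here, NLP in practice).
body = msg.get_payload()
mentions_damage = any(word in body.lower() for word in ("broken", "crushed", "damaged"))
print(mentions_damage)  # True
```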

For example, NoSQL documents are considered semi-structured data because they contain keys (field names) that make the documents easier to process. https://www.youtube.com/watch?v=dK4aGzeBPkk
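The following sketch shows why keys help. The documents below are JSON, a format commonly used by document-oriented NoSQL databases. They are hypothetical. Each one is labelled with keys, but they do not share a fixed schema, which is what separates them from rows in a relational table.

```python
import json

# Hypothetical documents as a document (NoSQL) store might hold them.
# Each is labelled with keys, but there is no fixed schema: the second has
# no "phone", and the third has a nested "address" instead of an "email".
DOCS = """[
  {"name": "Ana", "email": "ana@example.com", "phone": "555-0101"},
  {"name": "Ben", "email": "ben@example.com"},
  {"name": "Chloe", "address": {"city": "Dublin", "country": "Ireland"}}
]"""

documents = json.loads(DOCS)

# The keys make processing easier than free text: we can select on them
# directly, using .get() to tolerate fields that a document does not have.
with_email = [d["name"] for d in documents if d.get("email")]
cities = [d.get("address", {}).get("city") for d in documents]

print(with_email)  # ['Ana', 'Ben']
print(cities)      # [None, None, 'Dublin']
```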


It is important to highlight that the huge increase in data over the last 10 years has been driven by the growth of unstructured data. Some estimates put the total amount of data at around 300 exabytes, of which around 80% is unstructured.

The prefix exa indicates multiplication by the sixth power of 1000 (1000^6 = 10^18), so one exabyte is 10^18 bytes.
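Here is the unit arithmetic behind these estimates as a quick Python check. The 300 EB total and the 80% share are the estimates quoted above, not measured values. The variable names are illustrative. Integer arithmetic is used so that the results are exact.

```python
# Decimal (SI) prefix: exa = 1000**6 = 10**18.
EXA = 1000 ** 6
# Not to be confused with the binary exbibyte (EiB): 1024**6 = 2**60 bytes.
EXBIBYTE = 1024 ** 6

total_bytes = 300 * EXA        # the ~300 exabytes estimate quoted above
unstructured_percent = 80      # the ~80% estimate quoted above
unstructured_bytes = total_bytes * unstructured_percent // 100

print(EXA == 10 ** 18)             # True
print(total_bytes)                 # 300000000000000000000
print(unstructured_bytes // EXA)   # 240, i.e. about 240 EB of unstructured data
```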


Some sources also suggest that the amount of data is doubling every 2 years.
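To make the doubling claim concrete, take the "doubling every 2 years" estimate at face value. If D_0 is the current amount of data and T_d = 2 years is the doubling time, the amount after t years follows a simple exponential model. The symbols D_0, D(t), and T_d are introduced here only for illustration.

```latex
D(t) = D_0 \cdot 2^{t/T_d}, \qquad T_d = 2\ \text{years}

D(10) = D_0 \cdot 2^{10/2} = 32\,D_0

\text{annual growth factor} = 2^{1/T_d} = \sqrt{2} \approx 1.414 \quad (\approx 41\%\ \text{per year})
```

In other words, a two-year doubling time corresponds to roughly 41% growth per year and a 32-fold increase over a decade.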






Data Levels and Measurement

Levels of M