Week 6 & Week 7 - Scraping & Structured Extraction
A hands-on recap of two weeks focused on accessing, parsing, and exporting data through modular scraping workflows.

Hey, I'm Ramya. I write to learn, and I learn by building. This space is my digital notebook, where curiosity meets clarity and every post reflects a milestone in my journey. I'm a final-year B.Tech student in Artificial Intelligence & Data Science at GMR Institute of Technology. I recently completed an internship at Tao Digital, where I worked on AWS cloud services and contributed to a Smart Fridge Annotation Project using YOLOv11. Learning Out Loud is my blog, a place where I document what I learn, build, and reflect on. It's organized into evolving series like the Foundation Phase Series: week-by-week insights from my early cloud and data engineering journey. I believe in thoughtful growth, clean documentation, and expressive storytelling. Whether it's building ETL pipelines, annotating datasets, or writing about yoga and balance, I'm here to share what matters.
Introduction
After completing my Foundation Phase, I've stepped into the Core Workflow Phase, where the focus shifts from setup to flow. These two weeks were all about understanding how data moves, how tools connect, and how logic becomes pipelines. I transitioned from raw access to structured extraction, building scraping systems that are both ethical and scalable.
Week 6: From Curiosity to Craft
This week laid the groundwork for scraping, starting with the "why" and moving into the "how."
Scraping Ethics & Strategy
I explored the ethics of scraping and how to design flows that respect site structure and access policies.
Topics Explored:
Static vs dynamic sites
Choosing between requests, Scrapy, and Selenium
Mimicking human behavior with headers, delays, retries
Scraping as negotiation: availability vs permission
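The points about permission, delays, and retries can be sketched with the standard library alone. This is a minimal illustration, not the code from my repo: the user agent string, the robots.txt rules, and the helper names (`is_allowed`, `backoff_delay`, `with_retries`) are all assumptions made up for the example.

```python
import random
import time
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration only.
USER_AGENT = "learning-out-loud-bot/0.1"
HEADERS = {"User-Agent": USER_AGENT}
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with jitter: up to base * 2**attempt, capped."""
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)


def with_retries(action, max_attempts=3, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out."""
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
```

In a real run, `robots_txt` would be the text fetched from the site's own `/robots.txt`, and `action` would be the function that performs the request with `HEADERS` attached.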
Python Requests: Lightweight & Powerful
Requests became my gateway to APIs and HTML pages. I focused on modularizing logic for reuse.
Topics Practiced:
GET/POST requests with headers and query parameters
Pagination, timeouts, session persistence
Status codes and response handling
Reusable request functions across scripts
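A reusable request helper along these lines might look like the sketch below. It assumes the `requests` library is installed; the function names, the default user agent, and the `?page=N` pagination scheme are hypothetical choices for the example, since real sites paginate in different ways.

```python
import requests

# Hypothetical default header for illustration.
DEFAULT_HEADERS = {"User-Agent": "learning-out-loud-bot/0.1"}


def make_session(headers=None):
    """Create a Session so headers and cookies persist across calls."""
    session = requests.Session()
    session.headers.update(headers or DEFAULT_HEADERS)
    return session


def fetch(session, url, params=None, timeout=10):
    """GET a URL and return the response, raising on 4xx/5xx status codes."""
    response = session.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response


def fetch_pages(session, url, max_pages=3, page_param="page"):
    """Yield one response per page, assuming ?page=N style pagination."""
    for page in range(1, max_pages + 1):
        yield fetch(session, url, params={page_param: page})
```

Passing the session in as an argument keeps the helpers easy to reuse across scripts and easy to test with a stand-in object instead of a live site.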
What Shifted
I started thinking in flows: requests → raw HTML → (next: parsing) → cleaned data → reusable modules
Week 7: From HTML to Structured Data
This week was all about parsing, spidering, and exporting: turning raw HTML into usable datasets.
BeautifulSoup: Parsing with Precision
I learned to navigate HTML like a map, identifying tags, classes, and IDs to extract structured data.
Topics Practiced:
.find(), .find_all(), .attrs
.text, .strip(), regex cleanup
Handling missing elements and nested tags
Combining with requests for full flow
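Here is a small sketch of how those pieces fit together, assuming `beautifulsoup4` is installed. The HTML snippet, its class names, and the `parse_movies` helper are invented for illustration; in the full flow, the `html` argument would be `response.text` from a `requests` call.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched page.
SAMPLE_HTML = """
<div class="movie">
  <a class="title" href="/m/1"> Inception </a>
  <span class="year">(2010)</span>
</div>
<div class="movie">
  <a class="title" href="/m/2">Arrival</a>
</div>
"""


def parse_movies(html):
    """Extract title, URL, and year from each movie card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="movie"):
        link = card.find("a", class_="title")
        if link is None:
            continue  # skip cards without a title link
        year_tag = card.find("span", class_="year")  # may be missing
        year_match = re.search(r"\d{4}", year_tag.text) if year_tag else None
        rows.append({
            "title": link.text.strip(),
            "url": link.attrs.get("href"),
            "year": int(year_match.group()) if year_match else None,
        })
    return rows
```

The second card has no year element, so its `year` comes back as `None` instead of raising an error, which is the kind of missing-element handling listed above.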
Scrapy: Scalable Crawling
Scrapy introduced a new mindset: modular, maintainable scraping with spider logic and pagination.
Topics Explored:
scrapy startproject, spider scaffolding
response.css() and response.xpath()
Pagination with response.follow()
Exporting to CSV/JSON
Navigating GitHub documentation and source code
CSS Selectors: Targeting with Intent
I refined my selector logic to extract movie names, URLs, and nested elements with precision.
Topics Practiced:
Combined selectors for clean output
Selector testing in Scrapy Shell
Adapting logic across different site layouts
Daily Reflections (Day 24 to Day 31)
You can view my documented progress here:
GitHub: Week6 Reflection
GitHub: Week7 Reflection



