A rubyist at the EuroPython conference. Day 2.

The EuroPython 2014 conference took place last week at the BCC Berlin (Alexanderstraße 11). As I attended Brighton Ruby on Monday, you will have to do without a summary of that day. Below are summaries of my favorite talks.

Tuesday July 22

For Hynek Schlawack’s (@hynek) talk The Sorry State of SSL, I would rather refer you to his blog post on the topic. If SSL (or TLS) interests you, it is a go-to resource.

Web Scraping in Python 101

Muhammad Yasoob Ullah Khalid (@yasoobkhalid) is (amongst other things) the creator of freepythontips, which is cool. And he’s very much into web scraping: a method to extract data from a website that does not have an API, and a helpful tool when we want to extract a LOT of data (like product info or job postings) that an API would not let us fetch because of rate limiting.

Yasoob focused on the parsing bit of the scraping workflow and on libraries like BeautifulSoup, lxml, re (regex) and Scrapy (which is actually a full-blown framework).

BeautifulSoup has a “beautiful API”, can handle broken markup and is written in pure Python, but it is rather slow.

from bs4 import BeautifulSoup
tree = BeautifulSoup(html_doc, 'html.parser')  # name the parser explicitly
tree.title  # the <title> tag

lxml is fast and works with all Python versions, but it is not pure Python: it binds to C libraries.

import lxml.html
tree = lxml.html.fromstring(html_doc)
title = tree.xpath('//title/text()')  # a list of matching strings

re, the regex lib, is part of the standard library and amazingly fast.

import re
title = re.findall('<title>(.*?)</title>', html_doc)
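
To make the regex approach a bit more concrete, here is a minimal sketch using only the standard library. The sample html_doc string and the extract_title helper are made up for illustration; the flags guard against upper-case tags and titles that span several lines. Regexes stay brittle against real-world markup, so treat this as a quick hack rather than a parser.

```python
import re

# Hypothetical sample document, just for illustration.
html_doc = """<html>
<head><TITLE>
  EuroPython 2014
</TITLE></head>
<body><p>Hello Berlin</p></body>
</html>"""

# Compile once; IGNORECASE matches <TITLE>, DOTALL lets '.' cross newlines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the stripped text of the first <title> tag, or None."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


print(extract_title(html_doc))
```

Returning None for a page without a title keeps the caller from having to index into an empty list, which is what re.findall would hand back.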

Scrapy is fast, asynchronous, fully tested and pure Python: a full-blown, broad-scale web scraping framework. It’s very suitable if you want to scrape millions of web pages every day. BUT at the time of the talk it only supports Python up to 2.7. The workflow in Scrapy is as follows: you define a scraper, define the items you are going to extract and the item pipeline, and then run the scraper (or: spider). A project can have multiple spiders. A spider is a class written by the user to scrape data from a website. Writing a spider is as easy as creating a subclass of scrapy.Spider, defining your start_urls list and then defining the parse method in your spider.
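
The spider recipe above can be sketched in plain Python. This is a stand-in that only mimics the shape of a spider (a class with start_urls and a parse method); it does not use Scrapy itself, and FakeResponse, TitleSpider and crawl are hypothetical names. With Scrapy installed you would subclass scrapy.Spider instead and let the framework do the downloading and call parse for you.

```python
import re


class FakeResponse:
    """Hypothetical stand-in for a downloaded page (URL plus HTML text)."""

    def __init__(self, url, text):
        self.url = url
        self.text = text


class TitleSpider:
    """Mimics the shape of a spider: a name, start URLs and a parse method."""

    name = "titles"
    start_urls = ["http://example.com/a", "http://example.com/b"]

    def parse(self, response):
        # Yield one item (a plain dict) per page.
        match = re.search(r"<title>(.*?)</title>", response.text, re.S | re.I)
        yield {
            "url": response.url,
            "title": match.group(1).strip() if match else None,
        }


def crawl(spider, pages):
    """Toy 'engine': feed canned pages to the spider instead of fetching them."""
    items = []
    for url in spider.start_urls:
        response = FakeResponse(url, pages.get(url, ""))
        items.extend(spider.parse(response))
    return items


# Canned HTML so the sketch runs without network access.
pages = {
    "http://example.com/a": "<html><title>Page A</title></html>",
    "http://example.com/b": "<html><title>Page B</title></html>",
}

items = crawl(TitleSpider(), pages)
for item in items:
    print(item)
```

The point of the shape is the split of responsibilities: the spider only says where to start and how to turn a response into items, while fetching and scheduling belong to the engine.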

Marconi – OpenStack Queuing and Notification Service

Marconi is a multi-tenant cloud queuing system written in Python as part of the OpenStack project. Yeela Kaplan works at Red Hat and has made considerable contributions to Marconi. Development on Marconi started in January 2013, led by Rackspace and Red Hat. Marconi was incubated in OpenStack during the Icehouse release and is thus production ready.

Marconi is an alternative message broker in case your current one is not secure enough. OpenStack was missing a queuing service and Notification as a Service (publishing messages to other services). Marconi is a lightweight messaging API, Yeela evangelizes, not a replacement for existing technologies. Nor is it a task manager or a queue provisioning service. Marconi aims to work on top of those technologies and to unify them. As an OpenStack alternative to Amazon’s SQS and SNS, Marconi can be used to deploy an SQS-style service, for Horizon notifications or for communication between guest agents.

Its composable architecture is definitely a selling point. Plus: Marconi is Open Source, has a unified API, guarantees FIFO ordering, comes with storage pools, is easy to scale and targets OpenStack services. tl;dr it fits in your stack. Queue flavors, full Redis support, AMQP support and live migrations are on the roadmap.

How to make a full fledged REST API with Django OAuth Toolkit

Federico Frenguelli (@synasius) marvels at how easy it used to be to support one tool, one single project, deployed once. Yet the times they are a-changin’. Frontend applications have multiple devices to support, which makes for a lot of projects (backend, web, Android, iOS, desktop, …). And then 3rd party services want to connect with your application.

Using Django, the Django REST Framework and the Django OAuth Toolkit, Federico Frenguelli talked us through serializing and the benefits of DRF.

Talking us through the OAuth2 Authorization Framework, Federico hooked up Songify to his time-tracker example app. Here the resource owner is the user, the resource server is the Timetracking API, the authorization server is again the Timetracking API and the client is the Songify app. The authorization code flow goes like this. The client registers with the authorization server, which provides a client id and a client secret. The client directs the resource owner to the authorization server via its user-agent. The authorization server authenticates the resource owner and obtains authorization. The authorization server directs the resource owner back to the client with the authorization code. The client exchanges the authorization code for a token. The token is used by the client to authenticate requests.
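
The steps of the authorization code flow can be sketched as a toy, in-memory simulation. This is not Django OAuth Toolkit and involves no HTTP or redirects; the AuthorizationServer class, its method names and the user "alice" are all hypothetical, chosen only to mirror the steps listed above.

```python
import secrets


class AuthorizationServer:
    """Toy in-memory authorization server (not Django OAuth Toolkit)."""

    def __init__(self):
        self.clients = {}   # client_id -> client_secret
        self.codes = {}     # authorization code -> (client_id, user)
        self.tokens = {}    # access token -> user

    def register_client(self):
        # The client registers and receives an id and a secret.
        client_id = secrets.token_hex(8)
        client_secret = secrets.token_hex(16)
        self.clients[client_id] = client_secret
        return client_id, client_secret

    def authorize(self, client_id, user, approved):
        # The resource owner is authenticated, grants access, and is sent
        # back to the client with an authorization code.
        if client_id not in self.clients or not approved:
            raise PermissionError("authorization denied")
        code = secrets.token_urlsafe(16)
        self.codes[code] = (client_id, user)
        return code

    def exchange(self, client_id, client_secret, code):
        # The client swaps the one-time code for an access token.
        if self.clients.get(client_id) != client_secret:
            raise PermissionError("bad client credentials")
        issued_to, user = self.codes.pop(code, (None, None))
        if issued_to != client_id:
            raise PermissionError("invalid authorization code")
        token = secrets.token_urlsafe(24)
        self.tokens[token] = user
        return token

    def user_for(self, token):
        # The resource server checks the token on each request.
        return self.tokens.get(token)


server = AuthorizationServer()
client_id, client_secret = server.register_client()
code = server.authorize(client_id, user="alice", approved=True)
token = server.exchange(client_id, client_secret, code)
print(server.user_for(token))
```

Note that the code is popped on exchange, so replaying the same authorization code fails; the client secret never passes through the resource owner's user-agent, which is the main reason the flow has the extra exchange step.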

Out of breath? So was I. Thankfully, all the code is up on GitHub.

PyPy status talk

“A lot has happened in the last two years, since the last EuroPython PyPy talk.” Romain Guillebert (@rguillebert) has been a PyPy contributor since 2011. PyPy is a fast, compliant alternative implementation of the Python language (2.7.6 and 3.2.5), built on top of the RPython toolchain. Over the past two years PyPy moved from version 2.0 to 2.3 (May 2014), adding JSON and a C API for embedding (amongst a lot of other stuff).

Romain shares how the maintainers managed to do a lot of stuff without a lot of money, although they are currently putting more calls for donations out there. Plus, PyPy offers commercial support, which comes at a price.

Lots of CFFI modules around lxml, pygame_cffi and psycopg2cffi were added, and PyPy is 6.5x faster than CPython. PyPy has offered ARM support since 2.1. NumPy support is (still) in progress, as is SciPy support. py3k support has been stable since py3k 3.2. STM (Software Transactional Memory) support, solving the GIL problem, is on the roadmap.
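
Speed claims like the one above are easy to probe for yourself. A minimal sketch: a toy micro-benchmark (not the benchmark suite behind the 6.5x figure) that you can run unchanged under different interpreters and compare the timings. The function names and the loop size are arbitrary choices of mine.

```python
import platform
import timeit


def sum_of_squares(n):
    """Pure-Python hot loop, the kind of code a JIT tends to speed up."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def benchmark(n=200_000, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    return min(timeit.repeat(lambda: sum_of_squares(n), number=1, repeat=repeat))


best = benchmark()
print(f"{platform.python_implementation()} {platform.python_version()}: {best:.4f}s")
```

A single tight loop says little about real applications, so read the result as a rough indication only.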

Want to catch up with the developers for a chat? They are ‘basically always’ on IRC: #pypy@freenode.net
