Proliphix Web Scraper Help

Posted: Sun Sep 20, 2009 4:29 pm
by HFTobeason
I have two Proliphix thermostats that are in a remote location. Both report to the Proliphix Remote Management web site. I am trying to figure out how to scrape the data off that web site into Indigo variables. The issue I'm faced with is figuring out how to log in to the Proliphix site. The page, "https://access.proliphix.com/Login.php" requires a username/password response. I cannot figure out how to get curl to work in this case. Any help/ideas much appreciated. Thank you.

Re: Proliphix Web Scraper Help

Posted: Sun Sep 20, 2009 5:56 pm
by jay (support)
HFTobeason wrote:
I have two Proliphix thermostats that are in a remote location. Both report to the Proliphix Remote Management web site. I am trying to figure out how to scrape the data off that web site into Indigo variables. The issue I'm faced with is figuring out how to log in to the Proliphix site. The page, "https://access.proliphix.com/Login.php" requires a username/password response. I cannot figure out how to get curl to work in this case. Any help/ideas much appreciated. Thank you.


I'm not positive this will work, but many apps that accept a URL with authentication take it in this format:

https://username:password@hostname.com/whatever

Posted: Sun Sep 20, 2009 9:48 pm
by berkinet
Check the curl man page.

Posted: Mon Sep 21, 2009 8:26 am
by HFTobeason
Thank you both for your replies. I have tried the form
curl -k -u UserName:PassWord https://access.proliphix.com/Login.php
with no success - curl returns the HTML of the login page, and does not seem to submit the UserName:PassWord credentials to get to the next page. From the reading I've done, it seems as if I may have multiple layers of problems here - even if I figure out how to get past the login page, the Proliphix site may require cookies, and the page I ultimately want to scrape is presented in a frame. Any further guidance most appreciated.
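
One likely reason the command above returns the login page: curl's -u option sends HTTP authentication credentials, whereas a page like Login.php usually expects the username and password as fields of an HTML form. A reasonable first step is to look at the form and frame tags in the page source. The sketch below is only an illustration: the helper name list_tags is made up here, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/sh
# list_tags (hypothetical helper): read HTML on stdin and print every tag
# whose name starts with form, input, frame, or iframe, one per line.
list_tags() {
    # Join lines first so a tag split across several lines still matches.
    tr '\n' ' ' | grep -oiE '<(form|input|frame|iframe)[^>]*>'
}

# Example (needs network access, so it is left commented out):
# curl -s https://access.proliphix.com/Login.php | list_tags
```

In the output, the name= attributes of the input tags are the field names to post, the form's action= is the URL to post them to, and a frame's src= is the page that actually holds the data.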

Posted: Mon Sep 21, 2009 11:03 am
by berkinet
HFTobeason wrote:
...From the reading I've done, it seems as if I may have multiple layers of problems here - even if I figure out how to get past the login page, the Proliphix site may require cookies, and the page I ultimately want to scrape is presented in a frame. Any further guidance most appreciated.

Sorry, I didn't realize the application was that complex. I believe curl can do what you want, but you probably need someone who has a lot more experience with curl than you are likely to find here (on the Indigo forums). I suggest you try the curl mailing lists. Note there are also commercial providers of curl support. They are listed under support on the curl web site.

Please do share your results.

Posted: Mon Sep 21, 2009 11:49 am
by seanadams
This guy has a program for polling and graphing the data from a Proliphix: http://www.anders.com/projects/thermostat-graph/

I just had a brief look at it - he appears to be using a URL which polls the values by OID (in a direct way - not screen-scraping).

You should be able to modify his script, strip out all the graphing stuff, and just have it dump the values you're interested in to STDOUT. Or maybe use it to help you figure out how to craft the right query with curl.

Posted: Mon Sep 21, 2009 11:59 am
by seanadams
check this out - includes curl examples: http://rtc.rubyforge.org/svn/trunk/docs ... _R1_11.pdf

Posted: Mon Sep 21, 2009 12:01 pm
by berkinet
seanadams wrote:
This guy has a program for polling and graphing the data from a Proliphix: http://www.anders.com/projects/thermostat-graph/

I just had a brief look at it - he appears to be using a URL which polls the values by OID (in a direct way - not screen-scraping)...

I think HFTobeason's problem is a little different. He is not on the same network as his thermostats and needs to get the data via a web reflector (Proliphix Remote Management web site), thus the need for curl to log into the reflector web site first.

If he could punch a hole in his firewall then he could probably read the data directly, as you suggest. But, I assume he has discarded that idea for one reason or another.

Posted: Mon Sep 21, 2009 12:08 pm
by seanadams
berkinet wrote:
I think HFTobeason's problem is a little different. He is not on the same network as his thermostats and needs to get the data via a web reflector (Proliphix Remote Management web site), thus the need for curl to log into the reflector web site first.


Ah, ok. Well in that case it IS possible to handle cookie-based authentication using curl, with two successive commands. http://www.google.com/search?q=cookie+login+curl
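
The two successive commands could look roughly like the sketch below: the first posts the login form and saves whatever cookies the site sets, the second sends those cookies back when asking for a page. Only the Login.php URL comes from this thread. The form field names (username, password), the function names, the cookie file path, and the overridable CURL variable are all assumptions made for illustration; the real field names have to be read from the login page's HTML.

```shell
#!/bin/sh
# Sketch of a two-step cookie login with curl.
# ASSUMPTIONS: the field names "username" and "password" are placeholders,
# not taken from the Proliphix site; check the login form's HTML source.

CURL="${CURL:-curl}"                                 # overridable for a dry run
LOGIN_URL="https://access.proliphix.com/Login.php"   # URL given in the thread
COOKIE_JAR="${COOKIE_JAR:-/tmp/proliphix_cookies.txt}"

# Step 1: POST the login form; -c writes any cookies received to the jar.
# Usage: proliphix_login USER PASSWORD
proliphix_login() {
    "$CURL" -s -c "$COOKIE_JAR" \
        --data-urlencode "username=$1" \
        --data-urlencode "password=$2" \
        "$LOGIN_URL"
}

# Step 2: GET a page; -b sends the cookies saved in step 1.
# Usage: proliphix_fetch URL
proliphix_fetch() {
    "$CURL" -s -b "$COOKIE_JAR" "$1"
}
```

After sourcing the file, something like proliphix_login "MyUser" "MyPass" > /dev/null followed by proliphix_fetch with the URL of the frame that holds the thermostat data would write that page's HTML to standard output, ready to be parsed into Indigo variables. Whether the site needs anything beyond cookies (hidden form fields, redirects) is not known from this thread.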

Why not use the API

Posted: Wed Nov 04, 2009 2:33 pm
by lombrano
Why not get the values directly from the thermostat? I know they've got a web interface and an HTTP API which you can call directly. The API is not public, but you can ask support to send the details to you. They're free for personal use.
Antoio

Posted: Thu Nov 05, 2009 3:38 pm
by sboutros
HFTobeason, did you ever find a way to get your data?

Sam

Posted: Thu Nov 05, 2009 8:30 pm
by HFTobeason
No, I've totally failed to figure this out.

For the record, I can't get directly to the thermostats because they're behind a modem/router which DOES NOT have a static IP, and there is no computer at that location to run a DynDNS-type client, nor does the modem/router have built-in DDNS capability.

I'm at a loss.

Posted: Thu Nov 05, 2009 9:39 pm
by berkinet
Don't give up hope just yet... You might try taking a look at iMacros.
The iOpus web site wrote:
iMacros lets you record and replay repetitious work. iMacros programmatically interacts with any and all websites. It fills out forms and automates the download and upload of text, images, files and web pages. You can import or export data to/from using CSV & XML files, databases or any other source to and from web applications.

They claim to have a Mac version in beta (scroll to the bottom of the page). They have a free Firefox Add-In. I gave it a quick try and it did learn to log in to the Proliphix remote access web site. I didn't try, but I'd guess you could create a macro to log in, select your thermostat and save the contents of the web page (or better, the source) to parse later.

If you used the Firefox Add-In you would probably need to figure out how to call the macro from an AppleScript - it could then continue to parse the saved HTML data after the macro finished.

Posted: Thu Nov 05, 2009 10:00 pm
by berkinet
BTW. If you don't know already, there is a Yahoo Group for Proliphix owners.

Since I also have an interest in this issue, I posted your query there. I'll let you know if there is any response.

Posted: Thu Nov 05, 2009 10:46 pm
by matt (support)
I use the iMacros Firefox plug-in. It works pretty well. Every month I have it download various PDF statements from different Web sites. It won't be an issue with the Proliphix, but the only problem I have is Website pages changing and breaking my scripts (no fault of iMacros).