#duraspace IRC Log


IRC Log for 2012-03-21

Timestamps are in GMT/BST.

[1:25] * bradmc (~bradmc@ has joined #duraspace
[11:18] * bradmc (~bradmc@ has joined #duraspace
[11:33] * bradmc (~bradmc@ Quit (Ping timeout: 260 seconds)
[11:57] * mhwood (mwood@mhw.ulib.iupui.edu) has joined #duraspace
[14:47] * barmintor (~ba2213@dyn-butler-158-112.dyn.columbia.edu) has joined #duraspace
[19:11] * mdiggory (~mdiggory@rrcs-74-87-47-114.west.biz.rr.com) has joined #duraspace
[19:34] * ryscher (98033b9f@gateway/web/freenode/ip. has joined #duraspace
[19:37] * kompewter (~kompewter@sul272sandbox.lib.ohio-state.edu) has joined #duraspace
[19:42] * KevinVdV (~KevinVdV@d54C154B1.access.telenet.be) has joined #duraspace
[19:53] * hpottinger (~hpottinge@mu-162198.dhcp.missouri.edu) has joined #duraspace
[20:00] * richardrodgers (~richardro@ has joined #duraspace
[20:01] * aschweer (~schweer@schweer.its.waikato.ac.nz) has joined #duraspace
[20:02] <PeterDietz> Hi All, Tim is out sick today, so I'll jump in to cover the meeting today.
[20:03] <PeterDietz> Feel free to toss in topics. I've got some things I'd like to cover anyways
[20:03] <PeterDietz> I'm semi-tempted to say we should have the meeting on one of the voice/video systems, but IRC is fine today.
[20:04] <richardrodgers> which voice-video systems?
[20:04] <PeterDietz> Google+ or Skype
[20:04] <richardrodgers> ok
[20:04] <PeterDietz> if there's interest, we could
[20:06] <mdiggory> I'm not in a position to go audio today, but in the future I would welcome it
[20:06] <richardrodgers> fine with me - but we may want to announce in advance so folks can line up a good audio environment...
[20:06] <PeterDietz> ok, so GitHub migration. We've been digging through the svn authors, to make for a clean transfer to github
[20:06] <KevinVdV> Ditto to mdiggory on the audio part; can't do it today, I'm afraid
[20:06] <aschweer> same here about the audio
[20:06] <hpottinger> I do have my headset all ready, as per Graham's suggestion that it's "only as disruptive as a phone call to coworkers"
[20:07] <PeterDietz> (I've officially switched my feelings on what to do with the github/dspace/dspace repository)
[20:07] <PeterDietz> I think it's in our best interest to make a proper author mapping
[20:07] <PeterDietz> So, Mark and Tim have been trying to find any emeritus developers to see if they'd like to make github accounts
[20:08] <PeterDietz> ..and I found out from GitHub that even if you don't have a github user account, we'll enter an email address for your svn account, and if/when you make a github account, it will link the commits to you
[20:09] <PeterDietz> just got an error message from a practice transfer that we didn't map azeckoski
[20:10] <mdiggory> he's probably on github
[20:10] <mdiggory> yep
[20:10] <mdiggory> https://github.com/azeckoski
[20:10] <kompewter> [ azeckoski (Aaron Zeckoski) · GitHub ] - https://github.com/azeckoski
[20:10] <PeterDietz> Any questions as to why it's probably best to redo the dspace/dspace github repo? I'm aware it breaks compatibility with anyone who's already forked the "unofficial" one I put up there
[20:10] <mdiggory> azeckoski@vt.edu
[20:11] <mdiggory> does moving the existing DSpace/DSpace repo to a new name disrupt existing forks of that repo?
[20:11] <PeterDietz> Early adopters hopefully haven't invested too much into the dspace/dspace github repo, and should be able to carry their commits over to the new one.. i.e. cherry-pick your important commits
[20:11] <PeterDietz> We could re-name dspace/dspace to dspace/deprecated-dspace-names-are-not-matched
[20:12] <PeterDietz> you enter the "Danger Zone".
[20:12] <mhwood> README: These are not the commits you're looking for.
[20:13] <PeterDietz> yep. Think sci-fi, disruption to the space-time continuum
[20:14] <mhwood> Continuing development is over at [link]. So long, and thanks for all the fish!
[20:14] <PeterDietz> yep.. I'll probably completely redo my DSpace-OSUKB repo and clean it up, so that it will be in a good position to send pull-requests etc to the future dspace/dspace
[20:15] <aschweer> out of the 20 forks of DSpace/DSpace, it looks like most people know what's going on. Like PeterDietz, I'm planning to re-do my fork and the other lconz-irr ones. So that's 4 out of the 20
[20:16] <PeterDietz> Claudia asked today if she can begin making changes to dspace-api-lang and dspace-xmlui-lang over on github. I told her we are still in the un-official no-man's-land, but that those two are fully migrated, so my personal feeling is that she can begin making commits directly to those
[20:17] <mdiggory> We really need to determine our strategy for lang still.
[20:18] <mdiggory> a.) integrate back into dspace-api and dspace-xmlui
[20:18] <mdiggory> b.) make separate repo
[20:19] <mhwood> Does that really need to hold up "going live"?
[20:19] <PeterDietz> So, we're planning to "go live" on Friday, just giving the emeritus developers a chance to opt in I guess
[20:19] <mhwood> What goes live on Friday? Everything?
[20:20] <PeterDietz> but.. the lang things, they are currently exactly as they were, but migrated to github. If Mark / others figure out a proper strategy for folding them into a better place, then those become post-github-go-live tasks
[20:20] <richardrodgers> PeterDietz: could those authors be relinked post go-live?
[20:21] <PeterDietz> go-live is when we'll git svn clone scm.dspace.org/.../dspace --authors authors.txt
[20:21] <PeterDietz> From: Tekkub (GitHub Staff)
[20:21] <PeterDietz> Subject: Importing Project from Subversion, author mapping
[20:21] <PeterDietz> The commits will just show the name used on them if they can't find a matching account. If the user creates an account using that email in the future the commits will link up, once they've fallen out of the server's cache of course.
[20:22] <PeterDietz> so, we're mapping legacy authors (lcs = Larry Stone <lcs@users.sourceforge.net>)
[20:22] <PeterDietz> If he wants to go-github, then he'll have to add an email alias, then the commits will magically link to him
[20:23] <PeterDietz> otherwise, they'll be unlinked but say "Larry Stone"
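The authors file passed to git svn (its documented option is `--authors-file`, short form `-A`) holds one `svnuser = Full Name <email>` entry per line, as in the lcs example above, and git svn aborts when it meets an svn committer that has no entry, which is what the azeckoski error in the practice transfer was. A minimal Python sketch of a pre-flight check, assuming the svn usernames are taken from `svn log --quiet` output; the function names are hypothetical and this is not the tooling the migration actually used:

```python
import re

# git svn authors-file entry: "svnuser = Full Name <email>"
AUTHOR_LINE = re.compile(r"^\s*(.+?)\s*=\s*(.+?)\s*<([^>]*)>\s*$")


def parse_authors(text):
    """Return {svn_user: (name, email)}.

    Blank lines and '#' lines are tolerated by this sketch; any other line
    that is not 'user = Name <email>' is reported instead of silently skipped.
    """
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = AUTHOR_LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: expected 'user = Name <email>': {line!r}")
        user, name, email = m.groups()
        mapping[user] = (name, email)
    return mapping


def svn_log_authors(log_text):
    """Collect committers from `svn log --quiet` output ('rN | user | date' lines)."""
    users = set()
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 3 and re.fullmatch(r"r\d+", parts[0]):
            users.add(parts[1])
    return users


def unmapped_authors(svn_users, mapping):
    """Committers with no authors-file entry, i.e. the ones git svn would stop on."""
    return sorted(set(svn_users).difference(mapping))
```

Running the check once against the whole repository's log would list every unmapped committer in one pass, rather than discovering them one failed import at a time.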
[20:23] <richardrodgers> seems pretty good
[20:25] <PeterDietz> So.. we have the dspace-*-lang repos up. By friday we should have dspace/dspace up. Any other /module type things, people should either migrate, or put a tombstone into them
[20:25] <PeterDietz> see: https://wiki.duraspace.org/display/DSPACE/Migration+to+GitHub
[20:25] <kompewter> [ Migration to GitHub - DSpace - DuraSpace Wiki ] - https://wiki.duraspace.org/display/DSPACE/Migration+to+GitHub
[20:25] <mhwood> Sounds good. I think it's worth a bit of effort to link as many as possible, but my strongest inclination is to get this behind us before much longer.
[20:26] <PeterDietz> yeah.. it's kinda sucky to have an overwhelming positive dspace-developer response, and for us to all say git/github is wicked-sweet-awesome... and then insert a solid month of commit-freeze
[20:26] <PeterDietz> So, I'm pretty optimistic about how things will turn out.
[20:28] <PeterDietz> So.. other things I'm thinking about: gsoc-2012 ---- elastic search
[20:28] <PeterDietz> I think Tim already mentioned this.. But we didn't get accepted into GSoC 2012
[20:29] * hpottinger grumbles
[20:29] <scottatm> wow... I've heard that a lot of projects have not been accepted to GSoC this year.
[20:29] <richardrodgers> I presume that's final?
[20:29] <scottatm> WordPress was rejected as well.
[20:30] <PeterDietz> I imagine we could beg/plead/appeal, but starvation could lead to some innovations.. i.e. I was thinking that at OSU, I could figure out how to get some students in the Computer Science department to do some contract work
[20:30] <PeterDietz> Students are pretty eager to get something real-world-ish to work on in their spare time
[20:31] <hpottinger> we have CS students in the library labs asking for project work
[20:31] <mdiggory> there is a meeting on friday where rejected orgs can ask questions.
[20:31] <mdiggory> in IRC
[20:32] <richardrodgers> maybe we could learn something
[20:32] <PeterDietz> So, to figure out smaller-than-GSoC projects, tasks really, get some specs written, and then farm it out
[20:33] <PeterDietz> There is a significant turnover in the orgs accepted, so I'm sure it wasn't anything personal, just spreading the love
[20:33] <PeterDietz> hi kshepherd
[20:33] <mhwood> There was IRC talk the other day about Duraspace doing its own SoC sort of thing, in exchange for formal recognition or T-shirts or something.
[20:33] <kshepherd> morning all, sorry for lateness
[20:35] <PeterDietz> mdiggory: Are you going to monitor the GSoC rejectee's meeting?
[20:35] <ryscher> I still have a DSpace-related project available through my org -- see http://bit.ly/GED8ZX -- I would love to get some other mentors signed on.
[20:36] <kompewter> [ Phyloinformatics Summer of Code 2012 - NESCent Informatics Wiki ] - http://bit.ly/GED8ZX
[20:37] <PeterDietz> hmm, thanks for the info ryscher.. I'm sure some of us will be available to assist with DSpace related parts
[20:38] <kshepherd> duraspace didn't get into gsoc this year?
[20:39] <PeterDietz> correct kshepherd
[20:39] <mdiggory> I sent it to Tim, and was hoping he might sit in.
[20:39] <richardrodgers> ryscher: if it's of any help, there is code to serialize/deserialize DSpace objects to BagIt in the Replication Task Suite
[20:39] <kshepherd> hm
[20:40] <PeterDietz> "Unfortunately, we were unable to accept your organization's application at this time. We received many more applications for the program than we are able to accommodate, and we would encourage you to reapply for future instances of the program."
[20:40] <PeterDietz> I'm thinking it's because I added twitter and facebook feeds, but not Google Plus share buttons
[20:41] <hpottinger> ryscher: adding a BagIt package SWORD ingest (based on Replication Task Suite) is on my to do list for this year
[20:43] <ryscher> Thanks for the info, hpottinger/richardrodgers. If we find a student, I'll definitely point them towards the existing code. We'll try to keep it as general as possible.
[20:44] <mhwood> Actually it's because we picked GitHub over Google Code.
[20:45] <PeterDietz> In other news, I've been working on integrating Elastic Search into statistics
[20:46] <PeterDietz> It's listening to DSpace's UsageEvents, firing off posts.. And I've got a decent statistics view portal from this information.
[20:47] <PeterDietz> My only drawback is that I haven't yet built a tool to convert old dspace.log files, or solr data into elastic-search data.. So my reports look funny with 27 hits, as opposed to 14M
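Converting old dspace.log files mostly means picking the usage lines out of free-form log text. A rough Python sketch, assuming a log4j layout of `%d %-5p %c @ %m` and a message header of `user:session_id=...:ip_addr=...:action:extra`; the sample line, the action names `view_item`/`view_bitstream`, and the output field names are all assumptions to check against real logs, not a documented DSpace format:

```python
import re
from datetime import datetime

# ASSUMED line shape, e.g.:
# 2012-03-21 14:02:11,456 INFO  org.dspace.usage.LoggerUsageEventListener @ anonymous:session_id=ABC123:ip_addr=10.0.0.1:view_item:handle=1811/123
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(?P<level>\w+)\s+(?P<logger>\S+) @ "
    r"(?P<user>[^:]*):session_id=(?P<session>[^:]*):ip_addr=(?P<ip>[^:]*):(?P<action>[^:]*):(?P<extra>.*)$"
)


def parse_usage_line(line, actions=("view_item", "view_bitstream")):
    """Turn one usage line into a flat dict, or return None for anything else.

    The action names and output keys are hypothetical placeholders.
    """
    m = LOG_LINE.match(line.rstrip("\n"))
    if m is None or m["action"] not in actions:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    doc = {"time": ts.isoformat(), "ip": m["ip"], "action": m["action"]}
    # The trailing part is assumed to be comma-separated key=value pairs.
    for part in m["extra"].split(","):
        key, sep, value = part.partition("=")
        if sep:
            doc[key.strip()] = value.strip()
    return doc
```

Each resulting dict could then be indexed as one Elasticsearch document. IPv6 addresses contain colons and would not match this pattern as written.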
[20:48] <richardrodgers> So no scalability measures yet
[20:48] <KevinVdV> PeterDietz is your code for this public ? Since I am interested as I am building an elastic search plugin for discovery.
[20:48] <PeterDietz> nope, and I imagine that a proper A/B test between solr and elastic search will be valuable to those making decisions
[20:50] <PeterDietz> KevinVdV: yep, https://github.com/osulibraries/dspace-stats-elasticsearch thats the module..
[20:50] <kompewter> [ osulibraries/dspace-stats-elasticsearch · GitHub ] - https://github.com/osulibraries/dspace-stats-elasticsearch
[20:50] <PeterDietz> and, the viewer = https://github.com/osulibraries/DSpaceOSUKB/blob/rebase-implement-elasticsearch/dspace-xmlui/dspace-xmlui-api/src/main/java/org/dspace/app/xmlui/aspect/dashboard/ElasticSearchStatsViewer.java
[20:50] <kompewter> [ DSpaceOSUKB/dspace-xmlui/dspace-xmlui-api/src/main/java/org/dspace/app/xmlui/aspect/dashboard/ElasticSearchStatsViewer.java at rebase-implement-elasticsearch · osulibraries/DSpaceOSUKB · GitHub ] - https://github.com/osulibraries/DSpaceOSUKB/blob/rebase-implement-elasticsearch/dspace-xmlui/dspace-xmlui-api/src/main/java/org/dspace/app/xmlui/aspect/dashboard/ElasticSearchStatsViewer.java
[20:51] <KevinVdV> Thanks PeterDietz I'll be sure to steal some of your code for my own project ;-) (located here https://github.com/KevinVdV/Discovery-Elastic-Search-Plugin)
[20:52] <PeterDietz> one problem / question I have.. is how can I generalize statistics, so that the data returned by ElasticSearch, or the data returned by SOLR can be used without having to do lots of work to get there
[20:53] <PeterDietz> i.e. make an interface for UsageStatistics, implement some methods, etc...
[20:54] <PeterDietz> But.. they work slightly differently.. In solr, I think you make a query for each facet that you want, whereas in elastic search, you can batch all your facets into one big super-query, which pays the client-connection/http-transfer cost only once
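One way to get the kind of UsageStatistics interface described above is to keep the request backend-specific and normalize only the result to `{field: {value: count}}`. A hedged Python sketch; the class names are hypothetical, not DSpace APIs. Note that a single Solr select request can also carry several `facet.field` parameters, and that the Elasticsearch side uses the `aggs` syntax from ES 1.0 onward; the 2012-era releases expressed the same thing through the older `facets` API:

```python
import json
from abc import ABC, abstractmethod
from urllib.parse import urlencode


class UsageStatistics(ABC):
    """Hypothetical backend-neutral interface: one request, normalized counts."""

    @abstractmethod
    def facet_request(self, fields):
        """Return a payload asking for counts on every field in one round trip."""

    @abstractmethod
    def facet_counts(self, response, fields):
        """Normalize a decoded JSON response to {field: {value: count}}."""


class SolrUsageStatistics(UsageStatistics):
    def facet_request(self, fields):
        # Solr accepts facet.field repeatedly in a single select request.
        params = [("q", "*:*"), ("rows", "0"), ("facet", "true"), ("wt", "json")]
        params += [("facet.field", f) for f in fields]
        return urlencode(params)

    def facet_counts(self, response, fields):
        raw = response["facet_counts"]["facet_fields"]
        # Solr's JSON facet lists alternate value, count, value, count, ...
        return {f: dict(zip(raw[f][0::2], raw[f][1::2])) for f in fields}


class ElasticUsageStatistics(UsageStatistics):
    def facet_request(self, fields):
        # One _search body with a terms aggregation per field (ES 1.0+ syntax).
        return json.dumps({"size": 0,
                           "aggs": {f: {"terms": {"field": f}} for f in fields}})

    def facet_counts(self, response, fields):
        aggs = response["aggregations"]
        return {f: {b["key"]: b["doc_count"] for b in aggs[f]["buckets"]}
                for f in fields}
```

Report code then only ever sees the normalized dict, so the same view can be pointed at either backend.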
[20:55] <mhwood> Stay close to the data? Statistics is basically 1000 different ways of looking at the same big blob of raw data. You can't cover them all; the stat. module would be bigger than the thing it plugs into.
[20:56] <PeterDietz> So, I've given up (for now) on how to do "the Right Thing", and instead am focusing on http://en.wikipedia.org/wiki/Worse_is_better
[20:56] <kompewter> [ Worse is better - Wikipedia, the free encyclopedia ] - http://en.wikipedia.org/wiki/Worse_is_better
[20:56] <hpottinger> I'm googling around a bit, it looks like there is an ES plugin that gives ES a Solr-like interface, which might help with data migration?
[20:57] <hpottinger> https://github.com/mattweber/elasticsearch-mocksolrplugin
[20:57] <kompewter> [ mattweber/elasticsearch-mocksolrplugin · GitHub ] - https://github.com/mattweber/elasticsearch-mocksolrplugin
[20:57] <PeterDietz> My student has been building a tool in Clojure to connect to both solr, and elastic-search, and pull the data out of one, and into the other. But OSU's on spring break now, so no progress on that front for a week or two
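The core of such a Solr-to-Elasticsearch transfer is paging through Solr `/select` JSON responses and re-emitting each document in the Elasticsearch bulk API's newline-delimited format (an action line followed by the document line). A Python sketch under those assumptions, not the student's Clojure tool; the index name and the `id_field` choice are hypothetical, and Elasticsearch versions of that era also expected a `_type` in the action line:

```python
import json


def solr_page_to_bulk(solr_response, index, id_field=None):
    """Convert one page of a Solr /select JSON response into bulk (NDJSON) text."""
    lines = []
    for doc in solr_response["response"]["docs"]:
        action = {"_index": index}
        if id_field is not None and id_field in doc:
            action["_id"] = str(doc[id_field])  # reuse Solr's key so re-runs overwrite
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)


def next_start(solr_response, rows):
    """Offset for the next Solr page, or None once every document has been read."""
    start = solr_response["response"]["start"] + rows
    return start if start < solr_response["response"]["numFound"] else None
```

With millions of documents, large `start` offsets get slow in Solr, so a real transfer would probably want a cursor-style approach rather than plain offset paging.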
[20:59] <PeterDietz> So... recap: We're destroying/renaming Github/DSpace/DSpace.. Reimporting from SVN with proper author names.. DuraSpace didn't get accepted to this summer's GSoC.. And Peter and Kevin are working on things in Elastic Search
[21:00] <PeterDietz> and then in #dspace, scottatm has been discussing image thumbnails from jpeg2000 etc / Java JAI
[21:00] <scottatm> (and on the mailing list)
[21:01] <PeterDietz> ..and ryscher might have a GSoC student to transfer data between repositories (SWORD, BagIT, OAI-ORE)
[21:05] <PeterDietz> So, meeting is concluded, thank you all. Perhaps we can plan to have a future #dev meeting on skype, or have a G+ Hangout..
[21:05] <richardrodgers> good summary. Thanks PeterDietz for leading today. Gotta run
[21:05] <mhwood> Yes, thanks, and I need to go too.
[21:05] <hpottinger> would be OK by me, and yes, thanks, PeterDietz
[21:07] <hpottinger> I'm mainly interested in BagIt because our archivist is. He wants to ensure there are good tools to let him gather up the bits he wants to store in a repository, and then ensure that the bits that are in the repository are actually the same as the bits he has gathered
[21:08] <hpottinger> The BagIt toolset provides all of that, just DSpace SWORD doesn't speak BagIt (yet)
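BagIt handles the archivist's "same bits" check with checksum manifests: payload files live under `data/`, and `manifest-<algorithm>.txt` lists one `checksum filepath` pair per line, so re-hashing the files later shows whether the stored bits still match what was gathered. A minimal Python sketch of writing and checking such a manifest, following the layout later standardized in RFC 8493 (BagIt 1.0); it is not the BagIt tools' API, skips tag manifests and `bag-info.txt`, does not percent-encode unusual filenames, and only checks fixity of listed files, not whether extra files appeared under `data/`:

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk=1 << 20):
    """Hash a file in chunks so large bitstreams are not read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def write_manifest(bag_dir, algorithm="sha256"):
    """Write bagit.txt and manifest-<alg>.txt covering every file under bag_dir/data."""
    bag = Path(bag_dir)
    entries = []
    for path in sorted(p for p in (bag / "data").rglob("*") if p.is_file()):
        rel = path.relative_to(bag).as_posix()
        entries.append(f"{file_digest(path, algorithm)}  {rel}\n")
    (bag / f"manifest-{algorithm}.txt").write_text("".join(entries), encoding="utf-8")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")


def verify_manifest(bag_dir, algorithm="sha256"):
    """Return manifest paths that are missing or whose checksum no longer matches."""
    bag = Path(bag_dir)
    bad = []
    text = (bag / f"manifest-{algorithm}.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        target = bag / rel
        if not target.is_file() or file_digest(target, algorithm) != expected:
            bad.append(rel)
    return bad
```

An empty list from `verify_manifest` means every listed file still hashes to the value recorded when the bag was made.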
[21:10] <ryscher> hpottinger: but BagIt doesn't represent relationships between files, hence the need for ORE
[21:14] * hpottinger makes a note to bug his archivist about this...
[21:45] * bradmc (~bradmc@va-65-40-217-246.sta.embarqhsd.net) has joined #duraspace
These logs were automatically created by DuraLogBot on irc.freenode.net using the Java IRC LogBot.